# `melt` Procedure Example

This notebook shows an example of using the `melt` procedure. It assumes basic working knowledge of MLDB. Take a look at the [demos and tutorials](../../../../doc/#builtin/Demos.md.html) to get started with MLDB. [Back to melt documentation](../../../../doc/#builtin/procedures/MeltProcedure.md.html)

In [1]:
from pymldb import Connection
mldb = Connection("http://localhost")

## Example with a JSON array

Let's start by creating a toy dataset:

In [5]:
mldb.put('/v1/datasets/melt_proc', { "type":"sparse.mutable" })

# Each entry in "columns" is [column name, value, timestamp].
mldb.post('/v1/datasets/melt_proc/rows', {
    "rowName": "row_0",
    "columns": [
        ["name", "bill", 0],
        ["age", 20, 0],
        ["friends", '[{"name": "mich", "age": 20}, {"name": "jean", "age": 18}]', 0]
    ]
})

mldb.post("/v1/datasets/melt_proc/commit")

The dataset looks like this:

In [6]:
mldb.query("""
    SELECT * FROM melt_proc
""")

| _rowName | age | friends | name |
|---|---|---|---|
| row_0 | 20 | [{"name": "mich", "age": 20}, {"name": "jean",... | bill |


We may want to work with the individual elements of the JSON array in the `friends` column. To do so, we can run the `melt` procedure on the output of the `parse_json()` function.

To break it down, let's look at the output of the `parse_json()` function:

In [22]:
mldb.query("select parse_json(friends, {arrays: 'encode'}) as * from melt_proc")

| _rowName | 0 | 1 |
|---|---|---|
| row_0 | {"age":20,"name":"mich"} | {"age":18,"name":"jean"} |
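
To build intuition for what `{arrays: 'encode'}` produced above, here is a minimal plain-Python sketch that reproduces the same shape for this one input. It is not MLDB's implementation, and `encode_array_elements` is a hypothetical helper introduced only for illustration. It covers only the case shown here, an array of objects, where each element ends up as a JSON string in a column named after its index.

```python
import json

def encode_array_elements(json_text):
    """Hypothetical helper (not part of MLDB): return one column per array
    index, with each element re-encoded as compact JSON text."""
    elements = json.loads(json_text)
    return {
        str(index): json.dumps(element, sort_keys=True, separators=(",", ":"))
        for index, element in enumerate(elements)
    }

friends = '[{"name": "mich", "age": 20}, {"name": "jean", "age": 18}]'
columns = encode_array_elements(friends)

for column_name in sorted(columns):
    print(column_name, columns[column_name])
```

The keys are sorted here only so that the sketch prints the same text as the MLDB output above.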


We can now run the `melt` procedure like this:

In [31]:
# Columns in to_fix are copied to every output row;
# each column in to_melt becomes its own output row.
print(mldb.post("/v1/procedures", {
    "type": "melt",
    "params": {
        "inputData": """
            SELECT {name, age} AS to_fix,
                   {friends*} AS to_melt
            FROM (
                SELECT name, age, parse_json(friends, {arrays: 'encode'}) AS friends
                FROM melt_proc
            )""",
        "outputDataset": "melted_data",
        "runOnCreation": True
    }
}))

<Response [201]>


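The melted dataset has one row per element of `friends`. The `to_fix` columns (`name` and `age`) are repeated on every row, and each melted column is split into a `key` and a `value`: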

In [32]:
mldb.query("select * from melted_data")

| _rowName | age | key | name | value |
|---|---|---|---|---|
| row_0.friends.1 | 20 | friends.1 | bill | {"age":18,"name":"jean"} |
| row_0.friends.0 | 20 | friends.0 | bill | {"age":20,"name":"mich"} |
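
The reshaping itself can be sketched in a few lines of plain Python. This is an illustration under assumptions, not MLDB's code: `melt_row` is a hypothetical function, and the `<rowName>.<key>` naming of output rows is inferred from the output above rather than taken from the procedure's specification.

```python
def melt_row(row_name, to_fix, to_melt):
    """Hypothetical sketch of a melt: emit one output row per to_melt column,
    copying the to_fix columns onto each of them."""
    melted = {}
    for key, value in to_melt.items():
        out_row = dict(to_fix)      # fixed columns are repeated on every row
        out_row["key"] = key        # name of the melted column
        out_row["value"] = value    # content of the melted column
        melted["{}.{}".format(row_name, key)] = out_row
    return melted

melted = melt_row(
    "row_0",
    {"name": "bill", "age": 20},
    {
        "friends.0": '{"age":20,"name":"mich"}',
        "friends.1": '{"age":18,"name":"jean"}',
    },
)

for name in sorted(melted):
    print(name, melted[name])
```

Because every output row carries both the fixed columns and a single `value`, later queries can filter or aggregate per friend without handling the whole array.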


## Example with bags of words

In [29]:
mldb.put('/v1/datasets/melt_proc_bow', { "type":"sparse.mutable" })

mldb.post('/v1/datasets/melt_proc_bow/rows', {
    "rowName": "row1",
    "columns": [
        ["text", "hello my friend", 0]
    ]
})
mldb.post('/v1/datasets/melt_proc_bow/rows', {
    "rowName": "row2",
    "columns": [
        ["text", "hello it's me", 0]
    ]
})

mldb.post("/v1/datasets/melt_proc_bow/commit")

Our dataset looks like this:

In [30]:
mldb.query("SELECT * FROM melt_proc_bow")

| _rowName | text |
|---|---|
| row2 | hello it's me |
| row1 | hello my friend |


By running a `melt` procedure and using the `tokenize` function on the text, we can obtain a new dataset with one row per *(rowName, word)* pair:

In [35]:
# Keep the row name fixed and melt the token counts produced by tokenize().
print(mldb.post("/v1/procedures", {
    "type": "melt",
    "params": {
        "inputData": """
            SELECT {rowName() AS rowName} AS to_fix,
                   {tokenize(text, {splitchars: ' '}) AS *} AS to_melt
            FROM melt_proc_bow
        """,
        "outputDataset": "melted_data_bow",
        "runOnCreation": True
    }
}))

<Response [201]>


This gives us the following dataset:

In [36]:
mldb.query("SELECT * FROM melted_data_bow")

| _rowName | key | rowName | value |
|---|---|---|---|
| row2.me | me | row2 | 1 |
| row2.hello | hello | row2 | 1 |
| row1.my | my | row1 | 1 |
| row1.friend | friend | row1 | 1 |
| row1.hello | hello | row1 | 1 |
| row2.it's | it's | row2 | 1 |
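
For comparison, the same bag-of-words reshaping can be approximated outside MLDB. The sketch below is a simplified stand-in that assumes tokens are separated by single spaces and that each token's value is its count within the row. `simple_tokenize` is a hypothetical helper and does not reproduce the other options of MLDB's `tokenize` function.

```python
from collections import Counter

def simple_tokenize(text, splitchar=" "):
    """Hypothetical stand-in for tokenize(): count tokens split on one character."""
    return Counter(token for token in text.split(splitchar) if token)

rows = {
    "row1": "hello my friend",
    "row2": "hello it's me",
}

melted_bow = {}
for row_name, text in rows.items():
    for word, count in simple_tokenize(text).items():
        # One output row per (rowName, word) pair, named "<rowName>.<word>".
        melted_bow["{}.{}".format(row_name, word)] = {
            "rowName": row_name,
            "key": word,
            "value": count,
        }

for name in sorted(melted_bow):
    print(name, melted_bow[name])
```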
