# `Stemmer` Function Example

This notebook shows an example of using the `stemmer` function. It assumes basic working knowledge of MLDB. Take a look at the [demos and tutorials](../../../../doc/#builtin/Demos.md.html) to get started with MLDB. [Back to stemmer documentation](../../../../doc/#builtin/functions/Stemmer.md.html)

In [1]:
from pymldb import Connection
mldb = Connection("http://localhost")

Let's start by creating a stemmer function:

In [3]:
print(mldb.put("/v1/functions/my_stemmer", {
    "type": "stemmer",
    "params": {
        "language": "english"
    }
}))

<Response [201]>


Let's also create a toy dataset:

In [5]:
mldb.put('/v1/datasets/example', { "type":"sparse.mutable" })

mldb.post('/v1/datasets/example/rows', {
    "rowName": "row_0",
    "columns": [
        ["potato", 1, 0],
        ["potatoes", 2, 0],
        ["carrot", 3, 0]
    ]
})

mldb.post('/v1/datasets/example/rows', {
    "rowName": "row_1",
    "columns": [
        ["potato", "crips", 0],
        ["potatoes", "chips", 0],
        ["carrot", 0, 0],
        ["carrots", "hi mom", 0]
    ]
})

mldb.post("/v1/datasets/example/commit")

This is what the dataset looks like:

In [6]:
mldb.query("select * from example")

| _rowName | carrot | carrots | potato | potatoes |
|---|---|---|---|---|
| row_1 | 0 | hi mom | crips | chips |
| row_0 | 3 | | 1 | 2 |


The following query applies the stemmer to the column names and, when several columns share a stem, sums their values into a single column:

In [11]:
mldb.query("SELECT my_stemmer({words: {*}})[words] as * FROM example")

| _rowName | carrot | potato |
|---|---|---|
| row_1 | 1 | 2 |
| row_0 | 3 | 3 |


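Note that string values are counted as the integer 1: in `row_1`, `potato` is 2 because `"crips"` and `"chips"` each count as 1, and `carrot` is 1 because 0 + 1 = 1.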
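
To make the summing and coercion rule concrete, the same aggregation can be mimicked in plain Python. This is only a sketch under stated assumptions: `toy_stem` is a hypothetical stand-in that strips a trailing "es" or "s" and is not the algorithm MLDB uses, and `stem_and_sum` is an illustrative helper name that does not exist in MLDB or pymldb.

```python
def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips a plural suffix only.
    for suffix in ("es", "s"):
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word


def stem_and_sum(row):
    # Sum the values of all columns whose names share a stem.
    # String values count as 1, mirroring the behaviour seen in the query output.
    totals = {}
    for column, value in row.items():
        count = 1 if isinstance(value, str) else value
        stem = toy_stem(column)
        totals[stem] = totals.get(stem, 0) + count
    return totals


# The two rows of the toy dataset, without timestamps.
row_0 = {"potato": 1, "potatoes": 2, "carrot": 3}
row_1 = {"potato": "crips", "potatoes": "chips", "carrot": 0, "carrots": "hi mom"}

result_0 = stem_and_sum(row_0)  # potato: 1 + 2, carrot: 3
result_1 = stem_and_sum(row_1)  # potato: 1 + 1, carrot: 0 + 1
```

Here `result_0` and `result_1` hold the same per-stem totals as `row_0` and `row_1` in the query output.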

The stemmer also combines well with the `tokenize` function:

In [13]:
mldb.query("""
    SELECT my_stemmer({words: {
                        tokenize('I have liked having carrots', 
                                 {splitchars:' '}) as *
                      }}) as *
""")

| _rowName | words.I | words.carrot | words.have | words.like |
|---|---|---|---|---|
| result | 1 | 1 | 2 | 1 |
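
The tokenize-then-stem pipeline can be sketched in plain Python to show where each count comes from. The names here are assumptions for illustration only: `TOY_STEMS` is a hypothetical lookup table covering just the words of the example sentence, and `tokenize_and_stem` splits on a single separator string, which is simpler than MLDB's `splitchars` option.

```python
from collections import Counter

# Hypothetical lookup table standing in for a real English stemmer;
# it only covers the words in the example sentence.
TOY_STEMS = {"liked": "like", "having": "have", "carrots": "carrot"}


def tokenize_and_stem(text, separator=" "):
    # Split on the separator, drop empty tokens, then count each token
    # under its stemmed form.
    tokens = [token for token in text.split(separator) if token]
    return Counter(TOY_STEMS.get(token, token) for token in tokens)


counts = tokenize_and_stem("I have liked having carrots")
```

In this sketch "have" and "having" fall under the same stem, so `counts["have"]` is 2, while every other stem appears once.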
