# Foreword

One of the first steps of any machine learning project is importing data into your system. One of the strengths of
MLDB is the ease with which you can do that from various types of sources: file imports from CSV and JSON, locally or remotely over many protocols (http, https, sftp, git, s3), and direct access to the SentiWordNet lexical resource, the Word2Vec tool and PostgreSQL. Continuing this trend, we just added support for one more system: MongoDB.

# MLDB MongoDB Plugin Demo

This notebook demonstrates the four features provided by the MLDB MongoDB Plugin:

* Procedure mongodb.import
* Dataset mongodb.dataset
* Dataset mongodb.record
* Function mongodb.query

## Setup

### MongoDB Database

Our MongoDB database URI is mongodb://khan.datacratic.com:11712/zips and the collection name is zips.

⚠ The MongoDB database used throughout this demo is not publicly available.

### Data

For this demo, we use a MongoDB database populated with data provided by the book [MongoDB In Action](https://www.manning.com/books/mongodb-in-action). The zipped JSON file is available at http://mng.bz/dOpd.

Here is the first record:

```
{
    "city": "ACMAR",
    "zip": "35004",
    "loc": {
        "y": 33.584131999999997,
        "x": 86.515569999999997
    },
    "pop": 6055,
    "state": "AL"
}
```

Before continuing, we need the obligatory first step of any MLDB notebook: creating a connection.

In [2]:
from pymldb import Connection
mldb = Connection()

## Procedure mongodb.import

The MongoDB Import Procedure type is used to import a MongoDB collection into a dataset.

See: [mongodb.import](../../../../doc/#/v1/plugins/mongodb/doc/MongoImport.md.html)


Here we import the zips collection into an MLDB dataset called mongodb_zips.

In [6]:
print mldb.post('/v1/procedures', {
    'type' : 'mongodb.import',
    'params' : {
        'connectionScheme': 'mongodb://khan.datacratic.com:11712/zips',
        'collection': 'zips',
        'outputDataset' : {
            'id' : 'mongodb_zips',
            'type' : 'sparse.mutable'
        }
    }
})

<Response [201]>


We can now query the imported data as we would any other MLDB Dataset.

In [7]:
mldb.query("SELECT * FROM mongodb_zips LIMIT 5")

| _rowName | _id | city | loc.x | loc.y | pop | state | zip |
|---|---|---|---|---|---|---|---|
| 57d2f5eb21af5ee9c4e27f08 | 57d2f5eb21af5ee9c4e27f08 | BONDURANT | 110.335287 | 43.223798 | 116 | WY | 82922 |
| 57d2f5eb21af5ee9c4e27f07 | 57d2f5eb21af5ee9c4e27f07 | KAYCEE | 106.56323 | 43.723625 | 876 | WY | 82639 |
| 57d2f5eb21af5ee9c4e27f05 | 57d2f5eb21af5ee9c4e27f05 | CLEARMONT | 106.458071 | 44.66101 | 350 | WY | 82835 |
| 57d2f5eb21af5ee9c4e27f03 | 57d2f5eb21af5ee9c4e27f03 | ARVADA | 106.109191 | 44.689876 | 107 | WY | 82831 |
| 57d2f5eb21af5ee9c4e27f01 | 57d2f5eb21af5ee9c4e27f01 | COKEVILLE | 110.916419 | 42.057983 | 905 | WY | 83114 |


Because we did not provide a `named` parameter, `oid()` was used to name the rows. This is why _rowName and _id have the same values.
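To name rows after a field instead, the `named` parameter could be set in the procedure configuration. The following is only a sketch: the value `zip` and the dataset id `mongodb_zips_by_zip` are assumptions chosen for illustration, and the exact syntax accepted by `named` should be checked against the mongodb.import documentation.

```python
# Sketch of a mongodb.import configuration that sets `named`.
# Assumption: `named` accepts an expression over the document fields, such as zip.
import_config = {
    'type': 'mongodb.import',
    'params': {
        'connectionScheme': 'mongodb://khan.datacratic.com:11712/zips',
        'collection': 'zips',
        'named': 'zip',  # assumed value; oid() is used when it is omitted
        'outputDataset': {
            'id': 'mongodb_zips_by_zip',  # hypothetical dataset id
            'type': 'sparse.mutable'
        }
    }
}

# It would be posted the same way as the procedure above:
# mldb.post('/v1/procedures', import_config)
```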

Also note how the `loc` object was imported: the sub-object was flattened, and its fields were imported into MLDB as the columns `loc.x` and `loc.y`.
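The resulting column names can be reproduced with a small plain-Python helper. This is an illustration of the naming scheme seen in the output, not the plugin's actual implementation; the `flatten` function is hypothetical.

```python
def flatten(document, prefix=''):
    """Flatten nested dicts into one dict keyed by dotted paths."""
    flat = {}
    for key, value in document.items():
        path = prefix + key
        if isinstance(value, dict):
            # Recurse into the sub-object, extending the dotted prefix.
            flat.update(flatten(value, path + '.'))
        else:
            flat[path] = value
    return flat

# The first record of the collection, with shortened coordinates.
record = {
    "city": "ACMAR",
    "zip": "35004",
    "loc": {"y": 33.584132, "x": 86.51557},
    "pop": 6055,
    "state": "AL"
}
columns = flatten(record)
```

Here `columns` holds the keys `city`, `loc.x`, `loc.y`, `pop`, `state` and `zip`, which match the column names of the imported dataset (apart from `_id`, which MongoDB adds).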

## Dataset mongodb.dataset

The MongoDB Dataset is a read-only dataset based on a MongoDB collection. It is meant to be used as a bridge between MLDB and MongoDB by allowing MLDB SQL queries to run over a MongoDB collection.

See: [mongodb.dataset](../../../../doc/#/v1/plugins/mongodb/doc/MongoDataset.md.html)

Here we create a dataset named "mongodb_zips_bridge".

In [10]:
print mldb.put('/v1/datasets/mongodb_zips_bridge', {
    'type' : 'mongodb.dataset',
    'params' : {
        'connectionScheme': 'mongodb://khan.datacratic.com:11712/zips',
        'collection': 'zips'
    }
})

<Response [201]>


We can directly query it.

In [14]:
mldb.query("SELECT * NAMED zip FROM mongodb_zips_bridge ORDER BY pop DESC LIMIT 5")

| _rowName | _id | city | loc.x | loc.y | pop | state | zip |
|---|---|---|---|---|---|---|---|
| 60623 | 57d2f5eb21af5ee9c4e22302 | CHICAGO | 87.7157 | 41.849015 | 112047 | IL | 60623 |
| 11226 | 57d2f5eb21af5ee9c4e24f28 | BROOKLYN | 73.956985 | 40.646694 | 111396 | NY | 11226 |
| 10021 | 57d2f5eb21af5ee9c4e24e7f | NEW YORK | 73.958805 | 40.768476 | 106564 | NY | 10021 |
| 10025 | 57d2f5eb21af5ee9c4e24e4f | NEW YORK | 73.968312 | 40.797466 | 100027 | NY | 10025 |
| 90201 | 57d2f5eb21af5ee9c4e21258 | BELL GARDENS | 118.17205 | 33.969177 | 99568 | CA | 90201 |


## Dataset mongodb.record

The MongoDB Record Dataset is a write-only dataset that writes to a MongoDB collection. It can be used to export your data.

See: [mongodb.record](../../../../doc/#/v1/plugins/mongodb/doc/MongoRecord.md.html)

Here we create a dataset named "mldb_to_mongodb" that writes to the collection "mldb_coll" of the MongoDB database "zips".

In [10]:
print mldb.put("/v1/datasets/mldb_to_mongodb", {
    "type": "mongodb.record",
    "params": {
        "connectionScheme": 'mongodb://khan.datacratic.com:11712/zips',
        "collection": 'mldb_coll'
    }
})

<Response [201]>


Then we record a row with 2 columns.

In [18]:
print mldb.post('/v1/datasets/mldb_to_mongodb/rows', {
    'rowName' : 'row1',
    'columns' : [
        ['colA', 'valA', 0],
        ['colB', 'valB', 0]
    ]
})

<Response [200]>



The mongodb.record dataset stores the row name in the MongoDB _id field. MLDB cannot read collections whose _id values are not ObjectIDs, so we use pymongo to look at what was inserted into MongoDB.

See: [PyMongo tutorial](https://api.mongodb.com/python/current/tutorial.html)

In [7]:
import pymongo
from pymongo import MongoClient
client = MongoClient('mongodb://khan.datacratic.com:11712')
zips = client.zips
zips.mldb_coll.find_one()

{u'_id': u'row1', u'colB': u'valB', u'colA': u'valA'}
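The mapping between the recorded row and the stored document can be sketched in plain Python. This only mirrors what the output above shows (the row name becomes `_id`, each column becomes a field, and the timestamps do not appear); `row_to_document` is a hypothetical helper, not part of MLDB or pymongo.

```python
def row_to_document(row):
    """Build the document that matches the one observed in MongoDB."""
    document = {'_id': row['rowName']}
    # Each column is a [name, value, timestamp] triple; the timestamp is dropped.
    for name, value, _timestamp in row['columns']:
        document[name] = value
    return document

row = {
    'rowName': 'row1',
    'columns': [
        ['colA', 'valA', 0],
        ['colB', 'valB', 0]
    ]
}
document = row_to_document(row)
```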


## Function mongodb.query

The MongoDB query function type creates a function that runs a query against a MongoDB collection from within MLDB. It acts similarly to the SQL Query function.

See: [mongodb.query](../../../../doc/#/v1/plugins/mongodb/doc/MongoQuery.md.html)
    
Here we create the query function on the zips collection of the MongoDB zips database.

In [17]:
print mldb.put("/v1/functions/mongo_query", {
    "type": "mongodb.query",
    "params": {
        "connectionScheme": 'mongodb://khan.datacratic.com:11712/zips',
        "collection": 'zips'
    }
})

<Response [201]>


A direct call to the function looks like this:

In [33]:
import json
mldb.get('/v1/functions/mongo_query/application',
    input={'query' : json.dumps({'zip' : {'$eq' : '60623'}})}
).json()

{u'output': {u'_id': u'57d2f5eb21af5ee9c4e22302',
  u'city': u'CHICAGO',
  u'loc': [[u'x', [87.7157, u'2016-09-09T17:48:27Z']],
   [u'y', [41.849015, u'2016-09-09T17:48:27Z']]],
  u'pop': 112047,
  u'state': u'IL',
  u'zip': u'60623'}}
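In this response the nested `loc` object comes back as a list of `[name, [value, timestamp]]` pairs. If only the values are needed, a small helper can turn such a list into a plain dict. This is a sketch based solely on the structure of the output above; `pairs_to_dict` is a hypothetical name.

```python
def pairs_to_dict(pairs):
    """Convert [[name, [value, timestamp]], ...] into {name: value}."""
    return {name: value for name, (value, _timestamp) in pairs}

# The `loc` entry copied from the response above.
loc_pairs = [
    ['x', [87.7157, '2016-09-09T17:48:27Z']],
    ['y', [41.849015, '2016-09-09T17:48:27Z']]
]
loc = pairs_to_dict(loc_pairs)
```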

Here is an example of the function being used within a query.

In [32]:
mldb.query("""
    SELECT mongo_query({query: '{"loc.x" : {"$eq" : 73.968312}}'}) AS *
""")

| _rowName | _id | city | loc.x | loc.y | pop | state | zip |
|---|---|---|---|---|---|---|---|
| result | 57d2f5eb21af5ee9c4e24e4f | NEW YORK | 73.968312 | 40.797466 | 100027 | NY | 10025 |
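Writing the MongoDB filter by hand inside an SQL string is error-prone, so the query text can also be built from a Python dict. The sketch below is a hypothetical helper, not part of pymldb, and it assumes that a single quote inside an MLDB SQL string literal is escaped by doubling it, as in standard SQL.

```python
import json

def mongo_query_sql(function_name, mongo_filter):
    """Build a SELECT statement that calls a mongodb.query function."""
    # Serialize the filter to JSON, then double any single quotes so that
    # the JSON fits inside a single-quoted SQL string literal (assumed escaping).
    query_json = json.dumps(mongo_filter).replace("'", "''")
    return "SELECT {}({{query: '{}'}}) AS *".format(function_name, query_json)

sql = mongo_query_sql('mongo_query', {'loc.x': {'$eq': 73.968312}})
# The result could then be passed to mldb.query(sql).
```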


# Conclusion

That covers the four features provided by the MLDB MongoDB Plugin. To learn more about MLDB and try it for free, head to http://mldb.ai. More demos are available at https://docs.mldb.ai/doc/#builtin/Demos.md.html.