# Example usage and testing of the intake-mongo plugin

The first step is to have a running MongoDB server. We will use the following URI to access it:

In [1]:
MONGODB_URI = 'mongodb://localhost:32768/intake-mongo-test-db'
DATABASE = 'intake-mongo-test-db'
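`DATABASE` repeats the path component of `MONGODB_URI`. Below is a minimal sketch of deriving it from the URI instead, using only the standard library. It assumes a simple URI of the form shown above, and `database_from_uri` is a hypothetical helper, not part of intake or pymongo. For full MongoDB connection-string handling (multiple hosts, options), pymongo's `pymongo.uri_parser.parse_uri` is the more robust choice.

```python
from urllib.parse import urlparse


def database_from_uri(uri):
    """Hypothetical helper: return the database name from a simple mongodb:// URI.

    urlparse puts the host and port in .netloc, the '/<database>' part in
    .path and any '?option=value' part in .query, so only .path is used.
    """
    return urlparse(uri).path.lstrip('/')


uri = 'mongodb://localhost:32768/intake-mongo-test-db'
print(database_from_uri(uri))  # intake-mongo-test-db
```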

Let's import some packages that we will use later:

In [19]:
import intake
import intake_mongo
import pymongo
import pandas
import numpy

For comparison, we can first access the database directly with pymongo. Let's look at a simple dataset:

In [3]:
client = pymongo.MongoClient(MONGODB_URI)

In [4]:
list(client[DATABASE].collection1.find())

[{'_id': ObjectId('5a97dfaf546a39039e5154cd'),
  'name': 'Alice',
  'rank': 1,
  'score': 100.5},
 {'_id': ObjectId('5a97dfaf546a39039e5154ce'),
  'name': 'Bob',
  'rank': 2,
  'score': 50.3},
 {'_id': ObjectId('5a97dfaf546a39039e5154cf'),
  'name': 'Charlie',
  'rank': 3,
  'score': 25.0},
 {'_id': ObjectId('5a97dfaf546a39039e5154d0'),
  'name': 'Eve',
  'rank': 3,
  'score': 25.0}]

For a more readable view, we can build a DataFrame from the documents:

In [5]:
df = pandas.DataFrame(list(client[DATABASE].collection1.find()))
del df['_id']  # drop MongoDB's ObjectId column
df

|   | name | rank | score |
|---|------|------|-------|
| 0 | Alice | 1 | 100.5 |
| 1 | Bob | 2 | 50.3 |
| 2 | Charlie | 3 | 25.0 |
| 3 | Eve | 3 | 25.0 |

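Instead of deleting MongoDB's `_id` column afterwards, `pandas.DataFrame.from_records` can leave it out through its `exclude` argument. The sketch below uses a hard-coded copy of the documents listed above, with placeholder strings standing in for the `ObjectId` values so it runs without a MongoDB server or the `bson` package. Against a live server, passing a projection such as `{'_id': 0}` as the second argument of `find()` would avoid fetching the field at all.

```python
import pandas

# Stand-in for list(collection1.find()); placeholder strings replace the ObjectIds.
docs = [
    {'_id': 'id-1', 'name': 'Alice', 'rank': 1, 'score': 100.5},
    {'_id': 'id-2', 'name': 'Bob', 'rank': 2, 'score': 50.3},
    {'_id': 'id-3', 'name': 'Charlie', 'rank': 3, 'score': 25.0},
    {'_id': 'id-4', 'name': 'Eve', 'rank': 3, 'score': 25.0},
]

# Build the frame while leaving out MongoDB's _id field.
frame = pandas.DataFrame.from_records(docs, exclude=['_id'])
print(frame)
```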

## Going with intake

With intake, the data source would normally be described in a catalog, so using it only requires having the catalog configured and opening the source by name.

For demonstration purposes, this notebook works at a lower level and creates the data source programmatically from the plugin. Most of the code shown here is unnecessary when using a catalog.

In [6]:
plugin = intake_mongo.Plugin() # create the plugin
datasource = plugin.open(MONGODB_URI, 'collection1', ['name', 'rank', 'score'])

In [7]:
datasource.discover()

{'datashape': None,
 'dtype': dtype([('name', '<u8'), ('rank', '<u8'), ('score', '<u8')]),
 'metadata': {},
 'npartitions': 1,
 'shape': (None,)}

In [57]:
data = datasource.read()
data

|   | name | rank | score |
|---|------|------|-------|
| 0 | Alice | 1 | 100.5 |
| 1 | Bob | 2 | 50.3 |
| 2 | Charlie | 3 | 25.0 |
| 3 | Eve | 3 | 25.0 |


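The dtype reported by `discover()` does not match the data actually read: every field is listed as `<u8` (unsigned 64-bit integer), while `name` holds strings and `score` holds floats. The cells below inspect the DataFrame's dtypes and try to build the matching NumPy dtype from them.

In [16]: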
list(data.dtypes)

[dtype('O'), dtype('uint64'), dtype('float64')]

In [21]:
numpy.dtype(list(data.dtypes))

TypeError: data type not understood
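The call fails because `numpy.dtype` reads a list as a description of structured-array fields, where each entry must pair a field name with a format; a bare list of dtypes carries no field names. A minimal sketch of one fix, on a stand-in DataFrame rebuilt with the same columns and dtypes as `data` so the snippet runs on its own, zips each column label with its dtype:

```python
import numpy
import pandas

# Stand-in for `data`, with the same columns and dtypes (object, uint64, float64).
frame = pandas.DataFrame({
    'name': pandas.Series(['Alice', 'Bob', 'Charlie', 'Eve'], dtype=object),
    'rank': numpy.array([1, 2, 3, 3], dtype=numpy.uint64),
    'score': [100.5, 50.3, 25.0, 25.0],
})

# A bare list of dtypes has no field names, so NumPy rejects it.
try:
    numpy.dtype(list(frame.dtypes))
except TypeError as exc:
    print('TypeError:', exc)

# Pairing each column label with its dtype gives a valid structured dtype.
# str() is applied because NumPy field names must be strings.
frame_dt = numpy.dtype([(str(col), col_dtype) for col, col_dtype in frame.dtypes.items()])
print(frame_dt)  # [('name', 'O'), ('rank', '<u8'), ('score', '<f8')]
```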

In [23]:
data.dtypes

name      object
rank      uint64
score    float64
dtype: object

In [58]:
list(data.columns)

['name', 'rank', 'score']

In [28]:
foo = {'names': list(data.columns), 'formats': list(data.dtypes)}

In [29]:
foo

{'formats': [dtype('O'), dtype('uint64'), dtype('float64')],
 'names': ['name', 'rank', 'score']}

In [39]:
dt = numpy.dtype(foo)
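Passing a dict with `names` and `formats` keys is one of the forms `numpy.dtype` accepts for structured dtypes. pandas can also produce such a layout directly: `DataFrame.to_records(index=False)` returns a NumPy record array with one field per column. A small sketch, on a stand-in for `data`, checking that the two routes agree field by field:

```python
import numpy
import pandas

# Stand-in for `data` (same columns and dtypes as in the notebook).
frame = pandas.DataFrame({
    'name': pandas.Series(['Alice', 'Bob', 'Charlie', 'Eve'], dtype=object),
    'rank': numpy.array([1, 2, 3, 3], dtype=numpy.uint64),
    'score': [100.5, 50.3, 25.0, 25.0],
})

# Route 1: the names/formats dict used in the notebook.
dt_from_dict = numpy.dtype({'names': list(frame.columns), 'formats': list(frame.dtypes)})

# Route 2: let pandas build a record array and read its dtype.
dt_from_records = frame.to_records(index=False).dtype

# Compare field names and per-field dtypes explicitly.
same = dt_from_records.names == dt_from_dict.names and all(
    dt_from_records[n] == dt_from_dict[n] for n in dt_from_dict.names
)
print(same)  # True
```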

In [45]:
dt.names

('name', 'rank', 'score')

In [48]:
data.columns == dt.names

array([False, False, False])
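The all-`False` result is easy to misread. `dt.names` is a tuple, and pandas appears to treat a tuple on the right-hand side of `==` as a single scalar label (tuples can be labels in pandas), so each column name is compared against the whole tuple rather than element by element. A short sketch of two direct comparisons, using stand-ins for `data.columns` and `dt`:

```python
import numpy
import pandas

columns = pandas.Index(['name', 'rank', 'score'])  # stand-in for data.columns
example_dt = numpy.dtype([('name', 'O'), ('rank', '<u8'), ('score', '<f8')])

# Same names in the same order, compared as plain lists.
print(list(columns) == list(example_dt.names))  # True

# Index.equals compares the elements of two Index objects (not their dtype).
print(columns.equals(pandas.Index(list(example_dt.names))))  # True
```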

In [49]:
data.columns

Index(['name', 'rank', 'score'], dtype='object')

In [64]:
for i in data.dtypes:
    print(type(i))

<class 'numpy.dtype'>
<class 'numpy.dtype'>
<class 'numpy.dtype'>


In [66]:
dt.fields['name'][0]

dtype('O')

In [67]:
dt

dtype([('name', 'O'), ('rank', '<u8'), ('score', '<f8')])

In [75]:
all(x == y for x, y in zip(dt.names, data.columns))

True
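The last cells compare `dt` with the DataFrame by hand; `dt.fields[name]` is a tuple that starts with the field's dtype followed by its byte offset, hence the `[0]`. The same checks can be wrapped in a small helper that reports which fields disagree, for example to compare the dtype returned by `discover()` with the data returned by `read()`. `mismatched_fields` is a hypothetical helper written for illustration, not part of intake or intake-mongo; the sample reuses the dtype that `discover()` reported earlier, in which every field is `<u8`.

```python
import numpy
import pandas


def mismatched_fields(dt, df):
    """Hypothetical helper: list fields where a structured dtype and a DataFrame disagree.

    `dt` must be a structured dtype. Returns (field, expected, actual) tuples;
    a missing or extra column is reported with None on the side where it is absent.
    """
    problems = []
    for name in dt.names:
        expected = dt.fields[name][0]      # field dtype; [1] is the byte offset
        actual = df.dtypes.get(name)       # None if the column is missing
        if actual is None or actual != expected:
            problems.append((name, expected, actual))
    for name in df.columns:
        if name not in dt.fields:
            problems.append((name, None, df.dtypes[name]))
    return problems


# Stand-in for the data returned by read().
frame = pandas.DataFrame({
    'name': pandas.Series(['Alice', 'Bob', 'Charlie', 'Eve'], dtype=object),
    'rank': numpy.array([1, 2, 3, 3], dtype=numpy.uint64),
    'score': [100.5, 50.3, 25.0, 25.0],
})

# dtype as reported by discover() in this notebook (every field '<u8').
discovered = numpy.dtype([('name', '<u8'), ('rank', '<u8'), ('score', '<u8')])

for field, expected, actual in mismatched_fields(discovered, frame):
    print(field, expected, actual)
# name uint64 object
# score uint64 float64
```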