# MongoDB - data exploration, part 1
I mean technically it's not really part 1. Maybe 1.5 or something. But it's the first notebook.

The tools in this notebook serve a specific purpose: accessing a MongoDB database that holds type 1 diabetes Looping data. But the pymongo and PyMongoArrow tools I'm using are really general, so maybe this will be a helpful example beyond this specific use case.

In [9]:
# Load the module for accessing the Mongo database.
from mdb_tools import load_data as ld
from pymongoarrow.api import Schema
import pyarrow as pa

You can get help on any function, including this one that I just made, using Python's built-in help() function.

In [3]:
help(ld.get_collections)

Help on function get_collections in module mdb_tools.load_data:

get_collections(yml_secrets_file)
    Using the URI for the mongodb database, load a set of collections (the selection is currently hard-coded
    
    Example usage:
    col_entries, col_treatments, col_profile, col_device_status = ld.get_collections(yml_secrets_file)
    
    Args:
        yml_secrets_file (str): path to a yml file containing the URI and mongodB name.
    
    Returns: A tuple containing a specific set of collections (basically, tables, from the mongo database: entries, treatments, profile, and device status, in that order.



You can only access a Mongo database if you have a URI with a password and database name in it. That needs to be a secret, so of course I'm not putting it into a public GitHub repository! There are many approaches you could choose, but I decided to go with a yml file that lives outside the repository.

In [13]:
yml_secrets_file = '../secrets/mdb_secrets.yml'

# Access the database using the yml secrets file, and get a specific set of "collections"
col_entries, col_treatments, col_profile, col_devicestatus = ld.get_collections(yml_secrets_file)

Each collection contains a number of "documents". This next code cell shows how you can quickly access one document in a collection. You can see that it looks a bit like a Python dictionary. This is an example of a document from the "entries" collection, and it shows information from the continuous glucose monitor (CGM). The glucose reading is listed as "sgv" (sensor glucose value).

In [5]:
col_entries.find_one()

{'_id': ObjectId('640f53dedda681546abb714e'),
 'sgv': 163,
 'date': 1678724324000.0,
 'dateString': '2023-03-13T16:18:44.000Z',
 'trend': 4,
 'direction': 'Flat',
 'device': 'share2',
 'type': 'sgv',
 'utcOffset': 0,
 'sysTime': '2023-03-13T16:18:44.000Z'}
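The document stores the same moment twice: once as a number in "date" and once as text in "dateString". As a quick sanity check, the sketch below treats "date" as milliseconds since the Unix epoch (an assumption based on the size of the number) and compares it with the parsed string. The values are copied from the document above, and only the standard library is used.

```python
from datetime import datetime, timezone

# Values copied from the example document above.
doc = {
    "sgv": 163,
    "date": 1678724324000.0,
    "dateString": "2023-03-13T16:18:44.000Z",
}

# ASSUMPTION: 'date' is milliseconds since the Unix epoch, in UTC.
from_epoch = datetime.fromtimestamp(doc["date"] / 1000, tz=timezone.utc)

# Parse the ISO 8601 string; the trailing 'Z' means UTC.
from_string = datetime.strptime(
    doc["dateString"], "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

print(from_epoch.isoformat())      # 2023-03-13T16:18:44+00:00
print(from_epoch == from_string)   # True
```

Since the two fields agree for this document, either one could serve as the time axis later on.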

Now we can take some of the data in those collections and convert it to pandas DataFrames. In order to do this, however, we need to define a schema. The schema tells PyMongoArrow which fields to pull from each document and which data type to expect for each one. We'll start by defining the schemas here. Note that I'm not actually grabbing every possible item from the documents, just a few that I think might be useful.

In [17]:
entries_schema = Schema({
    'sgv': float,
    'dateString': str,
})

treatments_schema = Schema({
    'duration': float,
    'amount': float,
    'absolute': float,
    'timestamp': str,
    'created_at': str,
    'rate': float,
    'temp': str,
    'automatic': bool,
    'eventType': str,
})

devicestatus_schema = Schema({
    'created_at': str,
    'override': {'active': bool},
    'loop': {
        'predicted': {'values': pa.list_(pa.float64())},
        'enacted': {'duration': float,
                    'rate': float,
                    'bolusVolume': float,
                    'received': bool},
        'recommendedBolus': float,
        'automaticDoseRecommendation': {'bolusVolume': float},
        'cob': float,
        'iob': {'iob': float}
    }
})

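Next, we can use PyMongoArrow to convert to pandas. The find_pandas_all method only exists on a collection after pymongoarrow.monkey.patch_all() has been called, so that needs to happen somewhere before this cell (for example, inside the loading module):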

In [18]:
df_entries = col_entries.find_pandas_all({}, schema=entries_schema)
df_treatments = col_treatments.find_pandas_all({}, schema=treatments_schema)
df_devicestatus = col_devicestatus.find_pandas_all({}, schema=devicestatus_schema)

df_entries

         sgv                dateString
0      163.0  2023-03-13T16:18:44.000Z
1      160.0  2023-03-13T16:13:43.000Z
2      182.0  2023-03-13T16:43:43.000Z
3      178.0  2023-03-13T16:38:43.000Z
4      178.0  2023-03-13T16:33:43.000Z
...      ...                       ...
52344  207.0  2023-10-18T12:24:21.000Z
52345  228.0  2023-10-18T12:29:21.000Z
52346  247.0  2023-10-18T12:34:21.000Z
52347  265.0  2023-10-18T12:39:21.000Z
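Notice that dateString comes back as plain text and that the first few rows are not in time order. A likely next step, sketched below, is to parse the strings into real timestamps and sort by them. The small DataFrame here is a stand-in built from the first five rows shown above, and the "time" column name is my own choice for illustration.

```python
import pandas as pd

# Stand-in for df_entries, using the first five rows shown above.
df = pd.DataFrame({
    "sgv": [163.0, 160.0, 182.0, 178.0, 178.0],
    "dateString": [
        "2023-03-13T16:18:44.000Z",
        "2023-03-13T16:13:43.000Z",
        "2023-03-13T16:43:43.000Z",
        "2023-03-13T16:38:43.000Z",
        "2023-03-13T16:33:43.000Z",
    ],
})

# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps.
df["time"] = pd.to_datetime(df["dateString"], utc=True)

# Sort chronologically and index by time, ready for resampling or plotting.
df = df.sort_values("time").set_index("time")

print(df["sgv"])
```

With a sorted datetime index in place, things like df["sgv"].resample("1h").mean() become one-liners.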


So there we have it - data! Let the fun begin :D