In this repository, you will find JSON representing a <i>subset</i> of the data for the <a href="http://modeldb.yale.edu">ModelDB</a> repository of computational neuroscience models.

<h1>Getting started</h1>

Begin by cloning this repository. Create a private repository on GitHub and push your local copy to it.<br/><br/>Connect to MongoDB and create a database for this assignment.

In [60]:
from pymongo import MongoClient
from pprint import pprint
mongodb = MongoClient() 
hw1DB = mongodb.hw1DB
# Drop both collections first so that re-running this cell does not insert duplicates.
hw1DB.drop_collection('models')
hw1DB.drop_collection('papers')

models = hw1DB.models
papers = hw1DB.papers

Using the <tt>json</tt> module and Python file operations, load the data from <tt>modelcollection.json</tt> and <tt>papercollection.json</tt> into Python.

In [61]:
import json

with open('modelcollection.json') as json_data:
    mc = json.load(json_data)
    
models.insert_many(mc)
#insert papers
with open('papercollection.json') as json_data:
    pc = json.load(json_data)

papers.insert_many(pc)

<pymongo.results.InsertManyResult at 0x7f960df0f8c0>

Put the loaded data into two collections in your database. I recommend calling them <tt>models</tt> and <tt>papers</tt>.
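
With the default ordered insert, <tt>insert_many</tt> raises an error partway through the batch if it meets a repeated <tt>_id</tt>, so it can be worth scanning the parsed list before inserting. The following is only a plain-Python sketch: the helper name <tt>find_duplicate_ids</tt> and the sample records are made up for illustration and are not part of the assignment data.

```python
from collections import Counter


def find_duplicate_ids(docs):
    """Return the sorted _id values that occur more than once in docs."""
    counts = Counter(doc['_id'] for doc in docs if '_id' in doc)
    return sorted(_id for _id, n in counts.items() if n > 1)


# Hypothetical records standing in for a parsed JSON list such as mc or pc.
sample = [
    {'_id': 1, 'title': 'a'},
    {'_id': 2, 'title': 'b'},
    {'_id': 1, 'title': 'c'},
]
print(find_duplicate_ids(sample))
```

An empty result suggests the list can be passed to <tt>insert_many</tt> without a duplicate-key failure.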

<h1>Explore the database</h1>

Use MongoDB to answer the following questions. Run your code in the spaces provided.

<b>Q: How many models are there?</b>

In [57]:
models.count()

1114

<b>What are the field names (keys) for the model entry with <tt>_id</tt> = 87284?</b>

In [7]:
for doc in models.find({'_id': 87284}):
    pprint(doc.keys())

[u'transmitters',
 u'title',
 u'text',
 u'genes',
 u'simenvironment',
 u'celltypes',
 u'brainregions',
 u'channels',
 u'references',
 u'modeltype',
 u'receptors',
 u'_id',
 u'modelconcepts']


Note: this data is not completely denormalized; references in both collections are given as <tt>_id</tt> values from the papers collection.<br/><br/><b>How many distinct cell types are in the models collection?</b>

In [16]:
len(models.distinct('celltypes'))

188

<b>Find the list of model ids for models that contain a Hippocampus CA3 pyramidal cell.</b>

In [8]:
for doc in models.find({'celltypes': 'Hippocampus CA3 pyramidal cell'}, {'_id':1}):
    pprint(doc)

{u'_id': 101629}
{u'_id': 114337}
{u'_id': 118098}
{u'_id': 120907}
{u'_id': 126814}
{u'_id': 129067}
{u'_id': 135902}
{u'_id': 135903}
{u'_id': 137259}
{u'_id': 137505}
{u'_id': 138421}
{u'_id': 139421}
{u'_id': 142104}
{u'_id': 143148}
{u'_id': 146499}
{u'_id': 147756}
{u'_id': 147867}
{u'_id': 148035}
{u'_id': 150288}
{u'_id': 151282}
{u'_id': 168314}
{u'_id': 168874}
{u'_id': 181967}
{u'_id': 184139}
{u'_id': 185512}
{u'_id': 186768}
{u'_id': 189088}
{u'_id': 20007}
{u'_id': 3263}
{u'_id': 35358}
{u'_id': 7907}
{u'_id': 84606}
{u'_id': 87216}
{u'_id': 87762}
{u'_id': 98003}


<b>What other cells appear in models with a Hippocampus CA3 pyramidal cell? Sort them in alphabetical order. How many such cells are there?</b>

In [33]:
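# Distinct cell types across all models that include a CA3 pyramidal cell.
other_cells = models.find({'celltypes': 'Hippocampus CA3 pyramidal cell'}).distinct('celltypes')
# Drop the CA3 pyramidal cell itself so only the co-occurring cells remain.
other_cells.remove('Hippocampus CA3 pyramidal cell')
pprint(sorted(other_cells))
print 'There are ' + str(len(other_cells)) + ' cells'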

[u'Abstract Izhikevich neuron',
 u'Abstract integrate-and-fire adaptive exponential (AdEx) neuron',
 u'Cerebellum purkinje cell',
 u'Dentate gyrus basket cell',
 u'Dentate gyrus granule cell',
 u'Dentate gyrus hilar cell',
 u'Dentate gyrus mossy cell',
 u'Entorhinal cortex stellate cell',
 u'Hippocampus CA1 axo-axonic cell',
 u'Hippocampus CA1 basket cell',
 u'Hippocampus CA1 bistratified cell',
 u'Hippocampus CA1 interneuron oriens alveus',
 u'Hippocampus CA1 pyramidal cell',
 u'Hippocampus CA3 axo-axonic cells',
 u'Hippocampus CA3 basket cell',
 u'Hippocampus CA3 stratum oriens lacunosum-moleculare interneuron',
 u'Hippocampus septum medial GABAergic neuron',
 u'Hodgkin-Huxley neuron',
 u'Neocortex fast spiking (FS) interneuron',
 u'Neocortex layer 2-3 pyramidal cell',
 u'Neocortex layer 4 pyramidal cell',
 u'Neocortex layer 5-6 pyramidal cell',
 u'Neocortex spiking regular (RS) neuron',
 u'Neocortical pyramidal cortical-thalamic cell',
 u'Pinsky-Rinzel CA1/3 pyramidal cell ']
There are 25 cells

<h1>Use aggregation</h1>

How many models are there for each cell type? Display the results in a formatted table, sorted from most commonly appearing cell type to least commonly appearing.

In [34]:
pipeline = [
    # Emit one document per (model, cell type) pair.
    {'$unwind': '$celltypes'},
    # Count how many models mention each cell type.
    {'$group': {'_id': '$celltypes', 'num_models': {'$sum': 1}}},
    # Most commonly appearing cell types first.
    {'$sort': {'num_models': -1}}]
for doc in models.aggregate(pipeline):
    pprint(doc)

{u'_id': u'Neocortex layer 5-6 pyramidal cell', u'num_models': 108}
{u'_id': u'Hippocampus CA1 pyramidal cell', u'num_models': 104}
{u'_id': u'Neocortex layer 2-3 pyramidal cell', u'num_models': 60}
{u'_id': u'Hippocampus CA3 pyramidal cell', u'num_models': 35}
{u'_id': u'Neocortex fast spiking (FS) interneuron', u'num_models': 30}
{u'_id': u'Olfactory bulb main mitral cell', u'num_models': 30}
{u'_id': u'Hodgkin-Huxley neuron', u'num_models': 29}
{u'_id': u'Thalamus geniculate nucleus (lateral) principal neuron',
 u'num_models': 26}
{u'_id': u'Abstract integrate-and-fire leaky neuron', u'num_models': 25}
{u'_id': u'Cerebellum purkinje cell', u'num_models': 24}
{u'_id': u'Dentate gyrus granule cell', u'num_models': 24}
{u'_id': u'Neocortex spiking regular (RS) neuron', u'num_models': 22}
{u'_id': u'Neostriatum spiny direct pathway neuron', u'num_models': 22}
{u'_id': u'Neocortex spiking low threshold (LTS) neuron', u'num_models': 21}
{u'_id': u'Neocortex layer 4 pyramidal cell', u'num_
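
The question asks for a formatted table, while <tt>pprint</tt> shows raw documents. One possible way to lay the same rows out in aligned columns is sketched below. The function name <tt>format_counts_table</tt> and the column headers are assumptions; the three sample rows are copied from the top of the output above, and in the notebook the argument could instead be <tt>models.aggregate(pipeline)</tt>.

```python
def format_counts_table(rows, name_header='Cell type', count_header='Models'):
    """Render rows shaped like {'_id': name, 'num_models': n} as an aligned text table."""
    rows = list(rows)  # accept any iterable, including a cursor
    width = max([len(name_header)] + [len(r['_id']) for r in rows])
    lines = [
        '{0:<{w}}  {1:>6}'.format(name_header, count_header, w=width),
        '{0}  {1}'.format('-' * width, '-' * 6),
    ]
    for r in rows:
        lines.append('{0:<{w}}  {1:>6}'.format(r['_id'], r['num_models'], w=width))
    return '\n'.join(lines)


# First three rows of the aggregation output shown above.
sample_rows = [
    {'_id': 'Neocortex layer 5-6 pyramidal cell', 'num_models': 108},
    {'_id': 'Hippocampus CA1 pyramidal cell', 'num_models': 104},
    {'_id': 'Neocortex layer 2-3 pyramidal cell', 'num_models': 60},
]
print(format_counts_table(sample_rows))
```

The helper keeps whatever order it is given, so the <tt>$sort</tt> stage still decides the ranking.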

Find the model titles (not paper titles) for models that (1) involve a Hippocampus CA3 pyramidal cell, and (2) have an associated reference where one of the authors is "Migliore M".

In [35]:
pipeline = [{'$lookup': 
                {'from' : 'models',
                 'localField' : '_id',
                 'foreignField' : 'references',
                 'as' : 'cellmodels'}},
            {'$unwind': '$cellmodels'},
             {'$match':
                 {'authors' : 'Migliore M', 'cellmodels.celltypes' : 'Hippocampus CA3 pyramidal cell'}},
            {'$project': 
                {'cellmodels.title':1, '_id' : 0}} 
             ]
             
for doc in (papers.aggregate(pipeline)):
    pprint (doc)

{u'cellmodels': {u'title': u'CA3 Pyramidal Neuron (Migliore et al 1995)'}}
{u'cellmodels': {u'title': u'CA3 pyramidal neuron (Lazarewicz et al 2002)'}}
{u'cellmodels': {u'title': u'CA3 pyramidal neuron: firing properties (Hemond et al. 2008)'}}
{u'cellmodels': {u'title': u'Ca3 pyramidal neuron: membrane response near rest (Hemond et al. 2009)'}}
{u'cellmodels': {u'title': u'CA3 pyramidal neuron (Safiulina et al. 2010)'}}
{u'cellmodels': {u'title': u'A model of unitary responses from A/C and PP synapses in CA3 pyramidal cells (Baker et al. 2010)'}}


Find all the authors who were on a paper associated with a model that involved a Hippocampus CA3 pyramidal cell. Sort them in alphabetical order; give this list and state its length.

In [36]:
pipeline = [{'$lookup': 
                {'from' : 'models',
                 'localField' : '_id',
                 'foreignField' : 'references',
                 'as' : 'cellmodels'}},
            {'$unwind': '$cellmodels'},
             {'$match':
                 {'cellmodels.celltypes' : 'Hippocampus CA3 pyramidal cell'}},
            {'$project': 
                {'authors':1, '_id' : 0, 'cellmodels.celltypes':1}} 
             ]
             
for doc in (papers.aggregate(pipeline)):
    pprint (doc)

{u'authors': [u'Roth A', u'Hausser M', u'Vetter P'],
 u'cellmodels': {u'celltypes': [u'Hippocampus CA1 pyramidal cell',
                                u'Hippocampus CA3 pyramidal cell',
                                u'Neocortex layer 5-6 pyramidal cell',
                                u'Cerebellum purkinje cell']}}
{u'authors': [u'Migliore M',
              u'Cook EP',
              u'Jaffe DB',
              u'Johnston D',
              u'Turner DA'],
 u'cellmodels': {u'celltypes': [u'Hippocampus CA3 pyramidal cell']}}
{u'authors': [u'Rinzel J', u'Pinsky PF'],
 u'cellmodels': {u'celltypes': [u'Hippocampus CA3 pyramidal cell']}}
{u'authors': [u'Migliore M', u'Lazarewicz MT', u'Ascoli GA'],
 u'cellmodels': {u'celltypes': [u'Hippocampus CA3 pyramidal cell']}}
{u'authors': [u'Rinzel J', u'Pinsky PF'],
 u'cellmodels': {u'celltypes': [u'Hippocampus CA3 pyramidal cell']}}
{u'authors': [u'Chattarji S', u'Narayanan R', u'Narayan A'],
 u'cellmodels': {u'celltypes': [u'Hippocampus CA3 pyrami
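
The cell above prints one document per paper and model pair, so the same author can appear several times and the list is not yet sorted. A minimal sketch of the remaining step, collecting the names into a sorted list without repeats, follows. The helper name <tt>sorted_unique_authors</tt> is hypothetical, and the sample holds only the first few author lists visible in the output above, so its length is not the answer for the full collection.

```python
def sorted_unique_authors(docs):
    """Collect every name from each document's 'authors' list, de-duplicated and sorted."""
    names = set()
    for doc in docs:
        names.update(doc.get('authors', []))
    return sorted(names)


# Author lists copied from the first documents in the output above.
sample_docs = [
    {'authors': ['Roth A', 'Hausser M', 'Vetter P']},
    {'authors': ['Migliore M', 'Cook EP', 'Jaffe DB', 'Johnston D', 'Turner DA']},
    {'authors': ['Rinzel J', 'Pinsky PF']},
    {'authors': ['Rinzel J', 'Pinsky PF']},
]
authors = sorted_unique_authors(sample_docs)
print(authors)
print('There are ' + str(len(authors)) + ' authors in this sample')
```

In the notebook, passing <tt>papers.aggregate(pipeline)</tt> in place of <tt>sample_docs</tt> should give the list for the whole collection.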

<h1>Modify the database</h1>

Rename the Hippocampus CA1 pyramidal cell to be the Hippocampus CA1 pyramidal neuron. (Note: here we're using CA1 instead of CA3.) Make sure that this is consistent across all documents in the models collection.

In [62]:
print 'We should have a count of 104 of the \" Hippocampus CA1 pyramidal cell\" and 0 of \"Hippocampus CA1 pyramidal neuron\"'
print "Hippocampus CA1 pyramidal cell count: " + str(models.find({'celltypes' : 'Hippocampus CA1 pyramidal cell'}).count())
print "Hippocampus CA1 pyramidal neuron count: " + str(models.find({'celltypes' : 'Hippocampus CA1 pyramidal neuron'}).count())

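# 'celltypes' is an array, so use the positional operator to rewrite only the
# matching element; a plain $set on 'celltypes' would replace the whole array.
models.update_many({'celltypes':'Hippocampus CA1 pyramidal cell'},
              {'$set':
                  {'celltypes.$':'Hippocampus CA1 pyramidal neuron'}})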
print 'We should now have a count of 104 of the \" Hippocampus CA1 pyramidal neuron\" and 0 of \"Hippocampus CA1 pyramidal cell\"'
print "Hippocampus CA1 pyramidal cell count: " + str(models.find({'celltypes' : 'Hippocampus CA1 pyramidal cell'}).count())
print "Hippocampus CA1 pyramidal neuron count: " + str(models.find({'celltypes' : 'Hippocampus CA1 pyramidal neuron'}).count())

We should have a count of 104 of the " Hippocampus CA1 pyramidal cell" and 0 of "Hippocampus CA1 pyramidal neuron"
Hippocampus CA1 pyramidal cell count: 104
Hippocampus CA1 pyramidal neuron count: 0
We should now have a count of 104 of the " Hippocampus CA1 pyramidal neuron" and 0 of "Hippocampus CA1 pyramidal cell"
Hippocampus CA1 pyramidal cell count: 0
Hippocampus CA1 pyramidal neuron count: 104
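
The two counts alone do not show what happens to the other entries in each <tt>celltypes</tt> array. A <tt>$set</tt> on the whole field swaps the entire list for a single string, while the positional form <tt>celltypes.$</tt> changes only the first array element matched by the query. The plain-Python sketch below mimics both behaviours on a made-up document; the function names are illustrative and nothing here talks to MongoDB.

```python
import copy


def set_whole_field(doc, field, new_value):
    """Mimic {'$set': {field: new_value}}: the previous value is discarded entirely."""
    updated = copy.deepcopy(doc)
    updated[field] = new_value
    return updated


def set_first_match(doc, field, old_value, new_value):
    """Mimic {'$set': {field + '.$': new_value}} for a query that matched old_value."""
    updated = copy.deepcopy(doc)
    index = updated[field].index(old_value)  # raises ValueError if old_value is absent
    updated[field][index] = new_value
    return updated


# Hypothetical model document with two cell types.
model = {'_id': 1,
         'celltypes': ['Hippocampus CA1 pyramidal cell', 'Cerebellum purkinje cell']}

whole = set_whole_field(model, 'celltypes', 'Hippocampus CA1 pyramidal neuron')
positional = set_first_match(model, 'celltypes',
                             'Hippocampus CA1 pyramidal cell',
                             'Hippocampus CA1 pyramidal neuron')
print(whole['celltypes'])
print(positional['celltypes'])
```

Both variants yield the same counts for the old and new names, which is why comparing a full document before and after the update is a more telling check.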


Add a new entry (make up the data, but keep it appropriate) to the models collection. Associate it with two references, one that already exists and one that you also add to the papers collection.

In [63]:
papers.insert_one(    {
        "missing_references": "all done", 
        "doi": "10.1111/j.1549-8719.2001.tb00156.x", 
        "title": "Simulating the spread of memes through the use of 4chan.", 
        "journal": "4chan.org", 
        "year": "2017", 
        "month": "Feb", 
        "volume": "2", 
        "first_page": "1", 
        "last_page": "999", 
        "references": [
            4461, 
        ], 
        "pubmed_id": "11296851", 
        "authors": [
            "Crane GJ", 
            "Hines ML", 
            "Neild TO"
        ], 
        "_id": 5432112345, 
        "type": "M"
    })

models.insert_one({
                  'name': 'Hippocampus MEME Cell',
                  'channels': ['Rare_PePe', 'Salt_Bae', 'How_Bout_Dat?', 'Cash_Me_Outside'],
                  'transmitters': ['NO', 'Glutamate'],
                  'average dendrite length': 4586,
                  'references' : [5432112345, 3860]
              })

for doc in models.find({'name':'Hippocampus MEME Cell'}):
    print doc

pipeline = [{'$lookup': 
                {'from' : 'models',
                 'localField' : '_id',
                 'foreignField' : 'references',
                 'as' : 'cellmodels'}},
            {'$unwind': '$cellmodels'},
             {'$match':
                 {'_id' : 5432112345}},
            {'$project': 
                {'authors':1, '_id' : 0, 'cellmodels.celltypes':1}} 
             ]
for doc in papers.aggregate(pipeline):
    pprint (doc)

{u'transmitters': [u'NO', u'Glutamate'], u'name': u'Hippocampus MEME Cell', u'channels': [u'Rare_PePe', u'Salt_Bae', u'How_Bout_Dat?', u'Cash_Me_Outside'], u'references': [5432112345L, 3860], u'average dendrite length': 4586, u'_id': ObjectId('5893b77ff0a43d0fd4557d2c')}
{u'authors': [u'Crane GJ', u'Hines ML', u'Neild TO'], u'cellmodels': {}}
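
Since references are stored as bare <tt>_id</tt> values, nothing stops a model from pointing at a paper that does not exist. A small check along the following lines could confirm that both references of the new entry resolve. The helper name <tt>missing_references</tt> is hypothetical and the id lists are stand-ins; in the notebook the known ids could come from <tt>papers.distinct('_id')</tt>.

```python
def missing_references(model, known_paper_ids):
    """Return the reference ids in model that are absent from known_paper_ids."""
    known = set(known_paper_ids)
    return [ref for ref in model.get('references', []) if ref not in known]


# The new model's references, as inserted above.
new_model = {'name': 'Hippocampus MEME Cell', 'references': [5432112345, 3860]}

# Stand-in id lists: one where both papers are present, one where 3860 is not.
print(missing_references(new_model, [5432112345, 3860, 4461]))
print(missing_references(new_model, [5432112345]))
```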
