In this repository, you will find JSON representing a <i>subset</i> of the data for the <a href="http://modeldb.yale.edu">ModelDB</a> repository of computational neuroscience models.

<h1>Getting started</h1>

Begin by cloning this repository. Create a private repository on GitHub and push your local copy to it.<br/><br/>Connect to MongoDB and create a database for this assignment.

In [9]:
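import pymongo
from pymongo import MongoClient

# With no arguments, MongoClient connects to localhost on the default port (27017).
mongodb = MongoClient()
# Databases are created lazily: hw1 only comes into existence on the first insert.
hw1 = mongodb.hw1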

Using the <tt>json</tt> module and Python file operations, load the data from <tt>modelcollection.json</tt> and <tt>papercollection.json</tt> into Python.

In [10]:
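import json

# Context managers make sure both files are closed once they have been read.
with open('modelcollection.json') as model_file:
    model_data = json.load(model_file)
with open('papercollection.json') as paper_file:
    paper_data = json.load(paper_file)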

Put the loaded data into two collections in your database. I recommend calling them <tt>models</tt> and <tt>papers</tt>.

In [11]:
models = hw1.models
models.insert_many(model_data)
papers = hw1.papers
papers.insert_many(paper_data)

<pymongo.results.InsertManyResult at 0x104b95fa0>
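
Because the <tt>_id</tt> values come from the JSON files, running the insert cell a second time against the same database raises a duplicate key error. One possible way to make the load repeatable is to empty each collection first. This is only a sketch: the helper name <tt>reload_collection</tt> is hypothetical, and it assumes nothing beyond the <tt>drop()</tt> and <tt>insert_many()</tt> methods of a pymongo collection.

```python
def reload_collection(collection, documents):
    """Empty `collection`, insert `documents`, and return how many were inserted.

    `collection` is expected to be a pymongo Collection such as hw1.models.
    """
    collection.drop()  # discard documents left over from an earlier run
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


# Usage with the names from this notebook (needs a running MongoDB server):
# reload_collection(hw1.models, model_data)
# reload_collection(hw1.papers, paper_data)
```

Dropping the collection also discards any documents added by hand later in the assignment, so this is best kept to the initial load.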

<h1>Explore the database</h1>

Use MongoDB to answer the following questions. Run your code in the spaces provided.

<b>How many models are there?</b>

In [16]:
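# Collection.count() was removed in PyMongo 4; count_documents({}) (PyMongo 3.7+) replaces it.
models.count_documents({})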

1114

<b>What are the field names (keys) for the model entry with <tt>_id</tt> = 87284?</b>

In [296]:
import pprint
# Peek at one document from the papers collection to see its structure.
document = hw1.papers.find_one()
pprint.pprint(document)

{u'_id': 3860,
 u'authors': [u'Crane GJ', u'Hines ML', u'Neild TO'],
 u'doi': u'10.1111/j.1549-8719.2001.tb00156.x',
 u'first_page': u'33',
 u'journal': u'Microcirculation',
 u'last_page': u'43',
 u'missing_references': u'all done',
 u'month': u'Feb',
 u'pubmed_id': u'11296851',
 u'references': [4461,
                 4755,
                 4757,
                 4758,
                 4764,
                 4770,
                 4773,
                 4777,
                 4781,
                 4782,
                 4786,
                 4790,
                 4792,
                 4793,
                 4797,
                 4799,
                 4800,
                 4805,
                 4806,
                 4810,
                 4815,
                 4818,
                 4819,
                 4820,
                 4823,
                 4825,
                 17148,
                 17150,
                 17152,
                 17154,
                 86943],
 

In [12]:
import pprint
doc = models.find_one({"_id": 87284})
# list() keeps the output a plain list on Python 3, where keys() returns a view object.
pprint.pprint(list(doc.keys()))

[u'transmitters',
 u'title',
 u'text',
 u'genes',
 u'simenvironment',
 u'celltypes',
 u'brainregions',
 u'channels',
 u'references',
 u'modeltype',
 u'receptors',
 u'_id',
 u'modelconcepts']


Note: this data is not completely denormalized: references in both collections are given in terms of the <tt>_id</tt> field of the paper collection.<br/><br/><b>How many distinct cell types are in the models collection?</b>

In [13]:
models.distinct('celltypes')

[u'Substantia nigra pars compacta dopaminergic cell',
 u'Hippocampus CA3 pyramidal cell',
 u'Leech S cell',
 u'Hippocampus CA1 interneuron oriens alveus',
 u'Heart cell',
 u'Globus pallidus neuron',
 u'Skeletal muscle cell',
 u'Hippocampus CA1 pyramidal cell',
 u'Neocortex layer 5-6 pyramidal cell',
 u'Thalamus geniculate nucleus (lateral) principal neuron',
 u'Thalamus reticular nucleus cell',
 u'Neuroblastoma',
 u'Hippocampus dissociated neuron',
 u'Leech heart interneuron',
 u'Squid axon',
 u'Astrocyte',
 u'Cerebellum golgi cell',
 u'Nucleus accumbens spiny projection neuron',
 u'Neocortex layer 2-3 pyramidal cell',
 u'Cerebellum purkinje cell',
 u'Medial Nucleus of the Trapezoid Body (MNTB) neuron',
 u'GnRH neuron',
 u'Abstract Wang-Buzsaki neuron',
 u'Hippocampus CA1 basket cell',
 u'NG108-15 neuronal cell',
 u'Abstract integrate-and-fire adaptive exponential (AdEx) neuron',
 u'Spinal cord motor neuron',
 u'Spinal lamprey neuron',
 u'Neocortex spiny stellate cell',
 u'Olfactory bu
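
The question asks for a number, so the list returned by <tt>distinct</tt> still has to be counted, for example with <tt>len(models.distinct('celltypes'))</tt>. The sketch below mimics in plain Python what <tt>distinct</tt> does for an array field such as <tt>celltypes</tt>: every array element counts as a separate value. The helper name <tt>distinct_values</tt> and the sample documents are made up for illustration, and the result is sorted only for readability.

```python
def distinct_values(documents, field):
    """Return the sorted distinct values of `field`, counting list elements separately."""
    seen = set()
    for doc in documents:
        value = doc.get(field, [])  # documents without the field contribute nothing
        if isinstance(value, list):
            seen.update(value)
        else:
            seen.add(value)
    return sorted(seen)


# Hypothetical sample documents shaped like entries in the models collection.
sample = [
    {'_id': 1, 'celltypes': ['Hippocampus CA3 pyramidal cell',
                             'Hippocampus CA1 pyramidal cell']},
    {'_id': 2, 'celltypes': ['Hippocampus CA1 pyramidal cell', 'Astrocyte']},
    {'_id': 3},
]

celltypes = distinct_values(sample, 'celltypes')
print(len(celltypes))
```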

<b>Find the list of model ids for models that contain a Hippocampus CA3 pyramidal cell.</b>

In [127]:
for doc in models.find({"celltypes": "Hippocampus CA3 pyramidal cell"}, {"_id":1}):
    pprint.pprint(doc)

{u'_id': 101629}
{u'_id': 114337}
{u'_id': 118098}
{u'_id': 120907}
{u'_id': 126814}
{u'_id': 129067}
{u'_id': 135902}
{u'_id': 135903}
{u'_id': 137259}
{u'_id': 137505}
{u'_id': 138421}
{u'_id': 139421}
{u'_id': 142104}
{u'_id': 143148}
{u'_id': 146499}
{u'_id': 147756}
{u'_id': 147867}
{u'_id': 148035}
{u'_id': 150288}
{u'_id': 151282}
{u'_id': 168314}
{u'_id': 168874}
{u'_id': 181967}
{u'_id': 184139}
{u'_id': 185512}
{u'_id': 186768}
{u'_id': 189088}
{u'_id': 20007}
{u'_id': 3263}
{u'_id': 35358}
{u'_id': 7907}
{u'_id': 84606}
{u'_id': 87216}
{u'_id': 87762}
{u'_id': 98003}


<b>What other cells appear in models with a Hippocampus CA3 pyramidal cell? Sort them in alphabetical order. How many such cells are there?</b>

In [230]:

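# results must exist before the loop appends to it.
results = []

for doc in models.find({"celltypes": "Hippocampus CA3 pyramidal cell"}):
    # Each entry is the complete celltypes list of one matching model.
    results.append(doc["celltypes"])

results.sort()

pprint.pprint(results)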

[[u'Dentate gyrus granule cell',
  u'Hippocampus CA1 pyramidal cell',
  u'Hippocampus CA3 pyramidal cell',
  u'Hippocampus CA3 basket cell',
  u'Dentate gyrus mossy cell',
  u'Dentate gyrus basket cell',
  u'Dentate gyrus hilar cell',
  u'Hippocampus CA1 basket cell',
  u'Hippocampus CA3 stratum oriens lacunosum-moleculare interneuron',
  u'Hippocampus CA1 bistratified cell',
  u'Hippocampus CA1 axo-axonic cell',
  u'Hippocampus CA3 axo-axonic cells'],
 [u'Dentate gyrus granule cell',
  u'Hippocampus CA1 pyramidal cell',
  u'Hippocampus CA3 pyramidal cell',
  u'Hippocampus CA3 basket cell',
  u'Dentate gyrus mossy cell',
  u'Dentate gyrus basket cell',
  u'Dentate gyrus hilar cell',
  u'Hippocampus CA1 basket cell',
  u'Hippocampus CA3 stratum oriens lacunosum-moleculare interneuron',
  u'Hippocampus CA1 bistratified cell',
  u'Hippocampus CA1 axo-axonic cell',
  u'Hippocampus CA3 axo-axonic cells'],
 [u'Dentate gyrus granule cell',
  u'Hippocampus CA1 pyramidal cell',
  u'Hippocampus 

NameError: name 'count' is not defined
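
The cell above collects one <tt>celltypes</tt> list per model, so the result is a list of lists with many repeats rather than a single alphabetical list. A minimal sketch of one way to get to the answer: flatten the lists into a set, leave out the CA3 pyramidal cell itself, then sort and count. The helper name <tt>cooccurring_celltypes</tt> and the sample documents are hypothetical; with the real data the first argument would be the cursor from <tt>models.find({'celltypes': target})</tt>.

```python
def cooccurring_celltypes(documents, target):
    """Return the sorted cell types that share a model with `target`."""
    others = set()
    for doc in documents:
        celltypes = doc.get('celltypes', [])
        if target in celltypes:
            # Keep every cell type of this model except the target itself.
            others.update(c for c in celltypes if c != target)
    return sorted(others)


# Hypothetical sample documents shaped like entries in the models collection.
sample = [
    {'_id': 1, 'celltypes': ['Hippocampus CA3 pyramidal cell',
                             'Dentate gyrus granule cell',
                             'Hippocampus CA1 pyramidal cell']},
    {'_id': 2, 'celltypes': ['Hippocampus CA3 pyramidal cell',
                             'Dentate gyrus granule cell']},
    {'_id': 3, 'celltypes': ['Squid axon']},
]

others = cooccurring_celltypes(sample, 'Hippocampus CA3 pyramidal cell')
for name in others:
    print(name)
print(len(others))
```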

<h1>Use aggregation</h1>

How many models are there for each cell type? Display the results in a formatted table, sorted from most commonly appearing cell type to least commonly appearing.

In [285]:
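from bson.son import SON

query = [
    # One output document per (model, cell type) pair.
    {'$unwind': '$celltypes'},
    # Count how many models mention each cell type.
    {'$group': {'_id': '$celltypes', 'count': {'$sum': 1}}},
    # Most common cell types first.
    {'$sort': SON([('count', -1)])},
]

pprint.pprint(list(hw1.models.aggregate(query)))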

[{u'_id': u'Neocortex layer 5-6 pyramidal cell', u'count': 108},
 {u'_id': u'Hippocampus CA1 pyramidal cell', u'count': 104},
 {u'_id': u'Neocortex layer 2-3 pyramidal cell', u'count': 60},
 {u'_id': u'Hippocampus CA3 pyramidal cell', u'count': 35},
 {u'_id': u'Olfactory bulb main mitral cell', u'count': 30},
 {u'_id': u'Neocortex fast spiking (FS) interneuron', u'count': 30},
 {u'_id': u'Hodgkin-Huxley neuron', u'count': 29},
 {u'_id': u'Thalamus geniculate nucleus (lateral) principal neuron',
  u'count': 26},
 {u'_id': u'Abstract integrate-and-fire leaky neuron', u'count': 25},
 {u'_id': u'Dentate gyrus granule cell', u'count': 24},
 {u'_id': u'Cerebellum purkinje cell', u'count': 24},
 {u'_id': u'Neocortex spiking regular (RS) neuron', u'count': 22},
 {u'_id': u'Neostriatum spiny direct pathway neuron', u'count': 22},
 {u'_id': u'Neocortex spiking low threshold (LTS) neuron', u'count': 21},
 {u'_id': u'Neocortex layer 4 pyramidal cell', u'count': 20},
 {u'_id': u'Globus pallidus neu
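
The question asks for a formatted table, while <tt>pprint</tt> shows the raw result documents. The sketch below is one way to turn the aggregation output into aligned columns using only string formatting. The helper name <tt>format_counts</tt> and the column headings are made up; the three sample rows are copied from the output above.

```python
def format_counts(rows, header=('Cell type', 'Models')):
    """Format aggregation rows ({'_id': name, 'count': n}) as a two-column text table."""
    width = max([len(header[0])] + [len(row['_id']) for row in rows])
    lines = ['{:<{w}}  {:>6}'.format(header[0], header[1], w=width)]
    lines.append('-' * (width + 8))  # rule as wide as both columns plus the gap
    for row in rows:
        lines.append('{:<{w}}  {:>6}'.format(row['_id'], row['count'], w=width))
    return '\n'.join(lines)


rows = [
    {'_id': 'Neocortex layer 5-6 pyramidal cell', 'count': 108},
    {'_id': 'Hippocampus CA1 pyramidal cell', 'count': 104},
    {'_id': 'Neocortex layer 2-3 pyramidal cell', 'count': 60},
]

table = format_counts(rows)
print(table)
```

With the real data, <tt>rows</tt> would be <tt>list(hw1.models.aggregate(query))</tt>, which is already sorted by the pipeline.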

Find the model titles (not paper titles) for models that (1) involve a Hippocampus CA3 pyramidal cell, and (2) have an associated reference where one of the authors is "Migliore M".

In [342]:
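myquery = [
    # Keep only papers with Migliore M as an author before doing the join.
    {'$match': {'authors': 'Migliore M'}},
    # Attach every model whose references array contains this paper's _id.
    {'$lookup': {'from': 'models',
                 'localField': '_id',
                 'foreignField': 'references',
                 'as': 'matched'}},
    # One output document per (paper, model) pair.
    {'$unwind': '$matched'},
    {'$match': {'matched.celltypes': 'Hippocampus CA3 pyramidal cell'}},
    {'$project': {'authors': 1, 'matched.title': 1}},
]

for doc in papers.aggregate(myquery):
    pprint.pprint(doc)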

{u'_id': 4307,
 u'authors': [u'Migliore M',
              u'Cook EP',
              u'Jaffe DB',
              u'Johnston D',
              u'Turner DA'],
 u'matched': {u'title': u'CA3 Pyramidal Neuron (Migliore et al 1995)'}}
{u'_id': 20013,
 u'authors': [u'Migliore M', u'Lazarewicz MT', u'Ascoli GA'],
 u'matched': {u'title': u'CA3 pyramidal neuron (Lazarewicz et al 2002)'}}
{u'_id': 105513,
 u'authors': [u'Migliore M',
              u'Jaffe DB',
              u'Ascoli GA',
              u'Hemond P',
              u'Boley A'],
 u'matched': {u'title': u'CA3 pyramidal neuron: firing properties (Hemond et al. 2008)'}}
{u'_id': 118099,
 u'authors': [u'Migliore M', u'Jaffe DB', u'Ascoli GA', u'Hemond P'],
 u'matched': {u'title': u'Ca3 pyramidal neuron: membrane response near rest (Hemond et al. 2009)'}}
{u'_id': 126815,
 u'authors': [u'Migliore M',
              u'Cherubini E',
              u'Sivakumaran S',
              u'Safiulina VF',
              u'Caiati MD',
              u'Bisson

Find all the authors who were on a paper associated with a model that involved a Hippocampus CA3 pyramidal cell. Sort them in alphabetical order; give this list and state its length.

In [43]:
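myquery2 = [
    # Filter to CA3 pyramidal cell models before joining, so fewer documents are looked up.
    {'$match': {'celltypes': 'Hippocampus CA3 pyramidal cell'}},
    # Attach the papers listed in each model's references array.
    {'$lookup': {'from': 'papers',
                 'localField': 'references',
                 'foreignField': '_id',
                 'as': 'matched2'}},
    # One output document per (model, paper) pair.
    {'$unwind': '$matched2'},
    {'$project': {'matched2.authors': 1}},
    {'$sort': {'matched2.authors': 1}},
]

for doc in models.aggregate(myquery2):
    pprint.pprint(doc)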

{u'_id': 129067, u'matched2': {}}
{u'_id': 148035, u'matched2': {}}
{u'_id': 151282, u'matched2': {}}
{u'_id': 137505, u'matched2': {u'authors': u'Murray TA'}}
{u'_id': 87216,
 u'matched2': {u'authors': [u'Bi GQ',
                            u'Wang YT',
                            u'Gerkin RC',
                            u'Nauen DW',
                            u'Lau PM']}}
{u'_id': 146499, u'matched2': {u'authors': [u'Campbell SA', u'Nicola W']}}
{u'_id': 147867, u'matched2': {u'authors': [u'Chattarji S', u'Narayanan R']}}
{u'_id': 147756,
 u'matched2': {u'authors': [u'Chattarji S', u'Narayanan R', u'Narayan A']}}
{u'_id': 142104,
 u'matched2': {u'authors': [u'Ditto WL',
                            u'Zhou J',
                            u'Talathi SS',
                            u'Carney PR',
                            u'Stanley DA',
                            u'Parekh MB',
                            u'Cordiner DJ',
                            u'Mareci TH']}}
{u'_id': 168314,
 u'm
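
The pipeline output above is one row per model and paper pair, so the same author can appear many times, and the <tt>authors</tt> value is sometimes a list, sometimes a single string, and sometimes missing. A minimal sketch of the remaining step, assuming Python 3: normalise the three cases, collect the names in a set, then sort and count. The helper name <tt>sorted_authors</tt> is hypothetical; the sample rows are copied from the output above.

```python
def sorted_authors(rows, key='matched2'):
    """Return the sorted distinct author names found under rows[i][key]['authors']."""
    authors = set()
    for row in rows:
        value = row.get(key, {}).get('authors', [])
        if isinstance(value, str):
            value = [value]  # a lone author stored as a plain string
        authors.update(value)
    return sorted(authors)


rows = [
    {'_id': 129067, 'matched2': {}},
    {'_id': 137505, 'matched2': {'authors': 'Murray TA'}},
    {'_id': 146499, 'matched2': {'authors': ['Campbell SA', 'Nicola W']}},
    {'_id': 147867, 'matched2': {'authors': ['Chattarji S', 'Narayanan R']}},
    {'_id': 147756, 'matched2': {'authors': ['Chattarji S', 'Narayanan R', 'Narayan A']}},
]

authors = sorted_authors(rows)
print(authors)
print(len(authors))
```

With the real data, <tt>rows</tt> would be the cursor returned by <tt>models.aggregate(myquery2)</tt>.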

<h1>Modify the database</h1>

Rename the Hippocampus CA1 pyramidal cell to be the Hippocampus CA1 pyramidal neuron. (Note: here we're using CA1 instead of CA3.) Make sure that this is consistent across all documents in the models collection.

In [44]:
# The positional operator $ rewrites the array element that the filter matched.
models.update_many(
    {'celltypes': 'Hippocampus CA1 pyramidal cell'},
    {'$set': {'celltypes.$': 'Hippocampus CA1 pyramidal neuron'}},
)

<pymongo.results.UpdateResult at 0x1064418c0>
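
The <tt>UpdateResult</tt> object by itself does not show whether the rename reached every document. One possible check, sketched below, is to read <tt>modified_count</tt> from the result and then count how many documents still carry the old name, which should be zero. The helper name <tt>rename_celltype</tt> is hypothetical, and <tt>count_documents</tt> assumes PyMongo 3.7 or later.

```python
def rename_celltype(collection, old_name, new_name):
    """Rename a cell type and return (documents modified, documents still using old_name)."""
    result = collection.update_many(
        {'celltypes': old_name},
        # $ targets the first array element matched by the filter in each document.
        {'$set': {'celltypes.$': new_name}},
    )
    remaining = collection.count_documents({'celltypes': old_name})
    return result.modified_count, remaining


# Usage with the names from this notebook (needs a running MongoDB server):
# modified, remaining = rename_celltype(models,
#                                       'Hippocampus CA1 pyramidal cell',
#                                       'Hippocampus CA1 pyramidal neuron')
```

Since <tt>$</tt> only touches the first matching element, a document that listed the old name twice would still be counted in <tt>remaining</tt> after one pass.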

Add a new entry (make up the data, but keep it appropriate) to the models collection. Associate it with two references, one that already exists and one that you also add to the papers collection.

In [52]:
models.insert_one({
    u'_id': 555555,
    u'title': u'NZT Responsive Neuron',
    u'text': u'Neurons responsive to NZT from the movie Limitless',
    # An empty list, not the string '[]', to match the other array fields.
    u'genes': [],
    u'channels': [u'I L high threshold', u'I h', u'I K,Ca', u'I Calcium', u'I_HERG'],
    u'modelconcepts': [u'Ion Channel Kinetics', u'Oscillations', u'Calcium dynamics'],
    u'modeltype': [u'Neuron or other electrically excitable cell'],
    u'receptors': [],
    # 100604 is meant to be an existing paper; 888888 is the paper added in the next cell.
    u'references': [100604, 888888],
    u'simenvironment': [u'C or C++ program', u'FORTRAN'],
})


DuplicateKeyError: E11000 duplicate key error collection: hw1.models index: _id_ dup key: { : 555555 }
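
The error above means a document with <tt>_id</tt> 555555 was already in the collection, most likely from an earlier run of the same cell. If the cell needs to be repeatable, one option is an upsert in place of a plain insert. This is a sketch only: the helper name <tt>upsert_by_id</tt> is hypothetical, and it relies on <tt>Collection.replace_one</tt> with <tt>upsert=True</tt>.

```python
def upsert_by_id(collection, document):
    """Insert `document`, or overwrite the stored document that has the same `_id`."""
    return collection.replace_one(
        {'_id': document['_id']},  # look the document up by its _id
        document,
        upsert=True,               # insert when no document has this _id yet
    )


# Usage with the names from this notebook (needs a running MongoDB server):
# upsert_by_id(models, {u'_id': 555555, u'title': u'NZT Responsive Neuron'})
```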

In [53]:
papers.insert_one(
{u'_id': 888888,
 u'authors': [u'Brar RS', u'Hines ML', u'Neild TO'],
 u'doi': u'10.1111/j.1549-8719.2001.tb00156.x',
 u'first_page': u'33',
 u'journal': u'Informatics',
 u'last_page': u'43',
 u'missing_references': u'all done',
 u'month': u'Feb',
 u'pubmed_id': u'19296859',
 u'references': [666666, 4461, 4755, 4757, 4758, 4764, 4770, 4773,
                 4777, 4781, 4782, 4786, 4790, 4792, 4793, 4797,
                 4799, 4800, 4805, 4806, 4810, 4815, 4818, 4819,
                 4820, 4823, 4825, 17148, 17150, 17152, 17154, 86943],
 u'title': u'NZT',
 u'type': u'M',
 u'volume': u'8',
 u'year': u'2001'})

<pymongo.results.InsertOneResult at 0x105c34690>