In this repository, you will find JSON representing a <i>subset</i> of the data for the <a href="http://modeldb.yale.edu">ModelDB</a> repository of computational neuroscience models.

<h1>Getting started</h1>

Begin by cloning this repository. Create a private repository on GitHub and push your local copy to it.<br/><br/>Connect to MongoDB and create a database for this assignment.

In [43]:
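from pymongo import MongoClient

# Connect to the MongoDB server on the default host and port
mongodb = MongoClient()

# Drop any earlier copy so that re-running the notebook does not duplicate data
mongodb.drop_database('hw1')

# Get a handle to the database for this assignment
hw1 = mongodb.hw1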

Using the <tt>json</tt> module and Python file operations, load the data from <tt>modelcollection.json</tt> and <tt>papercollection.json</tt> into Python.

In [44]:
# Load the two JSON files; each one holds a list of documents
import json
with open('modelcollection.json') as f:
    models_dict = json.load(f)
with open('papercollection.json') as g:
    papers_dict = json.load(g)
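
MongoDB rejects a second document with the same <tt>_id</tt> in a collection, so it can help to check the loaded lists for repeated ids before inserting them. The following is only a sketch: the helper name <tt>duplicate_ids</tt> and the sample documents are made up for illustration, and it assumes every document carries an <tt>_id</tt> key, as the entries printed in this notebook do.

```python
from collections import Counter


def duplicate_ids(docs):
    """Return the sorted _id values that occur more than once in docs.

    Hypothetical helper; assumes each document is a dict with an '_id' key.
    """
    counts = Counter(doc['_id'] for doc in docs)
    return sorted(_id for _id, n in counts.items() if n > 1)


# Made-up sample documents, not taken from the real JSON files.
sample_docs = [{'_id': 1}, {'_id': 2}, {'_id': 2}, {'_id': 3}]

repeated = duplicate_ids(sample_docs)
print(repeated)
```

With the real data, <tt>duplicate_ids(models_dict)</tt> and <tt>duplicate_ids(papers_dict)</tt> would be expected to return empty lists if the files are clean.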

Put the loaded data into two collections in your database. I recommend calling them <tt>models</tt> and <tt>papers</tt>.

In [50]:
hw1.models.count_documents({})

1114

In [55]:
hw1.papers.count_documents({})

1211

In [63]:
from pprint import pprint
pprint(models_dict[0])

{u'_id': 100603,
 u'brainregions': [],
 u'celltypes': [u'Substantia nigra pars compacta dopaminergic cell'],
 u'channels': [u'I L high threshold',
               u'I h',
               u'I K,Ca',
               u'I Calcium',
               u'I_HERG'],
 u'genes': [],
 u'modelconcepts': [u'Ion Channel Kinetics',
                    u'Oscillations',
                    u'Calcium dynamics'],
 u'modeltype': [u'Neuron or other electrically excitable cell'],
 u'receptors': [],
 u'references': [100604],
 u'simenvironment': [u'C or C++ program', u'FORTRAN'],
 u'text': u'"Blocking the small-conductance (SK) calcium-activated potassium channel promotes burst firing in dopamine neurons both in vivo and in vitro. ... We focus on the underlying plateau potential oscillation generated in the presence of both apamin and TTX, so that action potentials are not considered.   We find that although the plateau potentials are mediated by a voltage-gated Ca2+ current, they do not depend on the accumulation o

In [296]:
pprint(papers_dict[0])

{u'_id': 3860,
 u'authors': [u'Crane GJ', u'Hines ML', u'Neild TO'],
 u'doi': u'10.1111/j.1549-8719.2001.tb00156.x',
 u'first_page': u'33',
 u'journal': u'Microcirculation',
 u'last_page': u'43',
 u'missing_references': u'all done',
 u'month': u'Feb',
 u'pubmed_id': u'11296851',
 u'references': [4461,
                 4755,
                 4757,
                 4758,
                 4764,
                 4770,
                 4773,
                 4777,
                 4781,
                 4782,
                 4786,
                 4790,
                 4792,
                 4793,
                 4797,
                 4799,
                 4800,
                 4805,
                 4806,
                 4810,
                 4815,
                 4818,
                 4819,
                 4820,
                 4823,
                 4825,
                 17148,
                 17150,
                 17152,
                 17154,
                 86943],
 

In [49]:
models = hw1.models
models.insert_many(models_dict)

<pymongo.results.InsertManyResult object at 0x106f31370>

In [54]:
papers = hw1.papers
papers.insert_many(papers_dict)

<pymongo.results.InsertManyResult object at 0x1046b07d0>

<h1>Explore the database</h1>

Use MongoDB to answer the following questions. Run your code in the spaces provided.

<b>How many models are there?</b>

In [58]:
models.count_documents({})

1114

<b>What are the field names (keys) for the model entry with <tt>_id</tt> = 87284?</b>

In [71]:
from pprint import pprint
# _id is unique, so find_one() returns the single matching document
doc = models.find_one({'_id': 87284})
print("The keys are:")
pprint(list(doc.keys()))


The keys are:
[u'transmitters',
 u'title',
 u'text',
 u'genes',
 u'simenvironment',
 u'celltypes',
 u'brainregions',
 u'channels',
 u'references',
 u'modeltype',
 u'receptors',
 u'_id',
 u'modelconcepts']


Note: this data is not completely denormalized: references in both collections are given in terms of the <tt>_id</tt> field of the paper collection.<br/><br/><b>How many distinct cell types are in the models collection?</b>

In [72]:
# distinct() treats each element of the celltypes arrays as a separate value
ct = models.distinct('celltypes')
count_ct = len(ct)
print(count_ct)

188
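
As a rough cross-check that needs no database, the same count can be sketched in plain Python. This assumes, as in the sample document printed earlier, that <tt>celltypes</tt> is always a list of strings. The helper name <tt>distinct_values</tt> and the three sample documents are made up for illustration.

```python
def distinct_values(docs, field):
    """Collect the distinct values of a list-valued field across documents.

    Hypothetical helper: it looks at each array element separately,
    which is how distinct() handles array fields.
    """
    seen = set()
    for doc in docs:
        # A missing field contributes nothing.
        seen.update(doc.get(field, []))
    return sorted(seen)


# Made-up sample documents, not taken from the real collection.
sample_models = [
    {'_id': 1, 'celltypes': ['Cell A', 'Cell B']},
    {'_id': 2, 'celltypes': ['Cell B']},
    {'_id': 3, 'celltypes': []},
]

cell_types = distinct_values(sample_models, 'celltypes')
print(len(cell_types))
```

Under that assumption, <tt>len(distinct_values(models_dict, 'celltypes'))</tt> should agree with the number returned by the query.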


<b>Find the list of model ids for models that contain a Hippocampus CA3 pyramidal cell.</b>

In [85]:
for doc in models.find({'celltypes': 'Hippocampus CA3 pyramidal cell'}).sort([('_id', 1)]):
    pprint(doc['_id'])

3263
7907
20007
35358
84606
87216
87762
98003
101629
114337
118098
120907
126814
129067
135902
135903
137259
137505
138421
139421
142104
143148
146499
147756
147867
148035
150288
151282
168314
168874
181967
184139
185512
186768
189088


<b>What other cells appear in models with a Hippocampus CA3 pyramidal cell? Sort them in alphabetical order. How many such cells are there?</b>

In [180]:
hcpccell = models.find({'celltypes': 'Hippocampus CA3 pyramidal cell'})
hcpccelld = hcpccell.distinct('celltypes')
hcpccelld.sort()
pprint(hcpccelld)

[u'Abstract Izhikevich neuron',
 u'Abstract integrate-and-fire adaptive exponential (AdEx) neuron',
 u'Cerebellum purkinje cell',
 u'Dentate gyrus basket cell',
 u'Dentate gyrus granule cell',
 u'Dentate gyrus hilar cell',
 u'Dentate gyrus mossy cell',
 u'Entorhinal cortex stellate cell',
 u'Hippocampus CA1 axo-axonic cell',
 u'Hippocampus CA1 basket cell',
 u'Hippocampus CA1 bistratified cell',
 u'Hippocampus CA1 interneuron oriens alveus',
 u'Hippocampus CA1 pyramidal cell',
 u'Hippocampus CA3 axo-axonic cells',
 u'Hippocampus CA3 basket cell',
 u'Hippocampus CA3 pyramidal cell',
 u'Hippocampus CA3 stratum oriens lacunosum-moleculare interneuron',
 u'Hippocampus septum medial GABAergic neuron',
 u'Hodgkin-Huxley neuron',
 u'Neocortex fast spiking (FS) interneuron',
 u'Neocortex layer 2-3 pyramidal cell',
 u'Neocortex layer 4 pyramidal cell',
 u'Neocortex layer 5-6 pyramidal cell',
 u'Neocortex spiking regular (RS) neuron',
 u'Neocortical pyramidal cortical-thalamic cell',
 u'Pinsky-R
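
The list above still contains the Hippocampus CA3 pyramidal cell itself, and the question also asks for a count. A minimal sketch of one way to drop the target cell type and count the rest is shown here. The helper name <tt>other_cell_types</tt> and the sample documents are made up; <tt>docs</tt> can be any iterable of dictionaries, for example the list loaded from JSON or a cursor returned by <tt>find()</tt>.

```python
def other_cell_types(docs, target):
    """Return the sorted cell types that co-occur with target, excluding it.

    Hypothetical helper; assumes 'celltypes' is a list of strings.
    """
    others = set()
    for doc in docs:
        cells = doc.get('celltypes', [])
        if target in cells:
            others.update(cell for cell in cells if cell != target)
    return sorted(others)


# Made-up sample documents, not taken from the real collection.
sample_models = [
    {'_id': 1, 'celltypes': ['Target cell', 'Cell B']},
    {'_id': 2, 'celltypes': ['Target cell', 'Cell A', 'Cell B']},
    {'_id': 3, 'celltypes': ['Cell C']},  # no target cell, so ignored
]

others = other_cell_types(sample_models, 'Target cell')
for name in others:
    print(name)
print(len(others))
```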

<h1>Use aggregation</h1>

How many models are there for each cell type? Display the results in a formatted table, sorted from most commonly appearing cell type to least commonly appearing.

In [148]:
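mct = list(models.aggregate([
    # One document per (model, cell type) pair
    {'$unwind': '$celltypes'},
    # Count how many models mention each cell type
    {'$group': {'_id': '$celltypes', 'counts': {'$sum': 1}}},
    # Most common cell type first
    {'$sort': {'counts': -1}}
]))

for item in mct:
    print('{:3d} {:55s}'.format(item['counts'], item['_id']))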

108 Neocortex layer 5-6 pyramidal cell                     
104 Hippocampus CA1 pyramidal cell                         
 60 Neocortex layer 2-3 pyramidal cell                     
 35 Hippocampus CA3 pyramidal cell                         
 30 Olfactory bulb main mitral cell                        
 30 Neocortex fast spiking (FS) interneuron                
 29 Hodgkin-Huxley neuron                                  
 26 Thalamus geniculate nucleus (lateral) principal neuron 
 25 Abstract integrate-and-fire leaky neuron               
 24 Dentate gyrus granule cell                             
 24 Cerebellum purkinje cell                               
 22 Neocortex spiking regular (RS) neuron                  
 22 Neostriatum spiny direct pathway neuron                
 21 Neocortex spiking low threshold (LTS) neuron           
 20 Neocortex layer 4 pyramidal cell                       
 19 Globus pallidus neuron                                 
 19 Olfactory bulb main interneuron gran
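
To see what the three pipeline stages do, the same computation can be sketched in plain Python on a few made-up documents. The helper name <tt>count_models_per_cell_type</tt> is hypothetical, and the sketch assumes <tt>celltypes</tt> is a list of strings.

```python
from collections import Counter


def count_models_per_cell_type(docs):
    """Return (cell type, count) pairs, most common first."""
    counts = Counter()
    for doc in docs:
        # Like $unwind: visit each array element on its own.
        for cell in doc.get('celltypes', []):
            # Like $group with {'$sum': 1}: add one per occurrence.
            counts[cell] += 1
    # Like $sort on counts descending.
    return counts.most_common()


# Made-up sample documents, not taken from the real collection.
sample_models = [
    {'_id': 1, 'celltypes': ['Cell A', 'Cell B', 'Cell C']},
    {'_id': 2, 'celltypes': ['Cell B', 'Cell C']},
    {'_id': 3, 'celltypes': ['Cell C']},
]

table = count_models_per_cell_type(sample_models)
for name, count in table:
    print('{:3d} {:55s}'.format(count, name))
```

The order of cell types that share the same count is not specified by the pipeline's <tt>$sort</tt> stage, so ties may appear in a different order in the two approaches.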

Find the model titles (not paper titles) for models that (1) involve a Hippocampus CA3 pyramidal cell, and (2) have an associated reference where one of the authors is "Migliore M".

In [199]:
for model in models.aggregate([
        {
            '$match':
            {
                'celltypes': 'Hippocampus CA3 pyramidal cell'
            }
        },
        {
            '$lookup':
            {
                'from': 'papers',
                'localField': 'references',
                'foreignField': '_id',
                'as': 'paperinfo'
            }
        },
         {
            '$match':
            {
                'paperinfo.authors': 'Migliore M'
            }
        },
         {
            '$project':
            {
                '_id': False,
                'title': True,
            }
        }
    ]):
    pprint(model['title'])

u'CA3 pyramidal neuron: firing properties (Hemond et al. 2008)'
u'Ca3 pyramidal neuron: membrane response near rest (Hemond et al. 2009)'
u'CA3 pyramidal neuron (Safiulina et al. 2010)'
u'A model of unitary responses from A/C and PP synapses in CA3 pyramidal cells (Baker et al. 2010)'
u'CA3 pyramidal neuron (Lazarewicz et al 2002)'
u'CA3 Pyramidal Neuron (Migliore et al 1995)'
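
The <tt>$lookup</tt> stage is the step that resolves each id in <tt>references</tt> to a document in the papers collection. A plain-Python sketch of the same join, run on made-up documents, may make that clearer. The helper name <tt>titles_with_author</tt>, the parameter names, and the sample data are all hypothetical.

```python
def titles_with_author(model_docs, paper_docs, cell_type, author):
    """Titles of models with cell_type whose references include author."""
    papers_by_id = {paper['_id']: paper for paper in paper_docs}
    titles = []
    for model in model_docs:
        # First $match: keep only models that involve the cell type.
        if cell_type not in model.get('celltypes', []):
            continue
        # $lookup: resolve reference ids; ids with no paper are skipped.
        linked = [papers_by_id[ref] for ref in model.get('references', [])
                  if ref in papers_by_id]
        # Second $match: at least one linked paper lists the author.
        if any(author in paper.get('authors', []) for paper in linked):
            titles.append(model['title'])
    return titles


# Made-up sample documents, not taken from the real collections.
sample_papers = [
    {'_id': 10, 'authors': ['Author X', 'Author Y']},
    {'_id': 11, 'authors': ['Author Z']},
]
sample_models = [
    {'_id': 1, 'title': 'Model one', 'celltypes': ['Cell A'], 'references': [10]},
    {'_id': 2, 'title': 'Model two', 'celltypes': ['Cell A'], 'references': [11]},
    {'_id': 3, 'title': 'Model three', 'celltypes': ['Cell B'], 'references': [10]},
]

matches = titles_with_author(sample_models, sample_papers, 'Cell A', 'Author X')
print(matches)
```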


Find all the authors who were on a paper associated with a model that involved a Hippocampus CA3 pyramidal cell. Sort them in alphabetical order; give this list and state its length.

In [280]:
aphc = list(models.aggregate([
    {'$match': {'celltypes': 'Hippocampus CA3 pyramidal cell'}},
    {'$lookup': {
        'from': 'papers',
        'localField': 'references',
        'foreignField': '_id',
        'as': 'paperinfo'
    }},
    # One document per linked paper, then one per author of that paper
    {'$unwind': '$paperinfo'},
    {'$unwind': '$paperinfo.authors'},
    {'$project': {
        '_id': False,
        'paperinfo.authors': True,
    }},
    # Grouping on the author name removes duplicates
    {'$group': {'_id': '$paperinfo.authors'}},
    {'$sort': {'_id': 1}},
]))

for item in aphc:
    print('{:50s}'.format(item['_id']))

print(len(aphc))

Ascoli GA                                         
Atherton LA                                       
Avermann M                                        
Bae JY                                            
Bae YC                                            
Baker JL                                          
Barreto E                                         
Barrionuevo G                                     
Bi GQ                                             
Bisson G                                          
Boley A                                           
Borgers C                                         
Caiati MD                                         
Campbell SA                                       
Carney PR                                         
Chattarji S                                       
Cheng JT                                          
Cherubini E                                       
Clancy CE                                         
Contreras D                    

<h1>Modify the database</h1>

Rename the Hippocampus CA1 pyramidal cell to be the Hippocampus CA1 pyramidal neuron. (Note: here we're using CA1 instead of CA3.) Make sure that this is consistent across all documents in the models collection.

In [291]:
# Executed after the update_many call (In [290]); None confirms the old name is gone
pprint(models.find_one({'celltypes': 'Hippocampus CA1 pyramidal cell'}))


None


In [290]:
models.update_many({'celltypes': 'Hippocampus CA1 pyramidal cell'},
                  {'$set': {'celltypes.$': 'Hippocampus CA1 pyramidal neuron'}})

<pymongo.results.UpdateResult object at 0x105f24370>
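
The positional <tt>$</tt> operator updates only the first array element that matches the filter in each document. That is enough here provided no model lists the old name twice, which is an assumption this notebook does not verify. The intended end state can be sketched in plain Python on made-up documents; the helper name <tt>rename_cell_type</tt> is hypothetical.

```python
def rename_cell_type(docs, old, new):
    """Replace every occurrence of old in each document's celltypes list.

    Changes the documents in place and returns how many were modified.
    """
    modified = 0
    for doc in docs:
        cells = doc.get('celltypes', [])
        if old in cells:
            doc['celltypes'] = [new if cell == old else cell for cell in cells]
            modified += 1
    return modified


# Made-up sample documents, not taken from the real collection.
sample_models = [
    {'_id': 1, 'celltypes': ['Old name', 'Cell B']},
    {'_id': 2, 'celltypes': ['Cell B']},
    {'_id': 3, 'celltypes': ['Old name']},
]

changed = rename_cell_type(sample_models, 'Old name', 'New name')
print(changed)
```

After the rename, no document should contain the old name, which is what the <tt>find_one</tt> check returning <tt>None</tt> shows for the real collection.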

In [292]:
pprint(models.find_one({'celltypes': 'Hippocampus CA1 pyramidal neuron'}))

{u'_id': 106551,
 u'brainregions': [],
 u'celltypes': [u'Hippocampus CA1 pyramidal neuron'],
 u'channels': [u'I Na,t', u'I A', u'I K', u'I h'],
 u'genes': [],
 u'modelconcepts': [u'Active Dendrites',
                    u'Detailed Neuronal Models',
                    u'Extracellular Fields'],
 u'modeltype': [u'Neuron or other electrically excitable cell'],
 u'receptors': [u'AMPA'],
 u'references': [107765, 112925],
 u'simenvironment': [u'NEURON'],
 u'text': u'NEURON mod files from the paper: Cassar\xe0 AM, Hagberg GE, Bianciardi M, Migliore M, Maraviglia B. Realistic simulations of neuronal activity: A contribution to the debate on direct detection of neuronal currents by MRI. Neuroimage. 39:87-106 (2008).  In this paper, we use a detailed calculation of the magnetic field produced by the neuronal  currents propagating over a hippocampal CA1 pyramidal neuron placed inside a cubic MR voxel of  length 1.2 mm to estimate the Magnetic Resonance signal.',
 u'title': u'CA1 pyramidal neuron:

Add a new entry (make up the data, but keep it appropriate) to the models collection. Associate it with two references, one that already exists and one that you also add to the papers collection.

In [294]:
models.insert_one({
 u'_id': 10655191,
 u'brainregions': [],
 u'celltypes': [u'Al Powers insula cell'],
 u'channels': [u'I Na,t', u'I K,Ca', u'I Cl', u'I h'],
 u'genes': [],
 u'modelconcepts': [u'Active Dendrites',
                    u'Detailed Neuronal Models',
                    u'Extracellular Fields'],
 u'modeltype': [u'Neuron or other electrically excitable cell'],
 u'receptors': [u'NMDA'],
 u'references': [107765, 999199],
 u'simenvironment': [u'NEURON'],
 u'text': u'Using a combination of fMRI and insula biopsies of people with schizophrenia in comparison to healthy controls, a specific cell type that predicts risk of schizophrenia has been identified. See paper for more.',
 u'title': u'Al Powers insula cell: risk of schizophrenia (Powers et al. 2016)',
 u'transmitters': [u'Glutamate']})

<pymongo.results.InsertOneResult object at 0x105f24050>

In [295]:
# _id values are stored as integers, so query with an int rather than a string
pprint(papers.find_one({'_id': 999199}))

None


In [297]:
papers.insert_one({
 u'_id': 999199,
 u'authors': [u'Powers AR', u'Powers EM', u'Powers AA', u'Powers MR'],
 u'doi': u'10.1111/j.1549-8719.2001.tb00156.x',
 u'first_page': u'33',
 u'journal': u'Schizophrenia Bulletin',
 u'last_page': u'100',
 u'missing_references': u'all done',
 u'month': u'Mar',
 u'pubmed_id': u'112968433',
 u'references': [4819,
                 4820,
                 4823,
                 4825,
                 17148,
                 17150,
                 17152,
                 17154,
                 86943],
 u'title': u'Predicting Schizophrenia: the golden insula neuron.',
 u'type': u'M',
 u'volume': u'8',
 u'year': u'2016'
})                   

<pymongo.results.InsertOneResult object at 0x105f24230>
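
Because references are stored as paper <tt>_id</tt> values, it is worth confirming that both references of the new model resolve to a paper. The sketch below does this in plain Python, using only the ids from the entries above. The helper name <tt>unresolved_references</tt> and the stripped-down sample documents are made up for illustration.

```python
def unresolved_references(model, paper_docs):
    """Return the reference ids of model that match no paper _id."""
    known_ids = set(paper['_id'] for paper in paper_docs)
    return [ref for ref in model.get('references', []) if ref not in known_ids]


# Stripped-down stand-ins for the documents used in this notebook.
new_model = {'_id': 10655191, 'references': [107765, 999199]}
papers_before = [{'_id': 107765}]                   # existing paper only
papers_after = papers_before + [{'_id': 999199}]    # after adding the new paper

print(unresolved_references(new_model, papers_before))
print(unresolved_references(new_model, papers_after))
```

Ids are compared by value and type, so the string <tt>'999199'</tt> would not match the integer <tt>999199</tt>.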