<h1 align="center">Data mining the Allen Brain Atlases with Python</h1>
<h3 align="center">by [Alex Williams](http://alexhwilliams.info)</h3>

### Summary:

This short tutorial covers some basic tools for *programmatically* accessing the [Allen Brain Atlases](http://www.brain-map.org/). Accessing the data *programmatically* means retrieving it automatically with code, rather than through direct user interaction (i.e. going to the [ABA web portal](http://www.brain-map.org/) and clicking on each experiment of interest). This is a clear advantage when you want to examine the expression patterns of many genes at once, rather than a handful. The power of large-scale neuroanatomical analyses is exemplified by this recent paper: [(Ramsden et al., 2015)](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004032).

**We will mostly use modules from the [standard Python library](https://docs.python.org/2/library/) (`urllib` and `json`) for this tutorial, plus NumPy and matplotlib (imported through `pylab`). Also check out our notebooks on using [Tortilla](#) and [Flotilla](#) for your analysis.**

In [5]:
from __future__ import division
import urllib
import json
import numpy as np
import pylab as plt
import warnings
warnings.filterwarnings('ignore') # ignore annoying divide by zero warning
%matplotlib inline

## Accessing the [Connectivity Atlas](http://connectivity.brain-map.org/):

To see how this works, try going to the following link: 

This retrieves all experiments in which axonal projections were traced to the lateral entorhinal cortex (`ENTl`). We encoded this with the option `[target_domain$eqENTl]`, which specifies the target (post-synaptic) neuroanatomical structure (here, `$eq` means "equals"). The data is returned in [JSON format](http://en.wikipedia.org/wiki/JSON), which we requested by putting `query.json` in the URL.
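
As a minimal sketch, a query URL of the kind described above can be assembled from its parts. The base URL is the one used in this tutorial; the helper name `build_query_url` and its `start_row` argument are hypothetical conveniences, not part of any Allen Institute library.

```python
# Base query used in this tutorial: the connectivity "injection structure" service
BASE_URL = ('http://api.brain-map.org/api/v2/data/query.json?'
            'criteria=service::mouse_connectivity_injection_structure')

def build_query_url(target, start_row=None):
    """Assemble a query URL for experiments projecting to `target`.

    Hypothetical helper: each option is appended as a
    [name$eqvalue] filter, where $eq means "equals".
    """
    url = BASE_URL + '[target_domain$eq' + target + ']'
    if start_row is not None:
        url += '[start_row$eq' + str(start_row) + ']'
    return url

# URL for all experiments with projections to lateral entorhinal cortex
entl_url = build_query_url('ENTl')
print(entl_url)
```

Pasting the printed URL into a browser should show the raw JSON response, assuming the service is reachable.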

The code below defines a function that pulls all connectivity experiments that resulted in a projection to a target area of our choice.

In [6]:
base_url = 'http://api.brain-map.org/api/v2/data/query.json?' + \
           'criteria=service::mouse_connectivity_injection_structure'

def targ_data(targ):
    s = 0    # index for starting row
    done = False
    data = []
    
    # The Allen Institute doesn't return everything at once, so we
    # keep asking for more "rows" (experimental datasets) until there
    # are none left. We do this by passing the [start_row$eq(s)]
    # option and incrementing s by the number of rows received.
    while not done:
        paged_url = base_url+'[target_domain$eq'+targ+'][start_row$eq'+str(s)+']'
        new_data = json.loads(urllib.urlopen(paged_url).read())
        if new_data['num_rows'] == 0:
            done = True
        else:
            data += new_data['msg']
            s += new_data['num_rows']
    return data
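
The function above relies on `urllib.urlopen`, which exists only in Python 2. Here is a minimal sketch of the same paging loop that should also run under Python 3, with the download step passed in as an argument so the loop can be tried offline. The names `fetch_json`, `targ_data_paged`, and `fake_fetch` are hypothetical, and the rows returned by `fake_fetch` are made up purely to exercise the loop.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen          # Python 2

BASE_URL = ('http://api.brain-map.org/api/v2/data/query.json?'
            'criteria=service::mouse_connectivity_injection_structure')

def fetch_json(url):
    """Download a URL and decode its JSON body (makes a network request)."""
    return json.loads(urlopen(url).read().decode('utf-8'))

def targ_data_paged(targ, fetch=fetch_json):
    """Collect every row for `targ`, asking for pages until one is empty."""
    start_row = 0
    data = []
    while True:
        url = (BASE_URL + '[target_domain$eq' + targ + ']'
               + '[start_row$eq' + str(start_row) + ']')
        page = fetch(url)
        if page['num_rows'] == 0:
            break
        data += page['msg']
        start_row += page['num_rows']
    return data

def fake_fetch(url):
    """Stand-in for the server: serves 5 made-up rows, 2 per page."""
    rows = [{'id': i} for i in range(5)]
    start = int(url.split('[start_row$eq')[1].rstrip(']'))
    chunk = rows[start:start + 2]
    return {'num_rows': len(chunk), 'msg': chunk}

demo = targ_data_paged('ENTm', fetch=fake_fetch)
print(len(demo))
```

Calling `targ_data_paged('ENTm')` without the `fetch` argument would query the live service instead of the stand-in.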

### As an example, let's look at the medial entorhinal cortex.

In [11]:
# Pull data for all projections to medial entorhinal cortex
raw_data = targ_data('ENTm')

print type(raw_data) # we are given a list of experiments

<type 'list'>


### Looking through the data
Let's look at the first experiment in the list we retrieved. Each experiment is a [python dictionary](https://docs.python.org/2/tutorial/datastructures.html#dictionaries). What's nice about python dictionaries is that they look very much like the JSON they were parsed from, and both are fairly easy to read by eye.

In [13]:
# let's look at the first experiment (element #0 in the list)
print type(raw_data[0]) # It's a dictionary!
print raw_data[0] 

<type 'dict'>
{u'num-voxels': 2984, u'name': u'Slc17a6-IRES-Cre-2850', u'structure-name': u'Postsubiculum', u'transgenic-line': u'Slc17a6-IRES-Cre', u'gender': u'F', u'injection-volume': u'0.165434', u'structure-abbrev': u'POST', u'strain': u'B6.129', u'injection-coordinates': [9500, 2900, 8000], u'injection-structures': [{u'abbreviation': u'VISp', u'color': u'08858C', u'id': 385, u'name': u'Primary visual area'}, {u'abbreviation': u'POST', u'color': u'48C83C', u'id': 1037, u'name': u'Postsubiculum'}, {u'abbreviation': u'PRE', u'color': u'59B947', u'id': 1084, u'name': u'Presubiculum'}], u'sum': u'0.68568', u'structure-color': u'48C83C', u'structure-id': 1037, u'id': 167654019}
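
A record printed on one line like this is dense. As a small sketch, `json.dumps` with `indent` can lay it out one field per line. The `record` below is a hand-copied subset of the output above, not a fresh download.

```python
import json

# A subset of the first record shown above (values copied from the output)
record = {
    'name': 'Slc17a6-IRES-Cre-2850',
    'structure-name': 'Postsubiculum',
    'structure-abbrev': 'POST',
    'injection-coordinates': [9500, 2900, 8000],
    'injection-structures': [
        {'abbreviation': 'VISp', 'id': 385, 'name': 'Primary visual area'},
        {'abbreviation': 'POST', 'id': 1037, 'name': 'Postsubiculum'},
        {'abbreviation': 'PRE', 'id': 1084, 'name': 'Presubiculum'},
    ],
}

# Indented, key-sorted text is much easier to scan than a single line
pretty = json.dumps(record, indent=2, sort_keys=True)
print(pretty)

# 'injection-structures' is a list of dictionaries, one per structure
abbrevs = [s['abbreviation'] for s in record['injection-structures']]
print(abbrevs)
```

With the real data, the same call would be `json.dumps(raw_data[0], indent=2, sort_keys=True)`.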


### Cool, so let's parse this out a bit more.
One of the first things we'd be interested in is the structure that *projects to* the medial entorhinal cortex. (We know that ENTm received projections in all experiments, but from a different source in each experiment.) We can find the source by accessing the `'structure-name'` field in the dictionary.

In [15]:
print raw_data[0]['structure-name']

Postsubiculum


In [19]:
## Here are some other things that might be interesting
print raw_data[0]['gender']                 # gender of the mouse
print raw_data[0]['sum']                    # total fluorescence in ENTm
print raw_data[0]['injection-coordinates']  # exactly where they injected tracer

F
0.68568
[9500, 2900, 8000]
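
Note that `'sum'` comes back as a string (`u'0.68568'` in the record above), so it needs converting before any arithmetic. The sketch below tallies experiments by source structure and totals their `'sum'` values. `summarize_sources` is a hypothetical helper, and it assumes every record carries `'structure-abbrev'` and a numeric string in `'sum'`, as the first record does.

```python
from collections import Counter

def summarize_sources(experiments):
    """Count experiments per source structure and total their 'sum' values.

    Hypothetical helper. Assumes each record has a 'structure-abbrev'
    field and a 'sum' field holding a numeric string.
    """
    counts = Counter()
    totals = {}
    for exp in experiments:
        src = exp['structure-abbrev']
        counts[src] += 1
        # 'sum' is stored as text, so convert it before adding
        totals[src] = totals.get(src, 0.0) + float(exp['sum'])
    return counts, totals

# Stand-in for raw_data: the one record shown above, reduced to two fields
sample = [{'structure-abbrev': 'POST', 'sum': '0.68568'}]
counts, totals = summarize_sources(sample)
print(counts.most_common())
print(totals)
```

Running `summarize_sources(raw_data)` on the full list would give one entry per source structure that projects to ENTm.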
