# Accessing the Allen Institute Database Using the API

### Goals
1. Understand what an Application Programming Interface (API) is.
2. Learn how the API package provided by the Allen Institute retrieves data.
3. Write a program to retrieve a cell from the Allen Institute database.

### Introduction
The [allensdk.api](https://allensdk.readthedocs.io/en/latest/allensdk.api.html#module-allensdk.api) package is designed to help retrieve data from the [Allen Brain Atlas API](http://help.brain-map.org/display/api/Allen+Brain+Atlas+API). It contains methods that help formulate API queries and parse the returned results. Several subclasses are available that provide pre-made queries specific to certain data sets. We will use the following subclasses of the Allen SDK in this project:  
&emsp;__CellTypesApi:__ data related to the Allen Cell Types Database  
&emsp;__RmaApi:__ general-purpose HTTP interface to the Allen Institute API data model and services

__Some useful links to the Allen website:__  
&emsp;__[Install guide](https://allensdk.readthedocs.io/en/latest/install.html)__  
&emsp;__[Introduction to the API Access](https://allensdk.readthedocs.io/en/latest/data_api_client.html)__  
&emsp;__[Allen Brain Atlas API - Allen Cell Types Database](http://help.brain-map.org/display/celltypes/API)__  
&emsp;__[Example jupyter notebook - Introduction to the Cell Types Database](https://allensdk.readthedocs.io/en/latest/_static/examples/nb/cell_types.html)__  
&emsp;__[Source documentation of allensdk.api.queries.cell_types_api module](https://allensdk.readthedocs.io/en/latest/allensdk.api.queries.cell_types_api.html)__  

### Procedure
#### 1. Install the API package.  
If you have Anaconda installed, enter ```pip install allensdk``` in an Anaconda Prompt or in a terminal. To uninstall, enter ```pip uninstall allensdk```.
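
Before continuing, it can be useful to confirm from Python that the package is visible to the interpreter you are running. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of the Allen SDK.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Expected to print True once `pip install allensdk` has completed
# in the same environment that runs this notebook.
print(is_installed("allensdk"))
```

If this prints `False`, the notebook kernel is probably using a different Python environment from the one where `pip` was run.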

#### 2. Download all cells
Import the ```CellTypesApi``` class. If you want to download Cell Types Database data to a standard directory structure on your hard drive, use the ```CellTypesCache``` class instead. See the example Jupyter notebook given in the links.  
It may take a few seconds to download the data and return a list of all cells.

In [1]:
from allensdk.api.queries.cell_types_api import CellTypesApi

cta = CellTypesApi() # the CellTypesApi instance

cells = cta.list_cells_api() # Query the API for a list of all cells in the Cell Types Database.

In [15]:
len(cells)

2331

```specimen__id``` is a unique id for each cell recorded in the database. 

In [2]:
specimen_id_list = [cell['specimen__id'] for cell in cells] # store the specimen IDs in a list
print(specimen_id_list[:10]) # display the first 10 IDs

idx = 0  # select an index for an ID from the list
cell_id = specimen_id_list[idx] # the ID of the first cell from the list
print(cell_id)

[525011903, 565871768, 469801138, 528642047, 605889373, 537256313, 485909730, 323865917, 583836069, 504615116]
525011903


#### 2. Get a single cell
There are several ways to get a cell by its ID.

In [3]:
cell = cells[idx]            # Get the selected cell from the list of all cells
cell = cta.get_cell(cell_id) # A method to retrieve a single cell from the database, returning the same cell as in the line above
cell

{'cell_reporter_status': None,
 'csl__normalized_depth': None,
 'csl__x': 273.0,
 'csl__y': 354.0,
 'csl__z': 216.0,
 'donor__age': '25 yrs',
 'donor__disease_state': 'epilepsy',
 'donor__id': 524848408,
 'donor__name': 'H16.03.003',
 'donor__race': 'White or Caucasian',
 'donor__sex': 'Male',
 'donor__species': 'Homo Sapiens',
 'donor__years_of_seizure_history': '9',
 'ef__adaptation': 0.0278459596639436,
 'ef__avg_firing_rate': 13.5725111407696,
 'ef__avg_isi': 73.6783333333333,
 'ef__f_i_curve_slope': 0.1671875,
 'ef__fast_trough_v_long_square': -53.8750038146973,
 'ef__peak_t_ramp': 4.10410666666667,
 'ef__ri': 159.531131386757,
 'ef__tau': 21.1810256736186,
 'ef__threshold_i_long_square': 90.0,
 'ef__upstroke_downstroke_ratio_long_square': 2.89546090494073,
 'ef__vrest': -70.56103515625,
 'ephys_inst_thresh_thumb_path': '/api/v2/well_known_file_download/529903142',
 'ephys_thumb_path': '/api/v2/well_known_file_download/529903140',
 'erwkf__id': 618211597,
 'line_name': '',
 'm__bi
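
If many cells need to be looked up by ID, one further option is to index the list once with a dictionary keyed on ```specimen__id```. The sketch below uses two abbreviated records in place of the full list returned by ```list_cells_api```; the variable names `sample_cells` and `cells_by_id` are illustrative only.

```python
# Two abbreviated records standing in for the list returned by list_cells_api().
# The IDs and species values are copied from outputs shown in this notebook.
sample_cells = [
    {"specimen__id": 525011903, "donor__species": "Homo Sapiens"},
    {"specimen__id": 485909730, "donor__species": "Mus musculus"},
]

# Build the index once; each later lookup by ID avoids rescanning the list.
cells_by_id = {cell["specimen__id"]: cell for cell in sample_cells}

print(cells_by_id[525011903]["donor__species"])  # Homo Sapiens
```

With the real data, replacing `sample_cells` by `cells` should give the same record as `cta.get_cell(cell_id)` without another request to the server.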

Convert the cell list into a pandas DataFrame and select the cell from it.

In [4]:
import pandas as pd
cells_df = pd.DataFrame(cells)
cell_df = cells_df[cells_df.specimen__id==cell_id] # get a cell according to its ID
cell_df

| | cell_reporter_status | csl__normalized_depth | csl__x | csl__y | csl__z | donor__age | donor__disease_state | donor__id | donor__name | donor__race | ... | specimen__id | specimen__name | structure__acronym | structure__id | structure__layer | structure__name | structure_parent__acronym | structure_parent__id | tag__apical | tag__dendrite_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | 273.0 | 354.0 | 216.0 | 25 yrs | epilepsy | 524848408 | H16.03.003 | White or Caucasian | ... | 525011903 | H16.03.003.01.14.02 | FroL | 12113 | 3 | "frontal lobe" | FroL | 12113 | intact | spiny |


#### 3. Get electrophysiology features
Download electrophysiology features for all cells. This may take a few seconds to download.

In [5]:
ephys_features = cta.get_ephys_features()

Convert it into a pandas DataFrame and get the features for a cell according to its ID.

In [6]:
import pandas as pd
ef_df = pd.DataFrame(ephys_features)

cell_ef = ef_df[ef_df.specimen_id==cell_id]
cell_ef

| | adaptation | avg_isi | electrode_0_pa | f_i_curve_slope | fast_trough_t_long_square | fast_trough_t_ramp | fast_trough_t_short_square | fast_trough_v_long_square | fast_trough_v_ramp | fast_trough_v_short_square | ... | trough_t_ramp | trough_t_short_square | trough_v_long_square | trough_v_ramp | trough_v_short_square | upstroke_downstroke_ratio_long_square | upstroke_downstroke_ratio_ramp | upstroke_downstroke_ratio_short_square | vm_for_sag | vrest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1564 | 0.027846 | 73.678333 | 8.172499 | 0.167187 | 1.11848 | 4.105853 | 1.025173 | -53.875004 | -58.833337 | -56.635418 | ... | 4.134987 | 1.375253 | -53.968754 | -59.51042 | -71.197919 | 2.895461 | 2.559876 | 3.099787 | -88.843758 | -70.561035 |


Another way to download the electrophysiology features for a single cell is to use ```RmaApi``` from the ```rma_api``` module, which is the base class of ```CellTypesApi```.

In [7]:
from allensdk.api.queries.rma_api import RmaApi

rma = RmaApi() # the RmaApi instance

data = rma.model_query(model='EphysFeature',criteria='[specimen_id$eq'+str(cell_id)+']')[0] # or use criteria='[specimen_id$eq525011903]'
data

{'adaptation': 0.0278459596639436,
 'avg_isi': 73.6783333333333,
 'electrode_0_pa': 8.17249913576124,
 'f_i_curve_slope': 0.1671875,
 'fast_trough_t_long_square': 1.11848,
 'fast_trough_t_ramp': 4.10585333333333,
 'fast_trough_t_short_square': 1.02517333333333,
 'fast_trough_v_long_square': -53.8750038146973,
 'fast_trough_v_ramp': -58.8333371480306,
 'fast_trough_v_short_square': -56.6354179382324,
 'has_burst': False,
 'has_delay': False,
 'has_pause': False,
 'id': 525097092,
 'input_resistance_mohm': 232.352528,
 'latency': 0.0417000000000001,
 'peak_t_long_square': 1.11668,
 'peak_t_ramp': 4.10410666666667,
 'peak_t_short_square': 1.02354,
 'peak_v_long_square': 42.1562538146973,
 'peak_v_ramp': 39.2187538146973,
 'peak_v_short_square': 41.0208358764648,
 'rheobase_sweep_id': 525031831,
 'rheobase_sweep_number': 40,
 'ri': 159.531131386757,
 'sag': 0.128440782427788,
 'seal_gohm': 1.125671168,
 'slow_trough_t_long_square': 1.14262,
 'slow_trough_t_ramp': 4.13498666666667,
 'slow_t
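
To connect this with the underlying web API: a query like the one above is ultimately sent as an HTTP request whose URL encodes the model and the criteria. The sketch below assembles such a URL by hand, assuming the RMA syntax described in the Allen Brain Atlas API documentation (`model::` followed by `rma::criteria`). The helper `build_rma_url` and the constant `API_ROOT` are hypothetical names, and the exact URL that ```RmaApi``` generates may differ, for example in how it encodes options.

```python
# Assumed query endpoint; the host matches the download URLs logged by the SDK.
API_ROOT = "http://api.brain-map.org/api/v2/data/query.json"


def build_rma_url(model, criteria):
    """Assemble an RMA query URL from a model name and a criteria string."""
    return f"{API_ROOT}?criteria=model::{model},rma::criteria,{criteria}"


url = build_rma_url("EphysFeature", "[specimen_id$eq525011903]")
print(url)
```

Seeing the query as a plain URL can help when debugging a criteria string, since the same URL can be opened in a browser.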

The following lines use ```RmaApi``` to find the IDs of cells whose electrophysiology recording failed.

In [8]:
noEFcells = rma.model_query(model='EphysFeature',criteria='specimen(ephys_result[failed$eqtrue])',num_rows='all')

noEF_id = [cell['specimen_id'] for cell in noEFcells]
print(noEF_id) # display the IDs
noEFcell = rma.model_query(model='EphysFeature',criteria='[specimen_id$eq'+str(noEF_id[0])+']')[0]
noEFcell # display an example of the electrophysiology features from a failed recording.

[478307928, 488222410, 474980317]


{'adaptation': 0.00740522549424404,
 'avg_isi': 36.1484,
 'electrode_0_pa': 1.00812489866459,
 'f_i_curve_slope': 0.275,
 'fast_trough_t_long_square': 0.0190350000000001,
 'fast_trough_t_ramp': None,
 'fast_trough_t_short_square': 0.00507999999999997,
 'fast_trough_v_long_square': -58.5625038146973,
 'fast_trough_v_ramp': None,
 'fast_trough_v_short_square': -59.1250038146973,
 'has_burst': False,
 'has_delay': False,
 'has_pause': True,
 'id': 478317914,
 'input_resistance_mohm': 262.444656,
 'latency': 2.40499999999999,
 'peak_t_long_square': 0.0176400000000001,
 'peak_t_ramp': None,
 'peak_t_short_square': 0.00324999999999998,
 'peak_v_long_square': 25.5625,
 'peak_v_ramp': None,
 'peak_v_short_square': 26.0312519073486,
 'rheobase_sweep_id': 478315765,
 'rheobase_sweep_number': 35,
 'ri': 363.05,
 'sag': 0.0857734918509824,
 'seal_gohm': 1.764365056,
 'slow_trough_t_long_square': 0.0226600000000001,
 'slow_trough_t_ramp': None,
 'slow_trough_t_short_square': 0.00826000000000016,
 '
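
In the record above, several ramp-related features are `None`. A quick way to list which features are missing from a record is to scan its items, as in the sketch below. The dictionary `record` contains only a few entries copied from the output above, not the full feature set.

```python
# A few entries copied from the record shown above.
record = {
    "adaptation": 0.00740522549424404,
    "fast_trough_t_ramp": None,
    "fast_trough_v_ramp": None,
    "has_pause": True,
    "peak_t_ramp": None,
    "peak_v_ramp": None,
    "ri": 363.05,
}

# Collect the names of all features whose value is None.
missing = sorted(key for key, value in record.items() if value is None)
print(missing)
# ['fast_trough_t_ramp', 'fast_trough_v_ramp', 'peak_t_ramp', 'peak_v_ramp']
```

The same comprehension can be applied to `noEFcell` to see the complete list of missing features for that recording.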

#### 4. Get cells according to species
Check the key ```donor__species``` for all cells in the data. We can see that there are two species that the cells are collected from, human and mouse.

In [9]:
print(set([cell['donor__species'] for cell in cells]))

{'Mus musculus', 'Homo Sapiens'}
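
To see how many cells each species contributes, rather than only which species are present, `collections.Counter` from the standard library can be used. The list `sample_cells` below is a made-up stand-in for `cells` and contains only the key needed here.

```python
from collections import Counter

# Hypothetical stand-in for `cells`; only the key used here is included.
sample_cells = [
    {"donor__species": "Mus musculus"},
    {"donor__species": "Homo Sapiens"},
    {"donor__species": "Mus musculus"},
]

# Tally the number of cells per species.
species_counts = Counter(cell["donor__species"] for cell in sample_cells)
print(species_counts["Mus musculus"], species_counts["Homo Sapiens"])  # 2 1
```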


We can use the method ```filter_cells_api``` to get a desired subset of the cells. Set the keyword argument ```require_morphology``` to ```True``` to filter out cells that have no morphological images. Similarly, ```require_reconstruction``` filters out cells that have no morphological reconstruction. The keyword argument ```species``` restricts the result to cells that belong to one or more species.

In [10]:
mousecells = cta.filter_cells_api(cells,require_morphology=True,require_reconstruction=True,species=[CellTypesApi.MOUSE]) # mouse cells with morphological images and reconstructions
print(len(mousecells))
humancells = cta.filter_cells_api(cells,require_morphology=False,require_reconstruction=False,species=[CellTypesApi.HUMAN]) # all human cells
print(len(humancells))

mousecells[0] # display the first cell in the filtered list of mouse cells for example

485
411


{'reporter_status': 'positive',
 'cell_soma_location': [8881.0, 953.839501299405, 7768.22695782726],
 'species': 'Mus musculus',
 'id': 485909730,
 'name': 'Cux2-CreERT2;Ai14-205530.03.02.01',
 'structure_layer_name': '5',
 'structure_area_id': 385,
 'structure_area_abbrev': 'VISp',
 'transgenic_line': 'Cux2-CreERT2',
 'dendrite_type': 'spiny',
 'apical': 'intact',
 'reconstruction_type': 'dendrite-only',
 'disease_state': '',
 'donor_id': 485250100,
 'structure_hemisphere': 'right',
 'normalized_depth': 0.478343598387418}

In the returned cell, the key ```id``` denotes the specimen ID mentioned above. Using ```id``` value will help you find a particular cell from the filtered list.

In [11]:
ID = mousecells[0]['id'] # the specimen id of the example mouse cell
ID

485909730
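
Note that the filtered records use the key ```id``` rather than ```specimen__id```. A minimal sketch of looking up a cell in a filtered list is given below. The helper `find_by_id` is a hypothetical name; the first sample record is abbreviated from the output above and the second is made up for illustration.

```python
def find_by_id(filtered_cells, specimen_id):
    """Return the first cell whose 'id' matches, or None if there is none."""
    return next((c for c in filtered_cells if c["id"] == specimen_id), None)


# Abbreviated records: the first comes from the output above,
# the second is invented for this example.
sample_filtered = [
    {"id": 485909730, "structure_area_abbrev": "VISp"},
    {"id": 111, "structure_area_abbrev": "VISl"},
]

match = find_by_id(sample_filtered, 485909730)
print(match["structure_area_abbrev"])  # VISp
```

Returning `None` for an unknown ID, instead of raising an error, makes it easy to test whether a cell survived the filter.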

#### 5. For a single cell, save its electrophysiology recordings as an NWB file and its morphological reconstruction as an SWC file in the current directory

In [12]:
cta.save_ephys_data(specimen_id=cell_id,file_name='ephys_'+str(cell_id)+'.nwb')

2020-04-17 22:02:23,753 allensdk.api.api.retrieve_file_over_http INFO     Downloading URL: http://api.brain-map.org/api/v2/well_known_file_download/618211597


In [13]:
cta.save_reconstruction(specimen_id=ID,file_name='reconstruction_'+str(ID)+'.swc')

2020-04-17 22:02:51,128 allensdk.api.api.retrieve_file_over_http INFO     Downloading URL: http://api.brain-map.org/api/v2/well_known_file_download/500961530
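
Each call above downloads the file again, even if it is already on disk. One way to avoid repeated downloads is to check for the file first, as sketched below. The helper `save_if_missing` is hypothetical and not part of the Allen SDK, and the demonstration uses a stand-in downloader that only creates an empty file, so it runs without network access.

```python
import os
import tempfile


def save_if_missing(file_name, download):
    """Call download(file_name) only when the file is not already on disk.

    Returns True if a download was triggered, False if the file existed.
    """
    if os.path.exists(file_name):
        return False
    download(file_name)
    return True


def fake_download(file_name):
    """Stand-in for a real download: just create an empty file."""
    open(file_name, "w").close()


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "ephys_525011903.nwb")
    first = save_if_missing(target, fake_download)   # file absent, so it is created
    second = save_if_missing(target, fake_download)  # file present, so it is skipped

print(first, second)  # True False
```

In this notebook, the downloader could be `lambda name: cta.save_ephys_data(specimen_id=cell_id, file_name=name)`, with the same pattern applying to `cta.save_reconstruction`.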


### 6. Task: Retrieve a cell
Retrieve a __human__ cell with both __electrophysiology__ recordings and a __morphological__ reconstruction, then save them as an NWB file and an SWC file. Display the cell attributes in a Jupyter notebook. Also try to save its electrophysiology features from a pandas DataFrame to a CSV file using the pandas method ```your_dataframe.to_csv(path_or_buf='your_file_name.csv')```.
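
As a hint for the last step only, the sketch below shows how `to_csv` behaves on a one-row DataFrame. It writes to an in-memory buffer so that the result can be inspected directly; passing a file name instead writes to disk. The DataFrame `features_df` holds just two values copied from the features shown earlier, and `index=False` is an optional choice that omits the row index from the output.

```python
import io

import pandas as pd

# Two values copied from the electrophysiology features shown earlier.
features_df = pd.DataFrame([{"specimen_id": 525011903, "vrest": -70.561035}])

# In-memory target; pass a file name such as 'your_file_name.csv' to write to disk.
buffer = io.StringIO()
features_df.to_csv(path_or_buf=buffer, index=False)

csv_text = buffer.getvalue()
print(csv_text)
```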