## 1. In Situ Hybridization (ISH) Data Portal

The main web portal is https://mouse.brain-map.org/. It gives access to gene expression data, injection and target structures, experiments, expression summaries and expression visualisations (through the online or offline version of the 3D BrainExplorer tool), among other resources.

[This page](http://help.brain-map.org/display/mousebrain/In+Situ+Hybridization+%28ISH%29+Data) explains the functions available in the search. It covers [the syntax for search queries](http://help.brain-map.org/display/mousebrain/In+Situ+Hybridization+%28ISH%29+Data#InSituHybridization(ISH)Data-BooleanSyntaxQuery), [searches that start from brain structures (Differential Search)](http://help.brain-map.org/display/mousebrain/In+Situ+Hybridization+%28ISH%29+Data#InSituHybridization(ISH)Data-DifferentialSearch) to find which genes they express, and comparison with [human microarray datasets (Human Differential Search)](http://help.brain-map.org/display/mousebrain/In+Situ+Hybridization+%28ISH%29+Data#InSituHybridization(ISH)Data-HumanDifferentialSearch).

Genes with expression patterns similar to those of a queried gene can be explored with [the Correlative Search](http://help.brain-map.org/display/mousebrain/In+Situ+Hybridization+%28ISH%29+Data#InSituHybridization(ISH)Data-CorrelativeSearch). Clicking on an experiment in the results opens a panel on the right that gives access to that experiment.

Documentation of the [experimental detail](http://help.brain-map.org/display/mousebrain/In+Situ+Hybridization+%28ISH%29+Data#InSituHybridization(ISH)Data-ExperimentalDetail) page and the [image viewer](http://help.brain-map.org/display/mousebrain/In+Situ+Hybridization+%28ISH%29+Data#InSituHybridization(ISH)Data-ExperimentalDetail) is also available.

## 2. Accessing the data through the API

### 2.1. Overview of experiments and reference spaces

An overview of the ISH data available through the API is given here: http://help.brain-map.org/display/mousebrain/API

### 2.2. RESTful Model Access (RMA)

Gene expression, along with many other data types, is provided through [RMA queries](http://help.brain-map.org/pages/viewpage.action?pageId=5308449). The output is returned as _JSON_, _XML_ or _CSV_ and can be parsed according to its format. In essence, an RMA query is a URL that can simply be pasted into a browser.

For example, looking up metadata on a particular gene:

http://api.brain-map.org/api/v2/data/query.xml?include=model::Gene[id$eq15]

Other examples of queries can be found [here](http://help.brain-map.org/display/api/Example+Queries+for+Experiment+Metadata).
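
Because an RMA query is just a URL, it can also be assembled as a string in code. The following is a minimal sketch that only reproduces the gene lookup shown above in each of the three output formats; the helper name `gene_query_url` is hypothetical and is not part of any Allen Institute library.

```python
# Hypothetical helper (not part of the AllenSDK): rebuilds the example URL above.
RMA_ENDPOINT = "http://api.brain-map.org/api/v2/data/query"

def gene_query_url(gene_id, fmt="xml"):
    """Return the RMA URL that looks up metadata for one gene id."""
    if fmt not in ("json", "xml", "csv"):
        raise ValueError("fmt must be 'json', 'xml' or 'csv'")
    # The output format is chosen by the extension of the 'query' resource
    return f"{RMA_ENDPOINT}.{fmt}?include=model::Gene[id$eq{gene_id}]"

print(gene_query_url(15))              # same URL as in the example above
print(gene_query_url(15, fmt="json"))  # same query, JSON output
```

The resulting string can be pasted into a browser or passed to any HTTP client.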

### 2.3. Accessing RMA through Web App

A convenient way to construct and test RMA queries (and the recommended way to understand how they work) is the web-based [RMA Query Builder Utility](http://api.brain-map.org/examples/rma_builder/rma_builder.html).

To use it:
- select output format
- add "Model" stage
- enter desired parameters
- press "Build Query"

The key parameter to choose is "Model", which corresponds to the type of data or metadata being queried (there are many). The options relevant to this project are "[SectionDataSet](http://api.brain-map.org/api/v2/data/query.xml?criteria=model::SectionDataSet,rma::criteria,products[abbreviation$eqMouse],genes[acronym$eqDrd1],rma::include,structure_unionizes)" (the list of experiments for a gene, plus expression data in unionized format for each experiment) and "[StructureLookup](http://api.brain-map.org/api/v2/data/query.xml?criteria=model::StructureLookup,rma::criteria,structure[id$eq15566],rma::include,structure,rma::options[only$eq%27structure_lookups.structure_id_path,structure_lookups.termtype%27])" (metadata of brain structures, including their hierarchical relationships).

Next come the "criteria" for selecting data. For example, to look up a particular structure, one specifies its id. This is done by selecting the category of criteria from the drop-down list and pressing "[]" to choose the criterion type (e.g. id) and the value it should be equal to (or greater than, less than, etc.). Pressing "," adds further criteria.

The "include" option specifies which associated data are returned alongside the queried model (for example, "structure_unionizes" for a SectionDataSet). The "only" and "except" options further restrict which data fields appear in the _JSON/XML/CSV_ output.
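
In the resulting URL, the model, criteria and include stages are joined with commas into a single `criteria` parameter. The sketch below reassembles the SectionDataSet query linked earlier in this section. `build_rma_query` is a hypothetical helper written for illustration, not a function of the Query Builder or the AllenSDK, and the `rma::options` stage is left out for brevity.

```python
def build_rma_query(model, criteria=(), include=(), fmt="xml"):
    """Assemble an RMA URL from a model, criteria filters and included associations."""
    stages = [f"model::{model}"]
    if criteria:
        # Each criterion has the form association[field$operatorvalue]
        stages += ["rma::criteria", *criteria]
    if include:
        stages += ["rma::include", *include]
    base = "http://api.brain-map.org/api/v2/data/query"
    return f"{base}.{fmt}?criteria=" + ",".join(stages)

url = build_rma_query(
    "SectionDataSet",
    criteria=["products[abbreviation$eqMouse]", "genes[acronym$eqDrd1]"],
    include=["structure_unionizes"],
)
print(url)
```

Keeping each stage as a list item makes it easy to add or remove a criterion without editing a long URL by hand.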

The hierarchical relationships between "Model" classes in the RMA API are available [here](http://api.brain-map.org/class_hierarchy).

### 2.4. Accessing RMA through Python

A short guide to working with the RMA API in Python is available [here](https://alleninstitute.github.io/AllenSDK/data_api_client.html). The first step after [installation](https://allensdk.readthedocs.io/en/latest/install.html) is to import _RmaApi_:

In [1]:
from allensdk.api.queries.rma_api import RmaApi
import pandas as pd
import numpy as np

Using the [_model_query_ method](https://alleninstitute.github.io/AllenSDK/allensdk.api.queries.rma_api.html#allensdk.api.queries.rma_api.RmaApi.model_query) of _RmaApi_ with the appropriate parameters, one can retrieve the list of experiments for a gene and display it as a _Pandas_ data frame:

In [2]:
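rma = RmaApi()

gene = "Drd1"

# List all mouse experiments for the gene, with unionized expression data attached
data = rma.model_query('SectionDataSet',
                       criteria=f"products[abbreviation$eq'Mouse'],genes[acronym$eq'{gene}']",
                       include="structure_unionizes")

# One row per experiment (SectionDataSet)
data_df = pd.DataFrame(data)

data_df.head()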

| | blue_channel | delegate | expression | failed | failed_facet | green_channel | id | name | plane_of_section_id | qc_date | red_channel | reference_space_id | rnaseq_design_id | section_thickness | specimen_id | sphinx_id | storage_directory | weight | structure_unionizes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | True | True | False | 734881840 | | 71307280 | | 2 | 2009-05-02T22:56:37Z | | 10 | | 25.0 | 70429761 | 150678 | /external/aibssan/production32/prod334/image_s... | 5470 | [{'expression_density': 0.0159272, 'expression... |
| 1 | | False | False | False | 734881840 | | 352 | | 1 | | | 9 | | 25.0 | 702565 | 78451 | /external/mouse/prod1/image_series_352/ | 5470 | [{'expression_density': 0.0136562, 'expression... |
| 2 | | False | False | False | 734881840 | | 353 | | 2 | | | 10 | | 25.0 | 702529 | 76510 | /external/mouse/prod1/image_series_353/ | 5470 | [{'expression_density': 0.00817143, 'expressio... |
| 3 | | False | False | False | 734881840 | | 354 | | 2 | | | 10 | | 25.0 | 702473 | 95220 | /external/mouse/prod0/image_series_354/ | 5270 | [{'expression_density': 7.11597e-05, 'expressi... |


### 2.5. Unionized data format

In the data frame above, experiment ids are in the "id" column and the unionized data is in the "structure_unionizes" column. Each entry of "structure_unionizes" is a list of dictionaries, which can itself be turned into a data frame:

In [3]:
experiment_id = 353

exp_union_data = pd.DataFrame(data_df[data_df['id']==experiment_id]['structure_unionizes'].item())

exp_union_data.head()

| | expression_density | expression_energy | id | section_data_set_id | structure_id | sum_expressing_pixel_intensity | sum_expressing_pixels | sum_pixel_intensity | sum_pixels | voxel_energy_cv | voxel_energy_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.008171 | 1.09712 | 398484594 | 353 | 15564 | 1247590000.0 | 9292150.0 | 25201000000.0 | 1137150000.0 | 2.65761 | 1.09601 |
| 1 | 0.008171 | 1.09712 | 398484597 | 353 | 15565 | 1247590000.0 | 9292150.0 | 25201000000.0 | 1137150000.0 | 2.65761 | 1.09601 |
| 2 | 0.010952 | 1.48177 | 398484604 | 353 | 15566 | 1148940000.0 | 8491790.0 | 16781100000.0 | 775383000.0 | 2.32817 | 1.48118 |
| 3 | 0.012336 | 1.67205 | 398484612 | 353 | 15567 | 1135570000.0 | 8378020.0 | 14748000000.0 | 679149000.0 | 2.18009 | 1.67129 |
| 4 | 0.000695 | 0.087989 | 398484614 | 353 | 15568 | 2630740.0 | 20770.7 | 466279000.0 | 29898500.0 | 2.34418 | 0.087989 |


As explained [here](http://help.brain-map.org/display/mousebrain/API#API-Expression3DGridsExpressionGridding), expression density, intensity and energy are related: expression energy is the product of expression intensity and expression density. This can be checked on a single structure:

In [35]:
# Select the row for a single structure (id 15564) in this experiment
single_structure_df = exp_union_data[exp_union_data['structure_id']==15564]

expression_density = single_structure_df['expression_density'].item()
expression_energy = single_structure_df['expression_energy'].item()

sum_expressing_pixel_intensity = single_structure_df['sum_expressing_pixel_intensity'].item()
sum_pixel_intensity = single_structure_df['sum_pixel_intensity'].item()

sum_expressing_pixels = single_structure_df['sum_expressing_pixels'].item()
sum_pixels = single_structure_df['sum_pixels'].item()

# Mean intensity over expressing pixels only
expression_intensity = sum_expressing_pixel_intensity / sum_expressing_pixels

# Intensity times density should match the reported energy
print(expression_intensity * expression_density)
print(expression_energy)

1.0971190040733307
1.09712
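
The same relationship can be checked without calling the API, using the numbers in the first row of the unionized table above (structure 15564, experiment 353). This is only a sketch: the definitions of density and energy as ratios of pixel sums are an assumption based on the linked documentation, and the table values are rounded as displayed, so the comparisons use a tolerance.

```python
import math

# Values copied from the unionized table above (structure 15564, experiment 353)
row = {
    "expression_density": 0.008171,
    "expression_energy": 1.09712,
    "sum_expressing_pixel_intensity": 1247590000.0,
    "sum_expressing_pixels": 9292150.0,
    "sum_pixels": 1137150000.0,
}

# Assumed definitions in terms of pixel sums
density = row["sum_expressing_pixels"] / row["sum_pixels"]
intensity = row["sum_expressing_pixel_intensity"] / row["sum_expressing_pixels"]
energy = row["sum_expressing_pixel_intensity"] / row["sum_pixels"]

# Compare the recomputed values with the reported ones (rounded, hence the tolerance)
density_ok = math.isclose(density, row["expression_density"], rel_tol=1e-3)
energy_ok = math.isclose(energy, row["expression_energy"], rel_tol=1e-3)
# Energy is intensity times density by construction
product_ok = math.isclose(intensity * density, energy)

print(density_ok, energy_ok, product_ok)
```

Under these definitions, energy is the summed intensity of expressing pixels normalised by the total number of pixels in the structure, which is why it equals intensity times density.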


The values above (expression density and energy, plus the manually calculated intensity) can be obtained for any particular brain structure. The following example uses an RMA query to look up the parent of the structure with id = 15568 and then retrieves the parent's expression density:

In [6]:
# Function to make the RMA query

def query_id_path(s_id):
    query = rma.model_query('StructureLookup', criteria="structure[id$eq"+str(s_id)+"]",include="structure",
        options="[only$eq'structure_lookups.termtype,structure_lookups.structure_id_path']")[0]
    return query

In [13]:
query = query_id_path(15568)

print("Query contents:")
print(query)

Query contents:
{'id': 4259, 'ontology_id': 12, 'structure_id': 15568, 'term': 'RSP', 'termtype': 'a', 'structure': {'acronym': 'RSP', 'atlas_id': None, 'color_hex_triplet': 'A84D10', 'depth': 4, 'failed': False, 'failed_facet': 734881840, 'graph_id': 17, 'graph_order': 4, 'hemisphere_id': 3, 'id': 15568, 'name': 'rostral secondary prosencephalon', 'neuro_name_structure_id': None, 'neuro_name_structure_id_path': None, 'ontology_id': 12, 'parent_structure_id': 15567, 'safe_name': 'rostral secondary prosencephalon', 'sphinx_id': 9921, 'st_level': 3, 'structure_id_path': '/15564/15565/15566/15567/15568/', 'structure_name_facet': 2675393843, 'weight': 8390}}


Its id path is "/15564/15565/15566/15567/15568/". The path lists the hierarchy of structures from the topmost structure (15564) down to the structure itself, so the immediate parent is 15567 (also given in "parent_structure_id"). These paths can differ depending on the structure set adopted.
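
An id path is easier to work with as a list of integers. A minimal sketch, assuming the slash-delimited format shown in the query output above; `parse_id_path` is a hypothetical helper name:

```python
def parse_id_path(structure_id_path):
    """Split an id path such as '/1/2/3/' into a list of integer structure ids."""
    # Leading and trailing slashes produce empty strings, which are skipped
    return [int(part) for part in structure_id_path.split("/") if part]

path = parse_id_path("/15564/15565/15566/15567/15568/")
ancestors = path[:-1]  # topmost structure first, immediate parent last
parent_id = ancestors[-1] if ancestors else None

print(path)
print(parent_id)
```

The last element of the list is the structure itself, so the parent is the second-to-last element.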

In [20]:
print("expression density =", exp_union_data[exp_union_data['structure_id']==15567]['expression_density'].item())

expression density = 0.0123361
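
The same lookup can be repeated for every structure on the id path. The sketch below uses the expression densities displayed in the unionized table for experiment 353, held in a plain dictionary rather than the data frame (an assumption made to keep the example self-contained; the displayed values are rounded).

```python
# structure_id -> expression_density, copied from the table for experiment 353
density_by_structure = {
    15564: 0.008171,
    15565: 0.008171,
    15566: 0.010952,
    15567: 0.012336,
    15568: 0.000695,
}

# Id path of structure 15568: topmost structure first, the structure itself last
id_path = [15564, 15565, 15566, 15567, 15568]

# Print the density at each level of the hierarchy
for depth, structure_id in enumerate(id_path):
    print(depth, structure_id, density_by_structure[structure_id])

# The immediate parent is the second-to-last entry of the path
parent_density = density_by_structure[id_path[-2]]
print("parent expression density =", parent_density)
```

With the full data frame, the dictionary could be replaced by a lookup on the "structure_id" column as in the cell above.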


## 3. Understanding brain structure divisions and hierarchical sets

## 4. Data mining methodology

### 4.1. Choosing the hierarchical structure set

### 4.2. Choosing V2m structures

### 4.3. Selecting injection experiments