# Image stack packager

This tool packages data: a stack of images from a brightfield microscope, stored in an Allen Institute repository, the Cell Types Database. It downloads the image stack and generates a JSON manifest of the files. All output artifacts are written to a single directory.

This notebook has been tested to work on Google's Colab and vanilla JupyterLab.


## Setup

The Allen Institute maintains the Allen SDK, `allensdk`, for accessing their data products. Colab has tons of Python packages pre-installed but `allensdk` is not one of them. For more details about working with `allensdk` on Colab, see Reconstrue's [AllenSDK on Colab](http://reconstrue.com/data_sources/allen_institute/allensdk_on_colab.html). For now, simply install the SDK.

In [0]:
!pip3 --quiet install allensdk 

In [0]:
import json
import pandas

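As of late 2019, Colab "has a Data Table extension that allows interactive exploration of pandas dataframes with filtering and sorting." [[*](https://colab.research.google.com/notebooks/data_table.ipynb)] The extension ships in the `google.colab` package, so the next cell is Colab-specific; skip it on vanilla JupyterLab.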


In [0]:
%load_ext google.colab.data_table


## Accessing data

The images can be accessed through a web UI or programmatically in Python.

A nice feature of the Allen Institute's set-up is that they do not require *any* auth to get to the public data.

### brain-map.org web UI

[brain-map.org](http://brain-map.org) is where the target images reside. The repository has a web UI, wherein the image stack can be viewed. Here is an example from their documentation [[*](http://help.brain-map.org/display/celltypes/Physiology+and+Morphology)]:

>displays two orthogonal projections of the biocytin filled neuron and the neuron's 3D morphology reconstruction. From this page, you can also view the stack of high resolution images used for the reconstruction.

![](http://help.brain-map.org/download/attachments/8323624/MorphBrowse.PNG?version=1&modificationDate=1476664307214&api=v2)

The web UI is useful for previewing what the images look like, but we want to download them via Python code:

> You can also access the data programatically and obtain sample code to run your own model simulations. For more details go to the Download page. 



### RESTful RMA

The second way to access the data is through the RESTful "RMA" interface.


### Allen SDK

The Allen Institute first came up with a RESTful interface to their resources, called [RMA](http://help.brain-map.org/pages/viewpage.action?pageId=5308449). RMA is a [HATEOAS](https://restfulapi.net/hateoas/) style RESTful API. Later they added the Python SDK as a client-side convenience wrapper around RMA.

The `allensdk` is Python code which provides a programmatic interface to the info available via RMA. It also maintains a cache of files for performance purposes (`allensdk.core.cell_types_cache.CellTypesCache`).

Although `allensdk` can provide metadata about cells in the repository, it does not have methods to acquire the raw image stack. To get the raw images, RMA is the only route. So `allensdk` can provide the IDs of available cells, but further work is required to iterate through the stack and grab each file.



## Exploring RMA

Their documentation includes [example URLs for fetching data](http://help.brain-map.org/display/celltypes/API#API-morphology_image_download). Here are some of those exercised in a Jupyter context.

Seemingly data can be requested in multiple formats: XML, JSON, and CSV.
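
The formats differ only in the extension on `query`, so the request URL can be built by a small helper. This is a minimal sketch: `rma_query_url` is a hypothetical name (it is not part of `allensdk`), and the CSV form is assumed to follow the same pattern as the XML and JSON URLs used below.

```python
RMA_BASE = "http://api.brain-map.org/api/v2/data"


def rma_query_url(fmt, model, criteria):
    """Build an RMA query URL for one response format: "xml", "json" or "csv"."""
    if fmt not in ("xml", "json", "csv"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{RMA_BASE}/query.{fmt}?criteria=model::{model},rma::criteria,{criteria}"


# Print the same query in each of the three formats.
for fmt in ("xml", "json", "csv"):
    print(rma_query_url(fmt, "ProjectionImage", "[specimen_id$eq313862022]"))
```

Building the string in Python keeps the `$` and `[...]` characters away from the shell until the moment the URL is actually used.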


### As XML

In [0]:
xml_request_url = "http://api.brain-map.org/api/v2/data/query.xml?criteria=model::ProjectionImage,rma::criteria,[specimen_id$eq313862022]"
xml_file_name = "response.xml"
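# Single-quote the URL. Unquoted, the shell expands `$eq313862022` (an unset
# variable) to an empty string and the request goes out as `[specimen_id]`
# with no filter, which is what the logged run below shows.
!wget -O {xml_file_name} '{xml_request_url}'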

--2020-03-26 20:05:48--  http://api.brain-map.org/api/v2/data/query.xml?criteria=model::ProjectionImage,rma::criteria,[specimen_id]
Resolving api.brain-map.org (api.brain-map.org)... 63.237.233.29
Connecting to api.brain-map.org (api.brain-map.org)|63.237.233.29|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/xml]
Saving to: ‘response.xml’

response.xml            [ <=>                ]  47.38K   278KB/s    in 0.2s    

2020-03-26 20:05:48 (278 KB/s) - ‘response.xml’ saved [48518]



In [0]:
!cat {xml_file_name}

That's nice: for each image stack, they provide both MaximumIntensityProjection and MinimumIntensityProjection from both the frontal view plane (xy) and one of the two side views (yz plane).

### As JSON

This is the same request as the XML one above, except that `/query.xml?` in the URL is changed to `/query.json?`.

In [12]:
json_query_url = "http://api.brain-map.org/api/v2/data/query.json?criteria=model::ProjectionImage,rma::criteria,[specimen_id$eq313862022]"
json_file_name = "/content/response.json"

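# Single-quote the URL. Unquoted, the shell expands `$eq313862022` (an unset
# variable) to an empty string and the request goes out as `[specimen_id]`
# with no filter, which is what the logged run below shows.
!wget -O {json_file_name} '{json_query_url}'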

--2020-03-27 10:32:51--  http://api.brain-map.org/api/v2/data/query.json?criteria=model::ProjectionImage,rma::criteria,[specimen_id]
Resolving api.brain-map.org (api.brain-map.org)... 63.237.233.29
Connecting to api.brain-map.org (api.brain-map.org)|63.237.233.29|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘/content/response.json’

/content/response.j     [  <=>               ]  27.96K   126KB/s    in 0.2s    

2020-03-27 10:32:52 (126 KB/s) - ‘/content/response.json’ saved [28633]



In [13]:
with open(json_file_name) as f:
  eg_data = json.load(f)

print(json.dumps(eg_data, indent=2))

{
  "success": true,
  "id": 0,
  "start_row": 0,
  "num_rows": 50,
  "total_rows": 2686,
  "msg": [
    {
      "annotated": false,
      "axes": "xy",
      "bits_per_component": 8,
      "data_set_id": null,
      "expression": null,
      "expression_path": null,
      "failed": false,
      "height": 7592,
      "id": 520209042,
      "image_height": 7592,
      "image_type": "MinimumIntensityProjection - xy",
      "image_width": 7596,
      "isi_experiment_id": null,
      "lims1_id": null,
      "number_of_components": 1,
      "ophys_experiment_id": null,
      "path": "/external/mousecelltypes/prod765/specimen_517330781/min_xy_517330781.aff",
      "projection_function": "min",
      "resolution": 0.1144,
      "section_number": 0,
      "specimen_id": 517330781,
      "structure_id": null,
      "tier_count": 6,
      "width": 7596,
      "x": 0,
      "y": 0
    },
    {
      "annotated": false,
      "axes": "yz",
      "bits_per_component": 8,
      "data_set_id": null,


### To Pandas

The JSON is shaped like this:
```json
{
  "success": true,
  "id": 0,
  "start_row": 0,
  "num_rows": 50,
  "total_rows": 2686,
  "msg": [
    {
```
The response is paginated: `start_row`, `num_rows`, and `total_rows` show that this is the first 50 of 2686 rows.
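
Those three fields are enough to work out which pages remain. The sketch below computes the `start_row` offsets and appends them to a query URL. It assumes RMA accepts paging through an `rma::options[start_row$eq…][num_rows$eq…]` clause, which this notebook does not exercise; `page_starts` and `page_urls` are hypothetical helper names.

```python
def page_starts(total_rows, num_rows):
    """Return the start_row offset of every page needed to cover total_rows."""
    return list(range(0, total_rows, num_rows))


def page_urls(query_url, total_rows, num_rows):
    """Append a paging clause to query_url, producing one URL per page.

    Assumption: the server honours rma::options[start_row$eqN][num_rows$eqM].
    """
    return [
        f"{query_url},rma::options[start_row$eq{start}][num_rows$eq{num_rows}]"
        for start in page_starts(total_rows, num_rows)
    ]


starts = page_starts(total_rows=2686, num_rows=50)
print(len(starts), starts[-1])  # 54 pages, the last one starting at row 2650
```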

The `msg` array is what we want to feed to Pandas. Here's a hacky, lazy way to perform that task via `pandas.read_json`.



In [0]:
rows_json = eg_data["msg"]

# Write only the rows (the `msg` array) to a file
processed_json_file_name = "/content/query_trimmed.json"
with open(processed_json_file_name, 'w') as json_dest_file:
  json.dump(rows_json, json_dest_file)

# Read the file back to check the round trip
with open(processed_json_file_name) as f:
  test_data = json.load(f)

print(json.dumps(test_data, indent=2))
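
For comparison, the file round trip can be skipped, because `pandas.DataFrame` accepts a list of dicts directly. This is a minimal sketch using a two-row stand-in for `eg_data` that carries only a few of the fields (values copied from the output in this notebook), so that it runs on its own:

```python
import pandas

# Stand-in for the parsed RMA response; the real "msg" rows have many more fields.
sample_response = {
    "success": True,
    "msg": [
        {"id": 520209042, "axes": "xy",
         "image_type": "MinimumIntensityProjection - xy", "specimen_id": 517330781},
        {"id": 324045359, "axes": "xy",
         "image_type": "MaximumIntensityProjection - xy", "specimen_id": 314900022},
    ],
}

# Each dict becomes one row; the dict keys become the column names.
sample_df = pandas.DataFrame(sample_response["msg"]).sort_values(by="id")
print(sample_df[["id", "specimen_id", "image_type"]])
```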



In [14]:
query_df = pandas.read_json(processed_json_file_name)
query_df.sort_values(by=['id'])

Unnamed: 0,annotated,axes,bits_per_component,data_set_id,expression,expression_path,failed,height,id,image_height,image_type,image_width,isi_experiment_id,lims1_id,number_of_components,ophys_experiment_id,path,projection_function,resolution,section_number,specimen_id,structure_id,tier_count,width,x,y
35,False,xy,8,,,,False,7582,324045359,7582,MaximumIntensityProjection - xy,7588,,,1,,/external/mousecelltypes/prod149/specimen_3149...,max,0.1144,0,314900022,,6,7588,0,0
38,False,yz,8,,,,False,5745,324305462,5745,MaximumIntensityProjection - yz,758,,,1,,/external/mousecelltypes/prod155/specimen_3187...,max,0.1144,0,318733871,,6,758,0,0
10,False,xy,8,,,,False,3895,326479418,3895,MinimumIntensityProjection - xy,5743,,,1,,/external/mousecelltypes/prod170/specimen_3206...,min,0.1144,0,320668841,,6,5743,0,0
46,False,yz,8,,,,False,7587,365459984,7587,MinimumIntensityProjection - yz,996,,,1,,/external/mousecelltypes/prod132/specimen_4659...,min,0.1144,0,465924786,,6,996,0,0
40,False,yz,8,,,,False,5746,396703682,5746,MinimumIntensityProjection - yz,1098,,,1,,/external/mousecelltypes/prod225/specimen_3234...,min,0.1144,0,323452245,,6,1098,0,0
5,False,xy,8,,,,False,5735,397052186,5735,MinimumIntensityProjection - xy,5750,,,1,,/external/mousecelltypes/prod231/specimen_3280...,min,0.1144,0,328093618,,6,5750,0,0
37,False,xy,8,,,,False,5743,464095208,5743,MaximumIntensityProjection - xy,7582,,,1,,/external/mousecelltypes/prod187/specimen_3234...,max,0.1144,0,323475862,,6,7582,0,0
36,False,xy,8,,,,False,9460,469948842,9460,MinimumIntensityProjection - xy,9462,,,1,,/external/mousecelltypes/prod258/specimen_4681...,min,0.1144,0,468193142,,7,9462,0,0
28,False,xy,8,,,,False,5764,471348554,5764,MinimumIntensityProjection - xy,5763,,,1,,/external/mousecelltypes/prod281/specimen_3972...,min,0.1144,0,397220859,,6,5763,0,0
9,False,xy,8,,,,False,9437,473452257,9437,MinimumIntensityProjection - xy,5790,,,1,,/external/mousecelltypes/prod312/specimen_4698...,min,0.1144,0,469803003,,7,5790,0,0
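
The rows above span many `specimen_id` values, and none of the ones shown is `313862022`, the specimen named in the query URL. That is consistent with the wget logs, where the request appears as `[specimen_id]` without the `$eq313862022` part. A check of this kind can be scripted. The sketch below uses a three-row stand-in for `query_df` (IDs copied from the table above) and a hypothetical helper, `rows_for_specimen`:

```python
import pandas


def rows_for_specimen(df, specimen_id):
    """Return only the rows of df whose specimen_id matches."""
    return df[df["specimen_id"] == specimen_id]


# Stand-in for query_df with just the two columns the check needs.
sample_df = pandas.DataFrame(
    {
        "id": [324045359, 324305462, 326479418],
        "specimen_id": [314900022, 318733871, 320668841],
    }
)

matches = rows_for_specimen(sample_df, 313862022)
print(len(matches))                         # 0: no row for the requested specimen
print(sample_df["specimen_id"].nunique())   # 3 distinct specimens in the sample
```

Running the same two lines against the real `query_df` gives a quick sanity check that a filter in the URL actually reached the server.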
