## Obtain data

### Overview and resources

Organization: [Allen Institute for Brain Science](https://www.alleninstitute.org/)

Documentation:
- [API Documentation](http://help.brain-map.org/display/api/Allen+Brain+Atlas+API)
- [Image download docs](http://help.brain-map.org/display/api/Downloading+an+Image)

[Terms of Use](https://alleninstitute.org/legal/terms-use/): non-commercial, freedom to innovate, [citation policy](https://alleninstitute.org/legal/citation-policy/)

SDK:
- GitHub page: https://github.com/AllenInstitute/AllenSDK
- Notebook: http://alleninstitute.github.io/AllenSDK/_static/examples/nb/cell_types.html

### Dataset of focus

Organism: [Mus musculus (mouse)](https://www.uniprot.org/taxonomy/10090)

### Overview

Custom scripts were used to obtain high-resolution tissue section images from the Allen Mouse (Mus musculus) Brain Atlas [1]. Images were obtained from whole medial-orientation sections at lateral positions spanning the whole brain. A full list of section dataset IDs is available in Supplementary Table N. Sub-images of 1024x1024 pixels were then generated from the raw images with a standard tiling scheme and used as input for the example generation pipeline. This data download step can be reproduced with the tooling in our supporting code release; see the README at https://github.com/kubeflow/examples/enhance for details.
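
The tiling step described above can be sketched as follows. This is a minimal illustration, not the code in `download.py`: the function names `tile_boxes` and `tile_name` are hypothetical, and it assumes non-overlapping tiles enumerated in row-major order with partial edge tiles dropped. The actual scheme may handle edges or ordering differently. Only the file-name pattern (`1024x1024_<index>_<image id>.jpg`) is taken from the directory listings in this notebook.

```python
from typing import Iterator, Tuple

# (left, upper, right, lower), the box convention used by Pillow's Image.crop
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int = 1024) -> Iterator[Box]:
    """Yield crop boxes for non-overlapping tile x tile squares.

    Assumption: tiles are enumerated row by row, and any partial tile
    at the right or bottom edge is dropped.
    """
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)


def tile_name(index: int, image_id: int, tile: int = 1024) -> str:
    """Build a file name following the pattern seen in the listings."""
    return f"{tile}x{tile}_{index}_{image_id}.jpg"


# Example with a made-up raw image size of 5000 x 3000 pixels.
boxes = list(tile_boxes(width=5000, height=3000))
names = [tile_name(i, 70450649) for i, _ in enumerate(boxes)]
print(len(boxes), boxes[0], names[0])
```

Each box can be passed directly to an image library's crop routine to cut out the corresponding sub-image.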

## Running download job in batch

In [2]:
import os

In [3]:
APP_ROOT = "/home/jovyan/work/examples-enhance/enhance"
LAUNCHER_PATH = os.path.join(APP_ROOT, "py", "launcher.py")

In [4]:
!python {LAUNCHER_PATH} --app_root {APP_ROOT} --mode download --batch

INFO:root:Parsed args: {'gcp_project': None, 'app_root': '/home/jovyan/work/examples-enhance/enhance', 'use_image': 'gcr.io/kubeflow-rl/common-base:0.0.1', 'batch': True, 'rebuild_base': False, 'volume_claim_id': 'nfs-1', 'namespace': 'kubeflow', 'mode': 'download'}
INFO:root:Staged app workdir to /mnt/nfs-1/studies/dev/enhance-download-0326-2252-7e4c/staging
{'_batch': True,
 '_command': ['python',
              '/mnt/nfs-1/studies/dev/enhance-download-0326-2252-7e4c/staging/py/download.py',
              '--data_root',
              '/mnt/nfs-1/studies/dev/enhance-download-0326-2252-7e4c/staging/data'],
 'apiVersion': 'batch/v1',
 'kind': 'Job',
 'metadata': {'name': 'enhance-download-0326-2252-7e4c',
              'namespace': 'kubeflow'},
 'spec': {'backoffLimit': 4,
          'template': {'spec': {'containers': [{'args': ['python',
                                                         '/mnt/nfs-1/studies/dev/enhance-download-0326-2252-7e4c/staging/py/download.py',
             

In [6]:
!tree /mnt/nfs-1/studies/dev/enhance-download-0326-2252-7e4c/staging/data

/mnt/nfs-1/studies/dev/enhance-download-0326-2252-7e4c/staging/data
├── meta
│   ├── 1024x1024_path_manifest_70450649.csv
│   ├── image_list_for_section70474875_1.csv
│   └── section_list_1.csv
└── raw
    └── 70474875
        └── 70450649
            ├── 1024x1024_0_70450649.jpg
            ├── 1024x1024_10_70450649.jpg
            ├── 1024x1024_11_70450649.jpg
            ├── 1024x1024_12_70450649.jpg
            ├── 1024x1024_13_70450649.jpg
            ├── 1024x1024_14_70450649.jpg
            ├── 1024x1024_15_70450649.jpg
            ├── 1024x1024_16_70450649.jpg
            ├── 1024x1024_1_70450649.jpg
            ├── 1024x1024_17_70450649.jpg
            ├── 1024x1024_18_70450649.jpg
            ├── 1024x1024_19_70450649.jpg
            ├── 1024x1024_20_70450649.jpg
            ├── 1024x1024_21_70450649.jpg
            ├── 1024x1024_22_70450649.jpg
            ├── 1024x1024_23_70450649.jpg
            ├── 1024x1024_24_70450649.jpg
            ├── 1024x1024_25_70450649.jpg
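
Because `tree` lists names lexicographically, `1024x1024_1_...` appears after `1024x1024_16_...` in the output above. A small helper such as the one below can parse the file names and restore numeric order. It is a sketch under assumptions: the helper names are hypothetical, and the reading of the fields as `<width>x<height>_<tile index>_<image id>.jpg` is inferred from the listing (the last field matches the enclosing directory name), not confirmed by the download code.

```python
import re
from typing import List, Tuple

# Assumed pattern: <width>x<height>_<tile index>_<image id>.jpg
TILE_RE = re.compile(r"^(\d+)x(\d+)_(\d+)_(\d+)\.jpg$")


def parse_tile_name(name: str) -> Tuple[int, int, int, int]:
    """Return (width, height, tile_index, image_id) parsed from a tile file name."""
    match = TILE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected tile file name: {name!r}")
    width, height, index, image_id = (int(g) for g in match.groups())
    return width, height, index, image_id


def sort_tiles(names: List[str]) -> List[str]:
    """Sort tile file names by their numeric tile index."""
    return sorted(names, key=lambda n: parse_tile_name(n)[2])


listing = [
    "1024x1024_0_70450649.jpg",
    "1024x1024_10_70450649.jpg",
    "1024x1024_16_70450649.jpg",
    "1024x1024_1_70450649.jpg",
    "1024x1024_17_70450649.jpg",
]
ordered = sort_tiles(listing)
print(ordered)
```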
      

### What a full run looks like

In [18]:
!df -h

Filesystem                                                     Size  Used Avail Use% Mounted on
overlay                                                         95G   32G   63G  34% /
tmpfs                                                           60G     0   60G   0% /dev
tmpfs                                                           60G     0   60G   0% /sys/fs/cgroup
10.47.249.92:/export/pvc-fb2e4b6b-1be8-11e8-8c86-42010a8e006d 1007G   97G  860G  11% /mnt/nfs-1
10.47.251.54:/export/pvc-f9d1b4dd-1be8-11e8-8c86-42010a8e006d 1007G   93M  956G   1% /mnt/nfs-2
/dev/sda1                                                       95G   32G   63G  34% /etc/hosts
shm                                                             64M     0   64M   0% /dev/shm
tmpfs                                                           60G     0   60G   0% /sys/firmware


The volume currently holds roughly 95 GB of combined whole-image and sub-image data.
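
`df -h` reports usage for the whole volume. To measure a single dataset directory from within the notebook, something like the following could be used. It is a minimal sketch: `dir_size_bytes` and `format_gib` are hypothetical helpers, not part of the code release, and the demonstration runs on a temporary directory rather than the real dataset path.

```python
import os
import tempfile


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def format_gib(num_bytes: int) -> str:
    """Format a byte count in binary gigabytes, the unit df -h uses for 'G'."""
    return f"{num_bytes / 2**30:.1f} GiB"


# Demonstration on a throwaway directory containing one 2048-byte file.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "raw", "1"))
    with open(os.path.join(demo_root, "raw", "1", "a.jpg"), "wb") as handle:
        handle.write(b"\0" * 2048)
    demo_total = dir_size_bytes(demo_root)

print(demo_total, format_gib(demo_total))
```

On the notebook host, the same function could be pointed at `/mnt/nfs-1/datasets/alleninst/mouse/raw`, although walking a large NFS tree may take a while.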

In [11]:
!ls /mnt/nfs-1/datasets/alleninst/mouse/raw

100142550  68545854  69549191  70430882  71489786  72002132  74317040  75831774
100144090  68545971  69549294  70474875  71493663  72054109  74359509  75988403
100144476  68546272  69618145  70513232  71506851  72076677  74363342  75988537
100144523  68632063  69672760  70521728  71529905  72079931  74404835  75990557
1063	   68667557  69734470  70524285  71642163  72082120  74452469  76003611
1085	   68744767  69735240  70537814  71642369  72121203  74452617  76080954
112649047  68798163  69735389  70539159  71668563  72138055  74512017  76081044
1546	   68798543  69771262  70545913  71689647  72149700  74533114  76081887
1580	   68798585  69782174  70546245  71699657  72180163  74640881  76115743
1742	   68799029  69815877  70595799  71700185  72255377  74750011  77332713
203	   68844618  69818030  70595871  71735074  72338175  74819256  77610775
227738	   68844654  69824386  70598359  71750672  72472461  74821733  77620339
2340	   68861717  69835625  70638525  71777437  72472485  74

In [19]:
# TODO: Include the above in appendix of study IDs

In [15]:
!ls /mnt/nfs-1/datasets/alleninst/mouse/raw/100142550

102173162  102173165  102173169  102173175  102173179
102173163  102173167  102173171  102173176  102173180
102173164  102173168  102173172  102173177  102173181


In [16]:
!tree /mnt/nfs-1/datasets/alleninst/mouse/raw/100142550/102173162

/mnt/nfs-1/datasets/alleninst/mouse/raw/100142550/102173162
├── 1024x1024_0_102173162.jpg
├── 1024x1024_100_102173162.jpg
├── 1024x1024_10_102173162.jpg
├── 1024x1024_101_102173162.jpg
├── 1024x1024_102_102173162.jpg
├── 1024x1024_103_102173162.jpg
├── 1024x1024_104_102173162.jpg
├── 1024x1024_105_102173162.jpg
├── 1024x1024_106_102173162.jpg
├── 1024x1024_107_102173162.jpg
├── 1024x1024_108_102173162.jpg
├── 1024x1024_109_102173162.jpg
├── 1024x1024_110_102173162.jpg
├── 1024x1024_1_102173162.jpg
├── 1024x1024_11_102173162.jpg
├── 1024x1024_111_102173162.jpg
├── 1024x1024_112_102173162.jpg
├── 1024x1024_113_102173162.jpg
├── 1024x1024_114_102173162.jpg
├── 1024x1024_115_102173162.jpg
├── 1024x1024_116_102173162.jpg
├── 1024x1024_117_102173162.jpg
├── 1024x1024_118_102173162.jpg
├── 1024x1024_119_102173162.jpg
├── 1024x1024_12_102173162.jpg
├── 1024x1024_13_102173162.jpg
├── 1024x1024_14_102173162.jpg
├── 1024x1024_15_102173162.jpg
├── 1024x1024_16_102173162.jpg
├── 1024x1024_17_102173

### Future improvements

- Consider storing the image data in HDF5.

## Dataset references

1. Lein, E.S. et al. (2007). Genome-wide atlas of gene expression in the adult mouse brain. Nature 445: 168-176. doi:10.1038/nature05453