What are all the different ways that HuBMAP data can be accessed programmatically? This survey could be the basis of user-facing documentation, or it could guide efforts to align and unify resources.

In scope:
- HTTP APIs, even those without Python SDK wrappers
- Config files on GitHub, if there's nothing better

Out of scope:
- Code and data which are not freely available
- Tools that are only of use to HuBMAP developers
- Software libraries not in Python
- Bioinformatics tools for handling particular file types

## TSV Download

**Maintainer**: Harvard

**Description**: Simple HTTP interface to pull entity metadata. URL queries supported.

**Backing API**: Search API

**Doc Style**: Short paragraph of Markdown, checked in to `portal-ui`

**Doc URL**: https://portal.hubmapconsortium.org/apis

**Source URL**: https://github.com/hubmapconsortium/portal-ui/blob/main/context/app/routes_api.py#L34

In [2]:
import csv
import io
import requests
import urllib.parse

query = {'assay_type': 'CODEX'}
url_base = 'https://portal.hubmapconsortium.org/metadata/v0/datasets.tsv'
url_query = urllib.parse.urlencode(query)
tsv_text = requests.get(f'{url_base}?{url_query}').text
datasets = list(csv.DictReader(io.StringIO(tsv_text), dialect=csv.excel_tab))

datasets[0].keys()

odict_keys(['uuid', 'hubmap_id', 'acquisition_instrument_model', 'acquisition_instrument_vendor', 'analyte_class', 'assay_category', 'assay_type', 'donor.hubmap_id', 'execution_datetime', 'is_targeted', 'number_of_antibodies', 'number_of_channels', 'number_of_cycles', 'operator', 'operator_email', 'pi', 'pi_email', 'preparation_instrument_model', 'preparation_instrument_vendor', 'protocols_io_doi', 'reagent_prep_protocols_io_doi', 'resolution_x_unit', 'resolution_x_value', 'resolution_y_unit', 'resolution_y_value', 'resolution_z_unit', 'resolution_z_value', 'section_prep_protocols_io_doi'])

In [8]:
datasets[0]['preparation_instrument_model']

'The model number/name of the instrument used to prepare the sample for the assay'

In [7]:
{d['preparation_instrument_model'] for d in datasets[1:]}

{'prototype robot - Stanford/Nolan Lab', 'version 1 robot'}
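As the outputs above show, the first row after the header holds field descriptions rather than data, which is why the last cell slices `datasets[1:]`. A minimal sketch of separating the two, assuming every response follows that layout; the helper name `split_description_row` and the inline sample are illustrative only, not part of any HuBMAP library or real data.

```python
import csv
import io


def split_description_row(tsv_text):
    """Parse TSV text into (descriptions, records).

    Assumes, as in the output above, that the first row after the header
    holds field descriptions rather than data.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), dialect=csv.excel_tab))
    if not rows:
        return {}, []
    return rows[0], rows[1:]


# Tiny inline sample standing in for a real response (not real HuBMAP data).
sample = (
    "uuid\tassay_type\n"
    "UUID of the entity\tType of assay\n"
    "abc123\tCODEX\n"
)
descriptions, records = split_description_row(sample)
print(descriptions["assay_type"])
print(len(records))
```

Keeping the descriptions in their own dict avoids accidentally counting that row as a dataset.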

## Cells SDK

**AKA**: hubmap-api-py-client

**Maintainer**: CMU / Harvard

**Description**: Idiomatic wrapper around Cells API providing set operations and result filtering

**Backing API**: https://github.com/hubmapconsortium/cross_modality_query AKA "Cells API"

**Doc Style**: Python doctests in Markdown on GitHub

**Doc URL**: https://github.com/hubmapconsortium/hubmap-api-py-client

**Source URL**: https://github.com/hubmapconsortium/hubmap-api-py-client

In [None]:
%pip install hubmap-api-py-client

In [33]:
from hubmap_api_py_client import Client
client = Client('https://cells.dev.hubmapconsortium.org/api/')

# Examples in docs are currently broken: Sean suggested this fix.
gene_symbol = client.select_genes(where="modality", has=["rna"]).get_list()[0]['gene_symbol']
cells_with_gene = client.select_cells(where='gene', has=[f'{gene_symbol} > 0.5'], genomic_modality='rna')

print(gene_symbol, len(cells_with_gene))

A1BG 10584


In [36]:
dataset_a_uuid = client.select_datasets().get_list()[0]['uuid']
dataset_b_uuid = client.select_datasets().get_list()[1]['uuid']
print(dataset_a_uuid, dataset_b_uuid)

cells_in_datasets = client.select_cells(where='dataset', has=[dataset_a_uuid, dataset_b_uuid])
print(cells_in_datasets)

# Errors: "unsupported operand type(s)"
cells_with_gene_in_datasets = cells_with_gene & cells_in_datasets

01d94c78c858b9944b4bbdd5b273c2bd 046251c94ea0e79ee935dd3de57e093c
<CellResultsSet base_url=https://cells.dev.hubmapconsortium.org/api/ handle=2a0f7c4ca147eb367c7808b862f93e0f730935b7709198cbb1e9060cd0cfd1d9>


ClientError: unsupported operand type(s) for &: 'tuple' and 'tuple'

In [37]:
dataset_a_uuid = client.select_datasets().get_list()[0]['uuid']
dataset_b_uuid = client.select_datasets().get_list()[1]['uuid']
cells_in_a_len = len(client.select_cells(where='dataset', has=[dataset_a_uuid]))
cells_in_b_len = len(client.select_cells(where='dataset', has=[dataset_b_uuid]))
cells_in_datasets = client.select_cells(where='dataset', has=[dataset_a_uuid, dataset_b_uuid])
cells_in_datasets_len = len(cells_in_datasets)
assert cells_in_datasets_len > 0
assert cells_in_datasets_len == cells_in_a_len + cells_in_b_len

In [38]:
cells_with_gene_in_datasets = cells_with_gene & cells_in_datasets
assert len(cells_with_gene_in_datasets) > 0

ClientError: unsupported operand type(s) for &: 'tuple' and 'tuple'
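While the server-side `&` fails as shown, one possible fallback is to intersect records on the client once they have been fetched as plain dicts. This is only a sketch: the helper `intersect_by_key` is hypothetical, and the `cell_id` field name is an assumption that has not been checked against the Cells API responses.

```python
def intersect_by_key(records_a, records_b, key="cell_id"):
    """Return the records from records_a whose key also appears in records_b.

    The default key name "cell_id" is an assumption, not confirmed
    against the Cells API.
    """
    keys_b = {record[key] for record in records_b}
    return [record for record in records_a if record[key] in keys_b]


# Made-up records for illustration; not real API output.
with_gene = [{"cell_id": "c1"}, {"cell_id": "c2"}, {"cell_id": "c3"}]
in_datasets = [{"cell_id": "c2"}, {"cell_id": "c3"}, {"cell_id": "c4"}]
common = intersect_by_key(with_gene, in_datasets)
print([record["cell_id"] for record in common])
```

This trades the efficiency of a server-side set operation for something that works on any two lists, so it is only reasonable for small result sets.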

## Entity API

**Maintainer**: PSC

**Description**: Wrapper around the Neo4j database-of-record. It has methods for traversing the provenance graph, and can return the details for individual entities. It does not provide search functionality.

**Doc Style**: SmartAPI; Interactive

**Doc URL**: https://smart-api.info/ui/0065e419668f3336a40d1f5ab89c6ba3

**Source URL**: https://github.com/hubmapconsortium/entity-api/

In [21]:
import requests

entity_api_url = 'https://entity.api.hubmapconsortium.org'

requests.get(f'{entity_api_url}/entity-types').json()

['Collection', 'Dataset', 'Donor', 'Sample', 'Upload']

In [25]:
id = 'HBM668.QFDW.774' # UUID also supported
entity = requests.get(f'{entity_api_url}/entities/{id}').json()
entity['title']

'snATAC-seq (SNARE-seq2) [SnapATAC] data from the lung (right) of a 37.0-year-old black or african american male'

In [28]:
ancestors = requests.get(f'{entity_api_url}/ancestors/{id}').json()
[(a['entity_type'], a['uuid']) for a in ancestors]

[('Dataset', '4bc9b335040544bc76d87acb189e594a'),
 ('Sample', '27997171ea74885abbd91a99cac360d9'),
 ('Sample', '6e5e5be224d88f38aa390d0c389839c7'),
 ('Sample', '0e1c2d399477b244ac006eb58918ec0c'),
 ('Donor', '4397fcd072ac96299992b47da1dbae64'),
 ('Dataset', '18f644163d1114f46dc67cc75f0a8edd'),
 ('Dataset', 'c277864db8e229bb4336428b5e1e096d'),
 ('Dataset', 'b94df37c7a261274840750d994bc42a9')]
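The ancestors come back as a flat list, so it can help to group them by type when looking at provenance. A minimal sketch, relying only on the `entity_type` and `uuid` keys used above; the helper name `group_uuids_by_type` is illustrative, and the sample is a subset of the output shown rather than a live response.

```python
from collections import defaultdict


def group_uuids_by_type(entities):
    """Map each entity_type to the list of UUIDs of that type, in input order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_type"]].append(entity["uuid"])
    return dict(grouped)


# Subset of the ancestors listed above, trimmed to the two keys we need.
ancestors_sample = [
    {"entity_type": "Dataset", "uuid": "4bc9b335040544bc76d87acb189e594a"},
    {"entity_type": "Sample", "uuid": "27997171ea74885abbd91a99cac360d9"},
    {"entity_type": "Sample", "uuid": "6e5e5be224d88f38aa390d0c389839c7"},
    {"entity_type": "Donor", "uuid": "4397fcd072ac96299992b47da1dbae64"},
]
by_type = group_uuids_by_type(ancestors_sample)
print({entity_type: len(uuids) for entity_type, uuids in by_type.items()})
```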

## CCF API

**AKA**:

**Maintainer**: IU

**Description**: Lists the reference organ models, and relates them to the physical organ entities

**Doc Style**: Stoplight API docs; Interactive

**Doc URL**: https://ccf-api.hubmapconsortium.org/#/

**Source URL**: https://github.com/hubmapconsortium/ccf-ui/tree/main/projects/ccf-api

In [40]:
import requests

ccf_api_url = 'https://ccf-api.hubmapconsortium.org/v1'

requests.get(f'{ccf_api_url}/technology-names').json()

['10x', 'AF', 'CODEX', 'IMC', 'LC', 'MALDI', 'OTHER', 'PAS']

In [42]:
import urllib.parse

query = {'age-range': '20,40', 'technologies': '10x,AF'}
url_query = urllib.parse.urlencode(query)
# NOTE: url_query is built but not appended to the URL, so this request is unfiltered.
response = requests.get(f'{ccf_api_url}/tissue-blocks').json()

In [50]:
[r['@id'] for r in response[:3]]

['https://gtexportal.org/home/eqtls/tissue?tissueName=Colon_Sigmoid#FTissueBlocks',
 'https://gtexportal.org/home/eqtls/tissue?tissueName=Colon_Sigmoid#MTissueBlocks',
 'https://gtexportal.org/home/eqtls/tissue?tissueName=Colon_Transverse#FTissueBlocks']
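The cell above builds `url_query` but requests the bare `tissue-blocks` endpoint, so the results shown are unfiltered. A minimal sketch of how the filters could be attached to the URL; the helper `build_filtered_url` is hypothetical, and whether the API honors `age-range` and `technologies` in this form is an assumption taken from the query dict above, not something verified here.

```python
import urllib.parse


def build_filtered_url(base_url, endpoint, query):
    """Join base URL and endpoint, appending an encoded query string if any."""
    url = f"{base_url}/{endpoint}"
    query_string = urllib.parse.urlencode(query)
    return f"{url}?{query_string}" if query_string else url


url = build_filtered_url(
    'https://ccf-api.hubmapconsortium.org/v1',
    'tissue-blocks',
    {'age-range': '20,40', 'technologies': '10x,AF'},
)
print(url)
# The URL could then be fetched with requests.get(url).json()
```

Note that `urlencode` percent-encodes the commas as `%2C`; servers normally decode these back before reading the parameter.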

## HuBMAP Commons

**AKA**:

**Maintainer**: PSC

**Description**: Internal utilities, but it also contains the constants used across the APIs... possibly of interest to external users?

**Backing API**: n/a

**Doc Style**: README

**Doc URL**: https://github.com/hubmapconsortium/commons

**Source URL**: https://github.com/hubmapconsortium/commons

In [1]:
%pip install hubmap-commons

Collecting hubmap-commons
  Using cached https://files.pythonhosted.org/packages/0b/15/1d3d930baf7544acd71fe07e069ca98b20b14982c1e1ecf22c409fef41b8/hubmap_commons-2.0.13-py3-none-any.whl
Collecting neo4j>=4.2.1 (from hubmap-commons)
Collecting cachetools>=4.2.1 (from hubmap-commons)
  Using cached https://files.pythonhosted.org/packages/19/99/ace1769546388976b45e93445bb04c6df95e96363f03fbb56f916da5ebde/cachetools-5.0.0-py3-none-any.whl
Collecting property>=2.2 (from hubmap-commons)
Collecting PyYAML>=5.3.1 (from hubmap-commons)
  Using cached https://files.pythonhosted.org/packages/9d/f6/7e91fbb58c9ee528759aea5892e062cccb426720c5830ddcce92eba00ff1/PyYAML-6.0-cp37-cp37m-macosx_10_9_x86_64.whl
Installing collected packages: neo4j, cachetools, property, PyYAML, hubmap-commons
  Found existing installation: PyYAML 5.1.2
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a parti

In [None]:
# Things to try, if we could install:

from hubmap_commons.globus_groups import get_globus_groups_info

get_globus_groups_info()

## TODO: Antibody API

**AKA**:

**Maintainer**:

**Description**:

**Backing API**:

**Doc Style**:

**Doc URL**:

**Source URL**:

## TODO: Search API

**AKA**:

**Maintainer**:

**Description**:

**Backing API**:

**Doc Style**:

**Doc URL**:

**Source URL**:

## TODO: Ontology API

**AKA**:

**Maintainer**:

**Description**:

**Backing API**:

**Doc Style**: SmartAPI; Interactive

**Doc URL**: https://smart-api.info/ui/dea4bf91545a51b3dc415ba37e2a9e4e

**Source URL**: https://github.com/hubmapconsortium/ontology-api

## TODO: Python SDK

**AKA**:

**Maintainer**:

**Description**: A Python interface to the various HuBMAP web services

**Backing API**: multiple?

**Doc Style**: README

**Doc URL**: https://github.com/hubmapconsortium/python-sdk

**Source URL**: https://github.com/hubmapconsortium/python-sdk

## TODO: Command Line Transfer

**AKA**: Bulk download

**Maintainer**:

**Description**: TODO: If it will be pip installed, then it should be easy to expose the Python interface as well as the command-line executable.

**Backing API**:

**Doc Style**: 

**Doc URL**: 

**Source URL**: https://github.com/hubmapconsortium/hubmap-clt

## TODO: Azimuth

**AKA**: 

**Maintainer**:

**Description**: TODO: Just list the references? API shortcut around the Shiny UI?

**Backing API**:

**Doc Style**: 

**Doc URL**: 

**Source URL**:

## TODO: ASCTB

**AKA**: 

**Maintainer**:

**Description**: TODO: Are these somehow embedded in the CCF API?

**Backing API**:

**Doc Style**: 

**Doc URL**: 

**Source URL**: