#### Simple exerciser for acquiring data from multiple biological data sources

##### Goals

- Use Jupyter notebooks (or Zeppelin or Galaxy) as a flexible and open workbench
- Explain and demonstrate API access to different data sources
- (BONUS POINTS) Demonstrate integration, comparison and visualization of diverse data sources

##### Simplifications for this first version

- We look up 'acetylsalicylic acid' rather than 'aspirin' because that term currently appears in all of the sources, and I'm not sure the Monarch BioLink API I'm using recognizes 'aspirin' yet.




### ChEBI Data

Monarch ingests [Chemical Entities of Biological Interest (ChEBI)](https://www.ebi.ac.uk/chebi/) data and makes it available via SciGraph, the Monarch API, and the new BioLink API.

For reference, here is the link to ChEBI's entry for 'acetylsalicylic acid' (also known as aspirin):

[CHEBI:15365 acetylsalicylic acid](https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15365)

### BioLink substance data from ChEBI via Monarch

Monarch has ingested ChEBI data, and BioLink exposes a `/bioentity/substance/{id}/participant_in/` endpoint that seems to return some data:

https://api.monarchinitiative.org/api/bioentity/substance/CHEBI:40036/participant_in/

However, the basic `/bioentity/substance/{id}` endpoint returns no useful data, so we'll use the link above until BioLink has a fleshed-out `/substance` endpoint.
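As a minimal sketch of how such a URL can be assembled, the helper below percent-encodes the CURIE and appends the `rows` and `fetch_objects` query parameters that the notebook cell further down uses. The function name `participant_in_url` and the constant `BIOLINK_BASE` are hypothetical names introduced here for illustration; they are not part of the BioLink API.

```python
from urllib.parse import quote, urlencode

# Base path taken from the URLs used in this notebook.
BIOLINK_BASE = "https://api.monarchinitiative.org/api/bioentity"


def participant_in_url(curie, rows=20, fetch_objects=True):
    """Build a participant_in URL for a substance CURIE (hypothetical helper)."""
    # safe="" forces the colon in the CURIE to be encoded as %3A.
    path = f"{BIOLINK_BASE}/substance/{quote(curie, safe='')}/participant_in/"
    query = urlencode({"rows": rows, "fetch_objects": str(fetch_objects).lower()})
    return f"{path}?{query}"


print(participant_in_url("CHEBI:15365"))
```

Encoding the CURIE in one place keeps the identifier readable in the notebook while still producing a valid URL.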


### Substance data from the GINAS API


[GINAS Aspirin](https://tripod.nih.gov/ginas/app/api/v1/substances/search?q=root_names_name%3A%22%5EASPIRIN%24%22)

[GINAS acetylsalicylic acid](https://tripod.nih.gov/ginas/app/api/v1/substances/search?q=root_names_name%3A%22%5Eacetylsalicylic+acid%24%22)



---

### Jupyter Environment Setup

- `pip install pandas requests`


### Working Code


In [2]:
import pandas as pd
from urllib.parse import urlencode

#### Reading BioLink's `/substance/{id}/participant_in/` endpoint

In [19]:
biolinkURL = "https://api.monarchinitiative.org/api/bioentity/substance/CHEBI%3A15365/participant_in/?rows=20&fetch_objects=true"
df = pd.read_json(biolinkURL, typ="frame", orient="records")
# df.head(5)
df

Unnamed: 0,evidence_graph,evidence_types,id,object,object_extension,provided_by,publications,qualifiers,relation,slim,subject,subject_extension,type
0,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
1,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
2,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
3,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
4,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
5,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
6,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
7,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
8,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
9,"{'nodes': None, 'edges': None}",,,"{'types': None, 'taxon': {'id': None, 'label':...",,,,,"{'categories': None, 'id': None, 'types': None...",,"{'types': None, 'taxon': {'id': None, 'label':...",,
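Several columns in the output above hold nested dictionaries rather than scalar values, which makes the frame hard to read. One possible way to deal with this, sketched below, is to flatten each record into dotted keys before building a DataFrame. The `flatten` helper is hypothetical, and the sample record only mimics the keys visible in the truncated output; the `None` placeholders and the omission of the elided keys are assumptions.

```python
def flatten(record, parent_key="", sep="."):
    """Recursively flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested objects such as 'object' -> 'taxon'.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Sample shaped like one row of the output above (illustrative values only).
sample = {
    "evidence_graph": {"nodes": None, "edges": None},
    "object": {"types": None, "taxon": {"id": None, "label": None}},
}

flat = flatten(sample)
print(sorted(flat))
```

A list of such flattened records can be passed to `pd.DataFrame(...)` to get one scalar column per dotted key; `pd.json_normalize` in recent pandas versions offers similar behavior.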


In [11]:
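# Base URL of the GINAS substance search endpoint
ginasBase = "https://tripod.nih.gov/ginas/app/api/v1/substances/search?"
# Query on the root_names_name field for the quoted substance name
ginasParams = {'q': "root_names_name:\"^acetylsalicylic acid$\""}

# urlencode percent-encodes the quotes, ^ and $, and turns the space into +
ginasPath = urlencode(ginasParams)
ginasURL = ginasBase + ginasPath

# Hand-written alternative:
# ginasURL = 'https://tripod.nih.gov/ginas/app/api/v1/substances/search?q=root_names_name:"^acetylsalicylic%20acid$"'
print(ginasURL)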

https://tripod.nih.gov/ginas/app/api/v1/substances/search?q=root_names_name%3A%22%5Eacetylsalicylic+acid%24%22


In [12]:
import requests
r = requests.get(ginasURL)
r.json()


{'content': [{'_approvalIDDisplay': 'R16CO5Y76E',
   '_codes': {'count': 51,
    'href': 'http://tripod.nih.gov/ginas/app/api/v1/substances(8911c794-5da3-4934-a683-16d98d93db97)/codes'},
   '_moieties': {'count': 1,
    'href': 'http://tripod.nih.gov/ginas/app/api/v1/substances(8911c794-5da3-4934-a683-16d98d93db97)/moieties'},
   '_name': 'ASPIRIN',
   '_names': {'count': 76,
    'href': 'http://tripod.nih.gov/ginas/app/api/v1/substances(8911c794-5da3-4934-a683-16d98d93db97)/names'},
   '_references': {'count': 73,
    'href': 'http://tripod.nih.gov/ginas/app/api/v1/substances(8911c794-5da3-4934-a683-16d98d93db97)/references'},
   '_relationships': {'count': 27,
    'href': 'http://tripod.nih.gov/ginas/app/api/v1/substances(8911c794-5da3-4934-a683-16d98d93db97)/relationships'},
   '_self': 'http://tripod.nih.gov/ginas/app/api/v1/substances(8911c794-5da3-4934-a683-16d98d93db97)?view=full',
   'access': [],
   'approvalID': 'R16CO5Y76E',
   'approved': 1470433417000,
   'approvedBy': 'FD

In [28]:

df = pd.read_json(ginasURL, typ='frame', orient="index")
# df.head(5)
df

ValueError: arrays must all be same length
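The `ValueError` most likely arises because the GINAS response is a single JSON object wrapping a `content` list of deeply nested substance records, not a flat table that `pd.read_json` can load directly. A minimal sketch of one workaround is to pull a few fields of interest out of each record first. The `summarize` helper and its output column names are hypothetical; the sample payload reuses only values visible in the `r.json()` output above.

```python
def summarize(payload):
    """Extract a few scalar fields from each record in payload['content']."""
    rows = []
    for substance in payload.get("content", []):
        rows.append({
            "name": substance.get("_name"),
            "approvalID": substance.get("approvalID"),
            # The count fields live one level down, next to an href.
            "codes": substance.get("_codes", {}).get("count"),
            "names": substance.get("_names", {}).get("count"),
            "references": substance.get("_references", {}).get("count"),
            "relationships": substance.get("_relationships", {}).get("count"),
        })
    return rows


# Trimmed-down stand-in for r.json(), using values from the output above.
payload = {
    "content": [
        {
            "_name": "ASPIRIN",
            "approvalID": "R16CO5Y76E",
            "_codes": {"count": 51},
            "_names": {"count": 76},
            "_references": {"count": 73},
            "_relationships": {"count": 27},
        }
    ]
}

rows = summarize(payload)
print(rows)
```

In the notebook, `pd.DataFrame(summarize(r.json()))` should then give one row per matching substance with equal-length columns.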