This notebook demonstrates the Blue Team's solution to **workflows 1 and 2**, which takes a **disease** as input and returns relevant **genes and pathways**. Note that some of the boilerplate code (API calls, for instance) is wrapped in `biggim.py`, written by John Earls and Theo Knijnenburg. You can learn more about BigGIM in their <a href="https://github.com/NCATS-Tangerine/cq-notebooks/tree/master/BigGIM">GitHub repository</a>.

Notebook written by: Samson Fong, John Earls, Theo Knijnenburg, and Aaron Gary. 

# Workflow

<img src="images/workflow.png">

Here, we will utilize BioThings from the Orange Team to help identify a small, relevant disease gene set and tissue. From there, we will query BigGIM for similarity data between genes and finally use DDOT to construct hierarchical pathways from this data.

In the implementation below, the tissue is currently hardcoded (the `GTEx_Brain_Correlation` column of BigGIM). This will be revised in a later version of the workflow.

In [14]:
%load_ext autoreload 
%autoreload 2

import json
import requests
import pandas as pd
from biggim import doid_to_genes, call_biggim

ucsd_hostname = 'ec2-52-37-226-115.us-west-2.compute.amazonaws.com'

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [15]:
ucsd_hostname

'ec2-52-37-226-115.us-west-2.compute.amazonaws.com'

## Implementation

In [2]:
# Getting seed genes
genes = doid_to_genes(["678"])

INFO:root:Geting HP ids from DOID


http://biothings.io/explorer/api/v2/directinput2output?input_prefix=doid&output_prefix=hp&input_value=678&format=translator


INFO:root:Geting OMIM from HP ids


Returned 26 phenotypes
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hp&output_prefix=omim.disease&input_value=0002067&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hp&output_prefix=omim.disease&input_value=0002354&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hp&output_prefix=omim.disease&input_value=0000511&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hp&output_prefix=omim.disease&input_value=0000716&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hp&output_prefix=omim.disease&input_value=0000623&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hp&output_prefix=omim.disease&input_value=0000514&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hp&output_prefix=omim.disease&input_value=0002381&format=translator


INFO:root:Geting genes from OMIM


Returned 26 mendelian diseases
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=168100&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=610297&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=606324&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=183050&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=607688&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=616361&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=614251&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=609161&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=300911&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=616413&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=277730&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=608907&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=255900&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=609636&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=607625&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=617308&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=616053&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=257220&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=606693&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=168605&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=617145&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=615005&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=616882&format=translator




http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=602079&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=604348&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=615007&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=125480&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=603204&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=608907&format=translator
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=omim.disease&output_prefix=ncbigene&input_value=604928&format=translator
http://biothi



Returned 52 genes
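
The request URLs logged above suggest that `doid_to_genes` walks BioThings Explorer in three hops: DOID to HP phenotype IDs, HP to OMIM disease IDs, and OMIM to NCBI gene IDs. As a minimal sketch, the helper below only rebuilds such a URL from its parts. `explorer_url` is a hypothetical name, not a function from `biggim.py`, and the endpoint and parameter names are copied from the log rather than from any API documentation.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the logged requests above
EXPLORER_ENDPOINT = "http://biothings.io/explorer/api/v2/directinput2output"

# The three hops seen in the log, as (input_prefix, output_prefix) pairs
HOPS = [("doid", "hp"), ("hp", "omim.disease"), ("omim.disease", "ncbigene")]


def explorer_url(input_prefix, output_prefix, input_value):
    """Hypothetical helper: build one direct input-to-output query URL."""
    params = {
        "input_prefix": input_prefix,
        "output_prefix": output_prefix,
        "input_value": input_value,
        "format": "translator",
    }
    return f"{EXPLORER_ENDPOINT}?{urlencode(params)}"


# First hop for DOID 678, the disease used in this notebook
first_hop = explorer_url(*HOPS[0], "678")
print(first_hop)
```

Each hop's output IDs become the `input_value` of the next hop, which is why the log shows one request per phenotype and one per Mendelian disease.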


### Calling BigGIM

In [3]:
expanded_genes = call_biggim(
    genes, 
    ["GTEx_Brain_Correlation"], 
    100000, 
    query_id2=False, 
    return_genes=True, 
    limit_genes=250,
)

Sent: GET http://biggim.ncats.io/api/biggim/query?restriction_gt=GTEx_Brain_Correlation%2C0.5&table=BigGIM_70_v1&columns=GTEx_Brain_Correlation&ids1=27429%2C11315%2C1981%2C8622%2C10159%2C9213%2C7086%2C217%2C125%2C1743%2C450086%2C780912%2C10577%2C3988%2C4864%2C23400%2C1639%2C57582%2C2328%2C8863%2C1453%2C8864%2C5159%2C4095%2C6531%2C7494%2C50971%2C450086%2C493856%2C79734%2C6310%2C6314%2C2629%2C6311%2C4287%2C5173%2C10528%2C5828%2C10939%2C6311%2C6622%2C25793%2C6311%2C23533%2C60481%2C2903%2C27286%2C780851%2C22926%2C3822%2C216%2C7280%2C319101%2C3119%2C151507%2C5621%2C692157%2C8360%2C79644%2C79974%2C3897%2C2896%2C477%2C8246&limit=100000
{
  "status": "submitted",
  "request_id": "a08690b0-29e9-4572-ac19-8a57cf4938cd"
}
Sent: GET http://biggim.ncats.io/api/biggim/status/a08690b0-29e9-4572-ac19-8a57cf4938cd?None
{
  "status": "running",
  "message": "Query job is running.",
  "request_id": "a08690b0-29e9-4572-ac19-8a57cf4938cd"
}
Checking again
Sent: GET http://biggim.ncats.io/api/biggim/status/

In [5]:
network = call_biggim(
    expanded_genes, 
    ["GTEx_Brain_Correlation"], 
    1000000000, 
    query_id2=True, 
    return_genes=False, 
)

Sent: GET http://biggim.ncats.io/api/biggim/query?restriction_gt=GTEx_Brain_Correlation%2C0.5&table=BigGIM_70_v1&columns=GTEx_Brain_Correlation&ids1=104%2C119%2C161%2C400%2C478%2C491%2C492%2C719%2C773%2C774%2C899%2C951%2C1016%2C1114%2C1213%2C1452%2C1453%2C1609%2C1639%2C1653%2C1794%2C1951%2C2026%2C2583%2C2664%2C2686%2C2821%2C2822%2C2903%2C2975%2C3098%2C3361%2C3508%2C3631%2C3675%2C3746%2C3748%2C3897%2C3988%2C4141%2C4293%2C4542%2C4943%2C4976%2C5048%2C5110%2C5291%2C5310%2C5463%2C5526%2C5686%2C5707%2C5709%2C5862%2C5870%2C6014%2C6314%2C6326%2C6598%2C6709%2C6749%2C6812%2C6860%2C6916%2C7277%2C7280%2C7305%2C7385%2C7464%2C7786%2C7805%2C7905%2C8028%2C8120%2C8455%2C8497%2C8506%2C8509%2C8533%2C8567%2C8729%2C8803%2C8893%2C8943%2C8992%2C9070%2C9143%2C9145%2C9229%2C9677%2C9737%2C9746%2C9762%2C9764%2C9777%2C9806%2C9807%2C9810%2C9820%2C9826%2C9853%2C9862%2C9867%2C9900%2C9912%2C10159%2C10296%2C10367%2C10369%2C10423%2C10489%2C10500%2C10523%2C10528%2C10577%2C10579%2C10594%2C10681%2C10745%2C10858%2C10868%2C
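
The log above shows a submit-then-poll pattern: a `query` request returns a `request_id`, and the `status` endpoint is then checked repeatedly. The sketch below illustrates that pattern under stated assumptions. `build_query_url` and `poll_until_done` are hypothetical names, not the functions in `biggim.py`. The statuses `submitted` and `running` come from the log; because the log is truncated before the job finishes, treating any other status as final is an assumption. The status lookup is passed in as a function so the sketch runs without network access.

```python
import time
from urllib.parse import urlencode

BIGGIM_BASE = "http://biggim.ncats.io/api/biggim"


def build_query_url(gene_ids, column, threshold=0.5,
                    table="BigGIM_70_v1", limit=100000):
    """Rebuild a query URL with the parameters seen in the logged request."""
    params = {
        "restriction_gt": f"{column},{threshold}",
        "table": table,
        "columns": column,
        "ids1": ",".join(str(g) for g in gene_ids),
        "limit": limit,
    }
    return f"{BIGGIM_BASE}/query?{urlencode(params)}"


def poll_until_done(get_status, request_id, interval=1.0, max_tries=60,
                    sleep=time.sleep):
    """Call get_status(request_id) until the job leaves the pending states.

    Assumption: any status other than 'submitted' or 'running' is final.
    """
    for _ in range(max_tries):
        status = get_status(request_id)
        if status.get("status") not in ("submitted", "running"):
            return status
        sleep(interval)
    raise TimeoutError(f"request {request_id} still pending after {max_tries} checks")


# Offline demonstration with a stubbed status function.
# 'complete' is a made-up final status used only for this stub.
_responses = iter([{"status": "running"}, {"status": "running"}, {"status": "complete"}])
final = poll_until_done(lambda rid: next(_responses), "demo-id", sleep=lambda s: None)
print(final["status"])
```

In real use, `get_status` would wrap a `requests.get` call against the `status/<request_id>` route shown in the log.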

### Calling DDOT

In [16]:
# Write the network as a headerless, tab-separated file, dropping the first column
network.iloc[:, 1:].to_csv('tmp.csv', sep='\t', header=None, index=None)

ddot_host = f'http://{ucsd_hostname}:8383'
ddot_route = '/api/ontology'
ddot_query = '?alpha=0.007&beta=0.5'
url = '{}{}{}'.format(ddot_host, ddot_route, ddot_query)

print(url)

ndex_url = 'http://test.ndexbio.org/#/network/'

# Upload the file; the context manager closes it once the request is done
with open('tmp.csv', 'rb') as f:
    r = requests.post(url, files={'file': f})

# Stop on an HTTP error instead of printing an error page as if it were a UUID
r.raise_for_status()
uuid = r.text
print('{}{}'.format(ndex_url, uuid))

http://ec2-52-37-226-115.us-west-2.compute.amazonaws.com:8383/api/ontology?alpha=0.007&beta=0.5
http://dev.ndexbio.org/#/network/8c39b8c7-c7f8-11e8-98d5-0660b7976219 



The hierarchy itself is stored on NDEx and can be viewed in <a href="http://hiview.ucsd.edu">HiView</a>. To view it in HiView, you need the NDEx URL (the NDEx server the hierarchy is hosted on) and the unique identifier (UUID).
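
As a small convenience, the two pieces can be pulled out of the link printed above. This is only a sketch: `split_ndex_link` is a hypothetical helper, and the exact form HiView expects for the server field is not shown in this notebook.

```python
import uuid
from urllib.parse import urlsplit


def split_ndex_link(link):
    """Hypothetical helper: split an NDEx network link into (server, uuid)."""
    parts = urlsplit(link.strip())
    server = f"{parts.scheme}://{parts.netloc}"
    # The UUID is the last segment of the URL fragment ("#/network/<uuid>")
    network_id = parts.fragment.rstrip("/").rsplit("/", 1)[-1]
    # uuid.UUID raises ValueError if the text is not a well-formed UUID
    uuid.UUID(network_id)
    return server, network_id


# Link copied from the cell output above (including its trailing space)
server, network_id = split_ndex_link(
    "http://dev.ndexbio.org/#/network/8c39b8c7-c7f8-11e8-98d5-0660b7976219 "
)
print(server)
print(network_id)
```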

# Alternative Gene Expansion Method

Gene sets can also be expanded with a random walk over a network. There are several ways to obtain a network for the random walk. One way is to query BigGIM as follows, where `column` names the column of the BigGIM table to use.

In [None]:
nbgwas_host = f'http://{ucsd_hostname}:5000'
nbgwas_route = '/nbgwas'

url = '{}{}'.format(nbgwas_host, nbgwas_route)

payload = {'seeds': ','.join(genes), 'alpha': '0.2', 'column':'GTEx_Brain_Correlation'}

r = requests.post(url, data=payload)
if r is not None and r.json() is not None:
    print(json.dumps(r.json(), indent=4))

Alternatively, you can upload your own network instead of querying BigGIM. Make sure the seed genes and the network use the same gene identifier namespace (you can convert with `mygene`, for instance).

In [None]:
# Convert from Entrez to gene symbol
import mygene

mg = mygene.MyGeneInfo()
query = mg.querymany(','.join(genes), scopes=['entrezgene'], as_dataframe=True)
# Drop IDs that did not map to a symbol so the join does not fail on NaN
genes_symbol = ','.join(query['symbol'].dropna().values.tolist())

In [None]:
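nbgwas_host = f'http://{ucsd_hostname}:5000'
nbgwas_route = '/nbgwas'

url = '{}{}'.format(nbgwas_host, nbgwas_route)

# Seeds must share the network's namespace; the uploaded network is assumed
# to use gene symbols, so pass the converted symbols rather than Entrez IDs
payload = {'seeds': genes_symbol, 'alpha': '0.8'}

# The context manager closes the network file once the request is done
with open('data/3col_interactions260.csv', 'rb') as f:
    r = requests.post(url, files={'network': f}, data=payload)
if r is not None and r.json() is not None:
    print(json.dumps(r.json(), indent=4))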

Finally, you can also pull a network from NDEx, which hosts many publicly available networks.

In [None]:
nbgwas_host = f'http://{ucsd_hostname}:5000'
nbgwas_route = '/nbgwas'

url = '{}{}'.format(nbgwas_host, nbgwas_route)

payload = {'seeds': ','.join(genes), 'alpha': '0.2', 'ndex':'f93f402c-86d4-11e7-a10d-0ac135e8bacf'}

r = requests.post(url, data=payload)
if r is not None and r.json() is not None:
    print(json.dumps(r.json(), indent=4))

The following shows the result of the random walk as a table, with the genes ranked by their random walk score in descending order.

In [None]:
pd.DataFrame.from_dict(r.json(), orient='index').sort_values(by=0, ascending=False)
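
To give a feel for the scores being ranked, here is a minimal, self-contained sketch of a random walk with restart on a toy network. It is a generic textbook formulation, not the code behind the `/nbgwas` service. In particular, `restart_prob` is this sketch's own parameter and may not correspond to the service's `alpha`, and the node names are made up.

```python
def random_walk_with_restart(edges, seeds, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Score nodes of an undirected network by proximity to the seed nodes."""
    # Build an undirected adjacency map from the edge list
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    seeds_in_network = [s for s in seeds if s in neighbors]
    if not seeds_in_network:
        raise ValueError("none of the seeds are in the network")

    # Start (and restart) distribution: uniform over the seeds
    p0 = {n: 0.0 for n in neighbors}
    for s in seeds_in_network:
        p0[s] = 1.0 / len(seeds_in_network)

    p = dict(p0)
    for _ in range(max_iter):
        # Restart component
        nxt = {n: restart_prob * p0[n] for n in neighbors}
        # Walk component: each node spreads its mass evenly to its neighbors
        for n, nbrs in neighbors.items():
            share = (1.0 - restart_prob) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in neighbors)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network A - B - C - D with a single seed
scores = random_walk_with_restart(
    [("A", "B"), ("B", "C"), ("C", "D")], seeds=["A"], restart_prob=0.5
)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

On this toy network the scores fall off with distance from the seed, which is the property that makes the ranking useful for expanding a gene set.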