# Query:
For Drug X that treats Disease Y, find other diseases Drug X might be re-purposed for based on phenotype similarity with Disease Y.

# Proposed Sub-Queries/Tasks:
1. **Traversal**: Retrieve diseases that are indications for Drug X

    Drug -[drug_used_for_treatement]-> SNOMED ID (DrugCentral, MyChem.info) -> Disease Ontology ID (DrugCentral)
    
2. **Traversal**: Retrieve all phenotypes for each disease in this set

    Disease Ontology ID -[has_phenotype]-> Human Phenotype Ontology ID (BioLink)
     
3. **Computation**: Find the diseases most phenotypically similar to the primary indication, based on its phenotype profile

    Potentially through [HPOSim](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321842/), using its R package
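Sub-Query 3 is not implemented in this notebook. As a rough illustration of what the computation step could look like, the sketch below ranks candidate diseases by the Jaccard overlap of their HPO term sets. This is a deliberately simple stand-in, not HPOSim: it ignores the ontology structure entirely. The function names, the `DISEASE:A`/`DISEASE:B` identifiers, and the toy profiles are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections of HPO IDs (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query_profile, candidate_profiles):
    """Return (disease_id, score) pairs sorted from most to least similar."""
    scores = {
        disease: jaccard(query_profile, profile)
        for disease, profile in candidate_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy profiles for illustration only; real profiles would come from Sub-Query 2.
query = ['HP:0001310', 'HP:0003198', 'HP:0001739']
candidates = {
    'DISEASE:A': ['HP:0001310', 'HP:0003198'],  # shares two terms with the query
    'DISEASE:B': ['HP:0000739'],                # shares none
}
ranking = rank_by_similarity(query, candidates)
```

Set overlap treats closely related HPO terms as entirely different, so an ontology-aware similarity measure would likely give more meaningful rankings on real profiles.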

## Import BioThings Explorer

In [1]:
# Import BioThings Explorer (currently a local module; it will become an independent Python package)
from visual import pathViewer
# pathViewer is a Python class for graphically displaying API connection maps and exploring bio-entity relationships
k = pathViewer()
# Show how APIs, endpoints, and bio-entities can be connected together
# Set display_graph=True to display the API road map
k.show_api_road_map(display_graph=False)

## Sub-Query 1: Retrieve diseases that are indications for Drug X
Here we use 'riluzole' as an example.

### Find Path from Drug Name to SNOMED ID

In [3]:
# Find the API endpoints that connect the start position (Drug Name) to the end position (SNOMED CT ID)
k.find_path(start='Drug Name', end='SNOMED CT', display_graph=False)

Path 0: [{'input': 'Drug Name', 'endpoint': 'http://mychem.info/v1/query', 'output': 'SNOMED CT'}]


[[{'endpoint': 'http://mychem.info/v1/query',
   'input': 'Drug Name',
   'output': 'SNOMED CT'}]]
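How `find_path` works internally is not shown in this notebook. One plausible way to think about it is as a graph search in which bio-entity types are nodes and API endpoints are edges. The sketch below is an illustration under that assumption, not pathViewer's actual implementation; `EDGES` and `find_paths` are hypothetical names, and the edge list contains only the two endpoints that appear in this notebook.

```python
from collections import deque

# Hypothetical edge list: (input type, endpoint, output type).
EDGES = [
    ('Drug Name', 'http://mychem.info/v1/query', 'SNOMED CT'),
    ('Human Disease Ontology',
     'https://api.monarchinitiative.org/api/bioentity/disease/disease_id/phenotypes/',
     'Human Phenotype Ontology'),
]


def find_paths(start, end, edges=EDGES):
    """Breadth-first search for all cycle-free chains of endpoints from start to end."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == end and path:
            paths.append(path)
            continue
        # Types already on this path; skipping them prevents cycles.
        visited = {start} | {step['output'] for step in path}
        for src, endpoint, dst in edges:
            if src == node and dst not in visited:
                step = {'input': src, 'endpoint': endpoint, 'output': dst}
                queue.append((dst, path + [step]))
    return paths


paths = find_paths('Drug Name', 'SNOMED CT')
```

With this edge list, `paths` holds a single one-step path with the same `input`, `endpoint`, and `output` keys as the output above. No path exists from 'Drug Name' to 'Human Phenotype Ontology' in this toy graph, which is consistent with the notebook bridging SNOMED CT to DOID through a separate cross-reference file.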

### Feed Drug Name to the selected path

In [4]:
# Feed the drug name 'riluzole' to the selected path and get SNOMED CT IDs as the output
k.find_output(path=k.paths[0], value='riluzole', display_graph=False)

{'q': 'aeolus.drug_name:riluzole OR chebi.chebi_name:riluzole OR '
      'chembl.pref_name:riluzole OR drugbank.name:riluzole'}
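The dictionary printed above looks like the query parameters sent to the MyChem.info endpoint: the drug name is searched across four name fields joined with `OR`. The sketch below shows one way such a parameter dictionary and the corresponding URL could be assembled. `FIELDS` and `build_query` are hypothetical names, no request is sent, and how pathViewer actually builds or issues the query is an assumption.

```python
from urllib.parse import urlencode

# The four name fields that appear in the printed query above.
FIELDS = ['aeolus.drug_name', 'chebi.chebi_name', 'chembl.pref_name', 'drugbank.name']


def build_query(drug_name, fields=FIELDS):
    """Build a 'q' parameter that matches the drug name in any of the given fields."""
    return {'q': ' OR '.join(f'{field}:{drug_name}' for field in fields)}


params = build_query('riluzole')
# urlencode percent-encodes ':' and turns spaces into '+'.
url = 'http://mychem.info/v1/query?' + urlencode(params)
```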


In [5]:
# Summarize the result
k.result_summary()

Your exploration starts from Drug Name: riluzole. 
 It goes through 1 API Endpoints. 
 The final output comes from API Endpoint http://mychem.info/v1/query. 
 You can access the final output by calling the 'final_results' object in pathViewer Class.



In [7]:
# Print the results: the SNOMED CT IDs of the indications for the drug 'riluzole'
k.final_results

{'riluzole': ['235876009',
  '426000000',
  '166603001',
  '86044005',
  '303011007',
  '18165001',
  '64667001',
  '14783006',
  '7200002']}

### Convert SNOMED ID to DOID using the DrugCentral doid_xref file
The file can be found in the [CQ Notebook Folder](https://raw.githubusercontent.com/NCATS-Tangerine/cq-notebooks/master/Orange_QB2_Other_CQs/Drug_Repurpose_By_Pheno/doid_xref.csv).

In [8]:
# Read the DOID-to-SNOMED cross-reference file into a data frame
import pandas as pd
file_url = 'https://raw.githubusercontent.com/NCATS-Tangerine/cq-notebooks/master/Orange_QB2_Other_CQs/Drug_Repurpose_By_Pheno/doid_xref.csv'
df = pd.read_csv(file_url, names=['structure_id', 'doid', 'xref', 'xref_id'])
# Convert the SNOMED CT IDs found above to DOIDs
doid_list = []
for snomed_id in k.final_results['riluzole']:
    # Select the DOIDs of every row whose cross-reference ID matches this SNOMED CT ID
    matches = df.loc[df['xref_id'] == snomed_id, 'doid']
    doid_list.extend(matches)
print(doid_list)

['DOID:332', 'DOID:1227', 'DOID:3082', 'DOID:2741']
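The loop above filters the data frame once per SNOMED CT ID. The same lookup can be done in one pass with `Series.isin`, as in the sketch below. It runs on a small in-memory table rather than the real doid_xref.csv; the column names follow the cell above, but every row and ID in it is invented for illustration.

```python
import pandas as pd

# Toy stand-in for doid_xref.csv; these rows are invented.
xref = pd.DataFrame({
    'doid': ['DOID:0000001', 'DOID:0000002', 'DOID:0000003'],
    'xref_id': ['111', '222', '333'],
})
snomed_ids = ['222', '999', '111']  # '999' has no match in the table

# Compare as strings so numeric-looking IDs still match, then keep each DOID once.
mask = xref['xref_id'].astype(str).isin(snomed_ids)
doids = xref.loc[mask, 'doid'].drop_duplicates().tolist()
```

One difference from the loop: the result follows the row order of the table rather than the order of the input IDs, and duplicates are dropped.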


## Intermediate Results:
Diseases that are indications for the drug **riluzole** include: **'DOID:332', 'DOID:1227', 'DOID:3082', 'DOID:2741'**.

The results come from **'http://mychem.info/v1/query'** endpoint. 

## Sub-Query 2: Retrieve all phenotypes for each disease in this set

### Find Path from DOID to HPO ID

In [9]:
# Find the API endpoints that connect the start position (DOID) to the end position (HPO ID)
k.find_path(start='Human Disease Ontology', end='Human Phenotype Ontology', display_graph=False)

Path 0: [{'input': 'Human Disease Ontology', 'endpoint': 'https://api.monarchinitiative.org/api/bioentity/disease/disease_id/phenotypes/', 'output': 'Human Phenotype Ontology'}]


[[{'endpoint': 'https://api.monarchinitiative.org/api/bioentity/disease/disease_id/phenotypes/',
   'input': 'Human Disease Ontology',
   'output': 'Human Phenotype Ontology'}]]
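The endpoint string above contains a `disease_id` path segment, which reads like a placeholder for the actual disease identifier. Assuming that is how it is meant to be used, the sketch below builds one URL per DOID. `TEMPLATE` and `phenotype_url` are hypothetical names, no request is sent, and the substitution rule is an assumption rather than documented pathViewer behavior.

```python
from urllib.parse import quote

TEMPLATE = 'https://api.monarchinitiative.org/api/bioentity/disease/disease_id/phenotypes/'


def phenotype_url(doid, template=TEMPLATE):
    """Substitute a disease ID for the 'disease_id' path segment of the template."""
    # Keep ':' unescaped so the identifier stays readable in the path.
    return template.replace('/disease_id/', '/' + quote(doid, safe=':') + '/')


urls = [phenotype_url(doid) for doid in ['DOID:332', 'DOID:1227']]
```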

### Feed the DOID list found above to the selected path

In [11]:
# Feed the DOID list ['DOID:332', 'DOID:1227', 'DOID:3082', 'DOID:2741'] to the selected path and get HPO IDs as the output
k.find_output(path=k.paths[0], value=doid_list, display_graph=False)

In [12]:
# Summarize the result
k.result_summary()

Your exploration starts from Human Disease Ontology: ['DOID:332', 'DOID:1227', 'DOID:3082', 'DOID:2741']. 
 It goes through 1 API Endpoints. 
 The final output comes from API Endpoint https://api.monarchinitiative.org/api/bioentity/disease/disease_id/phenotypes/. 
 You can access the final output by calling the 'final_results' object in pathViewer Class.



In [15]:
# Print the results: the HPO IDs of the phenotypes for the diseases in the DOID list
# Only the first 30 phenotypes for DOID:332 are printed here
HPO_ID_332 = ['HP:' + _result for _result in k.final_results['DOID:332']]
HPO_ID_332[0:30]

['HP:0009067',
 'HP:0001310',
 'HP:0003198',
 'HP:0001739',
 'HP:0001308',
 'HP:0002015',
 'HP:0003881',
 'HP:0002459',
 'HP:0000739',
 'HP:0001348',
 'HP:0002511',
 'HP:0002460',
 'HP:0003473',
 'HP:0003447',
 'HP:0002380',
 'HP:0000183',
 'HP:0002446',
 'HP:0002366',
 'HP:0001284',
 'HP:0003547',
 'HP:0002145',
 'HP:0000217',
 'HP:0000605',
 'HP:0002483',
 'HP:0002495',
 'HP:0000733',
 'HP:0002186',
 'HP:0003202',
 'HP:0001278',
 'HP:0100256']
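The cell above adds the 'HP:' prefix for a single disease. A small helper can do the same for every disease at once, as sketched below. `with_hp_prefix` is a hypothetical name, and the `final_results` dictionary is a toy stand-in for `k.final_results`: the two IDs under DOID:332 are taken from the output above, while the empty list for DOID:1227 is only a placeholder.

```python
def with_hp_prefix(results):
    """Map each disease ID to its phenotype IDs, adding 'HP:' where it is missing."""
    return {
        disease: [p if p.startswith('HP:') else 'HP:' + p for p in phenotypes]
        for disease, phenotypes in results.items()
    }


# Toy stand-in for k.final_results (see the note above).
final_results = {'DOID:332': ['0009067', '0001310'], 'DOID:1227': []}
hpo_by_disease = with_hp_prefix(final_results)
```

Checking for an existing prefix makes the helper safe to apply twice.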

## Final Results
The HPO IDs for each DOID are available in k.final_results, keyed by DOID (for example, k.final_results['DOID:332']).