# Introduction


This notebook demonstrates how BioThings Explorer can be used to execute queries with more than one intermediate node.

The query starts from the drug "Anisindione". The two intermediate nodes will be *Gene* and *Disease*, and the final output will be *PhenotypicFeature*.


**Background**: BioThings Explorer can answer two classes of queries -- "EXPLAIN" and "PREDICT". EXPLAIN queries are described in [EXPLAIN_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/PREDICT_demo.ipynb). Here, we describe a PREDICT-style query with two intermediate nodes and how to use BioThings Explorer to execute it. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/biothings/biothings_explorer/blob/master/jupyter%20notebooks/Multi%20intermediate%20nodes%20query.ipynb).**

## Step 0: Load BioThings Explorer modules

Install the `biothings_explorer` and `biothings_schema` packages, as described in this [README](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/README.md#prerequisite). This only needs to be done once (it is included here for compatibility with [colab](https://colab.research.google.com/)).

In [None]:
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer

Next, import the relevant modules:

* **Hint**: Finds the bio-entity representation used in BioThings Explorer that corresponds to the user input (which can be any database ID, symbol, or name)
* **FindConnection**: Finds intermediate bio-entities that connect the user-specified input and output

In [1]:
from biothings_explorer.hint import Hint
from biothings_explorer.user_query_dispatcher import FindConnection
import nest_asyncio
nest_asyncio.apply()

## Step 1: Find representation of "Anisindione" in BTE

In this step, BioThings Explorer translates our query string "Anisindione" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm that using the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [2]:
ht = Hint()
anisindione = ht.query("Anisindione")['ChemicalSubstance'][0]

anisindione

{'DRUGBANK': 'DB01125',
 'CHEBI': 'CHEBI:133809',
 'name': 'anisindione',
 'primary': {'identifier': 'CHEBI',
  'cls': 'ChemicalSubstance',
  'value': 'CHEBI:133809'},
 'display': 'CHEBI(CHEBI:133809) DRUGBANK(DB01125) name(anisindione)',
 'type': 'ChemicalSubstance'}
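
Because the top hit is not guaranteed to be the intended entity, it can help to check the result against an identifier you already trust. The sketch below uses a hypothetical helper, `pick_hit`, which is not part of BioThings Explorer. It assumes only that the `Hint` result is a dictionary keyed by semantic type whose values are lists of records shaped like the one shown above. The sample data is copied from that output so the block runs offline.

```python
def pick_hit(hint_result, semantic_type, id_field, expected_id):
    """Return the first record of `semantic_type` whose `id_field` equals `expected_id`.

    Hypothetical helper, not part of biothings_explorer.
    Returns None if no record matches.
    """
    for record in hint_result.get(semantic_type, []):
        if record.get(id_field) == expected_id:
            return record
    return None


# Offline stand-in for ht.query("Anisindione"), copied from the output above.
sample_result = {
    "ChemicalSubstance": [
        {
            "DRUGBANK": "DB01125",
            "CHEBI": "CHEBI:133809",
            "name": "anisindione",
            "primary": {
                "identifier": "CHEBI",
                "cls": "ChemicalSubstance",
                "value": "CHEBI:133809",
            },
            "display": "CHEBI(CHEBI:133809) DRUGBANK(DB01125) name(anisindione)",
            "type": "ChemicalSubstance",
        }
    ]
}

hit = pick_hit(sample_result, "ChemicalSubstance", "DRUGBANK", "DB01125")
print(hit["display"] if hit else "no matching record")
```

With a live result, the same call would take the dictionary returned by `ht.query(...)` in place of `sample_result`.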

## Step 2: Find phenotypes that are associated with Anisindione through Gene and Disease as intermediate nodes

In this section, we find all paths in the knowledge graph that connect Anisindione to any entity that is a phenotypic feature.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 

In [3]:
fc = FindConnection(input_obj=anisindione, 
                    output_obj='PhenotypicFeature', 
                    intermediate_nodes=['Gene', 'Disease'])

In [4]:
fc.connect(verbose=True)


BTE will find paths that join 'anisindione' and 'PhenotypicFeature'.                   Paths will have 2 intermediate node.

Intermediate node #1 will have these type constraints: Gene

Intermediate node #2 will have these type constraints: Disease




==== Step #1: Query path planning ====

Because anisindione is of type 'ChemicalSubstance', BTE will query our meta-KG for APIs that can take 'ChemicalSubstance' as input and 'Gene' as output

BTE found 10 apis:

API 1. semmed_chemical(13 API calls)
API 2. chembio(1 API call)
API 3. hmdb(1 API call)
API 4. mychem(3 API calls)
API 5. scigraph(1 API call)
API 6. scibite(1 API call)
API 7. dgidb(1 API call)
API 8. pharos(1 API call)
API 9. ctd(1 API call)
API 10. cord_chemical(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 2.1: https://biothings.ncats.io/semmedchemical/query?fields=physically_interacts_with (POST -d q=C0051919&sco
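
The planning output above covers the first hop (ChemicalSubstance to Gene); the later hops follow the same input-type/output-type pattern. As a rough illustration of how a query with intermediate nodes breaks down into consecutive type pairs, here is a small sketch. `plan_hops` is a hypothetical function written for illustration, not BTE's planner, and it leaves out the meta-KG lookup of APIs entirely.

```python
def plan_hops(input_type, intermediate_types, output_type):
    """Split a multi-hop query into consecutive (subject_type, object_type) pairs."""
    node_types = [input_type, *intermediate_types, output_type]
    return list(zip(node_types, node_types[1:]))


# Types taken from the FindConnection call in this notebook.
hops = plan_hops("ChemicalSubstance", ["Gene", "Disease"], "PhenotypicFeature")
for number, (subject_type, object_type) in enumerate(hops, start=1):
    print(f"Hop {number}: {subject_type} -> {object_type}")
```

Two intermediate nodes give three hops, which is why the table view has `pred1`, `pred2`, and `pred3` columns.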

In [5]:
df = fc.display_table_view()

In [6]:
df.head()

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,...,node2_type,node2_name,node2_id,pred3,pred3_source,pred3_api,pred3_pubmed,output_type,output_name,output_id
0,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,disrupts,...,Disease,C0600688,UMLS:C0600688,affected_by,SEMMED,SEMMED Disease API,399568119668867,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
1,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,causes,...,Disease,C0600688,UMLS:C0600688,affected_by,SEMMED,SEMMED Disease API,399568119668867,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
2,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,C0600688,UMLS:C0600688,affected_by,SEMMED,SEMMED Disease API,399568119668867,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
3,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,treats,...,Disease,C0600688,UMLS:C0600688,affected_by,SEMMED,SEMMED Disease API,399568119668867,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
4,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,related_to,...,Disease,C0600688,UMLS:C0600688,affected_by,SEMMED,SEMMED Disease API,399568119668867,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
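
Each row of the table is easier to read when it is rendered as a single path string. The following is a minimal sketch, assuming the column names shown in the table header; `format_path` is a hypothetical helper, and the example row is copied from row 0 above so the block runs without the live data frame.

```python
def format_path(row):
    """Render one result row as a readable path string.

    `row` can be any mapping with the column names shown in the table,
    for example a plain dict or a pandas Series.
    """
    return (
        f"{row['input']} -[{row['pred1']}]-> {row['node1_name']} "
        f"-[{row['pred2']}]-> {row['node2_name']} "
        f"-[{row['pred3']}]-> {row['output_name']}"
    )


# Values copied from row 0 of the table above.
example_row = {
    "input": "ANISINDIONE",
    "pred1": "physically_interacts_with",
    "node1_name": "GC",
    "pred2": "disrupts",
    "node2_name": "C0600688",
    "pred3": "affected_by",
    "output_name": "REDUCED GLUTATHIONE",
}
print(format_path(example_row))
```

On the real data frame, `df.apply(format_path, axis=1)` should give one string per row, provided the column names match those in the table.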


The `df` object contains the full output from BioThings Explorer. Each row shows one path that joins the input node (ANISINDIONE) to an intermediate node (a gene or protein) to another intermediate node (a disease) to an ending node (a phenotypic feature). The data frame includes a set of columns with additional details on each node and edge (including human-readable labels, identifiers, and sources). Let's remove all rows where `output_name` (the phenotype label) is None, and focus on paths with the specific predicates **physically_interacts_with** (first edge) and **prevents** (second edge).

In [15]:
dfFilt = df.loc[df['output_name'].notnull()].query('pred1 == "physically_interacts_with" and pred2 == "prevents"')

In [16]:
dfFilt

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,...,node2_type,node2_name,node2_id,pred3,pred3_source,pred3_api,pred3_pubmed,output_type,output_name,output_id
2,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,C0600688,UMLS:C0600688,affected_by,SEMMED,SEMMED Disease API,399568119668867,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
7,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,C0600688,UMLS:C0600688,disrupted_by,SEMMED,SEMMED Disease API,17628682,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
12,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,C0600688,UMLS:C0600688,caused_by,SEMMED,SEMMED Disease API,1449514,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
17,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,C0600688,UMLS:C0600688,treated_by,SEMMED,SEMMED Disease API,10865999,PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
22,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,C0600688,UMLS:C0600688,prevented_by,SEMMED,SEMMED Disease API,"1610167,6975248,6381340,3697994,12495814,96435...",PhenotypicFeature,REDUCED GLUTATHIONE,name:REDUCED GLUTATHIONE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47532,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,BREAST CANCER,MONDO:MONDO:0007254,related_to,,BioLink API,,PhenotypicFeature,HP:0002861,HP:HP:0002861
47631,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,BREAST CANCER,MONDO:MONDO:0007254,related_to,,BioLink API,22538716,PhenotypicFeature,HP:0025318,HP:HP:0025318
51259,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,BREAST CANCER,MONDO:MONDO:0007254,related_to,,BioLink API,,PhenotypicFeature,HP:0030406,HP:HP:0030406
54475,ANISINDIONE,ChemicalSubstance,physically_interacts_with,,DGIdb API,,Gene,GC,NCBIGene:2638,prevents,...,Disease,SHOCK,MONDO:C0036974,coexists_with,SEMMED,SEMMED Disease API,26825935,PhenotypicFeature,PREGNANCY TEST NEGATIVE,name:PREGNANCY TEST NEGATIVE
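
The filtered table still contains many rows that end at the same phenotype, so a natural next step is to count paths per phenotype label. The sketch below does this with the standard library on a list of row dictionaries. `count_paths_by_phenotype` is a hypothetical helper, and the sample records are illustrative: they reuse labels from the table above but are not the real counts.

```python
from collections import Counter


def count_paths_by_phenotype(records):
    """Count how many paths end at each phenotype label (`output_name`)."""
    return Counter(record["output_name"] for record in records)


# Illustrative records only; labels are taken from the table above.
sample_records = [
    {"output_name": "REDUCED GLUTATHIONE"},
    {"output_name": "REDUCED GLUTATHIONE"},
    {"output_name": "PREGNANCY TEST NEGATIVE"},
]
counts = count_paths_by_phenotype(sample_records)
for phenotype, n_paths in counts.most_common():
    print(f"{phenotype}: {n_paths}")
```

To run this on the actual results, pass `dfFilt.to_dict("records")` as `records`; `dfFilt["output_name"].value_counts()` is the equivalent pandas call.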
