# imProving Agent - Relay Examples 2020-09-21

imProving Agent currently supports n-hop linear queries to SPOKE that are optionally processed through ranking algorithms.

NOTE: This notebook was generated with Python 3.7.5 on 2020-09-23. The only requirement (beyond Jupyter) is `requests`, which can be installed with `pip install requests`.
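Because only linear n-hop queries are supported, every request payload in this notebook has the same shape: an ordered list of nodes joined by consecutive edges. The sketch below builds such a payload from a list of `(curie, type)` pairs. The helper name `build_linear_query` is hypothetical and not part of imProving Agent; the field names (`node_id`, `curie`, `type`, `edge_id`, `source_id`, `target_id`, `query_options`) are copied from the examples later in this notebook.

```python
from typing import Dict, List, Optional, Tuple


def build_linear_query(
    nodes: List[Tuple[str, str]],
    query_options: Optional[Dict[str, str]] = None,
) -> dict:
    """Build an n-hop linear query payload (hypothetical helper).

    `nodes` is an ordered list of (curie, biolink_type) pairs. Use an
    empty string for an unspecified curie or type.
    """
    if len(nodes) < 2:
        raise ValueError("a linear query needs at least two nodes (one hop)")

    # Nodes are numbered n0, n1, ... in the order given.
    qnodes = [
        {"node_id": f"n{i}", "curie": curie, "type": node_type}
        for i, (curie, node_type) in enumerate(nodes)
    ]
    # Edge e<i> connects node n<i> to node n<i+1>, giving a linear chain.
    qedges = [
        {"edge_id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
        for i in range(len(nodes) - 1)
    ]
    return {
        "query_message": {
            "query_graph": {"nodes": qnodes, "edges": qedges},
            "query_options": dict(query_options or {}),
        }
    }


# One-hop gene-gene query, equivalent in shape to the BRCA1 example below.
payload = build_linear_query(
    [("NCBIGene:672", "biolink:Gene"), ("", "biolink:Gene")]
)
```

The resulting `payload` can be passed as the `json` argument of `requests.post`, in the same way as the hand-written payloads in the example cells.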

## Nodes and Edges

### Currently supported node types and their identifiers

| biolink node type | SPOKE node type | supported CURIEs (examples) |
| --- | --- | --- |
|biolink:BiologicalProcess| BiologicalProcess | GO:0000348 |
|biolink:Cell| CellType | CL:1001033 |
|biolink:CellularComponent| CellularComponent | GO:0005833 |
|biolink:ChemicalSubstance| Compound | DB00122 or CHEMBL1076872 |
|biolink:Disease| Disease | DOID:0111749 |
|biolink:Gene| Gene | NCBIGene:672 |
|biolink:GrossAnatomicalStructure| Anatomy | UBERON:2001647 |
|biolink:MolecularActivity | MolecularFunction | GO:0061731 |
|biolink:Pathway| Pathway | WP4804_r109130 |
|biolink:PhenotypicFeature| Symptom | D000078064 |
|biolink:Protein| Protein | UNIPROT:Q9BYF1 |

Note: additional node types exist in SPOKE, but do not map well onto the existing biolink model. If nodes of these types are returned in your query, they will be identified as biolink:NamedThing. Further, CURIE normalization happens internally when querying KPs, but is not yet applied during query resolution.
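For client-side bookkeeping, the table above can be expressed as a lookup with a `biolink:NamedThing` fallback for unmapped SPOKE types. This is only an illustrative sketch: the dictionary and the function `spoke_to_biolink` are hypothetical names, and the mapping is transcribed from the table rather than taken from imProving Agent's code.

```python
# Mapping transcribed from the table above (biolink type -> SPOKE type).
BIOLINK_TO_SPOKE = {
    "biolink:BiologicalProcess": "BiologicalProcess",
    "biolink:Cell": "CellType",
    "biolink:CellularComponent": "CellularComponent",
    "biolink:ChemicalSubstance": "Compound",
    "biolink:Disease": "Disease",
    "biolink:Gene": "Gene",
    "biolink:GrossAnatomicalStructure": "Anatomy",
    "biolink:MolecularActivity": "MolecularFunction",
    "biolink:Pathway": "Pathway",
    "biolink:PhenotypicFeature": "Symptom",
    "biolink:Protein": "Protein",
}

# Reverse lookup (SPOKE type -> biolink type); the mapping is one-to-one.
SPOKE_TO_BIOLINK = {spoke: biolink for biolink, spoke in BIOLINK_TO_SPOKE.items()}


def spoke_to_biolink(spoke_type: str) -> str:
    """Return the biolink type for a SPOKE node type (hypothetical helper).

    SPOKE types without a mapping fall back to biolink:NamedThing,
    mirroring the behaviour described in the note above.
    """
    return SPOKE_TO_BIOLINK.get(spoke_type, "biolink:NamedThing")
```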

### Edges

Edge types cannot be specified at this point. 

Edges have not yet been normalized to biolink. imProving Agent returns SPOKE edges named `<VERB>_<Subject abbreviation><verb abbreviation><Object abbreviation>`; for example, `DOWNREGULATES_CdD` reads in English as "compound downregulates disease".
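A returned edge label can be split into its parts with a regular expression. The sketch below assumes that subject and object abbreviations are written in uppercase and the verb abbreviation in lowercase, as in `DOWNREGULATES_CdD`; that casing rule is inferred from the single example here and may not hold for every SPOKE edge. The names `SpokeEdge` and `parse_spoke_edge` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SpokeEdge(NamedTuple):
    verb: str
    subject_abbr: str
    verb_abbr: str
    object_abbr: str


# ASSUMPTION: uppercase subject/object abbreviations, lowercase verb abbreviation.
_EDGE_PATTERN = re.compile(r"^(.+)_([A-Z]+)([a-z]+)([A-Z]+)$")


def parse_spoke_edge(label: str) -> Optional[SpokeEdge]:
    """Split '<VERB>_<Subj><verb><Obj>' into parts, or return None if it does not fit."""
    match = _EDGE_PATTERN.match(label)
    if match is None:
        return None
    return SpokeEdge(*match.groups())


edge = parse_spoke_edge("DOWNREGULATES_CdD")
```

Labels that do not follow the assumed pattern return `None` rather than raising, so unexpected edge names can be logged and inspected.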

See http://www.cgl.ucsf.edu/home/meng/spoke/docs/index.html for further details on SPOKE's data.

## Ranking Algorithm Options
### PSEV: Propagated SPOKE Entry Vectors
PSEVs are derived from the UCSF EHR (https://www.nature.com/articles/s41467-019-11069-0). They weight all graph nodes based on their presence in random walks through the graph for any given concept in SPOKE, in this case diseases.  
  
Currently available PSEV Contexts:
- DOID:9351     Diabetes mellitus
- DOID:9970     Obesity
- DOID:10763    Hypertension
- DOID:14330    Parkinson's Disease
- DOID:3393     Coronary Artery Disease
- DOID:2377     Multiple sclerosis
- DOID:7148     Rheumatoid arthritis
- DOID:3083     Chronic Obstructive Pulmonary Disease (COPD)
- DOID:0060224  Atrial Fibrillation
- DOID:2800     Idiopathic Pulmonary Disease
- DOID:9617     Albuminuria
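
Since only the contexts listed above are available, it can help to validate a DOID before sending a request. The following is a minimal sketch: `PSEV_CONTEXTS` and `psev_query_options` are hypothetical names, the dictionary is transcribed from the list above, and the `psev-context` key is the one used in the example payloads later in this notebook.

```python
# Available PSEV contexts, transcribed from the list above.
PSEV_CONTEXTS = {
    "DOID:9351": "Diabetes mellitus",
    "DOID:9970": "Obesity",
    "DOID:10763": "Hypertension",
    "DOID:14330": "Parkinson's Disease",
    "DOID:3393": "Coronary Artery Disease",
    "DOID:2377": "Multiple sclerosis",
    "DOID:7148": "Rheumatoid arthritis",
    "DOID:3083": "Chronic Obstructive Pulmonary Disease (COPD)",
    "DOID:0060224": "Atrial Fibrillation",
    "DOID:2800": "Idiopathic Pulmonary Disease",
    "DOID:9617": "Albuminuria",
}


def psev_query_options(doid: str) -> dict:
    """Return query_options selecting a PSEV context (hypothetical helper).

    Raises ValueError for a DOID that is not in the list above.
    """
    if doid not in PSEV_CONTEXTS:
        raise ValueError(f"no PSEV context available for {doid!r}")
    return {"psev-context": doid}


ms_options = psev_query_options("DOID:2377")  # rank in context of multiple sclerosis
```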

## Other Options
Query KPs: whether to query Translator Knowledge Providers to get ranking information. This can be slow (up to 5 minutes) because it requires node normalization and sometimes hundreds of transactions over the open internet.

n_results: number of results, which is currently limited to 200. Note that longer queries (2 or 3+ hops) tend to emphasize the terminal node. In other words, in a 3-hop query with only 200 results, the first 5 elements of the response (node 1 - edge 1 - node 2 - edge 2 - node 3) may be exactly the same for all 200 results, with only the terminal edge 3 and node 4 changing.
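
The options above can be gathered into a single `query_options` dictionary. The sketch below is hedged: `build_query_options` is a hypothetical helper, and the exact option key `n_results` is an assumption based on the name used in this section. The `psev-context` and `query_kps` keys, and the string value `"true"`, follow the example payloads later in this notebook.

```python
from typing import Optional

MAX_N_RESULTS = 200  # current limit stated above


def build_query_options(
    psev_context: Optional[str] = None,
    query_kps: bool = False,
    n_results: Optional[int] = None,
) -> dict:
    """Assemble a query_options dict (hypothetical helper)."""
    options = {}
    if psev_context:
        options["psev-context"] = psev_context
    if query_kps:
        # The example payloads pass this flag as the string "true".
        options["query_kps"] = "true"
    if n_results is not None:
        if n_results < 1:
            raise ValueError("n_results must be a positive integer")
        # ASSUMPTION: the option key is spelled "n_results".
        options["n_results"] = min(n_results, MAX_N_RESULTS)
    return options


options = build_query_options("DOID:3393", query_kps=True, n_results=500)
```

When `query_kps` is enabled, the request may run for several minutes, so it is worth passing a generous `timeout` to `requests.post` rather than relying on defaults elsewhere in your network stack.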

In [None]:
# Examples

In [None]:
import requests

IA_BASE_URL = 'https://evidara.healthdatascience.cloud'
IA_TRAPI_QUERY_URL = '/api/v1/query'

In [None]:
# simple one-hop query for gene-gene
# "Which genes are related to BRCA1?"
#
# Note: no ranking

gene_gene_request_payload = {
    "query_message": {
        "query_graph":{ 
            "nodes":[
                {
                    "node_id": "n0",
                    "curie": "NCBIGene:672",
                    "type": "biolink:Gene"
                },
                {
                    "node_id": "n1",
                    "curie": "",
                    "type":"biolink:Gene"
                }
            ],
            "edges":[
                {
                    "edge_id": "e0",
                    "source_id":"n0",
                    "target_id":"n1"
                }
            ]
        },
        "query_options":{}
    }
}

In [None]:
gene_gene_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=gene_gene_request_payload)
gene_gene_response.raise_for_status()

gene_gene_results = gene_gene_response.json()['results']

In [None]:
# one-hop query for compound-protein
# "Which proteins does modafinil bind? Rank them in the context of multiple sclerosis"
compound_protein_request_payload = {
    "query_message":{
        "query_graph":{
            "nodes":[
                {
                    "node_id": "n0",
                    "curie": "DB00745",
                    "type":"biolink:ChemicalSubstance"
                },
                {
                    "node_id": "n1",
                    "curie": "",
                    "type":"biolink:Protein"
                }
            ],
            "edges":[
                {
                    "edge_id": "e0",
                    "source_id":"n0",
                    "target_id":"n1"
                }
            ]
        },
        "query_options":{
            "psev-context":"DOID:2377"
        }
    }
}

In [None]:
compound_protein_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=compound_protein_request_payload)
compound_protein_response.raise_for_status()

compound_protein_results = compound_protein_response.json()['results']

In [None]:
# three-hop demonstration of an unspecified node
# "Show me two-hop paths between Parkinson's drugs and proteins"
three_hop_request_payload = {
    "query_message":{
        "query_graph":{
            "nodes":[
                {"node_id": "n0", "curie": "DOID:14330", "type": "biolink:Disease"},
                {"node_id": "n1", "curie": "","type": "biolink:ChemicalSubstance"},
                {"node_id": "n2", "curie": "","type": ""}, # biolink:NamedThing also works for 'any'
                {"node_id": "n3", "curie": "","type": "biolink:Protein"}
            ],
            "edges":[
                {"edge_id":"e0", "source_id": "n0", "target_id": "n1"},
                {"edge_id":"e1", "source_id": "n1", "target_id": "n2"},
                {"edge_id":"e2", "source_id": "n2", "target_id": "n3"}
            ]
        },
        "query_options": {"psev-context": "DOID:14330"}
    }
}

In [None]:
three_hop_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=three_hop_request_payload)
three_hop_response.raise_for_status()

three_hop_results = three_hop_response.json()['results']

In [None]:
# Integration Use Cases

In [None]:
# "Which proteins or genes might be related to a symptom of a disease given the drugs that treat it?"
# Symptom – Disease – Compound – Protein 
# specific question: 
#    "Which proteins might be related to symptoms of 
#     coronary artery disease given the drugs that treat it?
#     Query KPs for extra annotations and rank the results in 
#     context of coronary artery disease"

use_case_1_request_payload = {
    "query_message": {
        "query_graph": {
            "nodes": [
                {"node_id": "n0", "curie": "", "type": "biolink:PhenotypicFeature"},
                {"node_id": "n1", "curie": "DOID:3393", "type": "biolink:Disease"},
                {"node_id": "n2", "curie": "", "type": "biolink:ChemicalSubstance"},
                {"node_id": "n3", "curie": "", "type": "biolink:Protein"}
            ],
            "edges": [
                {"edge_id": "e0", "source_id": "n0", "target_id": "n1"},
                {"edge_id": "e1", "source_id": "n1", "target_id": "n2"},
                {"edge_id": "e2", "source_id": "n2", "target_id": "n3"}
            ]
        },
        "query_options": { "psev-context": "DOID:3393", "query_kps": "true" }
    }
}

In [None]:
# warning: several minutes
use_case_1 = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=use_case_1_request_payload)
use_case_1.raise_for_status()

use_case_1_results = use_case_1.json()['results']

In [None]:
use_case_1_results

In [None]:
# "For a patient with disease X, what are some factors 
# (such as genetic features, comorbidities, etc) that 
# could cause sensitivity or resistance to drug Y?"
#
# Compound - Disease - Disease - Gene
# 
# Specific question:
#     Which genes related to comorbidities of rheumatoid arthritis
#     might cause sensitivity to drugs that treat COPD? Rank results
#     in context of rheumatoid arthritis
#
#     NOTE: this specific example returns a 'child of disease
#     superclass' edge and demonstrates the need to specify edge types

use_case_2_request_payload = {
    "query_message":{
        "query_graph":{
            "nodes":[
                {"node_id": "n0","curie": "", "type": "biolink:ChemicalSubstance"},
                {"node_id": "n1","curie": "DOID:7148","type": "biolink:Disease"},
                {"node_id": "n2","curie": "", "type": "biolink:Disease"},
                {"node_id": "n3","curie": "", "type": "biolink:Gene"}
            ],
            "edges":[
                {"edge_id": "e0", "source_id": "n0", "target_id": "n1"},
                {"edge_id": "e1", "source_id": "n1", "target_id": "n2"},
                {"edge_id": "e2", "source_id": "n2", "target_id": "n3"}
            ]
        },
        "query_options":{
            "psev-context":"DOID:7148", "query_kps":"true"
        }
    }
}

In [None]:
# warning: several minutes
use_case_2 = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=use_case_2_request_payload)
use_case_2.raise_for_status()

use_case_2_results = use_case_2.json()['results']

In [None]:
use_case_2_results