# imProving Agent - Examples 2021-01-21

imProving Agent currently supports n-hop linear queries to SPOKE that are optionally processed through ranking algorithms.

NOTE: This notebook was generated with Python 3.7.5 on 2021-01-21. The only requirement (beyond Jupyter) is `requests`, which can be installed with:  
`pip install requests`

## Nodes and Edges

imProving Agent will attempt to handle all incoming CURIEs by translating them in real time via node-normalization to something appropriate for SPOKE. That said, we do not currently "traverse" the biolink ontology, so input node categories and edge predicates must match exactly.
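To see what node normalization does to an input CURIE before sending a query, you can call a normalization service yourself. The sketch below assumes the public SRI Node Normalizer at the URL shown, as it existed in early 2021. Both the URL and the response shape (`{curie: {"id": {"identifier": ...}, ...}}`, or `null` for unknown CURIEs) are assumptions and may have changed. `normalize` and `preferred_ids` are hypothetical helpers, not part of imProving Agent.

```python
import requests

# Assumption: public SRI Node Normalizer endpoint as of early 2021.
NODE_NORM_URL = 'https://nodenormalization-sri.renci.org/get_normalized_nodes'


def preferred_ids(normalizer_response):
    """Map each input CURIE to the normalizer's preferred identifier.

    CURIEs the normalizer does not recognize come back as None.
    """
    preferred = {}
    for curie, entry in normalizer_response.items():
        preferred[curie] = entry['id']['identifier'] if entry else None
    return preferred


def normalize(curies):
    """Look up a list of CURIEs (sent as repeated `curie` query parameters)."""
    resp = requests.get(NODE_NORM_URL, params={'curie': curies}, timeout=30)
    resp.raise_for_status()
    return preferred_ids(resp.json())
```

For example, `normalize(['NCBIGENE:672'])` returns a dict with the identifier the normalizer prefers for BRCA1.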

### Currently supported nodes and their identifiers
| biolink node type | SPOKE node type | supported CURIEs (examples) |
| --- | --- | --- |
|biolink:BiologicalProcess| BiologicalProcess | GO:0000348 |
|biolink:Cell| CellType | CL:1001033 |
|biolink:CellularComponent| CellularComponent | GO:0005833 |
|biolink:ChemicalSubstance| Compound | DB00122 or CHEMBL1076872 |
|biolink:Disease| Disease | DOID:0111749 |
|biolink:Gene| Gene | NCBIGENE:672 |
|biolink:GrossAnatomicalStructure| Anatomy | UBERON:2001647 |
|biolink:MolecularActivity | MolecularFunction | GO:0061731 |
|biolink:Pathway| Pathway | WP4804_r109130 |
|biolink:PhenotypicFeature| Symptom | D000078064 |
|biolink:Protein| Protein | UNIPROT:Q9BYF1 |

Note: additional node types exist in SPOKE but do not map well onto the existing biolink model.
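Because the biolink ontology is not traversed, a query with a category that is not in the table above (for example a parent class such as `biolink:NamedThing`) will not match anything. A minimal client-side pre-flight check, assuming the table above is the complete list, could look like this. `SUPPORTED_CATEGORIES` and `unsupported_categories` are hypothetical names; the server remains the authority on what it accepts.

```python
# Category-to-SPOKE mapping copied from the table above. Categories must
# match these keys exactly; subclasses/superclasses are not resolved.
SUPPORTED_CATEGORIES = {
    'biolink:BiologicalProcess': 'BiologicalProcess',
    'biolink:Cell': 'CellType',
    'biolink:CellularComponent': 'CellularComponent',
    'biolink:ChemicalSubstance': 'Compound',
    'biolink:Disease': 'Disease',
    'biolink:Gene': 'Gene',
    'biolink:GrossAnatomicalStructure': 'Anatomy',
    'biolink:MolecularActivity': 'MolecularFunction',
    'biolink:Pathway': 'Pathway',
    'biolink:PhenotypicFeature': 'Symptom',
    'biolink:Protein': 'Protein',
}


def unsupported_categories(query_graph):
    """Return {node_id: [category, ...]} for categories not in the table.

    Nodes without a category (unspecified nodes) are allowed and skipped.
    A category may be given as a single string or a list of strings.
    """
    bad = {}
    for node_id, node in query_graph['nodes'].items():
        categories = node.get('category')
        if categories is None:
            continue
        if isinstance(categories, str):
            categories = [categories]
        missing = [c for c in categories if c not in SUPPORTED_CATEGORIES]
        if missing:
            bad[node_id] = missing
    return bad
```

An empty dict means every specified category has an exact entry in the table.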

### Edges

Most SPOKE edges have been mapped to biolink equivalents. Use the TRAPI `predicates` endpoint to retrieve this information.
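A sketch of fetching and flattening the predicate map follows. It assumes the endpoint lives at `/api/v1/predicates` next to the query endpoint (an assumption, not confirmed here) and that it returns the TRAPI 1.0 nested shape `{subject_category: {object_category: [predicate, ...]}}`. `fetch_predicates` and `flatten_predicates` are hypothetical helpers.

```python
import requests

IA_BASE_URL = 'https://evidara.healthdatascience.cloud'
# Assumption: the predicates endpoint sits alongside /api/v1/query.
IA_TRAPI_PREDICATES_URL = '/api/v1/predicates'


def flatten_predicates(predicates):
    """Turn {subject_category: {object_category: [predicate, ...]}}
    into a sorted list of (subject, predicate, object) triples."""
    triples = []
    for subject, objects in predicates.items():
        for obj, preds in objects.items():
            for pred in preds:
                triples.append((subject, pred, obj))
    return sorted(triples)


def fetch_predicates():
    """GET the predicate map from imProving Agent and flatten it."""
    resp = requests.get(f'{IA_BASE_URL}{IA_TRAPI_PREDICATES_URL}', timeout=60)
    resp.raise_for_status()
    return flatten_predicates(resp.json())
```

For instance, `[t for t in fetch_predicates() if t[0] == t[2] == 'biolink:Gene']` lists the gene-to-gene predicates you could put in an edge's `predicate` field.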

See https://spoke.rbvi.ucsf.edu/docs/index.html for further details on SPOKE's data.

## Options
All options are specified as top-level keys of the query JSON, alongside `message`, e.g. `query_json = {"message": { ... }, "option_1": "value_1"}`
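To make the placement of options explicit, here is a small hypothetical helper (`build_request` is not part of any library) that wraps a query graph in a TRAPI message and attaches options such as `psev_context`, `query_kps`, and `max_results` as siblings of `message`, the same way the examples below do by hand.

```python
def build_request(query_graph, **options):
    """Wrap a query graph in a TRAPI message and add options as top-level keys.

    Options sit next to "message", never inside it.
    """
    payload = {'message': {'query_graph': query_graph}}
    for key, value in options.items():
        if key == 'message':
            raise ValueError('"message" is reserved for the TRAPI message')
        payload[key] = value
    return payload
```

For example, `build_request(qg, psev_context='DOID:2377', max_results=50)` produces `{"message": {"query_graph": qg}, "psev_context": "DOID:2377", "max_results": 50}`.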
### Ranking Algorithm Options
#### PSEV: Propagated SPOKE Entry Vectors
PSEVs are derived from the UCSF EHR (https://www.nature.com/articles/s41467-019-11069-0). They weight every node in the graph by its presence in random walks through SPOKE associated with a given concept, here a disease.
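The idea of weighting nodes by random-walk visits can be illustrated with a generic random walk with restart (personalized PageRank) on a toy graph. This is a simplified illustration only, not the published PSEV method or SPOKE's implementation: the graph, restart probability, and iteration count are all made-up assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, iters=100):
    """Toy random walk with restart (personalized PageRank) by power iteration.

    adjacency: {node: [neighbor, ...]}, undirected edges listed both ways
    seed: the concept the walker jumps back to (here, a disease)
    Returns {node: visit probability}; higher means more related to the seed.
    """
    nodes = list(adjacency)
    scores = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        # every step, a fraction `restart` of the walkers jumps back to the seed
        new = {n: (restart if n == seed else 0.0) for n in nodes}
        for node, neighbors in adjacency.items():
            if not neighbors:
                # dangling node: send its remaining mass back to the seed
                new[seed] += (1 - restart) * scores[node]
                continue
            share = (1 - restart) * scores[node] / len(neighbors)
            for nb in neighbors:
                new[nb] += share
        scores = new
    return scores
```

With `graph = {'D': ['G1', 'G2'], 'G1': ['D', 'C'], 'G2': ['D'], 'C': ['G1'], 'X': []}`, calling `random_walk_with_restart(graph, 'D')` gives nodes near `D` higher scores than distant ones, and the unreachable node `X` scores zero. Ranking query results by such scores is what "ranking in a disease context" means.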
  
Currently available PSEV Contexts:
- DOID:9351     Diabetes mellitus
- DOID:9970     Obesity
- DOID:10763    Hypertension
- DOID:14330    Parkinson's Disease
- DOID:3393     Coronary Artery Disease
- DOID:2377     Multiple sclerosis
- DOID:7148     Rheumatoid arthritis
- DOID:3083     Chronic Obstructive Pulmonary Disease (COPD)
- DOID:0060224  Atrial Fibrillation
- DOID:2800     Idiopathic Pulmonary Disease
- DOID:9617     Albuminuria
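A PSEV context is requested by adding its DOID as the `psev_context` option, as in the examples below. A minimal sketch that refuses DOIDs without a precomputed PSEV might look like this; `PSEV_CONTEXTS` and `with_psev_context` are hypothetical names, and the list is copied from above, so it may lag behind the server.

```python
# PSEV contexts listed above, keyed by DOID CURIE.
PSEV_CONTEXTS = {
    'DOID:9351': 'Diabetes mellitus',
    'DOID:9970': 'Obesity',
    'DOID:10763': 'Hypertension',
    'DOID:14330': "Parkinson's Disease",
    'DOID:3393': 'Coronary Artery Disease',
    'DOID:2377': 'Multiple sclerosis',
    'DOID:7148': 'Rheumatoid arthritis',
    'DOID:3083': 'Chronic Obstructive Pulmonary Disease (COPD)',
    'DOID:0060224': 'Atrial Fibrillation',
    'DOID:2800': 'Idiopathic Pulmonary Disease',
    'DOID:9617': 'Albuminuria',
}


def with_psev_context(payload, doid):
    """Return a copy of payload that asks for ranking in the given context."""
    if doid not in PSEV_CONTEXTS:
        raise ValueError(f'no PSEV context available for {doid}')
    return {**payload, 'psev_context': doid}
```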

### Other Options
`query_kps`: whether to query Translator Knowledge Providers to get ranking information. This can be slow (up to 5 minutes) because it sometimes requires hundreds of transactions over the open internet.
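Since KP-backed queries can take several minutes, it helps to give `requests` an explicit timeout that is long enough. The sketch below uses a `(connect, read)` timeout tuple, which `requests` supports; the specific values (360 s with KPs, 60 s without) and the helper names are assumptions for illustration.

```python
import requests

IA_BASE_URL = 'https://evidara.healthdatascience.cloud'
IA_TRAPI_QUERY_URL = '/api/v1/query'

# (connect, read) timeouts in seconds. These values are assumptions:
# KP-backed queries can take up to ~5 minutes, so allow a little more.
KP_QUERY_TIMEOUT = (10, 360)
DEFAULT_TIMEOUT = (10, 60)


def choose_timeout(payload):
    """Pick a longer read timeout when the payload asks for KP queries."""
    return KP_QUERY_TIMEOUT if payload.get('query_kps') else DEFAULT_TIMEOUT


def post_query(payload):
    """POST a TRAPI query to imProving Agent and return the parsed JSON."""
    resp = requests.post(
        f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}',
        json=payload,
        timeout=choose_timeout(payload),
    )
    resp.raise_for_status()
    return resp.json()
```

Without a timeout, `requests` waits indefinitely, so a stalled KP request would hang the notebook cell instead of raising `requests.exceptions.Timeout`.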

`max_results`: the maximum number of results to return, currently capped at 200. Note that longer queries (two, three, or more hops) tend to emphasize the terminal node: in a three-hop query limited to 200 results, the first five elements of each path (node 1, edge 1, node 2, edge 2, node 3) may be identical across all 200 results, with only the terminal edge 3 and node 4 changing.
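One way to check whether a response shows this effect is to count how many distinct knowledge-graph nodes are bound to each query node across all results. The sketch assumes TRAPI 1.0 results, where each result has `node_bindings = {qnode_id: [{"id": kg_node_id}, ...]}`; `distinct_bindings` is a hypothetical helper.

```python
def distinct_bindings(results):
    """Count distinct knowledge-graph IDs bound to each query node."""
    seen = {}
    for result in results:
        for qnode_id, bindings in result['node_bindings'].items():
            ids = seen.setdefault(qnode_id, set())
            ids.update(b['id'] for b in bindings)
    return {qnode_id: len(ids) for qnode_id, ids in sorted(seen.items())}
```

Passing it `response['message']['results']` from a multi-hop query shows at a glance whether only the terminal node varies (a count of 1 for every other query node).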

## Examples

In [None]:
import requests

IA_BASE_URL = 'https://evidara.healthdatascience.cloud'
IA_TRAPI_QUERY_URL = '/api/v1/query'

### Basic Queries - SPOKE only 

In [None]:
# simple one-hop query for gene-gene
# "Which genes are related to BRCA1?"
#
# Note: no ranking
gene_gene_request_payload = {
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "NCBIGENE:672",
          "category": "biolink:Gene"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      }
    }
  }
}

In [None]:
gene_gene_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=gene_gene_request_payload)
gene_gene_response.raise_for_status()

gene_gene_results = gene_gene_response.json()

In [None]:
gene_gene_results
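The raw response is long. A small hypothetical helper can print one row of node names per result by looking up each bound ID in the knowledge graph. It assumes the TRAPI 1.0 layout (`message.results[*].node_bindings` and `message.knowledge_graph.nodes[id].name`) and, for brevity, uses only the first binding of each query node.

```python
def result_rows(response, qnode_ids):
    """Yield one tuple of node names per result, in qnode_ids order.

    Falls back to the raw ID when the knowledge-graph node has no name.
    """
    kg_nodes = response['message']['knowledge_graph']['nodes']
    for result in response['message']['results']:
        row = []
        for qnode_id in qnode_ids:
            # only the first binding per query node is shown
            kg_id = result['node_bindings'][qnode_id][0]['id']
            row.append(kg_nodes.get(kg_id, {}).get('name') or kg_id)
        yield tuple(row)
```

For the gene-gene query, something like `for row in list(result_rows(gene_gene_results, ['n00', 'n01']))[:10]: print(row)` shows the first ten gene pairs.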

In [None]:
# same query as above, but with predicates specified
gene_regulates_gene_request_payload = {
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "NCBIGENE:672",
          "category": "biolink:Gene"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicate": [
            "biolink:negatively_regulates_entity_to_entity",
            "biolink:positively_regulates_entity_to_entity"
          ]
        }
      }
    }
  }
}

In [None]:
gene_regulates_gene_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=gene_regulates_gene_request_payload)
gene_regulates_gene_response.raise_for_status()

gene_regulates_gene_results = gene_regulates_gene_response.json()

In [None]:
# one-hop query for compound-protein
# "Which proteins does modafinil bind? Rank them in the context of multiple sclerosis"
compound_protein_request_payload = {
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "DB00745",
          "category": "biolink:ChemicalSubstance"
        },
        "n01": {
          "category": "biolink:Protein"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        }
      }
    },
  },
  "psev_context": "DOID:2377"
}


In [None]:
compound_protein_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=compound_protein_request_payload)
compound_protein_response.raise_for_status()

compound_protein_results = compound_protein_response.json()

In [None]:
# three-hop query demonstrating an unspecified node (n02 has no category)
# "Show me two-hop paths between Parkinson's drugs and proteins"
three_hop_request_payload = {
  "message": {
    "query_graph": {
      "nodes": {
        "n00": { "id": "DOID:14330", "category": "biolink:Disease" },
        "n01": { "category": "biolink:ChemicalSubstance" },
        "n02": {},
        "n03": { "category": "biolink:Protein" }
      },
      "edges": {
        "e00": { "subject": "n00", "object": "n01" },
        "e01": { "subject": "n01", "object": "n02" },
        "e02": { "subject": "n02", "object": "n03" }
      }
    }
  },
  "psev_context": "DOID:14330"
}

In [None]:
three_hop_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=three_hop_request_payload)
three_hop_response.raise_for_status()

three_hop_results = three_hop_response.json()

### Queries that utilize KPs

Gene-gene queries made in a PSEV disease context will also query BigGIM from the Multiomics Provider. SPOKE first "optimizes" these queries: we query SPOKE for anatomy relevant to the disease of interest and then fetch expression data from BigGIM only for the related tissues.

In [None]:
gene_gene_for_biggim = {
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00",
          "predicate": [
            "biolink:negatively_regulates_entity_to_entity",
            "biolink:positively_regulates_entity_to_entity"
          ]
        }
      },
      "nodes": {
        "n00": { "category": "biolink:Gene", "id": "NCBIGENE:1831" },
        "n01": { "category": "biolink:Gene" }
      }
    }
  },
  "query_kps": True,
  "psev_context": "DOID:3393",  # coronary artery disease
  "max_results": 50  # cuts down on time for this demo
}


In [None]:
biggim_response = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=gene_gene_for_biggim)
biggim_response.raise_for_status()

biggim_results = biggim_response.json()

In [None]:
# inspect the knowledge-graph edges to see the additional
# annotations that were fetched from BigGIM for some of them

biggim_results['message']['knowledge_graph']['edges']
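To pick out just the annotated edges, a hypothetical helper like the one below can filter the knowledge graph. It assumes the BigGIM annotations appear in each TRAPI 1.0 edge's `attributes` list, which is an assumption about this service's output rather than something documented here.

```python
def annotated_edges(response):
    """Return {edge_id: attributes} for knowledge-graph edges with attributes.

    Edges whose "attributes" is missing, null, or empty are skipped.
    """
    edges = response['message']['knowledge_graph']['edges']
    return {
        edge_id: edge['attributes']
        for edge_id, edge in edges.items()
        if edge.get('attributes')
    }
```

Pass it the parsed JSON from the BigGIM query above; `len(annotated_edges(...))` gives the number of edges that received extra annotations.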

### Integration Use Cases

#### **IMPORTANT NOTE**: KP querying was broken during the upgrade to TRAPI 1.0 and is being restored gradually, with a target completion date of the end of January. The queries below will still run, but they may query only some of the KPs, or none of them.

In [None]:
# "Which proteins or genes might be related to a symptom of a disease given the drugs that treat it?"
# Symptom – Disease – Compound – Protein 
# specific question: 
#    "Which proteins might be related to symptoms of 
#     coronary artery disease given the drugs that treat it?
#     Query KPs for extra annotations and rank the results in 
#     context of coronary artery disease"

use_case_1_request_payload = {
  "message": {
    "query_graph": {
      "nodes": {
        "n00": { "category": "biolink:PhenotypicFeature" },
        "n01": { "id": "DOID:3393", "category": "biolink:Disease" },
        "n02": { "category": "biolink:ChemicalSubstance" },
        "n03": { "category": "biolink:Protein" }
      },
      "edges": {
        "e00": { "subject": "n00", "object": "n01" },
        "e01": { "subject": "n01", "object": "n02" },
        "e02": { "subject": "n02", "object": "n03" }
      }
    }
  },
  "psev_context": "DOID:3393",
  "query_kps": True
}

In [None]:
# warning: several minutes
use_case_1 = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=use_case_1_request_payload)
use_case_1.raise_for_status()

use_case_1_results = use_case_1.json()

In [None]:
# "For a patient with disease X, what are some factors 
# (such as genetic features, comorbidities, etc) that 
# could cause sensitivity or resistance to drug Y?"
#
# Compound - Disease - Disease - Gene
# 
# Specific question:
#     Which genes related to comorbidities of rheumatoid arthritis
#     might cause sensitivity to drugs that treat it? Rank results
#     in context of rheumatoid arthritis
#
#     NOTE: this specific example returns a 'child of disease
#     superclass' edge and demonstrates the need to specify edge types

use_case_2_request_payload = {
  "message": {
    "query_graph": {
      "nodes": {
        "n00": { "category": "biolink:ChemicalSubstance" },
        "n01": { "id": "DOID:7148", "category": "biolink:Disease" },
        "n02": { "category": "biolink:Disease" },
        "n03": { "category": "biolink:Gene" }
      },
      "edges": {
        "e00": { "subject": "n00", "object": "n01" },
        "e01": { "subject": "n01", "object": "n02" },
        "e02": { "subject": "n02", "object": "n03" }
      }
    }
  },
  "psev_context": "DOID:7148",
  "query_kps": True
}

In [None]:
# warning: several minutes
use_case_2 = requests.post(f'{IA_BASE_URL}{IA_TRAPI_QUERY_URL}', json=use_case_2_request_payload)
use_case_2.raise_for_status()

use_case_2_results = use_case_2.json()