# Using ROBOKOP's quick service to find hypoglycemic coma treatments

Hypoglycemic Coma is a phenotype from the Human Phenotype Ontology (HPO), but there are no chemicals associated with it in ROBOKOP. We can take a couple of approaches to come up with chemicals for it:

1. Pick a similar disease (hypoglycemia) by fiat, and look for its treatments.
2. Find a path that starts at Hypoglycemic Coma and goes to treatments via diseases.

## Approach 1: Drugs that treat hypoglycemia

The most basic functionality in answering questions is to start with an entity and find other connected entities.
In this context, an entity is defined by a CURIE (compact URI) identifier, such as `MONDO:0004946`.

The Quick service handles general pattern-matching queries with arbitrarily specified nodes and edges. In this case, we're going to run this query:

(?)-[treats]->(Hypoglycemia)

And see what ROBOKOP returns.

These cells set up some of the functions that we want to use:

In [1]:
import requests
import json
import pandas as pd

robokop_server = 'robokop.renci.org'

def quick(question):
    """Call robokop's quick service with a json-question"""
    url=f'http://{robokop_server}:80/api/simple/quick/'
    response = requests.post(url,json=question)
    print( f"Return Status: {response.status_code}" )
    if response.status_code == 200:
        return response.json()
    return response

def make_N_step_question(types,curies,props,forwards):
    """Create a json question that can be passed to quick.  The question will be a linear chain where
    the nodes on the chain have types, and each node can have an optional curie (None leaves it unbound).
    There can also be optional predicate types (props), one per edge.  Forwards is a list with the same
    length as props; True points edge i from node i to node i+1, False reverses it."""
    question = {
                'machine_question': {
                    'nodes': [],
                    'edges': []
                }
            }
    for i,t in enumerate(types):
        newnode = {'id': f'n{i}', 'type': t}
        if curies[i] is not None:
            newnode['curie'] = curies[i]
        question['machine_question']['nodes'].append(newnode)
        if i > 0:
            if forwards[i-1]:
                edge = {'id': f'e{i}', 'source_id': f'n{i-1}', 'target_id': f'n{i}'}
            else:
                edge = {'id': f'e{i}', 'source_id': f'n{i}', 'target_id': f'n{i-1}'}
            if props[i-1] is not None:
                edge['type'] = props[i-1]
            question['machine_question']['edges'].append( edge )
    return question

def answers2frame(returnanswer):
    # First, map each knowledge-graph node id to its name (falling back to the id if there is no name)
    kg_node_names = { n['id']: n['name'] if 'name' in n else n['id'] for n in returnanswer['knowledge_graph']['nodes'] }
    # Now put the answers into a table, looking up node names from the dict above.
    # Because of the question, we know that the chemical is bound to "n1".
    answers = [ {"chem_id": answer['node_bindings']['n1'], 
                 "chem_name": kg_node_names[answer['node_bindings']['n1']], 
                 "score" :      answer['score']}
              for answer in returnanswer['answers']]
    return pd.DataFrame(answers)
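
`make_N_step_question` expects its four lists to line up: one curie slot per node type, and one predicate and one direction flag per edge. As written, a list that is too short raises an `IndexError` and extra entries are silently ignored. A minimal sketch of an up-front check follows; `check_chain_args` is a hypothetical helper, not part of ROBOKOP or of the notebook's original code.

```python
def check_chain_args(types, curies, props, forwards):
    """Hypothetical helper: raise ValueError unless the lists describe one linear chain."""
    n_nodes = len(types)
    if n_nodes < 1:
        raise ValueError("need at least one node type")
    # One curie slot per node; use None for nodes that are left unbound.
    if len(curies) != n_nodes:
        raise ValueError(f"expected {n_nodes} curies, got {len(curies)}")
    # A linear chain of n nodes has n-1 edges, each with a predicate and a direction.
    n_edges = n_nodes - 1
    if len(props) != n_edges:
        raise ValueError(f"expected {n_edges} props, got {len(props)}")
    if len(forwards) != n_edges:
        raise ValueError(f"expected {n_edges} forwards flags, got {len(forwards)}")


# A well-formed one-hop chain passes silently.
check_chain_args(['disease', 'chemical_substance'], ['MONDO:0004946', None], ['treats'], [False])
```

Calling it before `make_N_step_question` turns a silent mismatch into an explicit error message.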

In [2]:
#Specify the particular question we want to answer.  This says, start at the disease MONDO:0004946 (which is
# hypoglycemia) and find a chemical_substance that treats it.  The [False] in the final argument means that the
# direction of the last edge is 'backwards' from chemical_substance to disease.
q = make_N_step_question(['disease','chemical_substance'], ['MONDO:0004946', None], ['treats'], [False])

#What does the question format look like?
#import json
#print(json.dumps(q,indent=4))
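
For reference, here is a hand-written sketch of the structure that the call above should produce, derived by tracing `make_N_step_question`. The name `expected_question` is only illustrative. Note that the edge id starts at `e1`, and that the `False` flag makes the chemical (`n1`) the source of the `treats` edge.

```python
import json

# Hand-written equivalent of
# make_N_step_question(['disease','chemical_substance'], ['MONDO:0004946', None], ['treats'], [False])
expected_question = {
    'machine_question': {
        'nodes': [
            # The bound node: hypoglycemia
            {'id': 'n0', 'type': 'disease', 'curie': 'MONDO:0004946'},
            # The unbound node that ROBOKOP will fill in
            {'id': 'n1', 'type': 'chemical_substance'},
        ],
        'edges': [
            # forwards[0] is False, so the edge runs from n1 back to n0
            {'id': 'e1', 'source_id': 'n1', 'target_id': 'n0', 'type': 'treats'},
        ],
    }
}

print(json.dumps(expected_question, indent=4))
```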

In [3]:
a = quick(q)

Return Status: 200


In [4]:
df = answers2frame(a)
df

| | chem_id | chem_name | score |
|---|---|---|---|
| 0 | CHEBI:4495 | diazoxide | 1.068687 |
| 1 | CHEBI:17234 | glucose | 0.740737 |
| 2 | MESH:C000589286 | 2-(4-methoxyphenyl)ethyl-2-acetamido-2-deoxygl... | 0.408147 |
| 3 | CHEBI:17855 | triglyceride | 0.408147 |
| 4 | CHEBI:4636 | diphenhydramine | 0.408147 |
| 5 | MESH:D004041 | Dietary Fats | 0.408147 |
| 6 | MESH:C532601 | sodium-4,5-dihydroxy-1,3-benzene disulfonate | 0.408147 |
| 7 | MESH:D024502 | alpha-Tocopherol | 0.408147 |
| 8 | MESH:D011429 | Propolis | 0.408147 |
| 9 | CHEBI:3650 | chlorpropamide | 0.408147 |


## Approach 2: Drugs that treat diseases that have a phenotype of hypoglycemic coma

If we really want to start, not with hypoglycemia, but with hypoglycemic coma, we will have to go through a disease:

(?)-[treats]->(disease)-[has_phenotype]->(Hypoglycemic coma)

And see what ROBOKOP returns.

In [5]:
q2 = make_N_step_question(['phenotypic_feature','disease','chemical_substance'], ['HP:0001325', None,None], ['has_phenotype','treats'], [False,False])

#What does the question format look like?
print(json.dumps(q2,indent=4))

{
    "machine_question": {
        "nodes": [
            {
                "id": "n0",
                "type": "phenotypic_feature",
                "curie": "HP:0001325"
            },
            {
                "id": "n1",
                "type": "disease"
            },
            {
                "id": "n2",
                "type": "chemical_substance"
            }
        ],
        "edges": [
            {
                "id": "e1",
                "source_id": "n1",
                "target_id": "n0",
                "type": "has_phenotype"
            },
            {
                "id": "e2",
                "source_id": "n2",
                "target_id": "n1",
                "type": "treats"
            }
        ]
    }
}


In [6]:
a2 = quick(q2)

Return Status: 200


In [7]:
def answers2frame2(graph_answers):
    """A function for taking the standard answer format and converting it into a simple frame.
    Specific to the question in this notebook."""
    # Note that the disease in the question is n1 and the chemical is n2.
    # First, map each knowledge-graph node id to its name (falling back to the id if there is no name)
    kg_node_names = { n['id']: n['name'] if 'name' in n else n['id'] for n in graph_answers['knowledge_graph']['nodes'] }
    # Now put the answers into a table, looking up node names from the dict above.
    # Because of the question, we know that we're interested in the bindings for "n1" and "n2".
    answers = [ {"disease_id": answer['node_bindings']['n1'], 
                 "disease_name": kg_node_names[answer['node_bindings']['n1']], 
                 "chem_id": answer['node_bindings']['n2'], 
                 "chem_name": kg_node_names[answer['node_bindings']['n2']], 
                 "score" :      answer['score']}
              for answer in graph_answers['answers']]
    return pd.DataFrame(answers)
  
df2 = answers2frame2(a2)
df2

| | chem_id | chem_name | disease_id | disease_name | score |
|---|---|---|---|---|---|
| 0 | CHEBI:6801 | metformin | MONDO:0005148 | type 2 diabetes mellitus | 0.529276 |
| 1 | CHEBI:5441 | glyburide | MONDO:0005148 | type 2 diabetes mellitus | 0.516758 |
| 2 | CHEBI:28748 | doxorubicin | MONDO:0004992 | cancer | 0.493752 |
| 3 | CHEBI:9753 | troglitazone | MONDO:0005148 | type 2 diabetes mellitus | 0.492174 |
| 4 | CHEBI:6801 | metformin | MONDO:0005015 | diabetes mellitus (disease) | 0.492174 |
| 5 | CHEBI:8228 | pioglitazone | MONDO:0005148 | type 2 diabetes mellitus | 0.486219 |
| 6 | CHEBI:5441 | glyburide | MONDO:0005015 | diabetes mellitus (disease) | 0.461883 |
| 7 | CHEBI:175901 | gemcitabine | MONDO:0015882 | rare tumor of pancreas | 0.448824 |
| 8 | CHEBI:175901 | gemcitabine | MONDO:0021040 | pancreatic neoplasm | 0.448824 |
| 9 | CHEBI:50122 | rosiglitazone | MONDO:0005148 | type 2 diabetes mellitus | 0.448800 |


Here we can see the problem with this approach: hypoglycemic coma can be a phenotype of diabetes, but that does not mean that a treatment for diabetes will also treat hypoglycemic coma. In fact, the coma is often produced by overmedicating the diabetes, which leads to too much insulin and too little glucose in the body.

I think this indicates that the has_phenotype relationship, the treats relationship, or both are (in this case) too overloaded to allow fine-grained reasoning. Another way of viewing this is that our overall phenotype-disease-chemical pathway is flawed: treating a disease is simply not the same as treating a phenotype that sometimes occurs with the disease (though it may work in some cases!).
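
One rough way to see the mismatch is to compare the two result tables directly. The sketch below hard-codes only the ten rows displayed for each approach, so it says nothing about lower-ranked answers. The variable names are illustrative.

```python
from collections import Counter

# chem_id values from the Approach 1 table (treatments for hypoglycemia)
approach1_chems = {
    "CHEBI:4495", "CHEBI:17234", "MESH:C000589286", "CHEBI:17855", "CHEBI:4636",
    "MESH:D004041", "MESH:C532601", "MESH:D024502", "MESH:D011429", "CHEBI:3650",
}

# (chem_id, disease_name) pairs from the Approach 2 table
approach2_rows = [
    ("CHEBI:6801", "type 2 diabetes mellitus"),
    ("CHEBI:5441", "type 2 diabetes mellitus"),
    ("CHEBI:28748", "cancer"),
    ("CHEBI:9753", "type 2 diabetes mellitus"),
    ("CHEBI:6801", "diabetes mellitus (disease)"),
    ("CHEBI:8228", "type 2 diabetes mellitus"),
    ("CHEBI:5441", "diabetes mellitus (disease)"),
    ("CHEBI:175901", "rare tumor of pancreas"),
    ("CHEBI:175901", "pancreatic neoplasm"),
    ("CHEBI:50122", "type 2 diabetes mellitus"),
]

# Chemicals that appear in the displayed rows of both approaches
approach2_chems = {chem for chem, _ in approach2_rows}
shared = approach1_chems & approach2_chems

# How many displayed answers pass through each intermediate disease
rows_per_disease = Counter(disease for _, disease in approach2_rows)

print("shared chemicals:", sorted(shared))
print("answers per disease:", rows_per_disease.most_common())
```

For the rows shown, no chemical appears in both tables, and seven of the ten Approach 2 answers pass through a diabetes node. This is consistent with the concern that the path surfaces diabetes treatments rather than treatments for the coma itself.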