# Workflow 1, Module 2

One approach to solving this module is not to define the subquestions too tightly in terms of enrichments, archetypes, and so on, but simply to pass the question to ROBOKOP and let its scoring bring the best answers to the top. Here we use the quick service to start with a disease, find associated genes, and from there find chemicals that interact with those genes. Each answer comes out as an individual path, and the paths are ranked by score.

For more details, see the "quick" notebook in greengamma/general.

First, we define a small function that calls the quick service, plus functions for building the question and parsing the answer.

In [2]:
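import requests
import pandas as pd

QUICK_URL = 'http://robokop.renci.org:80/api/simple/quick'


def quick(question, max_connectivity=None):
    """POST a machine question to the ROBOKOP quick service.

    Returns the parsed JSON answer on HTTP 200; otherwise returns the raw
    requests.Response so the caller can inspect the error.
    """
    url = QUICK_URL
    if max_connectivity is not None:
        url += f'?max_connectivity={max_connectivity}'
    print(url)
    response = requests.post(url, json=question)
    print(f"Return Status: {response.status_code}")
    if response.status_code == 200:
        return response.json()
    return response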

In [3]:
def parse_answer(returnanswer, node_id):
    """Return a DataFrame with one row per answer: the id, name, and score of the node bound to `node_id`."""
    # Map knowledge-graph node ids to names, falling back to the id when a node has no name
    kg_node_names = {n['id']: n.get('name', n['id']) for n in returnanswer['knowledge_graph']['nodes']}
    answers = [{"result_id": answer['node_bindings'][node_id],
                "result_name": kg_node_names[answer['node_bindings'][node_id]],
                "score": answer['score']}
               for answer in returnanswer['answers']]
    return pd.DataFrame(answers)
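To make the input of `parse_answer` concrete, here is a hand-made mock answer in the shape the function reads. The shape is inferred from the code above (a `knowledge_graph` with `nodes`, and `answers` whose `node_bindings` map question node ids to knowledge-graph ids), not taken from the service documentation, and the `EXAMPLE:` ids and the score are placeholders. The hypothetical `extract_rows` is a pandas-free version of the same extraction, which shows the name fallback for a node without a `name`.

```python
# Mock answer in the shape parse_answer expects (assumed from the code, placeholder ids and score)
mock_answer = {
    "knowledge_graph": {
        "nodes": [
            {"id": "EXAMPLE:disease1", "name": "example disease"},
            {"id": "EXAMPLE:gene1"},  # no 'name', so the id is used as the name
            {"id": "EXAMPLE:chem1", "name": "example chemical"},
        ],
        "edges": [],
    },
    "answers": [
        {
            "node_bindings": {"n0": "EXAMPLE:disease1", "n1": "EXAMPLE:gene1", "n2": "EXAMPLE:chem1"},
            "score": 0.25,
        },
    ],
}


def extract_rows(answer, node_id):
    """Hypothetical pandas-free equivalent of parse_answer: list of (id, name, score) tuples."""
    names = {n["id"]: n.get("name", n["id"]) for n in answer["knowledge_graph"]["nodes"]}
    rows = []
    for a in answer["answers"]:
        bound_id = a["node_bindings"][node_id]
        rows.append((bound_id, names[bound_id], a["score"]))
    return rows


print(extract_rows(mock_answer, "n2"))
print(extract_rows(mock_answer, "n1"))
```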

## Without pathway expansion

The module as written starts with a genetic condition, gets genes, and then gets chemicals that interact with those genes. On the presumption that this alone does not reach far enough into new chemical space, it also expands from genes to other genes via pathways. But let's start without the pathway expansion, so the question is just

`genetic_condition` to `gene` to `chemical_substance`
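Structurally, this question is a chain of typed nodes: only the first node is pinned to a CURIE, and each edge links one node to the next. As a sketch, a hypothetical helper `linear_question` can generate any such chain from a list of types; for `["disease", "gene", "chemical_substance"]` it should produce the same dictionary as `no_expand_question` in the next cell.

```python
def linear_question(node_types, start_curie):
    """Build a chain-shaped machine question n0 -> n1 -> ... (hypothetical helper).

    Only the first node is pinned to a CURIE; the remaining nodes are typed
    variables, and edges carry no predicate, mirroring no_expand_question.
    """
    nodes = []
    for i, node_type in enumerate(node_types):
        node = {"id": f"n{i}", "type": node_type}
        if i == 0:
            node["curie"] = start_curie
        nodes.append(node)
    # One edge between each pair of consecutive nodes
    edges = [
        {"id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
        for i in range(len(node_types) - 1)
    ]
    return {"machine_question": {"nodes": nodes, "edges": edges}}


q = linear_question(["disease", "gene", "chemical_substance"], "MONDO:0001076")
```

The same helper with the five types of the pathway-expanded question (`disease`, `gene`, `biological_process_or_activity`, `gene`, `chemical_substance`) yields a four-edge chain.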

In [4]:
def no_expand_question(disease_id):
    q = {
        "machine_question": {
            "nodes": [
                {
                    "id": "n0",
                    "type": "disease",
                    "curie": disease_id
                },
                {
                    "id": "n1",
                    "type": "gene"
                },
                {
                    "id": "n2",
                    "type": "chemical_substance"
                }
            ],
            "edges": [
                {
                    "id": "e0",
                    "source_id": "n0",
                    "target_id": "n1"
                },
                {
                    "id": "e1",
                    "source_id": "n1",
                    "target_id": "n2"
                }
            ]
        }
    }
    return q
    

In [5]:
glucose_intolerance = 'MONDO:0001076' #glucose intolerance
cf = 'MONDO:0009061' #cystic fibrosis
sids = 'MONDO:0010086' #sudden infant death syndrome (SIDS)

In [5]:
glucose_intolerance_question = no_expand_question(glucose_intolerance)
gi_answer = quick(glucose_intolerance_question)

http://robokop.renci.org:80/api/simple/quick
Return Status: 200


In [13]:
gi_frame = parse_answer(gi_answer,'n2')
gi_frame

| | result_id | result_name | score |
|---|---|---|---|
| 0 | CHEBI:45713 | trans-resveratrol | 0.255890 |
| 1 | CHEBI:29865 | benzo[a]pyrene | 0.252910 |
| 2 | CHEBI:16469 | 17beta-estradiol | 0.252229 |
| 3 | CHEBI:28119 | 2,3,7,8-tetrachlorodibenzodioxine | 0.252009 |
| 4 | CHEBI:254496 | 7,12-dimethyltetraphene | 0.251872 |
| 5 | CHEBI:17234 | glucose | 0.251201 |
| 6 | CHEBI:9516 | thapsigargin | 0.249734 |
| 7 | CHEBI:16503 | selane | 0.247647 |
| 8 | CHEBI:45713 | trans-resveratrol | 0.247472 |
| 9 | CHEBI:28119 | 2,3,7,8-tetrachlorodibenzodioxine | 0.247332 |
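Because each answer is a path, the same chemical can appear more than once: trans-resveratrol and 2,3,7,8-tetrachlorodibenzodioxine each appear twice above, reached through different genes. One option, assuming we only care about each chemical's best-scoring path, is to sort by score and keep the first row per `result_id`. The rows below are copied from the output above so the sketch runs on its own; in the notebook you would apply the same chain directly to `gi_frame`.

```python
import pandas as pd

# Rows copied from the glucose intolerance output above
rows = [
    ("CHEBI:45713", "trans-resveratrol", 0.255890),
    ("CHEBI:29865", "benzo[a]pyrene", 0.252910),
    ("CHEBI:16469", "17beta-estradiol", 0.252229),
    ("CHEBI:28119", "2,3,7,8-tetrachlorodibenzodioxine", 0.252009),
    ("CHEBI:254496", "7,12-dimethyltetraphene", 0.251872),
    ("CHEBI:17234", "glucose", 0.251201),
    ("CHEBI:9516", "thapsigargin", 0.249734),
    ("CHEBI:16503", "selane", 0.247647),
    ("CHEBI:45713", "trans-resveratrol", 0.247472),
    ("CHEBI:28119", "2,3,7,8-tetrachlorodibenzodioxine", 0.247332),
]
gi_frame = pd.DataFrame(rows, columns=["result_id", "result_name", "score"])

# Highest score first, then keep only the best path for each chemical
best_per_chemical = (
    gi_frame.sort_values("score", ascending=False)
    .drop_duplicates(subset="result_id", keep="first")
    .reset_index(drop=True)
)
print(best_per_chemical)
```

Taking the maximum is only one choice; summing or counting paths per chemical would instead reward chemicals reached through many genes.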


In [6]:
SIDS_question = no_expand_question(sids)
SIDS_answer = quick(SIDS_question)
SIDS_frame = parse_answer(SIDS_answer,'n2')
SIDS_frame

http://robokop.renci.org:80/api/simple/quick
Return Status: 200


| | result_id | result_name | score |
|---|---|---|---|
| 0 | CHEBI:86990 | N-methyl-3-phenyl-3-[4-(trifluoromethyl)phenox... | 0.250799 |
| 1 | CHEBI:28593 | quinidine | 0.250005 |
| 2 | CHEBI:29678 | sodium arsenite | 0.249569 |
| 3 | CHEBI:3374 | capsaicin | 0.249171 |
| 4 | CHEBI:37537 | phorbol 13-acetate 12-myristate | 0.248834 |
| 5 | CHEBI:27958 | cocaine | 0.248506 |
| 6 | CHEBI:28487 | reserpine | 0.248345 |
| 7 | CHEBI:16469 | 17beta-estradiol | 0.246984 |
| 8 | CHEBI:6809 | methamphetamine | 0.246532 |
| 9 | CHEBI:28119 | 2,3,7,8-tetrachlorodibenzodioxine | 0.245686 |
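Comparing the glucose intolerance and SIDS top-10 lists shows which chemicals both questions surface. The sketch below intersects the `result_id` values copied from the two outputs above; with the live frames the equivalent is `set(gi_frame['result_id']) & set(SIDS_frame['result_id'])`.

```python
# CHEBI ids copied from the glucose intolerance and SIDS result tables above
gi_ids = {
    "CHEBI:45713", "CHEBI:29865", "CHEBI:16469", "CHEBI:28119", "CHEBI:254496",
    "CHEBI:17234", "CHEBI:9516", "CHEBI:16503",
}
sids_ids = {
    "CHEBI:86990", "CHEBI:28593", "CHEBI:29678", "CHEBI:3374", "CHEBI:37537",
    "CHEBI:27958", "CHEBI:28487", "CHEBI:16469", "CHEBI:6809", "CHEBI:28119",
}

shared = gi_ids & sids_ids
print(sorted(shared))  # ['CHEBI:16469', 'CHEBI:28119']
```

The two shared ids are 17beta-estradiol and 2,3,7,8-tetrachlorodibenzodioxine.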


In [None]:
CF_question = no_expand_question(cf)
#import json
#print(json.dumps(CF_question,indent=4))
CF_answer = quick(CF_question,max_connectivity=1000)

http://robokop.renci.org:80/api/simple/quick?max_connectivity=1000


In [15]:
CF_frame = parse_answer(CF_answer,'n2')
CF_frame

TypeError: 'Response' object is not subscriptable
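The `TypeError` comes from `quick` itself: when the status code is not 200 it returns the `requests.Response` object instead of parsed JSON, and `parse_answer` then tries to index that object like a dictionary. A small guard, sketched below as a hypothetical helper `require_json_answer`, turns this into an explicit error that reports the HTTP status and the start of the response body.

```python
def require_json_answer(answer):
    """Return `answer` unchanged if it is parsed JSON (a dict); otherwise raise with the HTTP details.

    Hypothetical helper: `answer` is whatever quick() returned, i.e. either the
    decoded JSON dict or a requests.Response for a non-200 status.
    """
    if isinstance(answer, dict):
        return answer
    # getattr defaults keep the message usable even for unexpected objects
    status = getattr(answer, "status_code", "unknown")
    body = getattr(answer, "text", "")
    raise RuntimeError(
        f"quick service returned no usable answer (status {status}): {body[:200]}"
    )
```

In the notebook this would be used as `CF_frame = parse_answer(require_json_answer(CF_answer), 'n2')`.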

## With pathway expansion

We can also run the question as the module originally poses it, opening up the gene set to other genes involved in the same biological processes:

In [None]:
def pathway_expand_question(disease_id):
    q = {
        "machine_question": {
            "nodes": [
                {
                    "id": "n0",
                    "type": "disease",
                    "curie": disease_id
                },
                {
                    "id": "n1",
                    "type": "gene"
                },
                {
                    "id": "n2",
                    "type": "biological_process_or_activity"
                },
                {
                    "id": "n3",
                    "type": "gene"
                },
                {
                    "id": "n4",
                    "type": "chemical_substance"
                }
            ],
            "edges": [
                {
                    "id": "e0",
                    "source_id": "n0",
                    "target_id": "n1"
                },
                {
                    "id": "e1",
                    "source_id": "n1",
                    "target_id": "n2"
                },
                {
                    "id": "e2",
                    "source_id": "n2",
                    "target_id": "n3"
                },
                {
                    "id": "e3",
                    "source_id": "n3",
                    "target_id": "n4"
                }
            ]
        }
    }
    return q

In [None]:
gi_exp_question = pathway_expand_question(glucose_intolerance)
gi_exp_answer = quick(gi_exp_question,max_connectivity=1000)
# The chemical is node n4 in the pathway-expanded question
gi_exp_frame = parse_answer(gi_exp_answer,'n4')
gi_exp_frame

In [None]:
CF_exp_question = pathway_expand_question(cf)
CF_exp_answer = quick(CF_exp_question,max_connectivity=1000)
# The chemical is node n4 in the pathway-expanded question
CF_exp_frame = parse_answer(CF_exp_answer,'n4')
CF_exp_frame

In [None]:
def pathway_expand_sharing_process_question(disease_id):
    q = {
        "machine_question": {
            "nodes": [
                {
                    "id": "n0",
                    "type": "disease",
                    "curie": disease_id
                },
                {
                    "id": "n1",
                    "type": "gene"
                },
                {
                    "id": "n2",
                    "type": "biological_process_or_activity"
                },
                {
                    "id": "n3",
                    "type": "gene"
                },
                {
                    "id": "n4",
                    "type": "chemical_substance"
                }
            ],
            "edges": [
                {
                    "id": "e0",
                    "source_id": "n0",
                    "target_id": "n1"
                },
                {
                    "id": "e1",
                    "source_id": "n1",
                    "target_id": "n2"
                },
                {
                    "id": "e2",
                    "source_id": "n2",
                    "target_id": "n3"
                },
                {
                    "id": "e3",
                    "source_id": "n3",
                    "target_id": "n4"
                },
                {
                    "id": "e4",
                    "source_id": "n0",
                    "target_id": "n2"
                }
            ]
        }
    }
    return q
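`pathway_expand_sharing_process_question` is identical to `pathway_expand_question` except for edge `e4`, which links the disease `n0` directly to the biological process `n2`. That turns the `n0`, `n1`, `n2` part of the query into a triangle, so, assuming the service only returns answers in which every query edge is matched, only processes connected both to a disease gene and to the disease itself should survive. The sketch below shows the relationship with a hypothetical helper `with_extra_edge` that copies a question and appends one edge, checking that both endpoints exist.

```python
import copy


def with_extra_edge(question, edge_id, source_id, target_id):
    """Return a deep copy of `question` with one additional edge (hypothetical helper)."""
    node_ids = {n["id"] for n in question["machine_question"]["nodes"]}
    if source_id not in node_ids or target_id not in node_ids:
        raise ValueError(f"edge {edge_id} references a node that is not in the question")
    new_q = copy.deepcopy(question)  # leave the original question untouched
    new_q["machine_question"]["edges"].append(
        {"id": edge_id, "source_id": source_id, "target_id": target_id}
    )
    return new_q


# Same five-node chain as pathway_expand_question, pinned to cystic fibrosis
types = ["disease", "gene", "biological_process_or_activity", "gene", "chemical_substance"]
chain = {
    "machine_question": {
        "nodes": [{"id": f"n{i}", "type": t} for i, t in enumerate(types)],
        "edges": [
            {"id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
            for i in range(len(types) - 1)
        ],
    }
}
chain["machine_question"]["nodes"][0]["curie"] = "MONDO:0009061"

# Adding disease -> process gives the same edge set as pathway_expand_sharing_process_question
shared_q = with_extra_edge(chain, "e4", "n0", "n2")
```

Running the real question follows the same pattern as the other expanded questions, e.g. `quick(pathway_expand_sharing_process_question(cf), max_connectivity=1000)` followed by `parse_answer(..., 'n4')`.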