# Workflow 1, Module 2

One approach to solving this module is to not define so tightly what's going on at the subquestion level in terms of enrichments, archetypes, and so on, but simply to pass the question to ROBOKOP and let its scoring bring the best answers to the top. Here we use the quick service to start with a genetic condition, find its associated genes, and from there find chemicals that interact with those genes. The answers come out as paths ranked by score.

For more details, see the "quick" notebook in greengamma/general.

First, we need a function that calls the quick service, and some functions for creating the question properly.

In [1]:
#Load some functions for calling the quick service and parsing its output

import os
import sys
module_path = os.path.abspath(os.path.join('../../..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from gg_functions import parse_answer, quick, get_view_url

#To make nicer looking outputs
from IPython.display import display, HTML

## Without pathway expansion

The module as written starts with a genetic condition, gets genes, and then gets chemicals that interact with those genes. On the presumption that this is insufficient to reach new chemical space, it also expands from those genes to other genes via pathways. But let's start without the pathway expansion, so we'll just do

`genetic_condition` to `gene` to `chemical_substance`

In [2]:
def no_expand_question(disease_id):
    """Build the disease -> gene -> chemical_substance question for disease_id."""
    q = {
        "machine_question": {
            "nodes": [
                {
                    "id": "n0",
                    "type": "disease",
                    "curie": disease_id
                },
                {
                    "id": "n1",
                    "type": "gene"
                },
                {
                    "id": "n2",
                    "type": "chemical_substance",
                    "drug": True
                }
            ],
            "edges": [
                {
                    "id": "e0",
                    "source_id": "n0",
                    "target_id": "n1"
                },
                {
                    "id": "e1",
                    "source_id": "n1",
                    "target_id": "n2"
                }
            ]
        }
    }
    return q
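
The hand-written dictionary above follows a regular pattern: nodes `n0`, `n1`, ... in path order, with one edge between each consecutive pair. As a minimal sketch, the same structure could be generated from a list of node types. The name `linear_question` and its `drug_last` parameter are hypothetical and are not part of `gg_functions`.

```python
def linear_question(node_types, start_curie, drug_last=True):
    """Build a machine_question whose nodes form a single chain.

    node_types: node type names in path order (at least one entry).
    start_curie: identifier pinned on the first node.
    drug_last: if True, mark the final node with "drug": True.
    (Hypothetical helper; only the output format comes from the notebook.)
    """
    nodes = []
    for i, node_type in enumerate(node_types):
        node = {"id": f"n{i}", "type": node_type}
        if i == 0:
            node["curie"] = start_curie
        nodes.append(node)
    if drug_last:
        nodes[-1]["drug"] = True
    # One edge between each consecutive pair of nodes: n0-n1, n1-n2, ...
    edges = [
        {"id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
        for i in range(len(node_types) - 1)
    ]
    return {"machine_question": {"nodes": nodes, "edges": edges}}


q = linear_question(["disease", "gene", "chemical_substance"], "MONDO:0001076")
print([n["type"] for n in q["machine_question"]["nodes"]])
# -> ['disease', 'gene', 'chemical_substance']
```

Called with these three types, the sketch returns the same dictionary as `no_expand_question('MONDO:0001076')`.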
    

In [3]:
glucose_intolerance = 'MONDO:0001076' #glucose intolerance
sids = 'MONDO:0010086' #sudden infant death syndrome

In [4]:
glucose_intolerance_question = no_expand_question(glucose_intolerance)
gi_answer = quick(glucose_intolerance_question)

Return Status: 200


In [5]:
gi_frame = parse_answer(gi_answer,node_list=['n2'],edge_list=[])
gi_frame

| | n2 - id | n2 - name | score |
|---|---|---|---|
| 0 | CHEBI:44185 | methotrexate | 0.877307 |
| 1 | CHEBI:28088 | genistein | 0.781815 |
| 2 | CHEBI:41500 | 4-phenylbutyric acid | 0.751353 |
| 3 | CHEBI:6801 | metformin | 0.749104 |
| 4 | CHEBI:42471 | forskolin | 0.737284 |
| 5 | CHEBI:50122 | rosiglitazone | 0.734526 |
| 6 | CHEBI:5441 | glyburide | 0.710310 |
| 7 | CHEBI:41879 | dexamethasone | 0.707996 |
| 8 | CHEBI:46345 | 5-fluorouracil | 0.701950 |
| 9 | CHEBI:18388 | apigenin | 0.698508 |


In [6]:
SIDS_question = no_expand_question(sids)
SIDS_answer = quick(SIDS_question)
SIDS_frame = parse_answer(SIDS_answer,node_list=['n2'],edge_list=[])
SIDS_frame

Return Status: 200


| | n2 - id | n2 - name | score |
|---|---|---|---|
| 0 | CHEBI:27958 | cocaine | 0.820608 |
| 1 | CHEBI:6809 | methamphetamine | 0.815531 |
| 2 | CHEBI:75984 | flecainide | 0.777763 |
| 3 | CHEBI:6456 | lidocaine | 0.777261 |
| 4 | CHEBI:28593 | quinidine | 0.776896 |
| 5 | CHEBI:17688 | (S)-nicotine | 0.768765 |
| 6 | CHEBI:15355 | acetylcholine | 0.750312 |
| 7 | CHEBI:1391 | 3,4-methylenedioxymethamphetamine | 0.728962 |
| 8 | CHEBI:7640 | nortriptyline | 0.717750 |
| 9 | CHEBI:47499 | imipramine | 0.649263 |


## With pathway expansion

We can also run the question as asked, widening the set of genes to include others involved in the same biological processes.
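
To make the expansion step concrete, here is a minimal in-memory sketch of "genes that share a biological process". It is only an illustration: ROBOKOP performs this traversal over its own knowledge graph. The small annotation table uses gene/process pairs that appear in the glucose intolerance results later in this section, and the name `expand_genes` is hypothetical.

```python
from collections import defaultdict

# Toy gene -> process annotations (pairs taken from the results shown later).
gene_processes = {
    "LEP": {"glucose homeostasis"},
    "INS": {"glucose homeostasis"},
    "PRKAA1": {"glucose homeostasis", "energy homeostasis"},
    "PRKAA2": {"glucose homeostasis", "energy homeostasis"},
}


def expand_genes(seed_genes, gene_processes):
    """Return genes that share at least one process with a seed gene."""
    # Invert the annotations: process -> genes annotated to it.
    process_genes = defaultdict(set)
    for gene, processes in gene_processes.items():
        for process in processes:
            process_genes[process].add(gene)

    expanded = set()
    for gene in seed_genes:
        for process in gene_processes.get(gene, ()):
            expanded |= process_genes[process]
    # Drop the seeds so that only newly reached genes remain.
    return expanded - set(seed_genes)


print(sorted(expand_genes({"PRKAA1"}, gene_processes)))
# -> ['INS', 'LEP', 'PRKAA2']
```

Unlike this sketch, the graph query does not exclude the seed gene, so the same gene can appear at both gene positions of an answer.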

In [7]:
def pathway_expand_question(disease_id):
    """Build the disease -> gene -> process -> gene -> chemical_substance question."""
    q = {
        "machine_question": {
            "nodes": [
                {
                    "id": "n0",
                    "type": "disease",
                    "curie": disease_id
                },
                {
                    "id": "n1",
                    "type": "gene"
                },
                {
                    "id": "n2",
                    "type": "biological_process_or_activity"
                },
                {
                    "id": "n3",
                    "type": "gene"
                },
                {
                    "id": "n4",
                    "type": "chemical_substance",
                    "drug": True
                }
            ],
            "edges": [
                {
                    "id": "e0",
                    "source_id": "n0",
                    "target_id": "n1"
                },
                {
                    "id": "e1",
                    "source_id": "n1",
                    "target_id": "n2"
                },
                {
                    "id": "e2",
                    "source_id": "n2",
                    "target_id": "n3"
                },
                {
                    "id": "e3",
                    "source_id": "n3",
                    "target_id": "n4"
                }
            ]
        }
    }
    return q

In [8]:
gi_exp_question = pathway_expand_question(glucose_intolerance)
gi_exp_answer = quick(gi_exp_question,max_connectivity=500)
gi_exp_frame = parse_answer(gi_exp_answer,node_list=['n1','n2','n3','n4'],edge_list=[],node_properties=['name'])
view_url = get_view_url(gi_exp_answer)
display(HTML(f'<a href="{view_url}">View Answer in ROBOKOP</a>'))
display(gi_exp_frame)

Return Status: 200


| | n1 - name | n2 - name | n3 - name | n4 - name | score |
|---|---|---|---|---|---|
| 0 | LEP | glucose homeostasis | INS | rosiglitazone | 0.664575 |
| 1 | INS | glucose homeostasis | LEP | troglitazone | 0.648367 |
| 2 | LEP | glucose homeostasis | INS | troglitazone | 0.645855 |
| 3 | LEPR | glucose homeostasis | INS | rosiglitazone | 0.640955 |
| 4 | INS | glucose homeostasis | LEP | rosiglitazone | 0.624642 |
| 5 | PRKAA1 | glucose homeostasis | PRKAA2 | AICA ribonucleotide | 0.622323 |
| 6 | INSR | glucose homeostasis | INS | rosiglitazone | 0.615093 |
| 7 | PRKAA1 | lipid biosynthetic process | PRKAA2 | AICA ribonucleotide | 0.614460 |
| 8 | PRKAA1 | energy homeostasis | PRKAA2 | AICA ribonucleotide | 0.614455 |
| 9 | PRKAA1 | fatty acid biosynthetic process | PRKAA2 | AICA ribonucleotide | 0.613039 |
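
Since the stated motivation for the expansion is to reach new chemical space, one quick check is to see which chemicals in the expanded results did not already appear in the direct results. The sketch below does this with plain Python lists copied from the two glucose intolerance tables shown in this notebook. It covers only the ten displayed rows of each, so the full answer sets may differ, and `new_chemicals` is a hypothetical helper name.

```python
# Chemical names (n2) from the displayed rows of the no-expansion result.
direct = [
    "methotrexate", "genistein", "4-phenylbutyric acid", "metformin",
    "forskolin", "rosiglitazone", "glyburide", "dexamethasone",
    "5-fluorouracil", "apigenin",
]

# Chemical names (n4) from the displayed rows of the pathway-expanded result.
expanded = [
    "rosiglitazone", "troglitazone", "troglitazone", "rosiglitazone",
    "rosiglitazone", "AICA ribonucleotide", "rosiglitazone",
    "AICA ribonucleotide", "AICA ribonucleotide", "AICA ribonucleotide",
]


def new_chemicals(direct, expanded):
    """Chemicals in `expanded` that are absent from `direct`, in first-seen order."""
    seen = set(direct)
    novel = []
    for name in expanded:
        if name not in seen:
            seen.add(name)  # also suppresses repeats within `expanded`
            novel.append(name)
    return novel


print(new_chemicals(direct, expanded))
# -> ['troglitazone', 'AICA ribonucleotide']
```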


In [9]:
sids_exp_question = pathway_expand_question(sids)
sids_exp_answer = quick(sids_exp_question,max_connectivity=500)
view_sids_url = get_view_url(sids_exp_answer)
display(HTML(f'<a href="{view_sids_url}">View Answer in ROBOKOP</a>'))
sids_exp_frame = parse_answer(sids_exp_answer,node_list=['n0','n1','n2','n3','n4'],edge_list=[],node_properties=['name'])
display(sids_exp_frame)

Return Status: 200


| | n0 - name | n1 - name | n2 - name | n3 - name | n4 - name | score |
|---|---|---|---|---|---|---|
| 0 | sudden infant death syndrome | CHRNB2 | memory | TH | acetylcholine | 0.555378 |
| 1 | sudden infant death syndrome | CHRNB2 | learning | TH | acetylcholine | 0.549145 |
| 2 | sudden infant death syndrome | TH | catecholamine biosynthetic process | TH | reserpine | 0.546963 |
| 3 | sudden infant death syndrome | TH | dopamine biosynthetic process | TH | reserpine | 0.546119 |
| 4 | sudden infant death syndrome | TH | tyrosine 3-monooxygenase activity | TH | reserpine | 0.545042 |
| 5 | sudden infant death syndrome | TH | response to hypoxia | TH | reserpine | 0.545005 |
| 6 | sudden infant death syndrome | TH | memory | TH | reserpine | 0.544903 |
| 7 | sudden infant death syndrome | TH | fatty acid metabolic process | TH | reserpine | 0.544399 |
| 8 | sudden infant death syndrome | TH | oxygen binding | TH | reserpine | 0.543913 |
| 9 | sudden infant death syndrome | TH | learning | TH | reserpine | 0.543902 |


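The next variant adds one more edge, `e4`, that links the disease (`n0`) directly to the biological process (`n2`), so only processes that are themselves connected to the disease are used for the expansion.

In [13]: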
def pathway_expand_question_connected_process(disease_id):
    """Like pathway_expand_question, plus edge e4 tying the disease (n0) to the process (n2)."""
    q = {
        "machine_question": {
            "nodes": [
                {
                    "id": "n0",
                    "type": "disease",
                    "curie": disease_id
                },
                {
                    "id": "n1",
                    "type": "gene"
                },
                {
                    "id": "n2",
                    "type": "biological_process_or_activity"
                },
                {
                    "id": "n3",
                    "type": "gene"
                },
                {
                    "id": "n4",
                    "type": "chemical_substance",
                    "drug": True
                }
            ],
            "edges": [
                {
                    "id": "e0",
                    "source_id": "n0",
                    "target_id": "n1"
                },
                {
                    "id": "e1",
                    "source_id": "n1",
                    "target_id": "n2"
                },
                {
                    "id": "e2",
                    "source_id": "n2",
                    "target_id": "n3"
                },
                {
                    "id": "e3",
                    "source_id": "n3",
                    "target_id": "n4"
                },
                {
                    "id": "e4",
                    "source_id": "n0",
                    "target_id": "n2"
                }
            ]
        }
    }
    return q
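
This question differs from `pathway_expand_question` only by the extra edge `e4`, so it could also be derived from an existing question rather than written out in full. The following is a minimal sketch of that idea. `with_extra_edge` is a hypothetical helper, and the `base` dictionary is a stripped-down stand-in for a real question (node types and curie omitted).

```python
import copy


def with_extra_edge(question, source_id, target_id):
    """Return a copy of `question` with one more edge appended.

    The new edge id continues the e0, e1, ... numbering and assumes the
    existing edges are already numbered that way.
    """
    q = copy.deepcopy(question)  # leave the caller's question untouched
    edges = q["machine_question"]["edges"]
    edges.append(
        {"id": f"e{len(edges)}", "source_id": source_id, "target_id": target_id}
    )
    return q


# Stand-in for a five-node chain n0 - n1 - n2 - n3 - n4.
base = {
    "machine_question": {
        "nodes": [{"id": f"n{i}"} for i in range(5)],
        "edges": [
            {"id": f"e{i}", "source_id": f"n{i}", "target_id": f"n{i + 1}"}
            for i in range(4)
        ],
    }
}

connected = with_extra_edge(base, "n0", "n2")
print(connected["machine_question"]["edges"][-1])
# -> {'id': 'e4', 'source_id': 'n0', 'target_id': 'n2'}
```

In the notebook, the result of `pathway_expand_question(disease_id)` could be passed in place of `base`.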

In [14]:
gi_exp_question = pathway_expand_question_connected_process(glucose_intolerance)
gi_exp_answer = quick(gi_exp_question,max_connectivity=1500)
gi_exp_frame = parse_answer(gi_exp_answer,node_list=['n1','n2','n3','n4'],edge_list=[],node_properties=['name'])
view_url = get_view_url(gi_exp_answer)
display(HTML(f'<a href="{view_url}">View Answer in ROBOKOP</a>'))
display(gi_exp_frame)

Return Status: 200


| | n1 - name | n2 - name | n3 - name | n4 - name | score |
|---|---|---|---|---|---|
| 0 | LEP | glucose metabolic process | INS | rosiglitazone | 0.800535 |
| 1 | INS | glucose metabolic process | LEP | troglitazone | 0.770265 |
| 2 | LEP | glucose metabolic process | INS | troglitazone | 0.767598 |
| 3 | PRKAA1 | glucose metabolic process | PRKAA1 | metformin | 0.765756 |
| 4 | INS | glucose metabolic process | PRKAA1 | metformin | 0.765411 |
| 5 | PRKAA1 | glucose metabolic process | INS | metformin | 0.749784 |
| 6 | INS | glucose metabolic process | INS | metformin | 0.749432 |
| 7 | LEP | glucose metabolic process | INS | metformin | 0.748530 |
| 8 | INS | glucose metabolic process | LEP | rosiglitazone | 0.741085 |
| 9 | INS | glucose metabolic process | LEP | bisphenol A | 0.737557 |
