# Workflow 1, Module 1, Questions 2 and 3

## What conditions present [symptoms]?
## Filter [conditions] to only keep ones with defined genetic causes.

In ROBOKOP, we don't answer question 3 (filtering to genetic conditions) as a separate step. Instead, we have defined a subtype of disease called `genetic_condition`. If we redo the analyses from the question 2 notebook, replacing `disease` with `genetic_condition`, we answer questions 2 and 3 together, as shown below.

### Approach 1: Quick query for conditions that have every symptom

If the question means "find any condition that has all of the given symptoms," we can use the quick service. Here we will build what the quick notebook calls a "star query." First, we define a function, `quick`, that posts a question to the service; then we define functions that build the question in the proper format. Finally, we show some examples using symptom lists derived from the Workflow 1, Module 1, Question 1 notebook.
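The "has every symptom" reading can be illustrated without calling ROBOKOP. In the minimal sketch below, the condition and phenotype names are invented placeholders (not ROBOKOP data) and `conditions_with_all` is a hypothetical helper: a condition is kept only if its set of linked phenotypes contains every input phenotype, which is the intersection behavior a star query gives.

```python
# Hypothetical toy annotations, invented for illustration (NOT ROBOKOP data):
# each condition maps to the set of phenotypes it is linked to.
toy_annotations = {
    "condition_A": {"phenotype_1", "phenotype_2", "phenotype_3"},
    "condition_B": {"phenotype_1", "phenotype_2"},
    "condition_C": {"phenotype_2", "phenotype_3"},
}


def conditions_with_all(phenotypes, annotations):
    """Return the conditions linked to every phenotype in the input (subset test)."""
    wanted = set(phenotypes)
    return sorted(cond for cond, linked in annotations.items() if wanted <= linked)


print(conditions_with_all(["phenotype_1", "phenotype_2"], toy_annotations))
# ['condition_A', 'condition_B']
print(conditions_with_all(["phenotype_1", "phenotype_2", "phenotype_4"], toy_annotations))
# []
```

Adding a single phenotype that no condition carries empties the result, which is the same sensitivity the asthma example runs into.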

```python
import requests
import pandas as pd
```

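```python
def quick(question):
    """POST a machine question to ROBOKOP's quick service and return the parsed JSON."""
    url = 'http://robokop.renci.org:80/api/simple/quick/'
    response = requests.post(url, json=question)
    print(f"Return Status: {response.status_code}")
    if response.status_code == 200:
        return response.json()
    # On failure, hand back the raw Response so the caller can inspect it
    return response
```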

```python
def make_star_question(types, curies, shared_type):
    """Create a question to find entities of shared_type that are linked to all of the nodes
    specified in the types and curies lists. A curie of None leaves that node unbound."""
    question = {
        'rebuild': 'False',  # We're doing disease/phenotype queries, so results are cached.
        'machine_question': {
            'nodes': [],
            'edges': []
        }
    }
    # The hub of the star: the entity type we are looking for
    question['machine_question']['nodes'].append({'id': 'n0', 'type': shared_type})
    for i, t in enumerate(types):
        # One spoke node per input, bound to its curie when one is given
        newnode = {'id': f'n{i+1}', 'type': t}
        if curies[i] is not None:
            newnode['curie'] = curies[i]
        question['machine_question']['nodes'].append(newnode)
        # One edge from the hub to each spoke
        question['machine_question']['edges'].append(
            {'id': f'e{i}', 'source_id': 'n0', 'target_id': f'n{i+1}'})
    return question
```
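To make the "star" shape concrete, the hypothetical helper below (not part of ROBOKOP) checks the structure that `make_star_question` produces: one hub node `n0`, and exactly one edge from the hub to each other node. It is a sketch for inspecting questions offline; in the notebook you could call it on `diabetes_question`.

```python
def is_star_question(question, hub_id="n0"):
    """Hypothetical helper: check that a machine question has a star shape.

    A star question has one hub node (the shared_type node) and one edge from
    the hub to each of the other nodes, with no other edges.
    """
    mq = question["machine_question"]
    node_ids = {node["id"] for node in mq["nodes"]}
    spokes = node_ids - {hub_id}
    targets = [edge["target_id"] for edge in mq["edges"]]
    return (
        hub_id in node_ids
        and all(edge["source_id"] == hub_id for edge in mq["edges"])
        and len(targets) == len(set(targets))  # no spoke is linked twice
        and set(targets) == spokes             # every spoke is linked
    )


# A two-phenotype question in the same shape make_star_question produces
example = {
    "rebuild": "False",
    "machine_question": {
        "nodes": [
            {"id": "n0", "type": "genetic_condition"},
            {"id": "n1", "type": "phenotypic_feature", "curie": "HP:0001988"},
            {"id": "n2", "type": "phenotypic_feature", "curie": "HP:0000842"},
        ],
        "edges": [
            {"id": "e0", "source_id": "n0", "target_id": "n1"},
            {"id": "e1", "source_id": "n0", "target_id": "n2"},
        ],
    },
}
print(is_star_question(example))  # True
```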

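```python
def make_shared_symptom_question(curies):
    """Build a star question for genetic conditions linked to every phenotype in curies."""
    types = ['phenotypic_feature'] * len(curies)
    shared_type = 'genetic_condition'  # replaces 'disease' from the question 2 notebook
    return make_star_question(types, curies, shared_type)
```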

The following function pretty-prints quick results as a table of condition IDs and names.

```python
def extract_node_info(returnanswer):
    """Tabulate the IDs and names of the nodes bound to 'n0' in a quick() result."""
    # Map each knowledge-graph node ID to its name, falling back to the ID itself
    kg_node_names = {n['id']: n.get('name', n['id'])
                     for n in returnanswer['knowledge_graph']['nodes']}
    # Because of how the question was built, the shared condition is bound to "n0"
    answers = [{"result_id": answer['node_bindings']['n0'],
                "result_name": kg_node_names[answer['node_bindings']['n0']]}
               for answer in returnanswer['answers']]
    return pd.DataFrame(answers)
```

We can now pass a list of CURIEs to `make_shared_symptom_question` and pass the resulting question to `quick`. We (arbitrarily) use the top five phenotypes from ROBOKOP's answers to question 1.

```python
# These are "Maturity-onset diabetes of the young", "Recurrent hypoglycemia", "Glucose intolerance",
# "Beta-cell dysfunction", and "Hyperinsulinemia"
diabetes_symptoms = ['HP:0004904', 'HP:0001988', 'HP:0000833', 'HP:0006279', 'HP:0000842']
diabetes_question = make_shared_symptom_question(diabetes_symptoms)
diabetes_question
```

```
{'machine_question': {'edges': [{'id': 'e0',
    'source_id': 'n0',
    'target_id': 'n1'},
   {'id': 'e1', 'source_id': 'n0', 'target_id': 'n2'},
   {'id': 'e2', 'source_id': 'n0', 'target_id': 'n3'},
   {'id': 'e3', 'source_id': 'n0', 'target_id': 'n4'},
   {'id': 'e4', 'source_id': 'n0', 'target_id': 'n5'}],
  'nodes': [{'id': 'n0', 'type': 'genetic_condition'},
   {'curie': 'HP:0004904', 'id': 'n1', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0001988', 'id': 'n2', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0000833', 'id': 'n3', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0006279', 'id': 'n4', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0000842', 'id': 'n5', 'type': 'phenotypic_feature'}]},
 'rebuild': 'False'}
```

```python
diabetes_answer = quick(diabetes_question)
diabetes_nodes = extract_node_info(diabetes_answer)
diabetes_nodes
```

```
Return Status: 200
```

|   | result_id | result_name |
|---|---|---|
| 0 | MONDO:0004993 | carcinoma |
| 1 | MONDO:0019041 | rare genetic inherited tumor |
| 2 | MONDO:0015618 | genetic pancreatic disease |
| 3 | MONDO:0019052 | inborn errors of metabolism |
| 4 | MONDO:0015967 | rare genetic diabetes mellitus |
| 5 | MONDO:0015615 | rare genetic gastroenterological disease |
| 6 | MONDO:0015513 | rare genetic endocrine disease |
| 7 | MONDO:0019214 | inborn carbohydrate metabolic disorder |


```python
# "Exercise-induced asthma" (HP:0012652), "Allergic rhinitis" (HP:0003193),
# "Obstructive lung disease" (HP:0006536), "Status asthmaticus" (HP:0012653),
# and "Increased IgE level" (HP:0003212)
asthma_symptoms = ['HP:0012652', 'HP:0003193', 'HP:0006536', 'HP:0012653', 'HP:0003212']
asthma_question = make_shared_symptom_question(asthma_symptoms)
```

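```python
asthma_answer = quick(asthma_question)
# extract_node_info expects a dict with 'answers'; it is left commented out
# because this query comes back with no results (see the next cell)
#asthma_nodes = extract_node_info(asthma_answer)
#asthma_nodes
```

```
Return Status: 200
```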


```python
asthma_answer
```

```
'No results found'
```

The string above, while not very pretty or informative (this is to be fixed), tells us that no results came back: there is no genetic condition linked to every one of these five phenotypes. A less restrictive approach is called for (see below).
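A small guard can make this case explicit instead of letting `extract_node_info` fail on a string. The helper below, `answers_or_empty`, is hypothetical; it assumes, from the outputs in this notebook, that a successful query returns a dict with an `answers` key and that an empty query returns the string `'No results found'`.

```python
def answers_or_empty(result):
    """Hypothetical helper: return the list of answers in a quick() result.

    Assumes a successful result is a dict with an 'answers' key; anything else
    (e.g. the string 'No results found', or a failed requests.Response) is
    treated as having no answers.
    """
    if isinstance(result, dict):
        return result.get("answers") or []
    return []


# Usage in the notebook (requires the live service):
# if answers_or_empty(asthma_answer):
#     asthma_nodes = extract_node_info(asthma_answer)
print(answers_or_empty("No results found"))  # []
```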

### Approach 2: Enriched Expansion

A downside of the intersection approach above is that every returned condition must be attached to every phenotype, so the results can be very sensitive to exactly which phenotypes are chosen as input. A less sensitive approach is to find conditions that are enriched for the list of phenotypes. This lets us be more open about which phenotypes we include, and about which conditions are returned. We'll use the same lists of phenotypes for illustration, but it would also make sense to go back to the question 1 results and be more inclusive. We'll also use `include_descendants=True`, though experimenting with that option would be worthwhile.
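This notebook doesn't say exactly how the enriched service scores a condition. One common way to score enrichment is a one-sided hypergeometric test, sketched below for a single candidate condition. It is an assumption that ROBOKOP does something similar, and the sketch ignores options such as `include_descendants`; the counts it takes (`N`, `K`, `n`, `k`) would have to come from the knowledge graph.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of phenotypes in the background
    K: number of background phenotypes linked to the candidate condition
    n: number of phenotypes in the input list
    k: number of input phenotypes linked to the candidate condition
    """
    upper = min(K, n)
    # Sum the probabilities of seeing k, k+1, ..., upper linked phenotypes
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: a condition linked to 40 of 10,000 background phenotypes
# that matches 4 of 5 input phenotypes
print(hypergeom_pvalue(4, 10000, 40, 5))
```

Unlike the star query, a condition does not need to match every input phenotype to score well; matching most of them against a small background rate is enough.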

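```python
def enrichment(type1, identlist, type2, threshhold=None, maxresults=None, numtype1=None,
               include_descendants=None, rebuild=None):
    """Find entities of type2 that are enriched for the type1 identifiers in identlist."""
    url = f'http://robokop.renci.org/api/simple/enriched/{type1}/{type2}'
    # 'threshhold' is spelled as in the original request; assumed to match the service
    params = {'threshhold': threshhold, 'max_results': maxresults,
              'num_type1': numtype1, 'identifiers': identlist,
              'include_descendants': include_descendants, 'rebuild': rebuild}
    # Only send the options the caller actually set
    params = {k: v for k, v in params.items() if v is not None}
    response = requests.post(url, json=params)
    print(f'Return Status: {response.status_code}')
    if response.status_code == 200:
        return response.json()
    # On failure, return an empty list so pd.DataFrame(...) still works
    return []
```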

```python
diabetes_enriched = enrichment('phenotypic_feature', diabetes_symptoms, 'genetic_condition',
                               threshhold=0.01, include_descendants=True)
diabetes_enriched_frame = pd.DataFrame(diabetes_enriched)
diabetes_enriched_frame
```

```
Return Status: 200
```

|   | id | name | p |
|---|---|---|---|
| 0 | MONDO:0012381 | hyperinsulinism due to INSR deficiency | 3.741608e-09 |
| 1 | MONDO:0011236 | hyperinsulinism due to glucokinase deficiency | 2.303490e-08 |
| 2 | MONDO:0015967 | rare genetic diabetes mellitus | 3.871420e-08 |
| 3 | MONDO:0005803 | hyperinsulinemic hypoglycemia (disease) | 1.952519e-07 |
| 4 | MONDO:0015618 | genetic pancreatic disease | 2.970920e-07 |
| 5 | MONDO:0004993 | carcinoma | 3.382536e-07 |
| 6 | MONDO:0019010 | congenital isolated hyperinsulinism | 5.363540e-07 |
| 7 | MONDO:0001076 | glucose intolerance | 2.038438e-06 |
| 8 | MONDO:0017688 | disorder of glycolysis | 5.615776e-06 |
| 9 | MONDO:0007540 | multiple endocrine neoplasia type 1 | 1.035890e-05 |
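Comparing the two diabetes result sets shows what the enrichment approach adds. The sketch below uses pandas `merge` and `isin` on the MONDO IDs; the rows are copied from the two diabetes outputs above, and in the notebook itself you would use `diabetes_nodes` and `diabetes_enriched_frame` directly. Three conditions (rare genetic diabetes mellitus, genetic pancreatic disease, and carcinoma) appear in both; the other seven were surfaced only by enrichment.

```python
import pandas as pd

# Rows copied from the diabetes outputs above
star_results = pd.DataFrame({"result_id": [
    "MONDO:0004993", "MONDO:0019041", "MONDO:0015618", "MONDO:0019052",
    "MONDO:0015967", "MONDO:0015615", "MONDO:0015513", "MONDO:0019214"]})

enriched_results = pd.DataFrame({
    "id": ["MONDO:0012381", "MONDO:0011236", "MONDO:0015967", "MONDO:0005803",
           "MONDO:0015618", "MONDO:0004993", "MONDO:0019010", "MONDO:0001076",
           "MONDO:0017688", "MONDO:0007540"],
    "name": ["hyperinsulinism due to INSR deficiency",
             "hyperinsulinism due to glucokinase deficiency",
             "rare genetic diabetes mellitus",
             "hyperinsulinemic hypoglycemia (disease)",
             "genetic pancreatic disease",
             "carcinoma",
             "congenital isolated hyperinsulinism",
             "glucose intolerance",
             "disorder of glycolysis",
             "multiple endocrine neoplasia type 1"],
    "p": [3.741608e-09, 2.303490e-08, 3.871420e-08, 1.952519e-07, 2.970920e-07,
          3.382536e-07, 5.363540e-07, 2.038438e-06, 5.615776e-06, 1.035890e-05],
})

# Conditions returned by both approaches (inner join on the MONDO ID)
in_both = enriched_results.merge(star_results, left_on="id", right_on="result_id", how="inner")

# Conditions that only the enrichment approach surfaced
only_enriched = enriched_results[~enriched_results["id"].isin(star_results["result_id"])]

print(in_both[["id", "name", "p"]])
print(only_enriched[["id", "name", "p"]])
```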


```python
asthma_enriched = enrichment('phenotypic_feature', asthma_symptoms, 'genetic_condition',
                             threshhold=0.1, include_descendants=True)
asthma_enriched_frame = pd.DataFrame(asthma_enriched)
asthma_enriched_frame
```

```
Return Status: 200
```

|   | id | name | p |
|---|---|---|---|
| 0 | MONDO:0011751 | COPD, severe early onset | 0.000001 |
| 1 | MONDO:0016575 | primary ciliary dyskinesia | 0.000005 |
| 2 | MONDO:0018395 | male infertility due to sperm motility disorder | 0.000005 |
| 3 | MONDO:0007535 | emphysema, hereditary pulmonary | 0.000050 |
| 4 | MONDO:0018390 | male infertility due to sperm disorder | 0.000057 |
| 5 | MONDO:0007817 | IgE responsiveness, atopic | 0.000110 |
| 6 | MONDO:0014697 | immunodeficiency, common variable, 12 | 0.000167 |
| 7 | MONDO:0009663 | mucus inspissation of respiratory tract | 0.000194 |
| 8 | MONDO:0014203 | primary ciliary dyskinesia 25 | 0.000313 |
| 9 | MONDO:0018305 | chronic granulomatous disease | 0.000366 |
