# Workflow 1, Module 1, Question 2

## What conditions present [symptoms]? 

### Approach 1: Use quick to find conditions that have every symptom

If the question is read as "find every condition that has all of the given symptoms", then we can answer it with the quick service. Here we will run what the quick notebook calls a "star query": one unknown node (the condition) linked to every one of the known symptom nodes. First, we'll define a `quick` function that takes a question and calls the service; then we'll define functions that build the question in the proper format. Finally, we'll show some examples using symptom lists derived from the Workflow 1, Module 1, Question 1 notebook.

In [1]:
import requests
import pandas as pd

In [2]:
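def quick(question):
    """POST a question to the quick service and return the parsed JSON answer."""
    url = 'http://robokop.renci.org:80/api/simple/quick/'
    response = requests.post(url, json=question)
    print(f"Return Status: {response.status_code}")
    if response.status_code == 200:
        return response.json()
    # On any other status, hand back the raw response so the caller can inspect it.
    return response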

In [3]:
def make_star_question(types,curies,shared_type):
    """Create a question to find entities of shared_type that are linked to all of the nodes specified in the
    types and curies arrays."""
    question = {
                'rebuild': 'False',  #We're doing disease/phenotype, so they're cached.
                'machine_question': {
                    'nodes': [],
                    'edges': []
                }
            }
    question['machine_question']['nodes'].append( {'id': 'n0', 'type': shared_type})
    for i,t in enumerate(types):
        newnode = {'id': f'n{i+1}', 'type': t}
        if curies[i] is not None:
            newnode['curie'] = curies[i]
        question['machine_question']['nodes'].append(newnode)
        question['machine_question']['edges'].append( {'id':f'e{i}','source_id': 'n0', 'target_id': f'n{i+1}'})
    return question

In [4]:
def make_shared_symptom_question(curies):
    types = [ 'phenotypic_feature' for c in curies ]
    shared_type = 'disease'
    return make_star_question(types,curies,shared_type)

We will use the following function to pretty-print the results returned by `quick`.

In [5]:
def extract_node_info(returnanswer):
    """Return a DataFrame with the id and name of the node bound to 'n0' in each answer."""
    # First, parse out the part of the knowledge graph that we want: node names,
    # falling back to the node id when a node has no name.
    kg_node_names = {n['id']: n.get('name', n['id']) for n in returnanswer['knowledge_graph']['nodes']}
    # Now put the answers into a table, looking up node names from the dict above.
    # Because of how the question is built, we know that we want the binding for "n0".
    answers = [{"result_id": answer['node_bindings']['n0'],
                "result_name": kg_node_names[answer['node_bindings']['n0']]}
               for answer in returnanswer['answers']]
    return pd.DataFrame(answers)
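
To see what `extract_node_info` relies on without calling the service, the sketch below applies the same two steps to a small hand-written payload. The payload is hypothetical: it contains only the fields the function reads (`knowledge_graph.nodes`, `answers`, `node_bindings`), and one node is deliberately left without a `name` to show the fallback to the id. A real response may well carry more fields.

```python
# Hypothetical, hand-written payload; only the fields read by extract_node_info are present.
mock_answer = {
    'knowledge_graph': {
        'nodes': [
            {'id': 'MONDO:0004979', 'name': 'asthma'},
            {'id': 'MONDO:0001358'},  # 'name' omitted on purpose to show the fallback
            {'id': 'HP:0012652', 'name': 'Exercise-induced asthma'},
        ]
    },
    'answers': [
        {'node_bindings': {'n0': 'MONDO:0004979', 'n1': 'HP:0012652'}},
        {'node_bindings': {'n0': 'MONDO:0001358', 'n1': 'HP:0012652'}},
    ],
}

# Step 1: map node id -> name, using the id itself when no name is available.
names = {n['id']: n.get('name', n['id']) for n in mock_answer['knowledge_graph']['nodes']}

# Step 2: keep only the node bound to n0 (the shared disease) from each answer.
rows = [
    {'result_id': a['node_bindings']['n0'],
     'result_name': names[a['node_bindings']['n0']]}
    for a in mock_answer['answers']
]

for row in rows:
    print(row['result_id'], '->', row['result_name'])
```

Passing `rows` to `pd.DataFrame` gives the same two-column table that the function returns.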


We can now pass a list of curies to make_shared_symptom_question, and then pass the result to quick.  We'll be using (arbitrarily) the top 5 phenotypes from ROBOKOP's answers to question 1.

In [6]:
# These are "Maturity-onset diabetes of the young", "Recurrent hypoglycemia", "Glucose intolerance", 
# "Beta-cell dysfunction", and "Hyperinsulinemia"
diabetes_symptoms=['HP:0004904', 'HP:0001988', 'HP:0000833', 'HP:0006279','HP:0000842']
diabetes_question = make_shared_symptom_question(diabetes_symptoms)
diabetes_question

{'machine_question': {'edges': [{'id': 'e0',
    'source_id': 'n0',
    'target_id': 'n1'},
   {'id': 'e1', 'source_id': 'n0', 'target_id': 'n2'},
   {'id': 'e2', 'source_id': 'n0', 'target_id': 'n3'},
   {'id': 'e3', 'source_id': 'n0', 'target_id': 'n4'},
   {'id': 'e4', 'source_id': 'n0', 'target_id': 'n5'}],
  'nodes': [{'id': 'n0', 'type': 'disease'},
   {'curie': 'HP:0004904', 'id': 'n1', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0001988', 'id': 'n2', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0000833', 'id': 'n3', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0006279', 'id': 'n4', 'type': 'phenotypic_feature'},
   {'curie': 'HP:0000842', 'id': 'n5', 'type': 'phenotypic_feature'}]},
 'rebuild': 'False'}
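
The printed question has the star shape described earlier: every edge starts at `n0` and ends at a different symptom node. As a minimal sketch, that property can be checked locally before sending anything to the service. The helper name `is_star_question` is hypothetical and not part of ROBOKOP; the sample question is a two-symptom version of the output above.

```python
def is_star_question(question, center='n0'):
    """Return True if every edge runs from the center node to a non-center node
    and each non-center node is the target of exactly one edge."""
    mq = question['machine_question']
    node_ids = [n['id'] for n in mq['nodes']]
    if center not in node_ids:
        return False
    leaves = [i for i in node_ids if i != center]
    targets = [e['target_id'] for e in mq['edges']]
    all_from_center = all(e['source_id'] == center for e in mq['edges'])
    return all_from_center and sorted(targets) == sorted(leaves)


# Two-symptom version of the question shown above.
two_symptom_question = {
    'rebuild': 'False',
    'machine_question': {
        'nodes': [
            {'id': 'n0', 'type': 'disease'},
            {'id': 'n1', 'type': 'phenotypic_feature', 'curie': 'HP:0004904'},
            {'id': 'n2', 'type': 'phenotypic_feature', 'curie': 'HP:0001988'},
        ],
        'edges': [
            {'id': 'e0', 'source_id': 'n0', 'target_id': 'n1'},
            {'id': 'e1', 'source_id': 'n0', 'target_id': 'n2'},
        ],
    },
}

print(is_star_question(two_symptom_question))
```

In the notebook, the same check could be applied to `diabetes_question` directly.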

In [7]:
diabetes_answer = quick(diabetes_question)
diabetes_nodes = extract_node_info(diabetes_answer)
diabetes_nodes

Return Status: 200


| | result_id | result_name |
|---|---|---|
| 0 | MONDO:0005048 | pancreatic insulin-producing neuroendocrine tumor |
| 1 | MONDO:0005148 | type 2 diabetes mellitus |
| 2 | MONDO:0002177 | hyperinsulinism (disease) |
| 3 | MONDO:0005147 | type 1 diabetes mellitus |
| 4 | MONDO:0005815 | pancreatic neuroendocrine neoplasm |
| 5 | MONDO:0009831 | malignant pancreatic neoplasm |
| 6 | MONDO:0005015 | diabetes mellitus (disease) |
| 7 | MONDO:0002516 | digestive system cancer |
| 8 | MONDO:0015936 | rare tumor of endocrine glands |
| 9 | MONDO:0021069 | malignant endocrine neoplasm |


In [8]:
# These are "Exercise-induced asthma" (HP:0012652), "Allergic rhinitis" (HP:0003193),
# "Obstructive lung disease" (HP:0006536), "Status asthmaticus" (HP:0012653),
# and "Increased IgE level" (HP:0003212)
asthma_symptoms=['HP:0012652','HP:0003193','HP:0006536','HP:0012653','HP:0003212']
asthma_question = make_shared_symptom_question(asthma_symptoms)

In [9]:
asthma_answer = quick(asthma_question)
asthma_nodes = extract_node_info(asthma_answer)
asthma_nodes

Return Status: 200


| | result_id | result_name |
|---|---|---|
| 0 | MONDO:0004979 | asthma |
| 1 | MONDO:0005087 | respiratory system disease |
| 2 | MONDO:0000270 | lower respiratory tract disease |
| 3 | MONDO:0001358 | bronchial disease |


### Approach 2: Enriched Expansion

A downside of the intersection approach above is that every condition returned must be attached to every input phenotype. The results can therefore be very sensitive to exactly which phenotypes are chosen as input. A less sensitive way to handle this is to find conditions that are enriched for the list of phenotypes. This lets us be more open both in which phenotypes we include and in which conditions are returned. We'll use the same lists of phenotypes for illustration, but it would also make sense to go back to the list and be more inclusive. We'll also use `include_descendants=True`, though experimenting with that option would be worthwhile.
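
The difference between the two approaches can be sketched on toy data. This notebook does not describe the statistic that the enrichment service computes, so the hypergeometric test below is only a commonly used stand-in, not ROBOKOP's method. The disease and phenotype labels, the annotation sets, and the universe size of 100 phenotypes are all made up for illustration.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are 'successes' (here: phenotypes annotated to a disease)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


# Hypothetical annotations: disease -> set of phenotypes linked to it.
toy_annotations = {
    'disease_A': {'p1', 'p2', 'p3', 'p4', 'p5'},           # has every query phenotype
    'disease_B': {'p1', 'p2', 'p3', 'p4'},                 # misses exactly one
    'disease_C': {'p1', 'p6', 'p7', 'p8', 'p9', 'p10'},    # shares only one
}
query = {'p1', 'p2', 'p3', 'p4', 'p5'}
universe_size = 100  # assumed total number of phenotypes

# Approach 1 (intersection): a disease must be linked to every query phenotype.
strict_matches = [d for d, phenos in toy_annotations.items() if query <= phenos]

# Approach 2 (enrichment, hypergeometric stand-in): keep diseases whose overlap
# with the query is unlikely by chance, sorted by p-value.
threshold = 0.01
scored = [
    (d, hypergeom_tail(len(query & phenos), len(phenos), len(query), universe_size))
    for d, phenos in toy_annotations.items()
]
enriched = sorted((pair for pair in scored if pair[1] < threshold), key=lambda pair: pair[1])

print('intersection:', strict_matches)
print('enriched:', [d for d, _ in enriched])
```

In this toy setting, a disease that lacks a single query phenotype is dropped by the intersection but still scores as enriched, which is the reduced sensitivity described above.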

In [10]:
def enrichment(type1,identlist,type2,threshhold=None,maxresults=None,numtype1=None,include_descendants=None,rebuild=None):
    url=f'http://robokop.renci.org/api/simple/enriched/{type1}/{type2}'
    params = { 'threshhold': threshhold, 'max_results': maxresults, 
              'num_type1':numtype1, 'identifiers': identlist, 
              'include_descendants':include_descendants, 'rebuild': rebuild }
    params = { k:v for k,v in params.items() if v is not None }
    response=requests.post(url, json = params)
    print( f'Return Status: {response.status_code}' )
    if response.status_code == 200:
        return response.json()
    return []

In [11]:
diabetes_enriched = enrichment('phenotypic_feature',diabetes_symptoms,'disease',threshhold=0.01,include_descendants=True)
diabetes_enriched_frame = pd.DataFrame(diabetes_enriched)
diabetes_enriched_frame

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | MONDO:0021470 | benign neoplasm of pancreas | 1.303577e-15 |
| 1 | MONDO:0018520 | rare epithelial tumor of pancreas | 5.203943e-13 |
| 2 | MONDO:0019954 | pancreatic neuroendocrine tumor | 5.203943e-13 |
| 3 | MONDO:0015882 | rare tumor of pancreas | 5.426063e-13 |
| 4 | MONDO:0005893 | pancreatic endocrine carcinoma | 8.133611e-13 |
| 5 | MONDO:0005048 | pancreatic insulin-producing neuroendocrine tumor | 1.714101e-12 |
| 6 | MONDO:0015078 | gastroenteropancreatic neuroendocrine neoplasm | 3.699616e-12 |
| 7 | MONDO:0000386 | digestive system neuroendocrine tumor, grade 1/2 | 4.745344e-12 |
| 8 | MONDO:0002177 | hyperinsulinism (disease) | 1.624023e-11 |
| 9 | MONDO:0000385 | benign digestive system neoplasm | 1.753150e-11 |


In [12]:
asthma_enriched = enrichment('phenotypic_feature',asthma_symptoms,'disease',threshhold=0.1,include_descendants=True)
asthma_enriched_frame = pd.DataFrame(asthma_enriched)
asthma_enriched_frame

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | MONDO:0001358 | bronchial disease | 1.521630e-08 |
| 1 | MONDO:0004979 | asthma | 5.216939e-08 |
| 2 | MONDO:0000771 | allergic respiratory disease | 5.852291e-07 |
| 3 | MONDO:0020028 | rare allergic respiratory disease | 5.852291e-07 |
| 4 | MONDO:0000270 | lower respiratory tract disease | 1.915463e-06 |
| 5 | MONDO:0005749 | eosinophilic pneumonia | 2.773158e-06 |
| 6 | MONDO:0002267 | obstructive lung disease | 3.674500e-06 |
| 7 | MONDO:0011751 | COPD, severe early onset | 3.721814e-06 |
| 8 | MONDO:0002232 | nasal cavity disease | 5.547244e-06 |
| 9 | MONDO:0005990 | tracheitis | 7.527456e-06 |
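
To compare the two approaches, it can help to see which conditions both of them return. The sketch below does this for the result ids copied from the outputs above; the enriched lists contain only the ten rows shown, so the comparison is limited to those rows. The helper name `split_overlap` is hypothetical.

```python
def split_overlap(first, second):
    """Return (shared, only_first, only_second), keeping the input order."""
    first_set, second_set = set(first), set(second)
    shared = [x for x in first if x in second_set]
    only_first = [x for x in first if x not in second_set]
    only_second = [x for x in second if x not in first_set]
    return shared, only_first, only_second


# Result ids copied from the Approach 1 (intersection) outputs above.
diabetes_intersection_ids = ['MONDO:0005048', 'MONDO:0005148', 'MONDO:0002177', 'MONDO:0005147',
                             'MONDO:0005815', 'MONDO:0009831', 'MONDO:0005015', 'MONDO:0002516',
                             'MONDO:0015936', 'MONDO:0021069']
asthma_intersection_ids = ['MONDO:0004979', 'MONDO:0005087', 'MONDO:0000270', 'MONDO:0001358']

# Result ids copied from the Approach 2 (enrichment) outputs above (ten rows each).
diabetes_enriched_ids = ['MONDO:0021470', 'MONDO:0018520', 'MONDO:0019954', 'MONDO:0015882',
                         'MONDO:0005893', 'MONDO:0005048', 'MONDO:0015078', 'MONDO:0000386',
                         'MONDO:0002177', 'MONDO:0000385']
asthma_enriched_ids = ['MONDO:0001358', 'MONDO:0004979', 'MONDO:0000771', 'MONDO:0020028',
                       'MONDO:0000270', 'MONDO:0005749', 'MONDO:0002267', 'MONDO:0011751',
                       'MONDO:0002232', 'MONDO:0005990']

diabetes_shared, _, _ = split_overlap(diabetes_intersection_ids, diabetes_enriched_ids)
asthma_shared, asthma_only_intersection, _ = split_overlap(asthma_intersection_ids, asthma_enriched_ids)

print('diabetes, in both:', diabetes_shared)
print('asthma, in both:', asthma_shared)
print('asthma, intersection only:', asthma_only_intersection)
```

In the notebook itself, the same lists could be taken from the data frames, for example `list(asthma_nodes['result_id'])` and `list(asthma_enriched_frame['id'])`.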
