# Workflow 1, Module 1, Questions 2 and 3

## What conditions present [symptoms]?
## Filter [conditions] to only keep ones with defined genetic causes.

In ROBOKOP, we don't do question 3 (filtering to genetic conditions) independently.  Instead, we have defined a subtype of disease called `genetic_condition`.  If we redo the analyses from the question 2 notebook, replacing `disease` with `genetic_condition`, then we will be answering questions 2 and 3 as one.  See below:

### Approach 1:  Quick to find conditions that have every symptom.

If this question means find any condition that has all of the given set of symptoms, then we can use the quick service.  Here we will do what's called a "star query" in the quick notebook. First, we'll define a quick function that takes a question and calls the service, then we'll define a function for creating the proper question format. Finally, we'll show some examples using symptom lists derived from the workflow1, module1 question 1 notebook.
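Conceptually, a star query of this kind amounts to intersecting, for each input phenotype, the set of conditions linked to it. The toy sketch below illustrates that idea only; the function `conditions_with_all` and the `PHENO:`/`COND:` identifiers are hypothetical placeholders, not ROBOKOP data or part of its API.

```python
def conditions_with_all(phenotypes, links):
    """Return the conditions linked to every phenotype in `phenotypes`.

    `links` maps a phenotype identifier to the set of conditions attached to it.
    """
    sets = [links.get(p, set()) for p in phenotypes]
    if not sets:
        return set()
    return set.intersection(*sets)


# Made-up links, for illustration only.
toy_links = {
    'PHENO:1': {'COND:A', 'COND:B'},
    'PHENO:2': {'COND:A', 'COND:C'},
    'PHENO:3': {'COND:A', 'COND:B', 'COND:C'},
}

shared = conditions_with_all(['PHENO:1', 'PHENO:2', 'PHENO:3'], toy_links)
print(sorted(shared))
```

Only a condition attached to all three phenotypes survives the intersection, which is why adding one more input phenotype can empty the result.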

In [36]:
import requests
import pandas as pd

In [37]:
def quick(question):
    url=f'http://robokop.renci.org:80/api/simple/quick/'
    response = requests.post(url,json=question)
    print( f"Return Status: {response.status_code}" )
    if response.status_code == 200:
        return response.json()
    return response

In [38]:
def make_star_question(types,curies,shared_type):
    """Create a question to find entities of shared_type that are linked to all of the nodes specified in the
    types and curies arrays."""
    question = {
                'rebuild': 'False',
                'machine_question': {
                    'nodes': [],
                    'edges': []
                }
            }
    question['machine_question']['nodes'].append( {'id': 0, 'type': shared_type})
    for i,t in enumerate(types):
        newnode = {'id': i+1, 'type': t}
        if curies[i] is not None:
            newnode['curie'] = curies[i]
        question['machine_question']['nodes'].append(newnode)
        question['machine_question']['edges'].append( {'source_id': 0, 'target_id': i+1})
    return question

In [39]:
def make_shared_symptom_question(curies):
    types = [ 'phenotypic_feature' for c in curies ]
    shared_type = 'genetic_condition'  #replaced disease
    return make_star_question(types,curies,shared_type)

We will use this function to pretty print quick results.

In [40]:
def extract_node_info(returnanswer):
    nodes = [{'name': answer['nodes'][0]['name'], 'id': answer['nodes'][0]['id']} for answer in returnanswer['answers']]
    return pd.DataFrame(nodes)


We can now pass a list of curies to make_shared_symptom_question, and then pass the result to quick.  We'll be using (arbitrarily) the top 5 phenotypes from ROBOKOP's answers to question 1.

In [41]:
# These are "Maturity-onset diabetes of the young", "Recurrent hypoglycemia", "Glucose intolerance", 
# "Beta-cell dysfunction", and "Hyperinsulinemia"
diabetes_symptoms=['HP:0004904', 'HP:0001988', 'HP:0000833', 'HP:0006279','HP:0000842']
diabetes_question = make_shared_symptom_question(diabetes_symptoms)
diabetes_question

{'machine_question': {'edges': [{'source_id': 0, 'target_id': 1},
   {'source_id': 0, 'target_id': 2},
   {'source_id': 0, 'target_id': 3},
   {'source_id': 0, 'target_id': 4},
   {'source_id': 0, 'target_id': 5}],
  'nodes': [{'id': 0, 'type': 'genetic_condition'},
   {'curie': 'HP:0004904', 'id': 1, 'type': 'phenotypic_feature'},
   {'curie': 'HP:0001988', 'id': 2, 'type': 'phenotypic_feature'},
   {'curie': 'HP:0000833', 'id': 3, 'type': 'phenotypic_feature'},
   {'curie': 'HP:0006279', 'id': 4, 'type': 'phenotypic_feature'},
   {'curie': 'HP:0000842', 'id': 5, 'type': 'phenotypic_feature'}]},
 'rebuild': 'False'}

In [42]:
diabetes_answer = quick(diabetes_question)
diabetes_nodes = extract_node_info(diabetes_answer)
diabetes_nodes

Return Status: 200


| | id | name |
|---|---|---|
| 0 | MONDO:0005066 | metabolic disease |
| 1 | MONDO:0019041 | rare genetic inherited tumor |
| 2 | MONDO:0004993 | carcinoma |
| 3 | MONDO:0015967 | rare genetic diabetes mellitus |
| 4 | MONDO:0015618 | genetic pancreatic disease |
| 5 | MONDO:0019214 | inborn carbohydrate metabolic disorder |
| 6 | MONDO:0015615 | rare genetic gastroenterological disease |
| 7 | MONDO:0015513 | rare genetic endocrine disease |


In [43]:
# These are "Exercise-induced asthma" (HP:0012652), "Allergic rhinitis" (HP:0003193),
# "Obstructive lung disease" (HP:0006536), "Status asthmaticus" (HP:0012653),
# and "Increased IgE level" (HP:0003212)
asthma_symptoms=['HP:0012652','HP:0003193','HP:0006536','HP:0012653','HP:0003212']
asthma_question = make_shared_symptom_question(asthma_symptoms)

In [44]:
asthma_answer = quick(asthma_question)
asthma_nodes = extract_node_info(asthma_answer)
asthma_nodes

Return Status: 500


TypeError: 'Response' object is not subscriptable

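The error above is not terribly pretty or informative (to be fixed). The service returned status 500, so `quick` handed back the raw `Response` object rather than parsed JSON, and `extract_node_info` failed when it tried to index it. What this tells us is that no results came back: there are no genetic conditions linked to every phenotype in this set. A less restrictive approach is called for (see below).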
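Until the service reports empty results more gracefully, the extraction step can be made tolerant of a failed call. The following is a minimal sketch, not part of the original notebook: `extract_node_records` is a hypothetical helper that assumes the same answer layout used by `extract_node_info`, and `mock_answer` is made-up data. It returns a plain list of records, which can be passed to `pd.DataFrame` as before.

```python
def extract_node_records(result):
    """Return name/id records for the first node of each answer.

    Returns an empty list when `result` is not a parsed answer dictionary,
    for example when the service call did not return status 200.
    """
    if not isinstance(result, dict) or 'answers' not in result:
        return []
    return [{'name': answer['nodes'][0]['name'], 'id': answer['nodes'][0]['id']}
            for answer in result['answers']]


# Made-up answer in the layout assumed above, for illustration only.
mock_answer = {'answers': [{'nodes': [{'name': 'example condition', 'id': 'COND:A'}]}]}

records = extract_node_records(mock_answer)
no_records = extract_node_records(object())  # stands in for a failed call
print(records, no_records)
```

An empty list then yields an empty table rather than a `TypeError`.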

### Approach 2: Enriched Expansion

A downside of the intersection approach above is that every condition returned must be attached to every input phenotype. The results can therefore be very sensitive to exactly which phenotypes are chosen as input. A less sensitive alternative is to find conditions that are enriched for the list of phenotypes. This lets us be more liberal about which phenotypes we include and which conditions are returned. We'll use the same lists of phenotypes for illustration, but it would also make sense to go back to those lists and be more inclusive. We'll also use `include_descendants=True`, but experimentation with that option would be worthwhile.
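To give a feel for what "enriched" can mean, the sketch below computes a hypergeometric tail probability, a common statistic for over-representation. This is an assumption made for illustration: the notebook does not state which statistic the ROBOKOP enrichment service uses, and the function name `hypergeom_tail` and the counts in the example are made up.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n items without replacement from N items, K of which are hits.

    Here N would be the number of phenotypes in the graph, K the number linked
    to a given condition, n the size of the input list, and k the number of
    input phenotypes linked to that condition.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Made-up counts: 3 of 5 input phenotypes hit a condition linked to 20 of 1000 phenotypes.
p = hypergeom_tail(3, 5, 20, 1000)
print(f"{p:.3e}")
```

Under this kind of test a condition does not need to be linked to every input phenotype; it only needs more links than chance would predict, which is what makes the approach less sensitive to the exact input list.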

In [45]:
def enrichment(type1,identlist,type2,threshhold=None,maxresults=None,numtype1=None,include_descendants=None,rebuild=None):
    url=f'http://robokop.renci.org/api/simple/enriched/{type1}/{type2}'
    params = { 'threshhold': threshhold, 'maxresults': maxresults, 
              'num_type1':numtype1, 'identifiers': identlist, 
              'include_descendants':include_descendants, 'rebuild': rebuild }
    params = { k:v for k,v in params.items() if v is not None }
    response=requests.post(url, json = params)
    print( f'Return Status: {response.status_code}' )
    if response.status_code == 200:
        return response.json()
    return []

In [46]:
diabetes_enriched = enrichment('phenotypic_feature',diabetes_symptoms,'genetic_condition',threshhold=0.01,include_descendants=True)
diabetes_enriched_frame = pd.DataFrame(diabetes_enriched)
diabetes_enriched_frame

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | MONDO:0001076 | glucose intolerance | 2.038003e-08 |
| 1 | MONDO:0015967 | rare genetic diabetes mellitus | 9.380591e-08 |
| 2 | MONDO:0011236 | hyperinsulinism due to glucokinase deficiency | 1.994226e-07 |
| 3 | MONDO:0004993 | carcinoma | 7.123472e-07 |
| 4 | MONDO:0015618 | genetic pancreatic disease | 7.123472e-07 |
| 5 | MONDO:0001324 | hyperandrogenism | 2.159753e-06 |
| 6 | MONDO:0005803 | hyperinsulinemic hypoglycemia (disease) | 5.350974e-06 |
| 7 | MONDO:0017688 | disorder of glycolysis | 9.001780e-06 |
| 8 | MONDO:0007540 | multiple endocrine neoplasia type 1 | 1.627879e-05 |
| 9 | MONDO:0012819 | diabetic ketoacidosis | 1.877935e-05 |


In [47]:
asthma_enriched = enrichment('phenotypic_feature',asthma_symptoms,'genetic_condition',threshhold=0.1,include_descendants=True)
asthma_enriched_frame = pd.DataFrame(asthma_enriched)
asthma_enriched_frame

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | MONDO:0016575 | primary ciliary dyskinesia | 0.000012 |
| 1 | MONDO:0018395 | male infertility due to sperm motility disorder | 0.000015 |
| 2 | MONDO:0018390 | male infertility due to sperm disorder | 0.000170 |
| 3 | MONDO:0009735 | Netherton syndrome | 0.000360 |
| 4 | MONDO:0007818 | autosomal dominant hyper-IgE syndrome | 0.000825 |
| 5 | MONDO:0013282 | alpha 1-antitrypsin deficiency | 0.000853 |
| 6 | MONDO:0008575 | nicotine dependence | 0.001127 |
| 7 | MONDO:0008903 | lung cancer | 0.001298 |
| 8 | MONDO:0009088 | deafness, neural, with atypical atopic dermatitis | 0.001535 |
| 9 | MONDO:0007535 | emphysema, hereditary pulmonary | 0.001535 |
