# Workflow 1, Module 1, Question 1

## What are the defining symptoms / phenotypes of [condition x]?

### Approach 1:  Expand

ROBOKOP's expand operation returns all phenotypes associated with a condition, ranked by its standard algorithm.  It is then up to the user to set a threshold on how many of the top-ranked phenotypes are worth keeping.

In [10]:
import requests
import pandas as pd

In [4]:
def expand(type1,identifier,type2,rebuild=None,csv=None,predicate=None):
    """Query the ROBOKOP simple expand endpoint for type2 nodes linked to one type1 node."""
    url=f'http://robokop.renci.org:80/api/simple/expand/{type1}/{identifier}/{type2}'
    params = {'rebuild': rebuild,
              'csv'    : csv,
              'predicate': predicate}
    # Send only the options the caller actually set
    params = { k:v for k,v in params.items() if v is not None }
    response = requests.get(url,params=params)
    print( f'Return Status: {response.status_code}' )
    if response.status_code == 200:
        return response.json()
    # Any non-200 status yields an empty result
    return []

Note that here we return a simplified form of the output for display.  To get the identifiers, it may be more useful to call with csv=False and parse the result.

In [7]:
diabetes = 'MONDO:0005148' # type 2 diabetes
diabetes_phenotypes = expand('disease',diabetes,'phenotypic_feature',csv=True)
n = 20  # number of top-ranked phenotypes to display
print( f"Returned {len(diabetes_phenotypes)} phenotypes" )
print( f"Top {n}:")
diabetes_phenotypes[:n]

Return Status: 200
Returned 194 phenotypes
Top 20:


['Maturity-onset diabetes of the young(HP:0004904)',
 'Recurrent hypoglycemia(HP:0001988)',
 'Glucose intolerance(HP:0000833)',
 'Beta-cell dysfunction(HP:0006279)',
 'Hyperinsulinemia(HP:0000842)',
 'Increased adipose tissue(HP:0009126)',
 'Decreased HDL cholesterol concentration(HP:0003233)',
 'Maternal diabetes(HP:0009800)',
 'Acanthosis nigricans(HP:0000956)',
 'Insulin-resistant diabetes mellitus(HP:0000831)',
 'Hyperglycemia(HP:0003074)',
 'Ketoacidosis(HP:0001993)',
 'Accelerated atherosclerosis(HP:0004943)',
 'Hypoglycemia(HP:0001943)',
 'Type I diabetes mellitus(HP:0100651)',
 'Fasting hyperinsulinemia(HP:0008283)',
 'Hypertriglyceridemia(HP:0002155)',
 'Glycosuria(HP:0003076)',
 'Increased body weight(HP:0004324)',
 'Diabetic ketoacidosis(HP:0001953)']
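
Each entry in the list above has the form `name(identifier)`, so when only the csv=True output is at hand, the identifiers can be recovered by splitting off the final parenthesized group. The following is a minimal sketch: `split_label` is a hypothetical helper, not part of ROBOKOP, and it assumes every entry ends with a single parenthesized identifier such as `(HP:0004904)`.

```python
import re

# Assumption: each display string ends with "(PREFIX:LOCAL_ID)", as in the output above.
_LABEL_RE = re.compile(r'^(?P<name>.*)\((?P<curie>[^()\s]+:[^()\s]+)\)$')

def split_label(label):
    """Split 'Name(HP:0000000)' into ('Name', 'HP:0000000'). Hypothetical helper."""
    match = _LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f'Unrecognized label format: {label!r}')
    return match.group('name').strip(), match.group('curie')

# Three entries copied from the type 2 diabetes output above
sample = ['Maturity-onset diabetes of the young(HP:0004904)',
          'Recurrent hypoglycemia(HP:0001988)',
          'Glucose intolerance(HP:0000833)']
pairs = [split_label(s) for s in sample]
ids = [curie for _, curie in pairs]
print(ids)
```

The resulting identifiers can then be passed to other calls that take identifiers, without a second request using csv=False.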

In [8]:
asthma = 'MONDO:0004979' # asthma
asthma_phenotypes = expand('disease',asthma,'phenotypic_feature',csv=True)
n = 20  # number of top-ranked phenotypes to display
print( f"Returned {len(asthma_phenotypes)} phenotypes" )
print( f"Top {n}:")
asthma_phenotypes[:n]

Return Status: 200
Returned 128 phenotypes
Top 20:


['Exercise-induced asthma(HP:0012652)',
 'Allergic rhinitis(HP:0003193)',
 'Obstructive lung disease(HP:0006536)',
 'Status asthmaticus(HP:0012653)',
 'Increased IgE level(HP:0003212)',
 'Chronic bronchitis(HP:0004469)',
 'Chronic rhinitis(HP:0002257)',
 'Nasal polyposis(HP:0100582)',
 'Allergic conjunctivitis(HP:0007879)',
 'Bronchiolitis(HP:0011950)',
 'Asthma(HP:0002099)',
 'Eosinophilia(HP:0001880)',
 'Eczema(HP:0000964)',
 'Hypersensitivity pneumonitis(HP:0006516)',
 'Recurrent respiratory infections(HP:0002205)',
 'Chronic sinusitis(HP:0011109)',
 'Atopic dermatitis(HP:0001047)',
 'Bronchiectasis(HP:0002110)',
 'IgE deficiency(HP:0005479)',
 'Recurrent bronchitis(HP:0002837)']

### Approach 2: Enriched Expansion (No Descendants)

Here we start with a condition and find phenotypes that have a high enrichment factor.  That is, they are linked to the condition more often than would be expected by chance.  Because we are enriching on a single input, we are really finding the phenotypes that are linked most specifically to this condition.
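
This notebook does not state which statistic ROBOKOP uses to compute enrichment, so the following only illustrates the idea, assuming a hypergeometric test (a common choice for enrichment analysis). The function name `enrichment_p` and all counts are hypothetical. With a single input condition, the p-value under this assumption reduces to the fraction of all conditions linked to the phenotype, which is why rare, condition-specific phenotypes score best.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """P(X >= k) under a hypergeometric null (illustrative; not ROBOKOP's documented method).

    N: total number of conditions in the graph
    K: conditions linked to the phenotype
    n: size of the input set
    k: input conditions linked to the phenotype
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical counts: 10,000 conditions, one input condition (n=1) that is linked (k=1)
p_rare = enrichment_p(k=1, n=1, K=5, N=10_000)        # phenotype linked to 5 conditions
p_common = enrichment_p(k=1, n=1, K=2_000, N=10_000)  # phenotype linked to 2,000 conditions
print(p_rare, p_common)
```

Under these assumed counts, the phenotype linked to few conditions gets a much smaller p-value than the widely shared one, even though both are linked to the input.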

In [9]:
def enrichment(type1,identlist,type2,threshhold=None,maxresults=None,numtype1=None,include_descendants=None,rebuild=None):
    """Query the ROBOKOP simple enriched endpoint for type2 nodes enriched for a list of type1 identifiers."""
    url=f'http://robokop.renci.org/api/simple/enriched/{type1}/{type2}'
    # The 'threshhold' key is kept exactly as spelled in the original request
    params = { 'threshhold': threshhold, 'maxresults': maxresults,
              'num_type1':numtype1, 'identifiers': identlist,
              'include_descendants':include_descendants, 'rebuild': rebuild }
    # Send only the options the caller actually set
    params = { k:v for k,v in params.items() if v is not None }
    response=requests.post(url, json = params)
    print( f'Return Status: {response.status_code}' )
    if response.status_code == 200:
        return response.json()
    # Any non-200 status yields an empty result
    return []

In [11]:
enriched_diabetes_phenotypes = enrichment('disease',[diabetes],'phenotypic_feature')
pd.DataFrame(enriched_diabetes_phenotypes)

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | HP:0000745 | Diminished motivation | 0.002678 |
| 1 | HP:0005974 | Episodic ketoacidosis | 0.004603 |
| 2 | HP:0006279 | Beta-cell dysfunction | 0.005691 |
| 3 | HP:0100739 | Bulimia | 0.005859 |
| 4 | HP:0008711 | Benign prostatic hyperplasia | 0.006361 |
| 5 | HP:0001993 | Ketoacidosis | 0.008704 |
| 6 | HP:0001095 | Hypertensive retinopathy | 0.008955 |
| 7 | HP:0009126 | Increased adipose tissue | 0.009123 |
| 8 | HP:0004904 | Maturity-onset diabetes of the young | 0.009960 |
| 9 | HP:0008887 | Adipose tissue loss | 0.009960 |


In [12]:
enriched_asthma_phenotypes = enrichment('disease',[asthma],'phenotypic_feature')
pd.DataFrame(enriched_asthma_phenotypes)

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | HP:0012652 | Exercise-induced asthma | 0.000335 |
| 1 | HP:0012653 | Status asthmaticus | 0.000418 |
| 2 | HP:0010865 | Oppositional defiant disorder | 0.005775 |
| 3 | HP:0007879 | Allergic conjunctivitis | 0.008704 |
| 4 | HP:0002257 | Chronic rhinitis | 0.010211 |
| 5 | HP:0005943 | Respiratory arrest | 0.010546 |
| 6 | HP:0001686 | Loss of voice | 0.011215 |
| 7 | HP:0005972 | Respiratory acidosis | 0.01155 |
| 8 | HP:0100033 | Tics | 0.012136 |
| 9 | HP:0100845 | Anaphylactic shock | 0.013308 |


### Approach 3: Enriched Expansion with Descendants

Sometimes it can be useful to enrich on a larger set of inputs.  Here we use the descendants of the input condition to generate the input set.

In [13]:
desc_enriched_diabetes_phenotypes = enrichment('disease',[diabetes],'phenotypic_feature',include_descendants=True)
pd.DataFrame(desc_enriched_diabetes_phenotypes)

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | HP:0004904 | Maturity-onset diabetes of the young | 0.000004 |
| 1 | HP:0000831 | Insulin-resistant diabetes mellitus | 0.000104 |
| 2 | HP:0005974 | Episodic ketoacidosis | 0.000124 |
| 3 | HP:0006279 | Beta-cell dysfunction | 0.000190 |
| 4 | HP:0003074 | Hyperglycemia | 0.000305 |
| 5 | HP:0001993 | Ketoacidosis | 0.000445 |
| 6 | HP:0100651 | Type I diabetes mellitus | 0.000505 |
| 7 | HP:0008887 | Adipose tissue loss | 0.000583 |
| 8 | HP:0001950 | Respiratory alkalosis | 0.000602 |
| 9 | HP:0007485 | Absence of subcutaneous fat | 0.000674 |


In [14]:
desc_enriched_asthma_phenotypes = enrichment('disease',[asthma],'phenotypic_feature',include_descendants=True)
pd.DataFrame(desc_enriched_asthma_phenotypes)

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | HP:0012653 | Status asthmaticus | 4.202662e-07 |
| 1 | HP:0005943 | Respiratory arrest | 0.0003287244 |
| 2 | HP:0005972 | Respiratory acidosis | 0.0003943284 |
| 3 | HP:0002099 | Asthma | 0.0005570363 |
| 4 | HP:0002913 | Myoglobinuria | 0.0006337839 |
| 5 | HP:0003750 | Increased muscle fatiguability | 0.0009030892 |
| 6 | HP:0012652 | Exercise-induced asthma | 0.0010041 |
| 7 | HP:0004469 | Chronic bronchitis | 0.001953331 |
| 8 | HP:0003201 | Rhabdomyolysis | 0.002267048 |
| 9 | HP:0011950 | Bronchiolitis | 0.003561395 |
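
One way to see the effect of including descendants is to line up the p-values from Approaches 2 and 3 for phenotypes that appear in both result sets. The sketch below assumes each result is a list of records with `id`, `name`, and `p` keys, as the table columns suggest; `compare_p` is a hypothetical helper, and the rows are a subset copied from the type 2 diabetes results above.

```python
def compare_p(without_desc, with_desc):
    """Pair up p-values for phenotypes present in both result lists (hypothetical helper)."""
    by_id = {row['id']: row for row in with_desc}
    shared = []
    for row in without_desc:
        other = by_id.get(row['id'])
        if other is not None:
            shared.append({'id': row['id'], 'name': row['name'],
                           'p_without': row['p'], 'p_with': other['p']})
    # Most significant (smallest p with descendants) first
    return sorted(shared, key=lambda r: r['p_with'])

# Subset of the type 2 diabetes rows from Approach 2 (no descendants)
without_desc = [
    {'id': 'HP:0000745', 'name': 'Diminished motivation', 'p': 0.002678},
    {'id': 'HP:0005974', 'name': 'Episodic ketoacidosis', 'p': 0.004603},
    {'id': 'HP:0006279', 'name': 'Beta-cell dysfunction', 'p': 0.005691},
    {'id': 'HP:0004904', 'name': 'Maturity-onset diabetes of the young', 'p': 0.009960},
]
# Subset of the type 2 diabetes rows from Approach 3 (with descendants)
with_desc = [
    {'id': 'HP:0004904', 'name': 'Maturity-onset diabetes of the young', 'p': 0.000004},
    {'id': 'HP:0000831', 'name': 'Insulin-resistant diabetes mellitus', 'p': 0.000104},
    {'id': 'HP:0005974', 'name': 'Episodic ketoacidosis', 'p': 0.000124},
    {'id': 'HP:0006279', 'name': 'Beta-cell dysfunction', 'p': 0.000190},
]

shared = compare_p(without_desc, with_desc)
for row in shared:
    print(row['id'], row['name'], row['p_without'], row['p_with'])
```

In this subset, every phenotype found by both approaches has a smaller p-value once descendants are included, while phenotypes such as Diminished motivation appear only in the top results without descendants.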
