# Workflow 1, Module 1, Question 1

## What are the defining symptoms / phenotypes of [condition x]?

### Approach 1: Expand

ROBOKOP's `expand` endpoint returns all phenotypes associated with a condition, ranked by ROBOKOP's default ranking. It is then up to the user to choose a threshold, that is, how many of the top-ranked phenotypes are interesting or acceptable.

```python
import requests
import pandas as pd
```

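```python
def expand(type1, identifier, type2, rebuild=None, output_format=None, predicate=None):
    """Call ROBOKOP's simple expand endpoint: find type2 nodes linked to the given type1 node."""
    url = f'http://robokop.renci.org:80/api/simple/expand/{type1}/{identifier}/{type2}'
    params = {'rebuild'       : rebuild,
              'output_format' : output_format,
              'predicate'     : predicate}
    # Only send the optional query parameters that were actually set
    params = {k: v for k, v in params.items() if v is not None}
    response = requests.get(url, params=params)
    print(f'Return Status: {response.status_code}')
    if response.status_code == 200:
        # With output_format='csv' the decoded body is a single CSV string
        return response.json()
    # On failure an empty list is returned; callers that .split() the result should check for this
    return []
```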

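Note that here we request the simple CSV output format for display. To work with the full result (for example, to pull out identifiers programmatically), it may be more useful to call `expand` without `output_format='csv'` (the function has no `csv` argument) and parse the JSON that comes back.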

```python
diabetes = 'MONDO:0005148'  # type 2 diabetes
# The CSV output comes back as one string; split it into lines (the first line is the header)
diabetes_phenotypes = expand('disease', diabetes, 'phenotypic_feature', output_format='csv').split('\n')
n = 20
print(f"Returned {len(diabetes_phenotypes)} phenotypes")  # this count includes the header line
print(f"Top {n}:")
diabetes_phenotypes[:(n+1)]  # +1 because the first line is the header
```

```
Return Status: 200
Returned 251 phenotypes
Top 20:

['n0,n1',
 'MONDO:0005148,HP:0002578',
 'MONDO:0005148,HP:0000093',
 'MONDO:0005148,HP:0001824',
 'MONDO:0005148,HP:0004324',
 'MONDO:0005148,HP:0001513',
 'MONDO:0005148,HP:0000010',
 'MONDO:0005148,HP:0000831',
 'MONDO:0005148,HP:0000708',
 'MONDO:0005148,HP:0004325',
 'MONDO:0005148,HP:0001268',
 'MONDO:0005148,HP:0003881',
 'MONDO:0005148,HP:0001325',
 'MONDO:0005148,HP:0000166',
 'MONDO:0005148,HP:0001941',
 'MONDO:0005148,HP:0011675',
 'MONDO:0005148,HP:0002870',
 'MONDO:0005148,HP:0001943',
 'MONDO:0005148,HP:0001262',
 'MONDO:0005148,HP:0001271',
 'MONDO:0005148,HP:0003774']
```
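Instead of splitting on newlines and offsetting the slice by one for the header, the CSV string can be handed to pandas, which consumes the header row itself; taking the first `n` rows then applies the user-chosen threshold directly. A minimal sketch, assuming the string has the `n0,n1` layout shown above; `top_phenotypes` is a hypothetical helper, and the sample is the first few lines of the type 2 diabetes result.

```python
import io

import pandas as pd

def top_phenotypes(csv_text, n):
    """Parse the n0,n1 CSV string from expand(..., output_format='csv') and keep the top n rows."""
    df = pd.read_csv(io.StringIO(csv_text))  # the header line becomes the column names
    # Give the generic node columns descriptive names
    df = df.rename(columns={'n0': 'disease', 'n1': 'phenotype'})
    return df.head(n)  # rows arrive already ranked, so head(n) is the top-n cut

# First few lines of the type 2 diabetes result shown above
sample = "n0,n1\nMONDO:0005148,HP:0002578\nMONDO:0005148,HP:0000093\nMONDO:0005148,HP:0001824\n"
top = top_phenotypes(sample, 2)
print(top['phenotype'].tolist())  # ['HP:0002578', 'HP:0000093']
```

In the notebook itself you would pass the unsplit return value of `expand(...)` rather than the hard-coded sample.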

```python
asthma = 'MONDO:0004979'  # asthma
asthma_phenotypes = expand('disease', asthma, 'phenotypic_feature', output_format='csv').split('\n')
n = 20
print(f"Returned {len(asthma_phenotypes)} phenotypes")  # this count includes the header line
print(f"Top {n}:")
asthma_phenotypes[:(n+1)]  # +1 because the first line is the header
```

```
Return Status: 200
Returned 251 phenotypes
Top 20:

['n0,n1',
 'MONDO:0004979,HP:0100749',
 'MONDO:0004979,HP:0001609',
 'MONDO:0004979,HP:0002094',
 'MONDO:0004979,HP:0002093',
 'MONDO:0004979,HP:0001337',
 'MONDO:0004979,HP:0002870',
 'MONDO:0004979,HP:0001618',
 'MONDO:0004979,HP:0002076',
 'MONDO:0004979,HP:0001025',
 'MONDO:0004979,HP:0000709',
 'MONDO:0004979,HP:0006510',
 'MONDO:0004979,HP:0002133',
 'MONDO:0004979,HP:0002960',
 'MONDO:0004979,HP:0002206',
 'MONDO:0004979,HP:0001047',
 'MONDO:0004979,HP:0006516',
 'MONDO:0004979,HP:0011947',
 'MONDO:0004979,HP:0100033',
 'MONDO:0004979,HP:0002110',
 'MONDO:0004979,HP:0100633']
```
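A quick way to gauge how condition-specific the top `expand` hits are is to check which phenotypes show up in both ranked lists. A minimal sketch; `top_phenotype_ids` and `shared_phenotypes` are hypothetical helpers, and the input lines are rebuilt from the two top-20 lists printed above (in the notebook you would pass `diabetes_phenotypes` and `asthma_phenotypes` directly).

```python
def top_phenotype_ids(lines, n):
    """Phenotype IDs (the n1 column) from the first n data rows of expand's CSV lines."""
    rows = [line for line in lines[1:] if line]  # skip the 'n0,n1' header and any blank lines
    return [row.split(',')[1] for row in rows[:n]]

def shared_phenotypes(lines_a, lines_b, n):
    """Sorted phenotype IDs that appear in the top n of both result lists."""
    return sorted(set(top_phenotype_ids(lines_a, n)) & set(top_phenotype_ids(lines_b, n)))

# Top-20 phenotype IDs copied from the two outputs above
diabetes_hp = ('HP:0002578 HP:0000093 HP:0001824 HP:0004324 HP:0001513 HP:0000010 HP:0000831 '
               'HP:0000708 HP:0004325 HP:0001268 HP:0003881 HP:0001325 HP:0000166 HP:0001941 '
               'HP:0011675 HP:0002870 HP:0001943 HP:0001262 HP:0001271 HP:0003774').split()
asthma_hp = ('HP:0100749 HP:0001609 HP:0002094 HP:0002093 HP:0001337 HP:0002870 HP:0001618 '
             'HP:0002076 HP:0001025 HP:0000709 HP:0006510 HP:0002133 HP:0002960 HP:0002206 '
             'HP:0001047 HP:0006516 HP:0011947 HP:0100033 HP:0002110 HP:0100633').split()
# Rebuild lines in the same 'n0,n1' shape that expand(...).split produces
diabetes_lines = ['n0,n1'] + ['MONDO:0005148,' + hp for hp in diabetes_hp]
asthma_lines = ['n0,n1'] + ['MONDO:0004979,' + hp for hp in asthma_hp]

print(shared_phenotypes(diabetes_lines, asthma_lines, 20))  # ['HP:0002870']
```

Only one phenotype is shared between the two top-20 lists, so most of the top `expand` hits differ between the two conditions; the enrichment approaches below make that specificity explicit.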

### Approach 2: Enriched Expansion (No Descendants)

Here we start with a condition and find phenotypes with a high enrichment factor, that is, phenotypes linked to the condition at a higher rate than would be expected by chance. The `p` column in the results appears to be the enrichment p-value, so smaller values indicate stronger enrichment. Because the enrichment uses a single input condition, this effectively finds the phenotypes that are linked specifically to this condition rather than to many conditions.
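The notebook does not state which statistical test the enrichment endpoint uses. As an assumption, a one-sided hypergeometric test is a common choice for this kind of enrichment, and the sketch below (`hypergeom_pvalue` is a hypothetical helper; all counts are made up) shows why a single input favours rare, condition-specific phenotypes: with one input that has the phenotype, the p value reduces to the fraction of all conditions that carry it. The single-input p values in the tables below step in multiples of roughly 0.000073 (0.000146, 0.000219, 0.000292, ...), which would fit this picture, though that is an inference from the numbers rather than documented behaviour.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p value P(X >= k).

    N: number of conditions in the background set
    K: number of those conditions linked to the phenotype
    n: number of input conditions
    k: number of input conditions linked to the phenotype
    """
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Hypothetical background of 10,000 conditions.
# With a single input that has the phenotype, p reduces to K / N,
# so phenotypes shared by only a few conditions score best.
print(hypergeom_pvalue(1, 1, 3, 10000))    # 0.0003
print(hypergeom_pvalue(1, 1, 500, 10000))  # 0.05
# When many inputs share the phenotype (as with descendants in Approach 3), p drops much further.
print(hypergeom_pvalue(10, 20, 50, 10000) < 1e-12)  # True
```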

```python
def enrichment(type1, identlist, type2, threshhold=None, maxresults=None, numtype1=None,
               include_descendants=None, rebuild=None):
    """POST a list of type1 identifiers to ROBOKOP's simple enrichment endpoint for type2 nodes."""
    url = f'http://robokop.renci.org/api/simple/enriched/{type1}/{type2}'
    params = {'threshhold': threshhold, 'max_results': maxresults,
              'num_type1': numtype1, 'identifiers': identlist,
              'include_descendants': include_descendants, 'rebuild': rebuild}
    # Drop unset options rather than sending them as nulls
    params = {k: v for k, v in params.items() if v is not None}
    response = requests.post(url, json=params)
    print(f'Return Status: {response.status_code}')
    if response.status_code == 200:
        return response.json()
    return []  # empty result on failure
```

```python
enriched_diabetes_phenotypes = enrichment('disease', [diabetes], 'phenotypic_feature')
pd.DataFrame(enriched_diabetes_phenotypes)
```

```
Return Status: 200
```

| | id | name | p |
|---|---|---|---|
| 0 | UMLS:C0206064 | Microvascular Angina | 0.000219 |
| 1 | MESH:D053201 | Urinary Bladder, Overactive | 0.000437 |
| 2 | NCIT:C53652 | Acute Coronary Syndrome | 0.000510 |
| 3 | NCIT:C94250 | Overweight | 0.000656 |
| 4 | NCIT:C76325 | Birth Weight | 0.001239 |
| 5 | HP:0012603 | Abnormal urine sodium concentration | 0.001458 |
| 6 | NCIT:C81328 | Body Weight | 0.002041 |
| 7 | HP:0008776 | Abnormal renal artery morphology | 0.002915 |
| 8 | HP:0025323 | Abnormal arterial physiology | 0.003207 |
| 9 | HP:0040063 | Decreased adipose tissue | 0.004373 |
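As with the ranked `expand` output, the user still has to decide where to cut the enrichment list. The `enrichment` helper forwards `threshhold` and `maxresults` to the server; filtering client-side on `p` is an alternative. A minimal sketch, assuming the response is a list of records with `id`, `name`, and `p` fields (as the DataFrame columns above suggest); `significant_phenotypes` is a hypothetical helper, the cutoff is arbitrary, and no multiple-testing correction is applied. The records are the first four rows of the table above, with `p` as displayed (rounded).

```python
import pandas as pd

def significant_phenotypes(records, cutoff):
    """Keep enrichment records whose p value is below cutoff, smallest p first."""
    df = pd.DataFrame(records)
    return df[df['p'] < cutoff].sort_values('p').reset_index(drop=True)

# First four rows of the type 2 diabetes enrichment table above
records = [
    {'id': 'UMLS:C0206064', 'name': 'Microvascular Angina', 'p': 0.000219},
    {'id': 'MESH:D053201', 'name': 'Urinary Bladder, Overactive', 'p': 0.000437},
    {'id': 'NCIT:C53652', 'name': 'Acute Coronary Syndrome', 'p': 0.000510},
    {'id': 'NCIT:C94250', 'name': 'Overweight', 'p': 0.000656},
]
print(significant_phenotypes(records, 0.0005)['name'].tolist())
# ['Microvascular Angina', 'Urinary Bladder, Overactive']
```

In the notebook, `significant_phenotypes(enriched_diabetes_phenotypes, 0.0005)` would apply the same cut to the full response.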


```python
enriched_asthma_phenotypes = enrichment('disease', [asthma], 'phenotypic_feature')
pd.DataFrame(enriched_asthma_phenotypes)
```

```
Return Status: 200
```

| | id | name | p |
|---|---|---|---|
| 0 | HP:0012417 | Hypocapnia | 0.000146 |
| 1 | HP:0025095 | Sneeze | 0.000146 |
| 2 | HP:0012652 | Exercise-induced asthma | 0.000292 |
| 3 | HP:0012653 | Status asthmaticus | 0.000364 |
| 4 | HP:0012416 | Hypercapnia | 0.000364 |
| 5 | UMLS:C0026635 | Mouth Breathing | 0.000510 |
| 6 | NCIT:C116315 | Snoring | 0.000583 |
| 7 | EFO:1001793 | fetal hypoxia | 0.000729 |
| 8 | UMLS:C0600467 | Neurogenic Inflammation | 0.000802 |
| 9 | NCIT:C76325 | Birth Weight | 0.001239 |


### Approach 3: Enriched Expansion with Descendants

Sometimes it is useful to enrich on a larger set of inputs. Here, setting `include_descendants=True` builds the input set from the descendants of the input condition (its more specific subtypes in the disease ontology).
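Conceptually, a descendant set is the transitive closure over the ontology's subclass links. A minimal breadth-first sketch, assuming a plain parent-to-children mapping; the hierarchy and the `descendants` helper are hypothetical, and this notebook does not show how the server actually traverses the ontology or whether it keeps the starting condition in the set (the sketch keeps it).

```python
from collections import deque

def descendants(root, children):
    """Return root plus every node reachable through child links (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # a node with several parents is visited once
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical toy hierarchy (not real MONDO content)
children = {
    'parent_disease': ['subtype_a', 'subtype_b'],
    'subtype_a': ['subtype_a1'],
    'subtype_b': ['subtype_a1'],  # ontologies allow multiple parents
}
print(sorted(descendants('parent_disease', children)))
# ['parent_disease', 'subtype_a', 'subtype_a1', 'subtype_b']
```

The `seen` set is what keeps nodes with multiple parents from being counted twice, which matters when the resulting set is used as the enrichment input.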

```python
desc_enriched_diabetes_phenotypes = enrichment('disease', [diabetes], 'phenotypic_feature', include_descendants=True)
pd.DataFrame(desc_enriched_diabetes_phenotypes)
```

```
Return Status: 200
```

| | id | name | p |
|---|---|---|---|
| 0 | HP:0011014 | Abnormal glucose homeostasis | 4.522681e-21 |
| 1 | HP:0011013 | Abnormality of carbohydrate metabolism/homeost... | 3.590805e-20 |
| 2 | HP:0000819 | Diabetes mellitus | 1.813143e-18 |
| 3 | HP:0000818 | Abnormality of the endocrine system | 5.391267e-11 |
| 4 | HP:0100651 | Type I diabetes mellitus | 2.620584e-09 |
| 5 | HP:0012337 | Abnormal homeostasis | 2.969068e-09 |
| 6 | HP:0030082 | Abnormal drinking behavior | 3.289071e-08 |
| 7 | HP:0006217 | Limited mobility of proximal interphalangeal j... | 7.367831e-07 |
| 8 | HP:0410050 | Decreased level of 1,5 anhydroglucitol in serum | 1.512499e-06 |
| 9 | HP:0001993 | Ketoacidosis | 1.551586e-06 |


```python
desc_enriched_asthma_phenotypes = enrichment('disease', [asthma], 'phenotypic_feature', include_descendants=True)
pd.DataFrame(desc_enriched_asthma_phenotypes)
```

```
Return Status: 200
```

| | id | name | p |
|---|---|---|---|
| 0 | HP:0100326 | Immunologic hypersensitivity | 5.299831e-08 |
| 1 | HP:0012653 | Status asthmaticus | 1.062095e-06 |
| 2 | HP:0002099 | Asthma | 2.217554e-06 |
| 3 | HP:0002795 | Functional respiratory abnormality | 0.0001208274 |
| 4 | HP:0002793 | Abnormal pattern of respiration | 0.0005807957 |
| 5 | HP:0012417 | Hypocapnia | 0.0007287567 |
| 6 | HP:0025095 | Sneeze | 0.0007287567 |
| 7 | HP:0011948 | Acute respiratory tract infection | 0.0007706434 |
| 8 | HP:0012387 | Bronchitis | 0.0008347656 |
| 9 | HP:0010978 | Abnormality of immune system physiology | 0.001031134 |
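Phenotypes that rank highly both with and without descendants are arguably the most robust candidates for "defining" phenotypes. A minimal sketch that joins the two asthma result tables on `id` with pandas; the data are copied from the tables above (names kept only for the single-input table, so the merged frame has one `name` column), and the choice of an inner join on the top 10 is an illustrative assumption, not part of the original workflow.

```python
import pandas as pd

# Asthma results copied from the single-input and with-descendants tables above
single = pd.DataFrame({
    'id': ['HP:0012417', 'HP:0025095', 'HP:0012652', 'HP:0012653', 'HP:0012416',
           'UMLS:C0026635', 'NCIT:C116315', 'EFO:1001793', 'UMLS:C0600467', 'NCIT:C76325'],
    'name': ['Hypocapnia', 'Sneeze', 'Exercise-induced asthma', 'Status asthmaticus', 'Hypercapnia',
             'Mouth Breathing', 'Snoring', 'fetal hypoxia', 'Neurogenic Inflammation', 'Birth Weight'],
    'p': [0.000146, 0.000146, 0.000292, 0.000364, 0.000364,
          0.000510, 0.000583, 0.000729, 0.000802, 0.001239],
})
with_desc = pd.DataFrame({
    'id': ['HP:0100326', 'HP:0012653', 'HP:0002099', 'HP:0002795', 'HP:0002793',
           'HP:0012417', 'HP:0025095', 'HP:0011948', 'HP:0012387', 'HP:0010978'],
    'p': [5.299831e-08, 1.062095e-06, 2.217554e-06, 0.0001208274, 0.0005807957,
          0.0007287567, 0.0007287567, 0.0007706434, 0.0008347656, 0.001031134],
})

# Inner join on the identifier keeps only phenotypes present in both top-10 lists;
# the shared 'p' column is split into p_single and p_descendants
both = single.merge(with_desc, on='id', suffixes=('_single', '_descendants'))
print(both[['id', 'name', 'p_single', 'p_descendants']])
```

For asthma, three phenotypes (Hypocapnia, Sneeze, and Status asthmaticus) appear in both lists. In the notebook, the same merge can be run on `pd.DataFrame(enriched_asthma_phenotypes)` and `pd.DataFrame(desc_enriched_asthma_phenotypes)`, in which case both `name` columns also receive the suffixes.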
