# Workflow 1, Module 1 (condition similarity)

One approach to this module is not to define too tightly what happens at the subquestion level (in terms of enrichments, archetypes, and so on), but simply to pass the question to ROBOKOP and let its scoring bring the best answers to the top. Here we use the quick service to start from a disease, find relevant phenotypes, and from there find genetic conditions. The answers come back ranked by the score of their supporting paths.

For more details, see the "quick" notebook in greengamma/general.

First, we define a small function that calls the quick service, plus helpers that build the question and parse the answer into a table. Then we create the question, run it, and display the results for two examples: type 2 diabetes and asthma.

In [1]:
import requests
import pandas as pd

def quick(question, max_connectivity=None):
    """POST a machine question to the ROBOKOP quick service.

    Returns the decoded JSON on HTTP 200, otherwise the raw Response object.
    """
    url = 'http://robokop.renci.org:80/api/simple/quick'
    if max_connectivity is not None:
        # Cap the degree of nodes allowed in answer paths
        url += f'?max_connectivity={max_connectivity}'
    print(url)
    response = requests.post(url, json=question)
    print(f"Return Status: {response.status_code}")
    if response.status_code == 200:
        return response.json()
    return response
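Appending the query string by hand works for a single parameter. As a hedged alternative sketch, the standard library's `urllib.parse.urlencode` builds the same URL and escapes values for you. `build_quick_url` is a hypothetical helper, not part of ROBOKOP; the base URL is simply the one used in `quick` above.

```python
from urllib.parse import urlencode

# Base endpoint copied from the quick() function above
QUICK_URL = 'http://robokop.renci.org:80/api/simple/quick'

def build_quick_url(max_connectivity=None, base=QUICK_URL):
    """Hypothetical helper: return the quick-service URL with optional query parameters."""
    params = {}
    if max_connectivity is not None:
        params['max_connectivity'] = max_connectivity
    # urlencode handles escaping; with no parameters we return the bare endpoint
    return f'{base}?{urlencode(params)}' if params else base

print(build_quick_url(max_connectivity=1000))
print(build_quick_url())
```

With `requests`, passing `params={'max_connectivity': 1000}` to `requests.post` achieves the same result without building the string yourself.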

The basic machine question created below goes from a disease to a set of phenotypes to a genetic_condition. Marking the phenotype node as a set lets a single answer bind many phenotypes that together connect the disease to the condition.

In [2]:
def create_basic_question(disease_id):
    return {
    "machine_question": {
        "nodes": [
            {
                "id": "n0",
                "type": "disease",
                "curie": disease_id
            },
            {
                "id": "n1",
                "type": "phenotypic_feature",
                "set": True
            },
            {
                "id": "n2",
                "type": "genetic_condition"
            }
        ],
        "edges": [
            {
                "id": "e0",
                "source_id": "n0",
                "target_id": "n1"
            },
            {
                "id": "e1",
                "source_id": "n1",
                "target_id": "n2"
            }
        ]
    }
}
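To sanity-check a question before sending it, a small sketch like the following prints each edge as a hop between node types and confirms that every edge points at a declared node. `describe_question` is a hypothetical helper; it only assumes the `nodes`/`edges` layout used in `create_basic_question`, and the example is a trimmed copy of that structure so the sketch runs on its own.

```python
def describe_question(question):
    """Hypothetical helper: list each edge as 'source_type -> target_type'.

    Set nodes are marked with '*'. Raises ValueError if an edge refers to an
    undeclared node id.
    """
    mq = question['machine_question']
    nodes = {n['id']: n for n in mq['nodes']}

    def label(node_id):
        if node_id not in nodes:
            raise ValueError(f'edge refers to unknown node {node_id}')
        node = nodes[node_id]
        return node['type'] + ('*' if node.get('set') else '')

    return [f"{label(e['source_id'])} -> {label(e['target_id'])}" for e in mq['edges']]

# Same shape as create_basic_question('MONDO:0005148')
example = {"machine_question": {
    "nodes": [
        {"id": "n0", "type": "disease", "curie": "MONDO:0005148"},
        {"id": "n1", "type": "phenotypic_feature", "set": True},
        {"id": "n2", "type": "genetic_condition"},
    ],
    "edges": [
        {"id": "e0", "source_id": "n0", "target_id": "n1"},
        {"id": "e1", "source_id": "n1", "target_id": "n2"},
    ],
}}
for hop in describe_question(example):
    print(hop)
```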

In [3]:
def parse_answer(returnanswer):
    # Map knowledge-graph node ids to display names, falling back to the id
    kg_node_names = {n['id']: n.get('name', n['id']) for n in returnanswer['knowledge_graph']['nodes']}
    # One row per answer: the bound phenotype set (n1), the condition (n2), and the score
    answers = [{"phenotype_id": answer['node_bindings']['n1'],
                "phenotype_name": [kg_node_names[x] for x in answer['node_bindings']['n1']],
                "condition_id": answer['node_bindings']['n2'],
                "condition_name": kg_node_names[answer['node_bindings']['n2']],
                "score": answer['score']}
               for answer in returnanswer['answers']]
    return pd.DataFrame(answers)
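`parse_answer` assumes it receives a full response dictionary, but when the service finds nothing it can return the plain string `'No results found'`, as the two-path asthma query later in this notebook shows. Below is a minimal stdlib sketch of the same parsing logic with a guard for that case. `parse_rows` is a hypothetical name, and the toy response is made up purely to show the `knowledge_graph` and `node_bindings` shape that `parse_answer` relies on; it is not real ROBOKOP output.

```python
def parse_rows(response):
    """Hypothetical stdlib version of parse_answer that tolerates empty results.

    Returns a list of dicts (one per answer) instead of a DataFrame, and an
    empty list when the service returned a string such as 'No results found'.
    """
    if not isinstance(response, dict):
        return []
    names = {n['id']: n.get('name', n['id']) for n in response['knowledge_graph']['nodes']}
    rows = []
    for answer in response['answers']:
        bindings = answer['node_bindings']
        rows.append({
            'phenotype_id': bindings['n1'],
            'phenotype_name': [names[x] for x in bindings['n1']],
            'condition_id': bindings['n2'],
            'condition_name': names[bindings['n2']],
            'score': answer['score'],
        })
    return rows

# Toy response with made-up ids, only to illustrate the expected structure
toy = {
    'knowledge_graph': {'nodes': [
        {'id': 'TOY:1', 'name': 'phenotype one'},
        {'id': 'TOY:2'},  # no name: falls back to the id
        {'id': 'TOY:9', 'name': 'condition one'},
    ]},
    'answers': [
        {'node_bindings': {'n0': 'TOY:0', 'n1': ['TOY:1', 'TOY:2'], 'n2': 'TOY:9'}, 'score': 1.5},
    ],
}
print(parse_rows(toy))
print(parse_rows('No results found'))
```

Wrapping the result as `pd.DataFrame(parse_rows(...))` recovers the notebook's table, with an empty frame instead of an exception when nothing comes back.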

In [4]:
diabetes = 'MONDO:0005148' #type 2 diabetes
asthma = 'MONDO:0004979' #asthma

For both diabetes and asthma, let's make a question of the type above and run it. The max_connectivity option sets a maximum degree for any node in a path; it controls both how long the query takes to run and how specific the results are. 1000 is a decent across-the-board value.
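To make the idea of a degree cap concrete, here is a toy sketch that removes hub nodes whose degree exceeds a threshold before any paths are considered. It only illustrates the concept described above; it is not ROBOKOP's implementation, and the graph, node names, and `prune_hubs` helper are all made up.

```python
from collections import Counter

def prune_hubs(edges, max_connectivity):
    """Toy illustration: drop edges touching any node with degree > max_connectivity."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    hubs = {node for node, d in degree.items() if d > max_connectivity}
    return [(a, b) for a, b in edges if a not in hubs and b not in hubs]

# A made-up graph: 'hub_phenotype' links to many conditions, 'rare_phenotype' to few
edges = [
    ('disease', 'hub_phenotype'),
    ('disease', 'rare_phenotype'),
    ('hub_phenotype', 'cond_a'),
    ('hub_phenotype', 'cond_b'),
    ('hub_phenotype', 'cond_c'),
    ('rare_phenotype', 'cond_a'),
]
print(prune_hubs(edges, max_connectivity=3))
```

In this toy, the cap of 3 removes `hub_phenotype` (degree 4), leaving only the path through the more specific `rare_phenotype`; fewer edges also means less to search.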

In [18]:
diabetes_question = create_basic_question(diabetes)
diabetes_answer = quick(diabetes_question,max_connectivity=1000)

http://robokop.renci.org:80/api/simple/quick?max_connectivity=1000
Return Status: 200


In [None]:
asthma_question = create_basic_question(asthma)
asthma_answer = quick(asthma_question,max_connectivity=1000)

In [6]:
diabetes_frame = parse_answer(diabetes_answer)
diabetes_frame.head()

| | condition_id | condition_name | phenotype_id | phenotype_name | score |
|---|---|---|---|---|---|
| 0 | MONDO:0011565 | metabolic syndrome X | [HP:0000831, HP:0100753, HP:0003233, HP:000084... | [Insulin-resistant diabetes mellitus, Schizoph... | 65.627247 |
| 1 | MONDO:0011382 | sickle cell anemia | [HP:0000802, HP:0003146, HP:0001900, HP:000215... | [Impotence, Hypocholesterolemia, Increased hem... | 34.605712 |
| 2 | MONDO:0001076 | glucose intolerance | [HP:0003233, HP:0000833, HP:0002621, HP:000085... | [Decreased circulating high-density lipoprotei... | 31.84894 |
| 3 | MONDO:0006507 | hereditary hemochromatosis | [HP:0000842, HP:0002621, HP:0000855, HP:001209... | [Hyperinsulinemia, Atherosclerosis, Insulin re... | 31.190267 |
| 4 | MONDO:0008575 | nicotine dependence | [HP:0002621, HP:0000704, HP:0100785, HP:000073... | [Atherosclerosis, Periodontitis, Insomnia, Dis... | 30.738832 |


The results that come back make reasonable sense, especially metabolic syndrome X and glucose intolerance.  

In [7]:
asthma_frame = parse_answer(asthma_answer)
asthma_frame.head()

| | condition_id | condition_name | phenotype_id | phenotype_name | score |
|---|---|---|---|---|---|
| 0 | MONDO:0007186 | gastroesophageal reflux disease | [HP:0002110, HP:0004469, HP:0100021, HP:000287... | [Bronchiectasis, Chronic bronchitis, Cerebral ... | 21.838889 |
| 1 | MONDO:0010086 | sudden infant death syndrome | [HP:0002791, HP:0100710, HP:0002788, HP:000594... | [Hypoventilation, Impulsivity, Recurrent upper... | 21.767215 |
| 2 | MONDO:0015977 | agammaglobulinemia | [HP:0006510, HP:0002846, HP:0006528, HP:001195... | [Chronic obstructive pulmonary disease, Abnorm... | 20.901615 |
| 3 | MONDO:0013282 | alpha 1-antitrypsin deficiency | [HP:0006510, HP:0006536, HP:0100665, HP:001195... | [Chronic obstructive pulmonary disease, Obstru... | 20.091313 |
| 4 | MONDO:0001901 | selective IgG subclass deficiency | [HP:0002843, HP:0010701, HP:0002837, HP:001110... | [Abnormal T cell morphology, Abnormal immunogl... | 18.574985 |


The results for asthma look a bit odder, but we can dig into the phenotypes to see what is connecting the two diseases. For example, the phenotypes linking asthma to gastroesophageal reflux disease (row 0 of the table) include a reasonable number of respiratory phenotypes:

In [8]:
asthma_frame.loc[0,'phenotype_name']

['Bronchiectasis',
 'Chronic bronchitis',
 'Cerebral palsy',
 'Obstructive sleep apnea',
 'Sleep apnea',
 'Pneumothorax',
 'Stridor',
 'Otitis media',
 'Chronic sinusitis',
 'Hoarse voice',
 'Abnormality of the voice',
 'Abnormality of the middle ear',
 'Bronchitis',
 'Abnormal bronchus morphology',
 'Abnormal tracheobronchial morphology',
 'Abnormality of the upper respiratory tract',
 'Abnormality of the nasopharynx',
 'Abnormal pattern of respiration',
 'Abnormality of the pharynx',
 'Abnormal vascular physiology',
 'Abnormality of cardiovascular system electrophysiology',
 'Abnormality of esophagus morphology',
 'Abnormality of esophagus physiology',
 'Abnormality of the stomach',
 'Abnormality of the paranasal sinuses',
 'Abnormal social behavior',
 'Abnormal aggressive, impulsive or violent behavior',
 'Neoplasm of the peripheral nervous system',
 'Impairment in personality functioning',
 'Immunologic hypersensitivity',
 'Recurrent lower respiratory tract infections',
 'Recurrent
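To put a rough number on "respiratory phenotypes", a crude keyword filter over the names can help. The keyword list is an ad hoc assumption, not an ontology-based classification; a more careful check would walk the HPO hierarchy below its respiratory-system abnormality term. The sample names are copied from the output above.

```python
# Ad hoc substrings assumed to signal a respiratory phenotype (not exhaustive)
RESPIRATORY_KEYWORDS = ('bronch', 'respirat', 'apnea', 'stridor', 'pneumo',
                        'sinus', 'trache', 'pharyn', 'nasal', 'naso')

def respiratory_like(names, keywords=RESPIRATORY_KEYWORDS):
    """Return the names containing any keyword (case-insensitive substring match)."""
    return [n for n in names if any(k in n.lower() for k in keywords)]

sample = ['Bronchiectasis', 'Chronic bronchitis', 'Cerebral palsy',
          'Obstructive sleep apnea', 'Stridor', 'Otitis media',
          'Abnormal pattern of respiration', 'Abnormal social behavior',
          'Recurrent lower respiratory tract infections']
hits = respiratory_like(sample)
print(f"{len(hits)} of {len(sample)}: {hits}")
```

On the live frame, `respiratory_like(asthma_frame.loc[0, 'phenotype_name'])` would apply the same filter to the full list.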

We can potentially sharpen some of these answers by requiring conditions to be similar to the input disease by both phenotype and biological process. That is, we require two paths connecting the input to the output: one saying the two are phenotypically similar, and one saying they share the processes that give rise to the disease:

In [9]:
def create_complex_question(disease_id,other_type):
    return {
    "machine_question": {
        "nodes": [
            {
                "id": "n0",
                "type": "disease",
                "curie": disease_id
            },
            {
                "id": "n1",
                "type": "phenotypic_feature",
                "set": True
            },
            {
                "id": "n2",
                "type": "genetic_condition"
            },
            {
                "id": "n3",
                "type": other_type,
                "set": True
            }
        ],
        "edges": [
            {
                "id": "e0",
                "source_id": "n0",
                "target_id": "n1"
            },
            {
                "id": "e1",
                "source_id": "n1",
                "target_id": "n2"
            },
            {
                "id": "e2",
                "source_id": "n0",
                "target_id": "n3"
            },
            {
                "id": "e3",
                "source_id": "n3",
                "target_id": "n2"
            }
        ]
    }
}

In [16]:
two_part_diabetes_question = create_complex_question(diabetes,'biological_process_or_activity')
two_part_asthma_question = create_complex_question(asthma,'biological_process_or_activity')
two_part_diabetes_answer = quick(two_part_diabetes_question,max_connectivity=1000)
two_part_asthma_answer = quick(two_part_asthma_question,max_connectivity=1000)

http://robokop.renci.org:80/api/simple/quick?max_connectivity=1000
Return Status: 200
http://robokop.renci.org:80/api/simple/quick?max_connectivity=1000
Return Status: 200


In [17]:
diabetes_frame = parse_answer(two_part_diabetes_answer)
diabetes_frame.head()

| | condition_id | condition_name | phenotype_id | phenotype_name | score |
|---|---|---|---|---|---|
| 0 | MONDO:0012819 | diabetic ketoacidosis | [HP:0001942, HP:0001735, HP:0100753, HP:000195... | [Metabolic acidosis, Acute pancreatitis, Schiz... | 0 |
| 1 | MONDO:0001076 | glucose intolerance | [HP:0004950, HP:0000842, HP:0009126, HP:000003... | [Peripheral arterial stenosis, Hyperinsulinemi... | 0 |
| 2 | MONDO:0015967 | rare genetic diabetes mellitus | [HP:0003758, HP:0009125, HP:0002359, HP:000730... | [Reduced subcutaneous adipose tissue, Lipodyst... | 0 |
| 3 | MONDO:0007455 | diabetes mellitus, noninsulin-dependent | [HP:0000855, HP:0011014, HP:0011013] | [Insulin resistance, Abnormal glucose homeosta... | 0 |
| 4 | MONDO:0008763 | Alstrom syndrome | [HP:0000842, HP:0000147, HP:0002621, HP:000083... | [Hyperinsulinemia, Polycystic ovaries, Atheros... | 0 |


In [12]:
two_part_asthma_answer

'No results found'

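For diabetes, we get an improved set of outputs: conditions such as diabetic ketoacidosis and rare genetic diabetes mellitus now appear in the top five, because they are both phenotypically similar to diabetes and share some of its biological mechanisms. For asthma, however, we find no results. There are no biological activities associated with asthma in our database, so the second path cannot be completed. Using genes as the second intermediate type instead gives asthma a route to connect through, as the next cells show.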
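One way to see why a second path can empty the result: the condition node is shared by both branches of the question, so a condition must be reachable through both, and the answer set behaves like an intersection. The sketch models this with plain sets; the condition names are hypothetical, and the set-intersection model is a simplification for illustration, not how ROBOKOP actually evaluates the query.

```python
def two_path_candidates(via_phenotype, via_other):
    """Conditions reachable through BOTH branches of the two-path question."""
    return sorted(set(via_phenotype) & set(via_other))

# Hypothetical reachability sets for illustration only
via_phenotype = {'condition_a', 'condition_b', 'condition_c'}
via_process = {'condition_b', 'condition_c', 'condition_d'}
print(two_path_candidates(via_phenotype, via_process))

# If the disease has no linked processes, the second branch is empty,
# so nothing survives the intersection (the asthma case above)
print(two_path_candidates(via_phenotype, set()))
```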

In [13]:
two_part_diabetes_question = create_complex_question(diabetes,'gene')
two_part_asthma_question = create_complex_question(asthma,'gene')
two_part_diabetes_answer = quick(two_part_diabetes_question,max_connectivity=1000)
two_part_asthma_answer = quick(two_part_asthma_question,max_connectivity=1000)

http://robokop.renci.org:80/api/simple/quick?max_connectivity=1000
Return Status: 200
http://robokop.renci.org:80/api/simple/quick?max_connectivity=1000
Return Status: 200
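The two runs above differ only in the `other_type` argument, so the pattern can be folded into a loop. The sketch is hypothetical: `run_variants` takes the question builder and the query function as parameters (in this notebook they would be `create_complex_question` and `quick`), and it counts any non-dict reply, such as the `'No results found'` string, as zero answers. The stand-ins in the usage example are fakes so the sketch runs offline.

```python
def run_variants(disease_id, other_types, build, ask, max_connectivity=1000):
    """Hypothetical helper: run one two-path question per intermediate type.

    build(disease_id, other_type) -> question dict (e.g. create_complex_question)
    ask(question, max_connectivity=...) -> reply (e.g. quick)
    Returns {other_type: number_of_answers}.
    """
    counts = {}
    for other_type in other_types:
        reply = ask(build(disease_id, other_type), max_connectivity=max_connectivity)
        # Non-dict replies (a 'No results found' string or an error Response) count as zero
        counts[other_type] = len(reply.get('answers', [])) if isinstance(reply, dict) else 0
    return counts

# Offline fakes standing in for create_complex_question and quick
def fake_build(disease_id, other_type):
    return {'disease': disease_id, 'other_type': other_type}

def fake_ask(question, max_connectivity=None):
    if question['other_type'] == 'gene':
        return {'answers': [{}, {}, {}]}
    return 'No results found'

print(run_variants('MONDO:0004979', ['biological_process_or_activity', 'gene'],
                   fake_build, fake_ask))
```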


In [14]:
diabetes_frame = parse_answer(two_part_diabetes_answer)
diabetes_frame.head()

| | condition_id | condition_name | phenotype_id | phenotype_name | score |
|---|---|---|---|---|---|
| 0 | MONDO:0011565 | metabolic syndrome X | [HP:0004950, HP:0003758, HP:0009125, HP:000730... | [Peripheral arterial stenosis, Reduced subcuta... | 0 |
| 1 | MONDO:0008487 | polycystic ovary syndrome | [HP:0000842, HP:0000147, HP:0000833, HP:000262... | [Hyperinsulinemia, Polycystic ovaries, Glucose... | 0 |
| 2 | MONDO:0011382 | sickle cell anemia | [HP:0001942, HP:0001712, HP:0000096, HP:000080... | [Metabolic acidosis, Left ventricular hypertro... | 0 |
| 3 | MONDO:0006507 | hereditary hemochromatosis | [HP:0000842, HP:0000802, HP:0000833, HP:000262... | [Hyperinsulinemia, Impotence, Glucose intolera... | 0 |
| 4 | MONDO:0001076 | glucose intolerance | [HP:0004950, HP:0000842, HP:0009126, HP:000003... | [Peripheral arterial stenosis, Hyperinsulinemi... | 0 |


In [15]:
asthma_frame = parse_answer(two_part_asthma_answer)
asthma_frame.head()

| | condition_id | condition_name | phenotype_id | phenotype_name | score |
|---|---|---|---|---|---|
| 0 | MONDO:0007186 | gastroesophageal reflux disease | [HP:0002110, HP:0002107, HP:0011109, HP:001053... | [Bronchiectasis, Pneumothorax, Chronic sinusit... | 0 |
| 1 | MONDO:0015977 | agammaglobulinemia | [HP:0006510, HP:0002110, HP:0011109, HP:000652... | [Chronic obstructive pulmonary disease, Bronch... | 0 |
| 2 | MONDO:0015517 | common variable immunodeficiency | [HP:0006510, HP:0002110, HP:0006528, HP:000284... | [Chronic obstructive pulmonary disease, Bronch... | 0 |
| 3 | MONDO:0010086 | sudden infant death syndrome | [HP:0005943, HP:0001742, HP:0006543, HP:001053... | [Respiratory arrest, Nasal obstruction, Cardio... | 0 |
| 4 | MONDO:0013282 | alpha 1-antitrypsin deficiency | [HP:0006510, HP:0002110, HP:0002107, HP:000652... | [Chronic obstructive pulmonary disease, Bronch... | 0 |
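To see what the gene path changed for asthma, we can compare the top five condition names from the basic run with those from the gene-based run. The lists below are copied from the two asthma `head()` outputs in this notebook; on live data they would come from each frame's `condition_name` column. Every score in the two-path output is 0, so this compares membership in the first five rows rather than scores. `compare_top` is a hypothetical helper.

```python
def compare_top(before, after):
    """Split two ordered name lists into shared, dropped, and newly added entries."""
    shared = [name for name in after if name in before]
    dropped = [name for name in before if name not in after]
    added = [name for name in after if name not in before]
    return shared, dropped, added

# Top five conditions copied from the asthma outputs above
basic = ['gastroesophageal reflux disease', 'sudden infant death syndrome',
         'agammaglobulinemia', 'alpha 1-antitrypsin deficiency',
         'selective IgG subclass deficiency']
gene_path = ['gastroesophageal reflux disease', 'agammaglobulinemia',
             'common variable immunodeficiency', 'sudden infant death syndrome',
             'alpha 1-antitrypsin deficiency']

shared, dropped, added = compare_top(basic, gene_path)
print('shared: ', shared)
print('dropped:', dropped)
print('added:  ', added)
```

Four of the five conditions persist; selective IgG subclass deficiency drops out of the first five rows and common variable immunodeficiency enters.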
