# Workflow 1, Module 1 (condition similarity)

One approach to solving this module is to avoid defining too tightly what happens at the subquestion level in terms of enrichments, archetypes, and so on, and instead simply pass the question to ROBOKOP and let its scoring bring the best answers to the top.  Here we use the quick service to start with a disease, find relevant phenotypes, and from there find genetic conditions. The answers come out ranked by score.

For more details, see the "quick" notebook in greengamma/general.

First, we'll load a function that calls the quick service, along with a helper for parsing its output, and then define a function that builds the question properly.  Then we'll create the question, run it, and pretty-print the results for two examples: diabetes and asthma.

In [9]:
#Load some functions for calling the quick service and parsing its output

import os
import sys
module_path = os.path.abspath('../../..')
if module_path not in sys.path:
    sys.path.append(module_path)
from gg_functions import parse_answer, quick

The basic machine question created below goes from a disease, to a set of phenotypes, to a genetic_condition.  Marking the phenotype node as a set allows many phenotypes to connect the disease to the same condition within a single answer.

In [4]:
def create_basic_question(disease_id):
    """Build a disease -> phenotypic_feature (set) -> genetic_condition question."""
    return {
    "machine_question": {
        "nodes": [
            {
                "id": "n0",
                "type": "disease",
                "curie": disease_id
            },
            {
                "id": "n1",
                "type": "phenotypic_feature",
                "set": True
            },
            {
                "id": "n2",
                "type": "genetic_condition"
            }
        ],
        "edges": [
            {
                "id": "e0",
                "source_id": "n0",
                "target_id": "n1"
            },
            {
                "id": "e1",
                "source_id": "n1",
                "target_id": "n2"
            }
        ]
    }
}
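
To illustrate what marking `n1` as a set means, here is a minimal sketch that is independent of the quick service. It assumes that paths sharing the same disease and condition are collapsed into one answer whose phenotypes are collected together; the identifiers are toy placeholders, not real query output, and `group_by_condition` is a hypothetical helper.

```python
from collections import defaultdict

# Toy (disease, phenotype, condition) paths; placeholder identifiers only.
paths = [
    ("DISEASE:1", "PHENO:a", "COND:x"),
    ("DISEASE:1", "PHENO:b", "COND:x"),
    ("DISEASE:1", "PHENO:c", "COND:x"),
    ("DISEASE:1", "PHENO:a", "COND:y"),
]


def group_by_condition(paths):
    """Collapse paths with the same endpoints into one answer holding a set of phenotypes."""
    grouped = defaultdict(set)
    for disease, phenotype, condition in paths:
        grouped[(disease, condition)].add(phenotype)
    return dict(grouped)


answers = group_by_condition(paths)
for (disease, condition), phenotypes in answers.items():
    # One line per condition, listing every phenotype that links it to the disease
    print(condition, sorted(phenotypes))
```

Under this reading, a condition reached through three phenotypes appears once, supported by all three, rather than as three separate answers.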

In [6]:
diabetes = 'MONDO:0005148' #type 2 diabetes
asthma = 'MONDO:0004979' #asthma

For both diabetes and asthma, let's build a question of the type above and run it.  The max_connectivity option sets a maximum degree for any node in the path; it is used to control both how long the query takes to run and how specific the results are. 1000 is a reasonable general-purpose value.
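
As a rough illustration of a degree cap, the sketch below drops every edge that touches a node whose degree exceeds the limit. This is only one plausible interpretation of max_connectivity, not the service's actual implementation; `prune_high_degree` and the toy graph are hypothetical.

```python
def prune_high_degree(edges, max_connectivity):
    """Drop edges touching any node whose degree exceeds max_connectivity."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    return [
        (source, target)
        for source, target in edges
        if degree[source] <= max_connectivity and degree[target] <= max_connectivity
    ]


# Toy graph: "hub" is a very general node connected to many things.
edges = [
    ("d", "hub"), ("hub", "c1"), ("hub", "c2"), ("hub", "c3"),
    ("d", "p"), ("p", "c1"),
]
kept = prune_high_degree(edges, max_connectivity=3)
print(kept)
```

With these toy edges, `hub` has degree 4 and is removed at `max_connectivity=3`, leaving only the path through the more specific node `p`. Excluding highly connected nodes both shrinks the search and favors more specific connections.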

In [7]:
diabetes_question = create_basic_question(diabetes)
diabetes_answer = quick(diabetes_question,max_connectivity=1000)

Return Status: 200


In [14]:
asthma_question = create_basic_question(asthma)
asthma_answer = quick(asthma_question,max_connectivity=1000)

Return Status: 200


In [12]:
diabetes_frame = parse_answer(diabetes_answer,node_list=['n2'],edge_list=[])
diabetes_frame.head()

Unnamed: 0,n2 - id,n2 - name,score
0,MONDO:0011565,metabolic syndrome X,65.70655
1,MONDO:0011382,sickle cell anemia,34.698179
2,MONDO:0001076,glucose intolerance,32.087022
3,MONDO:0006507,hereditary hemochromatosis,31.28175
4,MONDO:0008575,nicotine dependence,30.822532


The results that come back are plausible, especially metabolic syndrome X and glucose intolerance.

In [15]:
asthma_frame = parse_answer(asthma_answer,node_list=['n2'],edge_list=[])
asthma_frame.head()

Unnamed: 0,n2 - id,n2 - name,score
0,MONDO:0007186,gastroesophageal reflux disease,21.994939
1,MONDO:0010086,sudden infant death syndrome,21.869378
2,MONDO:0015977,agammaglobulinemia,21.002775
3,MONDO:0013282,alpha 1-antitrypsin deficiency,20.196203
4,MONDO:0001901,selective IgG subclass deficiency,18.675023


We can potentially sharpen some of these answers by searching for conditions that are similar in both phenotype and biological process.  That is, we allow two paths connecting the input to the output: one says that the conditions should be phenotypically similar, and the other that they should be similar in terms of the processes that give rise to the disease:

In [17]:
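def create_complex_question(disease_id, other_type):
    """Build a question with two paths from the disease (n0) to the condition (n2):
    one through a set of phenotypes (n1), one through a set of other_type nodes (n3)."""
    return {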
    "machine_question": {
        "nodes": [
            {
                "id": "n0",
                "type": "disease",
                "curie": disease_id
            },
            {
                "id": "n1",
                "type": "phenotypic_feature",
                "set": True
            },
            {
                "id": "n2",
                "type": "genetic_condition"
            },
            {
                "id": "n3",
                "type": other_type,
                "set": True
            }
        ],
        "edges": [
            {
                "id": "e0",
                "source_id": "n0",
                "target_id": "n1"
            },
            {
                "id": "e1",
                "source_id": "n1",
                "target_id": "n2"
            },
            {
                "id": "e2",
                "source_id": "n0",
                "target_id": "n3"
            },
            {
                "id": "e3",
                "source_id": "n3",
                "target_id": "n2"
            }
        ]
    }
}
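
The same pattern extends to any number of parallel paths. The sketch below is a hypothetical generalization, not part of `gg_functions`: it keeps `n0` as the input disease and `n2` as the output condition so that `node_list=['n2']` still applies, and adds a small check that every edge refers to a defined node.

```python
def create_multi_path_question(disease_id, intermediate_types):
    """Build one n0 -> (set of intermediates) -> n2 path per intermediate type."""
    if not intermediate_types:
        raise ValueError("need at least one intermediate type")
    # Intermediates take the ids n1, n3, n4, ...; n2 stays reserved for the output.
    intermediate_ids = ["n1"] + [f"n{i}" for i in range(3, len(intermediate_types) + 2)]
    nodes = [
        {"id": "n0", "type": "disease", "curie": disease_id},
        {"id": "n2", "type": "genetic_condition"},
    ]
    edges = []
    for node_id, node_type in zip(intermediate_ids, intermediate_types):
        nodes.append({"id": node_id, "type": node_type, "set": True})
        edges.append({"id": f"e{len(edges)}", "source_id": "n0", "target_id": node_id})
        edges.append({"id": f"e{len(edges)}", "source_id": node_id, "target_id": "n2"})
    # Keep nodes in numeric id order, matching the hand-written questions
    nodes.sort(key=lambda node: int(node["id"][1:]))
    return {"machine_question": {"nodes": nodes, "edges": edges}}


def dangling_edges(question):
    """Return ids of edges whose source or target is not a defined node."""
    machine_question = question["machine_question"]
    node_ids = {node["id"] for node in machine_question["nodes"]}
    return [
        edge["id"]
        for edge in machine_question["edges"]
        if edge["source_id"] not in node_ids or edge["target_id"] not in node_ids
    ]


question = create_multi_path_question("MONDO:0005148", ["phenotypic_feature", "gene"])
print(dangling_edges(question))
```

Called with `["phenotypic_feature", other_type]`, this sketch yields the same nodes and edges as `create_complex_question`.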

In [18]:
two_part_diabetes_question = create_complex_question(diabetes,'biological_process_or_activity')
two_part_asthma_question = create_complex_question(asthma,'biological_process_or_activity')
two_part_diabetes_answer = quick(two_part_diabetes_question,max_connectivity=1000)
two_part_asthma_answer = quick(two_part_asthma_question,max_connectivity=1000)

Return Status: 200
Return Status: 500


In [19]:
diabetes_frame = parse_answer(two_part_diabetes_answer,node_list=['n2'],edge_list=[])
diabetes_frame.head()

Unnamed: 0,n2 - id,n2 - name,score
0,MONDO:0012819,diabetic ketoacidosis,54.717417
1,MONDO:0001076,glucose intolerance,35.153795
2,MONDO:0007455,"diabetes mellitus, noninsulin-dependent",26.702664
3,MONDO:0019154,androgen insensitivity syndrome,20.140831
4,MONDO:0008763,Alstrom syndrome,19.961212


In [20]:
two_part_asthma_answer

<Response [500]>

For diabetes, we get an improved set of outputs because the conditions are not only phenotypically similar to diabetes but also share some of the biological mechanisms of the disease.  For asthma, however, we get no results, and the query comes back with a 500 status: there are no biological activities associated with asthma in our database, so no connections can be made.
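
Because a failed query cannot be parsed into a frame, it can help to check an answer before handing it to `parse_answer`. The sketch below assumes, based on the `<Response [500]>` output above, that `quick` hands back the raw response object when the call fails; `answer_failed` and `FakeResponse` are hypothetical names used only for illustration.

```python
def answer_failed(answer):
    """True if `answer` looks like an HTTP response object with a non-200 status."""
    status = getattr(answer, "status_code", None)
    return status is not None and status != 200


class FakeResponse:
    """Stand-in for an HTTP response object, used only for this demonstration."""

    def __init__(self, status_code):
        self.status_code = status_code


failed = answer_failed(FakeResponse(500))  # a failed call, like the asthma query
parsed_ok = answer_failed({})              # a plain dict stands in for parsed output
print(failed, parsed_ok)
```

A guard such as `if not answer_failed(two_part_asthma_answer):` before the `parse_answer` call would then skip the failed asthma query instead of raising an error.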

Instead of going through biological process, let's go through gene:

In [21]:
two_part_diabetes_question = create_complex_question(diabetes,'gene')
two_part_asthma_question = create_complex_question(asthma,'gene')
two_part_diabetes_answer = quick(two_part_diabetes_question,max_connectivity=1000)
two_part_asthma_answer = quick(two_part_asthma_question,max_connectivity=1000)

Return Status: 200
Return Status: 200


In [22]:
diabetes_frame = parse_answer(two_part_diabetes_answer,node_list=['n2'],edge_list=[])
diabetes_frame.head()

Unnamed: 0,n2 - id,n2 - name,score
0,MONDO:0011565,metabolic syndrome X,68.640888
1,MONDO:0011382,sickle cell anemia,35.643111
2,MONDO:0008487,polycystic ovary syndrome,33.337035
3,MONDO:0001076,glucose intolerance,32.810945
4,MONDO:0006507,hereditary hemochromatosis,32.342166


In [23]:
asthma_frame = parse_answer(two_part_asthma_answer,node_list=['n2'],edge_list=[])
asthma_frame.head()

Unnamed: 0,n2 - id,n2 - name,score
0,MONDO:0007186,gastroesophageal reflux disease,22.809748
1,MONDO:0010086,sudden infant death syndrome,22.020661
2,MONDO:0015977,agammaglobulinemia,21.87582
3,MONDO:0013282,alpha 1-antitrypsin deficiency,20.377447
4,MONDO:0015517,common variable immunodeficiency,19.782271
