# Workflow 1, Module 1 (condition similarity)

One approach to solving this module is to avoid defining too tightly what happens at the subquestion level (enrichments, archetypes, and so on) and instead simply pass the question to ROBOKOP and let its scoring bring the best answers to the top.  Here we will use the quick service to start with a disease, find relevant phenotypes, and from there find genetic conditions. The answers will come out ranked by path.

For more details, see the "quick" notebook in greengamma/general.

First, we'll import a function that calls the quick service and a helper that parses its output, and define some functions for creating the question.  Then we'll create the question, run it, and pretty-print the results for two examples: type 2 diabetes and asthma.

In [1]:
#Load some functions for calling the quick service and parsing its output

import os
import sys
module_path = os.path.abspath(os.path.join('../../..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from gg_functions import parse_answer, quick

The basic machine question created below goes from a disease to a set of phenotypes to a genetic_condition.  Making the phenotypes a set allows there to be many phenotypes that connect the disease to the condition.

In [2]:
def create_basic_question(disease_id):
    return {
    "machine_question": {
        "nodes": [
            {
                "id": "n0",
                "type": "disease",
                "curie": disease_id
            },
            {
                "id": "n1",
                "type": "phenotypic_feature",
                "set": True
            },
            {
                "id": "n2",
                "type": "genetic_condition"
            }
        ],
        "edges": [
            {
                "id": "e0",
                "source_id": "n0",
                "target_id": "n1"
            },
            {
                "id": "e1",
                "source_id": "n1",
                "target_id": "n2"
            }
        ]
    }
}
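Because the question is a plain dictionary, it can be sanity-checked before it is sent. The following is a minimal sketch, not part of `gg_functions`: `validate_question` is a hypothetical helper that only checks that node ids are unique and that every edge points at a node that exists. The example dictionary is written out by hand in the same shape as the question above.

```python
def validate_question(question):
    """Return a list of problems found in a machine question (empty if none).

    Hypothetical helper for illustration; it is not part of gg_functions.
    """
    mq = question["machine_question"]
    node_ids = [node["id"] for node in mq["nodes"]]
    problems = []
    if len(node_ids) != len(set(node_ids)):
        problems.append("duplicate node ids")
    known = set(node_ids)
    for edge in mq["edges"]:
        for end in ("source_id", "target_id"):
            if edge[end] not in known:
                problems.append(f"edge {edge['id']}: unknown {end} {edge[end]!r}")
    return problems


# Hand-written example with the same shape as the basic question.
example = {
    "machine_question": {
        "nodes": [
            {"id": "n0", "type": "disease", "curie": "MONDO:0005148"},
            {"id": "n1", "type": "phenotypic_feature", "set": True},
            {"id": "n2", "type": "genetic_condition"},
        ],
        "edges": [
            {"id": "e0", "source_id": "n0", "target_id": "n1"},
            {"id": "e1", "source_id": "n1", "target_id": "n2"},
        ],
    }
}
print(validate_question(example))  # prints []
```

A check like this catches a mistyped node id locally, before a round trip to the service.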

In [3]:
diabetes = 'MONDO:0005148' #type 2 diabetes
asthma = 'MONDO:0004979' #asthma

For both diabetes and asthma, let's make a question of the type above and run it.  The max_connectivity option sets a maximum degree for a node in the path; it controls both how long the query takes to run and how specific the results are. 1000 is a decent across-the-board value.

In [4]:
diabetes_question = create_basic_question(diabetes)
diabetes_answer = quick(diabetes_question,max_connectivity=1000)

Return Status: 200


In [5]:
asthma_question = create_basic_question(asthma)
asthma_answer = quick(asthma_question,max_connectivity=1000)

Return Status: 200
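To make the idea behind max_connectivity concrete, here is a toy illustration of a degree cap. It is only a sketch of the concept: how the quick service applies the cap internally is not shown in this notebook, and the graph, the node names, and `prune_by_degree` are all invented for the example.

```python
def prune_by_degree(adjacency, max_connectivity):
    """Drop nodes with more than max_connectivity neighbours.

    Degrees are measured once, on the original graph. Illustrative only;
    this is not the quick service's implementation.
    """
    keep = {node for node, nbrs in adjacency.items() if len(nbrs) <= max_connectivity}
    return {
        node: sorted(n for n in nbrs if n in keep)
        for node, nbrs in adjacency.items()
        if node in keep
    }


# Invented toy graph: "hub" touches almost everything, "specific" does not.
toy_graph = {
    "disease": {"hub", "specific"},
    "hub": {"disease", "cond_a", "cond_b", "cond_c"},
    "specific": {"disease", "cond_a"},
    "cond_a": {"hub", "specific"},
    "cond_b": {"hub"},
    "cond_c": {"hub"},
}

pruned = prune_by_degree(toy_graph, max_connectivity=3)
print(pruned["disease"])  # prints ['specific']
```

With the hub removed, only the path through the more specific node remains, which is the trade-off described above: fewer paths to explore and less generic answers.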


In [6]:
diabetes_frame = parse_answer(diabetes_answer,node_list=['n2'],edge_list=[])
diabetes_frame.head()

| | n2 - id | n2 - name | score |
|---|---|---|---|
| 0 | MONDO:0011565 | metabolic syndrome X | 57.508989 |
| 1 | MONDO:0011382 | sickle cell anemia | 36.194567 |
| 2 | MONDO:0009693 | multiple myeloma | 34.576429 |
| 3 | MONDO:0009061 | cystic fibrosis | 32.139963 |
| 4 | MONDO:0002413 | glycogen storage disease I | 28.325185 |


The results that come back make reasonable sense, especially metabolic syndrome X and glucose intolerance (the latter falls outside the five rows shown by `head()`).

In [7]:
asthma_frame = parse_answer(asthma_answer,node_list=['n2'],edge_list=[])
asthma_frame.head()

| | n2 - id | n2 - name | score |
|---|---|---|---|
| 0 | MONDO:0009061 | cystic fibrosis | 29.826465 |
| 1 | MONDO:0015977 | agammaglobulinemia | 19.849952 |
| 2 | MONDO:0011382 | sickle cell anemia | 19.316353 |
| 3 | MONDO:0010086 | sudden infant death syndrome | 18.861209 |
| 4 | MONDO:0007186 | gastroesophageal reflux disease | 18.840881 |


We can potentially sharpen some of these answers if we require conditions to be similar by both phenotype and biological process.  That is, we put two paths between the input and the output: one says the conditions should be similar phenotypically, and the other says they should be similar in terms of the processes that create the disease:

In [8]:
def create_complex_question(disease_id,other_type):
    return {
    "machine_question": {
        "nodes": [
            {
                "id": "n0",
                "type": "disease",
                "curie": disease_id
            },
            {
                "id": "n1",
                "type": "phenotypic_feature",
                "set": True
            },
            {
                "id": "n2",
                "type": "genetic_condition"
            },
            {
                "id": "n3",
                "type": other_type,
                "set": True
            }
        ],
        "edges": [
            {
                "id": "e0",
                "source_id": "n0",
                "target_id": "n1"
            },
            {
                "id": "e1",
                "source_id": "n1",
                "target_id": "n2"
            },
            {
                "id": "e2",
                "source_id": "n0",
                "target_id": "n3"
            },
            {
                "id": "e3",
                "source_id": "n3",
                "target_id": "n2"
            }
        ]
    }
}
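One way to confirm that this question really offers two routes from the input to the output is to enumerate the paths in its edge list. The sketch below follows each edge from `source_id` to `target_id` as written; whether the service itself treats edges as directed is not stated here, so that is an assumption. `find_paths` is a hypothetical helper, and the edge list is copied by hand from the question above.

```python
def find_paths(edges, start, goal):
    """List every path from start to goal, following source_id -> target_id."""
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge["source_id"], []).append(edge["target_id"])

    paths = []

    def walk(node, path):
        if node == goal:
            paths.append(path)
            return
        for nxt in outgoing.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Edge list copied from the complex question.
edges = [
    {"id": "e0", "source_id": "n0", "target_id": "n1"},
    {"id": "e1", "source_id": "n1", "target_id": "n2"},
    {"id": "e2", "source_id": "n0", "target_id": "n3"},
    {"id": "e3", "source_id": "n3", "target_id": "n2"},
]
print(find_paths(edges, "n0", "n2"))
# prints [['n0', 'n1', 'n2'], ['n0', 'n3', 'n2']]
```

The first path goes through the phenotype set (n1) and the second through the set of `other_type` nodes (n3), so an answer condition has to be reachable both ways.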

In [15]:
two_part_diabetes_question = create_complex_question(diabetes,'biological_process_or_activity')
two_part_asthma_question = create_complex_question(asthma,'biological_process_or_activity')
two_part_diabetes_answer = quick(two_part_diabetes_question,max_connectivity=1000)
two_part_asthma_answer = quick(two_part_asthma_question,max_connectivity=1000)

Return Status: 200
Return Status: 200


In [16]:
diabetes_frame = parse_answer(two_part_diabetes_answer,node_list=['n2'],edge_list=[])
diabetes_frame.head()

| | n2 - id | n2 - name | score |
|---|---|---|---|
| 0 | MONDO:0012819 | diabetic ketoacidosis | 44.56046 |
| 1 | MONDO:0001076 | glucose intolerance | 30.152026 |
| 2 | MONDO:0018150 | Gaucher disease | 26.379361 |
| 3 | MONDO:0002615 | xanthomatosis (disease) | 23.304314 |
| 4 | MONDO:0018105 | Wolfram syndrome | 22.383431 |


In [17]:
asthma_frame = parse_answer(two_part_asthma_answer,node_list=['n2'],edge_list=[])
asthma_frame.head()

For diabetes, we get an improved set of outputs because the conditions are not only phenotypically similar to diabetes but also share some of the biological mechanisms of the disease.  However, for asthma, we find no results: there are no biological activities associated with asthma in our database, so no connections can be made.

Instead of going through biological process, let's go through gene:

In [18]:
two_part_diabetes_question = create_complex_question(diabetes,'gene')
two_part_asthma_question = create_complex_question(asthma,'gene')
two_part_diabetes_answer = quick(two_part_diabetes_question,max_connectivity=1000)
two_part_asthma_answer = quick(two_part_asthma_question,max_connectivity=1000)

Return Status: 200
Return Status: 200
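The switch from biological process to gene was made by hand here. A minimal sketch of automating that fallback follows. Everything in it is hypothetical: `first_nonempty` is not part of `gg_functions`, and `fake_query` is a stand-in with placeholder data so that the example runs without the service.

```python
def first_nonempty(run_query, intermediate_types):
    """Try each intermediate type in order.

    Returns (type, rows) for the first non-empty result, or (None, [])
    if every type comes back empty. Hypothetical helper for illustration.
    """
    for other_type in intermediate_types:
        rows = run_query(other_type)
        if len(rows) > 0:
            return other_type, rows
    return None, []


def fake_query(other_type):
    """Stand-in for a real query; the returned values are placeholders."""
    canned = {
        "biological_process_or_activity": [],
        "gene": ["condition_a", "condition_b"],
    }
    return canned.get(other_type, [])


chosen, rows = first_nonempty(fake_query, ["biological_process_or_activity", "gene"])
print(chosen)  # prints gene
```

In the notebook, `run_query` could wrap `create_complex_question`, `quick`, and `parse_answer` for a fixed disease, assuming the object returned by `parse_answer` supports `len()` (the `.head()` calls suggest a DataFrame, but that is not confirmed here).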


In [19]:
diabetes_frame = parse_answer(two_part_diabetes_answer,node_list=['n2'],edge_list=[])
diabetes_frame.head()

| | n2 - id | n2 - name | score |
|---|---|---|---|
| 0 | MONDO:0011565 | metabolic syndrome X | 71.561315 |
| 1 | MONDO:0011382 | sickle cell anemia | 41.544386 |
| 2 | MONDO:0015967 | rare genetic diabetes mellitus | 40.386608 |
| 3 | MONDO:0009693 | multiple myeloma | 40.198767 |
| 4 | MONDO:0001076 | glucose intolerance | 36.984862 |


In [20]:
asthma_frame = parse_answer(two_part_asthma_answer,node_list=['n2'],edge_list=[])
asthma_frame.head()

| | n2 - id | n2 - name | score |
|---|---|---|---|
| 0 | MONDO:0015977 | agammaglobulinemia | 34.696751 |
| 1 | MONDO:0009061 | cystic fibrosis | 34.262193 |
| 2 | MONDO:0015974 | severe combined immunodeficiency (disease) | 28.798527 |
| 3 | MONDO:0007186 | gastroesophageal reflux disease | 26.939721 |
| 4 | MONDO:0015517 | common variable immunodeficiency | 24.132471 |
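To see what the extra gene path changed for diabetes, the top five ids from the phenotype-only query can be compared with the top five from the phenotype-plus-gene query. This is a small sketch using the ids copied by hand from the diabetes tables in this notebook; `rank_changes` is a hypothetical helper, and ranks are 0-based to match the table index.

```python
# Top-five ids for type 2 diabetes, copied from the tables in this notebook.
basic_top5 = ["MONDO:0011565", "MONDO:0011382", "MONDO:0009693",
              "MONDO:0009061", "MONDO:0002413"]
gene_top5 = ["MONDO:0011565", "MONDO:0011382", "MONDO:0015967",
             "MONDO:0009693", "MONDO:0001076"]


def rank_changes(before, after):
    """Map each id present in both lists to (rank_before, rank_after)."""
    after_rank = {cid: i for i, cid in enumerate(after)}
    return {cid: (i, after_rank[cid]) for i, cid in enumerate(before) if cid in after_rank}


shared = rank_changes(basic_top5, gene_top5)
new_entries = [cid for cid in gene_top5 if cid not in basic_top5]

print(shared)
# prints {'MONDO:0011565': (0, 0), 'MONDO:0011382': (1, 1), 'MONDO:0009693': (2, 3)}
print(new_entries)
# prints ['MONDO:0015967', 'MONDO:0001076']
```

The two new entries are rare genetic diabetes mellitus and glucose intolerance, while cystic fibrosis and glycogen storage disease I drop out of the top five.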
