# Workflow 1, Module 1 (condition similarity)

One approach to solving this module is to not define so tightly what's going on at the subquestion level in terms of enrichments, archetypes, and so on, but simply to pass the question to ROBOKOP and let its scoring bring the best answers to the top.  Here we will use the quick service to start with a disease, find relevant phenotypes, and from there find genetic conditions. The answers come back ranked, and each answer is a complete path.

For more details, see the "quick" notebook in greengamma/general.

First, we'll have a quick function that calls the quick service, and some functions for properly creating the question.  Then we'll create the question, run it, and pretty print it for two examples: diabetes and asthma.

In [1]:
import requests
import pandas as pd

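def quick(question):
    """POST a question to the ROBOKOP quick service; return the parsed JSON on success."""
    url = 'http://robokop.renci.org:80/api/simple/quick/'
    response = requests.post(url, json=question)
    print(f"Return Status: {response.status_code}")
    if response.status_code == 200:
        return response.json()
    # On any other status, hand back the raw response so the caller can inspect it
    return response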

In [2]:
def make_N_step_question(types,curies):
    question = {
                'machine_question': {
                    'nodes': [],
                    'edges': []
                }
            }
    for i,t in enumerate(types):
        newnode = {'id': i, 'type': t}
        if curies[i] is not None:
            newnode['curie'] = curies[i]
        question['machine_question']['nodes'].append(newnode)
        if i > 0:
            question['machine_question']['edges'].append( {'source_id': i-1, 'target_id': i})
    return question
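
As a reference for what this function produces, the structure below is the three-step question for type 2 diabetes, written out by hand by tracing the loop above. It is only a sketch of the request body as this notebook builds it; the variable name `expected_question` is introduced here for illustration and is not part of the notebook.

```python
import json

# Hand-written equivalent of
# make_N_step_question(['disease','phenotypic_feature','genetic_condition'],
#                      ['MONDO:0005148', None, None])
expected_question = {
    'machine_question': {
        'nodes': [
            # Only the first node is pinned to a curie; the others are left open
            {'id': 0, 'type': 'disease', 'curie': 'MONDO:0005148'},
            {'id': 1, 'type': 'phenotypic_feature'},
            {'id': 2, 'type': 'genetic_condition'},
        ],
        'edges': [
            # One edge per consecutive pair of nodes
            {'source_id': 0, 'target_id': 1},
            {'source_id': 1, 'target_id': 2},
        ],
    }
}

print(json.dumps(expected_question, indent=2))
```

Nodes whose entry in `curies` is `None` get no `curie` key at all, which is what leaves them free for the service to fill in.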

In [3]:
def extract_final_nodes(returnanswer):
    # Each answer is a three-node path; index 2 is the final genetic_condition node
    nodes = [{'node_name': answer['nodes'][2]['name'], 'node_id': answer['nodes'][2]['id']} for answer in returnanswer['answers']]
    return pd.DataFrame(nodes)

In [4]:
diabetes = 'MONDO:0005148' #type 2 diabetes
asthma = 'MONDO:0004979' #asthma

In [5]:
diabetes_question = make_N_step_question(['disease','phenotypic_feature','genetic_condition'],[diabetes,None,None])
diabetes_answer = quick(diabetes_question)

Return Status: 200


In [6]:
diabetes_frame = extract_final_nodes(diabetes_answer)
diabetes_frame

Unnamed: 0,node_id,node_name
0,MONDO:0001076,glucose intolerance
1,MONDO:0001076,glucose intolerance
2,MONDO:0018911,maturity-onset diabetes of the young (disease)
3,MONDO:0001076,glucose intolerance
4,MONDO:0001076,glucose intolerance
5,MONDO:0001076,glucose intolerance
6,MONDO:0001076,glucose intolerance
7,MONDO:0001076,glucose intolerance
8,MONDO:0001076,glucose intolerance
9,MONDO:0001076,glucose intolerance


It appears that we are getting repeat answers, but that's not the case.  For quick, an answer is the entire path, in this case (`diabetes`)-(`phenotypic_feature`)-(`genetic_condition`).  So if two paths share the same final `genetic_condition`, say "glucose intolerance", but pass through different `phenotypic_feature` nodes, they are considered different paths.  Here we are only writing out the final node. Looking at the intermediate nodes as well makes this clear:

In [11]:
def extract_both_nodenames(returnanswer):
    nodes = [{'phenotypic_feature': answer['nodes'][1]['name'], 'genetic_condition': answer['nodes'][2]['name']} for answer in returnanswer['answers']]
    return pd.DataFrame(nodes)

In [12]:
nameframe = extract_both_nodenames(diabetes_answer)
nameframe

Unnamed: 0,genetic_condition,phenotypic_feature
0,glucose intolerance,Increased body weight
1,glucose intolerance,Maturity-onset diabetes of the young
2,maturity-onset diabetes of the young (disease),Maturity-onset diabetes of the young
3,glucose intolerance,Increased adipose tissue
4,glucose intolerance,Decreased HDL cholesterol concentration
5,glucose intolerance,Glucose intolerance
6,glucose intolerance,Fasting hyperinsulinemia
7,glucose intolerance,Insulin-resistant diabetes mellitus
8,glucose intolerance,Acanthosis nigricans
9,glucose intolerance,Hyperinsulinemia
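
Since each row is a distinct path, one way to summarize the result is to group the paths by their final genetic condition and list the phenotypes that lead to it. The sketch below does this in plain Python on a few rows copied from the table above. The `paths` list and the helper `group_by_condition` are illustrative names, not part of the notebook or the ROBOKOP API.

```python
from collections import defaultdict

# (phenotypic_feature, genetic_condition) pairs copied from the first rows above
paths = [
    ('Increased body weight', 'glucose intolerance'),
    ('Maturity-onset diabetes of the young', 'glucose intolerance'),
    ('Maturity-onset diabetes of the young',
     'maturity-onset diabetes of the young (disease)'),
    ('Increased adipose tissue', 'glucose intolerance'),
]

def group_by_condition(pairs):
    """Map each genetic condition to the phenotypes that reach it, in rank order."""
    grouped = defaultdict(list)
    for phenotype, condition in pairs:
        grouped[condition].append(phenotype)
    return dict(grouped)

grouped = group_by_condition(paths)
for condition, phenotypes in grouped.items():
    print(f"{condition}: {len(phenotypes)} path(s) via {phenotypes}")
```

The same grouping could be done on `nameframe` with a pandas `groupby` on the `genetic_condition` column; plain Python is used here only to keep the sketch dependency-free.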


In [7]:
asthma_question = make_N_step_question(['disease','phenotypic_feature','genetic_condition'],[asthma,None,None])
asthma_answer = quick(asthma_question)

Return Status: 200


In [8]:
asthma_frame = extract_final_nodes(asthma_answer)
asthma_frame

Unnamed: 0,node_id,node_name
0,MONDO:0007186,gastroesophageal reflux disease
1,MONDO:0007186,gastroesophageal reflux disease
2,MONDO:0011292,"dermatitis, atopic"
3,MONDO:0011292,"dermatitis, atopic"
4,MONDO:0011292,"dermatitis, atopic"
5,MONDO:0009061,cystic fibrosis
6,MONDO:0009061,cystic fibrosis
7,MONDO:0007186,gastroesophageal reflux disease
8,MONDO:0013282,alpha 1-antitrypsin deficiency
9,MONDO:0007186,gastroesophageal reflux disease
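
If only the distinct genetic conditions are of interest, the repeated final nodes can be collapsed while keeping the order in which each condition first appears in the ranking. The sketch below works on the rows copied from the asthma table above; `ranked` and `unique_in_rank_order` are hypothetical names used for illustration, and keeping the first occurrence is a design choice made here rather than anything the quick service prescribes.

```python
# (node_id, node_name) rows copied from the asthma table above, in rank order
ranked = [
    ('MONDO:0007186', 'gastroesophageal reflux disease'),
    ('MONDO:0007186', 'gastroesophageal reflux disease'),
    ('MONDO:0011292', 'dermatitis, atopic'),
    ('MONDO:0011292', 'dermatitis, atopic'),
    ('MONDO:0011292', 'dermatitis, atopic'),
    ('MONDO:0009061', 'cystic fibrosis'),
    ('MONDO:0009061', 'cystic fibrosis'),
    ('MONDO:0007186', 'gastroesophageal reflux disease'),
    ('MONDO:0013282', 'alpha 1-antitrypsin deficiency'),
    ('MONDO:0007186', 'gastroesophageal reflux disease'),
]

def unique_in_rank_order(rows):
    """Drop repeated rows, keeping the first (best-ranked) occurrence of each."""
    # dict keys preserve insertion order, so the first occurrence fixes the position
    return list(dict.fromkeys(rows))

unique_conditions = unique_in_rank_order(ranked)
for node_id, node_name in unique_conditions:
    print(node_id, node_name)
```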
