# Metabolic Pathways

## Query Pattern

Most of our information about the metabolism of chemicals comes from KEGG. We do not store KEGG reactions as entities themselves. Instead, if a reaction converts chemical A into chemical B and is catalyzed by gene C, we create the following relationships:

```
A derives_into B
C increases_degradation_of A
C increases_synthesis_of B
```

We can therefore find reactions by querying for that three-edge pattern.
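
As a minimal illustration of the pattern, the sketch below expands one catalyzed reaction into the three relationships. The function name `reaction_to_edges` and the tuple layout are assumptions made for this example; they are not part of ROBOKOP.

```python
def reaction_to_edges(reactant, product, gene):
    """Expand one catalyzed reaction into (subject, predicate, object) triples."""
    return [
        (reactant, 'derives_into', product),
        (gene, 'increases_degradation_of', reactant),
        (gene, 'increases_synthesis_of', product),
    ]

# Generic A -> B reaction catalyzed by C, matching the listing above.
for triple in reaction_to_edges('A', 'B', 'C'):
    print(triple)
```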

## Functions

In [70]:
import requests
import pandas as pd

robokop_server = 'robokop.renci.org'

def quick(question):
    """POST a question to the ROBOKOP quick service and return the parsed JSON answer.
    Returns an empty dict if the request does not succeed."""
    url=f'http://{robokop_server}:80/api/simple/quick/'
    response = requests.post(url,json=question)
    if response.status_code == 200:
        return response.json()
    return {}

In [7]:
def make_decomposition_question(chemical_identifier):
    """Given a chemical identifier, construct a graph query to find the chemicals that it metabolizes into,
    and the genes that catalyze each reaction."""
    question = {
                'machine_question': {
                    'nodes': [
                        {
                            'id': 'n0',
                            'curie': chemical_identifier,
                            'type': 'chemical_substance'
                        },
                        {
                            'id': 'n1',
                            'type': 'gene'
                        },
                        {
                            'id': 'n2',
                            'type': 'chemical_substance'
                        }
                    ],
                    'edges': [
                        {
                            'id': 'e0',
                            'source_id': 'n1',
                            'target_id': 'n0',
                            'type': 'increases_degradation_of'
                        },
                        {
                            'id': 'e1',
                            'source_id': 'n1',
                            'target_id': 'n2',
                            'type': 'increases_synthesis_of'
                        },
                        {
                            'id': 'e2',
                            'source_id': 'n0',
                            'target_id': 'n2',
                            'type': 'derives_into'
                        }
                    ]
                }
            }
    return question
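
A question of this shape is easy to get wrong by hand, for instance by pointing an edge at a node id that was never declared. The following sketch checks for that. The helper name `check_question` is hypothetical, and the trimmed-down question is deliberately broken to show what the check reports.

```python
def check_question(question):
    """Return the ids of edges whose source or target is not a declared node."""
    mq = question['machine_question']
    node_ids = {node['id'] for node in mq['nodes']}
    return [edge['id'] for edge in mq['edges']
            if edge['source_id'] not in node_ids or edge['target_id'] not in node_ids]

# A trimmed-down question using the same layout as above; node n2 is left out on purpose.
example = {
    'machine_question': {
        'nodes': [{'id': 'n0', 'curie': 'CHEBI:15554', 'type': 'chemical_substance'},
                  {'id': 'n1', 'type': 'gene'}],
        'edges': [{'id': 'e0', 'source_id': 'n1', 'target_id': 'n0',
                   'type': 'increases_degradation_of'},
                  {'id': 'e1', 'source_id': 'n1', 'target_id': 'n2',
                   'type': 'increases_synthesis_of'}],
    }
}
print(check_question(example))  # e1 refers to the undeclared node n2
```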

In [8]:
import pandas as pd
def answers2frame(graph_answers, input_chem):
    """Convert the answer from a decomposition query into a pandas DataFrame"""
    answers = []
    for graph_answer in graph_answers.get('answers', []):
        # Reset per answer so nodes never carry over from a previous answer.
        gene = reactant = product = None
        for node in graph_answer['nodes']:
            if node['type'] == 'gene':
                gene = node
            elif node['type'] == 'chemical_substance':
                if node['id'] == input_chem:
                    reactant=node
                else:
                    product=node
        if gene is None or reactant is None or product is None:
            # Skip answers that do not contain all three roles.
            continue
        ans = { 'score': graph_answer['score'],
                'reactant': reactant['name'] if 'name' in reactant else reactant['id'],
                'gene': gene['name'] if 'name' in gene else gene['id'],
                'product': product['name'] if 'name' in product else product['id'],
                'product_id': product['id']
              }
        answers.append(ans)
    ordered_columns = ['score','reactant','gene','product','product_id']
    # Passing columns keeps the frame well-formed even when there are no answers.
    df = pd.DataFrame(answers, columns=ordered_columns)
    return df
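
The core of the conversion is deciding which node in an answer plays which role. The sketch below isolates that step so it can be tried without calling the service. The helper name `split_answer_nodes` is hypothetical, and the sample answer is hand-written to mimic the fields the code above reads; in particular the gene id is a placeholder, not a real identifier.

```python
def split_answer_nodes(graph_answer, input_chem):
    """Return (reactant, gene, product) nodes from one answer, or None if any is missing."""
    reactant = gene = product = None
    for node in graph_answer['nodes']:
        if node['type'] == 'gene':
            gene = node
        elif node['type'] == 'chemical_substance':
            if node['id'] == input_chem:
                reactant = node
            else:
                product = node
    if reactant is None or gene is None or product is None:
        return None
    return reactant, gene, product

# Hand-written sample, not real service output.
sample = {
    'score': 0,
    'nodes': [
        {'id': 'CHEBI:15554', 'type': 'chemical_substance', 'name': 'prostaglandin H2'},
        {'id': 'gene:placeholder', 'type': 'gene', 'name': 'HPGDS'},
        {'id': 'CHEBI:15555', 'type': 'chemical_substance', 'name': 'prostaglandin D2'},
    ],
}
reactant, gene, product = split_answer_nodes(sample, 'CHEBI:15554')
print(reactant['name'], gene['name'], product['name'])
```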

## Example

Suppose we wanted to investigate the metabolism of Prostaglandin H2 (CHEBI:15554).  We can do that with the following code, which creates the question, calls the ROBOKOP quick service, then creates and displays a data frame of the answers.

In [9]:
chem_id = 'CHEBI:15554'  #Prostaglandin H2
q = make_decomposition_question(chem_id)
a = quick(q)

In [11]:
df = answers2frame(a,chem_id)
df

| | score | reactant | gene | product | product_id |
|---|---|---|---|---|---|
| 0 | 0 | prostaglandin H2 | HPGDS | prostaglandin D2 | CHEBI:15555 |
| 1 | 0 | prostaglandin H2 | TBXAS1 | thromboxane A2 | CHEBI:15627 |
| 2 | 0 | prostaglandin H2 | PTGES | prostaglandin E2 | CHEBI:15551 |
| 3 | 0 | prostaglandin H2 | PTGES2 | prostaglandin E2 | CHEBI:15551 |
| 4 | 0 | prostaglandin H2 | PTGDS | prostaglandin D2 | CHEBI:15555 |
| 5 | 0 | prostaglandin H2 | PTGES3 | prostaglandin E2 | CHEBI:15551 |
| 6 | 0 | prostaglandin H2 | PRXL2B | prostaglandin F2alpha | CHEBI:15553 |


## Tracing Pathways Recursively

The code above asks about a single step: given a chemical, what does it metabolize into? We can now write a function that recursively passes each product back in as an input, so that we trace out multiple steps. The tricky part is that many 'balancing' chemicals should not be considered part of our pathways. These are things like water or carbon dioxide that often appear as the product of a reaction. We do not want to search for reactions in which they appear on the left-hand side, because that would cause an explosion of reactions.
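
To make the idea concrete before involving the service, here is a minimal sketch of the same recursion over a toy in-memory lookup. Everything in it is an assumption for illustration: the `chem:` identifiers are made up, `TOY_REACTIONS` stands in for the one-step query, and `trace` is a hypothetical name.

```python
# Toy one-step lookup standing in for the ROBOKOP query; the ids are made up.
TOY_REACTIONS = {
    'chem:A': ['chem:B', 'chem:water'],
    'chem:B': ['chem:C', 'chem:water'],
    'chem:water': ['chem:A', 'chem:B', 'chem:C'],  # would explode if followed
}
TOY_DEAD_ENDS = {'chem:water'}

def trace(chemical, accumulator):
    """Depth-first trace that records each chemical's products exactly once."""
    if chemical in accumulator or chemical in TOY_DEAD_ENDS:
        return
    # Drop balancing chemicals before recording or recursing.
    products = [p for p in TOY_REACTIONS.get(chemical, []) if p not in TOY_DEAD_ENDS]
    accumulator[chemical] = products
    for product in products:
        trace(product, accumulator)

steps = {}
trace('chem:A', steps)
print(steps)
```

The accumulator doubles as the visited set, so a cycle in the reaction data cannot cause infinite recursion.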

In [53]:
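def find_pathway(input_chemical_with_label,accumulator,max_products=0):
    """Pass in an (identifier, label) tuple for the chemical you want the metabolic pathway of, and an empty dict for accumulator.
    Results will be passed out in accumulator. Products are followed only when max_products > 0 and the number of
    distinct products does not exceed it."""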
    dead_ends = ['CHEBI:16526', #carbon dioxide
                 'CHEBI:15741', #succinic acid / succinate
                 'CHEBI:15846', #NAD(+)
                 'CHEBI:18009', #NADP(+)
                 'CHEBI:44409', #NADP Zwitterion
                 'CHEBI:58349', #NADP(3-)
                 'CHEBI:16908', #NADH
                 'CHEBI:16474', #NADPH
                 'CHEBI:15377', #Water
                 'CHEBI:15379', #dioxygen
                 'CHEBI:16526', #carbon dioxide
                 'CHEBI:16842', #formaldehyde
                 'CHEBI:17659', #UDP
                 'CHEBI:15713', #ADP
                 'CHEBI:16761', #ADP? 
                 'CHEBI:16134', #ammonia
                 'CHEBI:16240', #hydrogen peroxide
                 'CHEBI:24636', #proton
                 'CHEBI:15378', #hydron 
                 'CHEBI:29888', #diphosphoric acid
                 'CHEBI:30769', #citric acid
                ]
    input_chemical = input_chemical_with_label[0]
    input_label = input_chemical_with_label[1]
    if input_chemical_with_label not in accumulator and input_chemical not in dead_ends:
        #print(input_chemical, input_label)
        q = make_decomposition_question(input_chemical)
        a = quick(q)
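        answers = []
        if 'answers' in a:
            for answer in a['answers']:
                if len(answer['nodes']) < 3:
                    continue
                # Reset per answer so nodes never carry over from a previous answer.
                gene = product = None
                for node in answer['nodes']:
                    if node['type']=='gene':
                        gene=node
                    elif not node['id'] == input_chemical:
                        product = node
                if gene is None or product is None:
                    continue
                if product['id'] not in dead_ends:
                    answers.append( {'gene':gene, 'product':product, 'score':answer['score']})
            accumulator[input_chemical_with_label] = answers
            # Fall back to the id when a product has no name.
            distinct_products = set([(answer['product']['id'],answer['product'].get('name', answer['product']['id'])) for answer in answers])
            # Recurse only when the fan-out is small enough; max_products=0 disables recursion.
            if max_products > 0 and len(distinct_products) <= max_products:
                for product in distinct_products:
                    # Forward max_products so deeper steps are traced too.
                    find_pathway(product,accumulator,max_products)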
            

In [71]:
pathway = {}
chem_id = ('CHEBI:16714' , 'codeine')
#chem_id = ('CHEBI:15554', 'Prostaglandin H2')
#chem_id = ('CHEBI:18243', 'dopamine')
#chem_id = ('CHEBI:16436','CDP-choline')
#chem_id = ('CHEBI:27732','caffeine')
#chem_id = ('CHEBI:41774', 'tamoxifen')
#chem_id = ('CHEBI:27432', 'alpha-linolenic acid')

find_pathway(chem_id,pathway,max_products=20)

In [72]:
flatter = []
for k,v in pathway.items():
    for o in v:
        flatter.append({'reactant':k[0],
                        'reactantname': k[1],
                        'geneid':o['gene']['id'], 
                        'genename':o['gene']['name'] if 'name' in o['gene'] else '', 
                        'productid': o['product']['id'], 
                        'productname': o['product']['name'] if 'name' in o['product'] else '',
                        'score': o['score']})

In [73]:
df = pd.DataFrame(flatter)
ordered_columns = ['reactant','reactantname','genename','productname','productid']
df = df[ordered_columns]

from IPython.display import display
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(df)

| | reactant | reactantname | genename | productname | productid |
|---|---|---|---|---|---|
| 0 | CHEBI:16714 | codeine | CYP2D6 | morphine | CHEBI:17303 |
| 1 | CHEBI:16714 | codeine | CYP2D7 | morphine | CHEBI:17303 |
| 2 | CHEBI:16714 | codeine | CYP1A2 | morphine | CHEBI:17303 |
| 3 | CHEBI:16714 | codeine | CYP2C8 | morphine | CHEBI:17303 |
| 4 | CHEBI:16714 | codeine | CYP3A5 | morphine | CHEBI:17303 |
| 5 | CHEBI:16714 | codeine | CYP1A1 | morphine | CHEBI:17303 |
| 6 | CHEBI:16714 | codeine | | morphine | CHEBI:17303 |
| 7 | CHEBI:16714 | codeine | CYP2F1 | morphine | CHEBI:17303 |
| 8 | CHEBI:16714 | codeine | CYP3A43 | morphine | CHEBI:17303 |
| 9 | CHEBI:16714 | codeine | CYP4B1 | morphine | CHEBI:17303 |


In [77]:
products_and_reactants = df[ ['reactant','reactantname','productid','productname']]
# Assign the result rather than dropping in place, which avoids pandas' SettingWithCopyWarning.
products_and_reactants = products_and_reactants.drop_duplicates()
products_and_reactants

| | reactant | reactantname | productid | productname |
|---|---|---|---|---|
| 0 | CHEBI:16714 | codeine | CHEBI:17303 | morphine |
| 24 | CHEBI:16714 | codeine | CHEBI:80579 | norcodeine |
| 25 | CHEBI:16714 | codeine | CHEBI:80580 | Codeine-6-glucuronide |
| 44 | CHEBI:17303 | morphine | CHEBI:80581 | Morphine-6-glucuronide |
| 45 | CHEBI:17303 | morphine | CHEBI:80631 | Morphine-3-glucuronide |
| 69 | CHEBI:17303 | morphine | CHEBI:7633 | Normorphine |
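
The deduplicated reactant/product pairs can also be read as a tree. The sketch below, using only the standard library, prints the pairs from the table above as an indented outline. The helper name `pathway_lines` is hypothetical, and the pairs are copied by hand rather than taken from the DataFrame.

```python
from collections import defaultdict

# (reactant name, product name) pairs copied from the table above.
pairs = [
    ('codeine', 'morphine'),
    ('codeine', 'norcodeine'),
    ('codeine', 'Codeine-6-glucuronide'),
    ('morphine', 'Morphine-6-glucuronide'),
    ('morphine', 'Morphine-3-glucuronide'),
    ('morphine', 'Normorphine'),
]

def pathway_lines(pairs, root):
    """Return the pathway below root as a list of indented lines."""
    children = defaultdict(list)
    for reactant, product in pairs:
        children[reactant].append(product)
    lines = []

    def walk(name, depth, seen):
        lines.append('  ' * depth + name)
        for child in children.get(name, []):
            if child not in seen:  # guard against cycles along the current path
                walk(child, depth + 1, seen | {child})

    walk(root, 0, {root})
    return lines

print('\n'.join(pathway_lines(pairs, 'codeine')))
```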
