# Workflow 1, Module 2, Question 3

## What pathways/processes are [genes] involved in?

### Strategy 1: Expand with a loop

If you want to know the pathways or processes that a gene is involved in, you can use expand. This is the best approach when you have a list of genes and want the union of all processes that those genes are involved in.

In [1]:
robokop_server = 'robokop.renci.org'

In [2]:
import requests
import json
import pandas as pd

In [3]:
def expand(type1,identifier,type2,rebuild=None,csv=None,predicate=None):
    """Query the expand endpoint for type2 nodes linked to one type1 identifier."""
    url=f'http://{robokop_server}:80/api/simple/expand/{type1}/{identifier}/{type2}'
    params = {'rebuild': rebuild,
              'csv'    : csv,
              'predicate': predicate}
    # Send only the options that were explicitly set
    params = { k:v for k,v in params.items() if v is not None }
    response = requests.get(url,params=params)
    print( f'Return Status: {response.status_code}' )
    if response.status_code == 200:
        return response.json()
    # On failure, return an empty answer set so parse_answer still works
    return {'answers': []}
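To check a request before sending it, the URL can be assembled offline. The sketch below is an approximation using only the standard library: `build_expand_url` is a hypothetical helper, not part of the notebook or of the ROBOKOP API, and the `predicate` value is taken from the `type` column of the results as an assumption about what the service accepts.

```python
from urllib.parse import urlencode

robokop_server = 'robokop.renci.org'

def build_expand_url(type1, identifier, type2, **options):
    """Hypothetical helper: approximate the URL that expand() requests."""
    url = f'http://{robokop_server}:80/api/simple/expand/{type1}/{identifier}/{type2}'
    # Mirror expand(): options left as None are not sent at all
    params = {k: v for k, v in options.items() if v is not None}
    return f'{url}?{urlencode(params)}' if params else url

plain_url = build_expand_url('gene', 'HGNC:9376', 'biological_process_or_activity')
# Assumed predicate value; rebuild=None is dropped from the query string
filtered_url = build_expand_url('gene', 'HGNC:9376', 'biological_process_or_activity',
                                rebuild=None, predicate='actively_involved_in')
print(plain_url)
print(filtered_url)
```

Because unset options are filtered out, the first URL carries no query string at all.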

In [4]:
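def parse_answer(returnanswer):
    """Flatten an expand response into one row per result."""
    # Each answer is a one-hop path: nodes[1] is taken as the result node,
    # edges[0] as the edge that connects it to the query node
    nodes = [answer['nodes'][1] for answer in returnanswer['answers']]
    edges = [answer['edges'][0] for answer in returnanswer['answers']]
    # Fall back to the id when a node has no name
    answers = [ {"result_id": node["id"], 
                 "result_name": node["name"] if 'name' in node else node['id'], 
                 "type": edge["type"],
                 "source": edge['edge_source']}
              for node,edge in zip(nodes,edges)]
    return pd.DataFrame(answers)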
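The indexing in `parse_answer` implies a particular response shape. The sketch below spells that shape out with a hand-built sample and adds a guard for a response with no answers (for example, after a failed request). Both `sample_response` and `parse_answer_or_empty` are hypothetical: the sample is inferred from the keys the parser reads, not copied from the service.

```python
import pandas as pd

# Hypothetical response, shaped the way parse_answer indexes it:
# each answer holds a two-node path and the single edge between the nodes.
sample_response = {
    "answers": [
        {
            "nodes": [{"id": "HGNC:9376", "name": "PRKAA1"},
                      {"id": "GO:0016236", "name": "macroautophagy"}],
            "edges": [{"type": "actively_involved_in",
                       "edge_source": "biolink.gene_get_process_or_function"}],
        },
        {
            # Result node without a 'name' key: the id is used as the name
            "nodes": [{"id": "HGNC:9376"}, {"id": "GO:0009631"}],
            "edges": [{"type": "actively_involved_in",
                       "edge_source": "biolink.gene_get_process_or_function"}],
        },
    ]
}

def parse_answer_or_empty(returnanswer):
    """Hypothetical variant of parse_answer that tolerates empty responses."""
    columns = ["result_id", "result_name", "type", "source"]
    if not returnanswer or not returnanswer.get("answers"):
        return pd.DataFrame(columns=columns)
    rows = []
    for answer in returnanswer["answers"]:
        node, edge = answer["nodes"][1], answer["edges"][0]
        rows.append({"result_id": node["id"],
                     "result_name": node.get("name", node["id"]),
                     "type": edge["type"],
                     "source": edge["edge_source"]})
    return pd.DataFrame(rows, columns=columns)

parsed = parse_answer_or_empty(sample_response)
empty = parse_answer_or_empty([])
print(parsed)
```

Returning an empty frame with the expected columns keeps downstream steps such as concatenation working when one gene yields nothing.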

These are some genes from Questions 1 and 2:

In [6]:
PRKAA1 = 'HGNC:9376'
XBP1 = 'HGNC:12801'
MTATP1 = 'HGNC:7414'
NEIL1 = 'HGNC:18448'
FRK = 'HGNC:3955'

mody_genes = [PRKAA1,XBP1,MTATP1,NEIL1,FRK]

Expand takes a single gene, so you need to call it once per gene. Here we combine the results into a single frame, but you may wish to handle them differently.

In [7]:
pframes = []
for gene in mody_genes:
    processes = expand('gene',gene,'biological_process_or_activity')
    process_frame = parse_answer(processes)
    process_frame['gene'] = gene
    pframes.append(process_frame)
all_processes = pd.concat(pframes)
all_processes

Return Status: 200
Return Status: 200
Return Status: 200
Return Status: 200
Return Status: 200


| | result_id | result_name | source | type | gene |
|---|---|---|---|---|---|
| 0 | GO:0004679 | AMP-activated protein kinase activity | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 1 | GO:0004674 | protein serine/threonine kinase activity | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 2 | GO:0055089 | fatty acid homeostasis | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 3 | GO:0016236 | macroautophagy | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 4 | GO:0008284 | positive regulation of cell proliferation | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 5 | GO:0097009 | energy homeostasis | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 6 | GO:0009631 | cold acclimation | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 7 | GO:0019395 | fatty acid oxidation | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 8 | GO:0006633 | fatty acid biosynthetic process | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
| 9 | GO:0004672 | protein kinase activity | biolink.gene_get_process_or_function | actively_involved_in | HGNC:9376 |
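The combined frame has one row per gene and process pair, so a process shared by several genes appears more than once. A minimal sketch of reducing it to the union of processes, and of counting how many genes each process covers, might look like the following. The rows in `all_processes_demo` are placeholders with made-up identifiers, used only so the example runs without the service; substitute `all_processes` in practice.

```python
import pandas as pd

# Placeholder rows shaped like all_processes (identifiers are made up)
all_processes_demo = pd.DataFrame([
    {"result_id": "PROC:a", "result_name": "process a", "gene": "GENE:1"},
    {"result_id": "PROC:b", "result_name": "process b", "gene": "GENE:1"},
    {"result_id": "PROC:a", "result_name": "process a", "gene": "GENE:2"},
])

# Union: one row per distinct process, regardless of which gene produced it
union = all_processes_demo.drop_duplicates(subset="result_id")[["result_id", "result_name"]]

# How many distinct genes each process was found for, most shared first
genes_per_process = (
    all_processes_demo.groupby(["result_id", "result_name"])["gene"]
    .nunique()
    .sort_values(ascending=False)
    .reset_index(name="n_genes")
)
print(union)
print(genes_per_process)
```

Keeping the `gene` column until this point is what makes the per-process count possible; dropping duplicates first would lose it.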


### Strategy 2: Enrichment

If you don't care about every process that every gene is involved in, and instead want the processes that the genes most share, you can use enrichment, which takes the whole gene list in a single call. Each returned process comes with a p-value:

In [8]:
def enrichment(type1,identlist,type2,threshhold=None,maxresults=None,numtype1=None,include_descendants=None,rebuild=None):
    """Query the enriched endpoint for type2 nodes shared by a list of type1 identifiers."""
    url=f'http://{robokop_server}/api/simple/enriched/{type1}/{type2}'
    # Keys are sent to the service as written, so their spelling is left unchanged
    params = { 'threshhold': threshhold, 'maxresults': maxresults, 
              'num_type1':numtype1, 'identifiers': identlist, 
              'include_descendants':include_descendants, 'rebuild': rebuild }
    # Send only the options that were explicitly set
    params = { k:v for k,v in params.items() if v is not None }
    # The identifier list travels in the JSON body of a POST
    response=requests.post(url, json = params)
    print( f'Return Status: {response.status_code}' )
    if response.status_code == 200:
        return response.json()
    return []

In [9]:
enriched_processes = enrichment('gene',mody_genes,'biological_process_or_activity')
pd.DataFrame(enriched_processes)

Return Status: 200


| | id | name | p |
|---|---|---|---|
| 0 | GO:0055089 | fatty acid homeostasis | 0.000004 |
| 1 | GO:0006633 | fatty acid biosynthetic process | 0.000025 |
| 2 | GO:0042149 | cellular response to glucose starvation | 0.000025 |
| 3 | GO:0071333 | cellular response to glucose stimulus | 0.000064 |
| 4 | GO:0009631 | cold acclimation | 0.000425 |
| 5 | GO:0062028 | regulation of stress granule assembly | 0.000425 |
| 6 | GO:0050405 | [acetyl-CoA carboxylase] kinase activity | 0.000425 |
| 7 | GO:2000758 | positive regulation of peptidyl-lysine acetyla... | 0.000425 |
| 8 | GO:0047322 | [hydroxymethylglutaryl-CoA reductase (NADPH)] ... | 0.000425 |
| 9 | GO:0035404 | histone-serine phosphorylation | 0.000638 |
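Once the enrichment results are in a frame, they can be narrowed with ordinary pandas filtering. The sketch below reuses four rows from the output above; the cutoff `p_cutoff` is a hypothetical value chosen for illustration, not a recommendation from the service.

```python
import pandas as pd

# Four rows copied from the enrichment output above
enriched_demo = pd.DataFrame([
    {"id": "GO:0055089", "name": "fatty acid homeostasis", "p": 0.000004},
    {"id": "GO:0006633", "name": "fatty acid biosynthetic process", "p": 0.000025},
    {"id": "GO:0009631", "name": "cold acclimation", "p": 0.000425},
    {"id": "GO:0035404", "name": "histone-serine phosphorylation", "p": 0.000638},
])

p_cutoff = 1e-4  # hypothetical cutoff, adjust to taste

# Keep only rows below the cutoff, smallest p-value first
significant = enriched_demo[enriched_demo["p"] < p_cutoff].sort_values("p")
print(significant)
```

With the real results, replace `enriched_demo` with `pd.DataFrame(enriched_processes)`.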
