# Enrichment

This example shows how to run an enrichment analysis on a set of genes. It uses the GO enrichment API (http://geneontology.org/), which supports enrichment against Reactome pathways, GO terms, or the PANTHER Protein Class.

Definition of parameters.

In [9]:
import json
import requests
import pandas as pd

def converttostr(input_seq, separator):
    # Join all the strings in the list, placing the separator between them
    final_str = separator.join(input_seq)
    return final_str



api_url_base = 'http://pantherdb.org/services/oai/pantherdb/enrich/overrep?'
gene_input = "geneInputList="
organism = "&organism=9606"
dataset = "&annotDataSet="
# change the test type and correction if applicable; refer to the original API documentation for the options
end = "&enrichmentTestType=FISHER&correction=BONFERRONI"
ref_input = "&refInputList="
#change organism Taxonomy ID if applicable
ref_organism = "&refOrganism=9606"
headers = {'Content-Type': 'application/json'}

# these are the available annotation data sets to enrich against
go_mf = "GO%3A0003674"
go_bp = "GO%3A0008150"
go_cc = "GO%3A0005575"
reactome = "ANNOT_TYPE_ID_REACTOME_PATHWAY"
pr_class = "ANNOT_TYPE_ID_PANTHER_PC"
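
The query string can also be assembled with the standard library instead of manual concatenation. The following is a minimal sketch, not part of the original notebook: `build_query` and `API_URL_BASE` are hypothetical names, and it assumes the service accepts a plain comma-separated gene list (the notebook joins genes with `,%20`). With `urlencode`, the annotation data set is passed unencoded, for example `GO:0008150` rather than `GO%3A0008150`.

```python
from urllib.parse import urlencode

# Same endpoint as api_url_base in the notebook
API_URL_BASE = "http://pantherdb.org/services/oai/pantherdb/enrich/overrep?"


def build_query(genes, annot_dataset, organism="9606",
                test_type="FISHER", correction="BONFERRONI"):
    """Return the full request URL for one enrichment query (hypothetical helper)."""
    params = {
        "geneInputList": ",".join(genes),  # assumption: no space after the comma
        "organism": organism,
        "annotDataSet": annot_dataset,
        "enrichmentTestType": test_type,
        "correction": correction,
    }
    # urlencode percent-encodes reserved characters (":" becomes "%3A");
    # safe="," leaves the commas of the gene list untouched.
    return API_URL_BASE + urlencode(params, safe=",")


url = build_query(["APH1A", "ARRB1"], "GO:0008150")
print(url)
```

Switching the annotation data set then only means passing a different second argument, such as `"ANNOT_TYPE_ID_REACTOME_PATHWAY"`.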

`un` is a list of gene symbols or Ensembl gene IDs to be enriched. The API returns all results; the significant ones are then selected by their pValue. In the case study, this method is applied to compare different communities between two networks.

In [3]:
un = ["APH1A", "ARRB1", "CCND1", "CUL1", "DLL1", "DTX1"]

# replace reactome with one of the other annotation data sets if needed
query = api_url_base + gene_input + converttostr(un, ",%20") + organism + dataset + reactome + end

response = requests.get(query, headers=headers)
response.raise_for_status()  # stop here if the request failed
res = response.json()

# adjust the p-value threshold if needed
significant = []
for r in res["results"]["result"]:
    if r["pValue"] <= 0.05:
        significant.append(r)
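
Note that a plain threshold on pValue also keeps the `UNCLASSIFIED` entry, which is reported with a pValue of 0 in the output further down. As a rough sketch, the selection could be wrapped in a small function that drops such entries and sorts by significance. `select_significant` and its parameters are hypothetical names, and the sample records are invented purely to exercise the function; they are not API output.

```python
def select_significant(results, alpha=0.05, skip_labels=("UNCLASSIFIED",)):
    """Keep results with pValue <= alpha, ignoring labels listed in skip_labels."""
    kept = []
    for r in results:
        if r["term"]["label"] in skip_labels:
            continue  # placeholder bucket, not an annotated term
        if r["pValue"] <= alpha:
            kept.append(r)
    # most significant first
    return sorted(kept, key=lambda r: r["pValue"])


# Invented records with the same keys as the API results
sample = [
    {"pValue": 0, "term": {"label": "UNCLASSIFIED"}, "plus_minus": "-"},
    {"pValue": 0.2, "term": {"id": "X-1", "label": "Hypothetical term A"}, "plus_minus": "+"},
    {"pValue": 0.01, "term": {"id": "X-2", "label": "Hypothetical term B"}, "plus_minus": "+"},
    {"pValue": 0.001, "term": {"id": "X-3", "label": "Hypothetical term C"}, "plus_minus": "+"},
]

picked = select_significant(sample)
print([r["term"]["label"] for r in picked])
```

With the notebook's data, the equivalent call would be `select_significant(res["results"]["result"])`.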




In [4]:
significant

[{'number_in_list': 0,
  'fold_enrichment': 0,
  'fdr': 0,
  'expected': 2.8961398397669336,
  'number_in_reference': 9941,
  'pValue': 0,
  'term': {'label': 'UNCLASSIFIED'},
  'plus_minus': '-'},
 {'number_in_list': 6,
  'fold_enrichment': 102.975,
  'fdr': 0,
  'expected': 0.05826656955571741,
  'number_in_reference': 200,
  'pValue': 1.922346547451178e-09,
  'term': {'id': 'R-HSA-157118', 'label': 'Signaling by NOTCH'},
  'plus_minus': '+'},
 {'number_in_list': 5,
  'fold_enrichment': 241.72535211267606,
  'fdr': 0,
  'expected': 0.02068463219227968,
  'number_in_reference': 71,
  'pValue': 7.3957739627208264e-09,
  'term': {'id': 'R-HSA-1980143', 'label': 'Signaling by NOTCH1'},
  'plus_minus': '+'},
 {'number_in_list': 4,
  'fold_enrichment': 442.90322580645164,
  'fdr': 0,
  'expected': 0.009031318281136198,
  'number_in_reference': 31,
  'pValue': 2.1596307469202088e-07,
  'term': {'id': 'R-HSA-2122948',
   'label': 'Activated NOTCH1 Transmits Signal to the Nucleus'},
  'plus_m

Here is an example of how to convert the JSON output into a DataFrame. The UNCLASSIFIED entry is skipped.

In [6]:
# convert into a dataframe

pval = []
bps = []
direction = []

for i in significant:
    # skip the UNCLASSIFIED entry (it has no term id)
    if i["term"]["label"] != "UNCLASSIFIED":
        pval.append(i["pValue"])
        bps.append(i["term"]["label"])
        direction.append(i["plus_minus"])

pdData = {"pval": pval, "term": bps, "direction": direction}

In [10]:
t = pd.DataFrame(pdData)
t

|   | pval | term | direction |
|---|------|------|-----------|
| 0 | 1.922347e-09 | Signaling by NOTCH | + |
| 1 | 7.395774e-09 | Signaling by NOTCH1 | + |
| 2 | 2.159631e-07 | Activated NOTCH1 Transmits Signal to the Nucleus | + |
| 3 | 0.0009653916 | Constitutive Signaling by NOTCH1 HD+PEST Domai... | + |
| 4 | 0.0009653916 | Signaling by NOTCH1 HD+PEST Domain Mutants in ... | + |
| 5 | 0.0009653916 | Constitutive Signaling by NOTCH1 PEST Domain M... | + |
| 6 | 0.0009653916 | Signaling by NOTCH1 in Cancer | + |
| 7 | 0.0009653916 | Signaling by NOTCH1 PEST Domain Mutants in Cancer | + |
| 8 | 0.003086913 | Diseases of signal transduction | + |
| 9 | 0.01124459 | Signal Transduction | + |
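
The conversion can also be written row-wise, which makes it easy to carry extra fields such as the term id and the fold enrichment into the table. This is a sketch under assumptions: `to_rows` is a hypothetical helper, and the sample records reuse the first two entries of the output printed earlier.

```python
def to_rows(results):
    """Flatten enrichment results into one flat dict per term (hypothetical helper)."""
    rows = []
    for r in results:
        term = r["term"]
        if term["label"] == "UNCLASSIFIED":
            continue  # this entry has no term id
        rows.append({
            "pval": r["pValue"],
            "term": term["label"],
            "term_id": term.get("id"),
            "fold_enrichment": r.get("fold_enrichment"),
            "direction": r["plus_minus"],
        })
    return rows


# First two records of the `significant` output, reduced to the keys used here
sample = [
    {"fold_enrichment": 0, "pValue": 0,
     "term": {"label": "UNCLASSIFIED"}, "plus_minus": "-"},
    {"fold_enrichment": 102.975, "pValue": 1.922346547451178e-09,
     "term": {"id": "R-HSA-157118", "label": "Signaling by NOTCH"}, "plus_minus": "+"},
]

rows = to_rows(sample)
print(rows)
```

Since `pd.DataFrame` accepts a list of dicts, `pd.DataFrame(to_rows(significant))` should yield one row per term, with the dict keys as columns.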
