# GSEAPy

This notebook outlines how other pathway enrichment methods, here preranked GSEA run through GSEApy, can be used to obtain pathway dysregulation vectors.

In [1]:
import sys, time, getpass

import pandas as pd
import gseapy

In [2]:
print(sys.version)

3.6.5 (default, Apr 20 2018, 08:54:42) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]


In [3]:
print(time.asctime())

Wed May 15 15:47:42 2019


In [4]:
print(getpass.getuser())

ddomingofernandez


In [5]:
data = pd.read_csv('/home/ddomingofernandez/Downloads/datasets/brca_deseq2.csv')

In [6]:
data.head()

,Unnamed: 0,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj,gene_symbol
0,0,3115.007965,-0.611308,0.09215,-6.633868,3.270033e-11,1.225591e-10,TSPAN6
1,1,137.42418,-3.909312,0.236414,-16.535858,2.0247460000000002e-61,5.575769000000001e-60,TNMD
2,2,2312.437908,0.445773,0.055601,8.017309,1.08087e-15,5.272969e-15,DPM1
3,3,1893.468051,0.384674,0.055613,6.916968,4.614136e-12,1.829991e-11,SCYL3
4,4,787.754054,1.331132,0.072354,18.397566,1.373988e-75,5.873324e-74,C1orf112


Slice the dataframe to keep the two columns needed for ranking: the gene symbol and the log2 fold change (FC).

In [7]:
rank = data[['gene_symbol','log2FoldChange']].dropna()

In [8]:
rank.head()

,gene_symbol,log2FoldChange
0,TSPAN6,-0.611308
1,TNMD,-3.909312
2,DPM1,0.445773
3,SCYL3,0.384674
4,C1orf112,1.331132
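
A preranked analysis works on one score per gene, so it can help to collapse duplicated gene symbols and sort the table before passing it on. The sketch below shows one possible way to do that with pandas. The helper name `build_rank` and the rule of keeping, for a duplicated symbol, the row with the largest absolute fold change are assumptions made for illustration; they are not part of the original analysis or of GSEApy.

```python
import pandas as pd


def build_rank(deseq, gene_col="gene_symbol", score_col="log2FoldChange"):
    """Return a (gene, score) table with one row per gene, sorted by score.

    Hypothetical helper: the de-duplication rule below is an assumption.
    """
    # Drop rows with a missing gene symbol or a missing score.
    rank = deseq[[gene_col, score_col]].dropna().reset_index(drop=True)

    # For duplicated symbols keep the row with the largest |score| (assumption).
    rank = (
        rank.assign(_abs=rank[score_col].abs())
        .sort_values("_abs", ascending=False)
        .drop_duplicates(subset=gene_col, keep="first")
        .drop(columns="_abs")
    )

    # Sort from the most up-regulated to the most down-regulated gene.
    return rank.sort_values(score_col, ascending=False).reset_index(drop=True)


# Toy input with a duplicated symbol, a missing symbol and a missing score.
toy = pd.DataFrame(
    {
        "gene_symbol": ["A", "B", "A", None, "C"],
        "log2FoldChange": [1.0, -2.0, -3.0, 0.5, float("nan")],
    }
)
toy_rank = build_rank(toy)
```

With the data loaded above, `build_rank(data)` would return a table in the same two-column layout as `rank`.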


Run preranked GSEA with the KEGG 2016 gene sets

In [9]:
pre_res = gseapy.prerank(
    rnk=rank,
    gene_sets='KEGG_2016',
    processes=4,
    permutation_num=100, # reduce number to speed up test
    outdir='test/prerank_report_kegg'
)

Passing one of 'on', 'true', 'off', 'false' as a boolean is deprecated; use an actual boolean (True/False) instead.
  warn_deprecated("2.2", "Passing one of 'on', 'true', 'off', 'false' as a "


In [10]:
pre_res.res2d.head()

Term,es,nes,pval,fdr,geneset_size,matched_size,genes
Systemic lupus erythematosus_Homo sapiens_hsa05322,0.721959,2.247015,0.0,0.0,135,130,"HIST1H2AI,HIST1H2BO,HIST1H3B,HIST1H3J,HIST1H3H..."
Alcoholism_Homo sapiens_hsa05034,0.662603,2.201954,0.0,0.0,179,178,"GNG13,GNGT1,HIST1H2AI,HIST1H2BO,HIST1H3B,HIST1..."
Viral carcinogenesis_Homo sapiens_hsa05203,0.541748,1.818396,0.0,0.001488,205,204,"HIST1H2BO,HIST1H2BM,HIST1H4D,HIST1H2BB,HIST1H2..."
Cell cycle_Homo sapiens_hsa04110,0.581447,1.849332,0.0,0.001984,124,123,"PKMYT1,CDC20,PLK1,CDC25C,BUB1,PTTG1,CDK1,CCNB2..."
DNA replication_Homo sapiens_hsa03030,0.673247,1.772368,0.0,0.003572,36,36,"MCM4,RNASEH2A,MCM2,DNA2,FEN1,POLE2,PCNA,RFC4,L..."


The sign of the Normalized Enrichment Score (NES) indicates the direction in which a pathway changes, so the NES values can be used to build a dysregulation vector for each disease or drug.

In the following example, we assume that a pathway is activated when its NES is larger than 2 and inhibited when its NES is smaller than -2. Pathways with an NES between these two values are assumed to be unchanged.

In [11]:
vector = []

for nes in pre_res.res2d["nes"]:

    if nes > 2:
        vector.append(1)  # activated

    elif nes < -2:
        vector.append(-1)  # inhibited

    else:
        vector.append(0)  # unchanged


In [12]:
print(vector[:5])  # entries for the five pathways shown by head() above

[1, 1, 0, 0, 0]
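
To compare vectors across diseases or drugs, every vector needs the same pathways in the same order. The sketch below applies the thresholding rule described above and keeps pathway names attached to the scores. The function name `nes_to_vector`, the `pathways` reference list and the choice to score a pathway that is absent from a result as 0 are assumptions for illustration. The example NES values are taken from the result table shown earlier.

```python
from typing import Dict, List, Sequence


def nes_to_vector(
    nes_by_pathway: Dict[str, float],
    pathways: Sequence[str],
    threshold: float = 2.0,
) -> List[int]:
    """Map NES values to 1 (activated), -1 (inhibited) or 0 (unchanged).

    The output follows the order of `pathways`, so vectors built from
    different datasets line up entry by entry.
    """
    vector = []
    for pathway in pathways:
        nes = nes_by_pathway.get(pathway)
        if nes is None:
            # Pathway not present in this result: treated as unchanged (assumption).
            vector.append(0)
        elif nes > threshold:
            vector.append(1)
        elif nes < -threshold:
            vector.append(-1)
        else:
            vector.append(0)
    return vector


# NES values copied from the result table above.
example_nes = {
    "Systemic lupus erythematosus_Homo sapiens_hsa05322": 2.247015,
    "Alcoholism_Homo sapiens_hsa05034": 2.201954,
    "Cell cycle_Homo sapiens_hsa04110": 1.849332,
}

# Reference order shared by all vectors; the last name is a made-up placeholder.
reference = list(example_nes) + ["hypothetical_pathway_not_in_results"]

print(nes_to_vector(example_nes, reference))  # [1, 1, 0, 0]
```

For the result shown in this notebook, the input mapping could be built with `pre_res.res2d["nes"].to_dict()`, assuming the result table is indexed by pathway term as in the output above.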
