In [3]:
!date
import os
import sys

import numpy as np
import pandas as pd

# Make the local dev modules importable before importing them
sys.path.append('../dev')

import utils
import enrich

pd.options.display.max_colwidth = 100

Wed Nov 22 11:26:35 PST 2023


# Hou 2018

Microarray data from two experiments, retrieved from GEO, in which mice were fed a variety of high-fat diets to induce fatty liver disease.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6290321/


Process the data:

Write the data copied from Table 2 to files

In [2]:
hou_raw = pd.read_csv('../test_data/unprocessed/Hou_2018.csv')

hou_up = hou_raw.loc[1].values[0].split(',')
hou_up = pd.Series(hou_up).apply(lambda x: x.strip())
hou_up.to_csv('../test_data/processed/Hou_2018_up.csv', header = False, index = False)

hou_down = hou_raw.loc[3].values[0].split(',')
hou_down = pd.Series(hou_down).apply(lambda x: x.strip())
hou_down.to_csv('../test_data/processed/Hou_2018_down.csv', header = False, index = False)

pd.concat([hou_up,hou_down],axis = 0).to_csv('../test_data/processed/Hou_2018_combined.csv', header = False, index = False)

In [3]:
hou_combined_set = enrich.enrich_wrapper('Hou_2018_combined.csv','Gene Symbol',method='set',fpath = '../test_data/processed/')

100%|██████████████| 225/225 [00:00<00:00, 13384.94it/s]


Analysis run on 153 entities from 107 out of 339 input genes


In [4]:
enrich.compare2standard(hou_combined_set, 'Hou_2018_combined.csv' , 'Gene Symbol',FDR=.05)

100%|██████████████| 225/225 [00:00<00:00, 17613.92it/s]
100%|██████████████| 225/225 [00:00<00:00, 17236.24it/s]


Standard method yields 0 results, 0 of which are unique


# Hoang 2019

NAFLD. Comparisons between samples at different histological stages (fibrosis stage and NAS score).

3000 genes were identified at 1% FDR, but I used 5% FDR and a > 60% increase or > 30% decrease. I chose those cutoffs only to show the MMP case; otherwise I would probably choose a > 100% increase or > 50% decrease.
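
As a quick check on how percentage cutoffs relate to the `range_log2FC` thresholds used in the filters in this section, the sketch below converts between the two. The helper names are illustrative only and are not part of `utils` or `enrich`.

```python
import math

def pct_to_log2fc(pct_change):
    """Convert a percent change (60 for +60%, -30 for -30%) to a log2 fold change."""
    return math.log2(1 + pct_change / 100)

def log2fc_to_pct(log2fc):
    """Convert a log2 fold change back to a percent change."""
    return (2 ** log2fc - 1) * 100

# Percent cutoffs mentioned in the text
for pct in (60, -30, 100, -50):
    print(f"{pct:+d}% -> log2FC {pct_to_log2fc(pct):+.2f}")

# log2FC thresholds that appear in the filters
for lfc in (0.7, -0.5, -0.7):
    print(f"log2FC {lfc:+.1f} -> {log2fc_to_pct(lfc):+.0f}%")
```

A log2FC of 0.7 is roughly a 62% increase and -0.5 is roughly a 29% decrease, which is where the 60%/30% figures come from. A 100% increase or 50% decrease corresponds to a log2FC of exactly +1 or -1.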

## Process data

**Fibrosis**

In [5]:
fibrosis = pd.read_csv('../test_data/Hoang_2019_fibrosis.csv')
fibrosis = fibrosis[fibrosis.adj_P <.05]
fibrosis = fibrosis[fibrosis.range_log2FC.apply(lambda x: False if (x > -.5 and x < 0.7) else True)]
print(len(fibrosis))

fibrosis.gene_symbol.to_csv('../test_data/processed/Hoang_2019_fibrosis.csv', header = False, index = False)

440


**NAS**

In [6]:
NAS = pd.read_csv('../test_data/Hoang_2019_NAS.csv')
NAS = NAS[NAS.adj_P <.05]
NAS = NAS[NAS.range_log2FC.apply(lambda x: False if (x > -.7 and x < 0.7) else True)]
print(len(NAS))

NAS.gene_symbol.to_csv('../test_data/processed/Hoang_2019_NAS.csv', header = False, index = False)

551


## Analyze

**Fibrosis**

In [7]:
hoang_fibrosis_set = enrich.enrich_wrapper('Hoang_2019_fibrosis.csv','Gene Symbol',method='set',FDR = 0.1,fpath = '../test_data/processed/')
hoang_fibrosis_set['shared entities in gocam'].values

100%|██████████████| 171/171 [00:00<00:00, 16972.55it/s]


Analysis run on 122 entities from 103 out of 440 input genes


array([list(['MMP7', 'MMP2', 'sset:proMMP9 activating proteases', 'sset:MMP1,7', 'sset:proMMP8 initial activators', 'sset:MMP2,3,7,10,11', 'sset:MMP1 (2, 3, 7, 10, 13)']),
       list(['PLAU', 'F13A1', 'sset:SERPINB2,6,8', 'sset:SERPINE1-like proteins'])],
      dtype=object)

In [8]:
enrich.compare2standard(hoang_fibrosis_set, 'Hoang_2019_fibrosis.csv','Gene Symbol',FDR = 0.1)

100%|██████████████| 171/171 [00:00<00:00, 17222.38it/s]
100%|██████████████| 171/171 [00:00<00:00, 16401.61it/s]


Standard method yields 4 results, 3 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Activation of Matrix Metalloproteinases - Reactome,5.124311e-07,7,18,"[MMP7, MMP2, sset:proMMP9 activating proteases, sset:MMP1,7, sset:proMMP8 initial activators, ss...",http://model.geneontology.org/R-HSA-1592389,23


In [9]:
temp= enrich.enrich_wrapper('Hoang_2019_fibrosis.csv','Gene Symbol',method='standard',fpath = '../test_data/processed/',show_significant = False)
temp[temp['title'] == 'Activation of Matrix Metalloproteinases - Reactome']

100%|██████████████| 171/171 [00:00<00:00, 16208.50it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
6,Activation of Matrix Metalloproteinases - Reactome,0.009603,3,23,"[MMP7, MMP2, KLK2]",http://model.geneontology.org/R-HSA-1592389


In this model, several steps can be enabled by MMP2 (P08253) and MMP7 (P09237). The issue is that some steps require MMP2 or MMP7 individually, while in others they are sufficient as members of sets, and my code doesn't check sets against individual genes. The steps that require an individual MMP are autocatalytic activations. So in some places the model describes how whole groups of MMPs can be activated, and in others how individual MMPs can autocatalyze their own activation. P20151 is Kallikrein Related Peptidase 2 (KLK2); it carries out the steps enabled by 'proMMP9 activating proteases' and 'proMMP8 initial activators'.

http://model.geneontology.org/R-HSA-1592389
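
To illustrate why counting set membership changes the result so sharply, here is a sketch of a one-sided hypergeometric test applied to the two ways of counting this model (3 of 23 genes versus 7 of 18 entities). This is not the code in `enrich`: the function name and both background sizes are assumptions chosen only for illustration, so the p-values will not match the tables above.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items without replacement from a
    population of M items that contains n successes."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical background sizes (NOT taken from enrich)
BACKGROUND_GENES = 5000      # assumed number of genes across all models
BACKGROUND_ENTITIES = 6000   # assumed number of genes plus sets across all models

# Standard counting: 3 of the model's 23 genes, from 103 mapped input genes
p_standard = hypergeom_sf(k=3, M=BACKGROUND_GENES, n=23, N=103)

# Set-aware counting: 7 of the model's 18 entities, from 122 mapped entities
p_set = hypergeom_sf(k=7, M=BACKGROUND_ENTITIES, n=18, N=122)

print(f"standard: {p_standard:.3g}")
print(f"set:      {p_set:.3g}")
```

With these assumed backgrounds, matching 7 of 18 entities is far less likely by chance than matching 3 of 23 genes, which is the same direction as the difference between the two tables above.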

In [10]:
temp[temp['title'] == 'Chromatin modifying enzymes - Reactome']

Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
32,Chromatin modifying enzymes - Reactome,0.093781,1,5,[PADI1],http://model.geneontology.org/R-HSA-3247509


This model really does have only 2 steps, both of which are enabled by Q9ULC6 (PADI1). I don't know how it got through the filters for causal models.
http://noctua.geneontology.org/editor/graph/gomodel:R-HSA-3247509

**NAS**

In [11]:
hoang_NAS_set = enrich.enrich_wrapper('Hoang_2019_NAS.csv','Gene Symbol',method='ncHGT',FDR = 0.1,fpath = '../test_data/processed/')


100%|█████████████████| 224/224 [00:40<00:00,  5.59it/s]


Analysis run on 189 entities from 177 out of 551 input genes


In [12]:
enrich.compare2standard(hoang_NAS_set, 'Hoang_2019_NAS.csv','Gene Symbol',FDR = 0.1)

100%|██████████████| 224/224 [00:00<00:00, 17849.80it/s]
100%|██████████████| 224/224 [00:00<00:00, 16897.00it/s]


Standard method yields 7 results, 7 of which are unique


# Govaere 2023

NAFLD. Paired transcriptomics and proteomics in 306 patients to find signatures of active steatohepatitis and severe fibrosis.

NOTE:

"IGHM IGJ IGK@ IGL@" and "ITGA1 ITGB1" are each a single entry in the source table, but only the first symbol of each entry is kept in the processing below.

In [15]:
GS2 = pd.read_csv('../test_data/unprocessed/Goavere_S2.csv',sep = '\t',header = None)
GS2[0] = GS2[0].apply(lambda x: x.split(' ')[1])
print(len(GS2))
GS2[0] = GS2[0].apply(lambda x: x.replace(u'\xa0', u''))
GS2.to_csv('../test_data/processed/Goavere_S2.csv', header = False, index = False)

121
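
If every symbol in the multi-symbol entries noted above should be kept instead of only the first, one option is to split each entry and flatten the result. The sketch below assumes each entry has already been reduced to a whitespace-separated string of symbols; the `entries` list and the function name are made up for illustration and do not reflect the raw layout of `Goavere_S2.csv`.

```python
def expand_entries(entries):
    """Split multi-symbol entries into a flat list of symbols,
    dropping duplicates while keeping first-seen order."""
    seen = set()
    symbols = []
    for entry in entries:
        # str.split() with no argument splits on any whitespace,
        # including non-breaking spaces (\xa0)
        for symbol in entry.split():
            if symbol not in seen:
                seen.add(symbol)
                symbols.append(symbol)
    return symbols

# Hypothetical input: the two multi-symbol entries from the note plus one single symbol
entries = ["IGHM IGJ IGK@ IGL@", "ITGA1 ITGB1", "SULT2A1"]
print(expand_entries(entries))
```

Whether the extra symbols map to anything downstream would still need to be checked against the ID conversion step.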


In [16]:
# The processed file was written without a header row, so read it the same way
GS2 = pd.read_csv('../test_data/processed/Goavere_S2.csv', header = None)

In [123]:
GS2_set = enrich.enrich_wrapper('Goavere_S2.csv','Gene Symbol',method='set',FDR = 0.1,fpath = '../test_data/processed/')
GS2_set

100%|████████████████████████████████████████| 98/98 [00:00<00:00, 17824.88it/s]


Analysis run on 68 entities from 48 out of 118 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Activation of Matrix Metalloproteinases - Reactome,8.471489e-09,7,18,"[sset:proMMP1 initial activators, sset:proMMP3 initial activators, sset:proMMP10 activators, sse...",http://model.geneontology.org/R-HSA-1592389
1,Cytosolic sulfonation of small molecules - Reactome,1.208035e-05,5,19,"[SULT2A1, sset:SULT1E1,2A1, sset:SULTs active on DHEA, sset:SULT dimers (T2), sset:SULT dimers (...",http://model.geneontology.org/R-HSA-156584


In [124]:
enrich.compare2standard(GS2_set, 'Goavere_S2.csv','Gene Symbol',FDR = 0.1)

100%|████████████████████████████████████████| 98/98 [00:00<00:00, 14871.27it/s]
100%|████████████████████████████████████████| 98/98 [00:00<00:00, 17563.64it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Activation of Matrix Metalloproteinases - Reactome,8.471489e-09,7,18,"[sset:proMMP1 initial activators, sset:proMMP3 initial activators, sset:proMMP10 activators, sse...",http://model.geneontology.org/R-HSA-1592389,23
1,Cytosolic sulfonation of small molecules - Reactome,1.208035e-05,5,19,"[SULT2A1, sset:SULT1E1,2A1, sset:SULTs active on DHEA, sset:SULT dimers (T2), sset:SULT dimers (...",http://model.geneontology.org/R-HSA-156584,22


1. different diseases
2. different organ systems
3. could be something that has been followed up on
4. studies that produced results, possibly higher cited or profile
5. 300ish genes
6. response to drug

Aging, GABAergic vs Glutamatergic

Heart, Aging brain, Cancer drug, psoriasis or something autoimmune, stem cell?, IBS/GI

# Dilated and Arrhythmogenic Cardiomyopathy

Reichart 2022 

https://www.science.org/doi/10.1126/science.abo1984?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed#supplementary-materials

This paper did snRNA-seq on human left and right ventricles from healthy and diseased donors and identified 10 major cell types and 71 transcriptional states from 880,000 nuclei. They likely used snRNA-seq because it is difficult to extract intact cells from some organs for single-cell RNA-seq. They found that fibrosis was expanded in ACM and DCM, but the number of fibroblasts was not increased.

vCM cell states fig S5, S6A; tables S7-S12

fibroblast DEGs: figs S9-11; tables S15-21

smooth muscle cells: tables S23-28. KEGG enrichment analysis in Fig 3C
- use S23 (LV) and S25 (RV) for mural cells in each genotype versus control
- may look into the pericyte cell types in S27

DCM is linked with complement activation: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1850717/

In [5]:
logFC = 1
PV = .05

## Smooth Muscle Cells, LV mural

### Preprocess

In [9]:
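# S23 lists genes upregulated in each disease genotype; S24 lists genes upregulated
# in controls (i.e. down in disease), so the same logFC > threshold filter is applied to both.
for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:

    df = pd.concat([
        pd.read_csv(f'../test_data/unprocessed/reichart/S23_LV_MC_Upregulated_Genes_Disease_Genotypes_All_States-{genotype}.csv'),
        pd.read_csv(f'../test_data/unprocessed/reichart/S24_LV_MC_Upregulated_Genes_Controls_All_States-{genotype}.csv')
    ])
    # Keep genes passing the fold-change and p-value cutoffs defined above
    df = df.query(f'logFC > {logFC} and PValue < {PV}')
    df['Gene'].to_csv(f'../test_data/processed/{genotype}_SMC_comb.csv', header = False, index= False)
    print(genotype,':',len(df), 'genes')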

LMNA : 633 genes
PKP2 : 309 genes
RBM20 : 435 genes
TTN : 396 genes
PVneg : 686 genes


In [10]:
for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:
    
    df_up = pd.read_csv(f'../test_data/unprocessed/reichart/S23_LV_MC_Upregulated_Genes_Disease_Genotypes_All_States-{genotype}.csv')
    df_down = pd.read_csv(f'../test_data/unprocessed/reichart/S24_LV_MC_Upregulated_Genes_Controls_All_States-{genotype}.csv')
    
    
    for sign, df in {'up':df_up,'down':df_down}.items():
        df = df.query(f'logFC > {logFC} and PValue < {PV}')
        df['Gene'].to_csv(f'../test_data/processed/{genotype}_SMC_{sign}.csv', header = False, index= False)
        print(genotype,sign,':',len(df), 'genes')

LMNA up : 284 genes
LMNA down : 349 genes
PKP2 up : 142 genes
PKP2 down : 167 genes
RBM20 up : 213 genes
RBM20 down : 222 genes
TTN up : 151 genes
TTN down : 245 genes
PVneg up : 166 genes
PVneg down : 520 genes


### LMNA

In [5]:
LMNA_comb = enrich.enrich_wrapper('LMNA_comb.csv','Gene Symbol',method='set',FDR = 0.1,fpath = '../test_data/processed/')
LMNA_comb

100%|██████████████████████████████████████| 207/207 [00:00<00:00, 14965.20it/s]


Analysis run on 116 entities from 94 out of 349 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Peptide chain elongation - Reactome,4.188963e-09,15,79,"[RPL13A, RPLP2, RPS18, RPL34, RPLP0, RPL7, RPS15A, RPS20, RPL23A, RPS15, RPL27A, UBA52, RPL35, R...",http://model.geneontology.org/R-HSA-156902
1,Eukaryotic Translation Elongation - Reactome,8.574529e-09,15,83,"[RPL13A, RPLP2, RPS18, RPL34, RPLP0, RPL7, RPS15A, RPS20, RPL23A, RPS15, RPL27A, UBA52, RPL35, R...",http://model.geneontology.org/R-HSA-156842
2,Regulation of Complement cascade - Reactome,2.201032e-07,6,11,"[C3, C1R, sset:CD46, CR1:C4b:C3b complexes, sset:C4 activators, sset:C3 convertases, sset:Comple...",http://model.geneontology.org/R-HSA-977606


Unique results:

In [6]:
enrich.compare2standard(LMNA_comb, 'LMNA_comb.csv','Gene Symbol',FDR = 0.1)

100%|██████████████████████████████████████| 207/207 [00:00<00:00, 17530.61it/s]
100%|██████████████████████████████████████| 207/207 [00:00<00:00, 16745.83it/s]


Standard method yields 6 results, 4 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
2,Regulation of Complement cascade - Reactome,2.201032e-07,6,11,"[C3, C1R, sset:CD46, CR1:C4b:C3b complexes, sset:C4 activators, sset:C3 convertases, sset:Comple...",http://model.geneontology.org/R-HSA-977606,117


In [None]:
temp = enrich.enrich_wrapper('LMNA_comb.csv','Gene Symbol',method='standard',show_significant = False,fpath = '../test_data/processed/')
temp.query('title == "Regulation of Complement cascade - Reactome"')['shared entities in gocam'].values

100%|██████████████| 326/326 [00:00<00:00, 17277.09it/s]


array([list(['CFD', 'C3', 'C1R'])], dtype=object)

### RBM20

In [None]:
RBM20_comb = enrich.enrich_wrapper('RBM20_comb.csv','Gene Symbol',method='set',FDR = 0.1,fpath = '../test_data/processed/')
RBM20_comb

100%|██████████████| 195/195 [00:00<00:00, 18052.56it/s]


Analysis run on 135 entities from 103 out of 435 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Regulation of Complement cascade - Reactome,1.6e-05,5,11,"[C3, sset:C4 activators, sset:C3 convertases, sset:Complement factor D, sset:CD46, CR1:C4b:C3b c...",http://model.geneontology.org/R-HSA-977606


In [None]:
enrich.compare2standard(RBM20_comb, 'RBM20_comb.csv','Gene Symbol',FDR = 0.1)

100%|██████████████| 195/195 [00:00<00:00, 17708.21it/s]
100%|██████████████| 195/195 [00:00<00:00, 17733.55it/s]


Standard method yields 2 results, 2 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Regulation of Complement cascade - Reactome,1.6e-05,5,11,"[C3, sset:C4 activators, sset:C3 convertases, sset:Complement factor D, sset:CD46, CR1:C4b:C3b c...",http://model.geneontology.org/R-HSA-977606,117


### PKP2

In [None]:
PKP2_comb = enrich.enrich_wrapper('PKP2_comb.csv','Gene Symbol',method='set',FDR = 0.1,fpath = '../test_data/processed/')
PKP2_comb

100%|██████████████| 185/185 [00:00<00:00, 18532.71it/s]


Analysis run on 92 entities from 65 out of 309 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


In [None]:
enrich.compare2standard(PKP2_comb, 'PKP2_comb.csv','Gene Symbol',FDR = 0.1)

100%|██████████████| 185/185 [00:00<00:00, 17781.43it/s]
100%|██████████████| 185/185 [00:00<00:00, 17950.91it/s]


Standard method yields 0 results, 0 of which are unique


### PVneg

In [None]:
PVneg_comb = enrich.enrich_wrapper('PVneg_comb.csv','Gene Symbol',method='set',FDR = 0.1,fpath = '../test_data/processed/')
PVneg_comb

100%|██████████████| 324/324 [00:00<00:00, 17877.92it/s]


Analysis run on 256 entities from 221 out of 686 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Peptide chain elongation - Reactome,3.5491110000000003e-29,41,79,"[RPL13A, RPLP2, RPLP0, RPS18, RPL35, RPL27A, RPS15A, RPL7, RPS5, RPS2, RPS20, RPLP1, RPSA, RPL10...",http://model.geneontology.org/R-HSA-156902
1,Eukaryotic Translation Elongation - Reactome,4.67254e-28,41,83,"[RPL13A, RPLP2, RPLP0, RPS18, RPL35, RPL27A, RPS15A, RPL7, RPS5, RPS2, RPS20, RPLP1, RPSA, RPL10...",http://model.geneontology.org/R-HSA-156842
2,Formation of ATP by chemiosmotic coupling - Reactome,2.912307e-09,23,88,"[MT-ND4, MT-ND2, CYC1, UQCRQ, COX5B, ATP5MC1, COX7C, NDUFA1, MT-CO2, MT-CO1, ATP5F1D, ATP5MC3, U...",http://model.geneontology.org/R-HSA-163210
3,Respiratory electron transport - Reactome,3.162146e-06,19,91,"[MT-ND4, MT-ND2, CYC1, UQCRQ, COX5B, IDH3B, ACADVL, COX7C, NDUFA1, MT-CO2, MT-CO1, UQCRB, COX6B1...",http://model.geneontology.org/R-HSA-611105
4,The fatty acid cycling model - Reactome,5.667088e-06,16,70,"[MT-ND4, MT-ND2, CYC1, UQCRQ, COX5B, COX7C, NDUFA1, MT-CO2, MT-CO1, UQCRB, COX6B1, NDUFA4, NDUFA...",http://model.geneontology.org/R-HSA-167826
5,The proton buffering model - Reactome,5.667088e-06,16,70,"[MT-ND4, MT-ND2, CYC1, UQCRQ, COX5B, COX7C, NDUFA1, MT-CO2, MT-CO1, UQCRB, COX6B1, NDUFA4, NDUFA...",http://model.geneontology.org/R-HSA-167827
6,Cytoprotection by HMOX1 - Reactome,6.805078e-06,10,29,"[CYC1, UQCRQ, COX5B, COX7C, MT-CO2, MT-CO1, UQCRB, COX6B1, NDUFA4, COX4I1]",http://model.geneontology.org/R-HSA-9707564
7,Transport of nucleosides and free purine and pyrimidine bases across the plasma membrane - Reactome,9.636447e-06,10,30,"[ATP5MC1, ATP5F1D, ATP5MC3, SLC25A4, ATP5ME, ATP5PD, ATP5F1E, sset:SLC29A2-like proteins, sset:A...",http://model.geneontology.org/R-HSA-83936
8,TP53 Regulates Metabolic Genes - Reactome,2.089708e-05,12,46,"[CYC1, UQCRQ, COX5B, COX7C, MT-CO2, MT-CO1, UQCRB, COX6B1, NDUFA4, COX4I1, PRDX1, sset:PRDX1,2,5]",http://model.geneontology.org/R-HSA-5628897
9,Chondroitin sulfate biosynthesis - Reactome,0.0001073776,5,9,"[sset:CHST9,11,12,13, sset:PAPST1,2, sset:CHPF,CHPF2,CHSY3, sset:CHPF,CHSY3, sset:B3GAT dimers]",http://model.geneontology.org/R-HSA-2022870


In [35]:
enrich.compare2standard(PVneg_comb, 'PVneg_comb.csv','Gene Symbol',FDR = .1)

100%|██████████████| 324/324 [00:00<00:00, 16740.84it/s]
100%|██████████████| 324/324 [00:00<00:00, 16830.41it/s]


Standard method yields 10 results, 1 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
9,Chondroitin sulfate biosynthesis - Reactome,0.000107,5,9,"[sset:CHST9,11,12,13, sset:PAPST1,2, sset:CHPF,CHPF2,CHSY3, sset:CHPF,CHSY3, sset:B3GAT dimers]",http://model.geneontology.org/R-HSA-2022870,18
10,Hydrolysis of LPC - Reactome,0.0005,4,7,"[sset:PLA2(5), sset:PLA2(8), sset:PLA2(6), sset:PLA2(7)]",http://model.geneontology.org/R-HSA-1483115,13


In [37]:
temp = enrich.enrich_wrapper('PVneg_comb.csv','Gene Symbol', show_significant = False, method = 'standard', fpath = '../test_data/processed/')
temp.query('title == "Chondroitin sulfate biosynthesis - Reactome"')

100%|██████████████| 324/324 [00:00<00:00, 17324.32it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
46,Chondroitin sulfate biosynthesis - Reactome,0.005988,4,18,"[CHSY3, CHST11, B3GAT2, SLC35B2]",http://model.geneontology.org/R-HSA-2022870


In [20]:
temp = pd.DataFrame({'g':['CHSY3', 'CHST11', 'B3GAT2', 'SLC35B2']})
# Convert once and reuse: element 0 is iterated over below, element 1 is the ID-to-symbol mapping
converted = utils.convert_IDs(temp,'Gene Symbol')
x = converted[0]
converted[1]

{'Q70JA7': 'CHSY3',
 'Q9NPF2': 'CHST11',
 'A0A024RBL0': 'CHST11',
 'F8VRG6': 'CHST11',
 'F8VXK3': 'CHST11',
 'F8VXK7': 'CHST11',
 'Q9NPZ5': 'B3GAT2',
 'A0A087WXU9': 'B3GAT2',
 'Q29RV3': 'B3GAT2',
 'A0A0A0MS46': 'SLC35B2',
 'Q8TB61': 'SLC35B2'}

In [22]:
for g in x:
    print(enrich.get_sets([g]))

([], ['sset:CHPF,CHSY3', 'sset:CHPF,CHPF2,CHSY3'], {'sset:CHPF,CHSY3': {'Q70JA7'}, 'sset:CHPF,CHPF2,CHSY3': {'Q70JA7'}})
([], ['sset:CHST9,11,12,13'], {'sset:CHST9,11,12,13': {'Q9NPF2'}})
(['A0A024RBL0'], [], {})
(['F8VRG6'], [], {})
(['F8VXK3'], [], {})
(['F8VXK7'], [], {})
([], ['sset:B3GAT dimers'], {'sset:B3GAT dimers': {'Q9NPZ5'}})
(['A0A087WXU9'], [], {})
(['Q29RV3'], [], {})
(['A0A0A0MS46'], [], {})
([], ['sset:PAPST1,2'], {'sset:PAPST1,2': {'Q8TB61'}})
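
The output above shows why four genes become five entities: CHSY3 (Q70JA7) belongs to two sets, while the remaining accessions are not members of any set. The sketch below inverts the membership dictionaries to make the count explicit. The data are copied from the output above; the variable names are illustrative, and this is not how `enrich` computes its counts internally.

```python
from collections import defaultdict

# Set memberships copied from the get_sets output (set name -> UniProt IDs)
memberships = [
    {"sset:CHPF,CHSY3": {"Q70JA7"}, "sset:CHPF,CHPF2,CHSY3": {"Q70JA7"}},
    {"sset:CHST9,11,12,13": {"Q9NPF2"}},
    {"sset:B3GAT dimers": {"Q9NPZ5"}},
    {"sset:PAPST1,2": {"Q8TB61"}},
]

# Invert to UniProt ID -> set names
sets_by_protein = defaultdict(set)
for mapping in memberships:
    for set_name, proteins in mapping.items():
        for protein in proteins:
            sets_by_protein[protein].add(set_name)

# Collect every distinct set hit by at least one input protein
all_sets = set().union(*sets_by_protein.values())

print(len(sets_by_protein), "proteins ->", len(all_sets), "set entities")
for protein, names in sorted(sets_by_protein.items()):
    print(protein, sorted(names))
```

This matches the two tables for this model: the standard method counts 4 genes, while the set method counts 5 entities.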


In [24]:
temp = enrich.enrich_wrapper('PVneg_comb.csv','Gene Symbol', show_significant = False, method = 'standard', fpath = '../test_data/processed/')
temp.query('title == "Hydrolysis of LPC - Reactome"')

100%|██████████████████████████████████████| 324/324 [00:00<00:00, 14948.02it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
107,Hydrolysis of LPC - Reactome,0.102432,2,13,"[PLA2G4A, PLA2G2A]",http://model.geneontology.org/R-HSA-1483115


## Fibroblast DEGs

fibroblasts and VEGF: https://www.nature.com/articles/s41598-022-23304-8

### Preprocess

In [19]:
xls_down = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S16_LV_FB_Upregulated_Genes_Controls_All_States.xlsx')
xls_up = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S17_LV_FB_Upregulated_Genes_Disease_Genotypes_All_States.xlsx')

for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:
    
    df = pd.concat([
        pd.read_excel(xls_up,f'{genotype}_control'),
        pd.read_excel(xls_down,f'control_{genotype}')
    ])
    df = df.query(f'logFC > {logFC} and PValue < {PV}')
    df['Gene'].to_csv(f'../test_data/processed/{genotype}_FB_comb.csv', header = False, index= False)
    print(genotype,':',len(df), 'genes')

LMNA : 872 genes
PKP2 : 725 genes
RBM20 : 625 genes
TTN : 1092 genes
PVneg : 1135 genes


In [10]:
xls_down = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S16_LV_FB_Upregulated_Genes_Controls_All_States.xlsx')
xls_up = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S17_LV_FB_Upregulated_Genes_Disease_Genotypes_All_States.xlsx')

for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:
    
    df_up = pd.read_excel(xls_up,f'{genotype}_control')
    df_down = pd.read_excel(xls_down,f'control_{genotype}')
    
    
    for sign, df in {'up':df_up,'down':df_down}.items():
        df = df.query(f'logFC > {logFC} and PValue < {PV}')
        df['Gene'].to_csv(f'../test_data/processed/{genotype}_FB_{sign}.csv', header = False, index= False)
        print(genotype,sign,':',len(df), 'genes')

LMNA up : 347 genes
LMNA down : 525 genes
PKP2 up : 342 genes
PKP2 down : 383 genes
RBM20 up : 259 genes
RBM20 down : 366 genes
TTN up : 385 genes
TTN down : 707 genes
PVneg up : 398 genes
PVneg down : 737 genes


### LMNA

In [32]:
LMNA_comb = enrich.enrich_wrapper('LMNA_FB_comb.csv','Gene Symbol',method='set',FDR = .05,fpath = '../test_data/processed/')
LMNA_comb

100%|██████████████████████████████████████| 401/401 [00:00<00:00, 17557.63it/s]


Analysis run on 284 entities from 234 out of 872 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Peptide chain elongation - Reactome,3.358753e-07,20,79,"[RPS17, RPL13A, RPS18, RPLP0, RPLP2, RPL34, RPS20, RPL27A, RPL7, RPL35, RPL38, RPLP1, RPS15A, RP...",http://model.geneontology.org/R-HSA-156902
1,Eukaryotic Translation Elongation - Reactome,7.965288e-07,20,83,"[RPS17, RPL13A, RPS18, RPLP0, RPLP2, RPL34, RPS20, RPL27A, RPL7, RPL35, RPL38, RPLP1, RPS15A, RP...",http://model.geneontology.org/R-HSA-156842
2,Regulation of Complement cascade - Reactome,2.244272e-06,7,11,"[C3, C1R, sset:C3 convertases, sset:CD46, CR1:C4b:C3b complexes, sset:Activated thrombin, (ELANE...",http://model.geneontology.org/R-HSA-977606
3,G alpha (i) signalling events - Reactome,8.338874e-05,5,8,"[sset:Light-sensing opsins, sset:RGS1,3,4,5,6,7,8,9,10,11,12,13,14,16,17,18,19,20,21, sset:Ligan...",http://model.geneontology.org/R-HSA-418594


In [74]:
enrich.compare2standard(LMNA_comb, 'LMNA_FB_comb.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████████████████████████████████████| 401/401 [00:00<00:00, 17937.95it/s]
100%|████████████████████████████████████████████████████████████████████████| 401/401 [00:00<00:00, 17801.82it/s]


Standard method yields 3 results, 1 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
2,Regulation of Complement cascade - Reactome,2e-06,7,11,"[C3, C1R, sset:C4 activators, sset:Complement factor D, sset:Activated thrombin, (ELANE), sset:C...",http://model.geneontology.org/R-HSA-977606,117
3,G alpha (i) signalling events - Reactome,8.3e-05,5,8,"[sset:GNB, sset:Light-sensing opsins, sset:G alpha (i), sset:Ligand:GPCR complexes that activate...",http://model.geneontology.org/R-HSA-418594,250


In [75]:
temp = enrich.enrich_wrapper('LMNA_FB_comb.csv','Gene Symbol',method='standard',show_significant = False,fpath = '../test_data/processed/')
temp.query('title == "Regulation of Complement cascade - Reactome"')['shared entities in gocam'].values

100%|████████████████████████████████████████████████████████████████████████| 401/401 [00:00<00:00, 17746.60it/s]


array([list(['IGKV1D-16', 'IGKC', 'MASP1', 'ELANE', 'CFD', 'C3', 'C1R'])],
      dtype=object)

In [129]:
temp = enrich.enrich_wrapper('LMNA_FB_comb.csv','Gene Symbol',method='standard',show_significant = False,fpath = '../test_data/processed/')
temp.query('title == "Regulation of Complement cascade - Reactome"')

100%|██████████████████████████████████████| 401/401 [00:00<00:00, 16790.78it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
143,Regulation of Complement cascade - Reactome,0.268844,7,117,"[IGKV1D-16, IGKC, MASP1, ELANE, CFD, C3, C1R]",http://model.geneontology.org/R-HSA-977606


### RBM20

In [76]:
RBM20_comb = enrich.enrich_wrapper('RBM20_FB_comb.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
RBM20_comb

100%|████████████████████████████████████████████████████████████████████████| 283/283 [00:00<00:00, 18595.21it/s]


Analysis run on 193 entities from 163 out of 625 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Formation of ATP by chemiosmotic coupling - Reactome,1.5e-05,15,88,"[MT-ND4, MT-ND2, MT-ND1, MT-ATP6, MT-ND6, MT-CYB, MT-CO1, UQCRB, MT-CO2, MT-ND3, MT-CO3, MT-ND5,...",http://model.geneontology.org/R-HSA-163210
1,Peptide chain elongation - Reactome,1.9e-05,14,79,"[RPS17, RPL13A, RPS18, RPS15A, RPL27A, RPLP2, RPL34, RPS20, RPLP0, RPS5, RPL7, RPL38, RPL27, sse...",http://model.geneontology.org/R-HSA-156902
2,The proton buffering model - Reactome,2.2e-05,13,70,"[MT-ND4, MT-ND2, MT-ND1, MT-ND6, MT-CYB, MT-CO1, UQCRB, MT-CO2, MT-ND3, MT-CO3, MT-ND5, NDUFA1, ...",http://model.geneontology.org/R-HSA-167827
3,The fatty acid cycling model - Reactome,2.2e-05,13,70,"[MT-ND4, MT-ND2, MT-ND1, MT-ND6, MT-CYB, MT-CO1, UQCRB, MT-CO2, MT-ND3, MT-CO3, MT-ND5, NDUFA1, ...",http://model.geneontology.org/R-HSA-167826
4,Eukaryotic Translation Elongation - Reactome,3.4e-05,14,83,"[RPS17, RPL13A, RPS18, RPS15A, RPL27A, RPLP2, RPL34, RPS20, RPLP0, RPS5, RPL7, RPL38, RPL27, sse...",http://model.geneontology.org/R-HSA-156842
5,Regulation of Complement cascade - Reactome,9.2e-05,5,11,"[C3, sset:C4 activators, sset:Activated thrombin, (ELANE), sset:C3 convertases, sset:CD46, CR1:C...",http://model.geneontology.org/R-HSA-977606


In [77]:
enrich.compare2standard(RBM20_comb, 'RBM20_FB_comb.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████████████████████████████████████| 283/283 [00:00<00:00, 16847.22it/s]
100%|████████████████████████████████████████████████████████████████████████| 283/283 [00:00<00:00, 17257.25it/s]


Standard method yields 6 results, 1 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
5,Regulation of Complement cascade - Reactome,9.2e-05,5,11,"[C3, sset:C4 activators, sset:Activated thrombin, (ELANE), sset:C3 convertases, sset:CD46, CR1:C...",http://model.geneontology.org/R-HSA-977606,117


### PKP2

In [78]:
PKP2_comb = enrich.enrich_wrapper('PKP2_FB_comb.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
PKP2_comb

100%|████████████████████████████████████████████████████████████████████████| 358/358 [00:00<00:00, 16487.08it/s]


Analysis run on 239 entities from 191 out of 725 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Synthesis of PIPs at the plasma membrane - Reactome,3.019942e-10,12,21,"[PTEN, sset:INPP4A/B, sset:PI3K-regulatory subunit, sset:PIK3(2), sset:Activator:PI3K, sset:SYNJ...",http://model.geneontology.org/R-HSA-1660499
1,PI3K/AKT Signaling - Reactome,1.091417e-05,7,15,"[PIK3R1, PIK3CA, sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PI3K alpha, beta, gamma...",http://model.geneontology.org/R-HSA-1257604
2,Negative regulation of the PI3K/AKT network - Reactome,1.420146e-05,5,7,"[PTEN, sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PIK3C(1), sset:PI3K-catalytic sub...",http://model.geneontology.org/R-HSA-199418
3,CD28 dependent PI3K/Akt signaling - Reactome,1.841759e-05,7,16,"[PIK3R1, PIK3CA, sset:PI3K-regulatory subunit, sset:Active AKT, sset:Activator:PI3K, sset:PI3K a...",http://model.geneontology.org/R-HSA-389357
4,G alpha (i) signalling events - Reactome,3.601886e-05,5,8,"[sset:Light-sensing opsins, sset:G alpha (i), sset:Adenylate cyclase, sset:Ligand:GPCR complexes...",http://model.geneontology.org/R-HSA-418594
5,"PI5P, PP2A and IER3 Regulate PI3K/AKT Signaling - Reactome",5.267639e-05,6,13,"[PTEN, sset:Activated SRC,LCK,EGFR,INSR, sset:Activator:PI3K, sset:SYNJ/MTM(1), sset:SYNJ/INPP5(...",http://model.geneontology.org/R-HSA-6811558
6,VEGFR2 mediated cell proliferation - Reactome,0.000122421,8,27,"[PTEN, SPHK1, sset:PI3K-regulatory subunit, sset:G(q) alpha 11,14,15,Q, sset:G-protein alpha (q/...",http://model.geneontology.org/R-HSA-5218921
7,G beta:gamma signalling through PI3Kgamma - Reactome,0.0001466514,5,10,"[PIK3R1, PIK3CA, sset:PI3K-regulatory subunit, sset:PI3K alpha, beta, gamma, sset:PI3K-catalytic...",http://model.geneontology.org/R-HSA-392451


In [79]:
enrich.compare2standard(PKP2_comb, 'PKP2_FB_comb.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████████████████████████████████████| 358/358 [00:00<00:00, 18156.94it/s]
100%|████████████████████████████████████████████████████████████████████████| 358/358 [00:00<00:00, 17138.37it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Synthesis of PIPs at the plasma membrane - Reactome,3.019942e-10,12,21,"[PTEN, sset:INPP4A/B, sset:PI3K-regulatory subunit, sset:PIK3(2), sset:Activator:PI3K, sset:SYNJ...",http://model.geneontology.org/R-HSA-1660499,120
1,PI3K/AKT Signaling - Reactome,1.091417e-05,7,15,"[PIK3R1, PIK3CA, sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PI3K alpha, beta, gamma...",http://model.geneontology.org/R-HSA-1257604,114
2,Negative regulation of the PI3K/AKT network - Reactome,1.420146e-05,5,7,"[PTEN, sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PIK3C(1), sset:PI3K-catalytic sub...",http://model.geneontology.org/R-HSA-199418,95
3,CD28 dependent PI3K/Akt signaling - Reactome,1.841759e-05,7,16,"[PIK3R1, PIK3CA, sset:PI3K-regulatory subunit, sset:Active AKT, sset:Activator:PI3K, sset:PI3K a...",http://model.geneontology.org/R-HSA-389357,115
4,G alpha (i) signalling events - Reactome,3.601886e-05,5,8,"[sset:Light-sensing opsins, sset:G alpha (i), sset:Adenylate cyclase, sset:Ligand:GPCR complexes...",http://model.geneontology.org/R-HSA-418594,250
5,"PI5P, PP2A and IER3 Regulate PI3K/AKT Signaling - Reactome",5.267639e-05,6,13,"[PTEN, sset:Activated SRC,LCK,EGFR,INSR, sset:Activator:PI3K, sset:SYNJ/MTM(1), sset:SYNJ/INPP5(...",http://model.geneontology.org/R-HSA-6811558,115
6,VEGFR2 mediated cell proliferation - Reactome,0.000122421,8,27,"[PTEN, SPHK1, sset:PI3K-regulatory subunit, sset:G(q) alpha 11,14,15,Q, sset:G-protein alpha (q/...",http://model.geneontology.org/R-HSA-5218921,55
7,G beta:gamma signalling through PI3Kgamma - Reactome,0.0001466514,5,10,"[PIK3R1, PIK3CA, sset:PI3K-regulatory subunit, sset:PI3K alpha, beta, gamma, sset:PI3K-catalytic...",http://model.geneontology.org/R-HSA-392451,26


In [26]:
temp = enrich.enrich_wrapper('PKP2_FB_comb.csv','Gene Symbol',method='standard',show_significant = False,fpath = '../test_data/processed/')
temp.query('title == "Negative regulation of the PI3K/AKT network - Reactome"')['shared entities in gocam'].values

100%|██████████████████████████████████████| 358/358 [00:00<00:00, 18066.93it/s]


array([list(['IRS2', 'PIK3R1', 'PIK3CA', 'PTEN', 'EGFR', 'HGF', 'PDGFRB', 'FGFR4', 'FGF7'])],
      dtype=object)

In [28]:
temp = enrich.enrich_wrapper('PKP2_FB_comb.csv','Gene Symbol',method='standard',show_significant = False,fpath = '../test_data/processed/')
temp.query('title == "PI3K/AKT Signaling - Reactome"')['shared entities in gocam'].values

100%|██████████████████████████████████████| 358/358 [00:00<00:00, 16796.55it/s]


array([list(['IRS2', 'PIK3R1', 'PIK3CA', 'PIP4K2A', 'EGFR', 'HGF', 'PDGFRB', 'FGFR4', 'FGF7'])],
      dtype=object)

In [29]:
temp = enrich.enrich_wrapper('PKP2_FB_comb.csv','Gene Symbol',method='standard',show_significant = False,fpath = '../test_data/processed/')
temp.query('title == "VEGFR2 mediated cell proliferation - Reactome"')['shared entities in gocam'].values

100%|██████████████████████████████████████| 358/358 [00:00<00:00, 17249.80it/s]


array([list(['PIK3R1', 'PIK3CA', 'PTEN', 'SPHK1', 'GNA14', 'ORAI2', 'PLCD3'])],
      dtype=object)
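
The three cells above re-run the same `method='standard'` enrichment once per pathway title. A minimal sketch of an alternative, assuming the result is a DataFrame with the `title` and `shared entities in gocam` columns shown above: run the enrichment once and look up several titles from that one result. `shared_entities` is a hypothetical helper, not part of `enrich`, and the toy frame only reuses two of the outputs printed above.

```python
import pandas as pd


def shared_entities(results, titles, column='shared entities in gocam'):
    """Return {title: shared entities} for each requested title found in `results`."""
    # Keep one row per title, then index by title for direct lookups.
    indexed = results.drop_duplicates('title').set_index('title')[column]
    return {t: indexed[t] for t in titles if t in indexed.index}


# Toy stand-in for the DataFrame returned by the standard method.
toy = pd.DataFrame({
    'title': [
        'PI3K/AKT Signaling - Reactome',
        'VEGFR2 mediated cell proliferation - Reactome',
    ],
    'shared entities in gocam': [
        ['IRS2', 'PIK3R1', 'PIK3CA', 'PIP4K2A', 'EGFR', 'HGF', 'PDGFRB', 'FGFR4', 'FGF7'],
        ['PIK3R1', 'PIK3CA', 'PTEN', 'SPHK1', 'GNA14', 'ORAI2', 'PLCD3'],
    ],
})

# Titles that are absent from the results are skipped.
lookup = shared_entities(toy, ['PI3K/AKT Signaling - Reactome', 'Not a pathway'])
print(lookup)
```

In the notebook, `toy` would be replaced by `temp`, so the enrichment only needs to run once.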

### PVneg

In [81]:
PVneg_comb = enrich.enrich_wrapper('PVneg_FB_comb.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
PVneg_comb

100%|████████████████████████████████████████████████████████████████████████| 452/452 [00:00<00:00, 17692.86it/s]


Analysis run on 396 entities from 338 out of 1135 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Peptide chain elongation - Reactome,9.719755e-32,50,79,"[RPS17, RPL13A, RPLP0, RPS18, RPLP2, RPL27A, RPS15A, RPS2, RPL7, RPL35, RPLP1, RPS20, RPS28, RPS...",http://model.geneontology.org/R-HSA-156902
1,Eukaryotic Translation Elongation - Reactome,1.905043e-31,51,83,"[RPS17, RPL13A, RPLP0, RPS18, RPLP2, RPL27A, RPS15A, RPS2, RPL7, RPL35, RPLP1, RPS20, RPS28, RPS...",http://model.geneontology.org/R-HSA-156842
2,Formation of ATP by chemiosmotic coupling - Reactome,1.893885e-09,29,88,"[MT-CO1, MT-ND4, MT-ND2, ATP5MC1, CYC1, COX5B, MT-CYB, UQCRB, MT-ATP6, ATP5MC3, COX7C, NDUFA1, M...",http://model.geneontology.org/R-HSA-163210
3,Cytoprotection by HMOX1 - Reactome,1.492498e-08,15,29,"[MT-CO1, CYC1, COX5B, MT-CYB, UQCRB, COX7C, MT-CO2, MT-CO3, UQCRFS1, NDUFA4, UQCRQ, UQCR10, COX4...",http://model.geneontology.org/R-HSA-9707564
4,The proton buffering model - Reactome,1.016285e-07,23,70,"[MT-CO1, MT-ND4, MT-ND2, CYC1, COX5B, MT-CYB, UQCRB, COX7C, NDUFA1, MT-CO2, MT-CO3, MT-ND1, UQCR...",http://model.geneontology.org/R-HSA-167827
5,The fatty acid cycling model - Reactome,1.016285e-07,23,70,"[MT-CO1, MT-ND4, MT-ND2, CYC1, COX5B, MT-CYB, UQCRB, COX7C, NDUFA1, MT-CO2, MT-CO3, MT-ND1, UQCR...",http://model.geneontology.org/R-HSA-167826
6,TP53 Regulates Metabolic Genes - Reactome,1.262288e-07,18,46,"[G6PD, MT-CO1, CYC1, COX5B, MT-CYB, UQCRB, COX7C, PRDX1, MT-CO2, MT-CO3, UQCRFS1, NDUFA4, UQCRQ,...",http://model.geneontology.org/R-HSA-5628897
7,Respiratory electron transport - Reactome,1.332884e-06,25,91,"[MT-CO1, MT-ND4, MT-ND2, CYC1, COX5B, MT-CYB, UQCRB, COX7C, NDUFA1, MT-CO2, MT-CO3, MT-ND1, UQCR...",http://model.geneontology.org/R-HSA-611105
8,Regulation of Complement cascade - Reactome,2.112929e-05,7,11,"[C3, C1R, sset:C4 activators, sset:Complement factor D, sset:Activated thrombin, (ELANE), sset:C...",http://model.geneontology.org/R-HSA-977606


In [82]:
PVneg_comb.loc[8]['shared entities in gocam']

['C3',
 'C1R',
 'sset:C4 activators',
 'sset:Complement factor D',
 'sset:Activated thrombin, (ELANE)',
 'sset:C3 convertases',
 'sset:CD46, CR1:C4b:C3b complexes']

In [83]:
st = enrich.enrich_wrapper('PVneg_FB_comb.csv','Gene Symbol',method='standard',show_significant = False ,fpath = '../test_data/processed/')
st.query('title == "Regulation of Complement cascade - Reactome"')['shared entities in gocam']

100%|████████████████████████████████████████████████████████████████████████| 452/452 [00:00<00:00, 17158.81it/s]


445    [ELANE, CFD, C3, C1R]
Name: shared entities in gocam, dtype: object
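
The two outputs above list the same pathway hit in two forms: the set method reports `sset:` entries alongside plain gene symbols, while the standard method reports only gene symbols. A small sketch for comparing them, assuming the `sset:` prefix marks a set-type entity (as the naming suggests); `split_entities` is a hypothetical helper, and both lists are copied from the outputs above.

```python
def split_entities(entities, prefix='sset:'):
    """Split an entity list into plain gene symbols and set names (prefix removed)."""
    genes = [e for e in entities if not e.startswith(prefix)]
    sets = [e[len(prefix):] for e in entities if e.startswith(prefix)]
    return genes, sets


# Set-method hit for "Regulation of Complement cascade - Reactome".
set_hits = [
    'C3',
    'C1R',
    'sset:C4 activators',
    'sset:Complement factor D',
    'sset:Activated thrombin, (ELANE)',
    'sset:C3 convertases',
    'sset:CD46, CR1:C4b:C3b complexes',
]
# Standard-method hit for the same pathway.
standard_hits = ['ELANE', 'CFD', 'C3', 'C1R']

genes, sets = split_entities(set_hits)
# Genes the standard method lists individually but the set method does not.
only_standard = sorted(set(standard_hits) - set(genes))
print(genes, len(sets), only_standard)
```

Here the genes listed only by the standard method are `CFD` and `ELANE`; the set-method list contains `sset:Complement factor D` and `sset:Activated thrombin, (ELANE)` instead.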

In [84]:
enrich.compare2standard(PVneg_comb, 'PVneg_FB_comb.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████████████████████████████████████| 452/452 [00:00<00:00, 16756.75it/s]
100%|████████████████████████████████████████████████████████████████████████| 452/452 [00:00<00:00, 17283.64it/s]


Standard method yields 8 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
8,Regulation of Complement cascade - Reactome,2.1e-05,7,11,"[C3, C1R, sset:C4 activators, sset:Complement factor D, sset:Activated thrombin, (ELANE), sset:C...",http://model.geneontology.org/R-HSA-977606,117


## SMC2 DEGs

### Preprocess

In [18]:
# Genes upregulated in controls are treated as downregulated in disease.
xls_down = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S27_LV_MC_States/S27_LV_SMC2/LV_SMC2_Upregulated_genes_Controls.xlsx')
xls_up = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S27_LV_MC_States/S27_LV_SMC2/LV_SMC2_Upregulated_genes_Disease_Genotype.xlsx')

# logFC and PV are the filter thresholds; they are not set in this cell and must already be defined.
for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:

    # One sheet per genotype in each workbook.
    df_up = pd.read_excel(xls_up, sheet_name=f'{genotype}_control')
    df_down = pd.read_excel(xls_down, sheet_name=f'control_{genotype}')

    for sign, df in {'up':df_up,'down':df_down}.items():
        # Keep genes that pass both thresholds and write one gene list per genotype and direction.
        df = df.query(f'logFC > {logFC} and PValue < {PV}')
        df['Gene'].to_csv(f'../test_data/processed/{genotype}_SMC2_{sign}.csv', header = False, index= False)
        print(genotype,sign,':',len(df), 'genes')

LMNA up : 311 genes
LMNA down : 243 genes
PKP2 up : 189 genes
PKP2 down : 117 genes
RBM20 up : 191 genes
RBM20 down : 141 genes
TTN up : 65 genes
TTN down : 67 genes
PVneg up : 66 genes
PVneg down : 49 genes
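
The preprocessing cell above filters each sheet on `logFC` and `PValue` and writes out the `Gene` column. As a sketch only, the same filter can be written as a function with explicit thresholds so it can be checked on a small frame. `filter_degs`, the toy gene names, and the threshold values below are hypothetical; the notebook's actual `logFC` and `PV` values are not shown in this section.

```python
import pandas as pd


def filter_degs(df, min_logfc, max_pvalue):
    """Return the genes with logFC above `min_logfc` and PValue below `max_pvalue`."""
    mask = (df['logFC'] > min_logfc) & (df['PValue'] < max_pvalue)
    return df.loc[mask, 'Gene'].reset_index(drop=True)


# Toy table with the three columns the notebook reads from each sheet.
toy = pd.DataFrame({
    'Gene': ['GENE_A', 'GENE_B', 'GENE_C'],
    'logFC': [1.2, 0.1, 2.0],
    'PValue': [0.001, 0.001, 0.2],
})

# Illustrative thresholds, not the notebook's values.
kept = filter_degs(toy, min_logfc=0.5, max_pvalue=0.05)
print(kept.tolist())
```

Boolean masks avoid building a query string, and explicit arguments remove the dependence on `logFC` and `PV` being defined in an earlier cell.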


### LMNA

In [40]:
cell = 'SMC2'
genotype = 'LMNA'

In [41]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 189/189 [00:00<00:00, 18110.70it/s]


Analysis run on 108 entities from 63 out of 311 input genes


100%|██████████████████████████████████████| 189/189 [00:00<00:00, 16893.05it/s]
100%|██████████████████████████████████████| 189/189 [00:00<00:00, 17059.19it/s]


Standard method yields 1 results, 1 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Synthesis of PIPs at the plasma membrane - Reactome,7.543267e-07,7,21,"[sset:SYNJ/MTM(1), sset:PI3K-regulatory subunit, sset:SYNJs,OCRL, sset:Activator:PI3K, sset:SYNJ...",http://model.geneontology.org/R-HSA-1660499,120


In [43]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 221/221 [00:00<00:00, 18159.29it/s]


Analysis run on 100 entities from 66 out of 243 input genes


100%|██████████████████████████████████████| 221/221 [00:00<00:00, 17875.98it/s]
100%|██████████████████████████████████████| 221/221 [00:00<00:00, 17918.14it/s]


Standard method yields 1 results, 1 of which are unique


### PKP2

In [44]:
cell = 'SMC2'
genotype = 'PKP2'

In [45]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 131/131 [00:00<00:00, 17972.45it/s]


Analysis run on 66 entities from 42 out of 189 input genes


100%|██████████████████████████████████████| 131/131 [00:00<00:00, 17657.67it/s]
100%|██████████████████████████████████████| 131/131 [00:00<00:00, 17509.12it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,PI3K events in ERBB2 signaling - Reactome,4.4e-05,4,13,"[PIK3R1, EGFR, sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers]",http://model.geneontology.org/R-HSA-1963642,19
1,PI3K/AKT Signaling - Reactome,8.2e-05,4,15,"[PIK3R1, sset:PI3K alpha, beta, gamma, sset:PI3K-regulatory subunit, sset:Activator:PI3K]",http://model.geneontology.org/R-HSA-1257604,114
2,PLCG1 events in ERBB2 signaling - Reactome,8.4e-05,3,6,"[EGFR, sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers]",http://model.geneontology.org/R-HSA-1251932,16
3,CD28 dependent PI3K/Akt signaling - Reactome,0.000108,4,16,"[PIK3R1, sset:PI3K alpha, beta, gamma, sset:PI3K-regulatory subunit, sset:Activator:PI3K]",http://model.geneontology.org/R-HSA-389357,115
4,Negative regulation of the PI3K/AKT network - Reactome,0.000145,3,7,"[sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PIK3C(1)]",http://model.geneontology.org/R-HSA-199418,95
5,SHC1 events in ERBB2 signaling - Reactome,0.000229,3,8,"[sset:p-ERBB2 heterodimers, sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers]",http://model.geneontology.org/R-HSA-1250196,21
6,VEGFR2 mediated vascular permeability - Reactome,0.000273,4,20,"[PIK3R1, sset:PI3K alpha, beta, gamma, sset:PI3K-regulatory subunit, sset:Activator:PI3K]",http://model.geneontology.org/R-HSA-5218920,117


In [46]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 150/150 [00:00<00:00, 18006.97it/s]


Analysis run on 46 entities from 31 out of 117 input genes


100%|██████████████████████████████████████| 150/150 [00:00<00:00, 17540.09it/s]
100%|██████████████████████████████████████| 150/150 [00:00<00:00, 18074.22it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Synthesis of PIPs at the plasma membrane - Reactome,3e-06,5,21,"[BMX, sset:PIP5K1A/B, sset:PIP4K2/5K1, sset:PIP5K1A-C, sset:Activator:PI3K]",http://model.geneontology.org/R-HSA-1660499,120


### RBM20

In [47]:
cell = 'SMC2'
genotype = 'RBM20'

In [48]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 120/120 [00:00<00:00, 17468.99it/s]


Analysis run on 68 entities from 40 out of 191 input genes


100%|██████████████████████████████████████| 120/120 [00:00<00:00, 17189.77it/s]
100%|██████████████████████████████████████| 120/120 [00:00<00:00, 15546.45it/s]


Standard method yields 0 results, 0 of which are unique


In [49]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 122/122 [00:00<00:00, 17534.36it/s]


Analysis run on 61 entities from 46 out of 141 input genes


100%|██████████████████████████████████████| 122/122 [00:00<00:00, 17760.75it/s]
100%|██████████████████████████████████████| 122/122 [00:00<00:00, 16366.71it/s]


Standard method yields 1 results, 1 of which are unique


### TTN

In [50]:
cell = 'SMC2'
genotype = 'TTN'

In [51]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 18/18 [00:00<00:00, 13751.82it/s]


Analysis run on 12 entities from 7 out of 65 input genes


100%|████████████████████████████████████████| 18/18 [00:00<00:00, 15154.05it/s]
100%|████████████████████████████████████████| 18/18 [00:00<00:00, 13493.74it/s]


Standard method yields 0 results, 0 of which are unique


In [52]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 40/40 [00:00<00:00, 17093.44it/s]


Analysis run on 29 entities from 16 out of 67 input genes


100%|████████████████████████████████████████| 40/40 [00:00<00:00, 16836.14it/s]
100%|████████████████████████████████████████| 40/40 [00:00<00:00, 16841.21it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Regulation of Complement cascade - Reactome,7.226056e-07,4,11,"[C3, sset:C3 convertases, sset:Complement factor D, sset:CD46, CR1:C4b:C3b complexes]",http://model.geneontology.org/R-HSA-977606,117
1,Synthesis of PA - Reactome,1.825223e-05,4,23,"[PLA2G2A, sset:GPAM/GPAT2, sset:PLA2(15), sset:GPAM or GPAT2]",http://model.geneontology.org/R-HSA-1483166,45


### PVneg

In [53]:
cell = 'SMC2'
genotype = 'PVneg'

In [54]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 26/26 [00:00<00:00, 15749.84it/s]


Analysis run on 19 entities from 14 out of 66 input genes


100%|████████████████████████████████████████| 26/26 [00:00<00:00, 15940.93it/s]
100%|████████████████████████████████████████| 26/26 [00:00<00:00, 16013.50it/s]


Standard method yields 0 results, 0 of which are unique


In [55]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 29/29 [00:00<00:00, 16283.11it/s]


Analysis run on 19 entities from 17 out of 49 input genes


100%|████████████████████████████████████████| 29/29 [00:00<00:00, 16320.25it/s]
100%|████████████████████████████████████████| 29/29 [00:00<00:00, 16384.00it/s]


Standard method yields 0 results, 0 of which are unique
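
The cells in this section repeat the same two calls for every genotype and direction, changing only `genotype`, `cell`, and the `up`/`down` suffix. A minimal sketch of driving that grid from one loop; `run_grid` and `run_one` are hypothetical names, and the runner is passed in as a callable so that the sketch does not depend on `enrich`.

```python
from itertools import product


def run_grid(cell, genotypes, run_one, signs=('up', 'down')):
    """Call `run_one(filename)` for every genotype/direction pair and collect the results."""
    results = {}
    for genotype, sign in product(genotypes, signs):
        # Same file naming scheme as the preprocessing step.
        fname = f'{genotype}_{cell}_{sign}.csv'
        results[(genotype, sign)] = run_one(fname)
    return results


# Stand-in runner that just echoes the file name it would analyse.
calls = run_grid('SMC2', ['LMNA', 'PKP2'], run_one=lambda fname: fname)
print(calls)
```

In the notebook, `run_one` could be a small wrapper that calls `enrich.enrich_wrapper` and `enrich.compare2standard` with the arguments used in the cells above.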


## LV cardiomyocyte

In [72]:
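# S7 holds genes upregulated in disease; S8 holds genes upregulated in controls,
# which are treated here as downregulated in disease.
xls_up = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S7_LV_CM_Upregulated_Genes_Disease_Genotype_All_States.xlsx')
xls_down = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S8_LV_CM_Upregulated_Genes_Controls_All_States.xlsx')

# logFC and PV are the filter thresholds; they are not set in this cell and must already be defined.
for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:

    # One sheet per genotype in each workbook.
    df_up = pd.read_excel(xls_up, sheet_name=f'{genotype}_control')
    df_down = pd.read_excel(xls_down, sheet_name=f'control_{genotype}')

    for sign, df in {'up':df_up,'down':df_down}.items():
        # Keep genes that pass both thresholds and write one gene list per genotype and direction.
        df = df.query(f'logFC > {logFC} and PValue < {PV}')
        df['Gene'].to_csv(f'../test_data/processed/{genotype}_CM_{sign}.csv', header = False, index= False)
        print(genotype,sign,':',len(df), 'genes')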

LMNA up : 290 genes
LMNA down : 399 genes
PKP2 up : 200 genes
PKP2 down : 203 genes
RBM20 up : 340 genes
RBM20 down : 193 genes
TTN up : 217 genes
TTN down : 385 genes
PVneg up : 135 genes
PVneg down : 43 genes


### LMNA

In [73]:
cell = 'CM'
genotype = 'LMNA'

In [74]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 179/179 [00:00<00:00, 18029.84it/s]


Analysis run on 99 entities from 55 out of 290 input genes


100%|██████████████████████████████████████| 179/179 [00:00<00:00, 17318.24it/s]
100%|██████████████████████████████████████| 179/179 [00:00<00:00, 18098.94it/s]


Standard method yields 0 results, 0 of which are unique


In [58]:
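# NOTE: the output below reports 328 input genes, which matches the MYL LMNA down list
# rather than the 399-gene CM list written above; this cell (execution count 58)
# appears not to have been re-run with cell = 'CM'.
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')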
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 269/269 [00:00<00:00, 18412.88it/s]


Analysis run on 132 entities from 91 out of 328 input genes


100%|██████████████████████████████████████| 269/269 [00:00<00:00, 17950.04it/s]
100%|██████████████████████████████████████| 269/269 [00:00<00:00, 16500.93it/s]


Standard method yields 1 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


### PKP2

In [75]:
cell = 'CM'
genotype = 'PKP2'

In [76]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 140/140 [00:00<00:00, 18141.45it/s]


Analysis run on 67 entities from 31 out of 200 input genes


100%|██████████████████████████████████████| 140/140 [00:00<00:00, 17648.02it/s]
100%|██████████████████████████████████████| 140/140 [00:00<00:00, 17672.45it/s]


Standard method yields 3 results, 3 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Synthesis of PIPs at the plasma membrane - Reactome,1.9e-05,5,21,"[sset:SYNJ/MTM(1), sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PIK3C(1), sset:PIK3(2)]",http://model.geneontology.org/R-HSA-1660499,120
1,PI3K/AKT Signaling - Reactome,8.7e-05,4,15,"[PIK3R1, sset:PI3K alpha, beta, gamma, sset:PI3K-regulatory subunit, sset:Activator:PI3K]",http://model.geneontology.org/R-HSA-1257604,114
2,CD28 dependent PI3K/Akt signaling - Reactome,0.000114,4,16,"[PIK3R1, sset:PI3K alpha, beta, gamma, sset:PI3K-regulatory subunit, sset:Activator:PI3K]",http://model.geneontology.org/R-HSA-389357,115
3,Negative regulation of the PI3K/AKT network - Reactome,0.000152,3,7,"[sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PIK3C(1)]",http://model.geneontology.org/R-HSA-199418,95


In [77]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 216/216 [00:00<00:00, 16581.01it/s]


Analysis run on 95 entities from 63 out of 203 input genes


100%|██████████████████████████████████████| 216/216 [00:00<00:00, 16953.03it/s]
100%|██████████████████████████████████████| 216/216 [00:00<00:00, 16666.72it/s]


Standard method yields 0 results, 0 of which are unique


### RBM20

In [78]:
cell = 'CM'
genotype = 'RBM20'

In [79]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 146/146 [00:00<00:00, 17877.28it/s]


Analysis run on 100 entities from 68 out of 340 input genes


100%|██████████████████████████████████████| 146/146 [00:00<00:00, 17188.32it/s]
100%|██████████████████████████████████████| 146/146 [00:00<00:00, 16839.96it/s]


Standard method yields 0 results, 0 of which are unique


In [80]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 93/93 [00:00<00:00, 15541.27it/s]


Analysis run on 64 entities from 47 out of 193 input genes


100%|████████████████████████████████████████| 93/93 [00:00<00:00, 17164.81it/s]
100%|████████████████████████████████████████| 93/93 [00:00<00:00, 15545.60it/s]


Standard method yields 0 results, 0 of which are unique


### TTN

In [81]:
cell = 'CM'
genotype = 'TTN'

In [82]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 147/147 [00:00<00:00, 16798.24it/s]


Analysis run on 76 entities from 35 out of 217 input genes


100%|██████████████████████████████████████| 147/147 [00:00<00:00, 16132.36it/s]
100%|██████████████████████████████████████| 147/147 [00:00<00:00, 16653.50it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Synthesis of PIPs at the plasma membrane - Reactome,3.6e-05,5,21,"[sset:SYNJ/MTM(1), sset:PI3K-regulatory subunit, sset:Activator:PI3K, sset:PIK3C(1), sset:PIK3(2)]",http://model.geneontology.org/R-HSA-1660499,120


In [83]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 271/271 [00:00<00:00, 18246.64it/s]


Analysis run on 125 entities from 98 out of 385 input genes


100%|██████████████████████████████████████| 271/271 [00:00<00:00, 17018.36it/s]
100%|██████████████████████████████████████| 271/271 [00:00<00:00, 17913.64it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Regulation of Complement cascade - Reactome,3.445072e-07,6,11,"[C1R, C3, sset:C3 convertases, sset:Complement factor D, sset:C4 activators, sset:CD46, CR1:C4b:...",http://model.geneontology.org/R-HSA-977606,117


### PVneg

In [84]:
cell = 'CM'
genotype = 'PVneg'

In [89]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 57/57 [00:00<00:00, 17141.70it/s]


Analysis run on 33 entities from 25 out of 135 input genes


100%|████████████████████████████████████████| 57/57 [00:00<00:00, 16364.93it/s]
100%|████████████████████████████████████████| 57/57 [00:00<00:00, 16940.08it/s]


Standard method yields 0 results, 0 of which are unique


In [90]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 43/43 [00:00<00:00, 16758.51it/s]


Analysis run on 28 entities from 17 out of 43 input genes


100%|████████████████████████████████████████| 43/43 [00:00<00:00, 16653.28it/s]
100%|████████████████████████████████████████| 43/43 [00:00<00:00, 16487.35it/s]


Standard method yields 0 results, 0 of which are unique
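
Each run above prints an "Analysis run on ... entities from ... out of ... input genes" line, which shows how much of the input list was actually used. A short sketch for turning such a line into numbers, assuming the message format stays exactly as printed; `coverage` is a hypothetical helper that only parses the printed text and does not call `enrich`.

```python
import re

# Pattern for the summary line printed by the enrichment runs above.
PATTERN = re.compile(r'Analysis run on (\d+) entities from (\d+) out of (\d+) input genes')


def coverage(line):
    """Parse a summary line into counts plus the fraction of input genes that were used."""
    match = PATTERN.search(line)
    if match is None:
        return None
    entities, mapped, total = map(int, match.groups())
    return {
        'entities': entities,
        'mapped': mapped,
        'total': total,
        'fraction': mapped / total,
    }


# Summary line copied from the PVneg CM "down" run above.
info = coverage('Analysis run on 28 entities from 17 out of 43 input genes')
print(info)
```

Collecting these fractions across runs makes it easier to see when an empty result table coincides with only a small share of the input genes being used.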


## Myeloid

In [20]:
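# S40 holds genes upregulated in disease; S41 holds genes upregulated in controls,
# which are treated here as downregulated in disease.
xls_up = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S40_LV_Myeloids_Upregulated_Genes_Disease_Genotypes_All_States.xlsx')
xls_down = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S41_LV_Myeloids_Upregulated_Genes_Controls_All_States.xlsx')

# logFC and PV are the filter thresholds; they are not set in this cell and must already be defined.
for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:

    # One sheet per genotype in each workbook.
    df_up = pd.read_excel(xls_up, sheet_name=f'{genotype}_control')
    df_down = pd.read_excel(xls_down, sheet_name=f'control_{genotype}')

    for sign, df in {'up':df_up,'down':df_down}.items():
        # Keep genes that pass both thresholds and write one gene list per genotype and direction.
        df = df.query(f'logFC > {logFC} and PValue < {PV}')
        df['Gene'].to_csv(f'../test_data/processed/{genotype}_MYL_{sign}.csv', header = False, index= False)
        print(genotype,sign,':',len(df), 'genes')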

LMNA up : 187 genes
LMNA down : 328 genes
PKP2 up : 302 genes
PKP2 down : 287 genes
RBM20 up : 65 genes
RBM20 down : 106 genes
TTN up : 181 genes
TTN down : 225 genes
PVneg up : 139 genes
PVneg down : 110 genes


### LMNA

In [56]:
cell = 'MYL'
genotype = 'LMNA'

In [57]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 208/208 [00:00<00:00, 18303.06it/s]


Analysis run on 74 entities from 48 out of 187 input genes


100%|██████████████████████████████████████| 208/208 [00:00<00:00, 17189.09it/s]
100%|██████████████████████████████████████| 208/208 [00:00<00:00, 18040.02it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Role of phospholipids in phagocytosis - Reactome,2.508766e-09,8,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:PKC-delta/epsilon, sset:GNB, sset:PL...",http://model.geneontology.org/R-HSA-2029485,52
1,DAG and IP3 signaling - Reactome,1.549593e-08,7,18,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:PKC-delta/epsilon, sset:GNB, sset:PL...",http://model.geneontology.org/R-HSA-1489509,45
2,PLC beta mediated events - Reactome,1.469551e-07,6,15,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-112043,39
3,Antigen activates B Cell Receptor (BCR) leading to generation of second messengers - Reactome,2.109556e-07,7,25,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:LYN,FYN,BLK, sset:GNB, sset:PLC beta...",http://model.geneontology.org/R-HSA-983695,145
4,CLEC7A (Dectin-1) induces NFAT activation - Reactome,1.975895e-06,6,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5607763,47
5,Regulation of insulin secretion - Reactome,1.975895e-06,6,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-422356,51
6,FCERI mediated Ca+2 mobilization - Reactome,5.747889e-06,6,26,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-2871809,52
7,SHC1 events in ERBB2 signaling - Reactome,7.268563e-06,4,8,"[sset:p-ERBB2 heterodimers, sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers, sset:Activa...",http://model.geneontology.org/R-HSA-1250196,21
8,Ca2+ pathway - Reactome,7.282055e-06,6,27,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-4086398,57
9,VEGFR2 mediated cell proliferation - Reactome,7.282055e-06,6,27,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5218921,55


In [58]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 269/269 [00:00<00:00, 18412.88it/s]


Analysis run on 132 entities from 91 out of 328 input genes


100%|██████████████████████████████████████| 269/269 [00:00<00:00, 17950.04it/s]
100%|██████████████████████████████████████| 269/269 [00:00<00:00, 16500.93it/s]


Standard method yields 1 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


### PKP2

In [59]:
cell = 'MYL'
genotype = 'PKP2'

In [60]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 249/249 [00:00<00:00, 17868.56it/s]


Analysis run on 108 entities from 73 out of 302 input genes


100%|██████████████████████████████████████| 249/249 [00:00<00:00, 17956.12it/s]
100%|██████████████████████████████████████| 249/249 [00:00<00:00, 17793.06it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,SHC1 events in ERBB2 signaling - Reactome,3.3e-05,4,8,"[sset:p-ERBB2 heterodimers, sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers, sset:Activa...",http://model.geneontology.org/R-HSA-1250196,21
1,Signaling by ERBB2 - Reactome,7.6e-05,3,4,"[sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers, sset:p-Y419/420/426-N-myristoyl-SRC/FY...",http://model.geneontology.org/R-HSA-1227986,18


In [61]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 151/151 [00:00<00:00, 18194.72it/s]


Analysis run on 104 entities from 68 out of 287 input genes


100%|██████████████████████████████████████| 151/151 [00:00<00:00, 17660.47it/s]
100%|██████████████████████████████████████| 151/151 [00:00<00:00, 18036.16it/s]


Standard method yields 0 results, 0 of which are unique


### RBM20

In [62]:
cell = 'MYL'
genotype = 'RBM20'

In [63]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 99/99 [00:00<00:00, 16903.57it/s]


Analysis run on 26 entities from 15 out of 65 input genes


100%|████████████████████████████████████████| 99/99 [00:00<00:00, 17074.55it/s]
100%|████████████████████████████████████████| 99/99 [00:00<00:00, 16264.00it/s]


Standard method yields 0 results, 0 of which are unique


In [64]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 117/117 [00:00<00:00, 17835.78it/s]


Analysis run on 47 entities from 39 out of 106 input genes


100%|██████████████████████████████████████| 117/117 [00:00<00:00, 15878.78it/s]
100%|██████████████████████████████████████| 117/117 [00:00<00:00, 17646.57it/s]


Standard method yields 1 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


### TTN

In [65]:
cell = 'MYL'
genotype = 'TTN'

In [66]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 161/161 [00:00<00:00, 18007.06it/s]


Analysis run on 57 entities from 45 out of 181 input genes


100%|██████████████████████████████████████| 161/161 [00:00<00:00, 17713.27it/s]
100%|██████████████████████████████████████| 161/161 [00:00<00:00, 16535.25it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,CLEC7A (Dectin-1) induces NFAT activation - Reactome,4.106716e-07,6,22,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5607763,47
1,Antigen activates B Cell Receptor (BCR) leading to generation of second messengers - Reactome,9.429803e-07,6,25,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-983695,145
2,FCERI mediated Ca+2 mobilization - Reactome,1.212404e-06,6,26,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-2871809,52
3,PLC beta mediated events - Reactome,1.352519e-06,5,15,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-112043,39
4,VEGFR2 mediated cell proliferation - Reactome,1.541678e-06,6,27,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5218921,55
5,DAG and IP3 signaling - Reactome,3.734337e-06,5,18,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-1489509,45
6,Role of phospholipids in phagocytosis - Reactome,1.098634e-05,5,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-2029485,52
7,Regulation of insulin secretion - Reactome,1.098634e-05,5,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-422356,51
8,G alpha (s) signalling events - Reactome,1.155052e-05,4,11,"[PDE3B, sset:cAMP PDEs, sset:GNB, sset:Ligand:GPCR complexes that activate Gs]",http://model.geneontology.org/R-HSA-418555,139
9,Ca2+ pathway - Reactome,3.188866e-05,5,27,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-4086398,57


In [67]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 183/183 [00:00<00:00, 17436.96it/s]


Analysis run on 114 entities from 74 out of 225 input genes


100%|██████████████████████████████████████| 183/183 [00:00<00:00, 17553.80it/s]
100%|██████████████████████████████████████| 183/183 [00:00<00:00, 17074.29it/s]


Standard method yields 1 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


### PVneg

In [68]:
cell = 'MYL'
genotype = 'PVneg'

In [69]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 184/184 [00:00<00:00, 17250.86it/s]


Analysis run on 70 entities from 39 out of 139 input genes


100%|██████████████████████████████████████| 184/184 [00:00<00:00, 17799.94it/s]
100%|██████████████████████████████████████| 184/184 [00:00<00:00, 16766.65it/s]


Standard method yields 1 results, 1 of which are unique


In [70]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 101/101 [00:00<00:00, 18029.65it/s]


Analysis run on 41 entities from 31 out of 110 input genes


100%|██████████████████████████████████████| 101/101 [00:00<00:00, 17747.16it/s]
100%|██████████████████████████████████████| 101/101 [00:00<00:00, 17065.81it/s]


Standard method yields 0 results, 0 of which are unique


## LV FB

In [91]:
# S16: genes upregulated in controls (i.e. downregulated in disease); S17: genes upregulated in disease
xls_down = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S16_LV_FB_Upregulated_Genes_Controls_All_States.xlsx')
xls_up = pd.ExcelFile('../test_data/unprocessed/science.abo1984_tables_s1_to_s71/S17_LV_FB_Upregulated_Genes_Disease_Genotypes_All_States.xlsx')

# logFC and PV are the thresholds assumed to be defined in an earlier cell
for genotype in ['LMNA','PKP2','RBM20','TTN','PVneg']:

    df_up = pd.read_excel(xls_up,f'{genotype}_control')
    df_down = pd.read_excel(xls_down,f'control_{genotype}')

    for sign, df in {'up':df_up,'down':df_down}.items():
        df = df.query(f'logFC > {logFC} and PValue < {PV}')
        # write to the FB files read by the cells below (was '_MYL_', which overwrote the MYL gene lists)
        df['Gene'].to_csv(f'../test_data/processed/{genotype}_FB_{sign}.csv', header = False, index= False)
        print(genotype,sign,':',len(df), 'genes')

LMNA up : 347 genes
LMNA down : 525 genes
PKP2 up : 342 genes
PKP2 down : 383 genes
RBM20 up : 259 genes
RBM20 down : 366 genes
TTN up : 385 genes
TTN down : 707 genes
PVneg up : 398 genes
PVneg down : 737 genes


### LMNA

In [92]:
cell = 'FB'
genotype = 'LMNA'

In [93]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 208/208 [00:00<00:00, 18019.52it/s]


Analysis run on 74 entities from 48 out of 187 input genes


100%|██████████████████████████████████████| 208/208 [00:00<00:00, 17832.41it/s]
100%|██████████████████████████████████████| 208/208 [00:00<00:00, 15539.99it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Role of phospholipids in phagocytosis - Reactome,2.508766e-09,8,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:PKC-delta/epsilon, sset:GNB, sset:PL...",http://model.geneontology.org/R-HSA-2029485,52
1,DAG and IP3 signaling - Reactome,1.549593e-08,7,18,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:PKC-delta/epsilon, sset:GNB, sset:PL...",http://model.geneontology.org/R-HSA-1489509,45
2,PLC beta mediated events - Reactome,1.469551e-07,6,15,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-112043,39
3,Antigen activates B Cell Receptor (BCR) leading to generation of second messengers - Reactome,2.109556e-07,7,25,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:LYN,FYN,BLK, sset:GNB, sset:PLC beta...",http://model.geneontology.org/R-HSA-983695,145
4,CLEC7A (Dectin-1) induces NFAT activation - Reactome,1.975895e-06,6,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5607763,47
5,Regulation of insulin secretion - Reactome,1.975895e-06,6,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-422356,51
6,FCERI mediated Ca+2 mobilization - Reactome,5.747889e-06,6,26,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-2871809,52
7,SHC1 events in ERBB2 signaling - Reactome,7.268563e-06,4,8,"[sset:p-ERBB2 heterodimers, sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers, sset:Activa...",http://model.geneontology.org/R-HSA-1250196,21
8,Ca2+ pathway - Reactome,7.282055e-06,6,27,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-4086398,57
9,VEGFR2 mediated cell proliferation - Reactome,7.282055e-06,6,27,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:PL(C)D4:3xCa2+, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5218921,55


In [94]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 269/269 [00:00<00:00, 18478.93it/s]


Analysis run on 132 entities from 91 out of 328 input genes


100%|██████████████████████████████████████| 269/269 [00:00<00:00, 18062.11it/s]
100%|██████████████████████████████████████| 269/269 [00:00<00:00, 17284.57it/s]


Standard method yields 1 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


### PKP2

In [95]:
cell = 'FB'
genotype = 'PKP2'

In [96]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 249/249 [00:00<00:00, 17028.05it/s]


Analysis run on 108 entities from 73 out of 302 input genes


100%|██████████████████████████████████████| 249/249 [00:00<00:00, 18101.14it/s]
100%|██████████████████████████████████████| 249/249 [00:00<00:00, 16822.07it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,SHC1 events in ERBB2 signaling - Reactome,3.3e-05,4,8,"[sset:p-ERBB2 heterodimers, sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers, sset:Activa...",http://model.geneontology.org/R-HSA-1250196,21
1,Signaling by ERBB2 - Reactome,7.6e-05,3,4,"[sset:p-Y877-ERBB2 heterodimers, sset:ERBB2 heterodimers, sset:p-Y419/420/426-N-myristoyl-SRC/FY...",http://model.geneontology.org/R-HSA-1227986,18


In [97]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 151/151 [00:00<00:00, 18318.92it/s]


Analysis run on 104 entities from 68 out of 287 input genes


100%|██████████████████████████████████████| 151/151 [00:00<00:00, 17889.44it/s]
100%|██████████████████████████████████████| 151/151 [00:00<00:00, 17965.56it/s]


Standard method yields 0 results, 0 of which are unique


### RBM20

In [98]:
cell = 'FB'
genotype = 'RBM20'

In [99]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|████████████████████████████████████████| 99/99 [00:00<00:00, 17876.53it/s]


Analysis run on 26 entities from 15 out of 65 input genes


100%|████████████████████████████████████████| 99/99 [00:00<00:00, 17133.74it/s]
100%|████████████████████████████████████████| 99/99 [00:00<00:00, 17364.45it/s]


Standard method yields 0 results, 0 of which are unique


In [100]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 117/117 [00:00<00:00, 16701.84it/s]


Analysis run on 47 entities from 39 out of 106 input genes


100%|██████████████████████████████████████| 117/117 [00:00<00:00, 17379.10it/s]
100%|██████████████████████████████████████| 117/117 [00:00<00:00, 17377.25it/s]


Standard method yields 1 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


### TTN

In [101]:
cell = 'FB'
genotype = 'TTN'

In [102]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 161/161 [00:00<00:00, 16860.16it/s]


Analysis run on 57 entities from 45 out of 181 input genes


100%|██████████████████████████████████████| 161/161 [00:00<00:00, 17668.78it/s]
100%|██████████████████████████████████████| 161/161 [00:00<00:00, 17335.84it/s]


Standard method yields 0 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,CLEC7A (Dectin-1) induces NFAT activation - Reactome,4.106716e-07,6,22,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5607763,47
1,Antigen activates B Cell Receptor (BCR) leading to generation of second messengers - Reactome,9.429803e-07,6,25,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-983695,145
2,FCERI mediated Ca+2 mobilization - Reactome,1.212404e-06,6,26,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-2871809,52
3,PLC beta mediated events - Reactome,1.352519e-06,5,15,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-112043,39
4,VEGFR2 mediated cell proliferation - Reactome,1.541678e-06,6,27,"[TRPC1, sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-5218921,55
5,DAG and IP3 signaling - Reactome,3.734337e-06,5,18,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-1489509,45
6,Role of phospholipids in phagocytosis - Reactome,1.098634e-05,5,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-2029485,52
7,Regulation of insulin secretion - Reactome,1.098634e-05,5,22,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-422356,51
8,G alpha (s) signalling events - Reactome,1.155052e-05,4,11,"[PDE3B, sset:cAMP PDEs, sset:GNB, sset:Ligand:GPCR complexes that activate Gs]",http://model.geneontology.org/R-HSA-418555,139
9,Ca2+ pathway - Reactome,3.188866e-05,5,27,"[sset:PLC-beta 1/2/3, sset:PLCbz, sset:GNB, sset:PLC beta1,2,3, sset:PLC-beta]",http://model.geneontology.org/R-HSA-4086398,57


In [103]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 183/183 [00:00<00:00, 16648.76it/s]


Analysis run on 114 entities from 74 out of 225 input genes


100%|██████████████████████████████████████| 183/183 [00:00<00:00, 17334.57it/s]
100%|██████████████████████████████████████| 183/183 [00:00<00:00, 16430.29it/s]


Standard method yields 1 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


### PVneg

In [104]:
cell = 'FB'
genotype = 'PVneg'

In [105]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 184/184 [00:00<00:00, 16994.82it/s]


Analysis run on 70 entities from 39 out of 139 input genes


100%|██████████████████████████████████████| 184/184 [00:00<00:00, 17857.19it/s]
100%|██████████████████████████████████████| 184/184 [00:00<00:00, 17784.76it/s]


Standard method yields 1 results, 1 of which are unique


In [106]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 101/101 [00:00<00:00, 17485.64it/s]


Analysis run on 41 entities from 31 out of 110 input genes


100%|██████████████████████████████████████| 101/101 [00:00<00:00, 17596.77it/s]
100%|██████████████████████████████████████| 101/101 [00:00<00:00, 17407.33it/s]


Standard method yields 0 results, 0 of which are unique


In [108]:
cell = 'SMC'
genotype = 'LMNA'

In [109]:
up = enrich.enrich_wrapper(f'{genotype}_{cell}_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(up, f'{genotype}_{cell}_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 174/174 [00:00<00:00, 17943.33it/s]


Analysis run on 102 entities from 67 out of 284 input genes


100%|██████████████████████████████████████| 174/174 [00:00<00:00, 17888.35it/s]
100%|██████████████████████████████████████| 174/174 [00:00<00:00, 16075.09it/s]


Standard method yields 0 results, 0 of which are unique


In [110]:
down = enrich.enrich_wrapper(f'{genotype}_{cell}_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
enrich.compare2standard(down, f'{genotype}_{cell}_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████████████████████████████| 207/207 [00:00<00:00, 18260.65it/s]


Analysis run on 116 entities from 94 out of 349 input genes


100%|██████████████████████████████████████| 207/207 [00:00<00:00, 18054.84it/s]
100%|██████████████████████████████████████| 207/207 [00:00<00:00, 17829.78it/s]


Standard method yields 6 results, 4 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
2,Regulation of Complement cascade - Reactome,2.201032e-07,6,11,"[C3, C1R, sset:C3 convertases, sset:Complement factor D, sset:C4 activators, sset:CD46, CR1:C4b:...",http://model.geneontology.org/R-HSA-977606,117


# Platelets in SARS-CoV-2

Manne et al. 2020

https://ashpublications.org/blood/article/136/11/1317/461106/Platelet-gene-expression-and-function-in-patients

RNA-seq of platelets from patients with COVID-19 vs. healthy donors. Note: platelets have no nuclei, so the RNA was presumably transcribed in megakaryocytes. Platelets have a lifespan of 7-10 days, so if a patient is 4-5 days into their infection, we could hypothesize that about half of their platelets were generated post-infection, unless platelet destruction or genesis is affected by SARS-CoV-2. The data came from 6 ICU and 4 non-ICU patients.
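
As a rough check on the "about half" estimate, the sketch below assumes steady-state turnover in which platelets are replaced at a uniform rate over their lifespan. `fraction_replaced` is a hypothetical helper (not part of `enrich` or `utils`), and the uniform-turnover model is an assumption that ignores any effect of infection on platelet production or clearance.

```python
def fraction_replaced(days_since_infection, lifespan_days):
    """Fraction of circulating platelets made after infection began.

    Assumes (hypothetically) uniform steady-state turnover: 1/lifespan of
    the pool is replaced each day, capped at the whole pool.
    """
    if lifespan_days <= 0:
        raise ValueError("lifespan_days must be positive")
    return min(max(days_since_infection, 0) / lifespan_days, 1.0)


# Lifespan bounds (7-10 days) and infection window (4-5 days) from the note above.
for days in (4, 5):
    for lifespan in (7, 10):
        print(f"day {days}, lifespan {lifespan} d: "
              f"{fraction_replaced(days, lifespan):.0%} post-infection")
```

Under these assumptions the estimate ranges from 40% (4/10) to roughly 71% (5/7), so "half" is a reasonable midpoint rather than a precise figure.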

- supp1: non-ICU vs. healthy donors
- supp2: ICU vs. healthy donors
- supp3: possibly all COVID-19 patients combined (not confirmed)

In [42]:
platelets = pd.read_csv('../test_data/unprocessed/bloodbld2020007214-suppl3.csv')
platelets_up = platelets.query('(log2FoldChange > 1 ) and padj < .05')
platelets_down = platelets.query('(log2FoldChange < -1 ) and padj < .05')
platelets_combined  = pd.concat([platelets_up, platelets_down])


In [43]:
platelets_up.gene_name.to_csv('../test_data/processed/platelets_up.csv',header=False, index=False)
platelets_down.gene_name.to_csv('../test_data/processed/platelets_down.csv',header=False, index=False)
platelets_combined.gene_name.to_csv('../test_data/processed/platelets_comb.csv',header=False, index=False)


In [47]:
platelets_upset = enrich.enrich_wrapper('platelets_up.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
platelets_upset

100%|██████████████████████████████████████| 482/482 [00:00<00:00, 18114.70it/s]


Analysis run on 424 entities from 365 out of 1172 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Collagen biosynthesis and modifying enzymes - Reactome,9.186593e-09,10,12,"[PPIB, P3H1, PLOD3, P4HB, sset:COLGALT1,COLGALT2, sset:Prolyl 3-hydroxylases, sset:Lysyl hydroxy...",http://model.geneontology.org/R-HSA-1650814
1,Synthesis of PE - Reactome,5.060424e-05,8,15,"[PTDSS2, PCYT2, PISD, PHOSPHO1, sset:LPIN, sset:PNPLA2/3, sset:AGPAT, sset:CHK/ETNK]",http://model.geneontology.org/R-HSA-1483213


In [125]:
enrich.enrich_wrapper('platelets_up.csv','Gene Symbol',method='standard',FDR = 0.05,fpath = '../test_data/processed/')


100%|██████████████████████████████████████| 482/482 [00:00<00:00, 17608.54it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,ER-Phagosome pathway - Reactome,4.5e-05,13,52,"[SEC61A1, PSMD13, PSMA5, PSMD11, SEC61B, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-1236974
1,Hedgehog ligand biogenesis - Reactome,4.5e-05,13,52,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, P4HB, PSMB6, SYVN1]",http://model.geneontology.org/R-HSA-5358346
2,Regulation of APC/C activators between G1/S and early anaphase - Reactome,6.9e-05,13,54,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, CDC25B, PSMB7, CDK1, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-176408
3,Conversion from APC/C:Cdc20 to APC/C:Cdh1 in late anaphase - Reactome,0.000111,12,49,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, CDK1, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-176407
4,Neddylation - Reactome,0.000113,14,64,"[UBE2M, PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, CUL9, PSMB7, UCHL3, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-8951664
5,KEAP1-NFE2L2 pathway - Reactome,0.000126,13,57,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, CSNK2B, PSMD8, PSMB5, PSMB6, PRDX2]",http://model.geneontology.org/R-HSA-9755511
6,SCF(Skp2)-mediated degradation of p27/p21 - Reactome,0.000136,12,50,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6, CKS1B]",http://model.geneontology.org/R-HSA-187577
7,The role of GTSE1 in G2/M progression after G2 checkpoint - Reactome,0.000136,12,50,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, CDC25B, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-8852276
8,SCF-beta-TrCP mediated degradation of Emi1 - Reactome,0.000167,12,51,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, CDK1, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-174113
9,GSK3B and BTRC:CUL1-mediated-degradation of NFE2L2 - Reactome,0.000175,11,44,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-9762114


In [46]:
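# NOTE: despite the variable name, this call reads platelets_up.csv with method='standard',
# so it repeats the cell above rather than analyzing platelets_comb.csv.
platelets_combset = enrich.enrich_wrapper('platelets_up.csv','Gene Symbol',method='standard',FDR = 0.05,fpath = '../test_data/processed/')
platelets_combset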

100%|████████████████████████████████████████████████████████████████████████| 482/482 [00:00<00:00, 17011.42it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,ER-Phagosome pathway - Reactome,4.7e-05,13,52,"[SEC61A1, PSMD13, PSMA5, PSMD11, SEC61B, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-1236974
1,Hedgehog ligand biogenesis - Reactome,4.7e-05,13,52,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, P4HB, PSMB6, SYVN1]",http://model.geneontology.org/R-HSA-5358346
2,Regulation of APC/C activators between G1/S and early anaphase - Reactome,7.1e-05,13,54,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, CDC25B, PSMB7, CDK1, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-176408
3,Conversion from APC/C:Cdc20 to APC/C:Cdh1 in late anaphase - Reactome,0.000113,12,49,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, CDK1, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-176407
4,Neddylation - Reactome,0.000116,14,64,"[UBE2M, PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, CUL9, PSMB7, UCHL3, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-8951664
5,KEAP1-NFE2L2 pathway - Reactome,0.00013,13,57,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, CSNK2B, PSMD8, PSMB5, PSMB6, PRDX2]",http://model.geneontology.org/R-HSA-9755511
6,SCF(Skp2)-mediated degradation of p27/p21 - Reactome,0.00014,12,50,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6, CKS1B]",http://model.geneontology.org/R-HSA-187577
7,The role of GTSE1 in G2/M progression after G2 checkpoint - Reactome,0.00014,12,50,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, CDC25B, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-8852276
8,SCF-beta-TrCP mediated degradation of Emi1 - Reactome,0.000171,12,51,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, CDK1, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-174113
9,GSK3B and BTRC:CUL1-mediated-degradation of NFE2L2 - Reactome,0.000179,11,44,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-9762114


In [39]:
platelets_upnc = enrich.enrich_wrapper('platelets_up.csv','Gene Symbol',method='ncHGT',FDR = 0.05,fpath = '../test_data/processed/')
platelets_upnc

100%|█████████████████| 482/482 [01:18<00:00,  6.11it/s]


Analysis run on 424 entities from 365 out of 1172 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Collagen biosynthesis and modifying enzymes - Reactome,4.680930e-07,10,12,"[PPIB, P3H1, PLOD3, P4HB, sset:Prolyl 3-hydroxylases, sset:Lysyl hydroxylases, sset:Procollagen ...",http://model.geneontology.org/R-HSA-1650814
1,Hedgehog ligand biogenesis - Reactome,4.690733e-06,13,50,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, P4HB, PSMB6, SYVN1]",http://model.geneontology.org/R-HSA-5358346
2,ER-Phagosome pathway - Reactome,5.123754e-06,13,51,"[PSMD13, PSMA5, PSMD11, SEC61B, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6, sset:SEC...",http://model.geneontology.org/R-HSA-1236974
3,Regulation of APC/C activators between G1/S and early anaphase - Reactome,7.226495e-06,13,52,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, CDK1, PSMD8, PSMB5, PSMB6, sset:CDC25]",http://model.geneontology.org/R-HSA-176408
4,Neddylation - Reactome,9.256658e-06,14,62,"[UBE2M, PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, CUL9, PSMB7, PSMD8, PSMB5, PSMB6, sse...",http://model.geneontology.org/R-HSA-8951664
...,...,...,...,...,...,...
61,APC/C:Cdh1 mediated degradation of Cdc20 and other APC/C:Cdh1 targeted proteins in late mitosis/...,7.640748e-04,11,62,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-174178
62,Separation of Sister Chromatids - Reactome,8.704360e-04,11,63,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-2467813
63,CDK-mediated phosphorylation and removal of Cdc6 - Reactome,1.441100e-03,11,65,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-69017
64,Interleukin-1 signaling - Reactome,1.727041e-03,11,64,"[PSMD13, PSMA5, PSMD11, PSMA7, PSME2, PSMD4, PSMB1, PSMB7, PSMD8, PSMB5, PSMB6]",http://model.geneontology.org/R-HSA-9020702


PE derivatives activate platelets: https://ashpublications.org/blood/article/127/21/2618/35188/Novel-phosphatidylethanolamine-derivatives
 
platelets may produce collagen: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7928007/
- Looking in the raw data, most, if not all, collagen mRNAs are unperturbed or strongly downregulated. COL1A2 shows only a mild, non-significant decrease. COL1A1 is not listed in the supplemental data; it is unclear whether that means it was not detected.
- This paper profiled collagen, procollagen, hydroxyproline, and procollagen mRNA in PCOS patients vs. healthy donors (PCOS apparently has a platelet dysregulation component). Hydroxyproline and procollagen were increased at the protein level, but procollagen mRNA was not increased; it was slightly (not statistically significantly) decreased in PCOS.
- Activated platelets may release procollagen.
- While we don't have collagen protein levels for this study, the increase in mRNA for collagen biosynthetic genes and the non-significant change in collagen mRNA are consistent with that paper.
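
The raw-data check in the first bullet can be sketched as follows, using the same cutoffs as the filtering cell above (|log2FoldChange| > 1 and padj < 0.05). The column names match the supplemental table, but the rows are made-up placeholder values for illustration only, not values from the study, and `classify` is a hypothetical helper.

```python
def classify(log2fc, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene 'up', 'down', or 'unchanged' using the notebook's cutoffs."""
    if padj is None or padj >= padj_cutoff:
        return 'unchanged'
    if log2fc > lfc_cutoff:
        return 'up'
    if log2fc < -lfc_cutoff:
        return 'down'
    return 'unchanged'


# Placeholder rows (hypothetical values, NOT taken from the supplemental data).
rows = [
    {'gene_name': 'COL_A', 'log2FoldChange': -0.4, 'padj': 0.60},
    {'gene_name': 'COL_B', 'log2FoldChange': -2.3, 'padj': 0.001},
    {'gene_name': 'ENZ_C', 'log2FoldChange': 1.8, 'padj': 0.01},
]

calls = {r['gene_name']: classify(r['log2FoldChange'], r['padj']) for r in rows}
print(calls)
```

On the real table, the same labels could be applied to the rows whose `gene_name` starts with `COL` to tabulate how many collagen transcripts fall into each class.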

paper from 1977 about collagen production in platelets: https://pubmed.ncbi.nlm.nih.gov/194639/
- may discuss ratio of collagen producing enzymes vs skin

platelet functions beyond clotting: https://onlinelibrary.wiley.com/doi/10.1111/j.1538-7836.2009.03586.x

In [45]:
platelets_downset = enrich.enrich_wrapper('platelets_down.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
platelets_downset

100%|██████████████| 436/436 [00:00<00:00, 17811.94it/s]


Analysis run on 310 entities from 267 out of 1088 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Role of phospholipids in phagocytosis - Reactome,1.730419e-07,11,22,"[sset:G-protein gamma subunit, sset:PLD, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:P...",http://model.geneontology.org/R-HSA-2029485
1,DAG and IP3 signaling - Reactome,1.766299e-07,10,18,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-1489509
2,PLC beta mediated events - Reactome,3.083672e-07,9,15,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-112043
3,Antigen activates B Cell Receptor (BCR) leading to generation of second messengers - Reactome,8.830625e-07,11,25,"[PIK3R1, sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:LYN, p-SYK, sset:G(q) alpha 11,...",http://model.geneontology.org/R-HSA-983695
4,G alpha (12/13) signalling events - Reactome,1.123783e-06,8,13,"[GNA13, sset:G-protein gamma subunit, sset:Ligand:GPCR complexes that activate Gi, sset:G-protei...",http://model.geneontology.org/R-HSA-416482
5,VEGFR2 mediated cell proliferation - Reactome,2.240362e-06,11,27,"[KDR, sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC be...",http://model.geneontology.org/R-HSA-5218921
6,Regulation of insulin secretion - Reactome,1.874356e-05,9,22,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-422356
7,CLEC7A (Dectin-1) induces NFAT activation - Reactome,1.874356e-05,9,22,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-5607763
8,VEGFR2 mediated vascular permeability - Reactome,6.762095e-05,8,20,"[PIK3R1, PIK3CG, sset:Activator:PI3K, sset:G-protein gamma subunit, sset:PI3K alpha, beta, gamma...",http://model.geneontology.org/R-HSA-5218920
9,FCERI mediated Ca+2 mobilization - Reactome,8.898693e-05,9,26,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-2871809


platelets release angiogenic factors: https://onlinelibrary.wiley.com/doi/10.1111/j.1538-7836.2009.03586.x
platelets and VEGF: https://pubmed.ncbi.nlm.nih.gov/11824377/#:~:text=Vascular%20Endothelial%20Growth%20Factor%20(VEGF)%20is%20one%20of%20the%20major,patients%2C%20correlated%20with%20worse%20prognosis.

In [46]:
enrich.compare2standard(platelets_upset, 'platelets_up.csv','Gene Symbol',FDR = 0.05)

100%|██████████████| 482/482 [00:00<00:00, 17127.44it/s]
100%|██████████████| 482/482 [00:00<00:00, 17303.95it/s]


Standard method yields 49 results, 48 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Collagen biosynthesis and modifying enzymes - Reactome,9.186593e-09,10,12,"[PPIB, P3H1, PLOD3, P4HB, sset:4-Hyp collagen propeptides, sset:Procollagen C-proteinases, sset:...",http://model.geneontology.org/R-HSA-1650814,65


In [47]:
temp = enrich.enrich_wrapper('platelets_up.csv','Gene Symbol',method='standard',show_significant= False,fpath = '../test_data/processed/')
temp = temp.query('title == "Collagen biosynthesis and modifying enzymes - Reactome"')
temp

100%|██████████████| 482/482 [00:00<00:00, 17258.01it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
121,Collagen biosynthesis and modifying enzymes - Reactome,0.08694,8,65,"[PPIB, P3H1, COL18A1, PLOD3, P3H2, BMP1, P4HB, COLGALT1]",http://model.geneontology.org/R-HSA-1650814


Listing the entities involved in collagen production that are upregulated:

In [48]:
platelets_upset.loc[0,:]['shared entities in gocam']

['PPIB',
 'P3H1',
 'PLOD3',
 'P4HB',
 'sset:4-Hyp collagen propeptides',
 'sset:Procollagen C-proteinases',
 'sset:COLGALT1,COLGALT2',
 'sset:Procollagen N-proteinases',
 'sset:Prolyl 3-hydroxylases',
 'sset:Lysyl hydroxylases']

In [49]:
setID2members = utils.csv2dict('../data/setID2members.csv')
len(setID2members.get('sset:4-Hyp collagen propeptides'))
setID2members.get('sset:4-Hyp collagen propeptides')

['P02458',
 'Q9BXS0',
 'Q8IZC6',
 'P12111',
 'P08123',
 'Q02388',
 'P12110',
 'Q9UMD9',
 'P39060',
 'Q07092',
 'P02452',
 'Q96A83',
 'Q14993',
 'P39059',
 'Q01955',
 'Q8NFW1',
 'Q86Y22',
 'Q17RW2',
 'P53420',
 'P12107',
 'Q03692',
 'P05997',
 'P25940',
 'A8TX70',
 'P20908',
 'P13942',
 'P12109',
 'Q14031',
 'P20849',
 'Q05707',
 'P02462',
 'Q96P44',
 'Q2UY09',
 'P02461',
 'Q9P218',
 'P08572',
 'P29400',
 'P27658',
 'Q14055',
 'Q5TAT6',
 'A6NMZ7',
 'Q99715',
 'P25067',
 'Q14050']

Here, we see that the GOCAM model size is inflated mainly by the set '4-Hydroxyproline collagen propeptides' (`sset:4-Hyp collagen propeptides`), which represents 44 collagen genes. This is a clear case in which a weighted enrichment isn't representative of activity, because only the genes of one type of collagen need to be expressed at a time for collagen synthesis to proceed. The standard gene list method counts every one of those genes, so the model size grows from 12 entities to 65 genes, and the uncorrected p-value for this pathway goes from 9.2e-09 (10 of 12 entities) to 0.087 (8 of 65 genes).
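A minimal sketch of why the two counting schemes give such different model sizes. The model contents and the set name below are toy values (hypothetical, not read from `setID2members.csv`), and the two helper functions are illustrative rather than part of `enrich` or `utils`.

```python
def model_size_by_genes(model_entities, set_members):
    """Count distinct genes, expanding every set into its member genes."""
    genes = set()
    for entity in model_entities:
        # An entity that is not a set counts as a single gene.
        genes.update(set_members.get(entity, [entity]))
    return len(genes)


def model_size_by_entities(model_entities):
    """Count distinct entities, treating each set as one unit."""
    return len(set(model_entities))


# Hypothetical toy model: two single genes plus one 44-member set.
toy_model = ['PPIB', 'P4HB', 'sset:toy collagen propeptides']
toy_sets = {'sset:toy collagen propeptides': [f'COL_{i}' for i in range(44)]}

gene_size = model_size_by_genes(toy_model, toy_sets)
entity_size = model_size_by_entities(toy_model)
print(gene_size, entity_size)  # 46 3
```

With gene-level counting, one large set dominates the denominator even though expressing a single member would be enough for the step to proceed.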

Collagen-modifying enzymes have been shown to be expressed in platelets, although no activity against mature collagen triple helices in the extracellular space was observed, suggesting that platelets do not modify extracellular collagen triple helices produced by other cells (1977 paper). More recently, collagen I synthesis in activated platelets has been demonstrated. Platelets from patients with PCOS, an endocrine syndrome with low-grade inflammation and platelet involvement, were shown to contain higher levels of hydroxyproline than inactive platelets; COL1A1 mRNA was detected, and collagen was detected at higher levels in platelets from PCOS patients. To demonstrate that the detected collagen originated from within the platelets themselves rather than from contamination, the authors showed that collagen levels increased, and that collagen coated the platelets, when platelets were activated with thrombin; in the absence of activation, collagen was detected only after membrane permeabilization with Triton X-100. They note that procollagen mRNA was not significantly different between the healthy donors' and the PCOS patients' platelets, yet collagen levels were different, and they speculate that this may be due to increased activity of enzymes in the collagen synthesis and modification pathway.

While "Utah et al" did not assay collagen levels, their data show that mRNA levels were not increased for collagen I, yet mRNA levels for enzymes at every step of the pathway were increased, which is consistent with the PCOS findings. Collagen I levels in Covid-19 patients would need to be assayed to confirm this, and the significance of this for increased platelet activity in Covid-19 remains unknown. Nevertheless, collagen I is a known activator of platelets, and accounting for sets allowed us to generate a hypothesis that would be missed by the standard gene list method.
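To make the effect on significance concrete, the sketch below runs a one-sided hypergeometric test on the overlap and model sizes reported above for the collagen pathway (10 of 12 entities with the set method, 8 of 65 genes with the standard method). The background and input list sizes are made-up placeholders, and `enrich` may compute its p-values differently, so the numbers will not reproduce the tables; only the direction of the effect is meant to carry over.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """One-sided hypergeometric p-value, P(X >= k).

    population: background size, successes: model size,
    draws: input list size, k: observed overlap.
    """
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


# Overlap and model sizes are taken from the tables in this notebook.
# Background and list sizes are ASSUMED placeholders, not real values.
p_set = hypergeom_sf(k=10, population=5000, successes=12, draws=300)
p_standard = hypergeom_sf(k=8, population=6000, successes=65, draws=350)

print(f'set-based:  {p_set:.2e}')
print(f'gene-based: {p_standard:.2e}')
```

Under these assumptions the set-based test is many orders of magnitude more significant, because the overlap covers most of a small model instead of a small fraction of a large one.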

In [50]:
enrich.compare2standard(platelets_downset, 'platelets_down.csv','Gene Symbol',FDR = 0.05)

100%|██████████████| 436/436 [00:00<00:00, 16957.21it/s]
100%|██████████████| 436/436 [00:00<00:00, 18091.95it/s]


Standard method yields 6 results, 6 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url,gene list size
0,Role of phospholipids in phagocytosis - Reactome,1.730419e-07,11,22,"[sset:G-protein gamma subunit, sset:PLD, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:P...",http://model.geneontology.org/R-HSA-2029485,52
1,DAG and IP3 signaling - Reactome,1.766299e-07,10,18,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-1489509,45
2,PLC beta mediated events - Reactome,3.083672e-07,9,15,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-112043,39
3,Antigen activates B Cell Receptor (BCR) leading to generation of second messengers - Reactome,8.830625e-07,11,25,"[PIK3R1, sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:LYN, p-SYK, sset:G(q) alpha 11,...",http://model.geneontology.org/R-HSA-983695,145
4,G alpha (12/13) signalling events - Reactome,1.123783e-06,8,13,"[GNA13, sset:G-protein gamma subunit, sset:Ligand:GPCR complexes that activate Gi, sset:G-protei...",http://model.geneontology.org/R-HSA-416482,268
5,VEGFR2 mediated cell proliferation - Reactome,2.240362e-06,11,27,"[KDR, sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC be...",http://model.geneontology.org/R-HSA-5218921,55
6,Regulation of insulin secretion - Reactome,1.874356e-05,9,22,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-422356,51
7,CLEC7A (Dectin-1) induces NFAT activation - Reactome,1.874356e-05,9,22,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-5607763,47
8,VEGFR2 mediated vascular permeability - Reactome,6.762095e-05,8,20,"[PIK3R1, PIK3CG, sset:Activator:PI3K, sset:G-protein gamma subunit, sset:PI3K alpha, beta, gamma...",http://model.geneontology.org/R-HSA-5218920,117
9,FCERI mediated Ca+2 mobilization - Reactome,8.898693e-05,9,26,"[sset:G-protein gamma subunit, sset:PL(C)D4:3xCa2+, sset:G(q) alpha 11,14,15,Q, sset:PLC beta1,2...",http://model.geneontology.org/R-HSA-2871809,52


In [127]:
temp = enrich.enrich_wrapper('platelets_down.csv','Gene Symbol',method='standard',show_significant= False,fpath = '../test_data/processed/')
temp = temp.query('title == "Synthesis of very long-chain fatty acyl-CoAs - Reactome"')
temp

100%|██████████████████████████████████████| 436/436 [00:00<00:00, 17975.65it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
19,Synthesis of very long-chain fatty acyl-CoAs - Reactome,0.006681,5,24,"[HSD17B3, ACSL4, ELOVL7, HACD4, ACSBG1]",http://model.geneontology.org/R-HSA-75876


# Astrocytes in Aging Mouse Brains across 4 brain regions

Boisvert 2018

https://www.sciencedirect.com/science/article/pii/S221112471731848X?via%3Dihub#app3

RNA-seq on astrocytes isolated from different brain regions in 4-month-old and 2-year-old mice. Most brain regions had ~200 upregulated and ~200 downregulated genes, but most of those were also differentially expressed between brain regions.

Per Fig 2B, the cerebellum and hypothalamus appear to be the best candidates, as the 3 biological replicates (3 mice) cluster together by region and age. The hypothalamus looks a little cleaner than the cerebellum.

Fig 4F shows complement cascade upregulation in astrocytes.

## Preprocess

In [51]:
xls = pd.ExcelFile('../test_data/unprocessed/1-s2.0-S221112471731848X-mmc5.xlsx')

for region in ['HTH','CB']:
    
    df = pd.concat([
        pd.read_excel(xls,f'{region} up aged', header = 0),
        pd.read_excel(xls,f'{region} down aged', header = 0)
    ])
    #df = df.query(f'(`4mo {region} FPKM` >= 1 or `2yo {region} FPKM` >= 1) and `Adjusted P-value` < .05')
    #df = df.query(f'`4mo astro/input` > 0.75 or `2yo astro/input` > 0.75')
    #df = df.query(f'(`2yo/4mo FC` > 2 or `2yo/4mo FC` < 0.5)') #the paper didnt do this
    df['Gene'].to_csv(f'../test_data/processed/astro_{region}_comb.csv', header = False, index= False)
    print(region,':',len(df), 'genes')

HTH : 416 genes
CB : 605 genes


In [49]:
xls = pd.ExcelFile('../test_data/unprocessed/1-s2.0-S221112471731848X-mmc5.xlsx')

for region in ['HTH','CB']:

    df_up = pd.read_excel(xls,f'{region} up aged', header = 0)
    df_down = pd.read_excel(xls,f'{region} down aged', header = 0)


    for sign, df in {'up':df_up,'down':df_down}.items():
        df['Gene'].to_csv(f'../test_data/processed/astro_{region}_{sign}.csv', header = False, index= False)
        print(region,sign,':',len(df), 'genes')

HTH up : 130 genes
HTH down : 286 genes
CB up : 415 genes
CB down : 190 genes


The paper says they filtered based on "FPKM > 1, Adj P value < 0.05, and pulldown/input > 0.75"
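A sketch of how those three filters could be applied to rows shaped like the supplementary table. Applying the FPKM and pulldown/input cutoffs to either age group is an assumption based on the commented-out `query` lines in the preprocessing cell. The first two rows copy values from the CB table displayed in this notebook, and the third is a made-up gene that should be dropped.

```python
def passes_paper_filters(row, region):
    """True if a row meets FPKM > 1, adjusted P < 0.05 and pulldown/input > 0.75."""
    # Assumption: expression above 1 FPKM at either age is sufficient.
    fpkm_ok = max(row[f'4mo {region} FPKM'], row[f'2yo {region} FPKM']) > 1
    padj_ok = row['Adjusted P-value'] < 0.05
    # Each displayed row carries only one of the two astro/input ratios,
    # so a missing ratio (None) is skipped rather than treated as a failure.
    ratios = (row.get('4mo astro/input'), row.get('2yo astro/input'))
    enriched = any(r is not None and r > 0.75 for r in ratios)
    return fpkm_ok and padj_ok and enriched


rows = [
    {'Gene': 'Serpina3m', '4mo CB FPKM': 0.005, '2yo CB FPKM': 1.437333,
     'Adjusted P-value': 5.966835e-06,
     '4mo astro/input': None, '2yo astro/input': 79.851852},
    {'Gene': 'Wdtc1', '4mo CB FPKM': 16.230667, '2yo CB FPKM': 12.123333,
     'Adjusted P-value': 4.298523e-02,
     '4mo astro/input': 0.764228, '2yo astro/input': None},
    # Hypothetical row: significant and enriched, but expressed below 1 FPKM.
    {'Gene': 'ToyGene', '4mo CB FPKM': 0.2, '2yo CB FPKM': 0.3,
     'Adjusted P-value': 0.01,
     '4mo astro/input': None, '2yo astro/input': 2.0},
]

kept = [row['Gene'] for row in rows if passes_paper_filters(row, 'CB')]
print(kept)  # ['Serpina3m', 'Wdtc1']
```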

In [52]:
df

Unnamed: 0,Gene,Transcript,2yo astro/input,4mo CB FPKM,2yo CB FPKM,2yo/4mo FC,P-value,Adjusted P-value,4mo astro/input
0,Serpina3m,NM_009253,79.851852,0.005000,1.437333,287.466667,3.855396e-08,5.966835e-06,
1,Lcn2,NM_008491,1.415182,0.068333,2.144000,31.375610,7.779326e-15,3.804557e-12,
2,Serpina3n,NM_009252,14.156755,1.866000,46.179333,24.747767,1.556746e-22,2.239242e-19,
3,C3,NM_009778,4.773523,0.269000,4.363000,16.219331,4.146451e-42,5.069659e-38,
4,S1pr3,NM_010101,2.852315,0.246000,2.279000,9.264228,7.685823e-33,3.132357e-29,
...,...,...,...,...,...,...,...,...,...
185,Wdtc1,NM_199306,,16.230667,12.123333,0.746940,2.272928e-03,4.298523e-02,0.764228
186,Acvr1,NM_007394,,10.592000,7.937000,0.749339,2.086076e-03,4.058082e-02,1.021802
187,Dnajb4,NM_025926,,26.123333,19.617333,0.750951,2.555709e-03,4.674252e-02,0.866331
188,Eml4,NM_001114361,,5.125667,3.898333,0.760551,2.632956e-03,4.779784e-02,0.831144


## Hypothalamus

In [53]:
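# Use the hypothalamus list, matching the compare2standard and ncHGT cells below.
# NOTE: the output recorded under this cell (415 input genes) came from a run on 'astro_CB_up.csv'; rerun to refresh it.
astro_HTH_set = enrich.enrich_wrapper('astro_HTH_comb.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')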
astro_HTH_set

100%|████████████████████████████████████| 254/254 [00:00<00:00, 16479.55it/s]


Analysis run on 149 entities from 127 out of 415 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


In [54]:
enrich.compare2standard(astro_HTH_set, 'astro_HTH_comb.csv','Gene Symbol',FDR = 0.05)

100%|██████████████| 246/246 [00:00<00:00, 18085.87it/s]
100%|██████████████| 246/246 [00:00<00:00, 17874.38it/s]


Standard method yields 6 results, 5 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


**Repeat for ncHGT:**

In [55]:
astro_HTH_nc = enrich.enrich_wrapper('astro_HTH_comb.csv','Gene Symbol',method='ncHGT',FDR = 0.05,fpath = '../test_data/processed/')
astro_HTH_nc

100%|█████████████████| 246/246 [00:30<00:00,  8.02it/s]


Analysis run on 140 entities from 115 out of 416 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Cholesterol biosynthesis - Reactome,2.074783e-11,10,22,"[HMGCS1, HSD17B7, SQLE, MSMO1, DHCR24, MVD, ACLY, FDFT1, LSS, CYP51A1]",http://model.geneontology.org/R-HSA-191273


In [56]:
enrich.compare2standard(astro_HTH_nc, 'astro_HTH_comb.csv','Gene Symbol',FDR = 0.05)

100%|██████████████| 246/246 [00:00<00:00, 17863.55it/s]
100%|██████████████| 246/246 [00:00<00:00, 17964.01it/s]


Standard method yields 6 results, 5 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


## Cerebellum

In [57]:
astro_CB_set = enrich.enrich_wrapper('astro_CB_comb.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
astro_CB_set

100%|██████████████| 310/310 [00:00<00:00, 18410.66it/s]


Analysis run on 201 entities from 172 out of 605 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


In [58]:
astro_CB_nc = enrich.enrich_wrapper('astro_CB_comb.csv','Gene Symbol',method='ncHGT',FDR = 0.05,fpath = '../test_data/processed/')
astro_CB_nc

100%|█████████████████| 310/310 [00:28<00:00, 10.81it/s]


Analysis run on 201 entities from 172 out of 605 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


In [59]:
enrich.compare2standard(astro_CB_set, 'astro_CB_comb.csv','Gene Symbol',FDR = 0.05)

100%|██████████████| 310/310 [00:00<00:00, 15965.35it/s]
100%|██████████████| 310/310 [00:00<00:00, 17978.41it/s]


Standard method yields 0 results, 0 of which are unique


In [60]:
enrich.compare2standard(astro_CB_nc, 'astro_CB_comb.csv','Gene Symbol',FDR = 0.05)

100%|██████████████| 310/310 [00:00<00:00, 18135.89it/s]
100%|██████████████| 310/310 [00:00<00:00, 16357.21it/s]


Standard method yields 0 results, 0 of which are unique


In [61]:
enrich.enrich_wrapper('astro_CB_comb.csv','Gene Symbol',method='standard',show_significant = False,fpath = '../test_data/processed/').query('title == "Activation of NF-kappaB in B cells - Reactome"')

100%|██████████████| 310/310 [00:00<00:00, 17883.94it/s]


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
110,Activation of NF-kappaB in B cells - Reactome,0.127697,4,58,"[PSMB8, PSMB9, PSMB10, PSME1]",http://model.geneontology.org/R-HSA-1169091


# Macrophage activation

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6543837/pdf/fimmu-10-01084.pdf

In vitro comparison of classically activated vs. alternatively activated macrophages.

In [62]:
# header = 4: the column names are on the fifth row of the sheet
mac = pd.read_excel('../test_data/unprocessed/orecchioni_table1.xlsx', header = 4)[['Gene.symbol','logFC','adj.P.Val']]
# keep genes with |logFC| > 1 and adjusted p-value < 0.05
mac = mac.query('(logFC > 1 or logFC < -1) and `adj.P.Val` < 0.05')
mac['Gene.symbol'].to_csv('../test_data/processed/mac_comb.csv', header = False, index= False)
mac

Unnamed: 0,Gene.symbol,logFC,adj.P.Val
0,Arg1,8.731029,2.028913e-10
1,Mgl2,8.591388,6.685017e-11
2,Tmem26,7.926308,6.278790e-11
3,Rnase2a,7.748465,2.028913e-10
4,Mrc1,7.575531,7.028655e-10
...,...,...,...
4723,Cd38,-6.954007,1.505552e-07
4724,Lcn2,-7.370210,1.505552e-07
4725,Cxcl9,-7.388341,7.028655e-10
4726,Fpr2,-7.506493,1.527622e-09


In [63]:
mac_comb_set = enrich.enrich_wrapper('mac_comb.csv','Gene Symbol',method='set',FDR = 0.05,fpath = '../test_data/processed/')
mac_comb_set

100%|██████████████| 667/667 [00:00<00:00, 17523.45it/s]


Analysis run on 620 entities from 511 out of 1519 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Interleukin-35 Signalling - Reactome,5.8e-05,7,9,"[IL6ST, JAK2, STAT4, EBI3, STAT1, IL12A, sset:JAK1, JAK2, (TYK2)]",http://model.geneontology.org/R-HSA-8984722
1,Resolution of Sister Chromatid Cohesion - Reactome,5.9e-05,30,95,"[BIRC5, CDC20, KIF2C, CENPF, CENPA, NUF2, ZWILCH, AURKB, CENPN, SPC24, KNTC1, CENPK, PLK1, SKA1,...",http://model.geneontology.org/R-HSA-2500257
2,Unwinding of DNA - Reactome,8.5e-05,6,7,"[MCM5, MCM7, MCM2, MCM3, MCM4, MCM6]",http://model.geneontology.org/R-HSA-176974


In [64]:
enrich.compare2standard(mac_comb_set, 'mac_comb.csv','Gene Symbol',FDR = 0.05)

100%|██████████████| 667/667 [00:00<00:00, 16929.40it/s]
100%|██████████████| 667/667 [00:00<00:00, 17421.20it/s]


Standard method yields 3 results, 0 of which are unique


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url


In [65]:
mac_comb_nc = enrich.enrich_wrapper('mac_comb.csv','Gene Symbol',method='ncHGT',FDR = 0.05,fpath = '../test_data/processed/')
mac_comb_nc

100%|█████████████████| 667/667 [03:17<00:00,  3.38it/s]


Analysis run on 620 entities from 511 out of 1519 input genes


Unnamed: 0,title,pval (uncorrected),# entities in list,#entities in model,shared entities in gocam,url
0,Resolution of Sister Chromatid Cohesion - Reactome,2.795107e-11,30,95,"[BIRC5, CDC20, KIF2C, CENPF, CENPA, NUF2, ZWILCH, AURKB, CENPN, SPC24, KNTC1, CENPK, PLK1, SKA1,...",http://model.geneontology.org/R-HSA-2500257
1,Interleukin-35 Signalling - Reactome,1.562032e-05,7,9,"[IL6ST, JAK2, STAT4, EBI3, STAT1, IL12A, sset:JAK1, JAK2, (TYK2)]",http://model.geneontology.org/R-HSA-8984722
2,Unwinding of DNA - Reactome,2.943151e-05,6,7,"[MCM5, MCM7, MCM2, MCM3, MCM4, MCM6]",http://model.geneontology.org/R-HSA-176974
