# Evaluating PGRMC1 ssGSEA Differential Expression

Data: TCGA Breast Cancer, 83 samples
Proteomics expression and phosphorylation (iTRAQ)

## Gene cluster for single sample GSEA analysis
#### Method: Differential Expression Correlation
Cluster genes correlating with PGRMC1, selected by Pearson correlation coefficient and BH-adjusted p-value <= 0.05

1. PGRMC1 - Proteomics
2. PGRMC1 - Phosphorylation data averaged per Gene

Clusters correlated with the PGRMC1 phosphorylation sites S181s and S57t

3. PGRMC1 S181s - Phosphorylation data averaged per Gene (compare results: Corr per Phosphorylation Sites per Gene)
4. PGRMC1 S57t - Phosphorylation data averaged per Gene (compare results: Corr per Phosphorylation Sites per Gene)
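
The notebook loads precomputed correlation tables and does not show how the BH-adjusted p-values were produced. As a minimal sketch, assuming the standard Benjamini-Hochberg step-up procedure, the adjustment could look like the block below. The function name `bh_adjust` and the example p-values are hypothetical and not taken from the analysis.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper, not from the notebook)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / rank_i
    # Enforce monotonicity: each adjusted value is the minimum over all higher ranks
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)  # restore the original order
    return adjusted


# Made-up raw p-values, for illustration only
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
print(adj)
```

Genes would then be kept when the adjusted value is at most 0.05, which matches the `p_value` default used by the filter later in the notebook.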

In [2]:
import pandas as pd
import numpy as np
import helper_functions as helper

In [22]:
# PGRMC1 - Protein Genes
cor_prot_genes = pd.read_csv("output/PGRMC1_Protein_correlating_Genes.csv")
# PGRMC1 - Phosphorylation data averaged per Gene
cor_agg_phos_genes = pd.read_csv("output/Pearson_above_0_8_agg_Phosphor.csv")
# PGRMC1 phosphosites (P-sites) - all P-sites
cor_p_sides = pd.read_csv("output/PGRMC1_Phosphosite_correlating_Genes.csv")

In [28]:
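def filter_correlating_genes(df: pd.DataFrame, columns: dict, pearson: float = 0.8, p_value: float = 0.05):
    """Return the sorted, unique genes with |Pearson r| >= pearson and p-value <= p_value.

    `columns` maps the keys "target", "pearson" and "pval" to column names in `df`.
    """
    target_col = columns["target"]
    pears_col = columns["pearson"]
    pval_col = columns["pval"]
    # Boolean mask replaces the row-by-row loop; positive and negative correlations both count
    mask = (df[pears_col].abs() >= pearson) & (df[pval_col] <= p_value)
    return np.unique(df.loc[mask, target_col].tolist())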



In [32]:
prot_genes = filter_correlating_genes(cor_prot_genes, {"target":"geneName", "pearson":"Pears_PGRMC1", "pval":"pValue"}, pearson=0.5)
phos_genes = filter_correlating_genes(cor_agg_phos_genes, {"target":"geneName", "pearson":"Pearson", "pval":"Pval"}, pearson=0.8)
S181_genes = filter_correlating_genes(cor_p_sides, {"target":"geneName", "pearson":"Pears_PGRMC1_S181", "pval":"pValue_PGRMC1_S181"}, pearson=0.8)
S57_genes = filter_correlating_genes(cor_p_sides, {"target":"geneName", "pearson":"Pears_PGRMC1_S57", "pval":"pValue_PGRMC1_S57"}, pearson=0.72)

print(len(prot_genes), len(phos_genes), len(S181_genes), len(S57_genes))

gene_cluster = {"prot":prot_genes,
                "phos":phos_genes,
                "S181":S181_genes,
                "S57":S57_genes}




251 343 305 226
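
The four clusters above use different Pearson cutoffs (0.5, 0.8, 0.8 and 0.72), and the notebook does not state how they were chosen. One way to see how a cutoff drives cluster size is to scan several thresholds on the same table. The sketch below does this on a made-up table: the helper `cluster_size_by_threshold`, the column names, and all values are assumptions for illustration.

```python
import pandas as pd

# Toy correlation table; column names are placeholders and values are made up
toy = pd.DataFrame({
    "geneName": ["A", "B", "C", "D", "E"],
    "Pearson": [0.95, -0.85, 0.75, 0.60, 0.90],
    "Pval": [0.001, 0.010, 0.020, 0.040, 0.200],
})


def cluster_size_by_threshold(df, gene_col, r_col, p_col, thresholds, p_value=0.05):
    """Count unique genes passing |r| >= t and p <= p_value for each threshold t."""
    significant = df[p_col] <= p_value
    sizes = {}
    for t in thresholds:
        mask = significant & (df[r_col].abs() >= t)
        sizes[t] = int(df.loc[mask, gene_col].nunique())
    return sizes


sizes = cluster_size_by_threshold(toy, "geneName", "Pearson", "Pval", [0.5, 0.72, 0.8])
print(sizes)
```

On real tables, a scan like this makes it easier to pick cutoffs that give clusters of comparable size across data types.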


In [36]:
# find intersections
inter_phos_prot = np.intersect1d(gene_cluster["phos"], gene_cluster["prot"], assume_unique=True)
print("Intersections Prot-Phos", inter_phos_prot)

inter_phos_prot = np.intersect1d(gene_cluster["phos"], gene_cluster["S181"], assume_unique=True)
print("Intersections Phos-S181", len(inter_phos_prot))

inter_phos_prot = np.intersect1d(gene_cluster["phos"], gene_cluster["S57"], assume_unique=True)
print("Intersections Phos-S57", len(inter_phos_prot))

inter_phos_prot = np.intersect1d(gene_cluster["S181"], gene_cluster["S57"], assume_unique=True)
print("Intersections S57-S181", len(inter_phos_prot))

inter_phos_prot = np.intersect1d(gene_cluster["prot"], gene_cluster["S181"], assume_unique=True)
print("Intersections Prot-S181", inter_phos_prot)

inter_phos_prot = np.intersect1d(gene_cluster["prot"], gene_cluster["S57"], assume_unique=True)
print("Intersections Prot-S57", inter_phos_prot)


Intersections Prot-Phos ['ATL3' 'EIF3A' 'MTDH' 'PAK4' 'RAB12' 'SETX' 'SMARCA4' 'SMG1' 'STIM1'
 'TCOF1' 'TOP2B' 'UBR4']
Intersections Phos-S181 224
Intersections Phos-S57 132
Intersections S57-S181 122
Intersections Prot-S181 ['ATL3' 'HUWE1' 'MTDH' 'RAB12' 'SETX' 'SF3B1' 'SMARCA4' 'SMG1' 'TOP2B'
 'UBR4']
Intersections Prot-S57 ['BAT2L2' 'HUWE1' 'MTDH' 'RAB12' 'SF3B1']
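
The six pairwise comparisons above can also be generated in a loop, so each result keeps its own key instead of overwriting a single variable. This is a minimal sketch using `itertools.combinations` and `np.intersect1d`. The dictionary `toy_cluster` holds made-up gene sets, and `pairwise_intersections` is a hypothetical helper, not part of the notebook.

```python
from itertools import combinations

import numpy as np

# Toy clusters (hypothetical gene sets, not the notebook's results)
toy_cluster = {
    "prot": np.array(["ATL3", "MTDH", "RAB12"]),
    "phos": np.array(["ATL3", "EIF3A", "MTDH"]),
    "S181": np.array(["HUWE1", "MTDH"]),
}


def pairwise_intersections(clusters):
    """Map each unordered pair of cluster names to the sorted genes they share."""
    return {
        (a, b): np.intersect1d(clusters[a], clusters[b])
        for a, b in combinations(clusters, 2)
    }


shared = pairwise_intersections(toy_cluster)
for (a, b), genes in shared.items():
    print(f"Intersections {a}-{b}: {len(genes)}", genes.tolist())
```

Leaving `assume_unique` at its default is slightly slower but stays correct even if a cluster contains duplicate gene names.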
