# DIRAC Analyses of LC M001 Proteomics and Related Transcriptomics — Comparisons in Common GOBP Modules

***by Kengo Watanabe***  

Differential rank conservation (DIRAC; Eddy, J.A. et al. PLoS Comput. Biol. 2010) analyses were performed on (1) the Longevity Consortium (LC) M001 proteomics dataset, using an a priori module set (Gene Ontology Biological Process (GOBP) terms retrieved via the EMBL-EBI QuickGO API; ≥4 analytes and ≥50% coverage), and (2) the LC M001-related transcriptomics dataset (Tyshkovskiy, A. et al. Cell Metab. 2019), using an a priori module set (GOBP terms retrieved from the R org.Mm.eg.db package; ≥4 analytes and ≥50% coverage).  
–> In this notebook, these results are compared between the two datasets.  
> To maintain consistency with the other DIRAC analyses, statistical tests are performed in a separate notebook with an R kernel.  
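
For readers unfamiliar with the DIRAC measures used throughout, the following is a minimal sketch of how a rank template, rank matching scores (RMSs), and a rank conservation index (RCI) can be computed for one module in one phenotype, following the general definitions in Eddy et al. (2010). It is not the implementation used to produce the input tables; the function names and the toy matrix are hypothetical, and ties in the majority vote are resolved arbitrarily.

```python
import numpy as np
from itertools import combinations

def pairwise_orderings(expr):
    """Boolean matrix (pairs x samples): True where analyte i < analyte j."""
    pairs = list(combinations(range(expr.shape[0]), 2))
    return np.array([expr[i] < expr[j] for i, j in pairs])

def rank_template(expr):
    """Majority ordering of each analyte pair across samples of one phenotype."""
    # Assumption for this sketch: an exact 50/50 split is resolved to False
    return pairwise_orderings(expr).mean(axis=1) > 0.5

def rank_matching_scores(expr, template):
    """Per-sample fraction of analyte pairs whose ordering matches the template."""
    return (pairwise_orderings(expr) == template[:, None]).mean(axis=0)

# Toy module: 4 analytes (rows) x 3 samples (columns) of one phenotype
control = np.array([[1.0, 1.2, 0.9],
                    [2.0, 2.1, 2.5],
                    [3.0, 1.5, 3.1],
                    [4.0, 4.2, 4.4]])
template = rank_template(control)
rms = rank_matching_scores(control, template)  # one RMS per sample
rci = rms.mean()                               # RCI = mean RMS within the phenotype
print(rms, rci)
```

In this toy example only the second sample deviates from the template (one of six analyte pairs is inverted), so its RMS is lower than that of the other two samples.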

Input:  
* Cleaned module metadata (proteomics): 220520_LCproteomics-M001-DIRAC-ver6-4_DIRAC-GOBP_module-metadata.tsv  
* Cleaned module metadata (transcriptomics): 220522_LC-M001-related-transcriptomics-DIRAC_DIRAC-GOBP_ver2-4_module-metadata.tsv  
* Sample–mouse metadata (proteomics): 210126_LCprotomics-M001-DIRAC-ver6_preprocessing_metadata-sample.tsv  
* Sample–mouse metadata (transcriptomics): 201221_LC-M001-related-transcriptomics-DIRAC_metadata.tsv  
* Combined table of DIRAC RMSs (proteomics): 210127_LCproteomics-M001-DIRAC-ver6_DIRAC-GOBP_QuickGO-GOBP_min-n4-cov50_RankMatchingScore-BS.tsv  
* Combined table of DIRAC RCIs (proteomics): 210127_LCproteomics-M001-DIRAC-ver6_DIRAC-GOBP_QuickGO-GOBP_min-n4-cov50_RankConservationIndex-BS.tsv  
* Tables of DIRAC RMSs (transcriptomics): 210429_LC-M001-related-transcriptomics-DIRAC_DIRAC-GOBP_ver2_orgMmegdb-GOBP_min-n4-cov50_RankMatchingScore-BS-[digit].tsv  
* Tables of DIRAC RCIs (transcriptomics): 210429_LC-M001-related-transcriptomics-DIRAC_DIRAC-GOBP_ver2_orgMmegdb-GOBP_min-n4-cov50_RankConservationIndex-BS-[digit].tsv  
* Statistical test summary: 220525_LC-M001-DIRAC-prot-vs-txn_StatisticalTest-GOBP_ver2-2_inter-group-comparison.xlsx (Supplementary Data 6)  

Output:  
* Combined module metadata, which is incorporated into Supplementary Data 6 in the R sub-notebook  
* Combined sample–mouse metadata, which is used in the R sub-notebook  
* Cleaned tables of DIRAC measures, which are used in the statistical analysis (R sub-notebook)  
* Figure 5b–d  

Original notebook (memo for my future tracing):  
* dalek:[JupyterLab HOME]/220523_LC-M001-DIRAC-prot-vs-txn/220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2.ipynb  

In [None]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
#For Arial font
#!conda install -c conda-forge -y mscorefonts
##-> The below was also needed in matplotlib 3.4.2
#import shutil
#import matplotlib
#shutil.rmtree(matplotlib.get_cachedir())
import warnings
warnings.filterwarnings('ignore')
from IPython.display import display
import time

from decimal import Decimal, ROUND_HALF_UP
import re
import matplotlib.patches as mpatches
#!pip install venn
from venn import venn
#!conda install -c conda-forge -y matplotlib-venn
from matplotlib_venn import venn3, venn3_circles, venn2, venn2_circles
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib.offsetbox import AnchoredText
from textwrap import wrap

!conda list

## 1. Prepare metadata and DIRAC results

### 1-1. Module metadata

In [None]:
#Import module metadata
fileDir = '../210126_LCproteomics-M001-DIRAC-ver6/ExportData/'
ipynbName = '220520_LCproteomics-M001-DIRAC-ver6-4_DIRAC-GOBP_'
fileName = 'module-metadata.tsv'
tempDF1 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('ModuleID')
print('Proteomics:', len(tempDF1))

fileDir = '../201221_LC-M001-related-transcriptomics-DIRAC/ExportData/'
ipynbName = '220522_LC-M001-related-transcriptomics-DIRAC_DIRAC-GOBP_ver2-4_'
fileName = 'module-metadata.tsv'
tempDF2 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('ModuleID')
print('Transcriptomics:', len(tempDF2))

#Retrieve common modules
tempL = ['ModuleName', 'ModuleType']
tempDF = tempDF1.loc[tempDF1.index.isin(tempDF2.index.tolist()), tempL]
tempDF1 = tempDF1.drop(columns=tempL)
tempDF1.columns = 'Prot_'+tempDF1.columns
tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF2 = tempDF2.drop(columns=tempL)
tempDF2.columns = 'Tran_'+tempDF2.columns
tempDF = pd.merge(tempDF, tempDF2, left_index=True, right_index=True, how='left')
print('Common:', len(tempDF))
display(tempDF)

#Save for using in the sub-notebook
fileDir = './ExportData/'
ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
fileName = 'module-metadata.tsv'
tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

moduleDF = tempDF
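
The cell above keeps only the modules present in both datasets and prefixes the dataset-specific columns. A compact sketch of the same idea on toy data is given below; the column `nAnalytes` and the module IDs are hypothetical placeholders, not columns known to exist in the real metadata.

```python
import pandas as pd

# Toy module metadata for each dataset (hypothetical values)
prot = pd.DataFrame({'ModuleName': ['a', 'b', 'c'], 'nAnalytes': [5, 8, 4]},
                    index=pd.Index(['GO:1', 'GO:2', 'GO:3'], name='ModuleID'))
txn = pd.DataFrame({'ModuleName': ['b', 'c', 'd'], 'nAnalytes': [20, 9, 7]},
                   index=pd.Index(['GO:2', 'GO:3', 'GO:4'], name='ModuleID'))

shared_cols = ['ModuleName']
# Keep the shared columns only for modules found in both datasets
common = prot.loc[prot.index.intersection(txn.index), shared_cols]
# Attach dataset-specific columns with a prefix (left join on the index)
common = common.join(prot.drop(columns=shared_cols).add_prefix('Prot_'))
common = common.join(txn.drop(columns=shared_cols).add_prefix('Tran_'))
print(common)
```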

### 1-2. Sample–mouse metadata

In [None]:
#Import sample-mouse metadata
fileDir = '../210126_LCproteomics-M001-DIRAC-ver6/ExportData/'
ipynbName = '210126_LCprotomics-M001-DIRAC-ver6_preprocessing_'
fileName = 'metadata-sample.tsv'
tempDF1 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
tempDF1 = tempDF1.rename(columns={'MouseID':'SampleID', 'Treatment':'Intervention'})
tempDF1 = tempDF1.set_index('SampleID')
tempDF1['Age'] = '12m'
tempDF1['Dataset'] = 'Proteomics'

fileDir = '../201221_LC-M001-related-transcriptomics-DIRAC/ExportData/'
ipynbName = '201221_LC-M001-related-transcriptomics-DIRAC_'
fileName = 'metadata.tsv'
tempDF2 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
tempDF2 = tempDF2.rename(columns={'ID':'SampleID'})
tempDF2 = tempDF2.set_index('SampleID')
tempD = {'-':'Control', 'Acarbose':'Acarbose', 'Rapamycin':'Rapamycin', 'CR':'Calorie restriction'}
tempDF2['Intervention'] = tempDF2['Intervention'].map(tempD)
tempD = {6:'6m', 12:'12m'}
tempDF2['Age'] = tempDF2['Age'].map(tempD)
tempDF2['Dataset'] = 'Transcriptomics'

#Take only the groups to be compared
tempL1 = ['Control', 'Acarbose', 'Rapamycin']
tempL2 = ['Dataset', 'Intervention', 'Sex', 'Age']
tempDF1 = tempDF1.loc[tempDF1['Intervention'].isin(tempL1), tempL2]
tempDF2 = tempDF2.loc[tempDF2['Intervention'].isin(tempL1), tempL2]
tempDF = pd.concat([tempDF1, tempDF2], axis=0)
tempDF['Phenotype'] = tempDF['Intervention']#For DIRAC template (rank consensus)
tempDF['Group'] = tempDF['Dataset'].str.slice(start=0, stop=1)+'-'+tempDF['Phenotype']

display(tempDF)
display(tempDF['Group'].value_counts())

#Save for using in the sub-notebook
fileDir = './ExportData/'
ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
fileName = 'sample-metadata.tsv'
tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

sampleDF = tempDF

### 1-3. DIRAC results (sex/age-pooled rank consensus)

In [None]:
#Import DIRAC results
fileDir = '../210126_LCproteomics-M001-DIRAC-ver6/ExportData/'
ipynbName = '210127_LCproteomics-M001-DIRAC-ver6_DIRAC-GOBP_'
fileName = 'QuickGO-GOBP_min-n4-cov50_RankMatchingScore-BS.tsv'
tempDF1 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
tempDF1 = tempDF1.rename(columns={'NetworkID':'ModuleID'})
fileName = 'QuickGO-GOBP_min-n4-cov50_RankConservationIndex-BS.tsv'
tempDF2 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
tempDF2 = tempDF2.rename(columns={'NetworkID':'ModuleID'})

#Take only the modules and samples/groups to be compared
print('Original RMS:', tempDF1.shape)
tempDF1 = tempDF1.loc[tempDF1['ModuleID'].isin(moduleDF.index.tolist())]
tempDF1 = tempDF1.loc[tempDF1['Template'].isin(sampleDF['Phenotype'].unique())]
tempDF1 = tempDF1.set_index(['ModuleID', 'Template'])
tempDF1 = tempDF1.loc[:, tempDF1.columns.isin(sampleDF.index.tolist())]
tempDF1 = tempDF1.reset_index()
print(' -> After:', tempDF1.shape)
print('Original RCI:', tempDF2.shape)
tempDF2 = tempDF2.loc[tempDF2['ModuleID'].isin(moduleDF.index.tolist())]
tempDF2 = tempDF2.loc[tempDF2['Template'].isin(sampleDF['Phenotype'].unique())]
tempDF2 = tempDF2.set_index(['ModuleID', 'Template'])
tempDF2 = tempDF2.loc[:, tempDF2.columns.isin(sampleDF['Phenotype'].unique())]
tempDF2 = tempDF2.reset_index()
print(' -> After:', tempDF2.shape)

#Clean to handle across data
dataset = 'Proteomics'
tempDF = sampleDF.loc[sampleDF['Dataset']==dataset]
tempDF = tempDF.reset_index()[['Phenotype', 'Group']].drop_duplicates(keep='first')
tempD = {}
for row_i in range(len(tempDF)):
    key = tempDF['Phenotype'].iloc[row_i]
    value = tempDF['Group'].iloc[row_i]
    tempD[key] = value
tempDF1['Template'] = tempDF1['Template'].map(tempD)
tempDF2['Template'] = tempDF2['Template'].map(tempD)
tempDF2 = tempDF2.set_index(['ModuleID', 'Template'])
tempDF2.columns = tempDF2.columns.map(tempD)
tempDF2 = tempDF2.reset_index()

display(tempDF1)
display(tempDF2)

#Save for using in the sub-notebook
fileDir = './ExportData/'
ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
fileName = dataset.lower()+'-RMS.tsv'
tempDF1.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')
fileName = dataset.lower()+'-RCI.tsv'
tempDF2.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')

rmsDF_prot = tempDF1
rciDF_prot = tempDF2

In [None]:
#Import DIRAC results
fileDir = '../201221_LC-M001-related-transcriptomics-DIRAC/ExportData/'
ipynbName = '210429_LC-M001-related-transcriptomics-DIRAC_DIRAC-GOBP_ver2_'
nSub = 25
tempDF1 = pd.DataFrame()
tempDF2 = pd.DataFrame()
for list_i in range(nSub):
    fileName = 'orgMmegdb-GOBP_min-n4-cov50_RankMatchingScore-BS-'+str(list_i+1).zfill(2)+'.tsv'
    tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
    tempDF = tempDF.rename(columns={'NetworkID':'ModuleID'})
    tempDF1 = pd.concat([tempDF1, tempDF], axis=0, ignore_index=True)
    
    fileName = 'orgMmegdb-GOBP_min-n4-cov50_RankConservationIndex-BS-'+str(list_i+1).zfill(2)+'.tsv'
    tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
    tempDF = tempDF.rename(columns={'NetworkID':'ModuleID'})
    tempDF2 = pd.concat([tempDF2, tempDF], axis=0, ignore_index=True)

#Take only the modules and samples/groups to be compared
print('Original RMS:', tempDF1.shape)
tempDF1 = tempDF1.loc[tempDF1['ModuleID'].isin(moduleDF.index.tolist())]
tempDF1 = tempDF1.loc[tempDF1['Template'].isin(sampleDF['Phenotype'].unique())]
tempDF1 = tempDF1.set_index(['ModuleID', 'Template'])
tempDF1 = tempDF1.loc[:, tempDF1.columns.isin(sampleDF.index.tolist())]
tempDF1 = tempDF1.reset_index()
print(' -> After:', tempDF1.shape)
print('Original RCI:', tempDF2.shape)
tempDF2 = tempDF2.loc[tempDF2['ModuleID'].isin(moduleDF.index.tolist())]
tempDF2 = tempDF2.loc[tempDF2['Template'].isin(sampleDF['Phenotype'].unique())]
tempDF2 = tempDF2.set_index(['ModuleID', 'Template'])
tempDF2 = tempDF2.loc[:, tempDF2.columns.isin(sampleDF['Phenotype'].unique())]
tempDF2 = tempDF2.reset_index()
print(' -> After:', tempDF2.shape)

#Clean to handle across data
dataset = 'Transcriptomics'
tempDF = sampleDF.loc[sampleDF['Dataset']==dataset]
tempDF = tempDF.reset_index()[['Phenotype', 'Group']].drop_duplicates(keep='first')
tempD = {}
for row_i in range(len(tempDF)):
    key = tempDF['Phenotype'].iloc[row_i]
    value = tempDF['Group'].iloc[row_i]
    tempD[key] = value
tempDF1['Template'] = tempDF1['Template'].map(tempD)
tempDF2['Template'] = tempDF2['Template'].map(tempD)
tempDF2 = tempDF2.set_index(['ModuleID', 'Template'])
tempDF2.columns = tempDF2.columns.map(tempD)
tempDF2 = tempDF2.reset_index()

display(tempDF1)
display(tempDF2)

#Save for using in the sub-notebook
fileDir = './ExportData/'
ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
fileName = dataset.lower()+'-RMS.tsv'
tempDF1.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')
fileName = dataset.lower()+'-RCI.tsv'
tempDF2.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')

rmsDF_txn = tempDF1
rciDF_txn = tempDF2

## 2. Rank conservation index: general pattern

> This is NOT used for the manuscript; hence, the statistical test is skipped in this version.  

In [None]:
#Extract RCI whose template phenotype corresponds to the own phenotype
rciDF_kk = pd.DataFrame(index=moduleDF.index)
for tempDF in [rciDF_prot, rciDF_txn]:
    phenotypeL = tempDF.drop(columns=['ModuleID', 'Template']).columns.tolist()
    tempDF = tempDF.set_index('ModuleID')
    for k in phenotypeL:
        tempS = tempDF[k].loc[tempDF['Template']==k]
        rciDF_kk = pd.merge(rciDF_kk, tempS, left_index=True, right_index=True, how='left')

#Order
tempL1 = ['Control', 'Acarbose', 'Rapamycin']
tempL2 = ['Proteomics', 'Transcriptomics']
tempL = [dataset[0]+'-'+intervention for dataset in tempL2 for intervention in tempL1]
rciDF_kk = rciDF_kk[tempL]
display(rciDF_kk)
display(rciDF_kk.describe())
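
The extraction above picks, for each group, the RCI computed against that group's own rank template. A small sketch on a toy table with the same layout (one row per module and template, one column per group) may make the indexing easier to follow; all values are made up.

```python
import pandas as pd

# Toy RCI table: rows = (module, template), columns = group whose samples were scored
rci = pd.DataFrame({
    'ModuleID': ['GO:1', 'GO:1', 'GO:2', 'GO:2'],
    'Template': ['P-Control', 'P-Acarbose', 'P-Control', 'P-Acarbose'],
    'P-Control': [0.90, 0.70, 0.80, 0.75],
    'P-Acarbose': [0.72, 0.95, 0.70, 0.85],
})

groups = [c for c in rci.columns if c not in ('ModuleID', 'Template')]
# For each group k, keep the rows whose template is k and take column k
own = pd.DataFrame({
    k: rci.loc[rci['Template'] == k].set_index('ModuleID')[k]
    for k in groups
})
print(own)
```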

In [None]:
#Visualization (simple version for checking)
tempDF = rciDF_kk.reset_index().melt(var_name='Group', value_name='RCI', id_vars='ModuleID')
tempD = {'P-Control':'tab:blue', 'P-Acarbose':'tab:red', 'P-Rapamycin':'tab:purple',
         'T-Control':plt.get_cmap('tab20')(1), 'T-Acarbose':plt.get_cmap('tab20')(7),
         'T-Rapamycin':plt.get_cmap('tab20')(9)}
sns.set(style='ticks', font='Arial', context='talk')
plt.figure(figsize=(3, 4))
p = sns.boxplot(data=tempDF, y='RCI', x='Group', order=list(tempD.keys()), palette=tempD, dodge=False,
                showfliers=True, flierprops={'marker':'o', 'markerfacecolor':'gray', 'alpha':0.4},
                showcaps=True, notch=True)
p.set(ylim=(0.49, 1.01), yticks=np.arange(0.5, 1.01, 0.1))
sns.despine()
plt.xlabel('')
plt.ylabel('Module RCI')
plt.xticks(rotation=70, horizontalalignment='right', verticalalignment='center', rotation_mode='anchor')
plt.show()

## 3. Rank conservation index: inter-group module comparison

> Test a specific hypothesis: control RCI == intervention RCI (i.e., inter-group module comparison).  
> 1. Test the main effect of intervention on RMSs for each module using an ANOVA model  
> 2. Then, perform post-hoc comparisons of RMSs between control and each intervention using repeated Student's t-tests  
>  
> The statistical strategy is essentially the same as the one used in each single-dataset analysis. Because RMS/RCI was not normalized (i.e., the expected mean and variance could differ between datasets due to the different numbers of mapped analytes), dataset and its interaction term are NOT included in the ANOVA model; instead, an ANOVA model is fitted per dataset. The P-value adjustment is performed conservatively: the P-values of the ANOVA tests are adjusted across all models (= modules x datasets), and those of the post-hoc tests are adjusted across datasets but only within each module (not across modules). Student's t-test (i.e., t-test with pooled variance), rather than Dunnett's test, is used as the post-hoc test because Dunnett's test already adjusts for the multiple comparisons against control, so further adjusting its P-values with the Holm-Bonferroni method would over-correct for the family-wise error rate (FWER).  
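
To illustrate the within-module adjustment family described above, here is a minimal sketch of a Holm-Bonferroni step-down adjustment applied to the post-hoc P-values of a single module (2 datasets x 2 contrasts). The actual tests are run in the R sub-notebook; this sketch assumes the Holm method for the post-hoc family, and the P-values are hypothetical.

```python
def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest P-value by (m - k + 1), cap at 1,
        # and enforce monotonicity with a running maximum
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

# Hypothetical post-hoc P-values for ONE module: 2 datasets x 2 contrasts
posthoc = {'Prot_Acar-vs-Cont': 0.004, 'Prot_Rapa-vs-Cont': 0.030,
           'Tran_Acar-vs-Cont': 0.020, 'Tran_Rapa-vs-Cont': 0.500}
adj = dict(zip(posthoc, holm_adjust(list(posthoc.values()))))
print(adj)
```

The same function would be applied once per module, so the post-hoc family never grows with the number of modules; the ANOVA P-values, by contrast, form one family across all modules and both datasets.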

### 3-1. Extract RMS under the own phenotype consensus

In [None]:
#Extract RMS whose template phenotype corresponds to the own phenotype
rmsDF_kk = pd.DataFrame(index=moduleDF.index)
for tempDF1, tempDF2 in [[rmsDF_prot, rciDF_prot], [rmsDF_txn, rciDF_txn]]:
    phenotypeL = tempDF2.drop(columns=['ModuleID', 'Template']).columns.tolist()
    tempDF1 = tempDF1.set_index('ModuleID')
    for k in phenotypeL:
        tempL = sampleDF.loc[sampleDF['Group']==k].index.tolist()
        tempDF = tempDF1[tempL].loc[tempDF1['Template']==k]
        rmsDF_kk = pd.merge(rmsDF_kk, tempDF, left_index=True, right_index=True, how='left')

display(rmsDF_kk)
display(rmsDF_kk.describe())

# — Go to the R sub-notebook —  

### 3-2. ANOVA test (RMS ~ Intervention), followed by repeated Student's t-tests (Intervention)

#### 3-2-1/2/3. Import the summary table

In [None]:
#Import the summary table
fileDir = './ExportData/'
ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_StatisticalTest-GOBP_ver2-2_'
fileName = 'inter-group-comparison.xlsx'
tempDF = pd.DataFrame()
for dataset in ['Proteomics', 'Transcriptomics']:
    sheetName = 'RCI-'+dataset
    tempDF1 = pd.read_excel(fileDir+ipynbName+fileName, sheet_name=sheetName, engine='openpyxl')
    tempDF1['Dataset'] = dataset
    tempDF1 = tempDF1.set_index(['Dataset', 'ModuleID'])
    tempDF = pd.concat([tempDF, tempDF1], axis=0)
tempDF = tempDF.sort_values(by='Intervention_Pval', ascending=True)
display(tempDF)

statDF = tempDF

#### 3-2-4. Changed modules (ANOVA)

In [None]:
#Prepare variables in the model
tempS = statDF.loc[:, statDF.columns.str.contains('_Fstat')].columns.to_series()
variableL = tempS.str.replace('_Fstat', '').tolist()

#Changed modules
for variable in variableL:
    tempDF = statDF.loc[statDF[variable+'_AdjPval']<0.05]
    tempDF = tempDF.sort_values(by=variable+'_AdjPval', ascending=True)
    tempL1 = tempDF.loc[:, tempDF.columns.str.contains('_RMSmean')].columns.tolist()
    tempL2 = tempDF.loc[:, tempDF.columns.str.contains('^'+variable+'_')].columns.tolist()
    tempDF = tempDF[[col_n for subL in [['ModuleName'], tempL1, tempL2] for col_n in subL]]
    print(variable+' (adjusted P < 0.05):', len(tempDF))
    tempL = tempDF.index.to_frame()['ModuleID'].unique().tolist()
    print(' -> Unique module:', len(tempL))
    tempL = tempDF.index.to_frame()['Dataset'].unique().tolist()
    for dataset in tempL:
        tempDF1 = tempDF.loc[dataset]
        print(' -> '+dataset+':', len(tempDF1))
    display(tempDF)

#### 3-2-5. Changed modules by each intervention (Student's t-tests)

In [None]:
#Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')

#Extract only the changed modules
variable = 'Intervention'
tempS1 = tempDF['Prot_'+variable+'_AdjPval']<0.05
tempS2 = tempDF['Tran_'+variable+'_AdjPval']<0.05
tempDF = tempDF.loc[tempS1|tempS2]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print(variable+' (adjusted P < 0.05):', len(tempDF))
print(' -> Proteomics:', sum(tempS1))
print(' -> Transcriptomics:', sum(tempS2))

#Take adjusted P-value
tempDF1 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_AdjPval$')]
tempDF1.columns = tempDF1.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex: the default is literal matching in pandas >= 2.0
tempDF1 = pd.merge(tempDF[['ModuleName', 'Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval']], tempDF1,
                   left_index=True, right_index=True, how='left')
print('Adjusted P-value:')
display(tempDF1)
display(tempDF1.describe())

#Take effect size
tempDF2 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_Coef$')]
tempDF2.columns = tempDF2.columns.str.replace('_Coef$', '', regex=True)#Explicit regex: the default is literal matching in pandas >= 2.0
tempDF2 = pd.merge(tempDF[['ModuleName', 'Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval']], tempDF2,
                   left_index=True, right_index=True, how='left')
print('Changed direction (effect size):')
display(tempDF2)
display(tempDF2.describe())

pvalDF = tempDF1
diffDF = tempDF2

> Check the changed modules (based on the nominal P-value for the main effect and the adjusted P-values for the post-hoc tests) as reference.  

In [None]:
#Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')

#Extract only the changed modules
variable = 'Intervention'
tempS1 = tempDF['Prot_'+variable+'_Pval']<0.05
tempS2 = tempDF['Tran_'+variable+'_Pval']<0.05
tempDF = tempDF.loc[tempS1|tempS2]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_Pval', 'Tran_'+variable+'_Pval'], ascending=True)
print(variable+' (nominal P < 0.05):', len(tempDF))
print(' -> Proteomics:', sum(tempS1))
print(' -> Transcriptomics:', sum(tempS2))

#Take adjusted P-value
tempDF1 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_AdjPval$')]
tempDF1.columns = tempDF1.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex: the default is literal matching in pandas >= 2.0
tempDF1 = pd.merge(tempDF[['ModuleName', 'Prot_'+variable+'_Pval', 'Tran_'+variable+'_Pval']], tempDF1,
                   left_index=True, right_index=True, how='left')
print('Adjusted P-value:')
display(tempDF1)
display(tempDF1.describe())

#Take effect size
tempDF2 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_Coef$')]
tempDF2.columns = tempDF2.columns.str.replace('_Coef$', '', regex=True)#Explicit regex: the default is literal matching in pandas >= 2.0
tempDF2 = pd.merge(tempDF[['ModuleName', 'Prot_'+variable+'_Pval', 'Tran_'+variable+'_Pval']], tempDF2,
                   left_index=True, right_index=True, how='left')
#print('Changed direction (effect size):')
#display(tempDF2)
#display(tempDF2.describe())

pvalDF_ref = tempDF1
diffDF_ref = tempDF2

### 3-3. Visualization: clustermap

In [None]:
#Prepare color labels for tightened module set
regulation = 'Tightened'
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Control':plt.get_cmap('tab20')(1), 'Acarbose':plt.get_cmap('tab20')(7),
          'Rapamycin':plt.get_cmap('tab20')(9)}
tempDF = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains('-vs-')].copy()#Based on nominal ANOVA P-values; copy because columns are added below
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF_ref[col_n]#Adjusted P-values in post-hoc tests
    tempS2 = diffDF_ref[col_n]
    tempS3 = pvalDF[col_n]#Adjusted P-values in post-hoc tests
    tempS4 = diffDF[col_n]
    if regulation=='Changed':
        tempS2 = tempS2.loc[(tempS1<0.05)]
        tempS4 = tempS4.loc[(tempS3<0.05)]
    elif regulation=='Tightened':
        tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        tempS4 = tempS4.loc[(tempS3<0.05)&(tempS4>0)]
    elif regulation=='Loosened':
        tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempS4 = tempS4.loc[(tempS3<0.05)&(tempS4<0)]
    label = re.sub('-vs-.*', '', col_n)
    group = tempD0[re.sub('.*_', '', label)]
    dataset = re.sub('_.*', '', label)
    tempL = []
    for module in tempDF.index.tolist():
        if module in tempS2.index.tolist():
            if module in tempS4.index.tolist():#Based on adjusted ANOVA P-values
                tempL.append(tempD1[group])
            else:#Based on nominal ANOVA P-values
                tempL.append(tempD2[group])
        else:
            tempL.append('white')
    tempDF[dataset[0]+'-'+group] = tempL
    print(regulation+' module in '+col_n)
    print(' -> in adjusted ANOVA P < 0.05:', len(tempS4))
    print(' -> in nominal ANOVA P < 0.05:', len(tempS2))
tempDF = tempDF[['P-Acarbose', 'P-Rapamycin', 'T-Acarbose', 'T-Rapamycin']]

#Prepare color labels for samples
tempL = [dataset[0]+'-'+group for dataset in ['Proteomics', 'Transcriptomics'] for group in list(tempD1.keys())]
tempA = np.repeat(['tab:pink', 'tab:cyan'], 3)
tempDF1 = pd.DataFrame({'Data':tempA, 'Group':np.tile(list(tempD1.values()), 2)}, index=tempL)

#Clustermap
sns.set(style='ticks', font='Arial', context='talk')
cm = sns.clustermap(rciDF_kk.T, method='ward', metric='euclidean', cmap='afmhot',
                    row_cluster=True, col_cluster=True, row_linkage=None, col_linkage=None,
                    row_colors=tempDF1, col_colors=tempDF, xticklabels=False, yticklabels=True,
                    dendrogram_ratio=(0.025, 0.2), colors_ratio=(0.025, 0.075),
                    cbar_pos=(0.05, -0.05, 0.3, 0.075), cbar_kws={'orientation': 'horizontal'},
                    figsize=(12, 4), **{'vmin':0.5, 'vmax':1})
cm.cax.set_title('Module RCI', size='medium',
                 verticalalignment='bottom', horizontalalignment='center')
cm.cax.tick_params(labelsize='small')
bottom, top = cm.ax_heatmap.get_ylim()
#cm.ax_heatmap.set_ylim(bottom + 0.5, top - 0.5)##To avoid half cut of first and last rows
hm = cm.ax_heatmap.get_position()
rd = cm.ax_row_dendrogram.get_position()
cd = cm.ax_col_dendrogram.get_position()
cm.ax_heatmap.set_position([hm.x0, hm.y0, hm.width, hm.height])
cm.ax_row_dendrogram.set_position([rd.x0, rd.y0, rd.width, rd.height])
cm.ax_col_dendrogram.set_position([cd.x0, cd.y0, cd.width, cd.height])
cm.ax_heatmap.set_xlabel('GOBP module')
cm.ax_heatmap.set_ylabel('')
##row/column color bar legend (axis is same with cm.cax!)
tempL = []
for group in tempD1.keys():
    if group!='Control':
        tempL.append(mpatches.Patch(color=tempD1[group],
                                    label='by '+group+' (adjusted '+r'$P$'+' < 0.05)'))
for group in tempD2.keys():
    if group!='Control':
        tempL.append(mpatches.Patch(color=tempD2[group], label='(nominal '+r'$P$'+' < 0.05)'))
legend1 = plt.legend(handles=tempL, fontsize='small', labelspacing=0.2, ncol=2,
                     title=regulation+' module (vs. Control)', title_fontsize='medium',
                     bbox_to_anchor=(1, 0.5), loc='center left', borderaxespad=3.5, frameon=False)
plt.gca().add_artist(legend1)
##Save
fileDir = './ExportFigures/'
ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
fileName = 'RCI-clustermap.tif'
plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                  pil_kwargs={'compression':'tiff_lzw'})
plt.show()

#Save label order
tempDF = moduleDF.loc[rciDF_kk.index[cm.dendrogram_col.reordered_ind]]
tempD = {'P-Control':'Prot-Cont', 'P-Acarbose':'Prot-Acar', 'P-Rapamycin':'Prot-Rapa',
         'T-Control':'Tran-Cont', 'T-Acarbose':'Tran-Acar', 'T-Rapamycin':'Tran-Rapa'}
tempDF1 = rciDF_kk.copy()
tempDF1.columns = tempDF1.columns.map(tempD)
tempDF = pd.merge(tempDF['ModuleName'], tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.reset_index()
tempDF.index.name = 'Xcoord'
display(tempDF)
fileDir = './ExportData/'
ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
fileName = 'RCI-clustermap-xticks.tsv'
tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

### 3-4. Visualization: venn diagram

In [None]:
#Prepare label and color
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin'}
tempD1 = {'T-Acarbose':plt.get_cmap('tab20')(7), 'P-Acarbose':'tab:red',
          'P-Rapamycin':'tab:purple', 'T-Rapamycin':plt.get_cmap('tab20')(9)}

#Visualization per direction
for regulation in ['Changed', 'Tightened', 'Loosened']:
    #Prepare module sets
    tempD2 = {}
    tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
    for col_n in tempDF.columns.tolist():
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if regulation=='Changed':
            tempS2 = tempS2.loc[(tempS1<0.05)]
        elif regulation=='Tightened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif regulation=='Loosened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        label = re.sub('-vs-.*', '', col_n)
        group = tempD0[re.sub('.*_', '', label)]
        dataset = re.sub('_.*', '', label)
        tempD2[dataset[0]+'-'+group] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()#Based on adjusted ANOVA P-values
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(tempDF)
    print(regulation+' modules (vs. Control):')
    print(' -> Not significant in all contrasts:', count)
    
    #Skip the following steps if there is no significant module
    if count==len(pvalDF):
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.1, 0.875))#Otherwise, weird space...
    ##Add legend annotation
    x_coord = [0.1, 0.1, 0.9, 0.9]
    y_coord = [0.25, 0.7, 0.7, 0.25]
    h_align = ['right', 'right', 'left', 'left']
    v_align = ['top', 'bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = regulation+' modules (vs. Control)'
    ax.set_title(title, fontsize='medium')
    ##Save
    if regulation!='Changed':
        fileDir = './ExportFigures/'
        ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
        fileName = 'RCI-inter-group-comparison_venn-'+regulation.lower()+'.tif'
        plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                          pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
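
The module sets passed to the Venn diagram are defined by two conditions per contrast: an adjusted post-hoc P-value below 0.05 and the sign of the coefficient (positive for tightened, negative for loosened). The sketch below restates that rule as a small helper on toy data; `module_sets` and all values are hypothetical and only meant to show the logic.

```python
import pandas as pd

# Toy adjusted P-values and coefficients for two contrasts (hypothetical values)
adj_p = pd.DataFrame({'Prot_Acar-vs-Cont': [0.01, 0.20, 0.03],
                      'Tran_Acar-vs-Cont': [0.04, 0.01, 0.60]},
                     index=['GO:1', 'GO:2', 'GO:3'])
coef = pd.DataFrame({'Prot_Acar-vs-Cont': [0.05, -0.02, -0.04],
                     'Tran_Acar-vs-Cont': [0.03, -0.06, 0.01]},
                    index=adj_p.index)

def module_sets(adj_p, coef, regulation, alpha=0.05):
    """Return {contrast: set of module IDs} for the requested direction."""
    sig = adj_p < alpha
    if regulation == 'Tightened':
        sig = sig & (coef > 0)   # higher RMS than control
    elif regulation == 'Loosened':
        sig = sig & (coef < 0)   # lower RMS than control
    return {col: set(sig.index[sig[col].to_numpy()]) for col in sig.columns}

tightened = module_sets(adj_p, coef, 'Tightened')
loosened = module_sets(adj_p, coef, 'Loosened')
print(tightened, loosened)
```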

In [None]:
#Export module list in each subset in the venn diagram
for regulation in ['Tightened', 'Loosened']:
    #Prepare module sets
    tempD = {}
    tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
    for col_n in tempDF.columns.tolist():
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if regulation=='Changed':
            tempS2 = tempS2.loc[(tempS1<0.05)]
        elif regulation=='Tightened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif regulation=='Loosened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempD[col_n] = set(tempS2.index.tolist())
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()#Based on adjusted ANOVA P-values
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(tempDF)
    print(regulation+' modules (vs. Control):')
    print(' -> Not significant in all contrasts:', count)
    
    #Skip the following steps if there is no significant module
    if count==len(pvalDF):
        continue
    
    #Prepare a new .xlsx file (dummy README)
    tempL1 = [len(tempD[key]) for key in tempD.keys()]
    tempDF = pd.DataFrame({'Group':tempD.keys(), 'nModules':tempL1})
    tempDF = tempDF.reset_index().rename(columns={'index':'VennOrder'})
    tempDF['VennOrder'] = tempDF['VennOrder'] + 1
    fileDir = './ExportData/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_venn-'+regulation.lower()+'.xlsx'
    tempDF.to_excel(fileDir+ipynbName+fileName, sheet_name='README', header=True, index=False)
    display(tempDF)#Check
    
    #Prepare saving data (statDF spread by dataset)
    tempDF = moduleDF['ModuleName']#pd.Series() for now
    for dataset in ['Proteomics', 'Transcriptomics']:
        tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
        tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
        tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
    
    t_start = time.time()
    #Extract overall set
    for key_i in range(len(tempD)):
        key = list(tempD.keys())[key_i]
        tempS = tempD[key]
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)].sort_index(ascending=True)
        #Save the summary table by appending it to the above .xlsx file
        ##Prepare sheet name
        tempL1 = ['NA' for i in range(len(tempD))]
        tempL1[key_i] = '1'
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    #Extract subset
    tempL1 = ['1', '0']
    tempL2 = [[k1, k2, k3, k4] for k1 in tempL1 for k2 in tempL1 for k3 in tempL1 for k4 in tempL1]
    #tempL2.remove(['0', '0', '0', '0'])
    for tempL1 in tempL2:
        #Positive module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='1']
        tempS1 = set(pvalDF.index.tolist())#Initialize
        for tempS in tempL3:
            tempS1 = tempS1 & tempS
        #Negative module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='0']
        tempS2 = set()#Initialize
        for tempS in tempL3:
            tempS2 = tempS2 | tempS
        #Extract subset
        tempS = tempS1 - tempS2
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)].sort_index(ascending=True)
        #Save the summary table by appending it to the above .xlsx file
        ##Prepare sheet name
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    t_elapsed = time.time() - t_start
    print(' - Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')

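The subset extraction in the cell above enumerates every 1/0 membership pattern across the significance sets and keeps the modules that belong to all "1" sets and to none of the "0" sets. The following is a minimal sketch of that idea on toy data, not the notebook's actual code: `exclusive_regions`, `sets_by_group`, `universe`, and the module IDs are hypothetical names introduced only for illustration.

```python
from itertools import product

def exclusive_regions(sets_by_group, universe):
    """Map each 1/0 membership pattern to the items found in exactly those sets."""
    groups = list(sets_by_group)
    regions = {}
    for pattern in product('10', repeat=len(groups)):
        inside = set(universe)   # intersection of the sets flagged '1'
        outside = set()          # union of the sets flagged '0'
        for group, flag in zip(groups, pattern):
            if flag == '1':
                inside &= sets_by_group[group]
            else:
                outside |= sets_by_group[group]
        regions['(' + ','.join(pattern) + ')'] = inside - outside
    return regions

# Toy example with made-up module IDs
universe = {'m1', 'm2', 'm3', 'm4'}
sets_by_group = {'A': {'m1', 'm2'}, 'B': {'m2', 'm3'}}
regions = exclusive_regions(sets_by_group, universe)
for name, members in regions.items():
    print(name, sorted(members))
```

Because every item matches exactly one pattern, the regions partition `universe`; the all-zero pattern collects the items that are in none of the sets, which is why the notebook can keep or drop that sheet without affecting the others.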
### 3-5. Visualization: pointplot

#### 3-5-1. Modules tightened by all interventions and datasets

In [None]:
#Prepare the target module set
posL = ['Prot_Acar-vs-Cont', 'Prot_Rapa-vs-Cont', 'Tran_Acar-vs-Cont', 'Tran_Rapa-vs-Cont']
negL = []
regulation = 'Tightened'
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)#No constraint for the other contrasts (True unless the P-value is NaN)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF.loc[module, pvalDF.columns.str.contains('-vs-')]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

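The selection rule at the top of the cell above can be read as: a module is kept when every contrast in `posL` is significant in the requested direction, and every contrast in `negL` is either not significant or significant in the opposite direction. Below is a minimal sketch of that rule as a reusable function, assuming the same layout as `pvalDF` and `diffDF` (modules in rows, contrasts in columns). The function name, its arguments, and the toy values are hypothetical, and the sketch omits the notebook's extra check that the remaining contrasts have non-missing P-values.

```python
import pandas as pd

def select_modules(pval_df, diff_df, pos_cols, neg_cols,
                   regulation='Tightened', alpha=0.05):
    """Return module IDs passing the positive/negative contrast filters."""
    if regulation not in ('Tightened', 'Loosened'):
        raise ValueError('regulation must be "Tightened" or "Loosened"')
    sign = 1 if regulation == 'Tightened' else -1
    keep = pd.Series(True, index=pval_df.index)
    for col in pos_cols:
        # Significant and changed in the requested direction
        keep &= (pval_df[col] < alpha) & (sign * diff_df[col] > 0)
    for col in neg_cols:
        # Not significant, or significant in the opposite direction
        not_signif = pval_df[col] >= alpha
        inverse = (pval_df[col] < alpha) & (sign * diff_df[col] < 0)
        keep &= not_signif | inverse
    return keep.index[keep].tolist()

# Toy example with made-up adjusted P-values and RCI differences
idx = ['m1', 'm2', 'm3']
pval_df = pd.DataFrame({'Prot_X-vs-Cont': [0.01, 0.01, 0.20],
                        'Tran_X-vs-Cont': [0.50, 0.01, 0.01]}, index=idx)
diff_df = pd.DataFrame({'Prot_X-vs-Cont': [0.1, 0.1, 0.1],
                        'Tran_X-vs-Cont': [0.1, 0.1, 0.1]}, index=idx)
print(select_modules(pval_df, diff_df, ['Prot_X-vs-Cont'], ['Tran_X-vs-Cont']))
```

Wrapping the rule in one function would let each subsection differ only in its `posL`, `negL`, and `regulation` arguments rather than repeating the loop.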
#### 3-5-2. Modules tightened by all interventions specifically in proteomics

In [None]:
#Prepare the target module set
posL = ['Prot_Acar-vs-Cont', 'Prot_Rapa-vs-Cont']
negL = ['Tran_Acar-vs-Cont', 'Tran_Rapa-vs-Cont']
regulation = 'Tightened'
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)#No constraint for the other contrasts (True unless the P-value is NaN)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF.loc[module, pvalDF.columns.str.contains('-vs-')]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

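The significance labels in the plots above follow a simple threshold scheme: three, two, or one asterisk for adjusted P-values below 0.001, 0.01, and 0.05, and otherwise the P-value rounded half-up to two decimals. A minimal sketch of that conversion as a standalone function is given here; the name `signif_label` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def signif_label(pval):
    """Convert an adjusted P-value into an asterisk or 'P = x.xx' label."""
    if pval < 0.001:
        return '***'
    if pval < 0.01:
        return '**'
    if pval < 0.05:
        return '*'
    # Decimal avoids the round-half-to-even behavior of the built-in round()
    rounded = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(rounded)

for p in [0.0005, 0.004, 0.03, 0.125, 0.5]:
    print(p, '->', signif_label(p))
```

Going through `Decimal(str(pval))` keeps trailing zeros (0.5 is shown as 0.50) and rounds ties upward (0.125 is shown as 0.13), whereas the built-in `round(0.125, 2)` returns 0.12.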
#### 3-5-3. Modules tightened by all interventions specifically in transcriptomics

> Note: because proteins are the direct effectors of cellular functions, modules tightened only in transcriptomics are of limited importance here, even if some are detected.  

In [None]:
#Prepare the target module set
posL = ['Tran_Acar-vs-Cont', 'Tran_Rapa-vs-Cont']
negL = ['Prot_Acar-vs-Cont', 'Prot_Rapa-vs-Cont']
regulation = 'Tightened'
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)#No constraint for the other contrasts (True unless the P-value is NaN)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF.loc[module, pvalDF.columns.str.contains('-vs-')]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

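The "Check RMS summary" step in the cells above computes a normal-approximation 95% interval for each group mean, i.e. mean ± 1.96 × SD / √n. A minimal vectorized sketch of the same calculation follows; the function name `summarize_rms` and the toy RMS values are hypothetical.

```python
import numpy as np
import pandas as pd

def summarize_rms(df, by=('Dataset', 'Group'), value='RMS', z=1.96):
    """Per-group count, mean, SD, and normal-approximation interval bounds."""
    summary = df.groupby(list(by))[value].agg(['count', 'mean', 'std'])
    half_width = z * summary['std'] / np.sqrt(summary['count'])
    summary['0.025'] = summary['mean'] - half_width
    summary['0.975'] = summary['mean'] + half_width
    return summary

# Toy example with made-up RMS values
df = pd.DataFrame({
    'Dataset': ['Proteins'] * 8,
    'Group': ['Control'] * 4 + ['Acarbose'] * 4,
    'RMS': [0.4, 0.6, 0.5, 0.5, 0.7, 0.8, 0.9, 0.8],
})
summary = summarize_rms(df)
print(summary)
```

Note that the `ci=95` argument of `sns.pointplot` draws a bootstrap interval, so the error bars in the figures may differ slightly from the bounds in this table.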
#### 3-5-4. Modules tightened by acarbose in both datasets

In [None]:
#Prepare the target module set
posL = ['Prot_Acar-vs-Cont', 'Tran_Acar-vs-Cont']
negL = []
regulation = 'Tightened'
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
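    elif col_n in negL:
        #Accept either non-significance or significance in the opposite direction
        tempS3 = (tempS1>=0.05)
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        #No constraint on the other comparisons (True for any non-missing P-value)
        tempS1 = (tempS1>=0.0)
    #Keep only the modules satisfying every condition so far
    tempS = tempS & tempS1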
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
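    #Check RMS summary (95% CI by normal approximation: mean +/- 1.96*SEM)
    ##Note: the pointplot below draws seaborn's bootstrap CI, so the intervals can differ slightly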
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF.loc[module, pvalDF.columns.str.contains('-vs-')]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, marker='o', alpha=0.5, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Anchor point of the significance label (midpoint between the two groups)
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

#### 3-5-5. Modules tightened by Aca specifically in proteomics
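Here the transcriptomics comparison acts as an exclusion filter: a module passes when that comparison is either non-significant or significant in the opposite direction. The scalar predicate below is a sketch of that rule. `passes_negative_filter` is a hypothetical helper, not part of the notebook, and the 0.05 threshold mirrors the cell.

```python
def passes_negative_filter(pval, diff, regulation="Tightened", alpha=0.05):
    """Return True if a comparison does NOT show the requested regulation."""
    if pval >= alpha:
        return True          # not significant at all
    if regulation == "Tightened":
        return diff < 0      # significant, but loosened
    return diff > 0          # significant, but tightened

# Invented (adjusted P-value, RCI difference) pairs for a 'Tightened' query
cases = {
    "non-significant": (0.20, 0.10),
    "significantly loosened": (0.01, -0.10),
    "significantly tightened": (0.01, 0.10),
}
results = {name: passes_negative_filter(p, d) for name, (p, d) in cases.items()}
print(results)
```

A module is reported as proteomics-specific only when its proteomics comparison meets the positive condition and its transcriptomics comparison passes this filter.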

In [None]:
#Prepare the target module set
posL = ['Prot_Acar-vs-Cont']
negL = ['Tran_Acar-vs-Cont']
regulation = 'Tightened'
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
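    elif col_n in negL:
        #Accept either non-significance or significance in the opposite direction
        tempS3 = (tempS1>=0.05)
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        #No constraint on the other comparisons (True for any non-missing P-value)
        tempS1 = (tempS1>=0.0)
    #Keep only the modules satisfying every condition so far
    tempS = tempS & tempS1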
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
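    #Check RMS summary (95% CI by normal approximation: mean +/- 1.96*SEM)
    ##Note: the pointplot below draws seaborn's bootstrap CI, so the intervals can differ slightly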
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF.loc[module, pvalDF.columns.str.contains('-vs-')]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, marker='o', alpha=0.5, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Anchor point of the significance label (midpoint between the two groups)
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

#### 3-5-6. Modules tightened by Rapa in both datasets
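The plots annotate each comparison with asterisks or, when non-significant, with the adjusted P-value rounded half up to two decimals. A stand-alone sketch of that mapping is given here. `pval_to_label` is a hypothetical name, and the input values are invented.

```python
from decimal import Decimal, ROUND_HALF_UP

def pval_to_label(pval):
    """Map an adjusted P-value to the annotation text used on the plots."""
    if pval < 0.001:
        return '***'
    if pval < 0.01:
        return '**'
    if pval < 0.05:
        return '*'
    # Non-significant: show the value, rounded half up to two decimals
    text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(text)

labels = [pval_to_label(p) for p in (0.0004, 0.004, 0.04, 0.125)]
print(labels)  # ['***', '**', '*', '$P$ = 0.13']
```

Rounding through `Decimal` with `ROUND_HALF_UP` turns 0.125 into 0.13, whereas the built-in `round(0.125, 2)` returns 0.12.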

In [None]:
#Prepare the target module set
posL = ['Prot_Rapa-vs-Cont', 'Tran_Rapa-vs-Cont']
negL = []
regulation = 'Tightened'
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
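    elif col_n in negL:
        #Accept either non-significance or significance in the opposite direction
        tempS3 = (tempS1>=0.05)
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        #No constraint on the other comparisons (True for any non-missing P-value)
        tempS1 = (tempS1>=0.0)
    #Keep only the modules satisfying every condition so far
    tempS = tempS & tempS1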
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
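    #Check RMS summary (95% CI by normal approximation: mean +/- 1.96*SEM)
    ##Note: the pointplot below draws seaborn's bootstrap CI, so the intervals can differ slightly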
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF.loc[module, pvalDF.columns.str.contains('-vs-')]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, marker='o', alpha=0.5, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Anchor point of the significance label (midpoint between the two groups)
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

#### 3-5-7. Modules tightened by Rapa specifically in proteomics
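The RMS summary table in the cell below reports, per dataset and group, a 95% interval of the mean computed as mean +/- 1.96 x SEM (a normal approximation). The following is a sketch of that calculation with the standard library, on invented RMS values. `normal_ci95` is a hypothetical helper.

```python
import math
import statistics

def normal_ci95(values):
    """Return (lower, upper) of the normal-approximation 95% CI of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    # Sample standard deviation (n-1 denominator), as pandas' .std() uses by default
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean - 1.96 * sem, mean + 1.96 * sem

# Invented sample RMS values for one module in one group
rms_values = [0.6, 0.7, 0.8]
lower, upper = normal_ci95(rms_values)
print(round(lower, 3), round(upper, 3))
```

With very few samples per group, this interval is narrower than a t-based one, so it is best read as a quick descriptive check rather than a formal test.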

In [None]:
#Prepare the target module set
posL = ['Prot_Rapa-vs-Cont']
negL = ['Tran_Rapa-vs-Cont']
regulation = 'Tightened'
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
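    elif col_n in negL:
        #Accept either non-significance or significance in the opposite direction
        tempS3 = (tempS1>=0.05)
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        #No constraint on the other comparisons (True for any non-missing P-value)
        tempS1 = (tempS1>=0.0)
    #Keep only the modules satisfying every condition so far
    tempS = tempS & tempS1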
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['Prot_'+variable+'_AdjPval', 'Tran_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF.loc[module, pvalDF.columns.str.contains('-vs-')]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')
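
The module selection at the top of the cell above (significant in `posL` but not in `negL`) can be condensed into one function. The following is only a minimal sketch under stated assumptions: the function name `select_modules`, the `alpha` argument, and the toy tables are hypothetical and not part of this notebook. A positive difference is taken to mean tightening, as in the cell above.

```python
import pandas as pd

def select_modules(pval_df, diff_df, pos_cols, neg_cols,
                   regulation='Tightened', alpha=0.05):
    """Hypothetical helper: modules changed in the requested direction in every
    pos_cols comparison and not changed in that direction in any neg_cols one."""
    sign = 1 if regulation == 'Tightened' else -1
    keep = pd.Series(True, index=pval_df.index)
    for col in pos_cols:
        # Required: significant and in the requested direction
        keep &= (pval_df[col] < alpha) & (sign * diff_df[col] > 0)
    for col in neg_cols:
        # Allowed: non-significant, or significant in the inverse direction
        inverse = (pval_df[col] < alpha) & (sign * diff_df[col] < 0)
        keep &= (pval_df[col] >= alpha) | inverse
    return keep[keep].index.tolist()

# Toy adjusted P-values and RCI differences (made-up numbers)
modules = ['M1', 'M2', 'M3', 'M4']
pval = pd.DataFrame({'Prot_Acar-vs-Cont': [0.01, 0.01, 0.20, 0.01],
                     'Tran_Acar-vs-Cont': [0.50, 0.01, 0.50, 0.01]}, index=modules)
diff = pd.DataFrame({'Prot_Acar-vs-Cont': [0.1, 0.1, 0.1, 0.1],
                     'Tran_Acar-vs-Cont': [0.1, 0.1, 0.1, -0.1]}, index=modules)
selected = select_modules(pval, diff, ['Prot_Acar-vs-Cont'], ['Tran_Acar-vs-Cont'])
print(selected)  # ['M1', 'M4']
```

In this toy case, M2 is dropped because it is also tightened in the transcriptomics comparison, and M3 is dropped because it is not significant in the proteomics comparison. M4 is kept because its transcriptomics change is in the inverse direction.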

#### 3-5-8. Modules of special interest

In [None]:
plotL = ['GO:0006635', 'GO:0031998', 'GO:0016558', 'GO:0006625',#Common in M001
         'GO:0006637', 'GO:0006734', 'GO:0001561',#Related to fatty acid oxidation (mentioned in previous manuscript)
         'GO:0023035', 'GO:0039529',#Immune response (mentioned in previous manuscript)
         'GO:0010638', 'GO:0002181', 'GO:0045899',#Mentioned in previous manuscript
         'GO:0006635', 'GO:1990126', 'GO:0098761',#Common in proteomics and transcriptomics
         'GO:0061732', 'GO:0006086', 'GO:0019441',#Specific to proteomics (Acar)
         'GO:0031998', 'GO:0044794', 'GO:0006625', 'GO:0016558', 'GO:0019441', 'GO:0033572',#Specific to proteomics (Rapa)
         'GO:0006635', 'GO:0002181', 'GO:0000028', 'GO:0016558', 'GO:0006734', 'GO:0098761',#Common in M001 + M004
         'GO:0015986', 'GO:0042776', 'GO:0070934', 'GO:0006703', 'GO:0034354', 'GO:0045899']#Similar to 4EGI-1
plotL = list(set(plotL))
plotL.sort()
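#Drop modules absent from this analysis
##Build a new list; removing items from plotL while iterating over it would skip elements
for module in plotL:
    if module not in moduleDF.index:
        print(module+' was NOT included in this analysis.')
plotL = [module for module in plotL if module in moduleDF.index]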

#Prepare the target module set
##Spread statDF by dataset to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['Proteomics', 'Transcriptomics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset[0:4]+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[plotL]
display(tempDF)

#Prepare DF for plot
tempDF = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF1 = sampleDF.reset_index()[['SampleID', 'Dataset', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Proteomics':'Proteins', 'Transcriptomics':'Transcripts'}
tempDF['Dataset'] = tempDF['Dataset'].map(tempD)
tempDF['Group'] = tempDF['Phenotype']
tempD0 = {'Cont':'Control', 'Acar':'Acarbose', 'Rapa':'Rapamycin', 'Prot':'Proteins', 'Tran':'Transcripts'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Module '+str(rank_i+1)+':')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Dataset'] = pd.Categorical(tempDF2['Dataset'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Re-prepare P-values from statDF because these modules may have been filtered out of pvalDF
    tempDF2 = moduleDF['ModuleName']#pd.Series() for now
    for dataset in ['Proteomics', 'Transcriptomics']:
        tempDF3 = statDF.loc[dataset].drop(columns=['ModuleName'])
        tempDF3.columns = dataset[0:4]+'_'+tempDF3.columns
        tempDF2 = pd.merge(tempDF2, tempDF3, left_index=True, right_index=True, how='left')
    tempDF2 = tempDF2.loc[:, tempDF2.columns.str.contains('-vs-.*_AdjPval$')]
    tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex=True; the default is False in pandas>=2.0
    ##Retrieve statistical significance
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF2['Dataset']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Dataset'] = tempDF2['Dataset'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Dataset']==dataset]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Dataset']==dataset]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220525_LC-M001-DIRAC-prot-vs-txn_Comparison-GOBP_ver2-2_'
    fileName = 'RCI-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')
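
The RMS summary table and the significance labels are built with the same inline loops in both cells above. The following is a minimal sketch of how those two steps could be factored into helpers. The names `pval_to_label` and `summarize_rms` and the toy table are hypothetical and not part of this notebook. The interval is the normal approximation (mean ± 1.96 × SEM) used in the summary table. `sns.pointplot(ci=95)` computes a bootstrap interval instead, so the plotted error bars can differ slightly from the tabulated bounds.

```python
from decimal import Decimal, ROUND_HALF_UP

import numpy as np
import pandas as pd

def pval_to_label(pval):
    """Hypothetical helper: asterisks below 0.05, otherwise 'P = x.xx' (half-up rounding)."""
    if pval < 0.001:
        return '***'
    elif pval < 0.01:
        return '**'
    elif pval < 0.05:
        return '*'
    pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(pval_text)

def summarize_rms(rms_df):
    """Hypothetical helper: count, mean, SD, and normal-approximation 95% CI per group."""
    out = rms_df.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    half_width = 1.96 * out['std'] / np.sqrt(out['count'])
    out['0.025'] = out['mean'] - half_width
    out['0.975'] = out['mean'] + half_width
    return out

# Toy long-format RMS table (made-up numbers)
toyDF = pd.DataFrame({'Dataset': ['Proteins'] * 4,
                      'Group': ['Control'] * 4,
                      'RMS': [0.4, 0.6, 0.5, 0.5]})
summaryDF = summarize_rms(toyDF)
labels = [pval_to_label(p) for p in [0.0005, 0.005, 0.03, 0.125]]
print(labels)  # ['***', '**', '*', '$P$ = 0.13']
```

The vectorized interval replaces the row-wise loop without changing the values, and a single label function keeps the thresholds identical across cells.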

## 4. Rank matching score under a fixed consensus: inter-group module comparison

> The comparison of RMSs under a fixed rank consensus is skipped because pattern similarity across datasets is not of interest here.  

# — End of notebook —