# DIRAC Comparisons between LC M001 Liver Proteomics and Related Transcriptomics in GOBP Modules

***by Kengo Watanabe***  

The differential rank conservation (DIRAC; Eddy, J.A. et al. PLoS Comput. Biol. 2010) analyses were performed on (1) the preprocessed Longevity Consortium (LC) M001 proteomics dataset (adjusted for sex and age; analytes detected in all samples; sample-based robust Z-score followed by analyte-based robust Z-score) using the retrieved a priori module set (Gene Ontology Biological Process (GOBP) derived from the EMBL-EBI QuickGO API; ≥4 analytes and ≥50% coverage) and (2) the preprocessed LC M001-related transcriptomics dataset (Tyshkovskiy, A. et al. Cell Metab. 2019; adjusted for sex and age; analytes detected in all samples; sample-based robust Z-score followed by analyte-based robust Z-score) using the retrieved a priori module set (GOBP derived from the R org.Mm.eg.db package; ≥4 analytes and ≥50% coverage).  
–> This Jupyter Notebook (with Python 3 kernel) compared the DIRAC results between the LC M001 proteomics and M001-related transcriptomics data.  

Input files:  
- Module metadata (proteomics): 230214_LC-M001-proteomics-DIRAC-ver7-2_DIRAC-GOBP_module-metadata.tsv  
- Module metadata (transcriptomics): 230217_LC-M001-related-TrOmics-DIRAC-ver3_DIRAC-GOBP_onWenc_module-metadata.tsv  
- Sample–mouse metadata (proteomics): 230213_LC-M001-proteomics-DIRAC-ver7-2_Preprocessing_sample-metadata.tsv  
–> Preprocessed analyte data (proteomics): 230213_LC-M001-proteomics-DIRAC-ver7-2_Preprocessing_normalized-data.tsv  
- Sample–mouse metadata (transcriptomics): 230215_LC-M001-related-TrOmics-DIRAC-ver3_Preprocessing_onWenc_sample-metadata.tsv  
–> Preprocessed analyte data (transcriptomics): 230215_LC-M001-related-TrOmics-DIRAC-ver3_Preprocessing_onWenc_normalized-data.tsv  
- DIRAC RMS data (proteomics): 230214_LC-M001-proteomics-DIRAC-ver7-2_DIRAC-GOBP_RankMatchingScore-BS-combined.tsv  
- DIRAC RCI data (proteomics): 230214_LC-M001-proteomics-DIRAC-ver7-2_DIRAC-GOBP_RankConservationIndex-BS-combined.tsv  
- DIRAC RMS data (transcriptomics): 230217_LC-M001-related-TrOmics-DIRAC-ver3_DIRAC-GOBP_onWenc_RankMatchingScore-BS-combined.tsv  
- DIRAC RCI data (transcriptomics): 230217_LC-M001-related-TrOmics-DIRAC-ver3_DIRAC-GOBP_onWenc_RankConservationIndex-BS-combined.tsv  

Output figures and tables:  
- Figure 4b–d, 5d, 5g  
- Supplementary Data 6  

Original notebook (memo for my future tracing):  
- dalek:\[JupyterLab HOME\]/230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3/230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP.ipynb  

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
#For Arial font
#!conda install -c conda-forge -y mscorefonts
##-> The below was also needed in matplotlib 3.4.2
#import shutil
#import matplotlib
#shutil.rmtree(matplotlib.get_cachedir())
import warnings
warnings.filterwarnings('ignore')
from IPython.display import display
import time
#For exporting .pdf file with editable text
import matplotlib
matplotlib.rcParams['pdf.fonttype']=42
matplotlib.rcParams['ps.fonttype']=42

import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats import weightstats
from statsmodels.stats import multitest as multi
from decimal import Decimal, ROUND_HALF_UP
import re
import matplotlib.patches as mpatches
from matplotlib.offsetbox import AnchoredText
#!pip install venn
from venn import venn
from textwrap import wrap

!conda list

# packages in environment at /opt/conda/envs/arivale-py3:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
analytics                 0.1                      pypi_0    pypi
argon2-cffi               21.1.0           py39h3811e60_0    conda-forge
arivale-data-interface    0.1.0                    pypi_0    pypi
async_generator           1.10                       py_0    conda-forge
atk-1.0                   2.36.0               h3371d22_4    conda-forge
attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
biopython                 1.79             py39h3811e60_0    conda-forge
bleach 

## 1. Prepare metadata and DIRAC results

> The transcriptomics-related files were copied from wenceslaus server in advance.  

### 1-1. Module–analyte metadata

In [None]:
#Import the cleaned module metadata for proteomics
fileDir = '../230206_LC-M001-proteomics-DIRAC-ver7/ExportData/'
ipynbName = '230214_LC-M001-proteomics-DIRAC-ver7-2_DIRAC-GOBP_'
fileName = 'module-metadata.tsv'
tempDF1 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('ModuleID')
print('Proteomics modules:', len(tempDF1))

#Import the cleaned module metadata for transcriptomics
fileDir = './ImportData/'
ipynbName = '230217_LC-M001-related-TrOmics-DIRAC-ver3_DIRAC-GOBP_onWenc_'
fileName = 'module-metadata.tsv'
tempDF2 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('ModuleID')
print('Transcriptomics modules:', len(tempDF2))

#Retrieve common modules
tempL = ['ModuleName', 'ModuleType']
tempDF = tempDF1.loc[tempDF1.index.isin(tempDF2.index.tolist()), tempL]
##Clean
tempDF1 = tempDF1.drop(columns=tempL)
tempDF1.columns = 'PrOmics_'+tempDF1.columns
tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF2 = tempDF2.drop(columns=tempL)
tempDF2.columns = 'TrOmics_'+tempDF2.columns
tempDF = pd.merge(tempDF, tempDF2, left_index=True, right_index=True, how='left')
print('Common modules:', len(tempDF))
display(tempDF)

#Save
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'module-metadata.tsv'
tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

moduleDF = tempDF

### 1-2. Sample–mouse metadata

In [None]:
tempL1 = ['Ctrl', 'Aca', 'Rapa']#Target groups to be compared
tempL2 = ['Dataset', 'Intervention', 'Sex', 'Age', 'Phenotype']

#Prepare sample-mouse metadata for proteomics
fileDir = '../230206_LC-M001-proteomics-DIRAC-ver7/ExportData/'
ipynbName = '230213_LC-M001-proteomics-DIRAC-ver7-2_Preprocessing_'
fileName = 'sample-metadata.tsv'
tempDF1 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('SampleID')
##Clean
tempDF1['Phenotype'] = tempDF1['Intervention']#Sex-and-age-pooled for DIRAC template (rank consensus)
tempDF1['Dataset'] = 'PrOmics'
tempDF1['Age'] = '12M'#Forced conversion to month-level
tempDF1 = tempDF1[tempL2]
##Select the assessed samples
fileName = 'normalized-data.tsv'
tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('UniProtID')
tempDF1 = tempDF1.loc[tempDF1.index.isin(tempDF.columns)]
##Take only the target groups
tempDF1 = tempDF1.loc[tempDF1['Intervention'].isin(tempL1)]

#Prepare sample-mouse metadata for transcriptomics
fileDir = './ImportData/'
ipynbName = '230215_LC-M001-related-TrOmics-DIRAC-ver3_Preprocessing_onWenc_'
fileName = 'sample-metadata.tsv'
tempDF2 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('SampleID')
##Clean
tempDF2['Phenotype'] = tempDF2['Intervention']#Sex-and-age-pooled for DIRAC template (rank consensus)
tempDF2['Dataset'] = 'TrOmics'
tempDF2['Intervention'] = tempDF2['Intervention'].str.replace('Ctrl1', 'Ctrl')#For consistency
tempDF2 = tempDF2[tempL2]
##Select the assessed samples
fileName = 'normalized-data.tsv'
tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('EnsemblID')
tempDF2 = tempDF2.loc[tempDF2.index.isin(tempDF.columns)]
##Take only the target groups
tempDF2 = tempDF2.loc[tempDF2['Intervention'].isin(tempL1)]

#Merge
tempDF = pd.concat([tempDF1, tempDF2], axis=0)
tempDF['Group'] = tempDF['Dataset'].str.slice(start=0, stop=1)+':'+tempDF['Intervention']

display(tempDF)
display(tempDF['Group'].value_counts())

#Save
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'sample-metadata.tsv'
tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

sampleDF = tempDF

### 1-3. DIRAC metrics

In [None]:
#Prepare the DIRAC results for proteomics
fileDir = '../230206_LC-M001-proteomics-DIRAC-ver7/ExportData/'
ipynbName = '230214_LC-M001-proteomics-DIRAC-ver7-2_DIRAC-GOBP_'
tempD1 = {'RMS':'RankMatchingScore-BS-combined.tsv',
          'RCI':'RankConservationIndex-BS-combined.tsv'}
dataset = 'PrOmics'
tempD2 = {}
for metric in tempD1.keys():
    #Import the combined DIRAC results
    fileName = tempD1[metric]
    tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
    print(metric)
    print('Original DF:', tempDF.shape)
    print(' - Unique modules:', len(tempDF['ModuleID'].unique()))
    print(' - Unique templates:', len(tempDF['Template'].unique()))
    
    #Take only the target modules and samples/groups
    tempDF = tempDF.loc[tempDF['ModuleID'].isin(moduleDF.index.tolist())]
    tempDF1 = sampleDF.loc[sampleDF['Dataset']==dataset]
    tempDF = tempDF.loc[tempDF['Template'].isin(tempDF1['Phenotype'].unique())]
    tempDF = tempDF.set_index(['ModuleID', 'Template'])
    if metric=='RMS':
        tempDF = tempDF.loc[:, tempDF.columns.isin(tempDF1.index.tolist())]
    elif metric=='RCI':
        tempDF = tempDF.loc[:, tempDF.columns.isin(tempDF1['Phenotype'].unique())]
    tempDF = tempDF.reset_index()
    
    #Rename labels
    tempS = tempDF1.reset_index().set_index('Phenotype')['Group'].drop_duplicates(keep='first')
    tempDF['Template'] = tempDF['Template'].map(tempS)
    if metric=='RCI':
        tempDF = tempDF.set_index(['ModuleID', 'Template'])
        tempDF.columns = tempDF.columns.map(tempS)
        tempDF = tempDF.reset_index()
    
    print('Cleaned DF:', tempDF.shape)
    print(' - Unique modules:', len(tempDF['ModuleID'].unique()))
    print(' - Unique templates:', len(tempDF['Template'].unique()))
    display(tempDF)
    tempD2[metric] = tempDF
    print('')

#Save
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
for metric in tempD2.keys():
    tempDF = tempD2[metric]
    fileName = dataset+'-'+metric+'.tsv'
    tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

rmsDF_p = tempD2['RMS']
rciDF_p = tempD2['RCI']

In [None]:
#Prepare the DIRAC results for transcriptomics
fileDir = './ImportData/'
ipynbName = '230217_LC-M001-related-TrOmics-DIRAC-ver3_DIRAC-GOBP_onWenc_'
tempD1 = {'RMS':'RankMatchingScore-BS-combined.tsv',
          'RCI':'RankConservationIndex-BS-combined.tsv'}
dataset = 'TrOmics'
tempD2 = {}
for metric in tempD1.keys():
    #Import the combined DIRAC results
    fileName = tempD1[metric]
    tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
    print(metric)
    print('Original DF:', tempDF.shape)
    print(' - Unique modules:', len(tempDF['ModuleID'].unique()))
    print(' - Unique templates:', len(tempDF['Template'].unique()))
    
    #Take only the target modules and samples/groups
    tempDF = tempDF.loc[tempDF['ModuleID'].isin(moduleDF.index.tolist())]
    tempDF1 = sampleDF.loc[sampleDF['Dataset']==dataset]
    tempDF = tempDF.loc[tempDF['Template'].isin(tempDF1['Phenotype'].unique())]
    tempDF = tempDF.set_index(['ModuleID', 'Template'])
    if metric=='RMS':
        tempDF = tempDF.loc[:, tempDF.columns.isin(tempDF1.index.tolist())]
    elif metric=='RCI':
        tempDF = tempDF.loc[:, tempDF.columns.isin(tempDF1['Phenotype'].unique())]
    tempDF = tempDF.reset_index()
    
    #Rename labels
    tempS = tempDF1.reset_index().set_index('Phenotype')['Group'].drop_duplicates(keep='first')
    tempDF['Template'] = tempDF['Template'].map(tempS)
    if metric=='RCI':
        tempDF = tempDF.set_index(['ModuleID', 'Template'])
        tempDF.columns = tempDF.columns.map(tempS)
        tempDF = tempDF.reset_index()
    
    print('Cleaned DF:', tempDF.shape)
    print(' - Unique modules:', len(tempDF['ModuleID'].unique()))
    print(' - Unique templates:', len(tempDF['Template'].unique()))
    display(tempDF)
    tempD2[metric] = tempDF
    print('')

#Save
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
for metric in tempD2.keys():
    tempDF = tempD2[metric]
    fileName = dataset+'-'+metric+'.tsv'
    tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

rmsDF_t = tempD2['RMS']
rciDF_t = tempD2['RCI']

## 2. Rank conservation index: general pattern

### 2-1. Extract RCI (the mean of RMSs under the own phenotype consensus)
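
The DIRAC metrics themselves were computed in the upstream notebooks, which are not shown here. As a reminder of what the heading means, the following is a minimal sketch of the general idea described by Eddy et al.: a rank template is the majority-vote pairwise ordering of analytes within a phenotype, the RMS of a sample is the fraction of analyte pairs matching a template, and the RCI is the mean RMS of a phenotype's samples. The function names, the toy matrix, and the tie handling (a pair with an exact 50% vote is treated as "not less than") are assumptions for illustration only, not the implementation used to produce the imported files.

```python
import itertools
import numpy as np

def pairwise_orderings(values):
    """For each analyte pair (i, j) with i < j, return whether value_i < value_j."""
    pairs = itertools.combinations(range(len(values)), 2)
    return np.array([values[i] < values[j] for i, j in pairs])

def rank_template(group_matrix):
    """Majority-vote pairwise ordering across the samples (columns) of one phenotype."""
    orderings = np.array([pairwise_orderings(col) for col in group_matrix.T])
    return orderings.mean(axis=0) > 0.5

def rank_matching_score(sample_values, template):
    """Fraction of analyte pairs whose ordering in the sample matches the template."""
    return float(np.mean(pairwise_orderings(sample_values) == template))

# Toy module: 4 analytes (rows) x 3 samples (columns), all from one phenotype
toy = np.array([[1.0, 1.2, 0.9],
                [2.0, 2.1, 0.5],
                [3.0, 2.9, 3.1],
                [4.0, 4.2, 3.8]])
template = rank_template(toy)
rms = [rank_matching_score(toy[:, s], template) for s in range(toy.shape[1])]
rci = float(np.mean(rms))  # RCI = mean RMS under the phenotype's own template
print(rms, rci)
```

In this toy case, only the third sample deviates from the consensus (one of six analyte pairs is inverted), so its RMS is lower than that of the other two samples and the RCI falls slightly below 1.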

In [None]:
#Extract RCI whose template phenotype corresponds to the own phenotype
rciDF_kk = pd.DataFrame(index=moduleDF.index)
for tempDF in [rciDF_p, rciDF_t]:
    phenotypeL = tempDF.drop(columns=['ModuleID', 'Template']).columns.tolist()
    tempDF = tempDF.set_index('ModuleID')
    for k in phenotypeL:
        tempS = tempDF[k].loc[tempDF['Template']==k]
        rciDF_kk = pd.merge(rciDF_kk, tempS, left_index=True, right_index=True, how='left')
##Sort
tempL1 = ['Ctrl', 'Aca', 'Rapa']
tempL2 = ['PrOmics', 'TrOmics']
tempL = [dataset[0]+':'+intervention for dataset in tempL2 for intervention in tempL1]
rciDF_kk = rciDF_kk[tempL]
display(rciDF_kk)
display(rciDF_kk.describe())

### 2-2. Mann–Whitney U-test

> Note that the SciPy API (scipy.stats.mannwhitneyu) is used because only the one-sided test seems to be implemented in the current statsmodels API (statsmodels.stats.nonparametric.rank_compare_2indep). The output objects are otherwise the same between the two APIs, in contrast to the case of the t-test (where the degrees of freedom are not reported by the SciPy API).  
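
For orientation, the test pattern used in the next cell can be sketched on synthetic data as follows. The group names mirror the notebook, but the RCI values are randomly generated and purely hypothetical. The contrast group is passed first, so the reported U-statistic refers to the contrast rather than to the control.

```python
import numpy as np
from scipy import stats
from statsmodels.stats import multitest as multi

rng = np.random.default_rng(0)
# Hypothetical module RCIs (200 modules per group); not real data
rci = {'Ctrl': rng.beta(8, 2, size=200),
       'Aca': rng.beta(9, 2, size=200),
       'Rapa': rng.beta(8, 2, size=200)}

labels, pvals = [], []
for contrast in ['Aca', 'Rapa']:
    # Two-sided Mann-Whitney U-test; the U-statistic corresponds to the first argument
    ustat, pval = stats.mannwhitneyu(rci[contrast], rci['Ctrl'], alternative='two-sided')
    labels.append(contrast+'-vs-Ctrl')
    pvals.append(pval)

# Benjamini-Hochberg adjustment across the comparisons
adj_pvals = multi.multipletests(pvals, alpha=0.05, method='fdr_bh')[1]
for label, pval, adj in zip(labels, pvals, adj_pvals):
    print(label, pval, adj)
```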

In [None]:
tempDF = rciDF_kk
tempL1 = ['Ctrl', 'Aca', 'Rapa']
tempL2 = ['PrOmics', 'TrOmics']
tempL = [dataset[0]+':'+intervention for dataset in tempL2 for intervention in tempL1]
control_suffix = ':Ctrl'

#Statistical tests
tempDF1 = pd.DataFrame(columns=['Ustat', 'Pval'])
for contrast in tempL:
    dataset = re.sub(':.*$', '', contrast)
    control = dataset+control_suffix
    if control!=contrast:
        tempS1 = tempDF[control]
        tempS2 = tempDF[contrast]
        #Two-sided Mann–Whitney U-test
        ustat, pval = stats.mannwhitneyu(tempS2, tempS1,#U-statistic corresponds to the contrast
                                         use_continuity=True, alternative='two-sided', method='auto')
        tempDF1.loc[contrast+'-vs-'+control] = [ustat, pval]
##P-value adjustment by using Benjamini–Hochberg method
tempDF1['AdjPval'] = multi.multipletests(tempDF1['Pval'], alpha=0.05, method='fdr_bh',
                                         is_sorted=False, returnsorted=False)[1]
tempDF1.index.rename('ComparisonLabel', inplace=True)
display(tempDF1)

#Calculate general statistics
tempDF2 = pd.DataFrame(columns=['N', 'RCImedian', 'RCImad'])
for group in tempL:
    tempS = tempDF[group]
    size = len(tempS)
    median = tempS.median()
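    mad = stats.median_abs_deviation(tempS, scale='normal')#Normal-consistent MAD, replacing stats.median_absolute_deviation (deprecated in SciPy 1.5 and later removed); cf. pd.Series.mad() is not median absolute deviation but mean absolute deviation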
    tempDF2.loc[group] = [size, median, mad]
tempDF2.index.rename('GroupLabel', inplace=True)
display(tempDF2)

#Clean
##Reformat while renaming column names
tempD1 = {'Comparison':tempDF1, 'Group':tempDF2}
tempD2 = {}
for target in tempD1.keys():
    tempDF3 = tempD1[target]
    tempL1 = tempDF3.index.tolist()#For sorting later
    tempL2 = tempDF3.columns.tolist()#For sorting later
    tempDF3 = tempDF3.reset_index().melt(var_name='Variable', value_name='Value', id_vars=target+'Label')
    tempDF3['Variable'] = tempDF3[target+'Label']+'_'+tempDF3['Variable']
    tempDF3['ModuleID'] = 'All'#Dummy
    tempDF3 = tempDF3.pivot(index='ModuleID', columns='Variable', values='Value')
    tempDF3.columns.name = None#Erase 'Variable'
    tempL3 = [label+'_'+variable for label in tempL1 for variable in tempL2]
    tempDF3 = tempDF3[tempL3]#Sort
    tempD2[target] = tempDF3
##Merge
tempDF3 = pd.merge(tempD2['Group'], tempD2['Comparison'], left_index=True, right_index=True, how='inner')
##Convert data type
for col_n in tempDF3.columns.tolist():
    if re.search('_N$', col_n):
        tempDF3[col_n] = tempDF3[col_n].astype(int)
#Add dummy module name
tempL1 = tempDF3.columns.tolist()#For sorting later
tempDF3['ModuleName'] = 'General pattern'
tempL2 = [col_n for sublist in [['ModuleName'], tempL1] for col_n in sublist]
tempDF3 = tempDF3[tempL2]
display(tempDF3)

#Save
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'inter-group-comparison_RCIdistribution.tsv'
tempDF3.to_csv(fileDir+ipynbName+fileName, sep='\t', index=True)

statDF = tempDF3

### 2-3. Visualization: boxplot

In [None]:
#Prepare DF and color
tempD0 = {'P:Ctrl':'P-Control', 'P:Aca':'P-Acarbose', 'P:Rapa':'P-Rapamycin',
          'T:Ctrl':'T-Control', 'T:Aca':'T-Acarbose', 'T:Rapa':'T-Rapamycin'}
tempDF = rciDF_kk.rename(columns=tempD0)
tempDF = tempDF.reset_index().melt(var_name='Group', value_name='RCI', id_vars='ModuleID')
tempD = {'P-Control':'tab:blue', 'P-Acarbose':'tab:red', 'P-Rapamycin':'tab:purple',
         'T-Control':plt.get_cmap('tab20')(1), 'T-Acarbose':plt.get_cmap('tab20')(7),
         'T-Rapamycin':plt.get_cmap('tab20')(9)}

#Prepare significance labels
##Retrieve statistical significance
module = 'All'
tempS = statDF.loc[module, statDF.columns.str.contains('AdjPval')]
tempS.index = tempS.index.str.replace('_AdjPval', '')
tempS.name = 'AdjPval'
##Clean
tempDF1 = tempS.index.to_series().str.split(pat='-vs-', expand=True)
tempDF1 = tempDF1.rename(columns={0:'Contrast', 1:'Baseline'})
tempDF1 = pd.merge(tempDF1, tempS, left_index=True, right_index=True, how='left')
tempDF1['Contrast'] = tempDF1['Contrast'].map(tempD0)
tempDF1['Baseline'] = tempDF1['Baseline'].map(tempD0)
##Convert p-value to label
tempL = []
for row_i in range(len(tempDF1)):
    pval = tempDF1['AdjPval'].iloc[row_i]
    if pval<0.001:
        tempL.append('***')
    elif pval<0.01:
        tempL.append('**')
    elif pval<0.05:
        tempL.append('*')
    else:
        pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
        tempL.append(r'$P$ = '+str(pval_text))
tempDF1['SignifLabel'] = tempL
##Add the y-position level in figure
tempDF1['YposLevel'] = [0, 1, 0, 1]
display(tempDF1)

#Visualization
ymax = 1.0
ymin = 0.5
yinter = 0.1
ymargin_t = 0.06
ymargin_b = 0.01
aline_ymin = 0.95
aline_ymargin = 0.05
sns.set(style='ticks', font='Arial', context='talk')
plt.figure(figsize=(3, 4))
p = sns.boxplot(data=tempDF, y='RCI', x='Group', order=list(tempD.keys()), palette=tempD, dodge=False,
                showfliers=True, flierprops={'marker':'o', 'markerfacecolor':'gray', 'alpha':0.4},
                showcaps=True, notch=True)
p.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
##Add border line
p.axvline(x=2.5, **{'linestyle':'dotted', 'color':'black', 'zorder':0})
##Add significance labels
lines = p.axes.get_lines()#Line2D: [[Q1, Q1-1.5IQR], [Q3, Q3+1.5IQR], [Q1, Q1], [Q3, Q3], [Med, Med], [flier]]
lines_unit = 5 + int(True)#showfliers=True
for row_i in range(len(tempDF1)):
    #Baseline
    group_0 = tempDF1['Baseline'].iloc[row_i]
    index_0 = list(tempD.keys()).index(group_0)
    whisker_0 = lines[index_0*lines_unit + 1]
    xcoord_0 = whisker_0._x[1]#Q3+1.5IQR
    #ycoord_0 = whisker_0._y[1]#Q3+1.5IQR
    #Contrast
    group_1 = tempDF1['Contrast'].iloc[row_i]
    index_1 = list(tempD.keys()).index(group_1)
    whisker_1 = lines[index_1*lines_unit + 1]
    xcoord_1 = whisker_1._x[1]#Q3+1.5IQR
    #ycoord_1 = whisker_1._y[1]#Q3+1.5IQR
    #Standard point of marker
    xcoord = (xcoord_0+xcoord_1)/2
    #ycoord = max(ycoord_0, ycoord_1)
    ycoord = aline_ymin + aline_ymargin*tempDF1['YposLevel'].iloc[row_i]
    label = tempDF1['SignifLabel'].iloc[row_i]
    #Add annotation lines
    aline_offset = yinter/10
    aline_length = yinter/10 + aline_offset
    plt.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
             [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
             lw=1.5, c='k')
    #Add annotation text
    if label in ['***', '**', '*']:
        text_offset = yinter/25
        p.annotate(label, xy=(xcoord, ycoord+text_offset),
                   horizontalalignment='center', verticalalignment='bottom',
                   fontsize='medium', color='k')
    else:
        text_offset = yinter/5
        p.annotate(label, xy=(xcoord, ycoord+text_offset),
                   horizontalalignment='center', verticalalignment='bottom',
                   fontsize='x-small', color='k')
sns.despine()
plt.xlabel('')
plt.ylabel('Module RCI')
plt.xticks(rotation=70, horizontalalignment='right', verticalalignment='center', rotation_mode='anchor')
##Save
fileDir = './ExportFigures/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'RCI-boxplot.pdf'
plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
plt.show()

## 3. Rank conservation index: inter-group module comparison

> Test specific hypothesis: control RCI == intervention RCI (i.e., inter-group module comparison).  
> 1. Testing the main effect of intervention on rank matching scores (RMSs) for each module using an ANOVA model  
> 2. Then, performing post-hoc comparisons of RMSs between control and each intervention using Welch's t-tests  
>  
> The statistical strategy is basically the same as the one used in the analysis of each dataset. Because RMS/RCI was not normalized (i.e., the expected mean and variance could differ between datasets due to the different numbers of mapped analytes), the dataset and its interaction term are NOT included in the ANOVA model; instead, an ANOVA model is generated per dataset. The P-value adjustment is performed in a conservative manner: the P-values in the ANOVA tests are adjusted across all models (= modules x datasets), and those in the post-hoc tests are adjusted across all comparisons (interventions x datasets) only within the module (not across modules).  
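
The two-step strategy and its two levels of P-value adjustment can be sketched compactly on synthetic data. Everything in this example is hypothetical (the module IDs `GO:A` and `GO:B`, the sample size of 8 per group, and the simulated RMS values); it is only meant to show the order of operations, not to reproduce the cells in this section.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats import weightstats
from statsmodels.stats import multitest as multi

rng = np.random.default_rng(1)
records = []
for dataset in ['PrOmics', 'TrOmics']:
    for module in ['GO:A', 'GO:B']:  # Hypothetical module IDs
        for group in ['Ctrl', 'Aca', 'Rapa']:
            shift = 0.05 if (module == 'GO:A' and group == 'Aca') else 0.0
            for rms in rng.normal(0.8 + shift, 0.03, size=8):
                records.append((dataset, module, group, rms))
toyDF = pd.DataFrame(records, columns=['Dataset', 'ModuleID', 'Intervention', 'RMS'])

anova_rows, posthoc_rows = [], []
for (dataset, module), subDF in toyDF.groupby(['Dataset', 'ModuleID']):
    # Step 1: one ANOVA model per module and per dataset
    model = smf.ols('RMS ~ C(Intervention)', data=subDF).fit()
    anova_p = anova_lm(model, typ=2).at['C(Intervention)', 'PR(>F)']
    anova_rows.append((dataset, module, anova_p))
    # Step 2: two-sided Welch's t-test of each intervention against control
    ctrl = subDF.loc[subDF['Intervention'] == 'Ctrl', 'RMS']
    for contrast in ['Aca', 'Rapa']:
        test = subDF.loc[subDF['Intervention'] == contrast, 'RMS']
        tstat, pval, dof = weightstats.ttest_ind(test, ctrl, alternative='two-sided',
                                                 usevar='unequal')
        posthoc_rows.append((dataset, module, contrast+'-vs-Ctrl', tstat, pval))

anovaDF = pd.DataFrame(anova_rows, columns=['Dataset', 'ModuleID', 'Pval'])
posthocDF = pd.DataFrame(posthoc_rows,
                         columns=['Dataset', 'ModuleID', 'Comparison', 'tStat', 'Pval'])

# ANOVA P-values: adjusted across all models (modules x datasets)
anovaDF['AdjPval'] = multi.multipletests(anovaDF['Pval'], method='fdr_bh')[1]
# Post-hoc P-values: adjusted within each module (comparisons x datasets)
posthocDF['AdjPval'] = posthocDF.groupby('ModuleID')['Pval'].transform(
    lambda p: multi.multipletests(p, method='fdr_bh')[1])
print(anovaDF)
print(posthocDF)
```

Grouping the post-hoc adjustment by `ModuleID` alone (not by dataset) is what makes the four tests of a module (2 comparisons x 2 datasets) share one Benjamini–Hochberg family.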

### 3-1. Extract RMS under the own phenotype consensus

In [None]:
#Extract RMS whose template phenotype corresponds to the own phenotype
rmsDF_kk = pd.DataFrame(index=moduleDF.index)
for tempDF1, tempDF2 in zip([rmsDF_p, rmsDF_t], [rciDF_p, rciDF_t]):
    phenotypeL = tempDF2.drop(columns=['ModuleID', 'Template']).columns.tolist()
    tempDF1 = tempDF1.set_index('ModuleID')
    for k in phenotypeL:
        tempL = sampleDF.loc[sampleDF['Group']==k].index.tolist()
        tempDF = tempDF1[tempL].loc[tempDF1['Template']==k]
        rmsDF_kk = pd.merge(rmsDF_kk, tempDF, left_index=True, right_index=True, how='left')

display(rmsDF_kk)
display(rmsDF_kk.describe())

### 3-2. ANOVA test (RMS ~ Intervention), followed by Welch's t-tests (Intervention)

#### 3-2-1. Perform all statistical tests

In [None]:
tempL1 = ['PrOmics', 'TrOmics']#Target dataset
tempL2 = ['Ctrl', 'Aca', 'Rapa']#Target sample groups to be assessed
control = 'Ctrl'#For post-hoc comparisons
tempDF1 = rmsDF_kk
tempDF2 = sampleDF.loc[sampleDF['Intervention'].isin(tempL2)]
tempI = moduleDF.index
formula = 'RMS ~ C(Intervention)'
tempL3 = ['C(Intervention)']#For variables of interest in ANOVA

#Statistical tests per dataset
tempD1 = {}
tempD2 = {}
for dataset in tempL1:
    #Statistical tests per module
    t_start = time.time()
    tempL4 = []#For ANOVA table
    tempL5 = []#For post-hoc test table
    for module in tempI.tolist():
        #Select the target module RMSs
        tempS = tempDF1.loc[module]
        tempS.name = 'RMS'
        #Add metadata while selecting the target samples
        tempDF = tempDF2.loc[tempDF2['Dataset']==dataset]
        tempDF = pd.merge(tempS, tempDF, left_index=True, right_index=True, how='inner')
        
        #ANOVA
        model = smf.ols(formula, data=tempDF).fit()
        anovaDF = anova_lm(model, typ=2)#ANOVA type doesn't matter in this case
        ##Take the results per variable
        tempDF3 = pd.DataFrame(columns=['DoF', 'Fstat', 'Pval'])
        for variable in tempL3:
            dof1 = int(anovaDF.at[variable, 'df'])#Between-groups
            dof2 = int(anovaDF.at['Residual', 'df'])#Within-groups
            dof = (dof1, dof2)
            fstat = anovaDF.at[variable, 'F']
            pval = anovaDF.at[variable, 'PR(>F)']
            tempDF3.loc[variable] = [dof, fstat, pval]
        tempDF3['AdjPval'] = 1.0#Add dummy column for now
        ##Convert to wide-format
        tempS = pd.Series(len(tempDF), index=['N'], name=module)
        tempL = [tempS]
        for variable in tempDF3.index.tolist():
            tempS = tempDF3.loc[variable]
            tempS.index = variable+'_'+tempS.index
            tempS.name = module
            tempL.append(tempS)
        tempS = pd.concat(tempL, axis=0)
        tempL4.append(tempS)
        
        #Post-hoc tests per control vs. contrast
        tempDF4 = pd.DataFrame(columns=['DoF', 'tStat', 'Pval'])
        for contrast in tempL2:
            if control!=contrast:
                tempS1 = tempDF['RMS'].loc[tempDF['Intervention']==control]
                tempS2 = tempDF['RMS'].loc[tempDF['Intervention']==contrast]
                #Two-sided Welch's t-test
                tstat, pval, dof = weightstats.ttest_ind(tempS2, tempS1,#t-statistic reflects direction from the baseline
                                                         alternative='two-sided', usevar='unequal')
                tempDF4.loc[contrast+'-vs-'+control] = [dof, tstat, pval]
        tempDF4['AdjPval'] = 1.0#Add dummy column for now
        ##Convert to wide-format
        tempL = []
        for comparison in tempDF4.index.tolist():
            tempS = tempDF4.loc[comparison]
            tempS.index = comparison+'_'+tempS.index
            tempS.name = module
            tempL.append(tempS)
        tempS = pd.concat(tempL, axis=0)
        tempL5.append(tempS)
    t_elapsed = time.time() - t_start
    print(dataset)
    print('Elapsed time for', len(tempI), 'ANOVA and',
          (len(tempL2)-1)*len(tempI), 'post-hoc tests (',
          len(tempL2)-1, 'comparisons x', len(tempI), 'modules):',
          round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')
    
    #Generate ANOVA table
    tempDF3 = pd.concat(tempL4, axis=1).T
    tempDF3.index.name = tempI.name
    tempD1[dataset] = tempDF3
    
    #Generate post-hoc test table
    tempDF4 = pd.concat(tempL5, axis=1).T
    tempDF4.index.name = tempI.name
    tempD2[dataset] = tempDF4

#Clean all ANOVA tables
tempDF3 = pd.DataFrame()
for dataset in tempD1.keys():
    tempDF = tempD1[dataset]
    tempDF['Dataset'] = dataset
    tempDF = tempDF.reset_index().set_index(['Dataset', 'ModuleID'])
    tempDF3 = pd.concat([tempDF3, tempDF], axis=0)
##P-value adjustment across all tests (modules x datasets) by using Benjamini–Hochberg method
for variable in tempL3:
    #Overwrite the dummy values
    tempDF3[variable+'_AdjPval'] = multi.multipletests(tempDF3[variable+'_Pval'], alpha=0.05, method='fdr_bh',
                                                       is_sorted=False, returnsorted=False)[1]
##Convert back dtypes (due to the forced change during wide-format)
for col_n in tempDF3.columns.tolist():
    if 'N'==col_n:
        tempDF3[col_n] = tempDF3[col_n].astype(int)
    elif 'DoF' in col_n:
        tempDF3[col_n] = tempDF3[col_n].astype(str)
    else:
        tempDF3[col_n] = tempDF3[col_n].astype(float)
##Rename columns (because only one variable in this case)
tempDF3.columns = 'ANOVA_'+tempDF3.columns.str.replace('^.*_', '', regex=True)
display(tempDF3)

#Clean all post-hoc test tables
tempDF4 = pd.DataFrame()
for dataset in tempD2.keys():
    tempDF = tempD2[dataset]
    tempDF['Dataset'] = dataset
    tempDF = tempDF.reset_index().set_index(['Dataset', 'ModuleID'])
    tempDF4 = pd.concat([tempDF4, tempDF], axis=0)
##P-value adjustment across all tests per module (comparisons x datasets) by using Benjamini–Hochberg method
for module in tempI.tolist():
    tempL = tempDF4.loc[:, tempDF4.columns.str.contains('_Pval$', regex=True)].columns.tolist()
    tempDF = tempDF4.reset_index().melt(var_name='Comparison', value_name='Pval', value_vars=tempL,
                                        id_vars=['Dataset', 'ModuleID'])
    tempDF = tempDF.loc[tempDF['ModuleID']==module]
    tempDF['AdjPval'] = multi.multipletests(tempDF['Pval'], alpha=0.05, method='fdr_bh',
                                            is_sorted=False, returnsorted=False)[1]
    tempDF = tempDF.pivot(index=['Dataset', 'ModuleID'], columns='Comparison', values='AdjPval')
    tempDF.columns = tempDF.columns.str.replace('_Pval', '_AdjPval')
    #Replace the dummy values with the adjusted p-values
    tempL = tempDF4.loc[:, tempDF4.columns.str.contains('_AdjPval$', regex=True)].columns.tolist()
    for col_n in tempL:
        for dataset in tempL1:
            tempDF4.loc[(dataset, module), col_n] = tempDF.loc[(dataset, module), col_n]
display(tempDF4)

statDF1 = tempDF3
statDF2 = tempDF4

In [None]:
tempL1 = ['PrOmics', 'TrOmics']#Target dataset
tempL2 = ['Ctrl', 'Aca', 'Rapa']#Target sample groups to be summarized
tempDF1 = rmsDF_kk
tempDF2 = sampleDF.loc[sampleDF['Intervention'].isin(tempL2)]

#Calculate general statistics per dataset
tempD = {}
for dataset in tempL1:
    #Calculate general statistics per intervention group
    tempL3 = []
    for intervention in tempL2:
        #Select the target samples
        tempL = tempDF2.loc[(tempDF2['Dataset']==dataset)&
                            (tempDF2['Intervention']==intervention)].index.tolist()
        tempDF = tempDF1[tempL]
        #Calculate general statistics
        tempS1 = len(tempL) - tempDF.isnull().sum(axis=1)
        tempS1.name = intervention+'_N'
        tempS2 = tempDF.mean(axis=1)
        tempS2.name = intervention+'_RMSmean'
        tempS3 = tempDF.sem(axis=1, ddof=1)
        tempS3.name = intervention+'_RMSsem'
        #Merge
        tempDF = pd.concat([tempS1, tempS2, tempS3], axis=1)
        tempL3.append(tempDF)
    tempDF = pd.concat(tempL3, axis=1)
    tempD[dataset] = tempDF
##Clean all general statistics tables
tempDF3 = pd.DataFrame()
for dataset in tempD.keys():
    tempDF = tempD[dataset]
    tempDF['Dataset'] = dataset
    tempDF = tempDF.reset_index().set_index(['Dataset', 'ModuleID'])
    tempDF3 = pd.concat([tempDF3, tempDF], axis=0)
display(tempDF3)

#Merge all the tables
print('General statistics table:', tempDF3.shape)
print('ANOVA table:', statDF1.shape)
print('Post-hoc test table:', statDF2.shape)
tempDF = pd.merge(moduleDF['ModuleName'], tempDF3.reset_index(), on='ModuleID', how='right')
tempDF = pd.concat([tempDF.set_index(['Dataset', 'ModuleID']), statDF1, statDF2], axis=1)

#Sort
tempDF = tempDF.sort_values(by='ANOVA_Pval', ascending=True)
display(tempDF)

#Save per dataset
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
for dataset in tempL1:
    fileName = 'inter-group-comparison_'+dataset+'-RCI.tsv'
    tempDF3 = tempDF.loc[dataset]
    tempDF3.to_csv(fileDir+ipynbName+fileName, sep='\t', index=True)
    print('Saved .tsv table for '+dataset)
    display(tempDF3)

statDF = tempDF

#### 3-2-2. Changed modules (ANOVA)

In [None]:
#Prepare variables in the model
#-> In this case, only the one variable (intervention) was included.
variableL = ['ANOVA']

#Changed modules
for variable in variableL:
    tempDF = statDF.loc[statDF[variable+'_AdjPval']<0.05]
    tempDF = tempDF.sort_values(by=variable+'_AdjPval', ascending=True)
    tempL1 = tempDF.loc[:, tempDF.columns.str.contains('_RMSmean')].columns.tolist()
    tempL2 = tempDF.loc[:, tempDF.columns.str.contains('^'+variable+'_')].columns.tolist()
    tempDF = tempDF[[col_n for subL in [['ModuleName'], tempL1, tempL2] for col_n in subL]]
    print(variable+' (adjusted P < 0.05):', len(tempDF))
    tempL = tempDF.index.to_frame()['ModuleID'].unique().tolist()
    print(' -> Unique module:', len(tempL))
    tempL = tempDF.index.to_frame()['Dataset'].unique().tolist()
    for dataset in tempL:
        tempDF1 = tempDF.loc[dataset]
        print(' -> '+dataset+':', len(tempDF1))
    display(tempDF)

#### 3-2-3. Changed modules by each intervention (Welch's t-test)

In [None]:
#Spread statDF by dataset to flatten the multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['PrOmics', 'TrOmics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')

#Extract only the changed modules
variable = 'ANOVA'
tempS1 = tempDF['PrOmics_'+variable+'_AdjPval']<0.05
tempS2 = tempDF['TrOmics_'+variable+'_AdjPval']<0.05
tempDF = tempDF.loc[tempS1|tempS2]
tempDF = tempDF.sort_values(by=['PrOmics_'+variable+'_AdjPval', 'TrOmics_'+variable+'_AdjPval'], ascending=True)
print(variable+' (adjusted P < 0.05):', len(tempDF))
print(' -> Proteomics:', sum(tempS1))
print(' -> Transcriptomics:', sum(tempS2))

#Take adjusted P-value
tempDF1 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_AdjPval$')]
tempDF1.columns = tempDF1.columns.str.replace('_AdjPval$', '', regex=True)#'$' is a regex anchor
tempDF1 = pd.merge(tempDF[['ModuleName', 'PrOmics_'+variable+'_AdjPval', 'TrOmics_'+variable+'_AdjPval']],
                   tempDF1, left_index=True, right_index=True, how='left')
print('Adjusted P-value:')
display(tempDF1)
display(tempDF1.describe())

#Take t-statistic (sign gives the direction of change)
tempDF2 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_tStat$')]
tempDF2.columns = tempDF2.columns.str.replace('_tStat$', '', regex=True)#'$' is a regex anchor
tempDF2 = pd.merge(tempDF[['ModuleName', 'PrOmics_'+variable+'_AdjPval', 'TrOmics_'+variable+'_AdjPval']],
                   tempDF2, left_index=True, right_index=True, how='left')
print('Changed direction (t-statistic):')
display(tempDF2)
display(tempDF2.describe())

pvalDF = tempDF1
diffDF = tempDF2
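
The contrast-column extraction above can be sketched on a toy table. This is a minimal illustration, not the notebook's data: `toyDF`, `contrastDF`, and the module names are hypothetical, and only the column-name pattern is taken from the cell above. Note that, depending on the pandas version, `str.replace` may treat the pattern as a literal string unless `regex=True` is passed, so the sketch passes it explicitly.

```python
import pandas as pd

# Toy flattened table; column names mimic the notebook's pattern, values are made up
toyDF = pd.DataFrame(
    {'PrOmics_ANOVA_AdjPval': [0.01, 0.20],
     'PrOmics_Aca-vs-Ctrl_AdjPval': [0.02, 0.50],
     'PrOmics_Aca-vs-Ctrl_tStat': [3.1, -0.4],
     'TrOmics_Rapa-vs-Ctrl_AdjPval': [0.30, 0.04]},
    index=pd.Index(['moduleA', 'moduleB'], name='ModuleID'))

# Keep only the post-hoc contrast columns that end with the suffix
suffix = '_AdjPval'
mask = toyDF.columns.str.contains('-vs-.*' + suffix + '$')
contrastDF = toyDF.loc[:, mask].copy()

# Strip the suffix; regex=True makes '$' act as an end-of-string anchor
contrastDF.columns = contrastDF.columns.str.replace(suffix + '$', '', regex=True)
print(contrastDF.columns.tolist())
```

The ANOVA column is excluded because it lacks the `-vs-` token, and the t-statistic column is excluded because it does not end with the suffix.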

> For reference, also check the changed modules selected by the nominal P-value for the main effect (ANOVA) together with the adjusted P-values for the post-hoc tests.  

In [None]:
#Spread statDF by dataset to flatten the multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['PrOmics', 'TrOmics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')

#Extract only the changed modules
variable = 'ANOVA'
tempS1 = tempDF['PrOmics_'+variable+'_Pval']<0.05
tempS2 = tempDF['TrOmics_'+variable+'_Pval']<0.05
tempDF = tempDF.loc[tempS1|tempS2]
tempDF = tempDF.sort_values(by=['PrOmics_'+variable+'_Pval',
                                'TrOmics_'+variable+'_Pval'], ascending=True)
print(variable+' (nominal P < 0.05):', len(tempDF))
print(' -> Proteomics:', sum(tempS1))
print(' -> Transcriptomics:', sum(tempS2))

#Take adjusted P-value
tempDF1 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_AdjPval$')]
tempDF1.columns = tempDF1.columns.str.replace('_AdjPval$', '', regex=True)#'$' is a regex anchor
tempDF1 = pd.merge(tempDF[['ModuleName', 'PrOmics_'+variable+'_AdjPval', 'TrOmics_'+variable+'_AdjPval']],
                   tempDF1, left_index=True, right_index=True, how='left')
print('Adjusted P-value:')
display(tempDF1)
display(tempDF1.describe())

#Take t-statistic (sign gives the direction of change)
tempDF2 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_tStat$')]
tempDF2.columns = tempDF2.columns.str.replace('_tStat$', '', regex=True)#'$' is a regex anchor
tempDF2 = pd.merge(tempDF[['ModuleName', 'PrOmics_'+variable+'_AdjPval', 'TrOmics_'+variable+'_AdjPval']],
                   tempDF2, left_index=True, right_index=True, how='left')
#print('Changed direction (t-statistic):')
#display(tempDF2)
#display(tempDF2.describe())

pvalDF_ref = tempDF1
diffDF_ref = tempDF2

### 3-3. Visualization: clustermap

In [None]:
tempD = {'P:Ctrl':'P-Control', 'P:Aca':'P-Acarbose', 'P:Rapa':'P-Rapamycin',
         'T:Ctrl':'T-Control', 'T:Aca':'T-Acarbose', 'T:Rapa':'T-Rapamycin'}
tempDF1 = rciDF_kk.rename(columns=tempD)
regulation = 'Tightened'
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Control':plt.get_cmap('tab20')(1), 'Acarbose':plt.get_cmap('tab20')(7),
          'Rapamycin':plt.get_cmap('tab20')(9)}

#Prepare color labels for changed module set
tempDF2 = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains('-vs-')].copy()#Based on nominal ANOVA P-values; copy because columns are added below
for col_n in tempDF2.columns.tolist():
    tempS1 = pvalDF_ref[col_n]#Adjusted P-values in post-hoc tests
    tempS2 = diffDF_ref[col_n]
    tempS3 = pvalDF[col_n]#Adjusted P-values in post-hoc tests
    tempS4 = diffDF[col_n]
    if regulation=='Changed':
        tempS2 = tempS2.loc[(tempS1<0.05)]
        tempS4 = tempS4.loc[(tempS3<0.05)]
    elif regulation=='Tightened':
        tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        tempS4 = tempS4.loc[(tempS3<0.05)&(tempS4>0)]
    elif regulation=='Loosened':
        tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempS4 = tempS4.loc[(tempS3<0.05)&(tempS4<0)]
    label = re.sub('-vs-.*', '', col_n)
    group = tempD0[re.sub('.*_', '', label)]
    dataset = re.sub('_.*', '', label)
    tempL = []
    for module in tempDF2.index.tolist():
        if module in tempS2.index.tolist():
            if module in tempS4.index.tolist():#Based on adjusted ANOVA P-values
                tempL.append(tempD1[group])
            else:#Based on nominal ANOVA P-values
                tempL.append(tempD2[group])
        else:
            tempL.append('white')
    tempDF2[dataset[0]+'-'+group] = tempL
    print(regulation+' module in '+col_n)
    print(' -> in adjusted ANOVA P < 0.05:', len(tempS4))
    print(' -> in nominal ANOVA P < 0.05:', len(tempS2))
tempDF2 = tempDF2[['P-Acarbose', 'P-Rapamycin', 'T-Acarbose', 'T-Rapamycin']]#Sort

#Prepare color labels for samples
tempA = np.repeat(['tab:pink', 'tab:cyan'], 3)
tempDF3 = pd.DataFrame({'Data':tempA, 'Group':np.tile(list(tempD1.values()), 2)},
                       index=list(tempD.values()))

#Clustermap
sns.set(style='ticks', font='Arial', context='talk')
cm = sns.clustermap(tempDF1.T, method='ward', metric='euclidean', cmap='afmhot',
                    row_cluster=True, col_cluster=True, row_linkage=None, col_linkage=None,
                    row_colors=tempDF3, col_colors=tempDF2, xticklabels=False, yticklabels=True,
                    dendrogram_ratio=(0.025, 0.2), colors_ratio=(0.025, 0.075),
                    cbar_pos=(0.05, -0.05, 0.3, 0.075), cbar_kws={'orientation': 'horizontal'},
                    figsize=(12, 4), **{'vmin':0.5, 'vmax':1})
cm.cax.set_title('Module RCI', size='medium',
                 verticalalignment='bottom', horizontalalignment='center')
cm.cax.tick_params(labelsize='small')
bottom, top = cm.ax_heatmap.get_ylim()
#cm.ax_heatmap.set_ylim(bottom + 0.5, top - 0.5)##To avoid half cut of first and last rows
hm = cm.ax_heatmap.get_position()
rd = cm.ax_row_dendrogram.get_position()
cd = cm.ax_col_dendrogram.get_position()
cm.ax_heatmap.set_position([hm.x0, hm.y0, hm.width, hm.height])
cm.ax_row_dendrogram.set_position([rd.x0, rd.y0, rd.width, rd.height])
cm.ax_col_dendrogram.set_position([cd.x0, cd.y0, cd.width, cd.height])
cm.ax_heatmap.set_xlabel('GOBP module')
cm.ax_heatmap.set_ylabel('')
##Row/column color bar legend (drawn on the same axis as cm.cax!)
tempL = []
for group in tempD1.keys():
    if group!='Control':
        tempL.append(mpatches.Patch(color=tempD1[group],
                                    label='by '+group+' (adjusted '+r'$P$'+' < 0.05)'))
for group in tempD2.keys():
    if group!='Control':
        tempL.append(mpatches.Patch(color=tempD2[group], label='(nominal '+r'$P$'+' < 0.05)'))
legend1 = plt.legend(handles=tempL, fontsize='small', labelspacing=0.2, ncol=2,
                     title='Tightened module (vs. Control)', title_fontsize='medium',
                     bbox_to_anchor=(1, 0.5), loc='center left', borderaxespad=3.5, frameon=False)
plt.gca().add_artist(legend1)
##Save
fileDir = './ExportFigures/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'RCI-clustermap.pdf'
#plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
plt.show()

#Save label order
tempDF = moduleDF.loc[rciDF_kk.index[cm.dendrogram_col.reordered_ind]]
tempDF1 = rciDF_kk.copy()
tempDF = pd.merge(tempDF['ModuleName'], tempDF1, left_index=True, right_index=True, how='left')
tempDF1 = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Adjusted P-value
tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.reset_index()
tempDF.index.name = 'Xcoord'
display(tempDF)
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'RCI-clustermap-xticks.tsv'
#tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

> –> Simplified version: color only the modules selected by the adjusted ANOVA P-value.  

In [None]:
tempD = {'P:Ctrl':'P-Control', 'P:Aca':'P-Acarbose', 'P:Rapa':'P-Rapamycin',
         'T:Ctrl':'T-Control', 'T:Aca':'T-Acarbose', 'T:Rapa':'T-Rapamycin'}
tempDF1 = rciDF_kk.rename(columns=tempD)
regulation = 'Tightened'
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin'}
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}

#Prepare color labels for changed module set
tempDF2 = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')].copy()#Based on adjusted ANOVA P-values; copy because columns are added below
for col_n in tempDF2.columns.tolist():
    tempS1 = pvalDF[col_n]#Adjusted P-values in post-hoc tests
    tempS2 = diffDF[col_n]
    if regulation=='Changed':
        tempS2 = tempS2.loc[(tempS1<0.05)]
    elif regulation=='Tightened':
        tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
    elif regulation=='Loosened':
        tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
    label = re.sub('-vs-.*', '', col_n)
    group = tempD0[re.sub('.*_', '', label)]
    dataset = re.sub('_.*', '', label)
    tempL = []
    for module in tempDF2.index.tolist():
        if module in tempS2.index.tolist():
            tempL.append(tempD1[group])
        else:
            tempL.append('white')
    tempDF2[dataset[0]+'-'+group] = tempL
    print(regulation+' module in '+col_n)
    print(' -> in adjusted ANOVA P < 0.05:', len(tempS2))
tempDF2 = tempDF2[['P-Acarbose', 'P-Rapamycin', 'T-Acarbose', 'T-Rapamycin']]#Sort

#Prepare color labels for samples
tempA = np.repeat(['tab:pink', 'tab:cyan'], 3)
tempDF3 = pd.DataFrame({'Data':tempA, 'Group':np.tile(list(tempD1.values()), 2)},
                       index=list(tempD.values()))

#Clustermap
sns.set(style='ticks', font='Arial', context='talk')
cm = sns.clustermap(tempDF1.T, method='ward', metric='euclidean', cmap='afmhot',
                    row_cluster=True, col_cluster=True, row_linkage=None, col_linkage=None,
                    row_colors=tempDF3, col_colors=tempDF2, xticklabels=False, yticklabels=True,
                    dendrogram_ratio=(0.025, 0.2), colors_ratio=(0.025, 0.075),
                    cbar_pos=(0.175, -0.15, 0.3, 0.075), cbar_kws={'orientation': 'horizontal'},
                    figsize=(12, 4), **{'vmin':0.5, 'vmax':1})
cm.cax.set_title('Module RCI', size='medium',
                 verticalalignment='bottom', horizontalalignment='center')
cm.cax.tick_params(labelsize='small')
bottom, top = cm.ax_heatmap.get_ylim()
#cm.ax_heatmap.set_ylim(bottom + 0.5, top - 0.5)##To avoid half cut of first and last rows
hm = cm.ax_heatmap.get_position()
rd = cm.ax_row_dendrogram.get_position()
cd = cm.ax_col_dendrogram.get_position()
cm.ax_heatmap.set_position([hm.x0, hm.y0, hm.width, hm.height])
cm.ax_row_dendrogram.set_position([rd.x0, rd.y0, rd.width, rd.height])
cm.ax_col_dendrogram.set_position([cd.x0, cd.y0, cd.width, cd.height])
cm.ax_heatmap.set_xlabel('GOBP module')
cm.ax_heatmap.set_ylabel('')
##Row/column color bar legend (drawn on the same axis as cm.cax!)
tempL = []
for group in tempD1.keys():
    if group!='Control':
        tempL.append(mpatches.Patch(color=tempD1[group], label='by '+group))
legend1 = plt.legend(handles=tempL, fontsize='small', labelspacing=0.2,
                     title='Tightened module (vs. Control)', title_fontsize='medium',
                     bbox_to_anchor=(1, 0.5), loc='center left', borderaxespad=3.5, frameon=False)
plt.gca().add_artist(legend1)
##Save
fileDir = './ExportFigures/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'RCI-clustermap.pdf'
plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
plt.show()

#Save label order
tempDF = moduleDF.loc[rciDF_kk.index[cm.dendrogram_col.reordered_ind]]
tempDF1 = rciDF_kk.copy()
tempDF = pd.merge(tempDF['ModuleName'], tempDF1, left_index=True, right_index=True, how='left')
tempDF1 = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Adjusted P-value
tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.reset_index()
tempDF.index.name = 'Xcoord'
display(tempDF)
fileDir = './ExportData/'
ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
fileName = 'RCI-clustermap-xticks.tsv'
tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')
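
The color labels in the cells above follow one rule per contrast: a module counts as tightened when its adjusted P-value is below 0.05 and its t-statistic is positive, and as loosened when the t-statistic is negative. A minimal sketch of that rule is shown below; `classify_modules`, the toy Series, and the module names are hypothetical and only illustrate the thresholding and the mapping to a color column.

```python
import pandas as pd

def classify_modules(pvalS, tstatS, alpha=0.05):
    """Label modules by significance and t-statistic sign (hypothetical helper)."""
    labelS = pd.Series('NotSignificant', index=pvalS.index)
    labelS.loc[(pvalS < alpha) & (tstatS > 0)] = 'Tightened'
    labelS.loc[(pvalS < alpha) & (tstatS < 0)] = 'Loosened'
    return labelS

# Toy adjusted P-values and t-statistics for one contrast (made-up values)
pvalS = pd.Series([0.01, 0.03, 0.40], index=['moduleA', 'moduleB', 'moduleC'])
tstatS = pd.Series([2.5, -3.0, 1.2], index=pvalS.index)
labelS = classify_modules(pvalS, tstatS)

# Map the chosen regulation to a color; every other module stays white
colorS = labelS.map({'Tightened': 'tab:red'}).fillna('white')
print(pd.concat([labelS, colorS], axis=1, keys=['Label', 'Color']))
```

A Series like `colorS` (one per contrast) is the kind of column that the cells above collect into the `col_colors` table of the clustermap.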

### 3-4. Visualization: Venn diagram

In [None]:
#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin'}
tempD1 = {'T-Acarbose':plt.get_cmap('tab20')(7), 'P-Acarbose':'tab:red',
          'P-Rapamycin':'tab:purple', 'T-Rapamycin':plt.get_cmap('tab20')(9)}

#Visualization per direction
for regulation in ['Changed', 'Tightened', 'Loosened']:
    #Prepare module sets
    tempD2 = {}
    tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
    for col_n in tempDF.columns.tolist():
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if regulation=='Changed':
            tempS2 = tempS2.loc[(tempS1<0.05)]
        elif regulation=='Tightened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif regulation=='Loosened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        label = re.sub('-vs-.*', '', col_n)
        group = tempD0[re.sub('.*_', '', label)]
        dataset = re.sub('_.*', '', label)
        tempD2[dataset[0]+'-'+group] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()#Based on adjusted ANOVA P-values
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(tempDF)
    print(regulation+' modules (vs. Control):')
    print(' -> Not significant in all contrasts:', count)
    
    #Skip the following steps if there is no significant module
    if count==len(pvalDF):
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.1, 0.875))#Otherwise, weird space...
    ##Add legend annotation
    x_coord = [0.1, 0.1, 0.9, 0.9]
    y_coord = [0.25, 0.7, 0.7, 0.25]
    h_align = ['right', 'right', 'left', 'left']
    v_align = ['top', 'bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = regulation+' modules (vs. Control)'
    ax.set_title(title, fontsize='medium')
    ##Save
    if regulation!='Changed':
        fileDir = './ExportFigures/'
        ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
        fileName = 'RCI-venn-'+regulation.lower()+'.pdf'
        plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()

In [None]:
#Export the module list of each subset in the Venn diagram
for regulation in ['Tightened', 'Loosened']:
    #Prepare module sets
    tempD = {}
    tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
    for col_n in tempDF.columns.tolist():
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if regulation=='Changed':
            tempS2 = tempS2.loc[(tempS1<0.05)]
        elif regulation=='Tightened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif regulation=='Loosened':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempD[col_n] = set(tempS2.index.tolist())
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()#Based on adjusted ANOVA P-values
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(tempDF)
    print(regulation+' modules (vs. Control):')
    print(' -> Not significant in all contrasts:', count)
    
    #Skip the following steps if there is no significant module
    if count==len(pvalDF):
        continue
    
    #Prepare a new .xlsx file (dummy README)
    tempL1 = [len(tempD[key]) for key in tempD.keys()]
    tempDF = pd.DataFrame({'Group':tempD.keys(), 'nModules':tempL1})
    tempDF = tempDF.reset_index().rename(columns={'index':'VennOrder'})
    tempDF['VennOrder'] = tempDF['VennOrder'] + 1
    fileDir = './ExportData/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-venn-'+regulation.lower()+'.xlsx'
    tempDF.to_excel(fileDir+ipynbName+fileName, sheet_name='README', header=True, index=False)
    display(tempDF)#Check
    
    #Prepare saving data (statDF spread by dataset)
    tempDF = moduleDF['ModuleName']#pd.Series() for now
    for dataset in ['PrOmics', 'TrOmics']:
        tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
        tempDF1.columns = dataset+'_'+tempDF1.columns
        tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
    
    t_start = time.time()
    #Extract overall set
    for key_i in range(len(tempD)):
        key = list(tempD.keys())[key_i]
        tempS = tempD[key]
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Save summary table by appending to the above .xlsx file
        ##Prepare sheet name
        tempL1 = ['NA' for i in range(len(tempD))]
        tempL1[key_i] = '1'
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    #Extract subset
    tempL1 = ['1', '0']
    tempL2 = [[k1, k2, k3, k4] for k1 in tempL1 for k2 in tempL1 for k3 in tempL1 for k4 in tempL1]
    #tempL2.remove(['0', '0', '0', '0'])
    for tempL1 in tempL2:
        #Positive module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='1']
        tempS1 = set(pvalDF.index.tolist())#Initialize
        for tempS in tempL3:
            tempS1 = tempS1 & tempS
        #Negative module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='0']
        tempS2 = set()#Initialize
        for tempS in tempL3:
            tempS2 = tempS2 | tempS
        #Extract subset
        tempS = tempS1 - tempS2
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Save summary table by appending to the above .xlsx file
        ##Prepare sheet name
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    t_elapsed = time.time() - t_start
    print(' - Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')
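
The subset extraction above assigns each module to exactly one Venn region: the intersection of the sets flagged '1' minus the union of the sets flagged '0'. The sketch below reproduces that logic with the standard library only; `exclusive_regions` and the toy sets are hypothetical, and `itertools.product` stands in for the nested list comprehension used in the cell.

```python
from itertools import product

def exclusive_regions(setD, universe):
    """Return {membership pattern: modules} for every Venn region (hypothetical helper)."""
    keys = list(setD.keys())
    regionD = {}
    for pattern in product('10', repeat=len(keys)):
        inside = set(universe)   # Intersection of the sets flagged '1'
        outside = set()          # Union of the sets flagged '0'
        for key, flag in zip(keys, pattern):
            if flag == '1':
                inside &= setD[key]
            else:
                outside |= setD[key]
        regionD['(' + ','.join(pattern) + ')'] = inside - outside
    return regionD

# Toy example with two sets over four modules (made-up names)
toyD = {'A': {'m1', 'm2'}, 'B': {'m2', 'm3'}}
regionD = exclusive_regions(toyD, universe={'m1', 'm2', 'm3', 'm4'})
for name, modules in regionD.items():
    print(name, sorted(modules))
```

Because the regions partition the universe, their sizes sum to the total number of modules, which gives a quick consistency check on the exported sheets.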

### 3-5. Visualization: pointplot

In [None]:
#Spread statDF by dataset to flatten the multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for dataset in ['PrOmics', 'TrOmics']:
    tempDF1 = statDF.loc[dataset].drop(columns=['ModuleName'])
    tempDF1.columns = dataset+'_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
display(tempDF)

statDF_flatten = tempDF
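
As a cross-check on the flattening step, the same wide layout can be obtained with `unstack`. This is an alternative sketch under stated assumptions, not the loop used above: `longDF`, `wideDF`, and the module names are hypothetical, and the resulting column order groups by statistic rather than by dataset.

```python
import pandas as pd

# Toy long table indexed by (Dataset, ModuleID), as in statDF (made-up values)
idx = pd.MultiIndex.from_product([['PrOmics', 'TrOmics'], ['moduleA', 'moduleB']],
                                 names=['Dataset', 'ModuleID'])
longDF = pd.DataFrame({'ANOVA_AdjPval': [0.01, 0.20, 0.03, 0.50]}, index=idx)

# Move the Dataset level from the row index to the columns
wideDF = longDF.unstack('Dataset')

# Columns are now (statistic, dataset) pairs; join them as '<dataset>_<statistic>'
wideDF.columns = [dataset + '_' + stat for stat, dataset in wideDF.columns]
print(wideDF)
```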

#### 3-5-1. Modules tightened by all interventions and datasets

In [None]:
posL = ['PrOmics_Aca-vs-Ctrl', 'PrOmics_Rapa-vs-Ctrl', 'TrOmics_Aca-vs-Ctrl', 'TrOmics_Rapa-vs-Ctrl']
negL = ['']
regulation = 'Tightened'
nPlots = 10
sort_varL = ['PrOmics_ANOVA_AdjPval', 'TrOmics_ANOVA_AdjPval']

#Prepare the target module set
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives and sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#'$' is a regex anchor

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    #plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')
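
The conversion from adjusted P-value to significance label used in the loop above can be isolated as a small function. This is a minimal sketch; `signif_label` is a hypothetical name, while the thresholds and the `Decimal` rounding follow the cell above.

```python
from decimal import Decimal, ROUND_HALF_UP

def signif_label(pval):
    """Convert a P-value to an asterisk label or rounded text (hypothetical helper)."""
    if pval < 0.001:
        return '***'
    elif pval < 0.01:
        return '**'
    elif pval < 0.05:
        return '*'
    # Round half up to two decimals via Decimal, starting from the float's repr
    rounded = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(rounded)

for pval in [0.0005, 0.005, 0.03, 0.125, 0.5]:
    print(pval, '->', signif_label(pval))
```

`Decimal` with `ROUND_HALF_UP` is used here because the built-in `round` applies round-half-to-even, so a value such as 0.125 would be shown as 0.12 instead of 0.13.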

#### 3-5-2. Modules tightened by all interventions specifically in proteomics

In [None]:
posL = ['PrOmics_Aca-vs-Ctrl', 'PrOmics_Rapa-vs-Ctrl']
negL = ['TrOmics_Aca-vs-Ctrl', 'TrOmics_Rapa-vs-Ctrl']
regulation = 'Tightened'
nPlots = 10
sort_varL = ['PrOmics_ANOVA_AdjPval', 'TrOmics_ANOVA_AdjPval']

#Prepare the target module set
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives and sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex (default is literal in pandas >=2.0)

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    #plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')

#### 3-5-3. Modules tightened by all interventions specifically in transcriptomics

> Because proteins are the direct effectors of cellular functions, modules in this category are of less interest even if some are detected.  

In [None]:
posL = ['TrOmics_Aca-vs-Ctrl', 'TrOmics_Rapa-vs-Ctrl']
negL = ['PrOmics_Aca-vs-Ctrl', 'PrOmics_Rapa-vs-Ctrl']
regulation = 'Tightened'
nPlots = 10
sort_varL = ['PrOmics_ANOVA_AdjPval', 'TrOmics_ANOVA_AdjPval']

#Prepare the target module set
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)#No constraint on the other comparisons (True unless the P-value is NaN)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives and sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex (default is literal in pandas >=2.0)

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    #plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')

#### 3-5-4. Modules tightened by Aca in both datasets
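
The plotting cells label each comparison with asterisks when the adjusted P-value is below 0.05 and otherwise with the P-value rounded to two decimals. The sketch below isolates that conversion so the thresholds and the rounding can be checked on their own. The function name `pval_to_label` and the example values are illustrative assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

def pval_to_label(pval):
    """Hypothetical helper: map an adjusted P-value to a plot label."""
    if pval < 0.001:
        return '***'
    elif pval < 0.01:
        return '**'
    elif pval < 0.05:
        return '*'
    # Round half up on the decimal string, not on the binary float
    rounded = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(rounded)

# Made-up P-values covering each branch
labels = [pval_to_label(p) for p in (0.0004, 0.004, 0.04, 0.125, 0.5)]
print(labels)
```

Going through `Decimal` with `ROUND_HALF_UP` keeps trailing zeros (0.5 becomes `0.50`) and rounds 0.125 to `0.13`, whereas the built-in `round(0.125, 2)` returns 0.12.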

In [None]:
posL = ['PrOmics_Aca-vs-Ctrl', 'TrOmics_Aca-vs-Ctrl']
negL = []#No exclusion criterion
regulation = 'Tightened'
nPlots = 3
sort_varL = [col_prefix+'_AdjPval' for col_prefix in posL]

#Prepare the target module set
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)#No constraint on the other comparisons (True unless the P-value is NaN)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives and sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex (default is literal in pandas >=2.0)

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    #plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')

#### 3-5-5. Modules tightened by Aca specifically in proteomics
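
The RMS summary table in each plotting cell reports a normal-approximation 95% interval, mean ± 1.96 × SD/√n, per dataset and group. The sketch below reproduces that calculation in vectorized form on toy data; the RMS values are made up and the variable names `rms`, `summary`, and `sem` are illustrative assumptions. Note that `sns.pointplot(..., ci=95)` draws a bootstrap interval, so the plotted error bars need not match these tabulated bounds exactly.

```python
import numpy as np
import pandas as pd

# Toy long-format RMS table (made-up values)
rms = pd.DataFrame({
    'Dataset': ['Proteins'] * 6,
    'Group': ['Control'] * 3 + ['Acarbose'] * 3,
    'RMS': [0.6, 0.7, 0.8, 0.8, 0.9, 1.0],
})

# Per-group count, mean, and sample standard deviation (ddof=1)
summary = rms.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])

# Standard error of the mean, then the normal-approximation 95% bounds
sem = summary['std'] / np.sqrt(summary['count'])
summary['0.025'] = summary['mean'] - 1.96 * sem
summary['0.975'] = summary['mean'] + 1.96 * sem
print(summary)
```

Because the group mean of the sample RMS values is the module RCI, the interval describes the uncertainty of the RCI estimate within each group.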

In [None]:
posL = ['PrOmics_Aca-vs-Ctrl']
negL = ['TrOmics_Aca-vs-Ctrl']
regulation = 'Tightened'
nPlots = 3
sort_varL = [col_prefix+'_AdjPval' for col_prefix in posL]

#Prepare the target module set
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)#No constraint on the other comparisons (True unless the P-value is NaN)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives and sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex (default is literal in pandas >=2.0)

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    #plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')
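
The conversion from adjusted P-value to significance label used in the plotting loop above can be sketched as a small standalone function. This is a minimal illustration using only the standard library; `signif_label` and the example P-values are hypothetical and do not appear in the original notebook.

```python
from decimal import Decimal, ROUND_HALF_UP

def signif_label(pval, thresholds=((0.001, '***'), (0.01, '**'), (0.05, '*'))):
    """Return an asterisk label for a significant P-value, else 'P = 0.xx'."""
    for cutoff, stars in thresholds:
        if pval < cutoff:
            return stars
    # Round half up to two decimals, as in the plotting loop
    pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(pval_text)

# Hypothetical adjusted P-values
labels = [signif_label(p) for p in (0.0004, 0.003, 0.02, 0.125, 0.5)]
print(labels)
```

Going through `Decimal` with `ROUND_HALF_UP` avoids the round-half-to-even behavior of the built-in `round`, so 0.125 is shown as 0.13 rather than 0.12.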

#### 3-5-6. Modules tightened by Rapa in both datasets

In [None]:
posL = ['PrOmics_Rapa-vs-Ctrl', 'TrOmics_Rapa-vs-Ctrl']
negL = []#No comparison to exclude
regulation = 'Tightened'
nPlots = 3
sort_varL = [col_prefix+'_AdjPval' for col_prefix in posL]

#Prepare the target module set
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives and sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#'$' is a regex anchor; the default became regex=False in pandas 2.0

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    #plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')
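
The selection rule used in the cell above (significant in the requested direction for every comparison in `posL`, and either non-significant or significant in the opposite direction for every comparison in `negL`) can be sketched per module with plain dictionaries. This is a simplified illustration, not the notebook's implementation: `passes_filter`, the module IDs `M1` to `M4`, and all numbers are hypothetical, and a positive RCI difference is assumed to mean tightening, as the cell treats `diffDF`. Comparisons listed in neither `pos` nor `neg` are ignored here.

```python
def passes_filter(pvals, diffs, pos, neg, regulation='Tightened', alpha=0.05):
    """Check one module against required (pos) and excluded (neg) comparisons.

    pvals / diffs: dicts mapping comparison name -> adjusted P-value / RCI difference.
    """
    sign = 1 if regulation == 'Tightened' else -1
    for name in pos:
        # Must be significant in the requested direction
        if not (pvals[name] < alpha and sign * diffs[name] > 0):
            return False
    for name in neg:
        # Must be non-significant, or significant in the opposite direction
        not_signif = pvals[name] >= alpha
        inverse = pvals[name] < alpha and sign * diffs[name] < 0
        if not (not_signif or inverse):
            return False
    return True

P, T = 'PrOmics_Rapa-vs-Ctrl', 'TrOmics_Rapa-vs-Ctrl'
# Hypothetical modules: (adjusted P-values, RCI differences)
modules = {
    'M1': ({P: 0.01, T: 0.20}, {P: 0.15, T: 0.02}),
    'M2': ({P: 0.01, T: 0.03}, {P: 0.15, T: 0.10}),
    'M3': ({P: 0.01, T: 0.01}, {P: 0.15, T: -0.12}),
    'M4': ({P: 0.30, T: 0.01}, {P: 0.05, T: 0.10}),
}

both = [m for m, (pv, df) in modules.items() if passes_filter(pv, df, [P, T], [])]
protein_specific = [m for m, (pv, df) in modules.items() if passes_filter(pv, df, [P], [T])]
print(both, protein_specific)
```

With `neg` left empty the rule reduces to "tightened in both datasets"; moving the transcriptomics comparison into `neg` yields the proteomics-specific set.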

#### 3-5-7. Modules tightened by Rapa specifically in proteomics

In [None]:
posL = ['PrOmics_Rapa-vs-Ctrl']
negL = ['TrOmics_Rapa-vs-Ctrl']
regulation = 'Tightened'
nPlots = 3
sort_varL = [col_prefix+'_AdjPval' for col_prefix in posL]

#Prepare the target module set
tempS = pd.Series(np.repeat(True, len(pvalDF)), index=pvalDF.index)#Initialize
tempDF = pvalDF.loc[:, pvalDF.columns.str.contains('-vs-')]#Based on adjusted ANOVA P-values
for col_n in tempDF.columns.tolist():
    tempS1 = pvalDF[col_n]
    tempS2 = diffDF[col_n]
    if col_n in posL:
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if regulation=='Tightened':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif regulation=='Loosened':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), regulation.lower()+' modules with significance in', posL, 'but not in', negL)

#Select representatives and sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#'$' is a regex anchor; the default became regex=False in pandas 2.0

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    #plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')
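
The RMS summary table in the cell above reports a normal-approximation 95% interval, mean ± 1.96 × SD / √n, per dataset and group. A minimal standard-library sketch of that arithmetic follows; `normal_ci95` and the RMS values are hypothetical. The error bars drawn by `sns.pointplot` with `ci=95` are bootstrap intervals, so they may differ slightly from the values in this table.

```python
import math
import statistics

def normal_ci95(values):
    """Return (mean, lower, upper) using mean +/- 1.96 * SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), like pandas 'std'
    half_width = 1.96 * sd / math.sqrt(n)
    return mean, mean - half_width, mean + half_width

rms = [0.70, 0.80, 0.75, 0.85, 0.90]  # hypothetical sample RMS values of one group
mean, lower, upper = normal_ci95(rms)
print(round(mean, 3), round(lower, 3), round(upper, 3))
```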

#### 3-5-8. Manually selected modules

In [None]:
#Manual selection based on all the DIRAC results
##Previous list
tempL = ['GO:0006635', 'GO:0031998', 'GO:0016558', 'GO:0006625',#Common in M001
         'GO:0006637', 'GO:0006734', 'GO:0001561',#Related to fatty acid oxidation (mentioned in earlier version)
         'GO:0023035', 'GO:0039529',#Immune response (mentioned in earlier version)
         'GO:0010638', 'GO:0002181', 'GO:0045899',#Mentioned in earlier version
         'GO:0006635', 'GO:1990126', 'GO:0098761',#Common in proteomics and transcriptomics
         'GO:0061732', 'GO:0006086', 'GO:0019441',#Specific to proteomics (Aca)
         'GO:0031998', 'GO:0044794', 'GO:0006625', 'GO:0016558', 'GO:0019441', 'GO:0033572',#Specific to proteomics (Rapa)
         'GO:0006635', 'GO:0002181', 'GO:0000028', 'GO:0016558', 'GO:0006734', 'GO:0098761',#Common in M001 + M004
         'GO:0015986', 'GO:0042776', 'GO:0070934', 'GO:0006703', 'GO:0034354', 'GO:0045899']#Similar to 4EGI-1
print('Previous list:', len(tempL))
##New list (based on adjusted dataset and parametric test)
###Common among interventions in M001 proteomics (version 7-2)
tempS1 = {'GO:0017144', 'GO:0031998', 'GO:0006102', 'GO:0006749', 'GO:0006625',
          'GO:0016558', 'GO:0006635', 'GO:0034063', 'GO:0006103', 'GO:0042866',
          'GO:0002181', 'GO:0098761', 'GO:0006637', 'GO:0140374', 'GO:1902416'}
print('Common among interventions in M001 proteomics:', len(tempS1))
###Common among interventions in M001 proteomics and transcriptomics (version 3-1)
tempS2 = {'GO:0016558', 'GO:0006749', 'GO:0006635', 'GO:0098761', 'GO:0150093',
          'GO:0043691', 'GO:0006796', 'GO:0006734', 'GO:0015909'}
print('Common among interventions in M001 proteomics and transcriptomics:', len(tempS2))
###Common among interventions but specifically in M001 proteomics (version 3-1)
tempS3 = {'GO:0006625', 'GO:0031998', 'GO:0042866', 'GO:0033572', 'GO:0000028'}
print('Common among interventions but specifically in M001 proteomics:', len(tempS3))
###Similar to M004 4EGI-1 in any intervention of M001 proteomics (version 3-1)
tempS4 = {'GO:0010637', 'GO:0006102', 'GO:0006739', 'GO:0046826', 'GO:0023035'}
print('Similar to M004 4EGI-1 in any intervention of M001 proteomics:', len(tempS4))
###Dissimilar to M004 4EGI-1 in any intervention of M001 proteomics (version 3-1)
tempS5 = {'GO:0044794', 'GO:0034372', 'GO:0006544', 'GO:0008210', 'GO:0000038',
          'GO:0072378', 'GO:0070189'}
print('Dissimilar to M004 4EGI-1 in any intervention of M001 proteomics:', len(tempS5))
##Merge
tempL = [item for sublist in [tempL, tempS1, tempS2, tempS3, tempS4, tempS5] for item in sublist]

#Resolve duplicates and judge whether it was included in this analysis
tempL = list(set(tempL))
tempL.sort()
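tempL1 = [module for module in tempL if module not in moduleDF.index]
for module in tempL1:
    print(module+' was NOT included in this analysis.')
tempL = [module for module in tempL if module in moduleDF.index]#Avoid removing items while iterating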

print('Assessed modules:', len(tempL))
display(moduleDF.loc[moduleDF.index.isin(tempL)])

targetL = tempL
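
The deduplicate-then-filter step above can be sketched as a small helper. The sketch builds new lists rather than removing items from the list being iterated, because in-place removal during a loop can skip the element that follows each removed one. `split_by_membership` is a hypothetical name, and `GO:0000001` and `GO:0000002` are used here only as placeholders for IDs absent from the analysis.

```python
def split_by_membership(candidates, known_ids):
    """Deduplicate and sort candidates, then split into (kept, missing)."""
    known = set(known_ids)
    unique_sorted = sorted(set(candidates))
    kept = [c for c in unique_sorted if c in known]
    missing = [c for c in unique_sorted if c not in known]
    return kept, missing

candidates = ['GO:0006635', 'GO:0000001', 'GO:0000002', 'GO:0006635', 'GO:0016558']
known_ids = ['GO:0006635', 'GO:0016558']
kept, missing = split_by_membership(candidates, known_ids)
print(kept, missing)

# For comparison: removing while iterating leaves 'GO:0000002' behind
naive = sorted(set(candidates))
for c in naive:
    if c not in known_ids:
        naive.remove(c)
print(naive)
```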

In [None]:
tempL = targetL
topX = len(targetL)
nPlots = len(targetL)
sort_varL = ['PrOmics_ANOVA_AdjPval', 'TrOmics_ANOVA_AdjPval']

#Sort the target modules
tempDF = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=sort_varL, ascending=True)
#topX = np.min([30, len(tempL)])
print('Top', topX, 'modules (sort by', sort_varL, '):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:np.min([nPlots, len(tempL)])]

#Prepare RMS DF for plot
tempDF1 = rmsDF_kk.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars='ModuleID')
tempDF = sampleDF.reset_index()[['SampleID', 'Dataset', 'Intervention']]
tempDF1 = pd.merge(tempDF1, tempDF, on='SampleID', how='left')

#Prepare label and color
tempD0 = {'Ctrl':'Control', 'Aca':'Acarbose', 'Rapa':'Rapamycin',
          'PrOmics':'Proteins', 'TrOmics':'Transcripts'}
tempDF1['Dataset'] = tempDF1['Dataset'].map(tempD0)
tempDF1['Group'] = tempDF1['Intervention'].map(tempD0)
tempD1 = {'Control':'tab:blue', 'Acarbose':'tab:red', 'Rapamycin':'tab:purple'}
tempD2 = {'Proteins':plt.get_cmap('tab20')(13), 'Transcripts':plt.get_cmap('tab20')(19)}

#Prepare P-value DF for plot
tempDF2 = statDF_flatten.loc[:, statDF_flatten.columns.str.contains('-vs-.*_AdjPval$')]
tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#'$' is a regex anchor; the default became regex=False in pandas 2.0

#Visualize each module
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by', sort_varL, '):')
    module = plotL[rank_i]
    #Check module summary
    tempDF = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF)
    
    #Select RMS
    tempDF3 = tempDF1.loc[tempDF1['ModuleID']==module]
    
    #Check RMS summary
    tempDF = tempDF3.groupby(['Dataset', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF.index.tolist():
        count, mean, std = tempDF.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF['0.025'] = tempL1
    tempDF['0.975'] = tempL2
    ##Multiindex sort
    tempDF = tempDF.reset_index()
    tempDF['Dataset'] = pd.Categorical(tempDF['Dataset'], categories=list(tempD2.keys()))
    tempDF['Group'] = pd.Categorical(tempDF['Group'], categories=list(tempD1.keys()))
    tempDF = tempDF.sort_values(by=['Dataset', 'Group']).set_index(['Dataset', 'Group'])
    display(tempDF)
    
    #Prepare significance labels
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF = tempS.index.to_series().str.split(pat='_', expand=True)
    tempDF = tempDF.rename(columns={0:'Dataset', 1:'Comparison'})
    tempS1 = tempDF['Dataset']
    tempDF = tempDF['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF = tempDF.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF = pd.merge(tempS1, tempDF, left_index=True, right_index=True, how='left')
    tempDF4 = pd.merge(tempDF, tempS, left_index=True, right_index=True, how='left')
    tempDF4['Dataset'] = tempDF4['Dataset'].map(tempD0)
    tempDF4['Contrast'] = tempDF4['Contrast'].map(tempD0)
    tempDF4['Baseline'] = tempDF4['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF4)):
        pval = tempDF4['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF4['SignifLabel'] = tempL
    display(tempDF4)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.275
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(4.75, 5), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[2, 2]})
    for ax_i, ax in enumerate(axes.flat):
        dataset = list(tempD2.keys())[ax_i]
        tempDF = tempDF3.loc[tempDF3['Dataset']==dataset]
        sns.pointplot(data=tempDF, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS\n(Mean = Module RCI)')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF = tempDF4.loc[tempDF4['Dataset']==dataset]
        for row_i in range(len(tempDF)):
            #Baseline
            group_0 = tempDF['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Reference point for the label (midpoint between the two compared groups)
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title(dataset, {'fontsize':'small'})
        xoff = 0.025
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.1,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[dataset],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
        if ax_i==1:
            at = AnchoredText('Consensus: own group', loc='center',
                              bbox_to_anchor=(1.15, 0.505), bbox_transform=ax.transAxes,#Minor manual adjustment
                              frameon=False, prop={'size':'small', 'rotation':90})
            ax.add_artist(at)
            rect = plt.Rectangle((1+xoff, yoff), 0.225, 1-2*yoff,#Manual adjustment
                                 transform=ax.transAxes, facecolor=plt.get_cmap('tab20')(15),
                                 clip_on=False, linewidth=0, zorder=0.5)
            ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 40))#Wrap manually because wrap=True in suptitle below had no effect
    fig.suptitle(title, size='small',
                 verticalalignment='bottom', horizontalalignment='center', wrap=True, y=0.93)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '230224_LC-M001-PrOmics-vs-TrOmics-DIRAC-ver3_GOBP_'
    fileName = 'RCI-pointplot-'+module.replace('GO:', 'GO')+'.pdf'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04, transparent=True)
    plt.show()
    print('')
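
The p-value-to-label step in the cell above can also be written as a small standalone function, which makes the thresholds easier to check in isolation. This is a minimal sketch: `pval_to_label` is a hypothetical helper name that is not defined in this notebook, and the example p-values are made up for illustration. The thresholds and the half-up rounding follow the loop above.

```python
from decimal import Decimal, ROUND_HALF_UP

def pval_to_label(pval):
    """Hypothetical helper: map an adjusted p-value to a plot label.

    Uses the same thresholds as the loop above (0.001, 0.01, 0.05);
    non-significant values are shown rounded half-up to two decimals.
    """
    if pval < 0.001:
        return '***'
    if pval < 0.01:
        return '**'
    if pval < 0.05:
        return '*'
    pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(pval_text)

# Made-up p-values, one per branch plus a rounding edge case (0.125 -> 0.13)
labels = [pval_to_label(p) for p in (0.0004, 0.004, 0.04, 0.125, 0.5)]
print(labels)
```

Assuming `tempDF4` is built as above, the row-wise loop could then be replaced with `tempDF4['SignifLabel'] = tempDF4['AdjPval'].map(pval_to_label)`.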

## 5. Rank matching score under a fixed consensus: inter-group module comparison

> The comparison of RMSs under a fixed rank consensus is skipped because pattern similarity across datasets is outside the scope of the current analysis.  
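
For reference, the sketch below illustrates what a rank matching score (RMS) under a fixed consensus means, assuming the standard DIRAC definitions (Eddy, J.A. et al. 2010): the rank consensus of a group is the majority pairwise ordering of analytes within a module, and a sample's RMS is the fraction of its pairwise orderings that match that consensus. The function names and the toy matrices are hypothetical and are not taken from this notebook or from the DIRAC pipeline used upstream. Ties at exactly 50% are treated here as "not less than", which is an assumption.

```python
from itertools import combinations
import numpy as np

def pairwise_order(values):
    """For each analyte pair (i, j) with i < j, flag whether values[i] < values[j]."""
    return np.array([values[i] < values[j]
                     for i, j in combinations(range(len(values)), 2)])

def rank_consensus(group_matrix):
    """Majority-vote pairwise ordering across samples (rows: samples, columns: analytes)."""
    orders = np.array([pairwise_order(row) for row in group_matrix])
    return orders.mean(axis=0) > 0.5  # assumption: exact ties count as False

def rank_matching_score(sample, consensus):
    """Fraction of a sample's pairwise orderings that agree with the consensus."""
    return float(np.mean(pairwise_order(sample) == consensus))

# Toy module with three analytes (made-up values)
control = np.array([[1.0, 2.0, 3.0],
                    [1.5, 2.5, 3.5],
                    [1.0, 3.0, 2.0]])
treated = np.array([[3.0, 2.0, 1.0],
                    [1.0, 2.0, 3.0]])

# Fix the consensus to the control group, then score every sample against it
consensus = rank_consensus(control)
rms_control = [rank_matching_score(s, consensus) for s in control]
rms_treated = [rank_matching_score(s, consensus) for s in treated]

# The mean RMS of a group under its own consensus is the module RCI
rci_control = float(np.mean(rms_control))
print(rms_treated, rci_control)
```

In this toy case the consensus is fixed to the control group, so `rms_treated` measures how closely each treated sample follows the control ordering, whereas the point plots in the previous section use each group's own consensus.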

# — End of notebook —