# DIRAC Analysis of LC M004 Proteomics with LC M001 Proteomics — DIRAC with GOBP Modules

***by Kengo Watanabe***  

In this notebook, the differential rank conservation (DIRAC; Eddy, J.A. et al. PLoS Comput. Biol. 2010) analysis is performed on the preprocessed Longevity Consortium (LC) M001 and M004 proteomics datasets (analytes detected in all samples; sample-based robust Z-score followed by analyte-based robust Z-score) using the retrieved a priori module set (Gene Ontology (Biological Process) derived by EMBL-EBI QuickGO API; ≥4 analytes and ≥50% coverage). To directly compare the DIRAC results between M001 and M004, each rank consensus is utilized across the datasets.  
> To stay consistent with the other DIRAC analyses, the statistical tests are performed in a separate notebook with an R kernel.  

Input:  
* Preprocessed analyte data (M001): 210126_LCprotomics-M001-DIRAC-ver6_preprocessing_cleaned-robustZscored-data.tsv  
* Preprocessed analyte data (M004): 210125_LCprotomics-M004-DIRAC_preprocessing_cleaned-robustZscored-data.tsv  
* Module–analyte metadata: 220525_LCproteomics-M004-DIRAC_Preprocessing-with-M001_ver2-2_module-metadata_QuickGO-GOBP-min-n4-cov50.tsv  
* Analyte metadata: 220525_LCproteomics-M004-DIRAC_Preprocessing-with-M001_ver2-2_analyte-metadata_UniProt.tsv  
* Sample–mouse metadata (M001): 210126_LCprotomics-M001-DIRAC-ver6_preprocessing_metadata-sample.tsv  
* Sample–mouse metadata (M004): 210125_LCprotomics-M004-DIRAC_preprocessing_metadata-sample.tsv  
* Statistical test summary: 220529_LCproteomics-M004-DIRAC_StatisticalTest-GOBP-with-M001_ver2-3_inter-group-comparison.xlsx (Supplementary Data 7)  

Output:  
* Cleaned module metadata, which is incorporated into Supplementary Data 7 in R sub-notebook  
* Combined sample–mouse metadata, which is used in R sub-notebook  
* DIRAC measures, which are used in statistical analysis (R sub-notebook)  
* Figure 5e–g  
* Supplementary Figure 3e, f, h, i  

Original notebook (memo for my future tracing):  
* dalek:[JupyterLab HOME]/210125_LCproteomics-M004-DIRAC/220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3.ipynb  
* dalek:[JupyterLab HOME]/210125_LCproteomics-M004-DIRAC/220606_LCproteomics-M004-DIRAC-ver2-3_Supplement.ipynb  

In [None]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
#For Arial font
#!conda install -c conda-forge -y mscorefonts
##-> The below was also needed in matplotlib 3.4.2
#import shutil
#import matplotlib
#shutil.rmtree(matplotlib.get_cachedir())
import warnings
warnings.filterwarnings('ignore')
from IPython.display import display
import time

from itertools import combinations
import math
from multiprocessing import Pool
from decimal import Decimal, ROUND_HALF_UP
import re
import matplotlib.patches as mpatches
#!pip install venn
from venn import venn
#!conda install -c conda-forge -y matplotlib-venn
from matplotlib_venn import venn3, venn3_circles, venn2, venn2_circles
from textwrap import wrap

!conda list

## 0. DIRAC code

> The original code for DIRAC was written in MATLAB. Hence, it is re-written in Python 3 here.  
> <– Computational cost is not a priority here; rather, the code follows the procedure described in the original paper step by step.  

In [None]:
def network_ranking(DF, networkS):
    # This function calculates the pairwise ordering of network genes (i.e., network ranking).
    ## Ref. Eddy, J. A. et al. PLoS Comput. Biol. 2010 (Figure 1 at a glance)
    # Requirements:
    ## import numpy as np: confirmed with versions 1.17.5 and 1.21.1
    ## import pandas as pd: confirmed with versions 0.25.3 and 1.3.1
    ## from itertools import combinations: confirmed with Python 3.7.6 and 3.9.6
    # Input:
    ## DF: pd.DataFrame containing expression values (X_gn) with gene (g; 1-G) indices and sample (n; 1-N) columns
    ## networkS: pd.Series containing genes (g; 1-G) with network (m; 1-M) indices (i.e., 1-on-1 long-format)
    # Output:
    ## pd.DataFrame containing binary values of the network ranking comparison for each sample
    ## -> row: comparison_id (indicated by network (m) and ordering (g_i < g_j) columns)
    ## -> column: sample (n) (with exception of the NetworkID and Ordering columns)
    # Note:
    ## If items in network m or gene g contain ' : ' or ' < ', this code would produce an error or unexpected output.
    ## If an item in sample n is 'NetworkID' or 'Ordering', this code would produce an error or unexpected output.
    
    #Calculate binary values of the network ranking comparison for each sample
    rankDF = pd.DataFrame()
    sampleL = DF.columns.tolist()
    networkL = networkS.index.unique()
    for n in sampleL:
        rankDF_n = pd.DataFrame()
        for m in networkL:
            #Pairs of genes (g_i, g_j) in the network m
            networkS_m = networkS.loc[m]
            pairL_m = list(combinations(range(0, len(networkS_m)), 2))
            pairL_m_i, pairL_m_j = [[pair[x] for pair in pairL_m] for x in (0,1)]
            rankDF_m = pd.DataFrame({'g_i':networkS_m.iloc[pairL_m_i],
                                     'g_j':networkS_m.iloc[pairL_m_j]})#Hold network (m; 1-M) index
            #Compare the expression values (X_gn) between pairwise genes (g_i vs. g_j)
            tempL = []
            for pair_i in range(0, len(rankDF_m)):
                g_i = rankDF_m.iloc[pair_i, 0]
                g_j = rankDF_m.iloc[pair_i, 1]
                X_i = DF.loc[g_i, n]
                X_j = DF.loc[g_j, n]
                #If X_i < X_j is true, add 1; otherwise (X_i >= X_j), add 0
                if X_i < X_j:
                    tempL.append(1)
                else:
                    tempL.append(0)
            rankDF_m['X_i<X_j'] = tempL
            #Update the network ranking dataframe of sample n
            rankDF_m.index.set_names('NetworkID', inplace=True)#Set/reset index name
            rankDF_m = rankDF_m.reset_index()
            rankDF_n = pd.concat([rankDF_n, rankDF_m], axis=0)
        #Update the network ranking dataframe of all samples
        rankDF_n['Sample'] = n
        rankDF = pd.concat([rankDF, rankDF_n], axis=0)
    ##Prepare dummy index and clean dataframe
    rankDF['ComparisonID'] = rankDF['NetworkID'] + ' : ' + rankDF['g_i'] + ' < ' + rankDF['g_j']
    rankDF = rankDF.pivot(index='ComparisonID', columns='Sample', values='X_i<X_j')#Sorted by index during this
    rankDF = rankDF.reset_index()#Index becomes row number here
    tempDF = rankDF['ComparisonID'].str.split(pat=' : ', expand=True)
    tempDF = tempDF.rename(columns={0:'NetworkID', 1:'Ordering'})
    rankDF = pd.concat([tempDF, rankDF], axis=1)#Dropping columns name 'Sample' during this
    rankDF = rankDF.drop(columns='ComparisonID')
    return rankDF

def rank_template(rankDF, phenotypeS):
    # This function generates the rank template (T) presenting the expected network ranking in a phenotype.
    ## Ref. Eddy, J. A. et al. PLoS Comput. Biol. 2010 (Figure 1 at a glance)
    # Requirements:
    ## import numpy as np: confirmed with versions 1.17.5 and 1.21.1
    ## import pandas as pd: confirmed with versions 0.25.3 and 1.3.1
    # Input:
    ## rankDF: pd.DataFrame obtained from the above network_ranking() function
    ## phenotypeS: pd.Series containing phenotypes (k; 1-K) with sample (n; 1-N) indices (i.e., 1-on-1 long-format)
    # Output:
    ## pd.DataFrame containing the expected binary values of network ranking comparison for each phenotype (T_mk)
    ## -> row: comparison_id (indicated by network (m) and ordering (g_i < g_j) columns)
    ## -> column: phenotype (k) (with exception of the NetworkID and Ordering columns)
    # Note:
    ## True rate = 0.5 is assigned to 0 in this code.
    ## If an item in phenotype k is 'NetworkID' or 'Ordering', this code would produce an error or unexpected output.
    
    #Calculate the expected binary values of network ranking comparison for each phenotype (T_mk)
    templateDF = rankDF[['NetworkID', 'Ordering']].copy()#Copy to avoid assigning columns to a slice of rankDF
    phenotypeL = phenotypeS.unique().tolist()
    for k in phenotypeL:
        sampleL_k = phenotypeS.loc[phenotypeS==k].index.tolist()
        tempDF = rankDF[sampleL_k]
        tempS = tempDF.mean(axis=1)#True (=1) rate
        tempS = (tempS>0.5).astype('int64')#If true rate > 0.5, 1; otherwise (<= 0.5), 0
        templateDF[k] = tempS
    return templateDF

def rank_matching_score(rankDF, templateDF):
    # This function calculates the rank matching score (R) of a sample.
    ## Ref. Eddy, J. A. et al. PLoS Comput. Biol. 2010 (Figure 1 at a glance)
    # Requirements:
    ## import numpy as np: confirmed with versions 1.17.5 and 1.21.1
    ## import pandas as pd: confirmed with versions 0.25.3 and 1.3.1
    # Input:
    ## rankDF: pd.DataFrame obtained from the above network_ranking() function
    ## templateDF: pd.DataFrame obtained from the above rank_template() function
    # Output:
    ## pd.DataFrame containing the rates of gene pairs matching to a rank template in a sample (R_mkn)
    ## -> row: rank template (T_mk) (indicated by network (m) and template phenotype (k) columns)
    ## -> column: sample (n) (with exception of the NetworkID and Template columns)
    # Note:
    ## True rate = 0.5 was assigned to 0 in the above rank_template() function.
    ## -> 'Match (1)' and 'Mismatch (0)' are evenly assigned to samples in the phenotype.
    ##    (i.e., 'Match' for (X_i < X_j) = 0 and 'Mismatch' for (X_i < X_j) = 1 in the tie case)
    ## If items in network m or phenotype k contain ' : ', this code would produce an error or unexpected output.
    
    #Calculate the rates of gene pairs matching to a rank template in a sample (R_mkn)
    scoreDF = pd.DataFrame()
    sampleL = rankDF.drop(columns=['NetworkID', 'Ordering']).columns.tolist()
    phenotypeL = templateDF.drop(columns=['NetworkID', 'Ordering']).columns.tolist()
    for n in sampleL:
        scoreDF_n = pd.DataFrame()
        for k_template in phenotypeL:
            tempDF = rankDF[['NetworkID', 'Ordering']].copy()#Copy to avoid assigning columns to a slice of rankDF
            tempS = (rankDF[n]==templateDF[k_template]).astype('int64')#If matching, 1; otherwise, 0
            tempDF['Match'] = tempS
            #Calculate the rank matching score
            scoreDF_k = tempDF.groupby(by='NetworkID', as_index=False, sort=False).mean()
            #Update the rank matching score dataframe of sample n
            scoreDF_k['Template'] = k_template
            scoreDF_k = scoreDF_k[['NetworkID', 'Template', 'Match']]
            scoreDF_n = pd.concat([scoreDF_n, scoreDF_k], axis=0)
        #Update the rank matching score dataframe of all samples
        scoreDF_n['Sample'] = n
        scoreDF = pd.concat([scoreDF, scoreDF_n], axis=0)
    ##Prepare dummy index and clean dataframe
    scoreDF['RMSmkID'] = scoreDF['NetworkID'].str.cat(scoreDF['Template'], sep=' : ')
    scoreDF = scoreDF.pivot(index='RMSmkID', columns='Sample', values='Match')#Sorted by index during this
    scoreDF = scoreDF.reset_index()#Index becomes row number here
    tempDF = scoreDF['RMSmkID'].str.split(pat=' : ', expand=True)
    tempDF = tempDF.rename(columns={0:'NetworkID', 1:'Template'})
    scoreDF = pd.concat([tempDF, scoreDF], axis=1)#Dropping columns name 'Sample' during this
    scoreDF = scoreDF.drop(columns='RMSmkID')
    return scoreDF

def rank_conservation_index(scoreDF, phenotypeS):
    # This function calculates the rank conservation index (muR) of a phenotype.
    ## Ref. Eddy, J. A. et al. PLoS Comput. Biol. 2010 (Figure 1 at a glance)
    # Requirements:
    ## import numpy as np: confirmed with versions 1.17.5 and 1.21.1
    ## import pandas as pd: confirmed with versions 0.25.3 and 1.3.1
    # Input:
    ## scoreDF: pd.DataFrame obtained from the above rank_matching_score() function
    ## phenotypeS: pd.Series containing phenotypes (k; 1-K) with sample (n; 1-N) indices (i.e., 1-on-1 long-format)
    # Output:
    ## pd.DataFrame containing the mean values of rank matching scores (RMSs) in a phenotype (muR_mkk)
    ## -> row: rank template (T_mk) (indicated by network (m) and template phenotype (k) columns)
    ## -> column: phenotype (k) (with exception of the NetworkID and Template columns)
    # Note:
    ## Rank conservation index (RCI) is used in a broad sense here; strictly speaking,
    ## the term RCI applies to the mean of RMSs only when the template phenotype is the same as the sample phenotype.
    ## If an item in phenotype k is 'NetworkID' or 'Template', this code would produce an error or unexpected output.
    
    #Calculate the mean values of RMSs in a phenotype (muR_mkk)
    conservationDF = scoreDF[['NetworkID', 'Template']].copy()#Copy to avoid assigning columns to a slice of scoreDF
    phenotypeL = phenotypeS.unique().tolist()
    for k in phenotypeL:
        sampleL_k = phenotypeS.loc[phenotypeS==k].index.tolist()
        tempS = scoreDF[sampleL_k].mean(axis=1)
        conservationDF[k] = tempS
    return conservationDF
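
> As a quick illustration of the four steps above (network ranking, rank template, rank matching score, and rank conservation index), the following is a minimal pure-Python sketch for a single module. The gene names, sample names, phenotype labels, and expression values are hypothetical toy data, and the helper names (`template_for`, `rms`, `rci`) are not part of the functions above; it only mirrors their logic (including the rule that a true rate of exactly 0.5 is assigned to 0).  

```python
from itertools import combinations

# Hypothetical toy data: expression of three genes in four samples
expression = {
    's1': {'A': 1.0, 'B': 2.0, 'C': 3.0},
    's2': {'A': 1.5, 'B': 2.5, 'C': 2.0},
    's3': {'A': 3.0, 'B': 2.0, 'C': 1.0},
    's4': {'A': 2.5, 'B': 3.0, 'C': 1.5},
}
phenotype = {'s1': 'ctrl', 's2': 'ctrl', 's3': 'treated', 's4': 'treated'}
module = ['A', 'B', 'C']
pairs = list(combinations(module, 2))  # (A, B), (A, C), (B, C)

# Network ranking: 1 if X_i < X_j, otherwise (X_i >= X_j) 0
ranking = {n: [int(x[i] < x[j]) for i, j in pairs] for n, x in expression.items()}

def template_for(k):
    """Rank template of phenotype k: majority vote per gene pair (rate <= 0.5 -> 0)."""
    members = [n for n in ranking if phenotype[n] == k]
    rates = [sum(ranking[n][p] for n in members) / len(members)
             for p in range(len(pairs))]
    return [int(rate > 0.5) for rate in rates]

templates = {k: template_for(k) for k in sorted(set(phenotype.values()))}

def rms(n, k):
    """Rank matching score: fraction of gene pairs in sample n matching template k."""
    matches = [int(a == b) for a, b in zip(ranking[n], templates[k])]
    return sum(matches) / len(matches)

def rci(k):
    """Rank conservation index: mean RMS of phenotype k samples under their own template."""
    members = [n for n in ranking if phenotype[n] == k]
    return sum(rms(n, k) for n in members) / len(members)

for k in templates:
    print(k, templates[k], round(rci(k), 3))
```

> In this toy case, the tie in the (B, C) pair of the control group is resolved to 0 in the template, so one of the two control samples necessarily mismatches at that pair.  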

## 1. Prepare dataset and metadata

### 1-1. Analyte data

In [None]:
#Import M001 analyte data
fileDir = '../210126_LCproteomics-M001-DIRAC-ver6/ExportData/'
ipynbName = '210126_LCprotomics-M001-DIRAC-ver6_preprocessing_'
fileName = 'cleaned-robustZscored-data.tsv'
tempDF1 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('UniProtID')
print('Original shape of M001 DF:', tempDF1.shape)

#Import M004 analyte data
fileDir = './ExportData/'
ipynbName = '210125_LCprotomics-M004-DIRAC_preprocessing_'
fileName = 'cleaned-robustZscored-data.tsv'
tempDF2 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('UniProtID')
print('Original shape of M004 DF:', tempDF2.shape)

#Merge and extract common analytes
tempDF = pd.merge(tempDF1, tempDF2, left_index=True, right_index=True, how='inner')
display(tempDF)

analyteDF = tempDF

### 1-2. Module–analyte metadata

In [None]:
#Import module-analyte metadata
fileDir = './ExportData/'
ipynbName = '220525_LCproteomics-M004-DIRAC_Preprocessing-with-M001_ver2-2_'
fileName = 'module-metadata_QuickGO-GOBP-min-n4-cov50.tsv'
tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t').set_index('ModuleID')
print(' - Unique analytes with module:', len(tempDF['UniProtID'].unique()))
print(' - Unique modules with analytes:', len(tempDF.index.unique()))

#Prepare moduleS
moduleS = tempDF['UniProtID']
display(moduleS)

#Retrieve module metadata
tempDF = tempDF[['ModuleName', 'nAnalytes', 'nBackgrounds', 'Coverage']]
moduleDF = tempDF.reset_index().drop_duplicates(keep='first').set_index('ModuleID')
display(moduleDF)
display(moduleDF.describe(include='all'))

> –> Add the mapped analytes to module metadata.  

In [None]:
#Import analyte metadata
fileDir = './ExportData/'
ipynbName = '220525_LCproteomics-M004-DIRAC_Preprocessing-with-M001_ver2-2_'
fileName = 'analyte-metadata_UniProt.tsv'
tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
tempDF = pd.merge(moduleS.reset_index(), tempDF, on='UniProtID', how='left').set_index('ModuleID')
display(tempDF)
print(' - Unique analytes:', len(tempDF['UniProtID'].unique()))
print(' - Unique labels:', len(tempDF['GeneLabel'].unique()))

#Concatenate labels
t_start = time.time()
moduleDF['MappedAnalyteIDs'] = ''#Initialize
moduleDF['MappedAnalyteGeneLabels'] = ''#Initialize
for module in moduleDF.index.tolist():
    tempS = tempDF['UniProtID'].loc[tempDF.index.isin([module])]#Retrieve as pd.Series
    label = tempS.str.cat(sep=';')
    moduleDF.loc[module, 'MappedAnalyteIDs'] = label
    tempS = tempDF['GeneLabel'].loc[tempDF.index.isin([module])]#Retrieve as pd.Series
    label = tempS.str.cat(sep=';')
    moduleDF.loc[module, 'MappedAnalyteGeneLabels'] = label
t_elapsed = time.time() - t_start
print('Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')

#Clean
moduleDF['ModuleType'] = 'Gene Ontology (Biological Process)'
moduleDF['Source'] = 'EMBL-EBI QuickGO API'
moduleDF = moduleDF[['ModuleName', 'ModuleType', 'MappedAnalyteIDs', 'MappedAnalyteGeneLabels',
                     'nAnalytes', 'nBackgrounds', 'Coverage', 'Source']]
moduleDF = moduleDF.sort_index(ascending=True)

#Save for using in the sub-notebook
fileDir = './ExportData/'
ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
fileName = 'module-metadata.tsv'
moduleDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

display(moduleDF)

### 1-3. Sample–mouse metadata

In [None]:
#Import M001 sample-mouse metadata
fileDir = '../210126_LCproteomics-M001-DIRAC-ver6/ExportData/'
ipynbName = '210126_LCprotomics-M001-DIRAC-ver6_preprocessing_'
fileName = 'metadata-sample.tsv'
tempDF1 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
tempDF1 = tempDF1.rename(columns={'MouseID':'SampleID', 'Treatment':'Intervention'})
tempDF1 = tempDF1.loc[tempDF1['SampleID'].isin(analyteDF.columns)]
tempDF1 = tempDF1.set_index('SampleID')
tempDF1['Dataset'] = 'M001'

#Import M004 sample-mouse metadata
fileDir = './ExportData/'
ipynbName = '210125_LCprotomics-M004-DIRAC_preprocessing_'
fileName = 'metadata-sample.tsv'
tempDF2 = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
tempDF2 = tempDF2.rename(columns={'MouseID':'SampleID', 'Treatment':'Intervention'})
tempDF2 = tempDF2.loc[tempDF2['SampleID'].isin(analyteDF.columns)]
tempDF2 = tempDF2.set_index('SampleID')
tempDF2['Dataset'] = 'M004'

#Merge
tempDF = pd.concat([tempDF1, tempDF2], axis=0)
tempDF = tempDF[['Dataset', 'Intervention', 'Sex']]

#Prepare phenotypeS and re-label control group
tempDF['Phenotype'] = tempDF['Intervention']
for dataset in ['M001', 'M004']:
    #Use a single .loc assignment (chained .iloc assignment may not write back to tempDF)
    tempB = (tempDF['Phenotype']=='Control') & (tempDF['Dataset']==dataset)
    tempDF.loc[tempB, 'Phenotype'] = 'Control ('+dataset+')'

display(tempDF)
display(tempDF['Phenotype'].value_counts())

#Save for using in the sub-notebook
fileDir = './ExportData/'
ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
fileName = 'sample-metadata.tsv'
tempDF.to_csv(fileDir+ipynbName+fileName, index=True, sep='\t')

sampleDF = tempDF

## 2. Perform DIRAC with sex-pooled rank consensus

> When applied to the overall dataset, the DIRAC code often stops running because of memory limits (e.g., transcriptomics).  
> –> Divide moduleS into subsets while considering the number of comparisons, and compute DIRAC in parallel.  
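
> The subsetting idea can be sketched in isolation as follows. This is a minimal sketch under assumptions: the function name `split_modules`, the module IDs, and the module sizes are hypothetical, and the number of pairwise comparisons per module is taken as n(n-1)/2 for a module with n analytes, which is why large modules dominate the memory use.  

```python
def split_modules(module_sizes, cutoff):
    """Greedily group module IDs (sorted by size) until the summed size exceeds `cutoff`.

    A module that is larger than `cutoff` by itself forms its own group.
    """
    groups, current, total = [], [], 0
    for module, n in sorted(module_sizes.items(), key=lambda kv: kv[1]):
        if n > cutoff:
            groups.append([module])
            continue
        current.append(module)
        total += n
        if total > cutoff:
            groups.append(current)
            current, total = [], 0
    if current:  # Leftover modules whose summed size is still <= cutoff
        groups.append(current)
    return groups

# Hypothetical module sizes (number of analytes per module)
sizes = {'GO:a': 4, 'GO:b': 6, 'GO:c': 5, 'GO:d': 12}
groups = split_modules(sizes, cutoff=10)

# Number of pairwise comparisons per group: sum of n*(n-1)/2 over its modules
pairs_per_group = [sum(sizes[m] * (sizes[m] - 1) // 2 for m in g) for g in groups]
print(groups)
print(pairs_per_group)
```

> Because the comparison count grows quadratically with module size, a cutoff on the summed number of analytes is only a rough proxy for the memory needed per subset.  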

In [None]:
#Divide moduleS
cutoff = 250#The maximum number of analytes per module
tempDF = moduleDF.sort_values(by='nAnalytes', ascending=True)
nwsubL = []
tempL = []#Initialize
count = 0#Initialize
for module in tempDF.index.tolist():
    nanalytes = tempDF.loc[module, 'nAnalytes']
    if nanalytes>cutoff:
        nwsubL.append([module])
    else:
        tempL.append(module)
        count += nanalytes
        if count>cutoff:
            nwsubL.append(tempL)
            tempL = []#Initialize
            count = 0#Initialize
if len(tempL)>0:#The last one but still count<=cutoff
    nwsubL.append(tempL)
nSub = len(nwsubL)
print('nSublists: ', nSub)
print('nModules per sublist:', [len(sublist) for sublist in nwsubL])

#Check nAnalytes distribution
tempS = pd.Series(name='Subset', dtype='object')#Explicit dtype for the empty Series
for list_i in range(nSub):
    tempL = moduleS.loc[nwsubL[list_i]].index.unique().tolist()
    tempS1 = pd.Series(np.repeat('Subset '+str(list_i+1).zfill(3), len(tempL)), index=tempL, name='Subset')
    tempS = pd.concat([tempS, tempS1], axis=0)
tempS.index.set_names('ModuleID', inplace=True)
tempDF = pd.merge(moduleDF, tempS, left_index=True, right_index=True, how='left')
tempDF = tempDF.sort_values(by='nAnalytes', ascending=False)
display(tempDF)
tempDF = tempDF.groupby('Subset').agg({'nAnalytes':[len, sum, max, np.median]})
tempDF = tempDF.sort_values(by=('nAnalytes', 'sum'), ascending=False)
print('Subset summary:')
display(tempDF.describe())
print(' -> Check subset which will need high computational cost:')
display(tempDF.loc[tempDF[('nAnalytes', 'max')]>cutoff])

In [None]:
nprocessors = 4
fileDir = './ExportData/'
ipynbName = '210128_LCproteomics-M004-DIRAC_M001-M004-common-DIRAC-GOBP_'

#Wrap as a single function
def parallel_dirac(list_i):
    tempS = moduleS.loc[nwsubL[list_i]]
    
    #Calculate the pairwise ordering of network genes (i.e., network ranking)
    t_start1 = time.time()
    rankingDF = network_ranking(analyteDF, tempS)
    t_elapsed1 = time.time() - t_start1
    
    #Generate the rank template (T) presenting the expected network ranking in a phenotype
    t_start2 = time.time()
    templateDF = rank_template(rankingDF, sampleDF['Phenotype'])
    t_elapsed2 = time.time() - t_start2
    
    #Calculate the rank matching score (R) of a sample
    t_start3 = time.time()
    rmsDF = rank_matching_score(rankingDF, templateDF)
    t_elapsed3 = time.time() - t_start3
    
    #Calculate the rank conservation index (muR) of a phenotype
    t_start4 = time.time()
    rciDF = rank_conservation_index(rmsDF, sampleDF['Phenotype'])
    t_elapsed4 = time.time() - t_start4
    
    #Save
    fileName = 'NetworkRanking-'+str(list_i+1).zfill(3)+'.tsv'
    rankingDF.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')
    fileName = 'RankTemplate-BS-'+str(list_i+1).zfill(3)+'.tsv'
    templateDF.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')
    fileName = 'RankMatchingScore-BS-'+str(list_i+1).zfill(3)+'.tsv'
    rmsDF.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')
    fileName = 'RankConservationIndex-BS-'+str(list_i+1).zfill(3)+'.tsv'
    rciDF.to_csv(fileDir+ipynbName+fileName, index=False, sep='\t')
    
    #Check results
    print('Subset '+str(list_i+1).zfill(3)+':',
          len(tempS.index.unique()), 'modules,', len(tempS.unique()), 'analytes')
    print('• Network ranking dataframe:')
    print('  - DF shape:', rankingDF.shape)
    print('  - Elapsed time:', round(t_elapsed1//60), 'min', round(t_elapsed1%60, 1), 'sec')
    print('• Rank template dataframe:')
    print('  - DF shape:', templateDF.shape)
    print('  - Elapsed time:', round(t_elapsed2//60), 'min', round(t_elapsed2%60, 1), 'sec')
    print('• Rank matching score dataframe:')
    print('  - DF shape:', rmsDF.shape)
    print('  - Elapsed time:', round(t_elapsed3//60), 'min', round(t_elapsed3%60, 1), 'sec')
    print('• Rank conservation index dataframe:')
    print('  - DF shape:', rciDF.shape)
    print('  - Elapsed time:', round(t_elapsed4//60), 'min', round(t_elapsed4%60, 1), 'sec')
    print('')

#Parallel computing
if __name__=='__main__':
    t_start = time.time()
    with Pool(nprocessors) as p:#Release the worker processes when finished
        p.map(parallel_dirac, range(nSub))
    t_finish = time.time()

In [None]:
#Record as reference
print(nSub, 'sublists with', nprocessors, 'processors')
print(' - Start:', time.ctime(t_start))
print(' - Finish:', time.ctime(t_finish))
t_elapsed = (t_finish - t_start)
print(' - Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')
t_elapsed = (t_finish - t_start) * nprocessors
print(' - Total (approximate) elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')

In [None]:
#Combine each result
rmsDF = pd.DataFrame()
rciDF = pd.DataFrame()
for list_i in range(nSub):
    fileName = 'RankMatchingScore-BS-'+str(list_i+1).zfill(3)+'.tsv'
    tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
    tempDF = tempDF.rename(columns={'NetworkID':'ModuleID'})
    rmsDF = pd.concat([rmsDF, tempDF], axis=0, ignore_index=True)
    
    fileName = 'RankConservationIndex-BS-'+str(list_i+1).zfill(3)+'.tsv'
    tempDF = pd.read_csv(fileDir+ipynbName+fileName, sep='\t')
    tempDF = tempDF.rename(columns={'NetworkID':'ModuleID'})
    rciDF = pd.concat([rciDF, tempDF], axis=0, ignore_index=True)

print('• Rank matching score dataframe:')
display(rmsDF)
print('')
print('• Rank conservation index dataframe:')
display(rciDF)

## 3. Rank conservation index: general pattern

### 3-1. Extract RCI (the mean of RMSs under the own phenotype consensus)

In [None]:
#Extract RCI whose template phenotype corresponds to the own phenotype
phenotypeL = rciDF.drop(columns=['ModuleID', 'Template']).columns.tolist()
rciDF_kk = pd.DataFrame(index=pd.Index(rciDF['ModuleID'].unique(), name='ModuleID'))
tempDF = rciDF.set_index('ModuleID')
for k in phenotypeL:
    tempS = tempDF[k].loc[tempDF['Template']==k]
    rciDF_kk = pd.merge(rciDF_kk, tempS, left_index=True, right_index=True, how='left')

#Order and re-label
tempD = {'Control (M001)':'Control-1', 'Acarbose':'Acarbose',
         'Estradiol':'17'+r'$\alpha$'+'-Estradiol', 'Rapamycin':'Rapamycin',
         'Control (M004)':'Control-2', '4EGI-1':'4EGI-1'}
rciDF_kk = rciDF_kk[list(tempD.keys())]
rciDF_kk = rciDF_kk.rename(columns=tempD)
display(rciDF_kk)
display(rciDF_kk.describe())

# — †1. Go to the top of the R sub-notebook —  

### 3-2. Repeated Student's t-tests

> In this study, RCI is not normalized (i.e., its expected mean and variance differ between the M001 and M004 datasets because of their different sample sizes).  
> –> Therefore, Student's t-test (i.e., t-test with pooled variance) is used instead of Dunnett's test, because RCI is NOT comparable across datasets (i.e., the control group differs between the datasets).  
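
> The actual tests are run in the R sub-notebook. Purely to illustrate what "pooled variance" means, the following standard-library sketch computes the Student's t statistic next to the Welch's t statistic. The function names and the input values are hypothetical; in SciPy, the pooled version corresponds to `scipy.stats.ttest_ind(a, b, equal_var=True)`.  

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Two-sample Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    se2_a, se2_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical RCI values of an intervention group and its own control group
intervention = [0.70, 0.75, 0.72]
control = [0.80, 0.82, 0.78, 0.81]
print('Student:', student_t(intervention, control))
print('Welch:  ', welch_t(intervention, control))
```

> With equal group sizes the two statistics coincide and only the degrees of freedom differ; with unequal sizes or variances they diverge.  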

In [None]:
#Import the summary table
fileDir = './ExportData/'
ipynbName = '220529_LCproteomics-M004-DIRAC_StatisticalTest-GOBP-with-M001_ver2-3_'
fileName = 'inter-group-comparison.xlsx'
sheetName = 'RCImean'
tempDF = pd.read_excel(fileDir+ipynbName+fileName, sheet_name=sheetName, engine='openpyxl')
tempDF = tempDF.set_index('ModuleID')
display(tempDF)

statDF = tempDF

### 3-3. Visualization: boxplot

In [None]:
#Prepare DF and color
tempDF = rciDF_kk.reset_index().melt(var_name='Group', value_name='RCI', id_vars='ModuleID')
tempD = {'Control-1':'tab:blue', 'Acarbose':'tab:red',
         '17'+r'$\alpha$'+'-Estradiol':'tab:green', 'Rapamycin':'tab:purple',
         'Control-2':'tab:blue', '4EGI-1':'tab:orange'}

#Prepare significance labels
##Retrieve statistical significance
module = 'All'
tempS = statDF.loc[module, statDF.columns.str.contains('AdjPval')]
tempS.index = tempS.index.str.replace('_AdjPval', '')
tempS.name = 'AdjPval'
##Clean
tempDF1 = tempS.index.to_series().str.split(pat='-vs-', expand=True)
tempDF1 = tempDF1.rename(columns={0:'Contrast', 1:'Baseline'})
tempDF1 = pd.merge(tempDF1, tempS, left_index=True, right_index=True, how='left')
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempDF1['Contrast'] = tempDF1['Contrast'].map(tempD0)
tempDF1['Baseline'] = tempDF1['Baseline'].map(tempD0)
##Convert p-value to label
tempL = []
for row_i in range(len(tempDF1)):
    pval = tempDF1['AdjPval'].iloc[row_i]
    if pval<0.001:
        tempL.append('***')
    elif pval<0.01:
        tempL.append('**')
    elif pval<0.05:
        tempL.append('*')
    else:
        pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
        tempL.append(r'$P$ = '+str(pval_text))
tempDF1['SignifLabel'] = tempL
##Add the y-position level in figure
tempDF1['YposLevel'] = [0, 1, 2, 2]
display(tempDF1)

#Visualization
ymax = 1.0
ymin = 0.5
yinter = 0.1
ymargin_t = 0.06
ymargin_b = 0.01
aline_ymin = 0.9
aline_ymargin = 0.05
sns.set(style='ticks', font='Arial', context='talk')
plt.figure(figsize=(3, 4))
p = sns.boxplot(data=tempDF, y='RCI', x='Group', order=list(tempD.keys()), palette=tempD, dodge=False,
                showfliers=True, flierprops={'marker':'o', 'markerfacecolor':'gray', 'alpha':0.4},
                showcaps=True, notch=True)
p.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
##Add border line
p.axvline(x=3.5, **{'linestyle':'dotted', 'color':'black', 'zorder':0})
##Add significance labels
lines = p.axes.get_lines()#Line2D: [[Q1, Q1-1.5IQR], [Q3, Q3+1.5IQR], [Q1, Q1], [Q3, Q3], [Med, Med], [flier]]
lines_unit = 5 + int(True)#showfliers=True
for row_i in range(len(tempDF1)):
    #Baseline
    group_0 = tempDF1['Baseline'].iloc[row_i]
    index_0 = list(tempD.keys()).index(group_0)
    whisker_0 = lines[index_0*lines_unit + 1]
    xcoord_0 = whisker_0._x[1]#Q3+1.5IQR
    #ycoord_0 = whisker_0._y[1]#Q3+1.5IQR
    #Contrast
    group_1 = tempDF1['Contrast'].iloc[row_i]
    index_1 = list(tempD.keys()).index(group_1)
    whisker_1 = lines[index_1*lines_unit + 1]
    xcoord_1 = whisker_1._x[1]#Q3+1.5IQR
    #ycoord_1 = whisker_1._y[1]#Q3+1.5IQR
    #Standard point of marker
    xcoord = (xcoord_0+xcoord_1)/2
    #ycoord = max(ycoord_0, ycoord_1)
    ycoord = aline_ymin + aline_ymargin*tempDF1['YposLevel'].iloc[row_i]
    label = tempDF1['SignifLabel'].iloc[row_i]
    #Add annotation lines
    aline_offset = yinter/10
    aline_length = yinter/10 + aline_offset
    plt.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
             [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
             lw=1.5, c='k')
    #Add annotation text
    if label in ['***', '**', '*']:
        text_offset = yinter/25
        p.annotate(label, xy=(xcoord, ycoord+text_offset),
                   horizontalalignment='center', verticalalignment='bottom',
                   fontsize='medium', color='k')
    else:
        text_offset = yinter/5
        p.annotate(label, xy=(xcoord, ycoord+text_offset),
                   horizontalalignment='center', verticalalignment='bottom',
                   fontsize='x-small', color='k')
sns.despine()
plt.xlabel('')
plt.ylabel('Module RCI')
plt.xticks(rotation=70, horizontalalignment='right', verticalalignment='center', rotation_mode='anchor')
##Save
fileDir = './ExportFigures/'
ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
fileName = 'RCI-boxplot.tif'
plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                  pil_kwargs={'compression':'tiff_lzw'})
plt.show()

## 4. Rank matching score under the M004 consensus: inter-group module comparison

> Test specific hypotheses: control RMS mean == intervention RMS mean per the M004-derived rank consensus for each module (i.e., inter-group module comparison).  
>
> 1. Testing the main effect of intervention on rank matching scores (RMSs) for each module using ANOVA models  
> 2. Then, performing post-hoc comparisons of RMSs between control vs. each intervention using the repeated Student's t-tests  
>
> Consistent with the use of the sex-pooled rank consensus, sex is NOT included in the ANOVA model. Instead of adding the rank consensus group (i.e., M004 group) and its interaction term to the ANOVA model, a stratified ANOVA model is generated for each rank consensus group. Note that ANOVA can shrink the variance by utilizing all samples (per module), which gives better statistical power than the repeated Welch's t-tests when the sample size is small.  
>
> Although tricky, the P-value adjustment in (1) is performed across all models (= modules x rank consensus groups) under the assumption that modules are independent; this is more conservative, and hence easier to defend for a Venn diagram-type summary, than using a nominal P-value cutoff.  
>
> Because the post-hoc comparisons in (2) address the similarity of each intervention within a specific module, the P-values are adjusted across interventions only within the module (not across modules). At the same time, these P-values are adjusted simultaneously across rank consensus groups within the module, because the hypotheses across rank consensus groups are independent, in contrast to the previous M001-alone DIRAC analysis. Note that Student's t-test (i.e., t-test with pooled variance), not Dunnett's test, is used as the post-hoc test, because further adjusting the Dunnett's test P-values with the Holm-Bonferroni method would over-adjust (incorrectly) for the family-wise error rate (FWER).
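
> The adjustment itself is done in the R sub-notebook. As a rough illustration of the within-module step-down adjustment, the following is a minimal sketch of the Holm-Bonferroni method, assuming that this is the method applied to the post-hoc P-values of one module. The function name `holm_adjust` and the P-values are hypothetical.  

```python
def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # Indices from smallest to largest P
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the (rank+1)-th smallest P-value by (m - rank), cap at 1,
        # and enforce monotonicity with the running maximum
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

# Hypothetical nominal P-values of the post-hoc comparisons within one module
# (e.g., comparisons pooled across the rank consensus groups of that module)
nominal = [0.01, 0.04, 0.03, 0.20]
print([round(p, 3) for p in holm_adjust(nominal)])
```

> The family here is the set of comparisons within a single module, so the multiplier depends only on how many comparisons that module contributes, not on the total number of modules.  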

# — †2. Go to †1 of the R sub-notebook —  

### 4-1. ANOVA test (RMS ~ Intervention), followed by repeated Student's t-tests (Intervention)

#### 4-1-1/2/3. Import the summary table

In [None]:
#Import the summary table
fileDir = './ExportData/'
ipynbName = '220529_LCproteomics-M004-DIRAC_StatisticalTest-GOBP-with-M001_ver2-3_'
fileName = 'inter-group-comparison.xlsx'
tempDF = pd.DataFrame()
for template in ['M004:Cont', 'M004:4EGI']:
    sheetName = template.replace(':', '-')+'-fixed-RMSmean'#Colon not allowed in sheet name in Excel...
    tempDF1 = pd.read_excel(fileDir+ipynbName+fileName, sheet_name=sheetName, engine='openpyxl')
    tempDF1['Template'] = template
    tempDF1 = tempDF1.set_index(['Template', 'ModuleID'])
    tempDF = pd.concat([tempDF, tempDF1], axis=0)
tempDF = tempDF.sort_values(by='Intervention_Pval', ascending=True)
display(tempDF)

statDF = tempDF

#### 4-1-4. Changed modules (ANOVA)

In [None]:
#Prepare variables in the model
tempS = statDF.loc[:, statDF.columns.str.contains('_Fstat')].columns.to_series()
variableL = tempS.str.replace('_Fstat', '').tolist()

#Changed modules
for variable in variableL:
    tempDF = statDF.loc[statDF[variable+'_AdjPval']<0.05]
    tempDF = tempDF.sort_values(by=variable+'_AdjPval', ascending=True)
    tempL1 = tempDF.loc[:, tempDF.columns.str.contains('_RMSmean')].columns.tolist()
    tempL2 = tempDF.loc[:, tempDF.columns.str.contains('^'+variable+'_')].columns.tolist()
    tempDF = tempDF[[col_n for subL in [['ModuleName'], tempL1, tempL2] for col_n in subL]]
    print(variable+' (adjusted P < 0.05):', len(tempDF))
    tempL = tempDF.index.to_frame()['ModuleID'].unique().tolist()
    print(' -> Unique module:', len(tempL))
    tempL = tempDF.index.to_frame()['Template'].unique().tolist()
    for template in tempL:
        tempDF1 = tempDF.loc[template]
        print(' -> '+template+' rank consensus:', len(tempDF1))
    display(tempDF)

#### 4-1-5. Changed modules by each intervention (Student's t-tests)

In [None]:
#Spread statDF by template to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for template in ['M004:Cont', 'M004:4EGI']:
    tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
    tempDF1.columns = template+'-fixed_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')

#Extract only the changed modules
variable = 'Intervention'
tempS1 = tempDF['M004:Cont-fixed_'+variable+'_AdjPval']<0.05
tempS2 = tempDF['M004:4EGI-fixed_'+variable+'_AdjPval']<0.05
tempDF = tempDF.loc[tempS1|tempS2]
tempDF = tempDF.sort_values(by=['M004:4EGI-fixed_'+variable+'_AdjPval',
                                'M004:Cont-fixed_'+variable+'_AdjPval'], ascending=True)
print(variable+' (adjusted P < 0.05):', len(tempDF))
print(' -> M004:Cont rank consensus:', sum(tempS1))
print(' -> M004:4EGI rank consensus:', sum(tempS2))

#Take adjusted P-value
tempDF1 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_AdjPval$')]
tempDF1.columns = tempDF1.columns.str.replace('_AdjPval$', '', regex=True)#regex=True is needed in pandas>=2.0
tempDF1 = pd.merge(tempDF[['ModuleName', 'M004:4EGI-fixed_'+variable+'_AdjPval',
                           'M004:Cont-fixed_'+variable+'_AdjPval']],
                   tempDF1, left_index=True, right_index=True, how='left')
print('Adjusted P-value:')
display(tempDF1)
display(tempDF1.describe())

#Take effect size
tempDF2 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_Coef$')]
tempDF2.columns = tempDF2.columns.str.replace('_Coef$', '', regex=True)#regex=True is needed in pandas>=2.0
tempDF2 = pd.merge(tempDF[['ModuleName', 'M004:4EGI-fixed_'+variable+'_AdjPval',
                           'M004:Cont-fixed_'+variable+'_AdjPval']],
                   tempDF2, left_index=True, right_index=True, how='left')
print('Changed direction (effect size):')
display(tempDF2)
display(tempDF2.describe())

pvalDF = tempDF1
diffDF = tempDF2

> Check the changed modules (based on the nominal P-value for the main effect and the adjusted P-values for the post-hoc tests) as a reference.  

In [None]:
#Spread statDF by template to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for template in ['M004:Cont', 'M004:4EGI']:
    tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
    tempDF1.columns = template+'-fixed_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')

#Extract only the changed modules
variable = 'Intervention'
tempS1 = tempDF['M004:Cont-fixed_'+variable+'_Pval']<0.05
tempS2 = tempDF['M004:4EGI-fixed_'+variable+'_Pval']<0.05
tempDF = tempDF.loc[tempS1|tempS2]
tempDF = tempDF.sort_values(by=['M004:4EGI-fixed_'+variable+'_Pval',
                                'M004:Cont-fixed_'+variable+'_Pval'], ascending=True)
print(variable+' (nominal P < 0.05):', len(tempDF))
print(' -> M004:Cont rank consensus:', sum(tempS1))
print(' -> M004:4EGI rank consensus:', sum(tempS2))

#Take adjusted P-value
tempDF1 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_AdjPval$')]
tempDF1.columns = tempDF1.columns.str.replace('_AdjPval$', '', regex=True)#regex=True is needed in pandas>=2.0
tempDF1 = pd.merge(tempDF[['ModuleName', 'M004:4EGI-fixed_'+variable+'_AdjPval',
                           'M004:Cont-fixed_'+variable+'_AdjPval']],
                   tempDF1, left_index=True, right_index=True, how='left')
print('Adjusted P-value:')
display(tempDF1)
display(tempDF1.describe())

#Take effect size
tempDF2 = tempDF.loc[:, tempDF.columns.str.contains('-vs-.*_Coef$')]
tempDF2.columns = tempDF2.columns.str.replace('_Coef$', '', regex=True)#regex=True is needed in pandas>=2.0
tempDF2 = pd.merge(tempDF[['ModuleName', 'M004:4EGI-fixed_'+variable+'_Pval',
                           'M004:Cont-fixed_'+variable+'_Pval']],
                   tempDF2, left_index=True, right_index=True, how='left')
#print('Changed direction (effect size):')
#display(tempDF2)
#display(tempDF2.describe())

pvalDF_ref = tempDF1
diffDF_ref = tempDF2

### 4-2. Visualization: Venn diagram

#### 4-2-1. RMS under 4EGI-1 consensus

In [None]:
template_label = 'M004:4EGI-fixed'

#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF.loc[:, pvalDF.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on adjusted ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF) - len(tempDF)
    print('Changed modules (adjusted ANOVA P < 0.05):', len(pvalDF))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
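
The similar/dissimilar split used for the diagram can be summarized as a small sketch. The `classify` helper, the module IDs, and the numbers are hypothetical; reading a positive coefficient as "closer to the rank consensus than control" is an interpretation inferred from the labels in the cell above, not something stated by the R sub-notebook.

```python
def classify(adj_pval, coef, alpha=0.05):
    """Label one intervention-vs-control contrast for one module."""
    if adj_pval < alpha and coef > 0:
        return 'Similar'      # RMS mean higher than control (assumed sign convention)
    if adj_pval < alpha and coef < 0:
        return 'Dissimilar'   # RMS mean lower than control (assumed sign convention)
    return 'Unchanged'

# Hypothetical post-hoc results: module -> (adjusted P-value, coefficient)
results = {
    'GO:0000001': (0.003, 0.12),
    'GO:0000002': (0.040, -0.08),
    'GO:0000003': (0.300, 0.20),
    'GO:0000004': (0.010, 0.0),
}

module_sets = {'Similar': set(), 'Dissimilar': set(), 'Unchanged': set()}
for module, (p, coef) in results.items():
    module_sets[classify(p, coef)].add(module)
print(module_sets)
```

A module with a coefficient of exactly zero falls into neither direction, which matches the strict inequalities used for the Venn sets.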

In [None]:
#Export module list in each subset in the venn diagram
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD = {}
    tempL = pvalDF.loc[:, pvalDF.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on adjusted ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempD[col_n] = set(tempS2.index.tolist())
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF) - len(tempDF)
    print('Changed modules (adjusted ANOVA P < 0.05):', len(pvalDF))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Prepare a new .xlsx file (dummy README)
    tempL1 = [len(tempD[key]) for key in tempD.keys()]
    tempDF = pd.DataFrame({'Group':tempD.keys(), 'nModules':tempL1})
    tempDF = tempDF.reset_index().rename(columns={'index':'VennOrder'})
    tempDF['VennOrder'] = tempDF['VennOrder'] + 1
    fileDir = './ExportData/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed.xlsx'
    tempDF.to_excel(fileDir+ipynbName+fileName, sheet_name='README', header=True, index=False)
    display(tempDF)#Check
    
    #Spread statDF by template to flatten multi-index
    tempDF = moduleDF['ModuleName']#pd.Series() for now
    for template in ['M004:Cont', 'M004:4EGI']:
        tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
        tempDF1.columns = template+'-fixed_'+tempDF1.columns
        tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
    tempDF = tempDF.sort_index(ascending=True)
    
    t_start = time.time()
    #Extract overall set
    for key_i in range(len(tempD)):
        key = list(tempD.keys())[key_i]
        tempS = tempD[key]
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Save the summary table by appending it to the above .xlsx file
        ##Prepare sheet name
        tempL1 = ['NA' for i in range(len(tempD))]
        tempL1[key_i] = '1'
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    #Extract subset
    tempL1 = ['1', '0']
    tempL2 = [[k1, k2, k3] for k1 in tempL1 for k2 in tempL1 for k3 in tempL1]
    #tempL2.remove(['0', '0', '0'])
    for tempL1 in tempL2:
        #Positive module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='1']
        tempS1 = set(pvalDF.index.tolist())#Initialize
        for tempS in tempL3:
            tempS1 = tempS1 & tempS
        #Negative module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='0']
        tempS2 = set()#Initialize
        for tempS in tempL3:
            tempS2 = tempS2 | tempS
        #Extract subset
        tempS = tempS1 - tempS2
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Save the summary table by appending it to the above .xlsx file
        ##Prepare sheet name
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    t_elapsed = time.time() - t_start
    print(' - Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')
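
The subset extraction in the cell above follows a general pattern: for each inclusion pattern such as `(1,0,1)`, intersect the sets flagged `1` and subtract the union of the sets flagged `0`. A minimal standalone sketch is given below; `venn_regions` and the toy sets are hypothetical and stand in for the module sets and the index of `pvalDF`.

```python
from itertools import product

def venn_regions(sets, universe):
    """Return {pattern: members} for every 1/0 inclusion pattern over `sets`."""
    regions = {}
    for pattern in product('10', repeat=len(sets)):
        inside = set(universe)   # start from all candidate modules
        outside = set()
        for flag, members in zip(pattern, sets):
            if flag == '1':
                inside &= members    # must belong to this set
            else:
                outside |= members   # must not belong to this set
        regions['(' + ','.join(pattern) + ')'] = inside - outside
    return regions

# Toy example with three groups and a universe of six items
group_a, group_b, group_c = {1, 2, 3}, {2, 3, 4}, {3, 5}
universe = {1, 2, 3, 4, 5, 6}
regions = venn_regions([group_a, group_b, group_c], universe)
for name, members in regions.items():
    print(name, sorted(members))

# The regions partition the universe, which is a useful sanity check
assert sum(len(m) for m in regions.values()) == len(universe)
```

The `(0,0,0)` region holds the items that belong to none of the groups, so the choice of universe matters for that sheet only.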

> Check the changed modules (based on the nominal P-value for the main effect and the adjusted P-values for the post-hoc tests) as a reference.  

In [None]:
template_label = 'M004:4EGI-fixed'

#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on nominal ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF_ref[col_n]
        tempS2 = diffDF_ref[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF_ref.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF_ref) - len(tempDF)
    print('Changed modules (nominal ANOVA P < 0.05):', len(pvalDF_ref))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed(nominalPval).tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()

In [None]:
#Export module list in each subset in the venn diagram
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD = {}
    tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on nominal ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF_ref[col_n]
        tempS2 = diffDF_ref[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempD[col_n] = set(tempS2.index.tolist())
    
    #Not significant in all contrasts
    tempDF = pvalDF_ref.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF_ref) - len(tempDF)
    print('Changed modules (nominal ANOVA P < 0.05):', len(pvalDF_ref))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Prepare a new .xlsx file (dummy README)
    tempL1 = [len(tempD[key]) for key in tempD.keys()]
    tempDF = pd.DataFrame({'Group':tempD.keys(), 'nModules':tempL1})
    tempDF = tempDF.reset_index().rename(columns={'index':'VennOrder'})
    tempDF['VennOrder'] = tempDF['VennOrder'] + 1
    fileDir = './ExportData/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed(nominalPval).xlsx'
    tempDF.to_excel(fileDir+ipynbName+fileName, sheet_name='README', header=True, index=False)
    display(tempDF)#Check
    
    #Spread statDF by template to flatten multi-index
    tempDF = moduleDF['ModuleName']#pd.Series() for now
    for template in ['M004:Cont', 'M004:4EGI']:
        tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
        tempDF1.columns = template+'-fixed_'+tempDF1.columns
        tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
    tempDF = tempDF.sort_index(ascending=True)
    
    t_start = time.time()
    #Extract overall set
    for key_i in range(len(tempD)):
        key = list(tempD.keys())[key_i]
        tempS = tempD[key]
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Save the summary table by appending it to the above .xlsx file
        ##Prepare sheet name
        tempL1 = ['NA' for i in range(len(tempD))]
        tempL1[key_i] = '1'
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    #Extract subset
    tempL1 = ['1', '0']
    tempL2 = [[k1, k2, k3] for k1 in tempL1 for k2 in tempL1 for k3 in tempL1]
    #tempL2.remove(['0', '0', '0'])
    for tempL1 in tempL2:
        #Positive module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='1']
        tempS1 = set(pvalDF_ref.index.tolist())#Initialize (nominal P-value-based module set)
        for tempS in tempL3:
            tempS1 = tempS1 & tempS
        #Negative module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='0']
        tempS2 = set()#Initialize
        for tempS in tempL3:
            tempS2 = tempS2 | tempS
        #Extract subset
        tempS = tempS1 - tempS2
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Save the summary table by appending it to the above .xlsx file
        ##Prepare sheet name
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    t_elapsed = time.time() - t_start
    print(' - Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')

> –> Add the nominal P-value-based counts to the adjusted P-value-based venn diagrams.  
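
The annotation step relies on the two Venn diagrams producing their count labels in the same order, so that labels can be paired by position. A minimal sketch of that pairing is shown below; the `merge_counts` helper and the counts are hypothetical, and the length check is an added safeguard rather than part of the original workflow.

```python
def merge_counts(adjusted, nominal):
    """Pair count labels by position as 'adjusted [nominal]'."""
    if len(adjusted) != len(nominal):
        raise ValueError('The two diagrams must have the same number of labels')
    return [f'{a} [{n}]' for a, n in zip(adjusted, nominal)]

# Hypothetical label texts from the adjusted and nominal P-value-based diagrams
adjusted_labels = ['3', '0', '1', '12']
nominal_labels = ['5', '2', '1', '40']
print(merge_counts(adjusted_labels, nominal_labels))
```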

In [None]:
countD_sub = {}
countD_total = {}

#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on nominal ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF_ref[col_n]
        tempS2 = diffDF_ref[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF_ref.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF_ref) - len(tempDF)
    print('Changed modules (nominal ANOVA P < 0.05):', len(pvalDF_ref))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Save the count texts to use later
    tempL = []
    for text in ax.texts:
        tempL.append(text.get_text())
    countD_sub[similarity] = tempL
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    tempL = []
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
        ##Save the count texts to use later
        tempL.append(total)
    countD_total[similarity] = tempL
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    plt.show()

In [None]:
#Check
display(countD_sub)
display(countD_total)

In [None]:
#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF.loc[:, pvalDF.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on adjusted ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF) - len(tempDF)
    print('Changed modules (adjusted ANOVA P < 0.05):', len(pvalDF))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Add the nominal P-value-based counts
    for text_i, text in enumerate(ax.texts):
        count_adj = text.get_text()
        count_nom = countD_sub[similarity][text_i]
        text.set_text(count_adj+' ['+count_nom+']')
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total_adj = f'{len(tempD[key]):,}'
        total_nom = countD_total[similarity][i]
        ax.text(x_coord[i], y_coord[i], key+'\n('+total_adj+' ['+total_nom+'] modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220606_LCproteomics-M004-DIRAC-ver2-3_Supplement_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()

#### 4-2-2. RMS under Ctrl (M004) consensus

In [None]:
template_label = 'M004:Cont-fixed'

#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF.loc[:, pvalDF.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on adjusted ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF) - len(tempDF)
    print('Changed modules (adjusted ANOVA P < 0.05):', len(pvalDF))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()

In [None]:
#Export module list in each subset in the venn diagram
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD = {}
    tempL = pvalDF.loc[:, pvalDF.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on adjusted ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempD[col_n] = set(tempS2.index.tolist())
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF) - len(tempDF)
    print('Changed modules (adjusted ANOVA P < 0.05):', len(pvalDF))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the following steps if there is no similarly/dissimilarly changed module
    if count==0:
        continue
    
    #Prepare a new .xlsx file (dummy README)
    tempL1 = [len(tempD[key]) for key in tempD.keys()]
    tempDF = pd.DataFrame({'Group':tempD.keys(), 'nModules':tempL1})
    tempDF = tempDF.reset_index().rename(columns={'index':'VennOrder'})
    tempDF['VennOrder'] = tempDF['VennOrder'] + 1
    fileDir = './ExportData/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed.xlsx'
    tempDF.to_excel(fileDir+ipynbName+fileName, sheet_name='README', header=True, index=False)
    display(tempDF)#Check
    
    #Spread statDF by template to flatten multi-index
    tempDF = moduleDF['ModuleName']#pd.Series() for now
    for template in ['M004:Cont', 'M004:4EGI']:
        tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
        tempDF1.columns = template+'-fixed_'+tempDF1.columns
        tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
    tempDF = tempDF.sort_index(ascending=True)
    
    t_start = time.time()
    #Extract overall set
    for key_i in range(len(tempD)):
        key = list(tempD.keys())[key_i]
        tempS = tempD[key]
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Append the summary table to the above .xlsx file
        ##Prepare sheet name
        tempL1 = ['NA' for i in range(len(tempD))]
        tempL1[key_i] = '1'
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    #Extract subset
    tempL1 = ['1', '0']
    tempL2 = [[k1, k2, k3] for k1 in tempL1 for k2 in tempL1 for k3 in tempL1]
    #tempL2.remove(['0', '0', '0'])
    for tempL1 in tempL2:
        #Positive module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='1']
        tempS1 = set(pvalDF.index.tolist())#Initialize
        for tempS in tempL3:
            tempS1 = tempS1 & tempS
        #Negative module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='0']
        tempS2 = set()#Initialize
        for tempS in tempL3:
            tempS2 = tempS2 | tempS
        #Extract subset
        tempS = tempS1 - tempS2
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Append the summary table to the above .xlsx file
        ##Prepare sheet name
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    t_elapsed = time.time() - t_start
    print(' - Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')
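
> The subset extraction above (intersection of the "1" sets minus the union of the "0" sets) can be sketched with plain Python sets. This is a minimal illustration rather than the notebook's own code: `venn_regions`, `toy_sets`, and `toy_universe` are hypothetical names, and the module IDs are toy values.

```python
from itertools import product


def venn_regions(named_sets, universe):
    """Split `universe` into the 2**n mutually exclusive Venn regions.

    Each region is keyed by a sheet-name-like label such as '(1,0,1)',
    where '1' means "inside that set" and '0' means "outside that set".
    """
    names = list(named_sets)
    regions = {}
    for flags in product('10', repeat=len(names)):
        inside = set(universe)   # start from everything, then intersect
        outside = set()          # start from nothing, then take unions
        for name, flag in zip(names, flags):
            if flag == '1':
                inside &= named_sets[name]
            else:
                outside |= named_sets[name]
        regions['(' + ','.join(flags) + ')'] = inside - outside
    return regions


# Toy example (hypothetical module IDs)
toy_sets = {'Acar': {'m1', 'm2', 'm3'},
            '17aE': {'m2', 'm3', 'm4'},
            'Rapa': {'m3', 'm5'}}
toy_universe = {'m1', 'm2', 'm3', 'm4', 'm5', 'm6'}
regions = venn_regions(toy_sets, toy_universe)
for label, members in regions.items():
    print(label, sorted(members))
```

> Because the regions are mutually exclusive and exhaustive, their sizes should sum to the size of the universe, which is a quick sanity check on the exported sheets.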

> As a reference, check the changed modules selected with the nominal P-value for the main effect and the adjusted P-values for the post-hoc tests.  

In [None]:
template_label = 'M004:Cont-fixed'

#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on nominal ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF_ref[col_n]
        tempS2 = diffDF_ref[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF_ref.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF_ref) - len(tempDF)
    print('Changed modules (nominal ANOVA P < 0.05):', len(pvalDF_ref))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the rest if no module changed similarly/dissimilarly
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed(nominalPval).tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
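
> The selection rule used for each contrast (post-hoc P < 0.05 combined with the sign of the difference) can be sketched without pandas as follows. This is an illustration under assumptions: `changed_modules` and the toy dictionaries are hypothetical, and the sign convention (positive difference = "Similar") simply mirrors the cells above.

```python
def changed_modules(pvals, diffs, direction, alpha=0.05):
    """Return module IDs that are significant and shifted in `direction`.

    pvals / diffs: dicts keyed by module ID for a single contrast.
    direction: 'Similar' (difference > 0) or 'Dissimilar' (difference < 0).
    """
    if direction not in ('Similar', 'Dissimilar'):
        raise ValueError('direction must be "Similar" or "Dissimilar"')
    sign = 1 if direction == 'Similar' else -1
    return {module for module, pval in pvals.items()
            if pval < alpha and sign * diffs[module] > 0}


# Toy example (hypothetical values)
toy_pvals = {'m1': 0.01, 'm2': 0.20, 'm3': 0.03, 'm4': 0.049}
toy_diffs = {'m1': 0.10, 'm2': 0.30, 'm3': -0.20, 'm4': 0.0}
similar = changed_modules(toy_pvals, toy_diffs, 'Similar')
dissimilar = changed_modules(toy_pvals, toy_diffs, 'Dissimilar')
print(sorted(similar), sorted(dissimilar))
```

> A module with a difference of exactly zero falls into neither set, as with the strict inequalities in the cells above.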

In [None]:
#Export module list in each subset in the venn diagram
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD = {}
    tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on nominal ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF_ref[col_n]
        tempS2 = diffDF_ref[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        tempD[col_n] = set(tempS2.index.tolist())
    
    #Not significant in all contrasts
    tempDF = pvalDF_ref.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF_ref) - len(tempDF)
    print('Changed modules (nominal ANOVA P < 0.05):', len(pvalDF_ref))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the rest if no module changed similarly/dissimilarly
    if count==0:
        continue
    
    #Prepare a new .xlsx file (dummy README)
    tempL1 = [len(tempD[key]) for key in tempD.keys()]
    tempDF = pd.DataFrame({'Group':tempD.keys(), 'nModules':tempL1})
    tempDF = tempDF.reset_index().rename(columns={'index':'VennOrder'})
    tempDF['VennOrder'] = tempDF['VennOrder'] + 1
    fileDir = './ExportData/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed(nominalPval).xlsx'
    tempDF.to_excel(fileDir+ipynbName+fileName, sheet_name='README', header=True, index=False)
    display(tempDF)#Check
    
    #Spread statDF by template to flatten multi-index
    tempDF = moduleDF['ModuleName']#pd.Series() for now
    for template in ['M004:Cont', 'M004:4EGI']:
        tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
        tempDF1.columns = template+'-fixed_'+tempDF1.columns
        tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
    tempDF = tempDF.sort_index(ascending=True)
    
    t_start = time.time()
    #Extract overall set
    for key_i in range(len(tempD)):
        key = list(tempD.keys())[key_i]
        tempS = tempD[key]
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Append the summary table to the above .xlsx file
        ##Prepare sheet name
        tempL1 = ['NA' for i in range(len(tempD))]
        tempL1[key_i] = '1'
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    #Extract subset
    tempL1 = ['1', '0']
    tempL2 = [[k1, k2, k3] for k1 in tempL1 for k2 in tempL1 for k3 in tempL1]
    #tempL2.remove(['0', '0', '0'])
    for tempL1 in tempL2:
        #Positive module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='1']
        tempS1 = set(pvalDF_ref.index.tolist())#Initialize (nominal P-value-based universe)
        for tempS in tempL3:
            tempS1 = tempS1 & tempS
        #Negative module set
        tempL3 = [list(tempD.values())[key_i] for key_i, binary in enumerate(tempL1) if binary=='0']
        tempS2 = set()#Initialize
        for tempS in tempL3:
            tempS2 = tempS2 | tempS
        #Extract subset
        tempS = tempS1 - tempS2
        tempDF1 = tempDF.loc[tempDF.index.isin(tempS)]
        #Append the summary table to the above .xlsx file
        ##Prepare sheet name
        setName = '('+','.join(tempL1)+')'
        with pd.ExcelWriter(fileDir+ipynbName+fileName, mode='a', engine='openpyxl') as writer:
            tempDF1.to_excel(writer, sheet_name=setName, header=True, index=True)
        print(' - '+setName+':', len(tempDF1))
    
    t_elapsed = time.time() - t_start
    print(' - Elapsed time:', round(t_elapsed//60), 'min', round(t_elapsed%60, 1), 'sec')

> –> Add the nominal P-value-based counts, in square brackets, to the adjusted P-value-based Venn diagrams.  

In [None]:
countD_sub = {}
countD_total = {}

#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on nominal ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF_ref[col_n]
        tempS2 = diffDF_ref[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF_ref.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF_ref) - len(tempDF)
    print('Changed modules (nominal ANOVA P < 0.05):', len(pvalDF_ref))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the rest if no module changed similarly/dissimilarly
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Save the count texts to use later
    tempL = []
    for text in ax.texts:
        tempL.append(text.get_text())
    countD_sub[similarity] = tempL
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    tempL = []
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total = f'{len(tempD[key]):,}'
        ax.text(x_coord[i], y_coord[i], key+'\n('+total+' modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
        ##Save the count texts to use later
        tempL.append(total)
    countD_total[similarity] = tempL
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    plt.show()

In [None]:
#Check
display(countD_sub)
display(countD_total)

In [None]:
#Prepare label and color
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Acarbose':'tab:red', '17'+r'$\alpha$'+'-Estradiol':'tab:green',
          'Rapamycin':'tab:purple'}

#Visualization per direction
for similarity in ['Similar', 'Dissimilar']:
    #Prepare similar/dissimilar module set
    tempD2 = {}
    tempL = pvalDF.loc[:, pvalDF.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()#Based on adjusted ANOVA P-values
    for col_n in tempL:
        tempS1 = pvalDF[col_n]
        tempS2 = diffDF[col_n]
        if similarity=='Similar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2>0)]
        elif similarity=='Dissimilar':
            tempS2 = tempS2.loc[(tempS1<0.05)&(tempS2<0)]
        comparison = col_n.replace(template_label+'_', '')
        contrast = re.sub('-vs-.*', '', comparison)
        label = tempD0[contrast]
        tempD2[label] = set(tempS2.index.tolist())
    ##Sort to make consistent order in manual legend generation
    tempD = {}
    for label in tempD1.keys():
        tempD[label] = tempD2[label]
    
    #Not significant in all contrasts
    tempDF = pvalDF.copy()
    for moduleS in tempD.values():
        tempDF = tempDF.loc[~tempDF.index.isin(moduleS)]
    count = len(pvalDF) - len(tempDF)
    print('Changed modules (adjusted ANOVA P < 0.05):', len(pvalDF))
    print(' -> '+similarity+'ly changed in any of interventions:', count)
    
    #Skip the rest if no module changed similarly/dissimilarly
    if count==0:
        continue
    
    #Venn diagram
    sns.set(style='ticks', font='Arial', context='talk')
    fig, ax = plt.subplots(figsize=(4, 4))
    venn(tempD, fmt='{size:,}', cmap=list(tempD1.values()), legend_loc=None, ax=ax)
    plt.setp(ax, ylim=(0.05, 0.975))#Otherwise, weird space...
    ##Add the nominal P-value-based counts
    for text_i, text in enumerate(ax.texts):
        count_adj = text.get_text()
        count_nom = countD_sub[similarity][text_i]
        text.set_text(count_adj+' ['+count_nom+']')
    ##Add legend annotation
    x_coord = [0.1, 0.9, 0.8]
    y_coord = [0.8, 0.8, 0.25]
    h_align = ['right', 'left', 'left']
    v_align = ['bottom', 'bottom', 'top']
    for i in range(len(tempD1)):
        key = list(tempD1.keys())[i]
        total_adj = f'{len(tempD[key]):,}'
        total_nom = countD_total[similarity][i]
        ax.text(x_coord[i], y_coord[i], key+'\n('+total_adj+' ['+total_nom+'] modules)',
                fontsize='small', multialignment='center',
                horizontalalignment=h_align[i], verticalalignment=v_align[i],
                bbox={'boxstyle':'round', 'facecolor':tempD1[key], 'pad':0.2, 'alpha':0.5})
    title = similarity+'ly changed modules (vs. Control-1)\nto '+tempD0[template_label.replace('-fixed', '')]+' consensus'
    ax.set_title(title, fontsize='medium')
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220606_LCproteomics-M004-DIRAC-ver2-3_Supplement_'
    fileName = template_label.replace(':', '-')+'-RMSmean-inter-group-comparison_venn-'+similarity.lower()+'ly-changed.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
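
> The label merging above pairs the adjusted and nominal count texts by position, so it relies on both Venn diagrams listing their regions in the same order. A minimal sketch of that pairing with an explicit length check is shown below; `merge_count_labels` is a hypothetical helper and the counts are toy values.

```python
def merge_count_labels(adjusted, nominal):
    """Combine per-region count strings as 'adjusted [nominal]'.

    Both lists are assumed to follow the same region order.
    """
    if len(adjusted) != len(nominal):
        raise ValueError('adjusted and nominal labels differ in length')
    return [f'{adj} [{nom}]' for adj, nom in zip(adjusted, nominal)]


# Toy example (hypothetical counts, already formatted with thousands separators)
merged = merge_count_labels(['3', '0', '12'], ['12', '1', '1,024'])
print(merged)
```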

### 4-3. Visualization: pointplot

#### 4-3-1. Similarly changed modules to 4EGI-1 consensus by all interventions

In [None]:
#Prepare the target module set
posL = ['M004:4EGI-fixed_M001:Acar-vs-M001:Cont',
        'M004:4EGI-fixed_M001:17aE-vs-M001:Cont',
        'M004:4EGI-fixed_M001:Rapa-vs-M001:Cont']
negL = ['']
template_label = 'M004:4EGI-fixed'
similarity = 'Similar'
tempS = pd.Series(np.repeat(True, len(pvalDF_ref)), index=pvalDF_ref.index)#Initialize based on nominal ANOVA P-values
tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()
for col_n in tempL:
    tempS1 = pvalDF_ref[col_n]
    tempS2 = diffDF_ref[col_n]
    if col_n in posL:
        if similarity=='Similar':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif similarity=='Dissimilar':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if similarity=='Similar':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif similarity=='Dissimilar':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        tempS1 = (tempS1>=0.0)
    #Update True
    tempS = tempS & tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), similarity.lower()+'ly changed modules to '+template_label+' consensus with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
#Spread statDF by template to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for template in ['M004:Cont', 'M004:4EGI']:
    tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
    tempDF1.columns = template+'-fixed_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['M004:4EGI-fixed_'+variable+'_AdjPval',
                                'M004:Cont-fixed_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars=['ModuleID', 'Template'])
tempDF1 = sampleDF.reset_index()[['SampleID', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Control (M001)':'Control-1', 'Acarbose':'Acarbose',
         'Estradiol':'17'+r'$\alpha$'+'-Estradiol', 'Rapamycin':'Rapamycin',
         'Control (M004)':'Control-2', '4EGI-1':'4EGI-1'}
tempDF['Template'] = tempDF['Template'].map(tempD)
tempDF['Group'] = tempDF['Phenotype'].map(tempD)
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Control-1':'tab:blue', 'Acarbose':'tab:red',
          '17'+r'$\alpha$'+'-Estradiol':'tab:green', 'Rapamycin':'tab:purple',
          'Control-2':'tab:blue', '4EGI-1':'tab:orange'}
tempD2 = {}
for label in ['Control-2', '4EGI-1']:
    if tempD1[label]=='tab:blue':
        tempD2[label] = plt.get_cmap('tab20')(1)
    elif tempD1[label]=='tab:orange':
        tempD2[label] = plt.get_cmap('tab20')(3)
    elif tempD1[label]=='tab:green':
        tempD2[label] = plt.get_cmap('tab20')(5)
    elif tempD1[label]=='tab:red':
        tempD2[label] = plt.get_cmap('tab20')(7)
    elif tempD1[label]=='tab:purple':
        tempD2[label] = plt.get_cmap('tab20')(9)
    elif tempD1[label]=='tab:olive':
        tempD2[label] = plt.get_cmap('tab20')(17)
    else:
        tempD2[label] = 'Error?'

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    tempDF1 = tempDF1.loc[tempDF1['Template'].isin(tempD2.keys())]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Template', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Template'] = pd.Categorical(tempDF2['Template'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Template', 'Group']).set_index(['Template', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF_ref.loc[module, pvalDF_ref.columns.str.contains('-fixed_.*-vs-')]#Adjusted P-value
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='-fixed_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Template', 1:'Comparison'})
    tempS1 = tempDF2['Template']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Template'] = tempDF2['Template'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.4
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6.4, 5.875), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[3.2, 3.2]})
    for ax_i, ax in enumerate(axes.flat):
        template = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Template']==template]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Add border line
        ax.axvline(x=3.5, **{'linestyle':'dotted', 'color':'black', 'zorder':0})
        #Add RCI line
        rci = tempDF3['RMS'].loc[tempDF3['Group']==template].mean()
        ax.axhline(y=rci, **{'linestyle':'--', 'color':tempD1[template], 'zorder':0})
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Template']==template]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title('Consensus:\n'+template, {'fontsize':'small'})
        xoff = 0.015
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.16,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[template],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 60))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='baseline', horizontalalignment='center', wrap=True, y=0.935)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = 'fixed-RMSmean-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')
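
> The P-value-to-label conversion used in the cell above can be isolated as a small function. This is a sketch: `signif_label` is a hypothetical name, while the thresholds and the half-up rounding via `Decimal` follow the cell above.

```python
from decimal import Decimal, ROUND_HALF_UP


def signif_label(pval):
    """Map a P-value to an asterisk label or a rounded 'P = x.xx' text."""
    if pval < 0.001:
        return '***'
    if pval < 0.01:
        return '**'
    if pval < 0.05:
        return '*'
    # Going through str() avoids binary floating-point artifacts, and
    # ROUND_HALF_UP rounds 0.125 to 0.13 (round(0.125, 2) gives 0.12).
    rounded = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(rounded)


# Toy example (hypothetical P-values)
for p in [0.0005, 0.005, 0.049, 0.05, 0.125]:
    print(p, '->', signif_label(p))
```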

#### 4-3-2. Similarly changed modules to 4EGI-1 consensus by any of the interventions

In [None]:
#Prepare the target module set
posL = ['M004:4EGI-fixed_M001:Acar-vs-M001:Cont',
        'M004:4EGI-fixed_M001:17aE-vs-M001:Cont',
        'M004:4EGI-fixed_M001:Rapa-vs-M001:Cont']
negL = ['']
template_label = 'M004:4EGI-fixed'
similarity = 'Similar'
tempS = pd.Series(np.repeat(False, len(pvalDF_ref)), index=pvalDF_ref.index)#Initialize based on nominal ANOVA P-values
tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()
for col_n in tempL:
    tempS1 = pvalDF_ref[col_n]
    tempS2 = diffDF_ref[col_n]
    if col_n in posL:
        if similarity=='Similar':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif similarity=='Dissimilar':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if similarity=='Similar':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif similarity=='Dissimilar':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        #Columns outside posL/negL must not contribute under the OR rule
        tempS1 = pd.Series(False, index=tempS1.index)
    #Update True (union across contrasts)
    tempS = tempS | tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), similarity.lower()+'ly changed modules to '+template_label+' consensus with significance in any of', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
#Spread statDF by template to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for template in ['M004:Cont', 'M004:4EGI']:
    tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
    tempDF1.columns = template+'-fixed_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['M004:4EGI-fixed_'+variable+'_AdjPval',
                                'M004:Cont-fixed_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars=['ModuleID', 'Template'])
tempDF1 = sampleDF.reset_index()[['SampleID', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Control (M001)':'Control-1', 'Acarbose':'Acarbose',
         'Estradiol':'17'+r'$\alpha$'+'-Estradiol', 'Rapamycin':'Rapamycin',
         'Control (M004)':'Control-2', '4EGI-1':'4EGI-1'}
tempDF['Template'] = tempDF['Template'].map(tempD)
tempDF['Group'] = tempDF['Phenotype'].map(tempD)
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Control-1':'tab:blue', 'Acarbose':'tab:red',
          '17'+r'$\alpha$'+'-Estradiol':'tab:green', 'Rapamycin':'tab:purple',
          'Control-2':'tab:blue', '4EGI-1':'tab:orange'}
tempD2 = {}
for label in ['Control-2', '4EGI-1']:
    if tempD1[label]=='tab:blue':
        tempD2[label] = plt.get_cmap('tab20')(1)
    elif tempD1[label]=='tab:orange':
        tempD2[label] = plt.get_cmap('tab20')(3)
    elif tempD1[label]=='tab:green':
        tempD2[label] = plt.get_cmap('tab20')(5)
    elif tempD1[label]=='tab:red':
        tempD2[label] = plt.get_cmap('tab20')(7)
    elif tempD1[label]=='tab:purple':
        tempD2[label] = plt.get_cmap('tab20')(9)
    elif tempD1[label]=='tab:olive':
        tempD2[label] = plt.get_cmap('tab20')(17)
    else:
        tempD2[label] = 'Error?'

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    tempDF1 = tempDF1.loc[tempDF1['Template'].isin(tempD2.keys())]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Template', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Template'] = pd.Categorical(tempDF2['Template'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Template', 'Group']).set_index(['Template', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF_ref.loc[module, pvalDF_ref.columns.str.contains('-fixed_.*-vs-')]#Adjusted P-value
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='-fixed_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Template', 1:'Comparison'})
    tempS1 = tempDF2['Template']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Template'] = tempDF2['Template'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.4
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6.4, 5.875), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[3.2, 3.2]})
    for ax_i, ax in enumerate(axes.flat):
        template = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Template']==template]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Add border line
        ax.axvline(x=3.5, **{'linestyle':'dotted', 'color':'black', 'zorder':0})
        #Add RCI line (mean RMS of the group whose consensus is used as the template)
        rci = tempDF3['RMS'].loc[tempDF3['Group']==template].mean()
        ax.axhline(y=rci, **{'linestyle':'--', 'color':tempD1[template], 'zorder':0})
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Template']==template]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title('Consensus:\n'+template, {'fontsize':'small'})
        xoff = 0.015
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.16,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[template],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 60))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='baseline', horizontalalignment='center', wrap=True, y=0.935)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = 'fixed-RMSmean-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

#### 4-3-3. Dissimilarly changed modules to 4EGI-1 consensus by any of interventions
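The cell below keeps a module when at least one of the listed comparisons is significant in the requested direction. A minimal sketch of that any-of rule on toy data follows. The function `select_any`, the `toy_*` tables, and all their values are hypothetical. The sketch also assumes, as the cell appears to, that a negative difference under the fixed consensus marks a dissimilar change.

```python
import pandas as pd

# Toy adjusted P-values and mean differences (hypothetical values, not study results)
cols = ['M004:4EGI-fixed_M001:Acar-vs-M001:Cont',
        'M004:4EGI-fixed_M001:Rapa-vs-M001:Cont']
toy_pval = pd.DataFrame([[0.01, 0.20], [0.30, 0.04], [0.02, 0.03], [0.50, 0.60]],
                        index=['mod_A', 'mod_B', 'mod_C', 'mod_D'], columns=cols)
toy_diff = pd.DataFrame([[-0.2, 0.1], [0.1, 0.3], [0.2, -0.1], [-0.3, -0.2]],
                        index=toy_pval.index, columns=cols)

def select_any(pval_df, diff_df, columns, direction, alpha=0.05):
    """Return module IDs significant in the given direction for at least one column."""
    signif = pval_df[columns] < alpha
    if direction == 'Dissimilar':
        hit = signif & (diff_df[columns] < 0)  # Assumed sign convention
    elif direction == 'Similar':
        hit = signif & (diff_df[columns] > 0)
    else:
        raise ValueError("direction must be 'Similar' or 'Dissimilar'")
    # Row-wise OR across the comparisons of interest
    return hit.loc[hit.any(axis=1)].index.tolist()

selected = select_any(toy_pval, toy_diff, cols, 'Dissimilar')
print(selected)
```

Restricting the mask to the columns of interest before taking the row-wise OR means that comparisons outside the list cannot add modules to the selection.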

In [None]:
#Prepare the target module set
posL = ['M004:4EGI-fixed_M001:Acar-vs-M001:Cont',
        'M004:4EGI-fixed_M001:17aE-vs-M001:Cont',
        'M004:4EGI-fixed_M001:Rapa-vs-M001:Cont']
negL = ['']
template_label = 'M004:4EGI-fixed'
similarity = 'Dissimilar'
tempS = pd.Series(np.repeat(False, len(pvalDF_ref)), index=pvalDF_ref.index)#Initialize as all False (modules are added by the OR update below)
tempL = pvalDF_ref.loc[:, pvalDF_ref.columns.str.contains(template_label+'_.*-vs-')].columns.tolist()
for col_n in tempL:
    tempS1 = pvalDF_ref[col_n]
    tempS2 = diffDF_ref[col_n]
    if col_n in posL:
        if similarity=='Similar':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        elif similarity=='Dissimilar':
            tempS1 = (tempS1<0.05) & (tempS2<0)
    elif col_n in negL:
        tempS3 = (tempS1>=0.05)
        #Significance for inverse regulation
        if similarity=='Similar':
            tempS1 = (tempS1<0.05) & (tempS2<0)
        elif similarity=='Dissimilar':
            tempS1 = (tempS1<0.05) & (tempS2>0)
        tempS1 = tempS3 | tempS1
    else:
        continue#Skip: an all-True mask here would select every module under the OR update below
    #Update True
    tempS = tempS | tempS1
tempL = tempS.loc[tempS.tolist()].index.tolist()
print(len(tempL), similarity.lower()+'ly changed modules to '+template_label+' consensus with significance in', posL, 'but not in', negL)

#Select representatives
topX = np.min([30, len(tempL)])
topX_plot = np.min([10, len(tempL)])
#Spread statDF by template to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for template in ['M004:Cont', 'M004:4EGI']:
    tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
    tempDF1.columns = template+'-fixed_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[tempL]
tempDF = tempDF.sort_values(by=['M004:4EGI-fixed_'+variable+'_AdjPval',
                                'M004:Cont-fixed_'+variable+'_AdjPval'], ascending=True)
print('Top', topX, 'modules (sort by the main effect of '+variable+'):')
display(tempDF.iloc[:topX])
plotL = tempDF.index.tolist()[:topX_plot]

#Prepare DF for plot
tempDF = rmsDF.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars=['ModuleID', 'Template'])
tempDF1 = sampleDF.reset_index()[['SampleID', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Control (M001)':'Control-1', 'Acarbose':'Acarbose',
         'Estradiol':'17'+r'$\alpha$'+'-Estradiol', 'Rapamycin':'Rapamycin',
         'Control (M004)':'Control-2', '4EGI-1':'4EGI-1'}
tempDF['Template'] = tempDF['Template'].map(tempD)
tempDF['Group'] = tempDF['Phenotype'].map(tempD)
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Control-1':'tab:blue', 'Acarbose':'tab:red',
          '17'+r'$\alpha$'+'-Estradiol':'tab:green', 'Rapamycin':'tab:purple',
          'Control-2':'tab:blue', '4EGI-1':'tab:orange'}
tempD2 = {}
for label in ['Control-2', '4EGI-1']:
    if tempD1[label]=='tab:blue':
        tempD2[label] = plt.get_cmap('tab20')(1)
    elif tempD1[label]=='tab:orange':
        tempD2[label] = plt.get_cmap('tab20')(3)
    elif tempD1[label]=='tab:green':
        tempD2[label] = plt.get_cmap('tab20')(5)
    elif tempD1[label]=='tab:red':
        tempD2[label] = plt.get_cmap('tab20')(7)
    elif tempD1[label]=='tab:purple':
        tempD2[label] = plt.get_cmap('tab20')(9)
    elif tempD1[label]=='tab:olive':
        tempD2[label] = plt.get_cmap('tab20')(17)
    else:
        tempD2[label] = 'Error?'

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Rank '+str(rank_i+1)+' (sort by the main effect of '+variable+'):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    tempDF1 = tempDF1.loc[tempDF1['Template'].isin(tempD2.keys())]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Template', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Template'] = pd.Categorical(tempDF2['Template'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Template', 'Group']).set_index(['Template', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Retrieve statistical significance
    tempS = pvalDF_ref.loc[module, pvalDF_ref.columns.str.contains('-fixed_.*-vs-')]#Adjusted P-value
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='-fixed_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Template', 1:'Comparison'})
    tempS1 = tempDF2['Template']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Template'] = tempDF2['Template'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.4
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6.4, 5.875), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[3.2, 3.2]})
    for ax_i, ax in enumerate(axes.flat):
        template = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Template']==template]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Add border line
        ax.axvline(x=3.5, **{'linestyle':'dotted', 'color':'black', 'zorder':0})
        #Add RCI line (mean RMS of the group whose consensus is used as the template)
        rci = tempDF3['RMS'].loc[tempDF3['Group']==template].mean()
        ax.axhline(y=rci, **{'linestyle':'--', 'color':tempD1[template], 'zorder':0})
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Template']==template]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title('Consensus:\n'+template, {'fontsize':'small'})
        xoff = 0.015
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.16,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[template],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 60))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='baseline', horizontalalignment='center', wrap=True, y=0.935)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = 'fixed-RMSmean-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

#### 4-3-4. Modules of special interest
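Each plotting cell in this section turns adjusted P-values into star labels, or into a rounded `P = ...` string when the value is not below 0.05. The helper below is a small sketch of that mapping. The name `signif_label` is hypothetical and is not defined elsewhere in this notebook.

```python
from decimal import Decimal, ROUND_HALF_UP

def signif_label(pval):
    """Map an adjusted P-value to a star label or a rounded 'P = ...' string."""
    if pval < 0.001:
        return '***'
    elif pval < 0.01:
        return '**'
    elif pval < 0.05:
        return '*'
    # Round half up to two decimals; Python's built-in round() rounds halves to even
    pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return r'$P$ = ' + str(pval_text)

labels = [signif_label(p) for p in [0.0004, 0.004, 0.04, 0.125, 0.5]]
print(labels)
```

Going through `Decimal(str(pval))` keeps the two-decimal text stable for values such as 0.125, which then display as 0.13.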

In [None]:
plotL = ['GO:0006635', 'GO:0031998', 'GO:0016558', 'GO:0006625',#Common in M001
         'GO:0006637', 'GO:0006734', 'GO:0001561',#Related to fatty acid oxidation (mentioned in previous manuscript)
         'GO:0023035', 'GO:0039529',#Immune response (mentioned in previous manuscript)
         'GO:0010638', 'GO:0002181', 'GO:0045899',#Mentioned in previous manuscript
         'GO:0006635', 'GO:1990126', 'GO:0098761',#Common in proteomics and transcriptomics
         'GO:0061732', 'GO:0006086', 'GO:0019441',#Specific to proteomics (Aca)
         'GO:0031998', 'GO:0044794', 'GO:0006625', 'GO:0016558', 'GO:0019441', 'GO:0033572',#Specific to proteomics (Rapa)
         'GO:0006635', 'GO:0002181', 'GO:0000028', 'GO:0016558', 'GO:0006734', 'GO:0098761',#Common in M001 + M004
         'GO:0015986', 'GO:0042776', 'GO:0070934', 'GO:0006703', 'GO:0034354', 'GO:0045899']#Similar to 4EGI-1
plotL = list(set(plotL))
plotL.sort()
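for module in plotL:
    if module not in moduleDF.index:
        print(module+' was NOT included in this analysis.')
plotL = [module for module in plotL if module in moduleDF.index]#Filter afterwards; removing items while iterating skips elements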

#Prepare the target module set
##Spread statDF by template to flatten multi-index
tempDF = moduleDF['ModuleName']#pd.Series() for now
for template in ['M004:Cont', 'M004:4EGI']:
    tempDF1 = statDF.loc[template].drop(columns=['ModuleName'])
    tempDF1.columns = template+'-fixed_'+tempDF1.columns
    tempDF = pd.merge(tempDF, tempDF1, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[:, tempDF.columns.str.contains('Pval$')]
tempDF = pd.merge(moduleDF['ModuleName'], tempDF, left_index=True, right_index=True, how='left')
tempDF = tempDF.loc[plotL]
display(tempDF)

#Prepare DF for plot
tempDF = rmsDF.reset_index().melt(var_name='SampleID', value_name='RMS', id_vars=['ModuleID', 'Template'])
tempDF1 = sampleDF.reset_index()[['SampleID', 'Phenotype']]
tempDF = pd.merge(tempDF, tempDF1, on='SampleID', how='left')

#Prepare label and color
tempD = {'Control (M001)':'Control-1', 'Acarbose':'Acarbose',
         'Estradiol':'17'+r'$\alpha$'+'-Estradiol', 'Rapamycin':'Rapamycin',
         'Control (M004)':'Control-2', '4EGI-1':'4EGI-1'}
tempDF['Template'] = tempDF['Template'].map(tempD)
tempDF['Group'] = tempDF['Phenotype'].map(tempD)
tempD0 = {'M001:Cont':'Control-1', 'M001:Acar':'Acarbose',
          'M001:17aE':'17'+r'$\alpha$'+'-Estradiol', 'M001:Rapa':'Rapamycin',
          'M004:Cont':'Control-2', 'M004:4EGI':'4EGI-1'}
tempD1 = {'Control-1':'tab:blue', 'Acarbose':'tab:red',
          '17'+r'$\alpha$'+'-Estradiol':'tab:green', 'Rapamycin':'tab:purple',
          'Control-2':'tab:blue', '4EGI-1':'tab:orange'}
tempD2 = {}
for label in ['Control-2', '4EGI-1']:
    if tempD1[label]=='tab:blue':
        tempD2[label] = plt.get_cmap('tab20')(1)
    elif tempD1[label]=='tab:orange':
        tempD2[label] = plt.get_cmap('tab20')(3)
    elif tempD1[label]=='tab:green':
        tempD2[label] = plt.get_cmap('tab20')(5)
    elif tempD1[label]=='tab:red':
        tempD2[label] = plt.get_cmap('tab20')(7)
    elif tempD1[label]=='tab:purple':
        tempD2[label] = plt.get_cmap('tab20')(9)
    elif tempD1[label]=='tab:olive':
        tempD2[label] = plt.get_cmap('tab20')(17)
    else:
        tempD2[label] = 'Error?'

#Visualize each representative
for rank_i in range(len(plotL)):
    print(' - Module '+str(rank_i+1)+' of '+str(len(plotL))+' (sorted by module ID):')
    module = plotL[rank_i]
    #Check module summary
    tempDF1 = pd.DataFrame(moduleDF.loc[module]).T
    display(tempDF1)
    
    #Select RMS
    tempDF1 = tempDF.loc[tempDF['ModuleID']==module]
    tempDF1 = tempDF1.loc[tempDF1['Template'].isin(tempD2.keys())]
    
    #Check RMS summary
    tempDF2 = tempDF1.groupby(['Template', 'Group'])['RMS'].agg(['count', 'mean', 'std'])
    tempL1 = []
    tempL2 = []
    for row_n in tempDF2.index.tolist():
        count, mean, std = tempDF2.loc[row_n]
        tempL1.append(mean - 1.96*std/np.sqrt(count))
        tempL2.append(mean + 1.96*std/np.sqrt(count))
    tempDF2['0.025'] = tempL1
    tempDF2['0.975'] = tempL2
    ##Multiindex sort
    tempDF2 = tempDF2.reset_index()
    tempDF2['Template'] = pd.Categorical(tempDF2['Template'], categories=list(tempD2.keys()))
    tempDF2['Group'] = pd.Categorical(tempDF2['Group'], categories=list(tempD1.keys()))
    tempDF2 = tempDF2.sort_values(by=['Template', 'Group']).set_index(['Template', 'Group'])
    display(tempDF2)
    
    #Prepare significance labels
    ##Re-prepare p-values because these modules can be filtered out in pvalDF
    tempDF2 = moduleDF['ModuleName']#pd.Series() for now
    for template in ['M004:Cont', 'M004:4EGI']:
        tempDF3 = statDF.loc[template].drop(columns=['ModuleName'])
        tempDF3.columns = template+'-fixed_'+tempDF3.columns
        tempDF2 = pd.merge(tempDF2, tempDF3, left_index=True, right_index=True, how='left')
    tempDF2 = tempDF2.loc[:, tempDF2.columns.str.contains('-vs-.*_AdjPval$')]
    tempDF2.columns = tempDF2.columns.str.replace('_AdjPval$', '', regex=True)#Explicit regex=True; the default is False in pandas>=2.0
    ##Retrieve statistical significance
    tempS = tempDF2.loc[module]
    tempS.name = 'AdjPval'
    ##Clean
    tempDF2 = tempS.index.to_series().str.split(pat='-fixed_', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Template', 1:'Comparison'})
    tempS1 = tempDF2['Template']
    tempDF2 = tempDF2['Comparison'].str.split(pat='-vs-', expand=True)
    tempDF2 = tempDF2.rename(columns={0:'Contrast', 1:'Baseline'})
    tempDF2 = pd.merge(tempS1, tempDF2, left_index=True, right_index=True, how='left')
    tempDF2 = pd.merge(tempDF2, tempS, left_index=True, right_index=True, how='left')
    tempDF2['Template'] = tempDF2['Template'].map(tempD0)
    tempDF2['Contrast'] = tempDF2['Contrast'].map(tempD0)
    tempDF2['Baseline'] = tempDF2['Baseline'].map(tempD0)
    ##Convert p-value to label
    tempL = []
    for row_i in range(len(tempDF2)):
        pval = tempDF2['AdjPval'].iloc[row_i]
        if pval<0.001:
            tempL.append('***')
        elif pval<0.01:
            tempL.append('**')
        elif pval<0.05:
            tempL.append('*')
        else:
            pval_text = Decimal(str(pval)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            tempL.append(r'$P$ = '+str(pval_text))
    tempDF2['SignifLabel'] = tempL
    display(tempDF2)
    
    #Visualization
    ymax = 1.0
    ymin = 0.0
    yinter = 0.2
    ymargin_t = 0.4
    ymargin_b = 0.05
    aline_ymin = 1.0
    aline_ymargin = 0.125
    sns.set(style='ticks', font='Arial', context='talk')
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6.4, 5.875), sharex=True, sharey=True,
                             gridspec_kw={'width_ratios':[3.2, 3.2]})
    for ax_i, ax in enumerate(axes.flat):
        template = list(tempD2.keys())[ax_i]
        tempDF3 = tempDF1.loc[tempDF1['Template']==template]
        sns.pointplot(data=tempDF3, x='Group', y='RMS', order=list(tempD1.keys()), palette=tempD1,
                      markers='o', dodge=False, join=False, capsize=0.6, estimator=np.mean, ci=95, ax=ax)
        sns.stripplot(data=tempDF3, x='Group', y='RMS',
                      order=list(tempD1.keys()), palette=tempD1, dodge=False, jitter=0.15,
                      size=5, edgecolor='black', linewidth=1, **{'marker':'o', 'alpha':0.5}, ax=ax)
        #Add border line
        ax.axvline(x=3.5, **{'linestyle':'dotted', 'color':'black', 'zorder':0})
        #Add RCI line (mean RMS of the group whose consensus is used as the template)
        rci = tempDF3['RMS'].loc[tempDF3['Group']==template].mean()
        ax.axhline(y=rci, **{'linestyle':'--', 'color':tempD1[template], 'zorder':0})
        #Set axis
        sns.despine()
        ax.set(ylim=(ymin-ymargin_b, ymax+ymargin_t), yticks=np.arange(ymin, ymax + yinter/10, yinter))
        plt.setp(ax.get_xticklabels(), rotation=70, horizontalalignment='right',
                 verticalalignment='center', rotation_mode='anchor')
        if ax_i==0:
            plt.setp(ax, xlabel='', ylabel='Sample RMS')
        else:
            plt.setp(ax.get_yticklabels(), visible=False)
            plt.setp(ax, xlabel='', ylabel='')
        #Add significance labels
        tempDF3 = tempDF2.loc[tempDF2['Template']==template]
        for row_i in range(len(tempDF3)):
            #Baseline
            group_0 = tempDF3['Baseline'].iloc[row_i]
            index_0 = list(tempD1.keys()).index(group_0)
            xcoord_0 = index_0
            #Contrast
            group_1 = tempDF3['Contrast'].iloc[row_i]
            index_1 = list(tempD1.keys()).index(group_1)
            xcoord_1 = index_1
            #Standard point of marker
            xcoord = (xcoord_0+xcoord_1)/2
            ycoord = aline_ymin + aline_ymargin*row_i
            label = tempDF3['SignifLabel'].iloc[row_i]
            #Add annotation lines
            aline_offset = yinter/5
            aline_length = yinter/5 + aline_offset/2
            ax.plot([xcoord_0, xcoord_0, xcoord_1, xcoord_1],
                    [ycoord+aline_offset, ycoord+aline_length, ycoord+aline_length, ycoord+aline_offset],
                    lw=1.5, c='k')
            #Add annotation text
            if label in ['***', '**', '*']:
                text_offset = yinter/21
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='medium', color='k')
            else:
                text_offset = yinter/3.5
                ax.annotate(label, xy=(xcoord, ycoord+text_offset),
                            horizontalalignment='center', verticalalignment='bottom',
                            fontsize='x-small', color='k')
        #Add annotation
        ax.set_title('Consensus:\n'+template, {'fontsize':'small'})
        xoff = 0.015
        yoff = 0.01
        rect = plt.Rectangle((xoff, 1), 1-xoff, 0.16,#Manual adjustment
                             transform=ax.transAxes, facecolor=tempD2[template],
                             clip_on=False, linewidth=0, zorder=0.5)
        ax.add_patch(rect)
    fig.tight_layout()
    #Set title
    modulename = moduleDF.loc[module, 'ModuleName']
    initial = modulename[0].capitalize()
    title = re.sub('^.', initial, modulename)+' ('+module+')'
    title = '\n'.join(wrap(title, 60))#Because the below wrap=True didn't work
    fig.suptitle(title, size='small',
                 verticalalignment='baseline', horizontalalignment='center', wrap=True, y=0.935)
    ##Save
    fileDir = './ExportFigures/'
    ipynbName = '220529_LCproteomics-M004-DIRAC_DIRAC-GOBP-with-M001_ver2-3_'
    fileName = 'fixed-RMSmean-inter-group-comparison_RMS-pointplot-'+module.replace('GO:', 'GO')+'.tif'
    plt.gcf().savefig(fileDir+ipynbName+fileName, dpi=300, bbox_inches='tight', pad_inches=0.04,
                      pil_kwargs={'compression':'tiff_lzw'})
    plt.show()
    print('')

# — End of notebook —