# GEM Reconstruction with LC M001-related Transcriptomics — Enrichment Analysis

***by Kengo Watanabe***  

Priyanka Baloni reconstructed mouse genome-scale metabolic models (GEMs; Khodaee, S. et al. Sci. Rep. 2020) with the preprocessed Longevity Consortium (LC) M001-related transcriptomics dataset (Tyshkovskiy, A. et al. Cell Metab. 2019; adjusted for sex and age), and calculated maximum flux values using flux variability analysis (FVA).  
–> This Jupyter Notebook (with R kernel) performs the enrichment analysis on the potentially changed reactions using the GEM subsystem annotations.  

Input files:  
- Reaction metadata: 230502_LC-M001-related-TrOmics-GEM-ver3-15_FluxAnalysis_reaction-metadata.xlsx  
- Flux data (assessed reactions): 230502_LC-M001-related-TrOmics-GEM-ver3-15_FluxAnalysis_average-flux-data_selected.tsv  
- Flux comparison result: 230502_LC-M001-related-TrOmics-GEM-ver3-15_FluxAnalysis_flux-comparison_vs-each-control.tsv  

Output figures and tables:  
- Figure 6b, c  
- Supplementary Data 8  

Original notebook (memo for my future tracing):  
- dalek:\[JupyterLab HOME\]/230315_LC-M001-related-TrOmics-GEM-ver3/230503_LC-M001-related-TrOmics-GEM-ver3-15_Enrichment.ipynb  

> I don't know exactly since when, but at least since early Feb 2022, BiocManager::install("clusterProfiler") fails with a non-zero exit status error at a sub-dependency, the tidygraph package.  
> –> After trial and error, the following code, which downgrades Bioconductor to release 3.13 via BiocManager, worked! Of note, the Bioconductor release installed at that time (Jun 16, 2022) was 3.14.  

>> if (!require("BiocManager", quietly = TRUE))  
>>     install.packages("BiocManager", version="3.13")  
>> BiocManager::install(version="3.13", ask=FALSE)  
>> BiocManager::install("clusterProfiler")  

> –> After running this code, library() raises an rlang error, but restarting the R kernel resolved it. Also, even though Bioconductor was downgraded, clusterProfiler was updated from version 4.0.5 (Feb 2022) to version 4.2.2.  
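
To trace which versions are actually installed after a workaround like the one above, a short check along the following lines may help. This is only a sketch: `report_version` is a hypothetical helper that is not part of the original notebook, and it assumes the Bioconductor release can be read from the installed BiocVersion package.

```r
# Hypothetical helper: installed version of a package, or NA if it is absent
report_version <- function(pkg) {
    if (requireNamespace(pkg, quietly=TRUE)) {
        as.character(utils::packageVersion(pkg))
    } else {
        NA_character_
    }
}

# Packages mentioned in the memo above, plus BiocVersion (assumed to track the Bioconductor release)
pkgs <- c("BiocVersion", "clusterProfiler", "tidygraph", "rlang")
versions <- vapply(pkgs, report_version, character(1))
print(versions)
```

Packages that are not installed show up as `NA`, so the check runs without error in any environment.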

In [1]:
library("tidyverse")
options(repr.plot.width=5, repr.plot.height=5)#Default=7x7

#Bioconductor
for (package in c("clusterProfiler", "enrichplot")) {
    #if (!requireNamespace("BiocManager", quietly=TRUE))
    #    install.packages("BiocManager")
    #BiocManager::install(package)
    eval(bquote(library(.(package))))
    print(str_c(package, ": ", as.character(packageVersion(package))))
}
#CRAN
for (package in c("readxl", "openxlsx")) {
    #install.packages(package)
    eval(bquote(library(.(package))))
    print(str_c(package, ": ", as.character(packageVersion(package))))
}

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



Registered S3 method overwritten by 'ggtree':
  method      from 
  identify.gg ggfun

clusterProfiler v4.2.2  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/

If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfi

[1] "clusterProfiler: 4.2.2"
[1] "enrichplot: 1.14.2"
[1] "readxl: 1.4.2"
[1] "openxlsx: 4.2.5.2"


## 1. All the assessed reactions

In [None]:
#Import reaction metadata
fileDir <- "./ExportData/"
ipynbName <- "230502_LC-M001-related-TrOmics-GEM-ver3-15_FluxAnalysis_"
fileName <- "reaction-metadata.xlsx"
sheetName <- "Reaction"
temp <- read_excel(str_c(fileDir,ipynbName,fileName), sheet=sheetName)

print(str_c("nrow: ",nrow(temp)))
head(temp)
print(str_c("Unique reaction: ", length(unique(temp$ReactionID))))
print(str_c("Unique subsystem: ", length(unique(temp$Subsystem))))

meta_tbl <- temp

In [None]:
#Import the assessed reactions
fileDir <- "./ExportData/"
ipynbName <- "230502_LC-M001-related-TrOmics-GEM-ver3-15_FluxAnalysis_"
fileName <- "average-flux-data_selected.tsv"
temp <- read_delim(str_c(fileDir,ipynbName,fileName), delim="\t")

#Take the successfully calculated reactions
temp <- meta_tbl %>%
    dplyr::filter(ReactionID %in% temp$ReactionID)

print(str_c("nrow: ",nrow(temp)))
head(temp)
print(str_c("Unique reaction: ", length(unique(temp$ReactionID))))
print(str_c("Unique subsystem: ", length(unique(temp$Subsystem))))

bgd_tbl <- temp

## 2. Changed reactions

> To investigate the changes as a system, changed reactions are selected at this step based on the nominal P-value, not the adjusted P-value.  
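
The difference between the two criteria can be illustrated with a toy example. The P-values and reaction names below are made up for illustration and are not taken from the flux comparison table. The Benjamini-Hochberg adjustment is used here only because it is the method passed to `enricher()` later in this notebook.

```r
# Toy nominal P-values for six hypothetical reactions (not from the dataset)
toy_pvals <- c(R1=0.001, R2=0.012, R3=0.030, R4=0.049, R5=0.200, R6=0.700)

# Benjamini-Hochberg adjustment across the six toy tests
toy_adj <- p.adjust(toy_pvals, method="BH")

# Reactions retained under each criterion at the 0.05 threshold
nominal_hits <- names(toy_pvals)[toy_pvals < 0.05]
adjusted_hits <- names(toy_adj)[toy_adj < 0.05]
print(nominal_hits)
print(adjusted_hits)
```

In this toy case the nominal threshold keeps R1 to R4, whereas the adjusted threshold keeps only R1 and R2. The nominal criterion therefore passes a larger reaction set to the subsystem-level analysis.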

In [None]:
#Import the summary tables
fileDir <- "./ExportData/"
ipynbName <- "230502_LC-M001-related-TrOmics-GEM-ver3-15_FluxAnalysis_"
fileName <- "flux-comparison_vs-each-control.tsv"
temp <- read_delim(str_c(fileDir,ipynbName,fileName), delim="\t")
print(str_c("Original nrow: ",nrow(temp)))

#Clean
temp1 <- str_subset(names(temp), "_Pval")
temp <- temp %>%
    dplyr::select(ReactionID, all_of(temp1))
names(temp) <- str_replace(names(temp), "_Pval", "")

print(str_c("nrow: ",nrow(temp)))
head(temp)
print(str_c("Unique reaction: ", length(unique(temp$ReactionID))))

pval_tbl <- temp

## 3. Enrichment Analysis
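
As background for this section, `enricher()` performs an over-representation (hypergeometric) test for each term, here each subsystem. The sketch below reproduces that calculation for a single subsystem in base R. All four counts are hypothetical and chosen for illustration only. `k/n` and `M/N` correspond to the GeneRatio and BgRatio columns, which are renamed to Ratio2ChangedRxns and Ratio2BGs in the exported tables.

```r
# Toy counts (hypothetical, not from the dataset)
N <- 1000  # assessed reactions annotated to any subsystem (background)
M <- 40    # background reactions belonging to the subsystem of interest
n <- 100   # changed reactions mapped to any subsystem
k <- 12    # changed reactions belonging to the subsystem of interest

# Over-representation P-value: probability of observing k or more by chance
pval <- phyper(k - 1, M, N - M, n, lower.tail=FALSE)

# Equivalent one-sided Fisher's exact test on the corresponding 2x2 table
tab <- matrix(c(k, M - k, n - k, N - M - n + k), nrow=2)
pval_fisher <- fisher.test(tab, alternative="greater")$p.value

print(c(GeneRatio=k / n, BgRatio=M / N))
print(c(phyper=pval, fisher=pval_fisher))
```

The two P-values agree because the one-sided Fisher's exact test is the same hypergeometric tail probability.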

### 3-1. Intervention vs. each control

#### 3-1-0. Save result objects

In [None]:
comparison_vec <- str_subset(names(pval_tbl), "-vs-")

#Create a workbook object to save as one single .xlsx file
workbook <- createWorkbook()

#Summarize results per comparison
nRxns_vec <- c()
nMappedRxns_vec <- c()
nRxnSystems_vec <- c()
res_list <- list()
for (i in seq_along(comparison_vec)) {
    #Prepare input for enricher()
    comparison <- comparison_vec[i]
    rxns <- pval_tbl %>%
        dplyr::filter(!!as.name(comparison)<0.05) %>%
        .$ReactionID
    print(str_c(comparison,": ",length(rxns)," changed reactions"))
    bgds <- bgd_tbl %>%
        dplyr::select(Subsystem, ReactionID)
    labels <- bgd_tbl %>%
        dplyr::mutate(Label=Subsystem) %>%#Dummy in this case
        dplyr::select(Subsystem, Label) %>%
        dplyr::distinct()
    
    #Save info
    nRxns_vec <- c(nRxns_vec, length(rxns))
    temp <- bgds %>%
        dplyr::filter(ReactionID %in% rxns)
    nMappedRxns_vec <- c(nMappedRxns_vec, length(unique(temp$ReactionID)))
    nRxnSystems_vec <- c(nRxnSystems_vec, length(unique(temp$Subsystem)))
    
    #Enrichment analysis
    temp <- enricher(gene=rxns,
                     pvalueCutoff=1.0,#To export all
                     pAdjustMethod="BH",
                     #universe=backgrounds,#Already managed
                     minGSSize=4,
                     maxGSSize=10000,
                     qvalueCutoff=1.0,#To export all
                     TERM2GENE=bgds,
                     TERM2NAME=labels)
    
    #Add the summary table to the workbook object as an independent sheet
    if (is.data.frame(temp[])) {
        temp1 <- tibble(temp[]) %>%
            dplyr::select(-Description) %>%
            dplyr::rename(Subsystem=ID, Ratio2ChangedRxns=GeneRatio, Ratio2BGs=BgRatio,
                          Pval=pvalue, AdjPval=p.adjust, Qval=qvalue, MappedChangedRxn=geneID, nMappedChangedRxns=Count)
    } else {
        temp1 <- tibble(`n/a`=NA)
    }
    addWorksheet(workbook, sheetName=comparison)
    writeData(workbook, comparison, temp1)
    
    #Add result object to list
    res_list <- c(res_list, list(temp))
}

#Save the workbook as one single .xlsx file
fileDir <- "./ExportData/"
ipynbName <- "230503_LC-M001-related-TrOmics-GEM-ver3-15_Enrichment_"
fileName <- "clusterProfiler-results.xlsx"
saveWorkbook(workbook, file=str_c(fileDir,ipynbName,fileName), overwrite=TRUE)

print(str_c("nObjects: ", as.character(length(res_list))))

#### 3-1-1. Aca

In [None]:
obj_i <- 1
figtitle <- "Acarbose vs. Control"

#Retrieve results
comparison <- comparison_vec[obj_i]
nRxns <- nRxns_vec[obj_i]
nMappedRxns <- nMappedRxns_vec[obj_i]
nRxnSystems <- nRxnSystems_vec[obj_i]
res <- res_list[[obj_i]]

#Check
print(comparison)
print(str_c(" - # of the changed reactions: ",as.character(nRxns)))
print(str_c(" - # of the changed reactions that were mapped to any subsystem: ",as.character(nMappedRxns)))
print(str_c(" - # of subsystems having any changed reactions as a member: ",as.character(nRxnSystems)))
res
print(" <- Note that the above 'X enriched terms found' is not correct. In this case, X indicates the number of all the tested terms.")
if (is.data.frame(res[])) {
    tibble(res[]) %>%
        dplyr::filter(pvalue<0.05) %>%#Display only nominal P-value < 0.05
        dplyr::select(-Description) %>%
        dplyr::rename(Subsystem=ID, Ratio2ChangedRxns=GeneRatio, Ratio2BGs=BgRatio,
                      Pval=pvalue, AdjPval=p.adjust, Qval=qvalue, MappedChangedRxn=geneID, nMappedChangedRxns=Count)
}

#Visualization
if (length(tibble(res[]))>0) {
    display <- tibble(res[]) %>%
        dplyr::filter(pvalue<0.05) %>%
        nrow()
} else {
    display <- 0
}
if (display>0) {
    temp <- res %>%
        dplyr::filter(pvalue<0.05) %>%#Display only nominal P-value < 0.05
        dplyr::mutate(PvalLabel=str_c("AdjPval = ",scales::scientific(p.adjust, digits=2)),
                      AdjSignif=ifelse(p.adjust<0.05, "True", "False")) %>%
        barplot(., x="Count", color="p.adjust", showCategory=display) +
        geom_text(aes(label=PvalLabel, color=AdjSignif), nudge_x=2.5, hjust=0) +
        coord_cartesian(clip="off") +
        scale_x_continuous(limits=c(0, 110), breaks=seq(0, 100, by=25), expand=c(0, 0)) +
        scale_y_discrete(labels=function(x) {str_wrap(x, width=50)}) +
        scale_fill_viridis_c(begin=0, end=1, direction=1, option="plasma",
                             limits=c(0, 0.1), breaks=seq(0, 0.1, by=0.025), name="AdjPval") +
        scale_color_manual(values=c("True"="#990000", "False"="gray40")) +
        guides(fill=guide_colorbar(reverse=TRUE), color="none") +
        labs(x="Count of the changed reactions",
             y="", title=str_c("Enriched subsystems: ",figtitle)) +
        theme_classic(base_size=16, base_family="Helvetica") +
        theme(text=element_text(face="plain", color="black", family="Helvetica")) +
        theme(axis.text.x=element_text(face="plain", color="black", family="Helvetica"),
              axis.text.y=element_text(face="plain", color="black", family="Helvetica"),
              axis.title=element_text(face="plain", color="black", family="Helvetica")) +
        theme(plot.title=element_text(size=18, hjust=1.0)) +
        theme(legend.direction="vertical", legend.box="horizontal",
              legend.background=element_blank())
    options(repr.plot.width=8.25, repr.plot.height=max(c(1+display*0.25, 2.5)))
    plot(temp)
    #Save
    fileDir <- "./ExportFigures/"
    ipynbName <- "230503_LC-M001-related-TrOmics-GEM-ver3-15_Enrichment_"
    fileName <- str_c(comparison,".pdf")
    ggsave(file=str_c(fileDir,ipynbName,fileName), plot=temp,
           width=8.25, height=max(c(1+display*0.25, 2.5)), units="in")
    #(Font family is not reflected in JupyterLab output, but correctly done in .pdf file.)
}

#### 3-1-2. Rapa

In [None]:
obj_i <- 2
figtitle <- "Rapamycin vs. Control"

#Retrieve results
comparison <- comparison_vec[obj_i]
nRxns <- nRxns_vec[obj_i]
nMappedRxns <- nMappedRxns_vec[obj_i]
nRxnSystems <- nRxnSystems_vec[obj_i]
res <- res_list[[obj_i]]

#Check
print(comparison)
print(str_c(" - # of the changed reactions: ",as.character(nRxns)))
print(str_c(" - # of the changed reactions that were mapped to any subsystem: ",as.character(nMappedRxns)))
print(str_c(" - # of subsystems having any changed reactions as a member: ",as.character(nRxnSystems)))
res
print(" <- Note that the above 'X enriched terms found' is not correct. In this case, X indicates the number of all the tested terms.")
if (is.data.frame(res[])) {
    tibble(res[]) %>%
        dplyr::filter(pvalue<0.05) %>%#Display only nominal P-value < 0.05
        dplyr::select(-Description) %>%
        dplyr::rename(Subsystem=ID, Ratio2ChangedRxns=GeneRatio, Ratio2BGs=BgRatio,
                      Pval=pvalue, AdjPval=p.adjust, Qval=qvalue, MappedChangedRxn=geneID, nMappedChangedRxns=Count)
}

#Visualization
if (length(tibble(res[]))>0) {
    display <- tibble(res[]) %>%
        dplyr::filter(pvalue<0.05) %>%
        nrow()
} else {
    display <- 0
}
if (display>0) {
    temp <- res %>%
        dplyr::filter(pvalue<0.05) %>%#Display only nominal P-value < 0.05
        dplyr::mutate(PvalLabel=str_c("AdjPval = ",scales::scientific(p.adjust, digits=2)),
                      AdjSignif=ifelse(p.adjust<0.05, "True", "False")) %>%
        barplot(., x="Count", color="p.adjust", showCategory=display) +
        geom_text(aes(label=PvalLabel, color=AdjSignif), nudge_x=2.5, hjust=0) +
        coord_cartesian(clip="off") +
        scale_x_continuous(limits=c(0, 110), breaks=seq(0, 100, by=25), expand=c(0, 0)) +
        scale_y_discrete(labels=function(x) {str_wrap(x, width=50)}) +
        scale_fill_viridis_c(begin=0, end=1, direction=1, option="plasma",
                             limits=c(0, 0.1), breaks=seq(0, 0.1, by=0.025), name="AdjPval") +
        scale_color_manual(values=c("True"="#990000", "False"="gray40")) +
        guides(fill=guide_colorbar(reverse=TRUE), color="none") +
        labs(x="Count of the changed reactions",
             y="", title=str_c("Enriched subsystems: ",figtitle)) +
        theme_classic(base_size=16, base_family="Helvetica") +
        theme(text=element_text(face="plain", color="black", family="Helvetica")) +
        theme(axis.text.x=element_text(face="plain", color="black", family="Helvetica"),
              axis.text.y=element_text(face="plain", color="black", family="Helvetica"),
              axis.title=element_text(face="plain", color="black", family="Helvetica")) +
        theme(plot.title=element_text(size=18, hjust=1.0)) +
        theme(legend.direction="vertical", legend.box="horizontal",
              legend.background=element_blank())
    options(repr.plot.width=8.25, repr.plot.height=max(c(1+display*0.25, 2.5)))
    plot(temp)
    #Save
    fileDir <- "./ExportFigures/"
    ipynbName <- "230503_LC-M001-related-TrOmics-GEM-ver3-15_Enrichment_"
    fileName <- str_c(comparison,".pdf")
    ggsave(file=str_c(fileDir,ipynbName,fileName), plot=temp,
           width=8.25, height=max(c(1+display*0.25, 2.5)), units="in")
    #(Font family is not reflected in JupyterLab output, but correctly done in .pdf file.)
}

#### 3-1-3. CRdiet

In [None]:
obj_i <- 3
figtitle <- "CR diet vs. Control"

#Retrieve results
comparison <- comparison_vec[obj_i]
nRxns <- nRxns_vec[obj_i]
nMappedRxns <- nMappedRxns_vec[obj_i]
nRxnSystems <- nRxnSystems_vec[obj_i]
res <- res_list[[obj_i]]

#Check
print(comparison)
print(str_c(" - # of the changed reactions: ",as.character(nRxns)))
print(str_c(" - # of the changed reactions that were mapped to any subsystem: ",as.character(nMappedRxns)))
print(str_c(" - # of subsystems having any changed reactions as a member: ",as.character(nRxnSystems)))
res
print(" <- Note that the above 'X enriched terms found' is not correct. In this case, X indicates the number of all the tested terms.")
if (is.data.frame(res[])) {
    tibble(res[]) %>%
        dplyr::filter(pvalue<0.05) %>%#Display only nominal P-value < 0.05
        dplyr::select(-Description) %>%
        dplyr::rename(Subsystem=ID, Ratio2ChangedRxns=GeneRatio, Ratio2BGs=BgRatio,
                      Pval=pvalue, AdjPval=p.adjust, Qval=qvalue, MappedChangedRxn=geneID, nMappedChangedRxns=Count)
}

#Visualization
if (length(tibble(res[]))>0) {
    display <- tibble(res[]) %>%
        dplyr::filter(pvalue<0.05) %>%
        nrow()
} else {
    display <- 0
}
if (display>0) {
    temp <- res %>%
        dplyr::filter(pvalue<0.05) %>%#Display only nominal P-value < 0.05
        dplyr::mutate(PvalLabel=str_c("AdjPval = ",scales::scientific(p.adjust, digits=2)),
                      AdjSignif=ifelse(p.adjust<0.05, "True", "False")) %>%
        barplot(., x="Count", color="p.adjust", showCategory=display) +
        geom_text(aes(label=PvalLabel, color=AdjSignif), nudge_x=2.5, hjust=0) +
        coord_cartesian(clip="off") +
        scale_x_continuous(limits=c(0, 110), breaks=seq(0, 100, by=25), expand=c(0, 0)) +
        scale_y_discrete(labels=function(x) {str_wrap(x, width=50)}) +
        scale_fill_viridis_c(begin=0, end=1, direction=1, option="plasma",
                             limits=c(0, 0.1), breaks=seq(0, 0.1, by=0.025), name="AdjPval") +
        scale_color_manual(values=c("True"="#990000", "False"="gray40")) +
        guides(fill=guide_colorbar(reverse=TRUE), color="none") +
        labs(x="Count of the changed reactions",
             y="", title=str_c("Enriched subsystems: ",figtitle)) +
        theme_classic(base_size=16, base_family="Helvetica") +
        theme(text=element_text(face="plain", color="black", family="Helvetica")) +
        theme(axis.text.x=element_text(face="plain", color="black", family="Helvetica"),
              axis.text.y=element_text(face="plain", color="black", family="Helvetica"),
              axis.title=element_text(face="plain", color="black", family="Helvetica")) +
        theme(plot.title=element_text(size=18, hjust=1.0)) +
        theme(legend.direction="vertical", legend.box="horizontal",
              legend.background=element_blank())
    options(repr.plot.width=6.3, repr.plot.height=max(c(1+display*0.25, 2.5)))
    plot(temp)
    #Save
    fileDir <- "./ExportFigures/"
    ipynbName <- "230503_LC-M001-related-TrOmics-GEM-ver3-15_Enrichment_"
    fileName <- str_c(comparison,".pdf")
    ggsave(file=str_c(fileDir,ipynbName,fileName), plot=temp,
           width=6.3, height=max(c(1+display*0.25, 2.5)), units="in")
    #(Font family is not reflected in JupyterLab output, but correctly done in .pdf file.)
}

# — Session information —

In [9]:
sessionInfo()

R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/envs/arivale-r/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] openxlsx_4.2.5.2      readxl_1.4.2          enrichplot_1.14.2    
 [4] clusterProfiler_4.2.2 forcats_0.5.1         stringr_1.4.0        
 [7] dplyr_1.0.9           purrr_0.3.4           readr_2.1.2          
[10] tidyr_1.2.0           tibble_3.1.7          ggplot2_3.3.6        
[13] tidyverse_1.3.1      

lo