oCEM: Automatic detection and analysis of overlapping co-expressed gene modules

I. Introduction

Co-expressed gene module detection faces two typical challenges: overlap between identified modules, and local co-expression that appears only in a subset of biological samples. Recent studies have reported that decomposition methods are the most appropriate for addressing these challenges. In this study, we present an R tool, termed Overlapping CoExpressed gene Module (oCEM), which implements those methods within a fully automatic analysis framework so that non-technical users can easily perform complicated statistical analyses and obtain robust results. We also develop a novel auxiliary statistical approach that selects the optimal number of principal components using a permutation procedure. Three example datasets, related to human breast cancer, mouse metabolic syndrome, and an E. coli gene expression compendium, illustrate the straightforward use of the tool. Computational experiments show that oCEM outperforms state-of-the-art techniques in its ability to detect biologically relevant modules.
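The idea of choosing the number of components by permutation can be illustrated with a generic sketch. This is not oCEM's own procedure (that lives in optimizeCOM); it is a common parallel-analysis-style scheme, shown here only as an assumption about what such a test can look like. The function name select_ncomp_perm and all of its arguments are hypothetical.

```r
# Hypothetical helper, not part of oCEM: permutation-based choice of the
# number of principal components (a parallel-analysis-style scheme).
select_ncomp_perm <- function(data, n_perm = 50, alpha = 0.05, seed = 1) {
  set.seed(seed)
  data <- scale(as.matrix(data))  # samples x genes, standardized per gene

  # Proportion of variance explained by each principal component.
  var_explained <- function(m) {
    sdev <- prcomp(m, center = FALSE, scale. = FALSE)$sdev
    sdev^2 / sum(sdev^2)
  }
  observed <- var_explained(data)

  # Null distribution: shuffle every gene (column) independently, which
  # destroys gene-gene correlation but keeps each gene's own distribution.
  null <- replicate(n_perm, var_explained(apply(data, 2, sample)))
  threshold <- apply(null, 1, quantile, probs = 1 - alpha)

  # Keep leading components until the first one that fails the threshold.
  keep <- observed > threshold
  n <- if (all(keep)) length(keep) else which(!keep)[1] - 1
  max(n, 1)
}

# Toy data: two latent factors drive two groups of genes, plus pure noise genes.
set.seed(42)
n <- 40
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- cbind(
  sapply(1:10, function(i) f1 + rnorm(n, sd = 0.3)),
  sapply(1:10, function(i) f2 + rnorm(n, sd = 0.3)),
  matrix(rnorm(n * 10), n, 10)
)
n_selected <- select_ncomp_perm(toy)
print(n_selected)
```

For real analyses, the number of components should come from optimizeCOM itself, as shown in the Implementation section.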

II. Understanding the tool and Data Structure

The following are the parameters of overlapCEM and their roles:

data: a data frame or matrix. Rows are samples and columns are genes.
clinical: a data frame or matrix. Used to compute Pearson's correlations between each identified module and each clinical feature. Rows are samples and columns are clinical features of your choice. Optional as of 2023-10-13 (see What's new).
ncomp: positive integer. The optimal number of principal components. It should be >= 1.
standardize: logical. If your data are not standardized, set this parameter to T or TRUE. Default value is T.
method: string. Post-processing methods. Allowed values are ICA-FDR, ICA-Zscore, or IPCA-FDR. Default value is ICA-Zscore.
cex.text: numeric. Change the font size of texts in cells of the heatmap showing correlations between each identified module and each clinical feature. Default value is 0.7.
verbose: logical. Show the time taken to run the whole pipeline. Default value is T.
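As a minimal sketch of the input layout described above, the toy objects below follow the samples-in-rows convention. The object names (expr_toy, clin_toy) and the check_inputs helper are hypothetical and not part of oCEM. Matching row names between the two inputs is a precaution assumed here, not a documented requirement.

```r
set.seed(1)
samples <- paste0("sample", 1:6)
genes   <- paste0("gene", 1:4)

# Expression input: rows are samples, columns are genes.
expr_toy <- matrix(rnorm(6 * 4), nrow = 6, dimnames = list(samples, genes))

# Clinical input: rows are the same samples, columns are clinical features.
clin_toy <- data.frame(age        = c(45, 52, 61, 38, 70, 55),
                       tumor_size = c(1.2, 2.5, 3.1, 0.9, 4.0, 2.2),
                       row.names  = samples)

# Hypothetical sanity check, not an oCEM function.
check_inputs <- function(data, clinical = NULL) {
  stopifnot(is.matrix(data) || is.data.frame(data))
  stopifnot(all(vapply(as.data.frame(data), is.numeric, logical(1))))
  if (!is.null(clinical)) {
    stopifnot(nrow(clinical) == nrow(data))
    stopifnot(identical(rownames(clinical), rownames(data)))
  }
  invisible(TRUE)
}

check_inputs(expr_toy, clin_toy)

# If a matrix arrives as genes x samples, transpose it before use.
genes_by_samples <- t(expr_toy)
expr_fixed <- t(genes_by_samples)
```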

Please download the datasets in data_n_code and read Additional File 1 (highly recommended) for examples of oCEM's easy-to-meet input format and its usage.

III. Pipeline

Figure: Pipeline of the package oCEM.

IV. Implementation

Use the following command to install directly from GitHub:

devtools::install_github("huynguyen250896/oCEM", dependencies = T)

Load the necessary libraries:

x = c("oCEM", "dplyr", "dynamicTreeCut", "flashClust","Hmisc",
  "WGCNA", "moments", "fastICA", "tidyr", "fdrtool", "mixOmics",
  "cluster", "purrr", "parallel")
lapply(x, require, character.only = TRUE)
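Because require() returns FALSE with a warning instead of stopping when a package is absent, it can help to list missing packages first. The sketch below uses only base R; the variable names are illustrative.

```r
pkgs <- c("oCEM", "dplyr", "dynamicTreeCut", "flashClust", "Hmisc",
          "WGCNA", "moments", "fastICA", "tidyr", "fdrtool", "mixOmics",
          "cluster", "purrr", "parallel")

# requireNamespace() checks availability without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```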

Running example:

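# Select the optimal number of principal components (here on 5 cores)
num_pc <- optimizeCOM(data = exp, cores = 5)
# >> oCEM suggests choosing the optimal number of components is: 9
# >> oCEM also suggests using ICA for your case.

# Detect overlapping co-expressed modules with the selected number of components
cem <- overlapCEM(data = exp, clinical = clinicalEXP, ncomp = num_pc)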
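The module-to-clinical-feature correlations that overlapCEM reports are Pearson correlations. The sketch below shows that kind of computation with base R on simulated data. It assumes each module has already been summarized as one score per sample; how oCEM derives such summaries internally is not shown here, and module_scores and clinical_toy are hypothetical objects.

```r
set.seed(7)
n <- 30

# Hypothetical per-sample module summaries (samples x modules).
module_scores <- matrix(rnorm(n * 3), n, 3,
                        dimnames = list(NULL, paste0("module", 1:3)))

# Hypothetical clinical features; 'age' is built to track module1.
clinical_toy <- data.frame(age   = 50 + 5 * module_scores[, 1] + rnorm(n),
                           grade = rnorm(n))

# Pearson correlation of every module with every clinical feature.
r_mat <- cor(module_scores, as.matrix(clinical_toy), method = "pearson")

# Matching p-values, one cor.test() per module/feature pair.
p_mat <- sapply(clinical_toy, function(feature) {
  apply(module_scores, 2, function(score) cor.test(score, feature)$p.value)
})

print(round(r_mat, 2))
print(signif(p_mat, 2))
```

Both results are modules x features matrices, which is the shape a correlation heatmap would display.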

V. What's new

2023-10-13: Previously, users had to supply both the mRNA and clinical data to overlapCEM for the tool to run, which was a bad decision on my part. The clinical data are now optional, so the mRNA data are the only required input. I also refactored the code comprehensively to make it more readable (and, hopefully, faster). In addition, the method parameter of optimizeCOM now defaults to ICA-Zscore, so that life scientists and bioinformaticians need not wonder which option to select, and so that oCEM's performance can be compared with that of other tools. This decision was based on the results of a wonderful paper [1].
2023-10-08: Users can now set the number of cores used by the optimizeCOM algorithm through its new argument cores, so the algorithm can run in parallel and return the optimal number of PCs more quickly. Unfortunately, this feature is not yet available to Windows users.
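Fork-based parallelism in R's parallel package (for example mclapply) is not available on Windows, which is one plausible reason for this limitation; whether optimizeCOM relies on it internally is an assumption. The sketch below shows a hypothetical helper, safe_cores, that picks a usable value before it is passed to a cores argument.

```r
library(parallel)

# Hypothetical helper, not part of oCEM: pick a usable number of cores.
safe_cores <- function(requested) {
  if (.Platform$OS.type == "windows") return(1L)  # no forking on Windows
  available <- detectCores()
  if (is.na(available)) return(1L)
  # Leave one core free, but never go below one.
  as.integer(max(1, min(requested, available - 1)))
}

cores <- safe_cores(5)

# mclapply() forks worker processes; with mc.cores = 1 it runs serially.
squares <- mclapply(1:8, function(i) i^2, mc.cores = cores)
print(unlist(squares))

# The value could then be passed on, e.g.:
# num_pc <- optimizeCOM(data = exp, cores = cores)
```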

VI. Citation

Please kindly cite the following paper if you use the tool in this repo (and star this GitHub repository if you find the tool of interest):

Reference Type: Journal Article
Author: Nguyen, Quang-Huy; Le, Duc-Hau
Year: 2022
Title: oCEM: Automatic detection and analysis of overlapping co-expressed gene modules
Journal: BMC Genomics
Volume: 23
Issue: 1
Pages: 39
Date: 2022/01/08
ISSN: 1471-2164
DOI: 10.1186/s12864-021-08072-5

Feel free to contact Quang-Huy Nguyen <huynguyen96.dnu AT gmail DOT com> for any questions about the code and results.
