showteeth/DEbPeak

DEbPeak - Analyze and integrate multi-omics to unravel the regulation of gene expression.

Introduction

DEbPeak aims to explore, visualize, and interpret multi-omics data and to unravel the regulation of gene expression by combining RNA-seq with peak-related data (e.g. ChIP-seq, ATAC-seq, m6A-seq). It contains eleven functional modules:

  • Parse GEO: Extract study information, raw count matrix and metadata from GEO database.
  • Quality Control (QC): QC on count matrix and samples.
    • QC on count matrix: Proportion of genes detected in different samples under different CPM thresholds and the saturation of the number of genes detected.
    • QC on samples: Euclidean distance and Pearson correlation coefficient of samples across different conditions, sample similarity on selected principal components (to check batch information and conduct batch correction) and outlier detection with robust PCA.
  • Principal Component Analysis (PCA): this module is divided into three submodules: basic info, loading related and 3D visualization.
    • Basic info: scree plot (helps to select the useful PCs), biplot (sample similarity together with the genes that have larger loadings) and PC pairs plot (sample similarity under different PC combinations).
    • Loading related: visualize genes with larger positive and negative loadings on selected PCs, conduct GO enrichment analysis on genes with larger positive and negative loadings on selected PCs.
    • 3D visualization: visualize samples on three selected PCs.
  • Differential Analysis and Visualization: this module includes seven powerful visualization methods (Volcano Plot, Scatter Plot, MA Plot, Rank Plot, Gene/Peak Plot, Heatmap, Pie Plot for peak-related data).
  • Functional Enrichment Analysis (FEA): GO enrichment analysis, KEGG enrichment analysis, Gene Set Enrichment Analysis (GSEA).
    • GO (Biological Process, Molecular Function, Cellular Component) and KEGG on differential expression genes or accessible/binding peaks.
    • GSEA on all genes (Notice: GSEA is not available for peak-related data)
  • Predict transcription factors (PredictTFs): Identify transcription factors from differentially expressed genes. DEbPeak provides three methods (BART, ChEA3 and TFEA.ChIP).
  • Motif analysis:
    • de novo motif discovery
    • motif enrichment
  • Integrate RNA-seq with peak-related data:
    • Get consensus peaks: For multiple peak files, get consensus peaks; for single peak file, use it directly (used in consensus integration mode).
    • Peak profile plots: Heatmap of peak binding to TSS regions, Average Profile of ChIP peaks binding to TSS region, Profile of ChIP peaks binding to different regions (used in consensus integration mode).
    • Peak annotation (used in consensus integration mode).
    • Integrate RNA-seq with peak-related data (consensus mode): Integrate RNA-seq with peak-related data to find direct targets, including up-regulated and down-regulated.
    • Integrate RNA-seq with peak-related data (differential mode): Integrate RNA-seq and peak-related data based on differential analysis.
    • Integration summary: include venn diagram and quadrant diagram (differential mode).
    • GO enrichment on integrated results.
    • Find motif on integrated results: Due to the nature of ATAC-seq, we usually need to find motif on integrated results to obtain potential regulatory factors.
  • Integrate RNA-seq with RNA-seq:
    • Integration summary: include venn diagram and quadrant diagram.
    • GO enrichment on integrated results.
  • Integrate peak-related data with peak-related data:
    • Integration summary: include venn diagram and quadrant diagram (differential mode).
    • GO enrichment on integrated results.
  • Utils: useful functions, including creating enrichment plots for selected enrichment terms, gene ID conversion and count normalization (DESeq2’s median of ratios, TMM, CPM, TPM, RPKM).
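
To make the count normalization methods listed under Utils more concrete, here is a minimal base-R sketch of CPM, RPKM and TPM on a toy matrix. It is not DEbPeak's own implementation: the count values, gene lengths and variable names are all made up for illustration, and the DESeq2 median of ratios and TMM methods are not covered because they need the DESeq2 and edgeR packages.

```r
# Toy count matrix: 3 genes (rows) x 2 samples (columns); values are made up.
counts <- matrix(
  c(10, 20,
    30, 60,
    60, 120),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
# Hypothetical gene lengths in base pairs.
gene_length <- c(geneA = 1000, geneB = 2000, geneC = 3000)

# CPM: scale each sample (column) so that its counts sum to one million.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# RPKM: CPM divided by gene length in kilobases.
rpkm <- sweep(cpm, 1, gene_length / 1e3, "/")

# TPM: divide counts by gene length first, then scale each sample to one million.
rate <- sweep(counts, 1, gene_length / 1e3, "/")
tpm <- sweep(rate, 2, colSums(rate), "/") * 1e6

round(tpm, 1)
```

The difference between RPKM and TPM is only the order of the two steps: TPM normalizes by length before scaling, so the TPM values of every sample sum to one million, which is not guaranteed for RPKM.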

To make the tool easier to use, we have also developed a web server for DEbPeak that allows users to submit files through a web page and set parameters to get the desired results. The web server has DESeq2 built in for differential analysis, whereas the standalone R package accepts user-supplied results from either DESeq2 or edgeR and is therefore more flexible.

All generated plots are publication-ready, and most of them are based on ggplot2, so users can easily modify them according to their needs. We also provide various color palettes, including discrete and continuous palettes, color-blind-friendly palettes and palettes for multiple categorical variables.


Framework

[Figure: DEbPeak framework]

Application scenarios for multi-omics integration

[Figure: DEbPeak application scenarios]

Installation

R package

You can install the package from the GitHub repository:

# install.packages("devtools")   # in case you have not installed it

# install prerequisites for enrichplot and ChIPseeker
devtools::install_version("ggfun", version = "0.0.6", repos = "https://cran.r-project.org")
devtools::install_version("aplot", version = "0.1.6", repos = "https://cran.r-project.org")
devtools::install_version("scatterpie", version = "0.1.7", repos = "https://cran.r-project.org")

# On macOS, you may need to install XQuartz: brew install --cask xquartz

# install DEbPeak
devtools::install_github("showteeth/DEbPeak")

In general, we recommend installing from the GitHub repository, which is updated most promptly.

For other installation issues, please refer to the Installation guide.

Install additional tools:

# install MSPC --- consensus peak
wget --quiet https://github.com/Genometric/MSPC/releases/latest/download/linux-x64.zip -O MSPC_linux_x64.zip && unzip -q MSPC_linux_x64.zip -d mspc && cd mspc && chmod +x mspc

# install meme --- motif analysis
## install from source (installs into /opt/meme-5.5.5/meme, the MEME path listed in the Docker notes)
cd /opt && wget --quiet https://meme-suite.org/meme/meme-software/5.5.5/meme-5.5.5.tar.gz -O meme-5.5.5.tar.gz && tar -zxf meme-5.5.5.tar.gz && cd meme-5.5.5 && ./configure --prefix="$(pwd)/meme" --enable-build-libxml2 --enable-build-libxslt && make && make install
## install from conda: conda install -c bioconda meme

# install homer --- motif enrichment
## install from source
mkdir homer && cd homer && wget --quiet http://homer.ucsd.edu/homer/configureHomer.pl -O configureHomer.pl && chmod +x configureHomer.pl && perl configureHomer.pl -install
## install from conda: conda install -c bioconda homer
## Downloading Homer Packages: http://homer.ucsd.edu/homer/introduction/install.html

# install deeptools and bart
pip install deeptools numpy pandas scipy tables scikit-learn matplotlib
wget --quiet https://virginia.box.com/shared/static/031noe820hk888qzcxvw1cazol1gdhi0.gz -O bart_v2.0.tar.gz && tar -zxf bart_v2.0.tar.gz
## Download the resources and setup the configuration file
## https://zanglab.github.io/bart/index.htm#install

Docker

We also provide a Docker image:

# pull the image
docker pull soyabean/debpeak:1.2

# run the image
docker run --rm -p 8888:8787 -e PASSWORD=passwd -e ROOT=TRUE -it soyabean/debpeak:1.2

Notes:

  • After running the commands above, open a browser and go to http://localhost:8888/. The user name is rstudio and the password is passwd (set by -e PASSWORD=passwd).
  • If port 8888 is in use, change the host port (the first number) in -p 8888:8787, for example -p 8889:8787, and open http://localhost:8889/ instead.
  • The MEME Suite path: /opt/meme-5.5.5/meme/bin.
  • The HOMER path: /opt/homer/bin.
  • The configureHomer.pl path: /opt/homer.
  • The BART path: /opt/bart_v2.0/bin.
  • You still need to download the resources and set up the configuration file for BART, and download the species packages for HOMER.
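
Whether the external tools were installed manually or come with the Docker image, it can help to check from R which of them are visible on the PATH before running the corresponding modules. The sketch below uses only base R. The executable names are assumptions based on the tools named in this README (MSPC, STREME and Tomtom from MEME, HOMER and BART2) and may differ on your system.

```r
# Executable names are assumptions based on the tools named in this README.
tools <- c("mspc", "streme", "tomtom", "findMotifsGenome.pl", "bart2")

# Sys.which() returns the full path of each executable, or "" if it is not on PATH.
found <- Sys.which(tools)

# Summarize the result as a small table.
tool_status <- data.frame(
  tool = tools,
  path = unname(found),
  available = nzchar(found)
)
print(tool_status)
```

An empty path only means that the tool was not found on the PATH of the current R session. It may still be installed in one of the directories listed above.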

Usage

Vignette

Detailed usage is available here. The vignettes are divided into four categories.

Function list

| Type | Function | Description | Key packages |
| --- | --- | --- | --- |
| Parse GEO | ParseGEO | Extract study information, raw count matrix and metadata from GEO database | GEOquery |
| Quality Control | CountQC | Quality control on count matrix (gene detection sensitivity and sequencing depth saturation) | NOISeq |
| | SampleRelation | Quality control on samples (sample clustering based on Euclidean distance and Pearson correlation coefficient) | stats |
| | OutlierDetection | Detect outliers with robust PCA | rrcov |
| | QCPCA | PCA-related functions used in quality control (batch detection and correction, outlier detection) | stats, sva, rrcov |
| Principal Component Analysis | PCA | Conduct principal component analysis | stats |
| | PCABasic | Generate basic PCA plots, including scree plot, biplot and pairs plot | PCAtools |
| | ExportPCGenes | Export genes of selected PCs | tidyverse |
| | LoadingPlot | PCA loading plot, including bar plot and heatmap | ggplot2, ComplexHeatmap |
| | LoadingGO | GO enrichment on PC’s loading genes | clusterProfiler |
| | PCA3D | Create 3D PCA plot | plot3D |
| Differential Analysis | ExtractDA | Extract differential analysis results | tidyverse |
| | VolcanoPlot | Volcano plot for differential analysis results | ggplot2 |
| | ScatterPlot | Scatter plot for differential analysis results | ggplot2 |
| | MAPlot | MA plot for differential analysis results | ggplot2 |
| | RankPlot | Rank plot for differential analysis results | ggplot2 |
| | GenePlot | Gene expression or peak accessibility/binding plot | ggplot2 |
| | DEHeatmap | Heatmap for differential analysis results | ComplexHeatmap |
| | DiffPeakPie | Summarize genomic regions of differential peaks with a pie plot | ggpie |
| | ConductDESeq2 | Conduct differential analysis with DESeq2 | NOISeq, stats, sva, rrcov, PCAtools, DESeq2, ggplot2, ComplexHeatmap, clusterProfiler, plot3D, tidyverse |
| Functional Enrichment Analysis | ConductFE | Conduct functional enrichment analysis (GO and KEGG) | clusterProfiler |
| | ConductGSEA | Conduct gene set enrichment analysis (GSEA) | clusterProfiler |
| | VisGSEA | Visualize GSEA results | enrichplot |
| Predict Transcription Factors | InferRegulator | Predict TFs from RNA-seq data with ChEA3, BART2 and TFEA.ChIP | ChEA3, BART2, TFEA.ChIP |
| | VizRegulator | Visualize the identified TFs | ggplot2 |
| Motif Analysis | MotifEnrich | Motif enrichment for differentially accessible/binding peaks | HOMER |
| | MotifDiscovery | de novo motif discovery with STREME | MEME |
| | MotifCompare | Map motifs against a motif database with Tomtom | MEME |
| Peak-related Analysis | PeakMatrix | Prepare count matrix and sample metadata for peak-related data | DiffBind, ChIPseeker |
| | GetConsensusPeak | Get consensus peaks from replicates | MSPC |
| | PeakProfile | Visualize peak accessibility/binding profile | ChIPseeker |
| | AnnoPeak | Assign peaks to genomic binding regions and nearby genes | ChIPseeker |
| | PeakAnnoPie | Visualize peak annotation results with pie plot | ggpie |
| Integrate RNA-seq with Peak-related Data | DEbPeak | Integrate differential expression results and peak annotation/differential analysis results | tidyverse |
| | DEbPeakFE | GO enrichment on integrated results | clusterProfiler |
| | DEbCA | Integrate differential expression results and peak annotation results (two kinds of peak-related data) | tidyverse |
| | ProcessEnhancer | Get genes near differential peaks | IRanges |
| | InteVenn | Create a Venn diagram for integrated results (supports DEbPeak, DEbDE, PeakbPeak) | ggvenn |
| | InteDiffQuad | Create quadrant diagram for differential expression analysis of RNA-seq and peak-related data | ggplot2 |
| | NetViz | Visualize enhancer-gene network results | igraph, ggnetwork |
| | FindMotif | Find motifs on integrated results | HOMER |
| Integrate RNA-seq with RNA-seq | DEbDE | Integrate two differential expression results | tidyverse |
| | DEbDEFE | GO enrichment on the integration results of two differential expression analyses | clusterProfiler |
| Integrate Peak-related Data with Peak-related Data | PeakbPeak | Integrate two peak annotation/differential analysis results | tidyverse |
| | PeakbPeakFE | GO enrichment on the integration results of two peak annotation/differential analyses | clusterProfiler |
| Utils | EnrichPlot | Create a bar or dot plot for selected functional enrichment analysis results (GO and KEGG) | ggplot2 |
| | IDConversion | Gene ID conversion between ENSEMBL, ENTREZID and SYMBOL | clusterProfiler |
| | GetGeneLength | Get gene length from GTF | GenomicFeatures, GenomicRanges |
| | NormalizedCount | Perform count normalization (DESeq2’s median of ratios, TMM, CPM, RPKM, TPM) | DESeq2, edgeR, tidyverse |
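
To illustrate the idea behind the differential integration mode and its quadrant diagram (DEbPeak and InteDiffQuad in the function list above), the following base-R sketch labels genes in two differential result tables and combines the labels. It does not use DEbPeak's functions or reproduce their arguments: the data, the column names (`gene`, `log2FC`, `padj`), the helper `label_change` and the cutoffs are all hypothetical.

```r
# Hypothetical differential results; gene names and values are made up.
rna <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FC = c(2.1, -1.8, 1.5, -2.2, 0.1),
  padj = c(0.001, 0.01, 0.02, 0.003, 0.9)
)
peak <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g6"),
  log2FC = c(1.7, -1.2, -1.9, 1.4, 2.0),
  padj = c(0.01, 0.02, 0.01, 0.04, 0.001)
)

# Label each row as Up, Down or NS (not significant) using illustrative cutoffs.
label_change <- function(df, lfc = 1, alpha = 0.05) {
  ifelse(df$padj < alpha & df$log2FC >= lfc, "Up",
    ifelse(df$padj < alpha & df$log2FC <= -lfc, "Down", "NS"))
}
rna$rna_change <- label_change(rna)
peak$peak_change <- label_change(peak)

# Keep genes present in both data sets and combine the two labels.
merged <- merge(
  rna[, c("gene", "rna_change")],
  peak[, c("gene", "peak_change")],
  by = "gene"
)
merged$quadrant <- paste0(merged$rna_change, "_", merged$peak_change)

# Number of genes per RNA/peak combination.
table(merged$quadrant)
```

Each combination of labels corresponds to one region of a quadrant diagram. For example, `Up_Up` collects genes that are up-regulated in RNA-seq and also gain peak signal.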

Notice

  • The KEGG API has changed. To perform KEGG enrichment, update clusterProfiler to version 4.7.1 or later.
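
A quick way to check the installed clusterProfiler version from R is sketched below. The helper name `kegg_ready` is hypothetical and not part of DEbPeak. It relies only on base R's `requireNamespace()` and `utils::packageVersion()`.

```r
# Minimum clusterProfiler version mentioned in the notice above.
min_version <- "4.7.1"

# TRUE only if the package is installed and at least `min_ver`.
kegg_ready <- function(pkg = "clusterProfiler", min_ver = min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_ver
}

if (!kegg_ready()) {
  message("Please update clusterProfiler to >= ", min_version,
          " before running KEGG enrichment.")
}
```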

Contact

For any questions, feature requests or bug reports, please send an email to songyb0519@gmail.com.


Code of Conduct

Please note that the DEbPeak project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.