Reproducible analysis code for a systematic benchmark of six DNA methylation profiling technologies across diverse sequencing platforms.
MethylBench systematically compares six widely used DNA methylation profiling technologies:
| Technology | Type | CpG Coverage |
|---|---|---|
| Illumina EPIC array | Array-based | ~850k CpGs |
| TWIST Methylation Panel | Targeted short-read | 2–4 million CpGs |
| Whole-Genome Enzymatic Conversion (WGEC) | Genome-wide short-read | 28–30 million CpGs |
| Reduced Representation Bisulfite Sequencing (RRBS) | Enrichment-based short-read | 1–4 million CpGs |
| Oxford Nanopore Technologies (ONT) | Long-read | 28–30 million CpGs |
| Pacific Biosciences (PacBio) | Long-read | 28–30 million CpGs |
Analyses were performed on matched blood and fibroblast samples from five individuals and on two Genome in a Bottle (GIAB) reference samples (HG001, HG002).
MethylBench/
│
├── README.md
├── LICENSE
├── .gitignore
│
├── scripts/
│   ├── bash/
│   │   ├── 01_modkit_pileup.sh  # modkit: extract methylation information from aligned .bam files.
│   │   ├── 02_toulligqc.sh  # ToulligQC: run QC analysis on aligned .bam files and create intermediary files.
│   │   ├── 03_pbcpgtools.sh  # pb-cpg-tools: extract methylation information from PacBio alignment files.
│   │   └── 04_methylseq.sh  # nf-core/methylseq: run the Nextflow methylseq pipeline for standard short-read data processing.
│   ├── python/
│   │   └── 05_parse_toulligqc.py  # Summarize ToulligQC .data files into one QC table.
│   └── R/
│       ├── 06_visualize_toulligqc_summary.R  # Visualization for ONT QC reports.
│       ├── 07_generate_cpg_stats.R  # Summarize CpG information for further analysis.
│       ├── 08_qc_visualization.R  # QC visualization, Figure 3.
│       ├── 09_correlation_analysis.R  # Correlation analysis, Figure 4.
│       ├── 10_density_plots.R  # Methylation density analysis, Figure 5.
│       ├── 11_pca.R  # Principal component analysis, Figure 6.
│       ├── 12_differential_methylation.R  # Differential methylation analysis, Figure 7.
│       ├── 13_annotation.R  # DMC annotation, Figure S14.
│       ├── limma_diff_meth.R  # limma-based differential methylation functions (needed for some figures in 12_differential_methylation.R).
│       └── helpers.R  # Helper functions.
│
└── envs/
    ├── environment.yml  # Base environment: tools and Python utilities.
    ├── ont.yml  # ONT-related tools: modkit, ToulligQC.
    ├── pacbio.yml  # PacBio-specific tool: pb-cpg-tools.
    └── r_analysis.yml  # R packages for the R analysis.
- Conda >= 23.x
- Nextflow >= 20.x
- Singularity >= 3.x
- R >= 4.3
- Python >= 3.10
All tool-specific dependencies are managed via Conda environments defined in envs/.
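Before creating the environments, it can help to confirm that the base tools listed above are reachable from the shell. The following is a minimal sketch, not part of the repository: the executable names in `REQUIRED_TOOLS` are assumptions about a typical installation, and the check only tests presence on `PATH`, not the minimum versions.

```python
import shutil

# Executable names are assumptions about a typical installation;
# adjust them if your system uses different names.
REQUIRED_TOOLS = ["conda", "nextflow", "singularity", "Rscript", "python3"]


def find_missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

Version requirements still need to be checked by hand, for example with each tool's own version flag.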
# Clone repository
git clone https://github.com/epigenetics-sb/MethylBench
cd MethylBench
# Create environments
cd envs/
conda env create -f environment.yml
conda env create -f ont.yml
conda env create -f pacbio.yml
conda env create -f r_analysis.yml
cd ..
################################################################################################
# NOTE: Please check the scripts for proper argument structure and folder setup for all scripts!
################################################################################################
# Run preprocessing steps
# Each bash script documents its arguments; run it with --help to see the expected usage.
conda activate ont
bash scripts/bash/01_modkit_pileup.sh --help
bash scripts/bash/02_toulligqc.sh --help
conda deactivate
conda activate pacbio
bash scripts/bash/03_pbcpgtools.sh --help
conda deactivate
conda activate methylbench
bash scripts/bash/04_methylseq.sh --help
# Run tool QC visualization
python3 scripts/python/05_parse_toulligqc.py
# Run actual analysis in R
Rscript scripts/R/06_visualize_toulligqc_summary.R
Rscript scripts/R/07_generate_cpg_stats.R
Rscript scripts/R/08_qc_visualization.R
Rscript scripts/R/09_correlation_analysis.R
Rscript scripts/R/10_density_plots.R
Rscript scripts/R/11_pca.R
Rscript scripts/R/limma_diff_meth.R
Rscript scripts/R/12_differential_methylation.R
Rscript scripts/R/13_annotation.R

Processed methylation matrices are available from the corresponding author upon request.
GIAB reference samples (HG001/NA12878, HG002/NA24385) including PacBio methylation data are publicly available via the PacBio website.
Two pre-computed summary tables are required as input for the downstream R analysis scripts. Both are provided in data/stats/:
Per-sample counts of overlapping CpGs at increasing coverage thresholds, computed across all methods simultaneously. This table is generated programmatically from the raw per-CpG methylation files:
Rscript scripts/R/07_generate_cpg_stats.R \
--samplesheet [samplesheet.tsv] \
--datadir data/ \
--outdir data/stats/

Per-sample × per-method QC metrics. This table was assembled manually by extracting summary statistics from the QC reports of each tool:
| Column | Description | Source tool / file |
|---|---|---|
| Mean_meth_general | Global mean methylation | Bismark summary report / modkit stats |
| Mean_meth_overlapped | Mean methylation at overlapping CpGs | Computed from merged matrices |
| Mean_meth_10x | Mean methylation at ≥10× CpGs | Computed from merged matrices |
| Passed_reads | Fraction of reads passing QC | nf-core/methylseq MultiQC report (RRBS/WGEC/TWIST), ToulligQC report (ONT) |
| Mean_readlength_passed | Mean read length of passing reads | MultiQC (short-read), ToulligQC .data report (ONT) |
| Mean_Cov | Mean CpG-level coverage | Bismark coverage report / modkit / pb-cpg-tools |
| Insert_size | Mean insert size (TWIST only) | Picard InsertSizeMetrics via MultiQC |
| Unique_alignments | Number of uniquely aligned reads | Bismark alignment report / modkit stats |
| Mean_Genome_Cov | Mean genome-wide coverage (ONT only) | samtools coverage summary |
Note: Because qc_stats_methylbench.tab aggregates heterogeneous per-tool reports that do not share a common machine-readable format, it was compiled manually and is not auto-generated by this pipeline.
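Since the QC table is compiled by hand, a quick header check before running the R scripts may catch a missing or misspelled column. The sketch below is an illustration, not part of the repository. The column names come from the table above; the tab-separated layout, the helper name `missing_qc_columns`, and the extra identifier columns and values in the demo are assumptions.

```python
import csv
import io

# Column names taken from the QC table description; the tab-separated
# layout and any identifier columns (e.g. sample, method) are assumptions.
EXPECTED_COLUMNS = [
    "Mean_meth_general", "Mean_meth_overlapped", "Mean_meth_10x",
    "Passed_reads", "Mean_readlength_passed", "Mean_Cov",
    "Insert_size", "Unique_alignments", "Mean_Genome_Cov",
]


def missing_qc_columns(handle, expected=EXPECTED_COLUMNS):
    """Return expected column names absent from the header of a TSV file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Tiny in-memory stand-in with made-up values; in practice, open
    # data/stats/qc_stats_methylbench.tab instead.
    demo = io.StringIO(
        "Sample\tMethod\tMean_meth_general\tMean_Cov\n"
        "S1\tONT\t0.78\t21.4\n"
    )
    print("Missing columns:", missing_qc_columns(demo))
```

An empty result means every documented column is present; it says nothing about whether the values themselves are plausible.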
The figure-generating R scripts in scripts/R/ each correspond directly to a figure in the manuscript:
# Figure 3 – QC metrics and coverage
Rscript scripts/R/08_qc_visualization.R
# Figure 4 – Cross-platform correlation
Rscript scripts/R/09_correlation_analysis.R
# Figure 5 – Methylation density distributions
Rscript scripts/R/10_density_plots.R
# Figure 6 – Principal component analysis
Rscript scripts/R/11_pca.R
# Figure 7 – Differential methylation analysis
Rscript scripts/R/12_differential_methylation.R
# Figure S14 – DMC annotation
Rscript scripts/R/13_annotation.R

All scripts expect preprocessed (methylation) matrices as input.
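To illustrate the kind of computation these inputs feed, here is a minimal sketch of counting CpGs shared across methods at increasing coverage thresholds, the quantity that 07_generate_cpg_stats.R summarizes. It is not the repository's R implementation. The input layout (one coverage dictionary per method), the function name `overlap_counts`, the thresholds, and the demo values are all assumptions chosen for illustration.

```python
# Hypothetical input for one sample: method name -> {(chromosome, position): coverage}.
def overlap_counts(coverage_by_method, thresholds=(1, 5, 10)):
    """Count CpGs covered at >= threshold in every method, per threshold."""
    counts = {}
    for threshold in thresholds:
        shared = None
        for cpg_coverage in coverage_by_method.values():
            # CpGs in this method that reach the coverage threshold.
            passing = {cpg for cpg, cov in cpg_coverage.items() if cov >= threshold}
            # Intersect with the CpGs that passed in all previous methods.
            shared = passing if shared is None else shared & passing
        counts[threshold] = len(shared) if shared is not None else 0
    return counts


if __name__ == "__main__":
    # Made-up coverage values for two methods.
    demo = {
        "ONT": {("chr1", 100): 12, ("chr1", 200): 4, ("chr1", 300): 30},
        "WGEC": {("chr1", 100): 15, ("chr1", 200): 9, ("chr2", 50): 11},
    }
    print(overlap_counts(demo))  # -> {1: 2, 5: 1, 10: 1}
```

Intersecting across all methods at once, rather than pairwise, mirrors the "computed across all methods simultaneously" description: a CpG counts only if every method reaches the threshold at that site.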
This project is licensed under the MIT License – see LICENSE for details.
For questions regarding the analysis code, please open a GitHub issue or contact the corresponding author:
- Julia Schulze-Hentrich – Department of Genetics, Saarland University
- Lukas Laufer – Department of Genetics, Saarland University