epzjlm edited this page Jan 18, 2024 · 46 revisions

Welcome to meffil!

Min JL, Hemani G, Davey Smith G, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics. 2018 Jun 21.

This wiki will guide you through processing and analyzing 450k and 850k (EPIC) methylation arrays.

Please make sure you have installed version 1.1.0 or higher. You can check the version like this:

packageVersion("meffil")
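
To turn the version check into an explicit pass/fail test, a minimal sketch along the following lines may help. It uses only base R; the `required` variable and the wording of the messages are illustrative and not part of meffil.

```r
# Minimum meffil version assumed by this wiki (taken from the text above).
required <- package_version("1.1.0")

if (!requireNamespace("meffil", quietly = TRUE)) {
  # packageVersion() would raise an error if the package were missing,
  # so test for its presence first.
  message("meffil is not installed; see the Installation page.")
} else {
  installed <- packageVersion("meffil")
  if (installed < required) {
    message("meffil ", as.character(installed), " is older than ",
            as.character(required), "; please update.")
  } else {
    message("meffil ", as.character(installed), " meets the requirement.")
  }
}
```

The comparison relies on `package_version` objects, which compare version components numerically (so 1.10.0 counts as newer than 1.9.0), rather than on string comparison.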

Meffil has been optimised for speed and memory use. Instructions for each step of the workflow can be found on the following pages:

  1. Installation
  2. Sample QC
  3. Functional normalization
  4. Functional normalizing separate datasets
  5. Extracting structural variants
  6. Estimating cellular composition
  7. Removing chrX and chrY probes
  8. Running EWAS
  9. Extracting CpG annotations
  10. Extracting SNP annotations
  11. Extracting detection p-values
  12. Extracting methylated and unmethylated intensities
  13. Generate normalization report from normalised betas
  14. Full pipeline for analysing massive datasets
  15. Common problems
  16. Citation

In addition, the tests directory has several complete analysis examples applied to publicly available datasets.

  1. Normalization of a small GEO dataset (GSE86831)
  2. Normalization of a slightly larger GEO dataset (GSE55491) and calculation of cell count estimates
  3. Normalization of an Illumina EPIC dataset and identification of CpG sites in popular DNA methylation age models that are missing from the EPIC microarray
  4. Normalization of Illumina 450K and EPIC microarrays together
  5. Normalization of an Illumina EPIC v2 dataset
  6. Normalization and EWAS of a cord blood dataset and comparison of cord cell type references
  7. Saving normalized data to disk to minimize memory requirements and performing a basic EWAS without loading the full dataset into memory
  8. Comprehensive comparison to minfi outputs
  9. Adjusting for random effects in functional normalization
  10. QC, normalization and EWAS without loading the full dataset into memory