example_workflow/example_workflow.Rmd

---
title: "Example of using mitch package for Infinium methylation analysis"
author: "The GMEA team"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    fig_width: 7
    fig_height: 7
theme: cosmo
bibliography: references.bib
csl: epigenetics.csl
---

## Introduction

Here we are looking at putting together the easiest possible workflow for the package.

Source code: https://github.com/markziemann/gmea/blob/main/example_workflow/example_workflow.Rmd

## Requirements

Load packages.

Important: ensure that the mitch version used is 1.15.1 or higher.

```{r,packages}

suppressPackageStartupMessages({
  library("limma")
  library("eulerr")
  library("IlluminaHumanMethylation450kanno.ilmn12.hg19")
  library("IlluminaHumanMethylationEPICanno.ilm10b4.hg19")
  library("HGNChelper")
  library("tictoc")
  library("mitch")
  library("kableExtra")
  library("beeswarm")
  library("missMethyl")
  library("gridExtra")
  library("png")
})

if( packageVersion("mitch") < "1.15.1") {
  warning("This workflow requires mitch version 1.15.1 or higher")
}

```

## Load methylation data

The limma[@Ritchie2015-zn] data table is read in.
This dataset is available from GEO (GSE49667)[@Zhang2013-oy].
The limma data was obtained using a previously prepared workflow [@Maksimovic2017-zv].

```{r,loaddata}

dm <- read.csv("DMPs.csv.gz",header=TRUE)

head(dm)

```

Note that this object has probes with their own column, not simply as row names.

## Load pathways

Gene ontologies were downloaded in GMT format from MSigDB on 15th Jan 2024[@Liberzon2015-um;@The_Gene_Ontology_Consortium2023-fm].
The GMT file is read into R using the mitch function `gmt_import()`.

```{r,loadpathways}

gene_sets <- gmt_import("c5.go.v2023.2.Hs.symbols.gmt")

```

## Curate the annotation

One of the critical parts of this workflow is the establishment of probe-gene relationships.
This controls how the probe data is aggregated to make the gene level scores.

In this demonstration we are using all probes on the HM450K chip.
Alternative approaches could use only promoters CpGs for example.

As these annotations are several years old, many of the annotated gene names are no longer
current.
To remedy this, the gene names are screened with the HGNChelper package and any defunct
symbols get updated to the newer gene name, so they will be recognised properly in the
gene sets.

The process for making a gene table for EPIC data is included down below.

```{r,anno1}

tic()
anno <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
myann <- data.frame(anno[,c("UCSC_RefGene_Name","UCSC_RefGene_Group","Islands_Name","Relation_to_Island")])
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
gp2 <- lapply(gp2,unique)
gt1 <- stack(gp2)
colnames(gt1) <- c("gene","probe")
gt1$probe <- as.character(gt1$probe)
dim(gt1)
str(gt1)
length(unique(gt1$gene))
toc() #6.0s

tic()
#new.hgnc.table <- getCurrentHumanMap()
new.hgnc.table <- readRDS("../figs/new.hgnc.table.rds")
fix <- checkGeneSymbols(gt1$gene,map=new.hgnc.table)
fix2 <- fix[which(fix$x != fix$Suggested.Symbol),]
length(unique(fix2$x))
gt1$gene <- fix$Suggested.Symbol
toc() #  36 sec (2788 gene names updated)

head(gt1)

```

## Mitch pipeline

The first part is to import the data into mitch.

This only works properly when mitch's mapGeneIds function uses `mean` to aggregate (line 69 of mitch.R).

```{r,mg1}

tic()
m <- mitch_import(x=dm,DEtype="limma",geneTable=gt1,geneIDcol="Name")
toc() #2.5 sec

head(m) %>%
  kbl(caption = "Example of differential gene methylation scores used for pathway analysis") %>%
  kable_paper("hover", full_width = F)

```

Now run the enrichment analysis.

Note that the results are not sorted by p-value, rather S.distance, an enrichment score.
I think this works better for interpretation.

```{r,mitch1}

tic()
mres <- mitch_calc(x=m,genesets=gene_sets,minsetsize=5, priority="effect")
toc() #7 secs

mtable <- mres$enrichment_result
up <- subset(mtable,s.dist>0 & p.adjustANOVA<0.05)
dn <- subset(mtable,s.dist<0 & p.adjustANOVA<0.05)
nrow(up)
nrow(dn)

head(up,10) %>%
  kbl(caption = "Top significant pathways with higher methylation") %>%
  kable_paper("hover", full_width = F)

head(dn,10) %>%
  kbl(caption = "Top significant pathways with lower methylation") %>%
  kable_paper("hover", full_width = F)

```

Now make a barplot of these top findings.

```{r, barplot}

top <- rbind(up[1:10,],dn[1:10,])
top <- top[order(top$s.dist),]

barnames <- gsub("_"," ",top$set)
cols <- as.character(((sign(top$s.dist)+1)/2)+1)
cols <- gsub("1","blue",gsub("2","red",cols))

par(mar = c(5.1, 29.1, 4.1, 2.1))
barplot(abs(top$s.dist),horiz=TRUE,las=1,names.arg=barnames, cex.names=0.8,
  cex.axis=0.8,col=cols, xlab="Enrichment score",main="Gene Ontologies")
mtext("whole gene")
grid()

par( mar = c(5.1, 4.1, 4.1, 2.1) )

```

There are some interesting results there that give hints to the differences between naive T and Treg cells.

Make a html report and some charts.

```{r,report1}

mitch_report(res=mres,outfile="example_mitchreport.html",overwrite=TRUE)
mitch_plots(res=mres,outfile="example_mitchcharts.pdf")

```

## Make gene table for EPIC array data

In case you are working with EPIC array data, the probe-gene table can be make with this snippet of code.

```{r,gt2}

tic()
anno <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
myann <- data.frame(anno[,c("UCSC_RefGene_Name","UCSC_RefGene_Group","Islands_Name","Relation_to_Island")])
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
gp2 <- lapply(gp2,unique)
gt <- stack(gp2)
colnames(gt) <- c("gene","probe")
gt$probe <- as.character(gt$probe)
dim(gt)
str(gt)
toc() #9.0s

tic()
#new.hgnc.table <- getCurrentHumanMap()
new.hgnc.table <- readRDS("../figs/new.hgnc.table.rds")
fix <- checkGeneSymbols(gt$gene,map=new.hgnc.table)
fix2 <- fix[which(fix$x != fix$Suggested.Symbol),]
length(unique(fix2$x))
gt$gene <- fix$Suggested.Symbol
toc()

```

## Session information

```{r,save}

sessionInfo()

```

## Bibliography