## Meta Analysis of expression data

Meta-analysis combines the results of the individual datasets from the expression analysis, producing a combined effect size and p value for each gene across datasets. Genes are then ranked by this global p value, using the rank function from base R.

The MetaDE R library (slightly modified) is used to perform this analysis. The algorithm takes as input the p values, the observed effect sizes (logFC values) and the observed variances. More details below.


In [1]:
# Import pipeline functions for meta analysis
source("scripts/metaDE.R")

"replacing previous import 'limma::plotMA' by 'DESeq2::plotMA' when loading 'MetaDE'"


### 1. Prepare datasets for meta analysis

Creates a dataframe containing the estimators needed for MetaDE: effect size (ES), variance and p value. The input is the result of a differential analysis (e.g. a limma table). It must contain the ES (= logFC), the p value and the variance (or the SE (Standard Error) or confidence intervals from which the variance can be calculated). If the meta analysis only combines p values, the p value alone is enough.

The variance is calculated as SE^2, where SE is the width of the 95% confidence interval (upper limit minus lower limit) divided by 3.92 (2 x 1.96).
Variance and/or SE can come from other differential expression algorithms; modify prepare_matrix_function as needed.

- 1st parameter: path where the input dataset is located
- 2nd parameter: filename of input dataset. The one containing limma/differential analysis results
- 3rd parameter: type of dataframe to be prepared. "onlyP" if you want to consider only P values to do the meta analysis. By default it calculates all estimators.
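
The variance calculation described above can be sketched as follows. This is a minimal illustration on a made-up table, not the code of prepare_matrix_function: the gene names and values are invented, and the CI.L / CI.R column names are assumed to follow what limma's topTable adds when called with confint = TRUE.

```r
# Toy limma-style table (values are made up for illustration).
# CI.L and CI.R are the lower and upper 95% confidence limits of logFC.
limma_like <- data.frame(
  logFC   = c(0.296, -0.500),
  CI.L    = c(0.100, -0.892),
  CI.R    = c(0.492, -0.108),
  P.Value = c(0.003, 0.012),
  row.names = c("GENE1", "GENE2")
)

# SE = CI width / 3.92, where 3.92 is roughly 2 * qnorm(0.975)
se <- (limma_like$CI.R - limma_like$CI.L) / 3.92

# Estimators in the same layout as the prepared datasets: ES, Var, P.Value, gene
estimators <- data.frame(
  ES      = limma_like$logFC,
  Var     = se^2,
  P.Value = limma_like$P.Value,
  gene    = rownames(limma_like),
  row.names = rownames(limma_like),
  stringsAsFactors = FALSE
)
print(estimators)
```

Note that limma derives its intervals from a moderated t distribution, so dividing by 3.92 is a normal approximation that is closest when the residual degrees of freedom are large.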

In [3]:
file.gse15222 = "limma_Case_Control_annot"
path.gse15222 = "/mnt/data/GWES/Microarray/output/GSE15222"
data.gse15222<-prepare_matrix_function(path.gse15222,file.gse15222)

In [4]:
head(data.gse15222,n=3)

               ES         Var      P.Value   gene
            <dbl>       <dbl>        <dbl>  <chr>
ZNF264  0.9020393 0.004864041 1.618642e-31 ZNF264
NFKB1   0.4974747    0.001849 1.932669e-26  NFKB1
SVOP   -0.7692022 0.004666865 2.547848e-25   SVOP


In [5]:
file.gse48350 = "DE_hippocampus_casectrl_annot" 
path.gse48350 = "/mnt/data/GWES/Microarray/output/GSE48350/"
data.gse48350 <- prepare_matrix_function(path.gse48350,file.gse48350)

In [6]:
head(data.gse48350,n=3)
# if the function is called with more than one input file
# str(data.gse48350)
# head(data.gse48350[[1]])
# head(data.gse48350$EC)

               ES        Var      P.Value   gene
            <dbl>      <dbl>        <dbl>  <chr>
OPA1    -0.639591 0.01142902 8.460282e-08   OPA1
INPP5F -0.8587221 0.02068575 8.871989e-08 INPP5F
SYT13  -1.1375045 0.03908325 2.068461e-07  SYT13


* **Combine all prepared datasets**

Use parameter all = FALSE if you only want common genes to appear in the final ranking.

In [7]:
# Reduce can be used to merge more than two dataframes
allmatrix<-Reduce(function(x, y) merge(x, y, all=TRUE,by="gene"), list(data.gse15222,data.gse48350))
#allmatrix=merge(data.gse15222,data.gse48350,by="gene", all = TRUE) # merge can be used for just two datasets
dim(allmatrix)
head(allmatrix)

      gene       ES.x      Var.x P.Value.x        ES.y       Var.y   P.Value.y
     <chr>      <dbl>      <dbl>     <dbl>       <dbl>       <dbl>       <dbl>
1     A1BG         NA         NA        NA -0.01313243 0.003461993 0.820553511
2 A1BG-AS1         NA         NA        NA  0.06210514 0.006111045 0.420610767
3     A1CF         NA         NA        NA  0.05883921 0.002338179  0.21906956
4      A2M 0.08804011 0.02103422 0.5427902  0.25420002 0.021272891 0.080343603
5  A2M-AS1         NA         NA        NA  0.35336664 0.016523331 0.006771993
6    A2ML1         NA         NA        NA  0.17755806 0.003958125 0.005511303
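
To make the effect of the all parameter concrete, here is a small sketch on two made-up dataframes. The helper name merge_by_gene and its keep_all argument are hypothetical and only wrap the Reduce/merge call used above; the values are invented.

```r
# Toy stand-ins for two prepared datasets (values are made up)
d1 <- data.frame(gene = c("A2M", "NFKB1"), ES = c(0.09, 0.50),
                 Var = c(0.020, 0.002), P.Value = c(0.54, 1e-5),
                 stringsAsFactors = FALSE)
d2 <- data.frame(gene = c("A2M", "OPA1"), ES = c(0.25, -0.64),
                 Var = c(0.021, 0.011), P.Value = c(0.08, 1e-7),
                 stringsAsFactors = FALSE)

# Hypothetical helper: merge a list of prepared datasets on the gene column
merge_by_gene <- function(datasets, keep_all = TRUE) {
  Reduce(function(x, y) merge(x, y, by = "gene", all = keep_all), datasets)
}

union_genes  <- merge_by_gene(list(d1, d2), keep_all = TRUE)   # every gene, NA where absent
common_genes <- merge_by_gene(list(d1, d2), keep_all = FALSE)  # only genes present in both

print(union_genes)
print(common_genes)
```

With more than two datasets, the automatic .x / .y suffixes that merge adds stop identifying the source dataset unambiguously, so renaming the columns per dataset before merging may be worth considering.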


### 2. Perform meta analysis

**meta_function** performs a meta analysis using the MetaDE.ES and MetaDE.pvalue algorithms from the MetaDE R library. It is customised so that it also returns an estimate when a gene is not present in all datasets. More details are in the script itself, scripts/MetaDE.ES_custom.R.
 
It ranks the resulting genes in ascending order of Fisher p value; genes whose p value is NA all receive the same rank.

- 1st parameter: single dataframe merging datasets obtained from prepare_matrix_function
- 2nd parameter: optional - key name to add to logFC column in result table (e.g case-control)
- 3rd parameter: optional - path where to store the output file (defaults to the current directory)
- 4th parameter: optional - output file name
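
For readers unfamiliar with effect-size meta analysis, the sketch below shows a textbook random-effects combination (DerSimonian-Laird) for a single gene. It is an assumption that this resembles what the customised MetaDE.ES code does; the function name rem_combine is hypothetical, and the handling of missing datasets (simply dropping NA entries) is only one possible choice. See scripts/MetaDE.ES_custom.R for the actual behaviour.

```r
# Hypothetical helper: combine effect sizes (es) and variances (v) of ONE gene
# across datasets with a DerSimonian-Laird random-effects model.
# NA entries (gene absent from a dataset) are dropped before combining.
rem_combine <- function(es, v) {
  ok <- !is.na(es) & !is.na(v)
  es <- es[ok]
  v  <- v[ok]
  k  <- length(es)
  if (k == 0) {
    return(c(ES = NA_real_, Var = NA_real_, Qpvalue = NA_real_,
             Pvalue = NA_real_, n = 0))
  }
  w        <- 1 / v                          # inverse-variance weights
  mu_fixed <- sum(w * es) / sum(w)           # fixed-effect estimate
  Q        <- sum(w * (es - mu_fixed)^2)     # Cochran's heterogeneity statistic
  tau2     <- 0                              # between-dataset variance
  if (k > 1) {
    tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  }
  w_re   <- 1 / (v + tau2)                   # random-effects weights
  mu_re  <- sum(w_re * es) / sum(w_re)
  var_re <- 1 / sum(w_re)
  z      <- mu_re / sqrt(var_re)
  q_p    <- if (k > 1) pchisq(Q, df = k - 1, lower.tail = FALSE) else NA_real_
  c(ES = mu_re, Var = var_re, Qpvalue = q_p,
    Pvalue = 2 * pnorm(-abs(z)), n = k)
}

# Two datasets that disagree (made-up numbers), and a gene seen in one dataset only
res    <- rem_combine(es = c(1, 3), v = c(1, 1))
single <- rem_combine(es = c(NA, 0.5), v = c(NA, 0.04))
print(res)
print(single)
```

When the datasets disagree strongly, the between-dataset variance inflates the combined variance, which is why a gene can have a very small Fisher p value and still a large random-effects p value.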

In [11]:
meta_function(allmatrix = allmatrix,keyname = "case-ctl", "/mnt/data/MetaAnalysis/output")

         ES.x        ES.y
A1BG       NA -0.01313243
A1BG-AS1   NA  0.06210514
A1CF       NA  0.05883921
         Var.x       Var.y
A1BG        NA 0.003461993
A1BG-AS1    NA 0.006111045
A1CF        NA 0.002338179


* doing Meta DE
         P.Value.x P.Value.y
A1BG            NA 0.8205535
A1BG-AS1        NA 0.4206108
A1CF            NA 0.2190696


* doing Meta P
        rank logFC.case-ctl         Var      Qpvalue  REM.Pvalue    REM.FDR Fisher.Pvalue   Fisher.FDR n estimators
ZNF264     1      0.3552904 0.299334526 1.502152e-26 0.516086802 0.67039523  1.427335e-31 1.172413e-27            2
SVOP       2     -0.7902721 0.005249884 3.104079e-01 0.000000000 0.00000000  2.022028e-27 8.304468e-24            2
NFKB1      3      0.3817504 0.016106871 9.565457e-03 0.002629964 0.01517535  8.712374e-27 2.263827e-23            2
SRGAP1     4      0.6532914 0.054017894 1.850492e-04 0.004941036 0.02440919  1.102424e-26 2.263827e-23            2
DSTYK      5      0.0477448 0.151225067 2.776195e-14 0.90

Note: the Fisher p value is reported as 0 when it is smaller than 10e-16.

**meta_P_function** performs a meta analysis using the MetaDE.pvalue algorithm from the MetaDE R library, with p values only.

It ranks the resulting genes in ascending order of Fisher P value.
- 1st parameter: single dataframe merging datasets obtained from prepare_matrix_function
- 2nd parameter (keyname): optional, key name to add to logFC column in result table
- 3rd parameter (output.path): path where to store the output file (defaults to the current directory)
- 4th parameter (output.file): optional - output file name
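
As a reminder of what a Fisher combined p value is, here is a minimal base R sketch of Fisher's method. It is not the MetaDE.pvalue code; the function name fisher_combine is hypothetical, and dropping NA p values so that a gene seen in only one dataset keeps its own p value is an assumption about how missing genes could be handled.

```r
# Hypothetical helper: Fisher's method for the p values of ONE gene.
# Under the null, -2 * sum(log(p)) follows a chi-square with 2k degrees of freedom.
fisher_combine <- function(p) {
  p <- p[!is.na(p)]                 # assumption: ignore datasets lacking the gene
  if (length(p) == 0) return(NA_real_)
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Two rows in the layout of the merged dataframe (p values only)
pvals <- data.frame(
  P.Value.x = c(NA, 0.5427902),
  P.Value.y = c(0.820553511, 0.080343603),
  row.names = c("A1BG", "A2M")
)

# Apply gene by gene (row by row)
fisher_p <- apply(pvals, 1, fisher_combine)
print(fisher_p)
```

With a single available p value the statistic has 2 degrees of freedom and the combined p value equals the input p value, so such genes are neither penalised nor favoured by this choice.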

In [12]:
meta_P_function(allmatrix = allmatrix,output.path = "/mnt/data/MetaAnalysis/output",output.file = "metaP_GSEresult_case_control" )

         P.Value.x   P.Value.y
A1BG            NA 0.820553511
A1BG-AS1        NA 0.420610767
A1CF            NA 0.219069560
A2M      0.5427902 0.080343603
A2M-AS1         NA 0.006771993
A2ML1           NA 0.005511303


* doing Meta P
        rank Fisher.Pvalue   Fisher.FDR
ZNF264     1  1.427335e-31 1.172413e-27
SVOP       2  2.022028e-27 8.304468e-24
NFKB1      3  8.712374e-27 2.263827e-23
SRGAP1     4  1.102424e-26 2.263827e-23
DSTYK      5  2.056165e-26 3.377867e-23
ATP6V1H    6  6.460666e-25 8.703545e-22


 * writing to  /mnt/data/MetaAnalysis/output
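
The rank and Fisher.FDR columns in the output above can be approximated with base R as sketched below. The FDR values shown are numerically consistent with a Benjamini-Hochberg adjustment, but that method, the function name rank_genes, and the way NA p values are given a shared rank are assumptions made for illustration, since the pipeline script itself is not shown here.

```r
# Hypothetical helper: rank genes by combined p value and add a BH-adjusted FDR.
# p is a named numeric vector of combined p values (NA allowed).
rank_genes <- function(p) {
  ok  <- !is.na(p)
  fdr <- rep(NA_real_, length(p))
  fdr[ok] <- p.adjust(p[ok], method = "BH")      # adjust only the observed p values
  rnk <- rank(p, na.last = "keep", ties.method = "min")
  rnk[!ok] <- sum(ok) + 1                        # assumption: all NA genes share the last rank
  out <- data.frame(rank = rnk, Fisher.Pvalue = p, Fisher.FDR = fdr,
                    row.names = names(p))
  out[order(out$rank), ]                         # ascending order of p value
}

toy_p  <- c(G1 = 0.04, G2 = 0.01, G3 = NA, G4 = 0.03, G5 = NA)  # made-up values
ranked <- rank_genes(toy_p)
print(ranked)
```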