## Meta Analysis of expression data

Meta-analysis combines the differential expression results of the individual datasets, producing for each gene a combined effect size and a combined p value across datasets. Genes are then ranked by this global p value using the base R `rank()` function.

The MetaDE R library (slightly modified) is used to perform this analysis. Its algorithm takes as input the p values, the observed effect sizes (logFC values) and the observed variances; how the variance is obtained is described below.


```r
# Import pipeline functions for meta analysis
source("scripts/metaDE.R")
# source("/mnt/data/scripts/metaDE.R")
```

### 1. Prepare datasets for meta analysis

`prepare_matrix_function` creates a data frame containing the estimators needed by MetaDE: effect size (ES), variance and p value. Its input is the result of a differential expression analysis (e.g. a limma table), which must include the ES (= logFC), the p value and either the variance or something from which the variance can be calculated (the SE (Standard Error) or the confidence intervals). If the meta-analysis will only combine p values, the p value alone is sufficient.

The variance is calculated as SE^2, where SE is the difference between the upper and lower limits of the 95% confidence interval divided by 3.92.
The variance and/or SE can also come from other differential expression algorithms; just modify prepare_matrix_function as needed.
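The 3.92 factor comes from the normal approximation behind a 95% confidence interval, which extends 1.96 standard errors on each side of the effect size. limma computes its intervals from the moderated t distribution, so this conversion is an approximation that becomes close when the residual degrees of freedom are large:

$$
\text{CI}_{95\%} = \text{ES} \pm 1.96\,\text{SE}
\quad\Longrightarrow\quad
\text{SE} = \frac{\text{CI}_{\text{upper}} - \text{CI}_{\text{lower}}}{2 \times 1.96} = \frac{\text{CI}_{\text{upper}} - \text{CI}_{\text{lower}}}{3.92},
\qquad
\text{Var} = \text{SE}^2
$$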

- 1st parameter: path to the directory containing the input dataset
- 2nd parameter: filename of the input dataset, i.e. the file containing the limma/differential analysis results
- 3rd parameter: type of data frame to prepare. Use "onlyP" if only p values should be used for the meta-analysis; by default all estimators are calculated.
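To make the expected output concrete, here is a minimal sketch of what such a preparation step could look like. It is hypothetical and is not the pipeline's `prepare_matrix_function`: the name `prepare_estimators_sketch` is invented, it takes an in-memory table instead of a path and filename, and it assumes limma-style columns `logFC`, `CI.L`, `CI.R`, `P.Value` (as returned by `limma::topTable(..., confint = TRUE)`) plus a `gene` column.

```r
# Hypothetical sketch, not the pipeline's prepare_matrix_function.
# Assumed input columns: gene, logFC, CI.L, CI.R, P.Value.
prepare_estimators_sketch <- function(res, type = "all") {
  if (identical(type, "onlyP")) {
    # Only p values are needed when studies are combined with p-value methods
    return(data.frame(P.Value = res$P.Value, gene = res$gene))
  }
  se <- (res$CI.R - res$CI.L) / 3.92  # 95% CI width = 2 * 1.96 * SE
  data.frame(ES = res$logFC, Var = se^2, P.Value = res$P.Value, gene = res$gene)
}

# Made-up limma-like results for two genes (CI width 0.392, so SE = 0.1)
toy <- data.frame(
  gene = c("G1", "G2"),
  logFC = c(1.0, -0.5),
  CI.L = c(0.804, -0.696),
  CI.R = c(1.196, -0.304),
  P.Value = c(1e-5, 0.02)
)
full <- prepare_estimators_sketch(toy)             # columns ES, Var, P.Value, gene
p_only <- prepare_estimators_sketch(toy, "onlyP")  # columns P.Value, gene
```

Both toy genes end up with Var = 0.01, since their confidence intervals have the same width.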

```r
file.gse15222 <- "limma_Case_Control_annot"
path.gse15222 <- "/home/guess/GeneExpression/results"
data.gse15222 <- prepare_matrix_function(path.gse15222, file.gse15222)
```

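```r
head(data.gse15222, n = 3)
```

|        | ES        | Var         | P.Value      | gene   |
|--------|-----------|-------------|--------------|--------|
| ZNF264 | 0.992701  | 0.005860051 | 1.289322e-31 | ZNF264 |
| NFKB1  | 0.5218817 | 0.002048756 | 2.816646e-26 | NFKB1  |
| DSTYK  | 0.4570696 | 0.00162615  | 1.433735e-25 | DSTYK  |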


```r
file.gse48350 <- "limma_Case_Control_annot"
path.gse48350 <- "/home/guess/GeneExpression/results/resultsGSE48350"
data.gse48350 <- prepare_matrix_function(path.gse48350, file.gse48350)
# If you have more than one dataset from the same platform (for example, the
# different regions into which you have divided your study):
# files.gse48350 <- c("EC_limma_Case_Control_annot", "PG_limma_Case_Control_annot", "SFG_limma_Case_Control_annot")
# region.names <- c("EC", "PG", "SFG")
# data.gse48350 <- mapply(prepare_matrix_function, path.gse48350, files.gse48350, SIMPLIFY = FALSE)
# names(data.gse48350) <- region.names
```

```r
head(data.gse48350, n = 3)
# If the function was called with more than one input file:
# str(data.gse48350)
# head(data.gse48350[[1]])
# head(data.gse48350$EC)
```

|         | ES         | Var         | P.Value      | gene    |
|---------|------------|-------------|--------------|---------|
| C7orf50 | -0.3445536 | 0.003530735 | 1.321551e-07 | C7orf50 |
| SAMSN1  | 0.3674606  | 0.004786039 | 9.269699e-07 | SAMSN1  |
| CES2    | -0.3593238 | 0.004682026 | 1.175101e-06 | CES2    |


* **Combine all prepared datasets**

Use `all = FALSE` in `merge` if you only want genes common to all datasets in the final ranking. With `all = TRUE`, as below, every gene is kept and missing values are filled with NA.

```r
# Reduce can be used to merge more than two data frames
allmatrix <- Reduce(function(x, y) merge(x, y, all = TRUE, by = "gene"), list(data.gse15222, data.gse48350))
# allmatrix <- merge(data.gse15222, data.gse48350, by = "gene", all = TRUE)  # merge alone is enough for two datasets
dim(allmatrix)
head(allmatrix)
```

| gene     | ES.x        | Var.x      | P.Value.x | ES.y        | Var.y       | P.Value.y  |
|----------|-------------|------------|-----------|-------------|-------------|------------|
| A1BG     | -0.01019883 | 0.02079577 | 0.9434428 | 0.02187616  | 0.004188154 | 0.73170992 |
| A1BG-AS1 | NA          | NA         | NA        | -0.0397788  | 0.004841772 | 0.56235627 |
| A1CF     | NA          | NA         | NA        | -0.04542535 | 0.001528735 | 0.24089503 |
| A2M      | 0.21301048  | 0.02375735 | 0.1663719 | 0.10833444  | 0.003603714 | 0.07052116 |
| A2M-AS1  | NA          | NA         | NA        | 0.04240487  | 0.014496622 | 0.72094139 |
| A2ML1    | NA          | NA         | NA        | 0.08216298  | 0.0044085   | 0.21193764 |
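The effect of the `all` argument can be checked on small made-up tables. This sketch (the toy values and the `add_suffix` helper are hypothetical) also gives each dataset's columns its own suffix before merging. With the default `.x`/`.y` suffixes, merging four or more tables through `Reduce` eventually produces duplicated column names in the result. We have not checked whether `meta_function` relies on the `.x`/`.y` names, so adapt the suffixes if it does.

```r
# Made-up per-dataset tables (not real results)
d1 <- data.frame(gene = c("A", "B"), ES = c(0.2, -0.1), P.Value = c(0.01, 0.5))
d2 <- data.frame(gene = c("B", "C"), ES = c(-0.3, 0.4), P.Value = c(0.02, 0.03))
d3 <- data.frame(gene = c("A", "B", "C"), ES = c(0.1, 0.0, 0.2), P.Value = c(0.2, 0.9, 0.04))

# Hypothetical helper: append a dataset-specific suffix to every non-key column
add_suffix <- function(df, suffix) {
  cols <- setdiff(names(df), "gene")
  names(df)[match(cols, names(df))] <- paste0(cols, ".", suffix)
  df
}
tables <- Map(add_suffix, list(d1, d2, d3), c("d1", "d2", "d3"))

# all = TRUE keeps every gene (NA where a dataset lacks it);
# all = FALSE keeps only genes present in every dataset
union_tab <- Reduce(function(x, y) merge(x, y, by = "gene", all = TRUE), tables)
common_tab <- Reduce(function(x, y) merge(x, y, by = "gene", all = FALSE), tables)
```

Here `union_tab` contains genes A, B and C, while `common_tab` contains only B, the one gene measured in all three tables.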


### 2. Perform meta analysis

`meta_function` performs a meta-analysis using the `MetaDE.ES` and `MetaDE.pvalue` functions from the MetaDE R library. It is customised so that it also returns an estimate when a gene is not present in all datasets; see /mnt/data/scripts/MetaDE.ES_custom.R for details.
 
`meta_function` ranks the resulting genes in ascending order of Fisher p value; genes with tied p values, including those whose p value is NA, share the same rank.
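The combination and ranking steps can be illustrated with a textbook version of Fisher's method. This is a minimal sketch, not MetaDE's own routine: `fisher_combine` and `rank_genes` are hypothetical names, studies where a gene is missing are simply dropped, and NA results are replaced by `Inf` so that they tie for the last rank. We have not verified that `meta_function` handles NA in exactly this way.

```r
# Fisher's method: X = -2 * sum(log(p_i)) follows a chi-squared distribution
# with 2k degrees of freedom under the null, k = number of available p values.
fisher_combine <- function(p) {
  p <- p[!is.na(p)]  # keep only the studies in which the gene was measured
  k <- length(p)
  if (k == 0) return(NA_real_)
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * k, lower.tail = FALSE)
}

rank_genes <- function(p) {
  p_filled <- ifelse(is.na(p), Inf, p)  # NA p values tie for the last rank
  rank(p_filled, ties.method = "min")   # equal p values (e.g. several 0s) share a rank
}

# Rows = genes, columns = studies; p values taken from the merged table above,
# plus a hypothetical gene with no p value in any study
pmat <- rbind(
  A1BG   = c(0.9434428, 0.73170992),
  A1CF   = c(NA, 0.24089503),
  A2M    = c(0.1663719, 0.07052116),
  GENE_X = c(NA, NA)
)
fisher_p <- apply(pmat, 1, fisher_combine)
ranking <- rank_genes(fisher_p)

rank_genes(c(0, 0, 0.01, NA, NA))
# 1 1 3 4 4
```

With a single available study (A1CF), Fisher's method returns that study's p value unchanged, because the upper tail of a chi-squared distribution with 2 degrees of freedom at -2 log(p) is exactly p.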

- 1st parameter: a single data frame merging the datasets obtained from prepare_matrix_function
- 2nd parameter: optional - key name appended to the logFC column in the result table (e.g. case-control)
- 3rd parameter: optional - path where the output file is stored (defaults to the current directory)
- 4th parameter: optional - output file name

```r
meta_function(allmatrix = allmatrix, keyname = "case-ctl")
```

```text
* doing Meta DE
         P.Value.x  P.Value.y
A1BG     0.9434428 0.73170992
A1BG-AS1        NA 0.56235627
A1CF            NA 0.24089503
A2M      0.1663719 0.07052116
A2M-AS1         NA 0.72094139
A2ML1           NA 0.21193764

* doing Meta P
                rank logFC.case-ctl         Var      Qpvalue   REM.Pvalue     REM.FDR Fisher.Pvalue Fisher.FDR
ANKHD1-EIF4EBP3    1      0.2311820 0.141041153 5.669907e-12 0.5381749200 0.771611904             0          0
ARHGEF9            1     -0.3601272 0.013220883 7.571364e-02 0.0017360325 0.020733932             0          0
ATP6V1H            1     -0.4756527 0.116138755 6.065009e-05 0.1627958090 0.440102707             0          0
ATPIF1             1     -0.3568552 0.026713434 6.088740e-04 0.0290086240 0.159985593             0          0
BRE                1     -0.3331802 0.009351711 1.211878e-02 0.0005703166 0.008601761             0          0
C14orf2            1     -0.2799842 0.050154476 1.533024e-08 0.2112273540 0.503021119
```

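Note: the Fisher p value is reported as 0 when it is smaller than about 1e-16, the limit of double-precision arithmetic; this is why several genes tie at rank 1.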
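One way an exact 0 can arise is computing an upper-tail probability as `1 - pchisq(...)`: once the true tail probability is below roughly 1e-16, the lower-tail value rounds to exactly 1. We have not checked which form MetaDE uses internally; this sketch only illustrates the floating-point effect and the `lower.tail = FALSE` alternative in base R.

```r
# Chi-squared statistic whose true upper-tail probability is around 4e-42
stat <- 200
dof <- 4

p_naive <- 1 - pchisq(stat, df = dof)                  # lower tail rounds to 1, so this is 0
p_tail <- pchisq(stat, df = dof, lower.tail = FALSE)   # computed directly, stays positive

p_naive
# 0
```

Computing the tail directly (or on the log scale with `log.p = TRUE`) keeps very small p values distinguishable, which would break the ties among the top-ranked genes.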

```r
# Meta-analysis with the MetaDE.pvalue algorithm from the MetaDE R library, using only p values
meta_P_function(allmatrix = allmatrix, output.file = "metaP_GSEresult_case_control")
```

```text
         P.Value.x  P.Value.y
A1BG     0.9434428 0.73170992
A1BG-AS1        NA 0.56235627
A1CF            NA 0.24089503
A2M      0.1663719 0.07052116
A2M-AS1         NA 0.72094139
A2ML1           NA 0.21193764

* doing Meta P
                rank Fisher.Pvalue Fisher.FDR
ANKHD1-EIF4EBP3    1             0          0
ARHGEF9            1             0          0
ATP6V1H            1             0          0
ATPIF1             1             0          0
BRE                1             0          0
C14orf2            1             0          0

 * writing to  /home/guess/MetaAnalysis/GeneExprMeta
```

Individual logFCs were combined using the Random Effects Model (REM). Because the analysis included data from different brain regions, genes were ranked by the Fisher statistic, which makes no assumption about the direction of the effect and is suited to identifying candidate markers differentially expressed in the “majority” of studies. In that setting, Fisher's method has been reported to outperform other methods in detection power, biological association, stability and robustness (Chang et al. 2013). [Include? In principle, no. (Taken from the ADAPTED paper.)]
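The logFC, Var, Qpvalue and REM.Pvalue columns of the `meta_function` output presumably correspond to a random effects model with a heterogeneity (Q) test. A standard estimator for such a model is DerSimonian-Laird; the sketch below applies it to one gene. The function name `dl_rem` is hypothetical, and we have not verified that MetaDE's REM uses exactly this estimator or the same handling of missing studies.

```r
# Hypothetical DerSimonian-Laird random-effects combination for one gene.
# es: per-study effect sizes (logFC); v: per-study variances; NA studies are dropped.
dl_rem <- function(es, v) {
  ok <- !is.na(es) & !is.na(v)
  es <- es[ok]
  v <- v[ok]
  k <- length(es)
  if (k == 0) return(NULL)
  w <- 1 / v                                    # fixed-effect (inverse-variance) weights
  mu_fixed <- sum(w * es) / sum(w)
  Q <- sum(w * (es - mu_fixed)^2)               # Cochran's Q heterogeneity statistic
  tau2 <- if (k > 1) max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w))) else 0
  w_star <- 1 / (v + tau2)                      # random-effects weights
  mu <- sum(w_star * es) / sum(w_star)          # combined effect size
  var_mu <- 1 / sum(w_star)                     # variance of the combined effect size
  list(
    ES = mu,
    Var = var_mu,
    tau2 = tau2,
    Q.pvalue = if (k > 1) pchisq(Q, df = k - 1, lower.tail = FALSE) else NA_real_,
    P.value = 2 * pnorm(-abs(mu / sqrt(var_mu)))
  )
}

# A2M and A1CF values from the merged table of the two datasets
a2m <- dl_rem(es = c(0.21301048, 0.10833444), v = c(0.02375735, 0.003603714))
a1cf <- dl_rem(es = c(NA, -0.04542535), v = c(NA, 0.001528735))
```

For a gene measured in only one dataset (A1CF), the combined estimate reduces to that dataset's logFC and variance; with several datasets, the between-study variance `tau2` widens the combined variance when the studies disagree.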