# Meta-Analysis of GWAS Data

Results from different GWAS can be combined with the standard-error-based (inverse-variance-weighted) meta-analysis method implemented in METAL, with each study corrected by its own genomic inflation factor. A Cochran’s Q test for heterogeneity and I² estimates are generated to evaluate the potential effect of between-study heterogeneity on the results.
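
To make the quantities named above concrete, the following is a minimal sketch of a fixed-effect inverse-variance meta-analysis for a single marker, together with Cochran's Q and I². It uses the textbook formulas and is not taken from METAL's source; the function name `inverse_variance_meta` and the example numbers are hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of one marker.

    betas and ses hold the per-study effect estimates and standard errors.
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p-value

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation attributed to heterogeneity, floored at 0.
    # With a single study (df = 0) heterogeneity is undefined, so report 0.
    i2 = 100.0 * max(0.0, (q - df) / q) if df > 0 and q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "df": df, "i2": i2}

# Hypothetical example: two studies with equal precision.
result = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
print(result)
```

With only one contributing study the heterogeneity degrees of freedom are 0, so Q and I² carry no information; the sketch returns 0 for I² in that case.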

Note: this template is not run, as only one GWAS dataset is available.

METAL documentation: https://genome.sph.umich.edu/wiki/METAL_Documentation

In [1]:
%load_ext rpy2.ipython

**Input File Columns**

Each input file should include the following information:

- A column with marker name, which should be consistent across studies
- A column indicating the tested allele
- A column indicating the other allele


If you are carrying out a sample-size-weighted analysis (based on p-values), you will also need:

- A column indicating the direction of effect for the tested allele
- A column indicating the corresponding p-value
- An optional column indicating the sample size (if the sample size varies by marker)


If you are carrying out a meta-analysis based on standard errors, you will need:

- A column indicating the estimated effect size for each marker
- A column indicating the standard error of this effect size estimate

The header for each of these columns must be specified so that METAL knows how to interpret the data. Additional columns including allele frequency information, strand information, and others can also be present.
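
Before handing a file to METAL it can help to confirm that the header really contains the columns each scheme needs. The sketch below checks a header line against the column lists above. The role names, the helper `missing_columns`, and the example header are hypothetical; the labels (`SNP`, `A2`, `A1`, `OR`, `P`, `SE`) are the ones used in the METAL scripts later in this notebook.

```python
# Roles each scheme needs, following the column lists above.
REQUIRED_ROLES = {
    "SAMPLESIZE": ("marker", "allele1", "allele2", "effect", "pvalue"),
    "STDERR": ("marker", "allele1", "allele2", "effect", "stderr"),
}

def missing_columns(header_line, labels, scheme, separator="\t"):
    """Return the labels required by `scheme` that are absent from the header."""
    present = set(header_line.rstrip("\r\n").split(separator))
    return [labels[role] for role in REQUIRED_ROLES[scheme]
            if labels[role] not in present]

# Column labels as passed to MARKER, ALLELE, EFFECT, PVAL and STDERR.
labels = {"marker": "SNP", "allele1": "A2", "allele2": "A1",
          "effect": "OR", "pvalue": "P", "stderr": "SE"}

# Hypothetical header that lacks a standard error column.
header = "SNP\tA1\tA2\tFRQ\tOR\tP"
print(missing_columns(header, labels, "SAMPLESIZE"))  # []
print(missing_columns(header, labels, "STDERR"))      # ['SE']
```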

**Selecting an Analysis Scheme**

     1) SCHEME SAMPLESIZE        - default approach, uses p-value and direction of effect, weighted according to sample size
     
     The weight for each MARKER can be stored in a column in the table (specified with the WEIGHTLABEL or WEIGHT commands). Most commonly, the weight will be the number of individuals contributing to that particular p-value.
     
     WEIGHTLABEL     N
     
     Alternatively, the same weight can be used for every marker in an input file by setting a fixed weight with the DEFAULTWEIGHT command. Because the WEIGHTLABEL command takes precedence over the DEFAULTWEIGHT command, the label given to WEIGHTLABEL must not match any column in the input file; otherwise that column is used as the weight.
     
     WEIGHTLABEL     DONTUSECOLUMN
     DEFAULTWEIGHT   1000
    
    
     2) SCHEME STDERR            - classical approach, uses effect size estimates and standard errors

     For this approach, you need to specify the label for the standard error column:

     STDERR SE                
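
For the SAMPLESIZE scheme described above, the combination step can be sketched as follows: each study's p-value is converted to a signed Z-score using the direction of effect, and the Z-scores are combined with weights equal to the square root of the sample size. This is the standard weighted Z-score method under stated assumptions, not METAL's code; `sample_size_meta` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_meta(pvalues, directions, sample_sizes):
    """Combine two-sided p-values into one Z-score, weighting by sqrt(N).

    directions holds +1 or -1 for the tested allele in each study.
    """
    normal = NormalDist()
    numerator = 0.0
    total_n = 0.0
    for p, direction, n in zip(pvalues, directions, sample_sizes):
        # |Z| for a two-sided p-value; requires 0 < p <= 1.
        abs_z = -normal.inv_cdf(p / 2.0)
        z_i = abs_z if direction > 0 else -abs_z
        numerator += math.sqrt(n) * z_i       # w_i = sqrt(N_i)
        total_n += n                          # sum of w_i^2
    z = numerator / math.sqrt(total_n)
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided p-value
    return z, p

# Hypothetical example: two equally sized studies agreeing in direction.
print(sample_size_meta([0.05, 0.05], [+1, +1], [1000, 1000]))
```

When a fixed DEFAULTWEIGHT is used for every marker and only one study contributes, the weight cancels and the combined Z-score equals that study's Z-score.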

## SCHEME SAMPLESIZE 

In [3]:
%%bash
metal

MARKER SNP
ALLELE A2 A1
FREQ FRQ
EFFECT OR
PVAL P
SEPARATOR TAB
MINMAXFREQ ON
GENOMICCONTROL ON
SCHEME SAMPLESIZE
DEFAULTWEIGHT   10000

PROCESS /mnt/data/GWAS/output/build38/task4_assoc/dataset.b38.imputed.dosage.full.assoc.dosage.clean.rs.200kb.annotated.annot
PROCESS /mnt/data/MetaAnalysis/input/IGAP_stage_1_2_combined_for_METAL.txt

OUTFILE /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_ .tbl

ANALYZE HETEROGENEITY

MetaAnalysis Helper - (c) 2007 - 2009 Goncalo Abecasis
This version released on 2011-03-25

# This program faciliates meta-analysis of genome-wide association studies.
# Commonly used commands are listed below:
#
# Options for describing input files ...
#   SEPARATOR        [WHITESPACE|COMMA|BOTH|TAB] (default = WHITESPACE)
#   COLUMNCOUNTING   [STRICT|LENIENT]            (default = 'STRICT')
#   MARKERLABEL      [LABEL]                     (default = 'MARKER')
#   ALLELELABELS     [LABEL1 LABEL2]             (default = 'ALLELE1','ALLELE2')
#   EFFECTLABEL      [LABEL|log(LABEL)]          (default = 'EFFECT')
#   FLIP
#
# Options for filtering input files ...
#   ADDFILTER        [LABEL CONDITION VALUE]     (example = ADDFILTER N > 10)
#                    (available conditions are <, >, <=, >=, =, !=, IN)
#   REMOVEFILTERS
#
# Options for sample size weighted meta-analysis ...
#   WEIGHTLABEL      [LABEL]                     (default = 'N')
#   PVALUELABEL      [LABEL]                
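
The scripts in this notebook set GENOMICCONTROL ON, which corresponds to the study-specific inflation factors mentioned in the introduction. As a rough sketch of what such a correction involves, the code below estimates the inflation factor lambda as the median chi-square statistic divided by the median of a 1-df chi-square distribution, and inflates standard errors by the square root of lambda. Applying the correction only when lambda exceeds 1 is an assumption about typical practice, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_control_lambda(z_scores):
    """Inflation factor: median observed chi-square over its expected median."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN

def correct_standard_errors(betas, ses):
    """Inflate standard errors by sqrt(lambda) when lambda > 1 (assumption)."""
    lam = genomic_control_lambda([b / s for b, s in zip(betas, ses)])
    if lam <= 1.0:
        return lam, list(ses)
    scale = math.sqrt(lam)
    return lam, [s * scale for s in ses]
```

Under the null hypothesis lambda should be close to 1; values well above 1 suggest inflated test statistics in that study, for example from population stratification.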

In [5]:
%%bash
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_1.tbl
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_1.tbl.info


MarkerName	Allele1	Allele2	MinFreq	MaxFreq	Weight	Zscore	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
14:34778882:C:T	t	c	0.6063	0.6063	10000.00	0.337	0.7363	+	0.0	0.000	0	1
2:234097095:A:G	a	g	0.2657	0.2657	10000.00	-1.690	0.09105	-	0.0	0.000	0	1
5:128938723:G:T	t	g	0.2945	0.2945	10000.00	0.734	0.4628	+	0.0	0.000	0	1
15:48519823:A:G	a	g	0.7704	0.7704	10000.00	-0.998	0.3182	-	0.0	0.000	0	1
9:331389:C:G	c	g	0.8065	0.8065	10000.00	-0.457	0.6477	-	0.0	0.000	0	1
19:9898534:T:A	a	t	0.2106	0.2106	10000.00	1.000	0.3174	+	0.0	0.000	0	1
9:2750945:C:G	c	g	0.1433	0.1433	10000.00	-1.566	0.1173	-	0.0	0.000	0	1
1:75335165:C:T	t	c	0.3455	0.3455	10000.00	1.371	0.1705	+	0.0	0.000	0	1
4:24943395:T:C	t	c	0.2217	0.2217	10000.00	-1.642	0.1006	-	0.0	0.000	0	1
# This file contains a short description of the columns in the
# meta-analysis summary file, named '/mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_1.tbl'

# Marker    - this is the marker name
# Allele1   - the first allele for this
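
As a sanity check on the table above, the reported P-value should match the two-sided normal p-value of the Zscore column, and the Direction string holds one character per processed input file. The sketch below parses a few rows copied from the output shown and recomputes both; the helper `check_table` is hypothetical, and treating `?` as "study missing" is an assumption based on METAL's output conventions.

```python
import csv
import io
import math

def check_table(text):
    """Recompute p-values from Z-scores and count contributing studies."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        z = float(row["Zscore"])
        rows.append({
            "marker": row["MarkerName"],
            "p_reported": float(row["P-value"]),
            "p_from_z": math.erfc(abs(z) / math.sqrt(2.0)),
            # One Direction character per input file; '?' means not present.
            "n_studies": sum(ch != "?" for ch in row["Direction"]),
        })
    return rows

# First three rows of the SAMPLESIZE output shown above.
SAMPLE = "\n".join([
    "MarkerName\tAllele1\tAllele2\tMinFreq\tMaxFreq\tWeight\tZscore\t"
    "P-value\tDirection\tHetISq\tHetChiSq\tHetDf\tHetPVal",
    "14:34778882:C:T\tt\tc\t0.6063\t0.6063\t10000.00\t0.337\t0.7363\t+\t0.0\t0.000\t0\t1",
    "2:234097095:A:G\ta\tg\t0.2657\t0.2657\t10000.00\t-1.690\t0.09105\t-\t0.0\t0.000\t0\t1",
    "5:128938723:G:T\tt\tg\t0.2945\t0.2945\t10000.00\t0.734\t0.4628\t+\t0.0\t0.000\t0\t1",
])

for entry in check_table(SAMPLE):
    print(entry)
```

In the rows shown, every Direction string has a single character and HetDf is 0, which is what one would expect when only one study contributes to a marker; in that situation the heterogeneity columns are not informative.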

## SCHEME STDERR

In [4]:
%%bash
metal

MARKER SNP
ALLELE A2 A1
FREQ FRQ
EFFECT OR
PVAL P
SEPARATOR TAB
MINMAXFREQ ON
GENOMICCONTROL ON
SCHEME STDERR

STDERR SE

PROCESS /mnt/data/GWAS/output/build38/task4_assoc/dataset.b38.imputed.dosage.full.assoc.dosage.clean.rs.200kb.annotated.annot
PROCESS /mnt/data/MetaAnalysis/input/IGAP_stage_1_2_combined_for_METAL.txt

OUTFILE /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_ .tbl

ANALYZE HETEROGENEITY


In [7]:
%%bash
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_1.tbl
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_1.tbl.info

MarkerName	Allele1	Allele2	MinFreq	MaxFreq	Effect	StdErr	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
14:34778882:C:T	t	c	0.6063	0.6063	1.1163	1.1199	0.3189	+	0.0	0.000	0	1
2:234097095:A:G	a	g	0.2657	0.2657	-0.5117	1.3593	0.7066	-	0.0	0.000	0	1
5:128938723:G:T	t	g	0.2945	0.2945	1.2371	0.9935	0.213	+	0.0	0.000	0	1
15:48519823:A:G	a	g	0.7704	0.7704	-1.4157	1.1939	0.2357	-	0.0	0.000	0	1
9:331389:C:G	c	g	0.8065	0.8065	-1.1825	1.2578	0.3472	-	0.0	0.000	0	1
19:9898534:T:A	a	t	0.2106	0.2106	0.6710	1.3680	0.6238	+	0.0	0.000	0	1
9:2750945:C:G	c	g	0.1433	0.1433	-0.4439	1.7778	0.8028	-	0.0	0.000	0	1
1:75335165:C:T	t	c	0.3455	0.3455	1.6052	1.1838	0.1751	+	0.0	0.000	0	1
4:24943395:T:C	t	c	0.2217	0.2217	-2.2378	1.6818	0.1833	-	0.0	0.000	0	1
# This file contains a short description of the columns in the
# meta-analysis summary file, named '/mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_1.tbl'

# Marker    - this is the marker name
# Allele1   - the first allele for this marker in the 
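
The two METAL scripts in this notebook differ only in the lines that follow SCHEME and in the output prefix. One way to avoid duplicating the rest is to generate the script text from a small helper, as sketched below. The helper `build_metal_script` and the file names in the example are hypothetical; the commands it emits are exactly those used in the cells above, and the resulting text would still need to be supplied to `metal` on standard input as those cells do.

```python
def build_metal_script(scheme, inputs, out_prefix,
                       default_weight=10000, stderr_label="SE"):
    """Return METAL script text for the SAMPLESIZE or STDERR scheme."""
    lines = [
        "MARKER SNP",
        "ALLELE A2 A1",
        "FREQ FRQ",
        "EFFECT OR",
        "PVAL P",
        "SEPARATOR TAB",
        "MINMAXFREQ ON",
        "GENOMICCONTROL ON",
        f"SCHEME {scheme}",
    ]
    if scheme == "SAMPLESIZE":
        lines.append(f"DEFAULTWEIGHT {default_weight}")
    elif scheme == "STDERR":
        lines.append(f"STDERR {stderr_label}")
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    lines.extend(f"PROCESS {path}" for path in inputs)
    # OUTFILE takes a prefix and a suffix separated by whitespace.
    lines.append(f"OUTFILE {out_prefix} .tbl")
    lines.append("ANALYZE HETEROGENEITY")
    return "\n".join(lines) + "\n"

# Hypothetical input files and output prefix.
print(build_metal_script("STDERR", ["study1.txt", "study2.txt"], "meta.STDERR_"))
```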