# Meta-Analysis of GWAS Data

Results from different GWAS can be combined with the standard-error-weighted (inverse-variance) meta-analysis implemented in METAL, after correcting each study by its own genomic inflation factor (genomic control). Cochran's Q test for heterogeneity and the I² statistic are computed to evaluate the potential effect of between-study heterogeneity on the results.

Note: this template is not run, as only one GWAS dataset is available.

METAL documentation: https://genome.sph.umich.edu/wiki/METAL_Documentation

In [1]:
%load_ext rpy2.ipython

**Input File Columns**

Each input file should include the following information:

- A column with marker name, which should be consistent across studies
- A column indicating the tested allele
- A column indicating the other allele


If you are carrying out a sample-size-weighted analysis (based on p-values), you will also need:

- A column indicating the direction of effect for the tested allele
- A column indicating the corresponding p-value
- An optional column indicating the sample size (if the sample size varies by marker)


If you are carrying out a meta-analysis based on standard errors, you will need:

- A column indicating the estimated effect size for each marker
- A column indicating the standard error of this effect size estimate

The header of each of these columns must be specified so that METAL knows how to interpret the data. Additional columns, such as allele frequency or strand information, can also be present.
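
Before running METAL, it can help to confirm that each input file contains the headers the script refers to and a constant number of fields per line (the METAL help text lists `COLUMNCOUNTING STRICT` as the default). A sketch assuming a tab-delimited file; `check_metal_input` and `SAMPLESIZE_LABELS` are hypothetical names, with the labels taken from this notebook's SAMPLESIZE script.

```python
import csv
import io


def check_metal_input(text, required):
    """Return a list of problems found in tab-delimited METAL input text.

    text     -- file contents, header line first
    required -- column labels the METAL script refers to (MARKER, ALLELE, ...)
    """
    # QUOTE_NONE: treat every character literally, as a plain TSV
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = [f"missing column {label!r}" for label in required if label not in header]
    # Every data line should have as many fields as the header
    for lineno, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"line {lineno}: {len(row)} fields, header has {len(header)}")
    return problems


# Labels used by MARKER, ALLELE, EFFECT and PVAL in the SAMPLESIZE script
SAMPLESIZE_LABELS = ["RS", "A2", "A1", "OR", "P"]
```

For example, `check_metal_input(open(path).read(), SAMPLESIZE_LABELS)` on a tab-delimited `.METAL` file returns an empty list when the file looks usable.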

**Selecting an Analysis Scheme**

     1) SCHEME SAMPLESIZE        - default approach; uses the p-value and direction of effect, weighted according to sample size
     
     The weight for each marker can be stored in a column of the input table (specified with the WEIGHTLABEL or WEIGHT command). Most commonly, the weight is the number of individuals contributing to that particular p-value.
     
     WEIGHTLABEL     N
     
     Alternatively, the same weight can be used for all markers in an input file, in which case the fixed weight is set with the DEFAULTWEIGHT command. Because WEIGHTLABEL takes precedence over DEFAULTWEIGHT, the label given to WEIGHTLABEL must not match any column in the input file; otherwise that column is used instead of the default weight.
     
     WEIGHTLABEL     DONTUSECOLUMN
     DEFAULTWEIGHT   1000
     
     
     2) SCHEME STDERR            - classical approach; uses effect size estimates and their standard errors
     
     For this approach, you need to specify the label for the standard error column:
     
     STDERR SE
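
For `SCHEME SAMPLESIZE`, the per-marker calculation is, to our understanding, a weighted Z-score (Stouffer) combination: each study's p-value is converted to a Z-score signed by the direction of effect, weighted by √N, and pooled. The sketch below is illustrative rather than METAL's own code; `samplesize_meta` is a hypothetical helper, and passing the same weight for every study mimics `DEFAULTWEIGHT`.

```python
import math
from statistics import NormalDist


def samplesize_meta(pvalues, effects, weights):
    """Sample-size-weighted Z-score meta-analysis of one marker.

    pvalues -- two-sided p-values from each study
    effects -- effect estimates; only their sign is used, so pass log(OR)
               rather than OR (an odds ratio is always positive)
    weights -- per-study sample sizes N_i, or a fixed default weight
    """
    nd = NormalDist()
    z_total, w2_total = 0.0, 0.0
    for p, eff, n in zip(pvalues, effects, weights):
        # |Z| from the two-sided p-value, signed by the direction of effect
        z = abs(nd.inv_cdf(p / 2.0)) * math.copysign(1.0, eff)
        w = math.sqrt(n)
        z_total += w * z
        w2_total += w * w
    z_meta = z_total / math.sqrt(w2_total)
    # "weight" mirrors METAL's Weight column: the sum of the study weights
    return {"weight": sum(weights), "z": z_meta, "p": 2.0 * nd.cdf(-abs(z_meta))}


print(samplesize_meta([0.05, 0.20], [-0.3, -0.1], [10000, 10000]))
```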

## SCHEME SAMPLESIZE 

In [19]:
%%bash
# Convert the space-delimited association results to tab-delimited input for METAL
DIR=/mnt/data/GWAS/output/build38/task4_assoc
IN=$DIR/dataset.b38.imputed.dosage.full.assoc.dosage.clean.rs.200kb.annotated.annot
sed 's/ /\t/g' "$IN" > "$DIR/temp"
head "$DIR/temp"
mv "$DIR/temp" "$IN.METAL"

CHR	BP	SNP	A1	A2	FRQ	INFO	OR	SE	P	RS	ANNOT
10	100000235	10:100000235:C:T	C	T	0.2982	0.7959	0.6679	0.3084	0.1906	rs11596870	ABCC2(+148kb)|CHUK(-188.1kb)|CPN1(-42.07kb)|DNMBP(0)|DNMBP-AS1(+41.24kb)|ERLIN1(-149.9kb)
10	100000943	10:100000943:G:A	G	A	0.0929	0.7945	1.2681	0.5058	0.6387	rs11190359	ABCC2(+148.8kb)|CHUK(-187.4kb)|CPN1(-41.36kb)|DNMBP(0)|DNMBP-AS1(+41.94kb)|ERLIN1(-149.1kb)
10	100000979	10:100000979:T:C	T	C	0.0504	0.7881	1.7337	0.7499	0.4631	rs11190360	ABCC2(+148.8kb)|CHUK(-187.4kb)|CPN1(-41.33kb)|DNMBP(0)|DNMBP-AS1(+41.98kb)|ERLIN1(-149.1kb)
10	100002012	10:100002012:T:C	T	C	0.0374	0.8215	1.0097	0.6786	0.9886	rs11190362	ABCC2(+149.8kb)|CHUK(-186.4kb)|CPN1(-40.3kb)|DNMBP(0)|DNMBP-AS1(+43.01kb)|ERLIN1(-148.1kb)
10	100002038	10:100002038:G:A	G	A	0.0102	0.8861	3.051	1.3529	0.4097	rs192480913	ABCC2(+149.8kb)|CHUK(-186.3kb)|CPN1(-40.27kb)|DNMBP(0)|DNMBP-AS1(+43.04kb)|ERLIN1(-148.1kb)
10	100002300	10:100002300:GA:G	GA	G	0.0374	0.8216	1.0082	0.6788	0.9904	rs111354488	ABCC2(+150.1kb)|C
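
The SE column in the association output above appears to refer to log(OR) rather than to the OR itself: for the first row (rs11596870), the Wald test log(OR)/SE reproduces the reported P. This is why METAL should read this column as `EFFECT log(OR)` when weighting by standard error. A quick check using only the values shown above (the 1.96 multiplier gives an approximate 95% confidence interval):

```python
import math
from statistics import NormalDist

# First row of the association output above (rs11596870)
or_, se, p_reported = 0.6679, 0.3084, 0.1906

beta = math.log(or_)          # effect on the additive log-odds scale
z = beta / se                 # Wald statistic, assuming SE refers to log(OR)
p_wald = 2.0 * NormalDist().cdf(-abs(z))
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))

print(f"log(OR)={beta:.4f}  z={z:.3f}  p={p_wald:.4f}  reported p={p_reported}")
print(f"95% CI for OR: {ci[0]:.3f} to {ci[1]:.3f}")
```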

In [20]:
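%%bash
# Pass the METAL commands on standard input through a here-document
metal <<'EOF'
MARKER RS
ALLELE A2 A1
# Only the sign of the effect is used here; OR is always positive, log(OR) carries the direction
EFFECT log(OR)
PVAL P
SEPARATOR TAB
GENOMICCONTROL ON
SCHEME SAMPLESIZE
# Fixed weight used when the weight column (WEIGHTLABEL, default N) is absent
DEFAULTWEIGHT   10000

PROCESS /mnt/data/GWAS/output/build38/task4_assoc/dataset.b38.imputed.dosage.full.assoc.dosage.clean.rs.200kb.annotated.annot.METAL
PROCESS /mnt/data/MetaAnalysis/input/IGAP_stage_1_2_combined_for_METAL.txt

# METAL names the output PREFIX + run number + SUFFIX, i.e. ...SAMPLESIZE_1.tbl
OUTFILE /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_ .tbl

ANALYZE HETEROGENEITY
EOF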

MetaAnalysis Helper - (c) 2007 - 2009 Goncalo Abecasis
This version released on 2011-03-25

# This program faciliates meta-analysis of genome-wide association studies.
# Commonly used commands are listed below:
#
# Options for describing input files ...
#   SEPARATOR        [WHITESPACE|COMMA|BOTH|TAB] (default = WHITESPACE)
#   COLUMNCOUNTING   [STRICT|LENIENT]            (default = 'STRICT')
#   MARKERLABEL      [LABEL]                     (default = 'MARKER')
#   ALLELELABELS     [LABEL1 LABEL2]             (default = 'ALLELE1','ALLELE2')
#   EFFECTLABEL      [LABEL|log(LABEL)]          (default = 'EFFECT')
#   FLIP
#
# Options for filtering input files ...
#   ADDFILTER        [LABEL CONDITION VALUE]     (example = ADDFILTER N > 10)
#                    (available conditions are <, >, <=, >=, =, !=, IN)
#   REMOVEFILTERS
#
# Options for sample size weighted meta-analysis ...
#   WEIGHTLABEL      [LABEL]                     (default = 'N')
#   PVALUELABEL      [LABEL]                
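
`GENOMICCONTROL ON` asks METAL to correct each input file for its genomic inflation factor λ. A common estimate, and our understanding of what METAL uses, is the median association χ² statistic divided by the median of a 1-df χ² distribution (about 0.455). A sketch computing it from two-sided p-values with only the standard library; `genomic_control_lambda` is a hypothetical helper.

```python
import statistics
from statistics import NormalDist

# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(pvalues):
    """Estimate the genomic inflation factor from two-sided p-values."""
    nd = NormalDist()
    # Two-sided p -> |Z| -> 1-df chi-square statistic
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN
```

Values close to 1 indicate little inflation; under this convention a study with λ > 1 has its test statistics deflated (equivalently, its SEs inflated by √λ) before pooling.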

In [26]:
%%bash
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_1.tbl
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_1.tbl.info


MarkerName	Allele1	Allele2	Weight	Zscore	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
rs2326918	a	g	10000.00	-0.589	0.5559	-?	0.0	0.000	0	1
rs66941928	t	c	10000.00	-1.204	0.2287	-?	0.0	0.000	0	1
rs6039163	t	c	10000.00	0.266	0.7905	+?	0.0	0.000	0	1
rs62234673	t	c	10000.00	0.494	0.6214	+?	0.0	0.000	0	1
rs6977693	t	c	10000.00	-1.333	0.1825	-?	0.0	0.000	0	1
rs12364336	a	g	10000.00	-0.639	0.5229	-?	0.0	0.000	0	1
rs11250701	a	g	10000.00	-0.583	0.5596	-?	0.0	0.000	0	1
rs12562373	a	g	10000.00	-0.078	0.9382	-?	0.0	0.000	0	1
rs4766166	a	g	10000.00	0.616	0.5376	+?	0.0	0.000	0	1
# This file contains a short description of the columns in the
# meta-analysis summary file, named '/mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_1.tbl'

# Marker    - this is the marker name
# Allele1   - the first allele for this marker in the first file where it occurs
# Allele2   - the second allele for this marker in the first file where it occurs
# Weight    - the sum of the individual study weig
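
In METAL's output, the Direction column has one character per input file, in PROCESS order: `+` or `-` for the direction of effect and `?` when the marker is absent from that file. The rows shown above are all `-?` or `+?` with HetDf 0, i.e. these markers were found only in the first file. A small sketch that reads such a table with Python's `csv` module and counts the markers present in every study; `summarize_metal_tbl` is a hypothetical helper, and the default 5e-8 threshold is the conventional genome-wide significance level, used here as an assumption.

```python
import csv
import io


def summarize_metal_tbl(text, p_threshold=5e-8):
    """Summarize a METAL output table (tab-delimited, with a header row).

    Counts markers, how many were seen in every study (no '?' in the
    Direction string), and which markers have P-value below p_threshold.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    n_markers = n_all_studies = 0
    hits = []
    for row in reader:
        n_markers += 1
        if "?" not in row["Direction"]:
            n_all_studies += 1
        if float(row["P-value"]) < p_threshold:
            hits.append(row["MarkerName"])
    return {"markers": n_markers, "in_all_studies": n_all_studies, "hits": hits}
```

For the table above, this would be called as `summarize_metal_tbl(open("/mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.SAMPLESIZE_1.tbl").read())`.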

## SCHEME STDERR

In [22]:
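%%bash
# Pass the METAL commands on standard input through a here-document
metal <<'EOF'
MARKER SNP
ALLELE A2 A1
# Inverse-variance weighting needs an additive effect; the SE column refers to log(OR)
EFFECT log(OR)
PVAL P
SEPARATOR TAB
MINMAXFREQ ON
GENOMICCONTROL ON
SCHEME STDERR

STDERR SE

PROCESS /mnt/data/GWAS/output/build38/task4_assoc/dataset.b38.imputed.dosage.full.assoc.dosage.clean.rs.200kb.annotated.annot.METAL
PROCESS /mnt/data/MetaAnalysis/input/IGAP_stage_1_2_combined_for_METAL.txt

# METAL names the output PREFIX + run number + SUFFIX, i.e. ...STDERR_1.tbl
OUTFILE /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_ .tbl

ANALYZE HETEROGENEITY
EOF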

MetaAnalysis Helper - (c) 2007 - 2009 Goncalo Abecasis
This version released on 2011-03-25

# This program faciliates meta-analysis of genome-wide association studies.
# Commonly used commands are listed below:
#
# Options for describing input files ...
#   SEPARATOR        [WHITESPACE|COMMA|BOTH|TAB] (default = WHITESPACE)
#   COLUMNCOUNTING   [STRICT|LENIENT]            (default = 'STRICT')
#   MARKERLABEL      [LABEL]                     (default = 'MARKER')
#   ALLELELABELS     [LABEL1 LABEL2]             (default = 'ALLELE1','ALLELE2')
#   EFFECTLABEL      [LABEL|log(LABEL)]          (default = 'EFFECT')
#   FLIP
#
# Options for filtering input files ...
#   ADDFILTER        [LABEL CONDITION VALUE]     (example = ADDFILTER N > 10)
#                    (available conditions are <, >, <=, >=, =, !=, IN)
#   REMOVEFILTERS
#
# Options for sample size weighted meta-analysis ...
#   WEIGHTLABEL      [LABEL]                     (default = 'N')
#   PVALUELABEL      [LABEL]                

In [23]:
%%bash
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_1.tbl
head /mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_1.tbl.info

MarkerName	Allele1	Allele2	MinFreq	MaxFreq	Effect	StdErr	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
# This file contains a short description of the columns in the
# meta-analysis summary file, named '/mnt/data/MetaAnalysis/output/dataset.b38.IGAP.METAL.STDERR_1.tbl'

# Marker    - this is the marker name
# Allele1   - the first allele for this marker in the first file where it occurs
# Allele2   - the second allele for this marker in the first file where it occurs
# MinFreq     - minimum frequency for allele 1 across all studies
# MaxFreq     - maximum frequency for allele 1 across all studies
# Effect    - overall estimated effect size for allele1
# StdErr    - overall standard error for effect size estimate
