# Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies
Roby Joehanes, Xiaoling Zhang, Tianxiao Huan, Chen Yao, Sai-xia Ying, Quang Tri Nguyen, Cumhur Yusuf Demirkale, Michael L. Feolo, Nataliya R. Sharopova, Anne Sturcke, Alejandro A. Schäffer, Nancy Heard-Costa, Han Chen, Po-ching Liu, Richard Wang, Kimberly A. Woodhouse, Kahraman Tanriverdi, Jane E. Freedman, Nalini Raghavachari, Josée Dupuis, Andrew D. Johnson, Christopher J. O’Donnell, Daniel Levy & Peter J. Munson

Genome Biology volume 18, Article number: 16 (2017)

Online paper [link](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1142-6#article-info)

## Background
- The vast majority of GWAS SNPs reside in non-coding regions, so most disease-associated SNPs do not directly influence protein structure or function.
- Identification of SNPs associated with gene expression levels, known as eQTLs, may improve understanding of the functional role of phenotype-associated SNPs in GWAS.
- Previous eQTL studies had small sample sizes, which limited their statistical power.

## Methods
### Study participants
Whole blood eQTL from over 5000 samples in Framingham Heart Study (FHS) cohorts
- 2770 individuals from the eighth examination cycle of the Offspring cohort (2005–2008)
- 3341 individuals from the second examination cycle (2006–2009)

### Quality control
- Polymorphism-in-probe effects were found to be generally minor.

### eQTL Discovery
- A stepwise linear regression procedure was used to identify independent "lead-eQTLs" for each genetic region. It found over 19,000 independent lead cis-eQTLs and over 6,000 independent lead trans-eQTLs.
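
The notes do not describe the stepwise procedure in detail, so the following is only a minimal sketch of one common variant: forward selection, where each round adds the SNP with the strongest association conditional on the leads already chosen. The function name `stepwise_lead_eqtls`, the `t_threshold` cutoff, and `max_leads` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def stepwise_lead_eqtls(genotypes, expression, t_threshold=5.0, max_leads=5):
    """Forward stepwise selection of independent lead SNPs for one transcript.

    genotypes:  (n_samples, n_snps) array of allele dosages.
    expression: (n_samples,) array of expression values.
    t_threshold is a hypothetical cutoff on |t|; a real analysis would
    use an FDR-controlled P value threshold instead.
    """
    n, m = genotypes.shape
    selected = []
    while len(selected) < max_leads:
        best_snp, best_t = None, 0.0
        for j in range(m):
            if j in selected:
                continue
            # Design matrix: intercept, selected leads, candidate SNP (last column).
            X = np.column_stack([np.ones(n), genotypes[:, selected], genotypes[:, j]])
            beta, _, rank, _ = np.linalg.lstsq(X, expression, rcond=None)
            if rank < X.shape[1]:
                continue  # candidate is collinear with the selected leads
            resid = expression - X @ beta
            sigma2 = resid @ resid / (n - X.shape[1])
            cov = sigma2 * np.linalg.inv(X.T @ X)
            t = abs(beta[-1]) / np.sqrt(cov[-1, -1])
            if t > best_t:
                best_snp, best_t = j, t
        # Stop when no remaining SNP is significant given the selected leads.
        if best_snp is None or best_t < t_threshold:
            break
        selected.append(best_snp)
    return selected
```

Conditioning on the leads already selected is what makes later picks "independent": a SNP that is merely in LD with an earlier lead loses its signal once that lead is in the model.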

### eQTL Validation
- Internal: 75% of cis-eQTLs and 41% of trans-eQTLs were validated.
- Against previous studies: 50-70% of previously reported cis-eQTLs and 30-60% of previously reported trans-eQTLs were replicated.
- Replication in the reverse direction (eQTLs from this study found in previous studies) is low, owing to the lower power of previous work, different sequencing platforms, and other factors. Even so, the effect directions are consistent in 90% of cases.
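
As a rough illustration of the direction-consistency check mentioned above, the sketch below computes the fraction of shared eQTLs whose effect estimates have the same sign in two studies. The function name and the example numbers are hypothetical, and the code assumes both studies report effects for the same allele.

```python
def direction_concordance(effects_a, effects_b):
    """Fraction of eQTLs whose effect estimates share a sign in both studies.

    effects_a, effects_b: effect sizes for the same eQTLs, in the same order,
    assumed to be aligned to the same effect allele.
    """
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    if not pairs:
        raise ValueError("no eQTLs with non-zero effects in both studies")
    same = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return same / len(pairs)


# Made-up effect sizes: three of the four pairs agree in sign.
print(direction_concordance([0.5, -0.2, 0.1, 0.3], [0.4, -0.1, -0.3, 0.2]))
```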

### Statistical analysis
- While accounting for reported familial relationships, the authors removed the effects of sex, age, platelet count, total white blood cell count, and other covariates from the expression data.
- To infer hidden confounding factors, they applied a Bayesian framework to the residualized gene expression data.
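
Residualizing expression data can be sketched with ordinary least squares, as below. This is a simplification under stated assumptions: it ignores familial relationships (which the authors accounted for) and does not attempt the Bayesian hidden-factor step. The function name `residualize` is illustrative only.

```python
import numpy as np

def residualize(expression, covariates):
    """Return expression with the linear effects of the covariates removed.

    expression: (n_samples, n_transcripts) array.
    covariates: (n_samples, n_covariates) array, e.g. sex, age, cell counts.
    """
    n = expression.shape[0]
    X = np.column_stack([np.ones(n), covariates])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X, expression, rcond=None)
    return expression - X @ beta
```

The residuals are uncorrelated with every covariate by construction, so any remaining structure shared across transcripts is a candidate hidden confounder for the next step.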

### Enrichment P value calculation
- Enrichment: observed number divided by expected number, accounting for the LD structure of the available 8.5 million SNPs.
- Expected numbers were obtained from the relevant 2 × 2 contingency tables.
- LD was computed between pairs of SNPs within the FHS dataset.
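
To make the observed-over-expected ratio concrete, here is a small sketch of how an expected count can be derived from a 2 × 2 table under independence (row total times column total, divided by the grand total). The cell labels and example counts are made up, and the sketch does not include the LD adjustment described above.

```python
def fold_enrichment(a, b, c, d):
    """Observed / expected count for the top-left cell of a 2 x 2 table.

    a: lead eQTLs inside the annotation    b: lead eQTLs outside it
    c: other SNPs inside the annotation    d: other SNPs outside it
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # row total * column total / grand total
    return a / expected


# Hypothetical counts: 1000 lead eQTLs, 1000 annotated SNPs, 100,000 SNPs in total.
# Expected overlap is 1000 * 1000 / 100000 = 10, observed is 40.
print(fold_enrichment(40, 960, 960, 98040))  # 4.0
```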

### cis-eQTL, trans-eQTL, primary lead eQTL definition
- cis: An SNP-transcript cluster pair is considered cis if the SNP resides within 1 Mb of the TSS on the same chromosome
- trans: an eQTL that falls in a block which does not contain the TSS of its target transcript cluster
- primary lead eQTL: the strongest eQTL in its block, judged by P value for association
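
The cis rule and the primary-lead rule above can be sketched as follows. The `Eqtl` record, the function names, and the example values are assumptions for illustration. Whether "within 1 Mb" is inclusive is not stated in the notes, so the inclusive comparison here is a guess, and the block-based trans definition is not modeled.

```python
from collections import namedtuple

# Hypothetical record type for one SNP-transcript association.
Eqtl = namedtuple("Eqtl", "snp chrom pos block p_value")

CIS_WINDOW = 1_000_000  # 1 Mb, from the cis definition above

def is_cis(snp_chrom, snp_pos, tss_chrom, tss_pos, window=CIS_WINDOW):
    """True if the SNP is on the TSS chromosome and within the window of it."""
    return snp_chrom == tss_chrom and abs(snp_pos - tss_pos) <= window

def primary_leads(eqtls):
    """Return a dict mapping each block to its eQTL with the smallest P value."""
    best = {}
    for e in eqtls:
        if e.block not in best or e.p_value < best[e.block].p_value:
            best[e.block] = e
    return best


# Made-up example: two eQTLs in block "b1", one in block "b2".
hits = [
    Eqtl("rsA", "10", 5_000_000, "b1", 1e-20),
    Eqtl("rsB", "10", 5_050_000, "b1", 1e-35),
    Eqtl("rsC", "3", 900_000, "b2", 1e-12),
]
leads = primary_leads(hits)
print(leads["b1"].snp, leads["b2"].snp)  # rsB rsC
```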

## Results
### eQTL discoveries
- Over 19,000 independent lead cis-eQTLs and over 6000 independent lead trans-eQTLs were identified at a false discovery rate (FDR) < 5%.
- Of the variants identified in previous GWAS, 48% were significant eQTLs in this study.

### Enrichment of lead eQTL to gene structure
- Lead eQTLs are highly enriched in exonic locations (25-fold) within the transcribed region, more so than for intronic locations (12-fold)
- Within transcribed regions, lead eQTLs are especially enriched in first exons and 5’ UTRs (45-fold).
- Other exons, the 3’-UTR, the first intron, and subsequent introns showed lower degrees of enrichment (21-fold, 20-fold, 11-fold, and 8-fold, respectively).
- cis-eQTLs act preferentially through regulatory elements within the first exon, within the 5’-UTR or near the transcription start site (TSS).

### Enrichment of lead eQTLs at regulator sites
- Regulator sites include DNase hypersensitivity sites, transcription factor binding sites, and biochemically characterized regulatory promoter regions.
- Primary and secondary lead cis-eQTLs showed strong enrichment for regulatory evidence (7-fold, P < 1E-89).
- The primary lead cis-eQTLs alone showed a stronger enrichment (8-fold, P < 1E-69).

### Clusters of trans-eQTL
- Some trans-eQTLs are associated with multiple distant transcripts and can be grouped into compact genomic blocks or clusters.
- At the gene level, the authors identified 59 distinct clusters of trans-eQTLs, each targeting a set of 6 to 141 distant transcripts.
- The most prominent trans-eQTL clusters are on chromosomes 3 and 17 and are associated with expression of platelet-specific genes.
- They found 13 platelet-related GWAS clusters, many of which also had target gene sets enriched with platelet-specific genes.

### GWAS analysis
- Among the 58 GWAS SNPs for coronary artery disease or myocardial infarction (CAD/MI), 21 loci (36%) are lead cis-eQTLs.
- The strongest eQTL (P < 1E-455), rs1412445, lies in the third intron of LIPA transcript variant 1 and is a cis-eQTL for LIPA expression.
- The authors also identified potentially novel, strong cis-eQTLs at the UBE2Z and SH2B3 loci, where the CAD/MI GWAS risk SNP was in very strong LD with the lead eQTL.

## Conclusion
### Advantages
- The study has a large sample size, which provides greater statistical power for discovery.
- Expression measurement was carried out in a single laboratory with rigorous quality control.
- The results provide an extensive resource of cis-eQTLs and trans-eQTLs at the gene and exon level, and this information may be useful for elucidating the biological underpinnings of many GWAS SNPs.

### Limitations
- The homogeneity of the FHS population may limit the applicability of their results to populations of different ancestries.
- Lack of population diversity might also increase the size of LD blocks and thereby limit the resolution.