IMPORTANT: This repository is no longer maintained. In addition to running a TAD_Pathways analysis for Bone Mineral Density GWAS, this analysis pipeline downloads genomic data and explores distributions across TADs. A more streamlined analysis pipeline that does not require data preprocessing is available at https://github.com/greenelab/tad_pathways_pipeline
NOTE: Several files built from this pipeline are used in the above repository including the TAD based gene index and the hg19 converted NHGRI-EBI GWAS Catalog.
Incorporating TADs into GWAS Analysis - TAD_Pathways
Gregory P. Way and Casey S. Greene 2016
The repository contains methods for manipulating, observing, and visualizing topologically associating domains (TADs) in the context of SNPs, genes, and repeat elements for human (hg19) and mouse (mm9) genomes.
The repository also proposes methods and tools for incorporating TADs into the prioritization of GWAS signals, using publicly available GWAS data. We introduce TAD pathways as a method to identify likely causal genes from GWAS independent of their distance to the sentinel SNP.
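The idea of distance-independent candidate genes can be illustrated with a minimal sketch: a SNP is assigned to the TAD whose boundaries contain it, and every gene overlapping that TAD becomes a candidate, however far it lies from the SNP. The coordinates, the gene and TAD names, and the helpers `find_tad` and `genes_in_snp_tad` are hypothetical and for illustration only. This is not the repository's implementation, and treating any overlap as membership is an assumption.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    """A named genomic interval (half-open: start <= position < end)."""
    name: str
    chrom: str
    start: int
    end: int


def find_tad(tads: List[Interval], chrom: str, pos: int) -> Optional[Interval]:
    """Return the TAD containing the given position, or None if there is none."""
    for tad in tads:
        if tad.chrom == chrom and tad.start <= pos < tad.end:
            return tad
    return None


def genes_in_snp_tad(tads: List[Interval], genes: List[Interval],
                     chrom: str, pos: int) -> List[str]:
    """List every gene overlapping the TAD that harbors the SNP."""
    tad = find_tad(tads, chrom, pos)
    if tad is None:
        return []
    return [
        gene.name
        for gene in genes
        if gene.chrom == tad.chrom and gene.start < tad.end and gene.end > tad.start
    ]


# Hypothetical toy data, not real hg19 coordinates
tads = [
    Interval("TAD_1", "chr1", 0, 1_000_000),
    Interval("TAD_2", "chr1", 1_000_000, 2_500_000),
]
genes = [
    Interval("GENE_A", "chr1", 100_000, 150_000),
    Interval("GENE_B", "chr1", 900_000, 950_000),
    Interval("GENE_C", "chr1", 1_200_000, 1_300_000),
]

# A SNP at chr1:120,000 sits inside GENE_A, but GENE_B shares its TAD too
candidates = genes_in_snp_tad(tads, genes, "chr1", 120_000)
print(candidates)  # ['GENE_A', 'GENE_B']
```

A nearest-gene approach would report only `GENE_A` here; the TAD-based list also keeps `GENE_B`, which lies roughly 800 kb away in the same domain.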
A preprint of our method is available on bioRxiv.
For all questions and bug reports, please file a GitHub issue.
There are two ways to implement a TAD_Pathways analysis:
- Disease/Trait Specific - Uses GWAS identified SNPs
- Custom - Uses custom SNP list
The Disease/Trait Specific pipeline curates the GWAS Catalog and TAD boundaries to visualize TADs and generate TAD based gene lists. It also performs a TAD pathways analysis for Bone Mineral Density GWAS, which reproduces the analysis and figures used in the paper.
    # Using python dependencies
    conda env create --quiet --force --file environment.yml
    source activate tad_pathways
    bash scripts/run_pipeline.sh
This will download data, perform analyses, and output several genomic figures. The command will also output TAD based genes for 299 different GWAS traits. Our TAD_Pathways method can be applied directly using these gene lists.
TAD_Pathways is customizable and allows a user to prespecify any SNP list of interest to test TAD based pathway associations. To perform a custom analysis, create a comma-separated file in which the first row of each column names a group and the subsequent rows of that column list the group's SNPs.
|Group 1||Group 2|
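A file in this layout could be produced with a short script such as the sketch below. The rsIDs are placeholders rather than real variants, and the helper `write_snp_csv` is hypothetical. Groups of unequal length are padded with blank cells; how `build_snp_list.R` treats blank cells has not been verified here.

```python
import csv
import io
from itertools import zip_longest

# Placeholder SNP identifiers, one list per group (column)
snp_groups = {
    "Group 1": ["rs0000001", "rs0000002", "rs0000003"],
    "Group 2": ["rs0000004", "rs0000005"],
}


def write_snp_csv(groups, handle):
    """Write one column per group: group name first, then its SNPs."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(groups.keys())
    # zip_longest pads shorter groups with empty cells
    for row in zip_longest(*groups.values(), fillvalue=""):
        writer.writerow(row)


# Write to an in-memory buffer so the result can be inspected
buffer = io.StringIO()
write_snp_csv(snp_groups, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write the file to disk instead, pass a handle opened with `open("custom_example.csv", "w", newline="")`.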
Then, perform the following steps:
    # Extract locations for SNP list
    Rscript --vanilla scripts/tad_util/build_snp_list.R \
            --snp_file "custom_example.csv" \
            --output_file "mapped_results.tsv"

    # Build TAD based genelists for each group
    python scripts/build_custom_TAD_genelist.py \
           --snp_data_file "mapped_results.tsv" \
           --output_file "custom_tad_genelist.tsv"

    # The output file is then ready for the manual "TAD_Pathways" steps below
As a case study to demonstrate the utility of a TAD based approach, input the TAD based gene list for Bone Mineral Density (1,297 genes) into a pathway analysis:
Next, run a WebGestalt pathway analysis on the gene list.
|Parameter|Value|
|---|---|
|Select gene ID type|hsapiens__gene_symbol|
|Enrichment Analysis|GO Analysis|
|GO Slim Classification|Yes|
|Multiple Test Adjustment|BH|
|Minimum Number of Genes for a Category|4|
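
For readers unfamiliar with the Multiple Test Adjustment setting, the sketch below shows how a Benjamini-Hochberg correction turns raw p-values into adjusted ones, assuming BH refers to that procedure. It is a generic textbook implementation with made-up p-values, and the function name `benjamini_hochberg` is hypothetical. WebGestalt's own code is not shown and may differ in detail.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical GO categories
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
```

Each raw p-value is scaled by the number of tests divided by its rank, then capped so that adjusted values never decrease as the raw p-values grow.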
Note - The output of scripts/run_pipeline.sh in data/TAD_based_genes/ for all traits is ready for TAD Pathway Analysis.
After performing the WebGestalt analysis, click
Export TSV Only and save the
<TRAIT> is "BMD" for the
- GWAS Catalog (2016-02-25)
- eQTL (2016-05-09) eQTL Browser
Nearest BMD gene GWAS reports
- Richards et al. 2008 Lancet
- Rivadeneira et al. 2009 Nature Genetics
- Estrada et al. 2012 Nature Genetics
- Styrkarsdottir et al. 2013 Nature
eQTL Browser Parameters
- Analysis ID (All)
- Association Test Significance Filters (p-value 1 x 10^-1)
- Phenotype Traits (Bone Mineral Density)
All analyses were performed in the Anaconda Python distribution (3.5.1). For
specific package versions please refer to
environment.yml. R version 3.3.0 was
used for visualization. For more specific environment dependencies refer to our
accompanying docker file at