A flexible tool for the multi-resolution localization of causal variants across the genome

fz-cambridge/knockoffzoom

KnockoffZoom

A powerful and versatile statistical method for the analysis of genome-wide association data.

Accompanying paper:

Multi-resolution localization of causal variants across the genome
M. Sesia, E. Katsevich, S. Bates, E. Candès, C. Sabatti
Nat Commun 11, 1093 (2020). https://doi.org/10.1038/s41467-020-14791-2

For more information, visit: https://msesia.github.io/knockoffzoom.

News: A new version of KnockoffZoom is available, which can also account for population structure and familial relatedness.

Overview

The goal of KnockoffZoom is to identify causal variants for complex traits effectively and precisely through genome-wide fine-mapping, accounting for linkage disequilibrium and controlling the false discovery rate. The results leverage the genetic models used for phasing and are equally valid for quantitative and binary traits.

The code contained in this repository is designed to allow the application of KnockoffZoom to large datasets, such as the UK Biobank. Some of the code is provided in the form of Bash and R scripts, while the core algorithms for Monte Carlo knockoff sampling are implemented in the R package SNPknock, which should be installed separately.

The KnockoffZoom methodology is divided into different modules, each corresponding to a separate Bash script contained in the directory knockoffzoom/.

Dependencies

Recommended OS: Linux. macOS is not officially supported, but the software should be compatible with it.

The following software should be available from your user path:

The following R (version 3.5.1) packages are required:

The above version numbers correspond to the configuration on which this software was tested. Newer versions are likely to be compatible, but have not been tested.
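Before running the scripts, it can help to confirm that each required tool is reachable from your user path. The following is a minimal sketch, not part of this repository: `check_on_path` is a hypothetical helper, and the command names you pass to it should be taken from the dependency list above.

```shell
# Hypothetical helper (not part of KnockoffZoom): report which of the
# given commands can be found on the current PATH.
# Returns 0 if all commands are found, 1 if at least one is missing.
check_on_path() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, since part of the code is written in R, `check_on_path Rscript` would report whether the R command-line interpreter is available.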

Installation

Clone this repository on your system and install any missing dependencies. Estimated installation time (dependencies): 5-15 minutes.

Toy dataset

A toy dataset containing 1000 artificial samples typed at 2000 loci (divided between chromosomes 21 and 22) is provided to test KnockoffZoom. To run the example, execute the script analyze.sh.

./analyze.sh

This script will also check whether the required R packages are available and install any that are missing.

The analysis should take approximately 5 minutes on a personal computer. The results can be visualized interactively with the script visualize.sh, which will launch a Shiny app in your browser. Some additional R packages are required by the visualization tool, and will be automatically installed if not found.

./visualize.sh

The expected results for the analysis of this toy dataset are provided in the directory results/ and can be visualized by running the script visualize.sh before running analyze.sh. Note that the script analyze.sh will overwrite the default results.
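Because analyze.sh overwrites the default results, you may want to keep a copy of them first. The following is a minimal sketch under that assumption: `backup_dir` is a hypothetical helper and `results_default` is an arbitrary backup name, neither of which is part of this repository.

```shell
# Hypothetical helper (not part of KnockoffZoom): copy a directory to a
# backup location, leaving any existing backup untouched.
backup_dir() {
  src="$1"
  dst="$2"
  if [ ! -d "$src" ]; then
    echo "nothing to back up: $src not found" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    # Do not overwrite an earlier backup of the default results.
    echo "backup already exists: $dst" >&2
    return 0
  fi
  cp -R "$src" "$dst"
}
```

From the repository root, this could be used as `backup_dir results results_default && ./analyze.sh`, so that the expected results remain available for comparison after the analysis has run.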

Tutorial

A guided step-by-step analysis of the above toy dataset using KnockoffZoom is available at: https://msesia.github.io/knockoffzoom/tutorial.html.

Large-scale applications

KnockoffZoom is computationally efficient and we have successfully applied it to the analysis of the genetic data in the UK Biobank. For more information, visit https://msesia.github.io/knockoffzoom/ukbiobank.html. The analysis of large datasets cannot be carried out on a personal computer. The computational resources required for the analysis of the UK Biobank data are summarized in the accompanying paper.

The modular nature of our method allows the code contained in each of the 5 main scripts to be easily deployed on a computing cluster for large-scale applications. This task will require some additional user effort compared to the toy example, but the scripts for each module are documented and quite intuitive.

Authors

Contributors

  • Eugene Katsevich (University of Pennsylvania). User interface of the Shiny visualization app.

License

This software is distributed under the GPLv3 license.

Further references

Read more about the broader framework of knockoffs.
