Fast calculation of LD in large-scale cohorts
Tomahawk is a machine-optimized library for computing linkage disequilibrium (LD) from population-sized datasets. It permits close to real-time analysis of regions of interest in datasets of many millions of diploid individuals on a standard laptop. All algorithms are embarrassingly parallel and have been successfully tested on datasets with up to 10 million individuals, using thousands of cores across hundreds of machines on the Wellcome Trust Sanger Institute compute farm.
Tomahawk is unique in that it constructs complete haplotype/genotype contingency matrices for each comparison, performs statistical tests on the output data, and provides a framework for investigating the produced data.
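To make the idea of a haplotype contingency matrix concrete, the sketch below computes the standard LD statistics (D, D', r²) and a chi-squared statistic from the four haplotype counts of a single pair of biallelic sites. It is a plain-Python illustration of the textbook formulas, not Tomahawk's implementation or API; the function name `ld_from_haplotype_counts` and its argument names are hypothetical.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Textbook LD statistics from a 2x2 haplotype contingency table.

    Hypothetical helper for illustration only (not part of Tomahawk).
    n_AB, n_Ab, n_aB, n_ab are the observed counts of the four haplotypes
    formed by alleles A/a at the first site and B/b at the second site.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes observed")

    p_A = (n_AB + n_Ab) / n   # frequency of allele A
    p_B = (n_AB + n_aB) / n   # frequency of allele B
    d = n_AB / n - p_A * p_B  # D = p(AB) - p(A) * p(B)

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one site is monomorphic; LD is undefined")

    r2 = d * d / denom

    # D' rescales D by its largest attainable magnitude given the allele
    # frequencies; the sign of D is kept here.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    # For a 2x2 table, Pearson's chi-squared statistic equals n * r2.
    return {"D": d, "D_prime": d_prime, "r2": r2, "chi2": n * r2}


if __name__ == "__main__":
    # Made-up counts for 100 haplotypes, chosen only to exercise the function.
    print(ld_from_haplotype_counts(40, 10, 10, 40))
```

Repeating this calculation for every pair of sites is what makes the problem quadratic in the number of variants, and because each pair is independent of the others, the work divides naturally across cores and machines.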
- Read the documentation
- Get started with the CLI
- Get started with rtomahawk
- Get started with Docker and Tomahawk
- A processor that supports SSE4.2 (most Intel and AMD processors released since 2008 do)
- C++11 compliant compiler (GCC is assumed)
- A Linux-like distribution is assumed by the makefile
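If you are unsure whether your processor meets the SSE4.2 requirement, the following sketch shows one way to check on Linux. It assumes an x86 machine where the kernel lists CPU features on the `flags` lines of `/proc/cpuinfo` and reports SSE4.2 as `sse4_2`; the helper name `has_sse42` is hypothetical and is not part of Tomahawk.

```python
def has_sse42(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists sse4_2.

    Hypothetical helper for illustration only (not part of Tomahawk).
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only inspect the feature-flag lines, not e.g. the model name.
        if sep and key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("SSE4.2 supported:", has_sse42(handle.read()))
    except OSError:
        print("Could not read /proc/cpuinfo; this check only applies to Linux.")
```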
For Ubuntu, Debian, and Mac systems, installation is easy: just run
git clone --recursive https://github.com/mklarqvist/tomahawk
cd tomahawk
./install.sh
The install.sh file depends extensively on apt-get, so it is unlikely to run without extensive modifications on non-Debian-based systems.
If you do not have the super-user (administrator) privileges required to install new packages on your system, run the local installation instead.
When installing locally, the required dependencies are downloaded and built in the root directory. This approach will require additional effort if you intend to move the compiled libraries to a different directory.
Interested in contributing? Fork and submit a pull request and it will be reviewed.
We are actively developing Tomahawk and are always interested in improving its quality. If you run into an issue, please report the problem on our issue tracker. Be sure to include enough detail in your report for us to reproduce the problem and address it. We have not yet reached version 1.0, so the specification and the API interfaces may change.
This is Tomahawk 0.7.0. Tomahawk follows semantic versioning.
Marcus D. R. Klarqvist (firstname.lastname@example.org)
Department of Genetics, University of Cambridge
Wellcome Sanger Institute
Tomahawk is licensed under the MIT license.