
Hail


Hail is an open-source, scalable framework for exploring and analyzing genomic data.

The Hail project began in Fall 2015 to empower the worldwide genetics community to harness the flood of genomes to discover the biology of human disease. Since then, Hail has expanded to enable analysis of large-scale datasets beyond the field of genomics.

Here are two examples of projects powered by Hail:

  • The gnomAD team uses Hail as its core analysis platform. gnomAD is among the most comprehensive catalogues of human genetic variation in the world, and one of the largest genetic datasets. Analysis results are shared publicly and have had sweeping impact on biomedical research and the clinical diagnosis of genetic disorders.
  • The Neale Lab at the Broad Institute used Hail to perform QC and stratified association analysis of 4203 phenotypes at each of 13M variants in 361,194 individuals from the UK Biobank in about a day. Results and code are here.

For genomics applications, Hail can:

  • flexibly import from and export to a variety of data and annotation formats, including VCF, BGEN, and PLINK
  • generate variant annotations like call rate, Hardy-Weinberg equilibrium p-value, and population-specific allele count; and import annotations in parallel through the annotation database, VEP, and Nirvana
  • generate sample annotations like mean depth, imputed sex, and Ti/Tv (transition/transversion) ratio
  • generate new annotations from existing ones as well as genotypes, and use these to filter samples, variants, and genotypes
  • find Mendelian violations in trios, prune variants in linkage disequilibrium, analyze genetic similarity between samples, and compute sample scores and variant loadings using PCA
  • perform variant, gene-burden and eQTL association analyses using linear, logistic, and linear mixed regression, and estimate heritability
  • lots more!
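
To make a few of the annotations named above concrete, here is a minimal plain-Python sketch of what call rate, alternate allele count, a Hardy-Weinberg p-value, and Ti/Tv ratio measure. It is not Hail's API or implementation: the function names, the genotype encoding (0, 1, or 2 alternate alleles, with None for a missing call), and the chi-square approximation for the Hardy-Weinberg test are all assumptions made for illustration, and the sketch covers only biallelic SNPs.

```python
import math
from typing import Optional, Sequence, Tuple

# Assumed encoding for this sketch: number of alternate alleles (0, 1, 2),
# or None when the genotype call is missing.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def alt_allele_count(genotypes: Sequence[Genotype]) -> int:
    """Total number of alternate alleles among called genotypes."""
    return sum(g for g in genotypes if g is not None)


def hwe_chi2_p_value(genotypes: Sequence[Genotype]) -> float:
    """Chi-square (1 d.f.) p-value for Hardy-Weinberg equilibrium.

    This is an approximation chosen for brevity; exact tests are
    generally preferred when genotype counts are small.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (2 * observed[0] + observed[1]) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no deviation can be measured
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def ti_tv_ratio(snps: Sequence[Tuple[str, str]]) -> float:
    """Transitions (A<->G, C<->T) divided by transversions.

    Each entry is a (ref, alt) pair of distinct single bases.
    """
    transitions = sum({ref, alt} in ({"A", "G"}, {"C", "T"}) for ref, alt in snps)
    transversions = len(snps) - transitions
    return transitions / transversions if transversions else float("inf")


# Usage: keep only variants whose call rate exceeds a (hypothetical) threshold.
variants = {
    "var1": [0, 1, 2, None, 1, 0],
    "var2": [None, None, 0, None, 1, None],
}
well_called = {name: gts for name, gts in variants.items() if call_rate(gts) > 0.8}
```

Hail computes annotations of this kind in parallel over distributed data; the sketch only shows the per-variant and per-sample arithmetic behind the names.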

Hail's functionality is exposed through Python and backed by distributed algorithms built on top of Apache Spark to efficiently analyze gigabyte-scale data on a laptop or terabyte-scale data on a cluster.

Users can script pipelines or explore data interactively in Jupyter notebooks that combine Hail's methods, PySpark's scalable SQL and machine learning algorithms, and Python libraries like pandas, scikit-learn, and Matplotlib. Hail also provides a flexible domain language to express complex quality control and analysis pipelines with concise, readable code.

To learn more, you can view our talks at Spark Summit East and Spark Summit West (below).

Hail talk at Spark Summit West 2017

Getting Started

There are currently two versions of Hail: 0.1 (stable) and 0.2 beta (development). We recommend that new users install 0.2 beta: it is already a radical improvement over 0.1, its file format is stable, and its interface is nearly stable.

To get started using Hail 0.2 beta on your own data or on public data:

  • install Hail using the instructions in Installation
  • read the Overview for a broad introduction to Hail
  • follow the Tutorials for examples of how to use Hail
  • check out the Python API for detailed information on the programming interface

You can download phase 3 of the 1000 Genomes dataset in Hail's native matrix table format here.

As we work toward a stable 0.2 release, additional improvements to the interface may require users to modify their pipelines when updating to the latest patch. All such breaking changes will be logged here.

See the Hail 0.1 docs to get started with 0.1. The Annotation Database and gnomAD distribution are currently only directly available for 0.1 but will be updated for 0.2 soon.

User Support

There are many ways to get in touch with the Hail team if you need help using Hail, or if you would like to suggest improvements or features. We also love to hear from new users about how they are using Hail.

Hail uses a continuous deployment approach to software development, which means we frequently add new features. We update users about changes to Hail via the Discussion Forum. We recommend creating an account on the Discussion Forum so that you can subscribe to these updates.

Contribute

Hail is committed to open-source development. Our GitHub repo is publicly visible. If you'd like to contribute to the development of methods or infrastructure, please:

Hail Team

The Hail team is embedded in the Neale lab at the Stanley Center for Psychiatric Research of the Broad Institute of MIT and Harvard and the Analytic and Translational Genetics Unit of Massachusetts General Hospital.

Contact the Hail team at hail@broadinstitute.org.

Follow Hail on Twitter @hailgenetics.

Citing Hail

If you use Hail for published work, please cite the software:

Acknowledgements

We would like to thank Zulip for supporting open source by providing free hosting, and YourKit, LLC for generously providing free licenses for YourKit Java Profiler for open-source development.