rcownie/hail

 
 

Hail

Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data. Hail is used throughout academia and industry as the analytical engine for major studies, projects, and services, including the Genome Aggregation Database (gnomad.broadinstitute.org) and the Neale lab mega-GWAS (nealelab.is/uk-biobank).

Unlike the Python and R scientific computing stacks, Hail:

  • scales from laptop to large compute cluster or cloud, with the same code
  • is designed to work with datasets that do not fit in memory
  • has first-class support for multi-dimensional structured data, such as the genomic data used in this tutorial

Hail's methods are primarily written in Python, using primitives for distributed queries and linear algebra implemented in Scala, Spark, and increasingly C++. We welcome the scientific community to leverage Hail to develop, share, and apply new methods at scale!

See the homepage for more info on using Hail.

Contribute

Hail is committed to open-source development. If you'd like to contribute to the development of methods or infrastructure, please:

Hail uses a continuous deployment approach to software development, which means we frequently add new features. We update users about changes to Hail via the Discussion Forum. We recommend creating an account on the Discussion Forum so that you can subscribe to these updates as well.

Hail Team

The Hail team is embedded in the Neale lab at the Stanley Center for Psychiatric Research of the Broad Institute of MIT and Harvard and the Analytic and Translational Genetics Unit of Massachusetts General Hospital.

Contact the Hail team at hail@broadinstitute.org.

Follow Hail on Twitter @hailgenetics.

Citing Hail

If you use Hail for published work, please cite the software:

Acknowledgements

We would like to thank Zulip for supporting open source by providing free hosting, and YourKit, LLC for generously providing free licenses for the YourKit Java Profiler for open-source development.
