HaploGrep - the fastest way to classify your mtDNA profiles into haplogroups.
Latest commit 0cb2d05 Jun 22, 2018

We provide a fast and free haplogroup classification service. You can upload your mtDNA profiles (VCF or hsd format) and receive their mitochondrial haplogroups in return. So far, HaploGrep and the updated HaploGrep 2 have been cited more than 400 times (Google Scholar, June 2018). Please join our HaploGrep Google User Group for future updates and ongoing discussions.

Command-line Version for local usage

Download and execute the latest release (v2.1.9).

  java -jar haplogrep-2.1.9.jar --in <input> --format <vcf|hsd> --out haplogroups.txt

HaploGrep requires Java 8 and runs on Windows, Linux and macOS.
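
If you want to call the command-line version from a Java pipeline rather than a shell, a minimal sketch could launch the jar as an external process. It only uses the --in, --format and --out options shown in the command above; the class and method names (HaploGrepRunner, buildCommand, run) and the input file name samples.vcf are hypothetical and not part of HaploGrep.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs the HaploGrep command-line jar as an external process.
// Only the options documented in the README (--in, --format, --out) are used.
final class HaploGrepRunner {

    // Builds the argument list for: java -jar <jar> --in <input> --format <vcf|hsd> --out <output>
    static List<String> buildCommand(String jar, String input, String format, String output) {
        if (!"vcf".equals(format) && !"hsd".equals(format)) {
            throw new IllegalArgumentException("format must be vcf or hsd, got: " + format);
        }
        return new ArrayList<>(Arrays.asList(
                "java", "-jar", jar,
                "--in", input,
                "--format", format,
                "--out", output));
    }

    // Starts the process in workingDir, forwards its console output and returns the exit code.
    static int run(List<String> command, File workingDir)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workingDir);
        builder.inheritIO();
        Process process = builder.start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // samples.vcf is a placeholder input file name.
        List<String> command = buildCommand(
                "haplogrep-2.1.9.jar", "samples.vcf", "vcf", "haplogroups.txt");
        int exitCode = run(command, new File("."));
        System.out.println("HaploGrep exited with code " + exitCode);
    }
}
```

Validating the format up front simply mirrors the two input formats the README lists; any additional parameters could be appended to the returned list in the same way.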

Additional Parameters

  • To add further output columns (e.g. found or remaining polymorphisms), add the --extend-report flag (default: off).
  • To change the distance metric to Hamming or Jaccard, use the --metric parameter (default: kulczynski).
  • The Phylotree version can be changed with the --phylotree parameter (default: 17).
  • If your variants come from genotyping arrays, add the --chip parameter (default: off). The range is then limited to the SNPs on the array. This only works for VCF input. To get the same behaviour for hsd files, restrict the Range column to the positions covered by the array or by the region you have sequenced (e.g. the control region). Ranges are separated by a semicolon (;), and both intervals and single positions are allowed (e.g. 1-576; 34).
  • To output the complete path from the rCRS root to your input sample, use the --lineage parameter (default: off). We provide a text format (*.lineage.txt) and a Graphviz DOT format. You can upload the HaploGrep *.graphviz.txt file here or process it with the Graphviz library.
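
To give a feel for how the --metric options differ, here is a minimal, unweighted sketch of the three measures computed on sets of polymorphisms: a sample profile versus the polymorphisms expected for one candidate haplogroup. This is not HaploGrep's scoring code; HaploGrep may weight individual polymorphisms differently, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, unweighted versions of the three metrics, computed on sets of
// polymorphism strings such as "73G" or "263G". Not HaploGrep's implementation.
final class ProfileMetrics {

    // Number of polymorphisms present in both sets.
    private static int shared(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Number of distinct polymorphisms present in either set.
    private static int unionSize(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.size();
    }

    // Kulczynski: mean of the shared fraction relative to each set (1.0 = identical).
    static double kulczynski(Set<String> sample, Set<String> expected) {
        if (sample.isEmpty() || expected.isEmpty()) {
            return sample.isEmpty() && expected.isEmpty() ? 1.0 : 0.0;
        }
        double common = shared(sample, expected);
        return 0.5 * (common / sample.size() + common / expected.size());
    }

    // Jaccard: shared polymorphisms divided by all polymorphisms in either set.
    static double jaccard(Set<String> sample, Set<String> expected) {
        int union = unionSize(sample, expected);
        return union == 0 ? 1.0 : (double) shared(sample, expected) / union;
    }

    // Hamming: polymorphisms found in exactly one of the two sets (0 = identical).
    static int hamming(Set<String> sample, Set<String> expected) {
        return unionSize(sample, expected) - shared(sample, expected);
    }
}
```

Kulczynski and Jaccard are similarities (higher is better), whereas Hamming counts mismatches (lower is better). For example, a sample {73G, 263G, 16519C} compared with the expected set {73G, 263G, 750G, 16519C} gives Kulczynski 0.875, Jaccard 0.75 and Hamming 1.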

File Formats

The default input format is VCF. You can also specify your profiles in hsd format, a simple tab-delimited format with 4 columns (ID, Range, Haplogroup and Polymorphisms). For readability, the polymorphisms are themselves tab-delimited, so a line can have more than 4 columns. An hsd example can be found here.
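
As an illustration of the column layout described above, a minimal sketch of reading one hsd line might look like this. HsdProfile and parseLine are hypothetical names, not HaploGrep classes, and the sketch does not interpret the Range or Haplogroup values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for one hsd line: ID, Range, Haplogroup, then one
// polymorphism per remaining tab-separated column.
final class HsdProfile {
    final String id;
    final String range;
    final String haplogroup;
    final List<String> polymorphisms;

    private HsdProfile(String id, String range, String haplogroup, List<String> polymorphisms) {
        this.id = id;
        this.range = range;
        this.haplogroup = haplogroup;
        this.polymorphisms = Collections.unmodifiableList(polymorphisms);
    }

    static HsdProfile parseLine(String line) {
        String[] cols = line.split("\t");
        // A profile without any polymorphisms may only have the first three columns.
        if (cols.length < 3) {
            throw new IllegalArgumentException("expected ID, Range and Haplogroup columns: " + line);
        }
        List<String> polys = new ArrayList<>();
        // Columns 4..n each hold a single polymorphism; empty cells are skipped.
        for (int i = 3; i < cols.length; i++) {
            String p = cols[i].trim();
            if (!p.isEmpty()) {
                polys.add(p);
            }
        }
        return new HsdProfile(cols[0].trim(), cols[1].trim(), cols[2].trim(), polys);
    }
}
```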

Reference sequence

Several mtDNA reference sequences exist; HaploGrep currently assumes that all input is aligned to the rCRS. Please check out our blog post to learn more about this topic.

Genotyping arrays

If you are using HaploGrep with genotyping array data, please have a look at the --chip parameter above.
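
When working with hsd files instead, the Range syntax described under the --chip parameter (semicolon-separated intervals and single positions, e.g. 1-576; 34) can be illustrated with a small parser. MtRange, parse and contains are hypothetical names used only to show the syntax; this is not HaploGrep's own range handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for Range strings such as "1-576; 34".
// Wrap-around intervals are not handled here; write them as two pieces,
// e.g. "16024-16569; 1-576".
final class MtRange {
    private final List<int[]> intervals = new ArrayList<>();

    static MtRange parse(String spec) {
        MtRange range = new MtRange();
        for (String part : spec.split(";")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            int start;
            int end;
            if (dash >= 0) {
                // Interval "start-end", both ends inclusive.
                start = Integer.parseInt(token.substring(0, dash).trim());
                end = Integer.parseInt(token.substring(dash + 1).trim());
            } else {
                // Single position.
                start = Integer.parseInt(token);
                end = start;
            }
            if (start > end) {
                throw new IllegalArgumentException("interval start exceeds end: " + token);
            }
            range.intervals.add(new int[] {start, end});
        }
        return range;
    }

    // True if the position lies inside any parsed interval or single position.
    boolean contains(int position) {
        for (int[] interval : intervals) {
            if (position >= interval[0] && position <= interval[1]) {
                return true;
            }
        }
        return false;
    }
}
```

A check like contains can be used to drop polymorphisms outside the covered positions before writing an hsd profile.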

Heteroplasmies (VCF only)

Heteroplasmies are often stored as heterozygous genotypes (0/1). If an HF field (Heteroplasmy Frequency of the variant allele, introduced by MToolBox) is specified in the VCF header, we add variants with HF > 0.96 to the input profile.
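
The HF rule above can be sketched as a simple filter. Only the strict HF > 0.96 threshold comes from the text; HeteroplasmyFilter, VariantCall, toProfile and the position-plus-base notation (e.g. 263G) are assumptions made for illustration. How calls without an HF value are treated is not described here, so the sketch does not model it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter applying the README's HF rule: a call is added to the
// profile only if its heteroplasmy frequency (HF) is strictly greater than 0.96.
final class HeteroplasmyFilter {
    static final double HF_THRESHOLD = 0.96;

    // Hypothetical minimal variant call: position, alternate base and HF (0.0-1.0).
    static final class VariantCall {
        final int position;
        final String alt;
        final double hf;

        VariantCall(int position, String alt, double hf) {
            this.position = position;
            this.alt = alt;
            this.hf = hf;
        }
    }

    // Returns polymorphisms as "position + alternate base" strings, e.g. "263G".
    static List<String> toProfile(List<VariantCall> calls) {
        List<String> profile = new ArrayList<>();
        for (VariantCall call : calls) {
            if (call.hf > HF_THRESHOLD) {
                profile.add(call.position + call.alt);
            }
        }
        return profile;
    }
}
```

Note that a call with HF exactly 0.96 is excluded, because the comparison is strict.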

Please have a look at mtDNA-Server to check for heteroplasmies and contamination in your NGS data.

Blog

Check out our blog for articles on mtDNA topics.

Citation

If you use HaploGrep, please cite our latest HaploGrep 2 paper together with Phylotree 17. The first HaploGrep paper can be found here.

Contact

Sebastian Schoenherr (@seppinho) and Hansi Weissensteiner (@haansi); Division of Genetic Epidemiology, Medical University of Innsbruck.