Graph-based alignment (Hierarchical Graph FM index)

hisat2

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) to a population of human genomes (as well as to a single reference genome). Based on an extension of the BWT to graphs [1], we designed and implemented a graph FM index (GFM), an original approach and, to the best of our knowledge, its first implementation. In addition to one global GFM index that represents the general population, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index represents a genomic region of 56 Kbp, and 55,000 indexes are needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable effective alignment of sequencing reads. This new indexing scheme is called the Hierarchical Graph FM index (HGFM). We developed HISAT2 based on the HISAT [2] and Bowtie 2 [3] implementations. See the HISAT2 website for more information.
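
The relationship between the 56 Kbp region size and the roughly 55,000 local indexes can be checked with a little arithmetic. The sketch below is only an illustration: the genome length of 3.1 Gbp is an assumption (it is not stated in this README), the regions are assumed to tile the genome without overlap, and `local_index_count` and `local_index_id` are hypothetical helper names, not part of HISAT2.

```python
# Assumption: approximate length of the human reference genome (not from the README).
GENOME_LENGTH_BP = 3_100_000_000
# From the README: each local index represents a genomic region of 56 Kbp.
LOCAL_REGION_BP = 56_000


def local_index_count(genome_length: int, region: int = LOCAL_REGION_BP) -> int:
    """Number of fixed-size, non-overlapping regions needed to cover the genome."""
    # Integer ceiling division avoids floating-point rounding.
    return (genome_length + region - 1) // region


def local_index_id(position: int, region: int = LOCAL_REGION_BP) -> int:
    """Hypothetical lookup: which region covers a 0-based genome position."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return position // region


print(local_index_count(GENOME_LENGTH_BP))  # on the order of 55,000
print(local_index_id(1_000_000))            # region containing position 1,000,000
```

Under these assumptions the count comes out close to the 55,000 local indexes quoted above. The actual layout of local indexes in HISAT2 (for example, whether adjacent regions overlap) may differ from this simple tiling.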

A few notes:

  1. The HISAT2 index (HGFM) for the human reference genome plus 12.3 million common variants (SNPs and small indels) is 6.2 GB. The variants consist of about 11 million single nucleotide polymorphisms, 728,000 deletions, and 555,000 insertions. The insertions and deletions in this index are small (usually <20 bp). We plan to incorporate structural variations (SVs) into this index.

  2. HISAT2 also allows reads to be mapped directly to a transcriptome, similar to TopHat2.

  3. The memory footprint of HISAT2 is relatively low, at 6.7 GB.

  4. The runtime of HISAT2 is estimated to be slightly slower than HISAT (30–100% slower for some data sets).

  5. HISAT2 provides greater accuracy for alignment of reads containing SNPs.

  6. We released a first (beta) version of HISAT2 on September 8, 2015.
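
As a quick consistency check on note 1, the three variant counts can be summed and compared with the stated 12.3 million total. This is only a sketch: the figures are the rounded values quoted in the note, and the dictionary name `variant_counts` is illustrative.

```python
# Rounded counts as quoted in note 1 of this README.
variant_counts = {
    "single nucleotide polymorphisms": 11_000_000,
    "deletions": 728_000,
    "insertions": 555_000,
}

total = sum(variant_counts.values())
print(f"{total / 1e6:.1f} million variants")  # prints "12.3 million variants"
```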

References:

[1] Sirén J, Välimäki N, Mäkinen V (2014) Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics 11: 375–388. doi: 10.1109/tcbb.2013.2297101

[2] Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nature Methods.

[3] Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357–359.