My experimental tools on top of htslib. NOT OFFICIAL!!!


Introduction

HTSbox is a fork of early HTSlib. It is a collection of small experimental tools for manipulating HTS-related files. While some of these tools are already part of the official SAMtools package, others serve niche use cases in my own work and are maintained only by me. Please keep in mind that HTSbox is NOT the official repository. It lags far behind HTSlib in features, activity, clarity and robustness. If you are looking for a high-quality library and related tools, the repositories of the SAMtools organization are the right place.

Usage by examples

  1. Summary pileup:

    htsbox pileup -f ref.fa sorted1.bam sorted2.bam

    This reports the count of each observed allele, including INDELs, in each input BAM. Additional options can be applied for filtering:

    htsbox pileup -f ref.fa -Q20 -q30 -l90 sorted.bam

    This drops alignments shorter than 90bp or with mapping quality lower than 30 (-q30), and ignores bases with base quality lower than 20 (-Q20).
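
    The filters above can be pictured as two simple predicates, one per alignment and one per base. The following is a hypothetical C sketch, not code from htsbox: `pileup_filter_t`, `read_passes` and `base_passes` are invented names, and it assumes each threshold is inclusive (a value equal to the threshold passes).

    ```c
    #include <stdio.h>

    /* Hypothetical thresholds mirroring -q, -Q and -l in the command above. */
    typedef struct {
        int min_mapq;  /* -q: minimum mapping quality */
        int min_baseq; /* -Q: minimum base quality */
        int min_len;   /* -l: minimum alignment length (assumed inclusive) */
    } pileup_filter_t;

    /* Returns 1 if an alignment passes the per-alignment filters, else 0. */
    static int read_passes(const pileup_filter_t *f, int mapq, int aln_len)
    {
        return mapq >= f->min_mapq && aln_len >= f->min_len;
    }

    /* Returns 1 if a single base passes the per-base filter, else 0. */
    static int base_passes(const pileup_filter_t *f, int baseq)
    {
        return baseq >= f->min_baseq;
    }

    int main(void)
    {
        pileup_filter_t f = { 30, 20, 90 }; /* -q30 -Q20 -l90 */
        printf("%d %d\n", read_passes(&f, 35, 100), read_passes(&f, 35, 80));
        printf("%d %d\n", base_passes(&f, 25), base_passes(&f, 15));
        return 0;
    }
    ```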

  2. Naive variant calling:

    htsbox pileup -f ref.fa -Q20 -q30 -cs3 sorted.bam

    The output, in VCF, reports positions where an ALT allele is observed 3 or more times. Note that variant calling this way is very crude. It is not recommended for whole-genome or exome variant calling from short reads. Nonetheless, `pileup` is a proper tool for calling variants from contigs, in particular unitigs produced by fermi:

    htsbox pileup -cuf ref.fa unitig.bam

    Option -u is a preset that enables multiple settings at once.
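
    The counting behind this naive calling can be sketched as follows. This is a hypothetical illustration, not the htsbox implementation: it only handles single-base alleles (no INDELs), takes the already-filtered bases at one position as a string, and assumes that -s3 means an ALT allele must be seen at least 3 times. `base_index` and `call_alts` are invented names.

    ```c
    #include <stdio.h>

    /* Index of a nucleotide in "ACGT", or -1 for anything else (e.g. N). */
    static int base_index(char b)
    {
        switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
        }
    }

    /* Counts the bases observed at one position and stores in alt[] every
     * non-reference base seen at least min_support times. Returns how many. */
    static int call_alts(char ref, const char *obs, int min_support, char alt[4])
    {
        int cnt[4] = {0, 0, 0, 0}, n_alt = 0, r = base_index(ref);
        for (const char *p = obs; *p; ++p) {
            int i = base_index(*p);
            if (i >= 0) ++cnt[i];
        }
        for (int i = 0; i < 4; ++i)
            if (i != r && cnt[i] >= min_support)
                alt[n_alt++] = "ACGT"[i];
        return n_alt;
    }

    int main(void)
    {
        char alt[4];
        int n = call_alts('A', "AAGGGTAG", 3, alt); /* G seen 4 times, T once */
        for (int i = 0; i < n; ++i)
            printf("ALT %c\n", alt[i]);
        return 0;
    }
    ```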

  3. Generate consensus FASTA:

    htsbox pileup -f ref.fa -Q20 -q30 -Fs3 sorted.bam

  4. Pairwise alignment summary:

    htsbox samview -p aln.bam

    In the output, each line gives QName, QLen, QStart, QEnd, Strand, RName, RLen, RStart, REnd, PerBaseDivergence, MapQ and semicolon-delimited misc information.
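
    A line in this format can be read back with `sscanf`. This sketch assumes the columns are tab-separated and that no column except the trailing misc field contains whitespace; the struct `pairaln_t`, the function `parse_line` and the sample line are made up for illustration.

    ```c
    #include <stdio.h>

    /* Hypothetical record for one output line, in the column order listed
     * above. Field types are assumptions of this sketch. */
    typedef struct {
        char qname[256], rname[256];
        long qlen, qs, qe, rlen, rs, re;
        char strand;
        double divergence; /* PerBaseDivergence */
        int mapq;
        const char *misc;  /* points into the input line; semicolon-delimited */
    } pairaln_t;

    /* Parses one line; returns 0 on success, -1 if the 11 fixed fields
     * before the misc column cannot be read. */
    static int parse_line(const char *line, pairaln_t *r)
    {
        int n_read = 0;
        int n = sscanf(line, "%255s %ld %ld %ld %c %255s %ld %ld %ld %lf %d%n",
                       r->qname, &r->qlen, &r->qs, &r->qe, &r->strand,
                       r->rname, &r->rlen, &r->rs, &r->re,
                       &r->divergence, &r->mapq, &n_read);
        if (n != 11) return -1;
        r->misc = line + n_read;
        while (*r->misc == '\t' || *r->misc == ' ') ++r->misc; /* skip separator */
        return 0;
    }

    int main(void)
    {
        /* made-up example line */
        const char *line = "read1\t1000\t10\t990\t+\tchr1\t248956422\t5000\t5985\t0.012\t60\ttag1;tag2";
        pairaln_t r;
        if (parse_line(line, &r) == 0)
            printf("%s %c %s:%ld-%ld mapQ=%d misc=%s\n",
                   r.qname, r.strand, r.rname, r.rs, r.re, r.mapq, r.misc);
        return 0;
    }
    ```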

  5. Count alignment break points (mainly used to evaluate misassemblies):

    htsbox abreak name-srt.sam.gz

    or to call structural variations:

    htsbox abreak -u name-srt.sam.gz

  6. Reduce quality resolution with Illumina binning:

    htsbox qualbin -t2 -bm7 in.bam

    This is the only command so far that supports multi-threading.
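
    For orientation, a commonly cited Illumina 8-level binning table maps Phred qualities 2-9 to 6, 10-19 to 15, 20-24 to 22, 25-29 to 27, 30-34 to 33, 35-39 to 37, and 40 or above to 40. The sketch below hard-codes that table and applies it to a Phred+33 quality string (as in SAM or FASTQ text). It is an assumption, not a description of qualbin: the exact bins htsbox uses, and what -b and -m7 select, are not covered here. `illumina_bin` and `bin_qual_string` are invented names.

    ```c
    #include <stdio.h>

    /* Maps a Phred quality to a representative value using the commonly cited
     * Illumina 8-level table. Qualities below 2 are left unchanged, which is an
     * assumption of this sketch. */
    static int illumina_bin(int q)
    {
        if (q < 2)  return q;
        if (q < 10) return 6;
        if (q < 20) return 15;
        if (q < 25) return 22;
        if (q < 30) return 27;
        if (q < 35) return 33;
        if (q < 40) return 37;
        return 40;
    }

    /* Rewrites a Phred+33 quality string in place. */
    static void bin_qual_string(char *qual)
    {
        for (char *p = qual; *p; ++p)
            *p = (char)(illumina_bin(*p - 33) + 33);
    }

    int main(void)
    {
        char qual[] = "#+5?IJ"; /* Phred 2, 10, 20, 30, 40, 41 */
        bin_qual_string(qual);
        printf("%s\n", qual);
        return 0;
    }
    ```

    Binning trades a little quality resolution for much better compressibility of the quality string, which is the usual motivation for this kind of reduction.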

  7. Evaluate empirical base quality, similar to MAQ's mapcheck:

    htsbox mapchk sorted.bam ref.fa

    The output is a bit complicated. Let me know if you are interested (I will see if it is worth documenting the output).
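
    The core of an empirical base-quality check is a tally, per reported base quality, of how many bases disagree with the reference; the empirical quality of a bin is then -10 * log10(mismatches / bases). The sketch below only does the tally and prints the raw error rate. It is a hypothetical illustration, not a description of mapchk's output: the names are invented, segments are assumed gap-free, and every mismatch is treated as a sequencing error, which ignores true variants.

    ```c
    #include <stdio.h>

    #define MAX_Q 64

    /* Per-quality tallies: bases reported at each quality, and how many of
     * those disagreed with the reference. */
    typedef struct {
        long n_base[MAX_Q];
        long n_mismatch[MAX_Q];
    } qual_tally_t;

    /* Adds one gap-free aligned segment: read bases, their Phred qualities and
     * the reference bases they align to. Qualities are clamped to [0, MAX_Q). */
    static void tally_segment(qual_tally_t *t, const char *seq, const int *qual,
                              const char *ref, int len)
    {
        for (int i = 0; i < len; ++i) {
            int q = qual[i] < 0 ? 0 : qual[i] >= MAX_Q ? MAX_Q - 1 : qual[i];
            ++t->n_base[q];
            if (seq[i] != ref[i]) ++t->n_mismatch[q];
        }
    }

    int main(void)
    {
        qual_tally_t t = {{0}, {0}};
        const int qual[] = {30, 30, 30, 30, 10, 10};
        tally_segment(&t, "ACGTAC", qual, "ACGAAG", 6);
        for (int q = 0; q < MAX_Q; ++q)
            if (t.n_base[q] > 0)
                printf("reported Q%d: %ld/%ld mismatches (rate %.3f)\n", q,
                       t.n_mismatch[q], t.n_base[q],
                       (double)t.n_mismatch[q] / t.n_base[q]);
        return 0;
    }
    ```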

  8. Generate per-base read depth:

    htsbox depth -p0 sorted.bam

    The last two columns give the total read depth and the depth of reads with high mapping quality (the mapQ threshold defaults to 20). When -p takes a value between 0 and 1, depth uses the CODOC strategy to find windows with relatively stable read depth; the output can be much smaller, at the cost of some information.
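
    The two depth columns can be sketched as a naive interval sweep. This is a hypothetical illustration, not htsbox code: each read is reduced to a 0-based, half-open reference interval plus its mapQ, deletions and clipping are ignored, and a read counts as high-mapQ when its mapQ is at least the threshold (20 here). It does not attempt the CODOC windowing. `read_iv_t` and `per_base_depth` are invented names.

    ```c
    #include <stdio.h>

    /* One aligned read reduced to its reference interval [beg, end) and mapQ. */
    typedef struct { int beg, end, mapq; } read_iv_t;

    /* Fills total[] and high[] (both of length len) with per-base depth over
     * [0, len). high[] counts only reads with mapq >= min_mapq. */
    static void per_base_depth(const read_iv_t *r, int n, int min_mapq,
                               int len, int *total, int *high)
    {
        for (int i = 0; i < len; ++i) total[i] = high[i] = 0;
        for (int k = 0; k < n; ++k) {
            int start = r[k].beg < 0 ? 0 : r[k].beg;
            for (int i = start; i < r[k].end && i < len; ++i) {
                ++total[i];
                if (r[k].mapq >= min_mapq) ++high[i];
            }
        }
    }

    int main(void)
    {
        const read_iv_t reads[] = { {0, 5, 60}, {2, 8, 3}, {4, 10, 25} };
        int total[10], high[10];
        per_base_depth(reads, 3, 20, 10, total, high);
        for (int i = 0; i < 10; ++i)
            printf("%d\t%d\t%d\n", i, total[i], high[i]);
        return 0;
    }
    ```

    A real tool would stream reads from a coordinate-sorted BAM instead of holding them all in memory.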