High Throughput Algorithm for Long Read Error Correction

LATEST NEWS

The HALC paper is accepted for publication in BMC Bioinformatics!

Overview

HALC is software for high-throughput error correction of long reads.

Copyright

HALC is under the Artistic License 2.0.

Short manual

  1. System requirements

    HALC runs on 32-bit or 64-bit Linux machines. At least 4 GB of memory is recommended for correcting larger data sets.

  2. Installation

    HALC requires the aligner BLASR and, for -ordinary mode only, the error correction software LoRDEC.

    • Compile the source files in the 'src' and 'thirdparty' folders into a 'bin' folder by running the Makefile: make all.
    • Add BLASR, LoRDEC and the 'bin' folder to your $PATH with export PATH=PATH2BLASR:$PATH, export PATH=PATH2LoRDEC:$PATH and export PATH=PATH2bin:$PATH, respectively.
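    A quick way to confirm the dependencies are visible before running HALC is to look them up on $PATH. The sketch below assumes the executables are named blasr (BLASR) and lordec-correct (LoRDEC); missing_tools is a hypothetical helper, not part of HALC, and HALC's own 'bin' programs are not checked because their names are not listed here.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_ALWAYS = ["blasr"]             # BLASR is needed in every mode
REQUIRED_ORDINARY = ["lordec-correct"]  # LoRDEC is needed only for -ordinary mode


def missing_tools(ordinary_mode=True):
    """Return the names of required executables that are not found on $PATH."""
    tools = list(REQUIRED_ALWAYS)
    if ordinary_mode:
        tools += REQUIRED_ORDINARY
    # shutil.which returns None when a command cannot be resolved via $PATH.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(ordinary_mode=True)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```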
  3. Inputs

    • Long reads in FASTA format.
    • Contigs assembled from the corresponding short reads in FASTA format.
    • The initial short reads in FASTA format (only for -ordinary mode). Paired-end reads can be combined into one file with cat left_reads.fa > short_reads.fa followed by cat right_reads.fa >> short_reads.fa.
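    The two cat commands above simply append the right-end reads after the left-end reads. A minimal Python equivalent, using a hypothetical concat_fasta helper, looks like this; like cat, it copies bytes verbatim, so each input file should end with a newline.

```python
import shutil


def concat_fasta(inputs, output):
    """Write the files in `inputs` one after another into `output`.

    Equivalent to `cat left_reads.fa > short_reads.fa` followed by
    `cat right_reads.fa >> short_reads.fa` when inputs has two entries.
    """
    with open(output, "wb") as out:  # truncate or create, like '>'
        for path in inputs:
            with open(path, "rb") as src:
                # Stream in chunks so large read files are not loaded into memory.
                shutil.copyfileobj(src, out)
```

Usage: concat_fasta(["left_reads.fa", "right_reads.fa"], "short_reads.fa").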
  4. Using HALC

    runHALC.py long_reads.fa contigs.fa [options]
    

    Options (default value):
    -o/-ordinary short_reads.fa (yes)
    Ordinary mode, which uses repeats to make corrections. LoRDEC and the initial short reads are required to refine the repeat-corrected regions. Mutually exclusive with the -repeat-free option.
    -r/-repeat-free (no)
    Repeat-free mode, which makes corrections without using repeats. Mutually exclusive with the -ordinary option.
    -b/-boundary n (4)
    Maximum boundary difference to split the subcontigs.
    -a/-accurate (yes)
    Accurate construction of the contig graph.
    -c/-coverage n (auto)
    Expected coverage on contigs. If not specified, it is calculated automatically.
    -w/-width n (4)
    Maximum width of the dynamic programming table.
    -k/-kmer n (25)
    K-mer length for LoRDEC refinement.
    -t/-threads n (auto)
    Number of threads each process creates. By default, it is set to the number of computing cores.
    -l/-log (no)
    Print the system log.
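    The options above can be assembled into a command programmatically. The sketch below uses a hypothetical build_halc_command helper; it follows the option spellings listed here, enforces only the rules stated above (-ordinary and -repeat-free are mutually exclusive, and ordinary mode needs the short reads), and leaves -a and any unset numeric option at HALC's defaults by not passing them.

```python
def build_halc_command(long_reads, contigs, short_reads=None, repeat_free=False,
                       boundary=None, coverage=None, width=None, kmer=None,
                       threads=None, log=False):
    """Return an argument list for runHALC.py (illustrative sketch only)."""
    # The two correction modes cannot be combined.
    if repeat_free and short_reads is not None:
        raise ValueError("-ordinary and -repeat-free are mutually exclusive")
    # Ordinary mode (the default) needs the initial short reads for LoRDEC.
    if not repeat_free and short_reads is None:
        raise ValueError("ordinary mode requires the initial short reads")

    cmd = ["runHALC.py", long_reads, contigs]
    cmd += ["-r"] if repeat_free else ["-o", short_reads]

    # Numeric options are passed only when set; otherwise HALC's defaults apply.
    for flag, value in (("-b", boundary), ("-c", coverage), ("-w", width),
                        ("-k", kmer), ("-t", threads)):
        if value is not None:
            cmd += [flag, str(value)]
    if log:
        cmd.append("-l")
    return cmd
```

For example, assuming runHALC.py is executable and on $PATH, subprocess.run(build_halc_command("long_reads.fa", "contigs.fa", short_reads="short_reads.fa"), check=True) would launch an ordinary-mode run.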

  5. Outputs

    • Error corrected full long reads.
    • Error corrected trimmed long reads.
    • Error corrected split long reads.
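    This README does not define how the trimmed and split outputs are derived from the full reads. As an illustration only, the sketch below assumes that corrected bases are written in upper case and uncorrected bases in lower case (the convention LoRDEC uses); trimming then keeps the span between the first and last corrected base, and splitting keeps each run of corrected bases as its own read. trim_read, split_read and min_len are hypothetical names.

```python
import re

# Assumption: corrected bases are upper case, uncorrected bases lower case.
CORRECTED_RUN = re.compile(r"[A-Z]+")


def trim_read(seq):
    """Keep the span from the first to the last corrected base."""
    runs = [m.span() for m in CORRECTED_RUN.finditer(seq)]
    if not runs:
        return ""  # nothing was corrected
    return seq[runs[0][0]:runs[-1][1]]


def split_read(seq, min_len=1):
    """Return each maximal run of corrected bases that is at least min_len long."""
    return [m.group() for m in CORRECTED_RUN.finditer(seq)
            if len(m.group()) >= min_len]
```

Under this assumption, the full read "acgTTGACCaaGGTAcc" trims to "TTGACCaaGGTA" and splits into "TTGACC" and "GGTA".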

Chinese name

HALC's Chinese name is 浩克.