
Releases: mapleforest/HaploMerger2

HaploMerger2_20180603

04 Dec 03:52

This release of HaploMerger2 has been tested on CentOS/RedHat v5/6 and Ubuntu v10.04-64bit.

Currently, this release provides a tarball (HaploMerger2_20180603.tar.gz) containing binaries and executables, with the source code also included. Please download the latest standalone HaploMerger2 package from this website and unpack it to any location you like.

Features:

  • Robust, with high sensitivity and flexibility.
  • Zero sequence loss (no sequence data is discarded unless specifically requested by the user).
  • The raw diploid assembly is separated into two complementary (mosaic) haploid assemblies: the reference one (containing the high-quality alleles) and the alternative one (containing the other alleles).
  • Comes with a real-world complex example to help users test and understand the procedure, capabilities and performance of the pipeline (a demo run takes ~2 hours; users are welcome to use this example to benchmark other similar tools).

Citation:

Requirements:

If you would like to know more about requirements, installation and usage before downloading, please read the latest manual.

If you would like to test HaploMerger2 with a real, fragmented, highly polymorphic diploid genome assembly, you may download the soft-masked (by winMasker) diploid assembly (~700Mb, 4-5% heterozygosity) of a wild-type amphioxus Branchiostoma belcheri here (bbv18wm.fa.gz). This is the same assembly submitted to NCBI under accession GCA_001625405.1 (GenBank acc: AYSR00000000.1). Put this file (no need to unpack it) into the directory "HaploMerger2_20180603/project_example3/" and then use it to test the pipeline.
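Before starting a long run, it can be useful to confirm that a downloaded assembly really is soft-masked (repeats in lowercase). The following is a minimal sketch, not part of HaploMerger2: the function name `softmask_stats` is hypothetical, and it assumes a plain or gzip-compressed FASTA file in which lowercase letters mark masked bases.

```python
import gzip
from pathlib import Path


def softmask_stats(fasta_path):
    """Count soft-masked (lowercase), unmasked (uppercase) and N bases.

    Accepts plain or gzip-compressed FASTA, chosen by the ".gz" suffix.
    N/n is counted separately and not as masked sequence.
    """
    path = Path(fasta_path)
    opener = gzip.open if path.suffix == ".gz" else open
    lower = upper = n_bases = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # header line, not sequence
            for ch in line.strip():
                if ch in "nN":
                    n_bases += 1
                elif ch.islower():
                    lower += 1
                elif ch.isupper():
                    upper += 1
    total = lower + upper + n_bases
    return {
        "total": total,
        "soft_masked": lower,
        "n": n_bases,
        "masked_fraction": lower / total if total else 0.0,
    }
```

Calling `softmask_stats("project_example3/bbv18wm.fa.gz")` returns the counts as a dictionary; a masked fraction of zero would suggest the file is not soft-masked. The per-character loop is simple rather than fast, so expect it to take a while on a ~700Mb assembly.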

If you would like to further test the re-scaffolding function on haploid assemblies, you may also download the 2kb, 3kb, 8kb and 20kb mate-pair libraries here (scaffolding_libraries.tar.gz) or from NCBI's Sequence Read Archive (SRA) database (SRX1421071-SRX1421074). Unpack them into the directory "HaploMerger2_20180603/project_example3/libraries/" and then use them to test the pipeline.

If you would like to further test the gap-filling function on haploid assemblies, you may also download two short-read libraries from the NCBI SRA database: SRX1364942 and SRX1364943. Please separate read#0 and read#1 of each library into two gzipped FASTQ files, named 5_1.cor.fq.gz and 5_2.cor.fq.gz (for SRX1364942) and 7_1.cor.fq.gz and 7_2.cor.fq.gz (for SRX1364943). Then put them into the directory "HaploMerger2_20180603/project_example3/libraries/" and use them for testing.
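If the reads of a library end up in a single interleaved FASTQ file (mate 1 and mate 2 records alternating, four lines per record), the separation step could be sketched as follows. This is an illustration under that assumption, not the procedure used by the authors; the function name `split_interleaved_fastq` is hypothetical, and the output names follow the `<prefix>_1.cor.fq.gz` / `<prefix>_2.cor.fq.gz` pattern described above.

```python
import gzip
from itertools import islice


def split_interleaved_fastq(interleaved_path, out_prefix):
    """Split an interleaved FASTQ into two gzipped mate files.

    Writes <out_prefix>_1.cor.fq.gz and <out_prefix>_2.cor.fq.gz and
    returns the number of read pairs written. Input may be plain or gzipped.
    """
    opener = gzip.open if str(interleaved_path).endswith(".gz") else open
    out1 = f"{out_prefix}_1.cor.fq.gz"
    out2 = f"{out_prefix}_2.cor.fq.gz"
    pairs = 0
    with opener(interleaved_path, "rt") as src, \
            gzip.open(out1, "wt") as mate1, \
            gzip.open(out2, "wt") as mate2:
        while True:
            record1 = list(islice(src, 4))  # four lines of mate 1
            if not record1:
                break  # clean end of file
            record2 = list(islice(src, 4))  # four lines of mate 2
            if len(record1) != 4 or len(record2) != 4:
                raise ValueError("truncated FASTQ: incomplete read pair")
            mate1.writelines(record1)
            mate2.writelines(record2)
            pairs += 1
    return pairs
```

For example, `split_interleaved_fastq("SRX1364942.fq.gz", "libraries/5")` would write `libraries/5_1.cor.fq.gz` and `libraries/5_2.cor.fq.gz`; the input file name here is only a placeholder. If your download tool can already write the two mates to separate files, this step is unnecessary and only renaming and gzipping are needed.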

Notes/ideas on creating a high-quality phased haplotype assembly:

  • Strategies based on HaploMerger2 and Canu. As suggested in Canu's FAQ page, there are three possible strategies: A) "Avoid collapsing the genome" in Canu, so that the haplotypes stay as complete and separate as possible, and stop at the resulting raw diploid assembly; B) "Smash haplotypes together" in Canu to get the smallest diploid assembly (not guaranteed to be haploid), then run HapCUT2 or another phasing tool to get a corresponding phased assembly; C) "Avoid collapsing the genome" in Canu to get a raw diploid assembly that is as complete and separate as possible, then run HaploMerger2 to get high-quality reference and alternative haploid assemblies, then run HapCUT2 or another phasing tool to get the high-quality haplotype assembly based on the reference haploid assembly. We recommend strategy C here.
  • People also ask about HaploMerger2 and Falcon/Falcon-unzip. First of all, one should know that, so far, Falcon/Falcon-unzip may not give you a fully phased and complete haplotype assembly (Falcon-phase might, but I have not yet tried it). Ideally, HaploMerger2 should be run between or after Falcon and Falcon-unzip. Method 1: Falcon is used to create a de novo raw diploid assembly (phased to a certain degree); HaploMerger2 is then used to reconstruct the reference and alternative haploid (mosaic/unphased) assemblies from the raw diploid assembly (both the p-tigs and a-tigs from the Falcon-generated diploid assembly should be pooled together and fed to HaploMerger2); from here, one may choose to use HapCUT2 or another phasing tool to get the high-quality haplotype assembly. Method 2: Alternatively, after obtaining the Falcon diploid assembly, Falcon-unzip can be used to create a better-phased and better-quality diploid assembly (now consisting of p-tigs and H-tigs); HaploMerger2 can then be used to further analyze and separate this diploid assembly (both p-tigs and H-tigs should be pooled together and fed to HaploMerger2). See the Falcon/Falcon-unzip manual for information about p-tigs, a-tigs and H-tigs, and about the limitations of Falcon/Falcon-unzip in haplotype phasing.
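The pooling step mentioned above (p-tigs plus a-tigs, or p-tigs plus H-tigs, in one input file) could be sketched as follows. This is an illustrative helper, not a HaploMerger2 script: the function name `pool_fasta` and all file names are hypothetical, and the check for duplicate sequence names is an added precaution on the assumption that downstream tools expect unique names.

```python
import gzip


def _open_text(path, mode):
    """Open a plain or gzip-compressed text file, chosen by the ".gz" suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def pool_fasta(input_paths, output_path):
    """Concatenate FASTA files into one and return the number of sequences.

    Raises ValueError if the same sequence name (first word of a header)
    appears more than once across the inputs.
    """
    seen = set()
    count = 0
    with _open_text(output_path, "w") as out:
        for path in input_paths:
            with _open_text(path, "r") as handle:
                for line in handle:
                    if line.startswith(">"):
                        parts = line[1:].split()
                        name = parts[0] if parts else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        count += 1
                    # Guard against a missing final newline between files.
                    out.write(line if line.endswith("\n") else line + "\n")
    return count
```

A call such as `pool_fasta(["p_ctg.fa", "a_ctg.fa"], "pooled_diploid.fa.gz")` would produce a single gzipped FASTA to use as the raw diploid assembly; substitute the actual contig file names produced by your Falcon or Falcon-unzip run.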

Tips for troubleshooting:

  1. Check that the Lastz and chainNet programs run correctly on your system; recompilation is sometimes needed on a newer OS.
  2. Check that the number of openable file handles is sufficient, because chainNet may open many files simultaneously when there are many contig fasta files (the limit can be raised by invoking "ulimit -n 655350" with root privilege).
  3. Check that you invoke the scripts from the root directory of your project (which should be placed under the HM2 directory, like "..HaploMerger2_20180603/project_example1").
  4. Check that all the HM2 scripts and binaries are in your $PATH and can be accessed from your project directory and batch files.
  5. Avoid running HM2's Perl scripts directly unless you are familiar with HM2. Modify and use the batch files (batchA-E & run_all.batch) instead.
  6. Check that the fasta files are in the correct format and contain no illegal characters (bin/faDnaPolishing.pl can be used to clean the fasta files).
  7. Proper soft-masking is required for both performance and accuracy; otherwise HM2 might run forever on large genome assemblies. Hard-masking is not supported.
  8. If problems occur, it is recommended to test HM2 with Example 3, which takes 2-3 hours to finish batchA-B.
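Several of the checks above (tips 2, 4 and 6) can be automated before a run. The following is a minimal pre-flight sketch, not part of HaploMerger2. All function names are hypothetical, the `resource` module is available on Unix-like systems only, and the default set of allowed characters (`ACGTNacgtn`) is an assumption: it may be stricter or looser than what bin/faDnaPolishing.pl accepts.

```python
import gzip
import resource
import shutil


def check_open_file_limit(required):
    """Return (ok, soft_limit) for this process's open-file limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY may be reported as -1, so test for it explicitly.
    ok = soft == resource.RLIM_INFINITY or soft >= required
    return ok, soft


def missing_tools(tool_names):
    """Return the tool names that cannot be found on $PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


def illegal_fasta_lines(fasta_path, allowed="ACGTNacgtn"):
    """Return 1-based line numbers whose sequence contains other characters.

    `allowed` is an assumed alphabet; adjust it to your data.
    """
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    allowed_set = set(allowed)
    bad = []
    with opener(fasta_path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith(">"):
                continue  # header lines may contain any text
            if set(line.strip()) - allowed_set:
                bad.append(number)
    return bad
```

For example, `check_open_file_limit(655350)` reports whether the current limit matches the value suggested in tip 2, and `missing_tools(["lastz", "chainNet"])` lists whichever of those names is not on $PATH (use the executable names of your own installation). These helpers only report problems; they do not change limits or modify files.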

Please feel free to contact us if you have any technical questions.