Release Canu v1.5 · marbl/canu

These are release notes for Canu version 1.5, which was released on April 17th, 2017. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.

Citation

Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).

Minimum Requirements

Perl 5.12.0, or File::Path 2.08
Java SE 8
GCC 4.5 (for compilation only)
OS X 10.10 (for binaries only)
gnuplot (optional, for generating diagnostic graphs)
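The requirements above can be checked before installing. The following is a minimal sketch, not part of the Canu distribution: the helper names `version_ge` and `report_tool` are hypothetical, and only the Perl version is compared numerically, because the version strings printed by Java and GCC vary across vendors and are better reviewed by eye.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the requirements listed above.

# version_ge A B: succeed when dotted version A is >= dotted version B.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# report_tool NAME NOTE: say whether NAME is on the PATH.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1 ($2)"
    fi
}

# Perl prints its own version as a dotted string, e.g. 5.30.0.
perl_version=$(perl -e 'printf "%vd", $^V' 2>/dev/null)
if version_ge "$perl_version" 5.12.0; then
    echo "perl $perl_version meets the 5.12.0 minimum"
else
    echo "perl ${perl_version:-not found}: needs 5.12.0, or File::Path 2.08"
fi

report_tool java    "Java SE 8 is required"
report_tool gcc     "GCC 4.5, for compilation only"
report_tool gnuplot "optional, for diagnostic graphs"
```

The script only reports; it never exits with an error, so it can be run safely on a machine that is missing some of the tools.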

Installation

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code (the file can be named either canu-v1.5.tar.gz or just v1.5.tar.gz, depending on how it is downloaded):

gunzip -dc canu-v1.5.tar.gz | tar -xf -
cd canu-1.5/src
make -j 8
cd ..

To install from a binary distribution:

xz -dc canu-1.5.*.tar.xz | tar -xf -

In both cases, canu is installed in a directory named canu-1.5/<OS>-<architecture>, for example, canu-1.5/Linux-amd64. You can run the assembler with:

canu-1.5/*/bin/canu
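If more than one platform directory is present, the wildcard above is ambiguous. The sketch below builds the directory name for the current machine instead. The helper names are hypothetical, and the mapping from `uname -m` output to the `amd64` label is an assumption based only on the `Linux-amd64` example above; check the actual directory name under canu-1.5/ on other platforms.

```shell
#!/bin/sh
# Hypothetical helpers for locating the installed canu executable.

# canu_arch MACHINE: map a `uname -m` value to the directory label.
# Assumption: x86_64 machines use the "amd64" label, as in Linux-amd64.
canu_arch() {
    case "$1" in
        x86_64|amd64) echo "amd64" ;;
        *)            echo "$1" ;;
    esac
}

# canu_platform_dir KERNEL MACHINE: e.g. "Linux x86_64" gives "Linux-amd64".
canu_platform_dir() {
    echo "$1-$(canu_arch "$2")"
}

platform=$(canu_platform_dir "$(uname -s)" "$(uname -m)")
canu_bin="canu-1.5/$platform/bin/canu"

if [ -x "$canu_bin" ]; then
    echo "canu found at $canu_bin"
else
    echo "no executable at $canu_bin; check the directory name under canu-1.5/"
fi
```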

Changes

Add preliminary support for object storage.
Paths used in the various shell scripts and the diagnostic output are no longer full paths.
Use Edlib for read alignments during correction and consensus, which is both faster and produces higher-quality results than the previous alignment algorithms.
Add options rawErrorRate and correctedErrorRate, both specifying the expected error in an alignment of two reads. The previous errorRate option is still accepted, and is equivalent to 1/3 * correctedErrorRate. Details are in the tutorial.
Add experimental options overlapper=mhap and utgReAlign=true, which are significantly faster on ultra-long sequences. Both options need to be supplied together. This combination has had limited testing; use it at your own risk. On large genomes (>200 Mbp) it can produce a less contiguous assembly than the default.
The GFA output now has correct CIGAR strings for all links.
Support staging of some data on local disk for greatly improved performance during read correction.
Significantly better support for PBSPro and LSF. Many thanks to the users that helped us work through problems.
Fix error when more than 10,000 jobs were created using the ovsMethod=parallel overlap store creation algorithm.
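Two of the changes above involve option arithmetic and option pairing, which a short sketch may make concrete. The helper names `legacy_error_rate` and `experimental_options_ok` are hypothetical and not part of Canu, and the value 0.045 is only an illustrative input, not a recommended or default setting. The first helper applies the stated relation errorRate = 1/3 * correctedErrorRate; the second checks that overlapper=mhap and utgReAlign=true appear together or not at all.

```shell
#!/bin/sh
# Hypothetical helpers illustrating two of the option changes above.

# legacy_error_rate CORRECTED: the old errorRate value that corresponds to
# a given correctedErrorRate, using errorRate = 1/3 * correctedErrorRate.
legacy_error_rate() {
    awk -v c="$1" 'BEGIN { printf "%.4f\n", c / 3 }'
}

# experimental_options_ok OPTION...: succeed when overlapper=mhap and
# utgReAlign=true are either both present or both absent.
experimental_options_ok() {
    has_mhap=no
    has_realign=no
    for opt in "$@"; do
        case "$opt" in
            overlapper=mhap) has_mhap=yes ;;
            utgReAlign=true) has_realign=yes ;;
        esac
    done
    [ "$has_mhap" = "$has_realign" ]
}

# Illustrative input only.
echo "correctedErrorRate=0.045 corresponds to errorRate=$(legacy_error_rate 0.045)"

if experimental_options_ok overlapper=mhap utgReAlign=true; then
    echo "experimental options are paired correctly"
fi
```

Treating "neither option supplied" as valid is a design choice of this sketch: it lets the same check be applied to ordinary runs that use the default overlapper.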

Known Issues

See the issues page for up-to-date open issues, or to report a problem.

Large memory usage during unitig consensus calling on unitigs over 100 Mbp in size; a 140 Mbp contig required approximately 75 GB.
Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment.
Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.

See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.
