These are release notes for Canu version 1.7.1, which was released on June 18th, 2018. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • GCC 4.5 (for compilation only); GCC 6 recommended
  • macOS 10.10 Yosemite (for macOS/Darwin binaries only)
  • gnuplot 5.2 (optional, for generating diagnostic graphs)

Installation

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code (the file can be named either canu-v1.7.1.tar.gz or just v1.7.1.tar.gz, depending on how it is downloaded):

gunzip -dc canu-v1.7.1.tar.gz | tar -xf -
cd canu-1.7.1/src
make -j 8
cd ..

To install from a binary distribution:

xz -dc canu-1.7.1.*.tar.xz |tar -xf -

In both cases, canu is installed in the directory canu-1.7.1/<OS>-<architecture>, for example, canu-1.7.1/Linux-amd64. You can run the assembler with:

canu-1.7.1/*/bin/canu
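
The install directory in the example above is named after the operating system and CPU architecture. As a minimal sketch, the helper below builds that name from `uname` output so the binary can be located without the shell glob. The function name `canu_platform_dir` is hypothetical, and the x86_64 to amd64 mapping is an assumption inferred from the Linux-amd64 example; other architectures are passed through unchanged.

```shell
# Build a Canu-style platform directory name from `uname -s` / `uname -m` values.
# Assumption: x86_64 is reported as "amd64"; any other machine name is kept as-is.
canu_platform_dir() {
    os=$1
    machine=$2
    case "$machine" in
        x86_64|amd64) arch=amd64 ;;
        *)            arch=$machine ;;
    esac
    printf '%s-%s\n' "$os" "$arch"
}

# Usage: print the expected path of the canu executable for this machine.
platform=$(canu_platform_dir "$(uname -s)" "$(uname -m)")
echo "canu-1.7.1/$platform/bin/canu"
```

If the printed path does not exist, list canu-1.7.1/ to see the directory name that the build or binary package actually created.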

Changes

This release contains only bug fixes made since Canu v1.7 was released. No features were added or removed.

Canu v1.7.1 is compatible with assemblies started with Canu v1.7.

Canu v1.7 and v1.7.1 ARE NOT compatible with assemblies started with Canu v1.6.
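
The compatibility statements above can be encoded as a small pre-flight check before resuming an existing assembly directory. This is only a sketch: the function name `canu_can_resume` is hypothetical, it covers only the version pairs stated in these notes (plus the assumption that a version can resume its own assemblies), and it reports every other pair as unknown rather than guessing.

```shell
# Exit status: 0 = compatible, 1 = stated as NOT compatible, 2 = not covered by the notes.
# $1 = version the assembly was started with, $2 = version you want to run now.
canu_can_resume() {
    started=$1
    running=$2
    case "$started:$running" in
        1.7:1.7.1)         return 0 ;;   # stated compatible
        1.6:1.7|1.6:1.7.1) return 1 ;;   # stated NOT compatible
        *)
            # Assumption: identical versions are compatible with themselves.
            [ "$started" = "$running" ] && return 0
            return 2 ;;
    esac
}

# Usage
if canu_can_resume 1.7 1.7.1; then
    echo "safe to resume"
fi
```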

Bug Fixes

  • Fix many bogart issues, including the dreaded "Assertion `cnt > 0' failed". Issues #930, #874, #873, #844, #718, #546. Backported from 6f3c375.
  • Fix Read Error Detection (RED) configuration to prevent single-read jobs. Issues #935, #854, #831, #815. Backported from eeef601.
  • Fix excessive memory usage when loading evalues into the ovlStore. Issues #956, #758, #755. Backported from 858eff8.
  • Fix a (potential) performance problem when computing overlaps for large assemblies: don't set a one-size-fits-all ovlHashBits, base it on the genome size. Backported from a580131.
  • Fix a compilation error with GCC 8. Issue #927. Backported from f251336.

Known Issues

  • Downloads before 22 June 2018 incorrectly reported the version as "1.7".

See the issues page for up-to-date open issues, or to report a problem.

  • Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
  • TrioCanu is not yet optimized for memory usage and requires more than the default memory for large genomes. The options gridOptionsExecutive="--mem=250g" gridOptionsMeryl='--partition=largemem --mem=1000g' (or the equivalent memory request on your grid) should be sufficient for a 3 Gbp genome.
  • Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
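
The -fast guidance in the list above amounts to a simple rule: use it for Nanopore data when the genome is smaller than 1 Gbp. A minimal sketch of that rule follows; the function name `suggest_fast_flag` and the "nanopore" label are hypothetical, and the 1 Gbp threshold is taken directly from these notes rather than from any benchmark.

```shell
# Print "-fast" when the recommendation in the notes applies, otherwise print nothing.
# $1 = sequencing technology label (only "nanopore" triggers the flag)
# $2 = genome size in bases (integer)
suggest_fast_flag() {
    technology=$1
    genome_size_bp=$2
    if [ "$technology" = "nanopore" ] && [ "$genome_size_bp" -lt 1000000000 ]; then
        echo "-fast"
    fi
}

# Usage: a 120 Mbp Nanopore genome qualifies.
suggest_fast_flag nanopore 120000000
```

For genomes above the threshold the function prints nothing, which leaves the choice to the user, since the notes only warn that -fast may produce slightly less contiguous assemblies there.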

See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.

Legal

Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.

@brianwalenz brianwalenz released this Feb 27, 2018 · 363 commits to master since this release

These are release notes for Canu version 1.7, which was released on February 27th, 2018. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • GCC 4.5 (for compilation only)
  • OS X 10.10 (for binaries only)
  • gnuplot (optional, for generating diagnostic graphs)

Installation

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code (the file can be named either canu-v1.7.tar.gz or just v1.7.tar.gz, depending on how it is downloaded):

gunzip -dc canu-v1.7.tar.gz | tar -xf -
cd canu-1.7/src
make -j 8
cd ..

To install from a binary distribution:

xz -dc canu-1.7.*.tar.xz |tar -xf -

In both cases, canu is installed in the directory canu-1.7/<OS>-<architecture>, for example, canu-1.7/Linux-amd64. You can run the assembler with:

canu-1.7/*/bin/canu

Changes

This release was originally planned to only include changes to read correction, but we opportunistically added: improved support for plasmids via read rescue; an initial implementation of trio binning; a 'fast mode' for Nanopore reads (though not automatic); and sneaked in some major changes to the gkpStore/tigStore read/contig database for future use. So much for the plan.

Assemblies started in Canu v1.6 ARE NOT compatible with Canu v1.7.

  • Ensure that every raw read is either corrected or used as evidence for correcting some other raw read. This serves to rescue short plasmids in high coverage datasets, and it is no longer necessary to set corOutCoverage to achieve the same result.
  • Initial support of TrioCanu (biorxiv) added.
  • Add a '-fast' option for using a faster (but still not rigorously validated) overlap method. Useful for long Nanopore reads.
  • In anticipation of future features, all reads - raw, corrected and trimmed versions - are stored in a single gkpStore in the root assembly directory.
  • Read correction was almost completely re-engineered.
    • Stability of the computation was increased by removing multiple processes communicating through a pipe.
    • Layouts of the raw reads used to correct a read are saved for future use (e.g., during consensus). With the gkpStore change above, it is now possible to track a raw read through to the final contig outputs.
    • Only a single corrected read is generated for each raw read. Previously, PacBio reads containing multiple sub-reads could create multiple (redundant) corrected reads.
  • Overlap Error Detection (RED and OEA) memory usage when configuring compute jobs has been reduced.
  • Overlap Error Detection (RED and OEA) job sizes were increased to reduce disk contention.
  • overlapInCore (OBTOVL and UTGOVL) job sizes were increased to reduce disk contention and to take advantage of generally larger memory sizes available.
  • The ovlRefBlockSize parameter was removed; use ovlRefBlockLength instead.
  • Update to Snappy v1.1.7.
  • Add basic support for RNA by translating input U bases to T bases. Output files are NOT translated back to U bases.
  • Restrict the parallel overlap store creation method to grid runs. ovsMethod=forceparallel was added to force the usage of the parallel method on non-grid runs.
  • Add the preExec option to allow a single command to run before any Canu program is run. Useful for, e.g., loading a Canu module.
  • Use more standard locations for installing binaries and perl modules.

Bug Fixes

  • In non-grid mode, Canu was running too many jobs concurrently and exhausting memory.
  • Memory needed for consensus jobs is now set based on the largest contig.
  • The VN tag in GFA outputs was set, incorrectly, to the name of the program creating the file. It now reflects the format version of the GFA file.
  • Numerous not-very-exciting pedantic coding errors resolved. Stuff like failing to close a single input file, failing to release a block of memory, failing to check if an operation successfully completed, et cetera, that were technically incorrect but not significantly so.

Known Issues

See the issues page for up-to-date open issues, or to report a problem.

  • The Overlap Error Adjustment step does not properly configure its memory usage. Include redMemory=8 oeaMemory=8 as a workaround.
  • Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
  • TrioCanu is not yet optimized for memory usage and requires more than the default memory for large genomes. The options gridOptionsExecutive="--mem=250g" gridOptionsMeryl='--partition=largemem --mem=1000g' (or the equivalent memory request on your grid) should be sufficient for a 3 Gbp genome.
  • Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
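
As a sketch of how the Overlap Error Adjustment workaround from the list above might be applied, the helper below appends redMemory=8 oeaMemory=8 to a Canu command line and prints it without running anything. The function name `print_canu_cmd` is hypothetical, and the prefix, directory, genome size and read file are placeholders, not values from these notes.

```shell
# Print (do not execute) a Canu command with the memory workaround appended.
# "asm", "asm-dir", genomeSize=4.8m and the read file are placeholders for illustration.
print_canu_cmd() {
    echo canu -p asm -d asm-dir genomeSize=4.8m "$@" redMemory=8 oeaMemory=8
}

# Usage: pass the read-type option and input file for your data.
print_canu_cmd -nanopore-raw reads.fastq.gz
```

Printing the command first makes it easy to review before submitting it, especially when further options are added for a grid environment.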

See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.

Legal

Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.

@brianwalenz brianwalenz released this Aug 14, 2017 · 643 commits to master since this release

These are release notes for Canu version 1.6, which was released on August 14th, 2017. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • GCC 4.5 (for compilation only)
  • OS X 10.10 (for binaries only)
  • gnuplot (optional, for generating diagnostic graphs)

Installation

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code (the file can be named either canu-v1.6.tar.gz or just v1.6.tar.gz, depending on how it is downloaded):

gunzip -dc canu-v1.6.tar.gz | tar -xf -
cd canu-1.6/src
make -j 8
cd ..

To install from a binary distribution:

xz -dc canu-1.6.*.tar.xz |tar -xf -

In both cases, canu is installed in the directory canu-1.6/<OS>-<architecture>, for example, canu-1.6/Linux-amd64. You can run the assembler with:

canu-1.6/*/bin/canu

Changes

  • Improved detection of unitig and contig edges in GFA outputs.
  • Repeats that are confirmed correct no longer form unitigs. This increases unitig length and greatly simplifies the unitig GFA.
  • Small plasmids are no longer flagged as 'unassembled' sequences. Note that the contigFilter option values have changed and old values run the risk of filtering incorrectly.
  • Improved contig consensus accuracy (longer alignments to reference).
  • Added a unitig to contig mapping via a BED output.
  • Better memory management in bogart should reduce memory footprint slightly and run slightly faster.
  • Remove the ovlStore for correction and trimming when those stages are finished. saveOverlaps=stores will retain them. The correction overlaps are usually the single largest consumer of disk space during the assembly.
  • Remove the partitioned gkpStore copy when consensus is finished.
  • Use file names with five digits, instead of four, for overlap error adjustment.
  • Options minMemory and minThreads are now implemented.
  • Use all overlaps, not just the best, to position reads in unitigs/contigs, resulting in more accurate repeat and edge detection.
  • Implement the 'suggestCircular' flag in contigs and unitigs. It is set to 'true' if the single sequence can be circularized. Note: the flag is 'false' if two or more contigs are needed to form the circular chromosome.
  • Stability improvements to overlap store building when ovsMethod=parallel (the default for large genomes).
  • Easier restarts: if restarted from within the assembly directory, the -p, -d and read files can be omitted.
  • Improved logging: citations are output at the start of the run for any included software within Canu.

Bug Fixes

  • Fixed CIGAR multithreading bug in unitig and contig graphs which dropped some true edges.
  • Fix invalid characters in corrected reads due to out of bounds array access.
  • Fix useGrid=remote which failed to output commands when multiple jobs needed to be submitted.

Known Issues

See the issues page for up-to-date open issues, or to report a problem.

  • When running each step (correct/trim/assemble) by hand, the assemble step will use corrected not trimmed reads when all steps are run with the same -d option. Run with different -d options as a workaround.
  • Large memory usage while unitig consensus calling on unitigs over 100MB in size; a 140Mb contig required approximately 75GB.
  • Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The options overlapper=mhap utgReAlign=true are significantly faster but may produce slightly less contiguous assemblies on genomes >200 Mbp in size.
  • Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.

See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.

Legal

Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.

@brianwalenz brianwalenz released this Apr 17, 2017 · 855 commits to master since this release

These are release notes for Canu version 1.5, which was released on April 17th, 2017. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • GCC 4.5 (for compilation only)
  • OS X 10.10 (for binaries only)
  • gnuplot (optional, for generating diagnostic graphs)

Installation

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code (the file can be named either canu-v1.5.tar.gz or just v1.5.tar.gz, depending on how it is downloaded):

gunzip -dc canu-v1.5.tar.gz | tar -xf -
cd canu-1.5/src
make -j 8
cd ..

To install from a binary distribution:

xz -dc canu-1.5.*.tar.xz |tar -xf -

In both cases, canu is installed in the directory canu-1.5/<OS>-<architecture>, for example, canu-1.5/Linux-amd64. You can run the assembler with:

canu-1.5/*/bin/canu

Changes

  • Add preliminary support for object storage.
  • Paths used in the various shell scripts and the diagnostic output are no longer full paths.
  • Use Edlib for read alignments during correction and consensus, which is both faster and generates higher quality results compared to the previous alignment algorithms.
  • Add options rawErrorRate and correctedErrorRate, both specifying the expected error in an alignment of two reads. The previous errorRate option is still accepted, and is equivalent to 1/3 * correctedErrorRate. Details are in the tutorial.
  • Add experimental options overlapper=mhap and utgReAlign=true, which are significantly faster on ultra-long sequences. Both options need to be supplied. This mode currently has limited testing; use it at your own risk. On large genomes (>200 Mbp) it can produce a less contiguous assembly than the default.
  • The GFA output now has correct CIGAR strings for all links.
  • Support staging of some data on local disk for greatly improved performance during read correction.
  • Significantly better support for PBSPro and LSF. Many thanks to the users that helped us work through problems.
  • Fix error when more than 10,000 jobs were created using the ovsMethod=parallel overlap store creation algorithm.

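
The list above states that the legacy errorRate option is equivalent to 1/3 * correctedErrorRate, so a legacy value can be converted by multiplying by three. The sketch below does only that arithmetic; the function name `legacy_to_corrected_error_rate` is hypothetical, and the tutorial referenced in the notes remains the authority on how the new options should be chosen.

```shell
# Convert a legacy errorRate value to the matching correctedErrorRate,
# using the stated relation errorRate = 1/3 * correctedErrorRate.
legacy_to_corrected_error_rate() {
    awk -v e="$1" 'BEGIN { printf "%.3f\n", 3 * e }'
}

# Usage: a legacy errorRate=0.013 corresponds to correctedErrorRate=0.039.
legacy_to_corrected_error_rate 0.013
```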
Known Issues

See the issues page for up-to-date open issues, or to report a problem.

  • Large memory usage while unitig consensus calling on unitigs over 100MB in size; a 140Mb contig required approximately 75GB.
  • Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment.
  • Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.

See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.

@brianwalenz brianwalenz released this Dec 13, 2016 · 1060 commits to master since this release

These are release notes for Canu version 1.4, which was released on December 13, 2016. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need (even the Perl modules!) to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • GCC 4.5 (for compilation only)
  • OS X 10.10 (for binaries only)
  • Gnuplot (optional, for generating diagnostic graphs)

Installation

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code:

gunzip -dc v1.4.tar.gz |tar -xf -
cd canu-1.4/src
make -j8
cd ..

To install from a binary distribution:

xz -dc canu-1.4.*.tar.xz |tar -xf -

In both cases, canu is installed in the directory canu-1.4/<OS>-<architecture>, for example, canu-1.4/Linux-amd64. You can run the assembler with:

canu-1.4/*/bin/canu

Changes

  • Removed dependency on Filesys::Df.
  • Reduced size of overlap stores by 33 1/3%.
  • Added inline Snappy compression of overlaps, instead of a separate gzip process. This greatly reduces resources required for building large overlap stores.
  • Memory mapped files are no longer used. Performance on distributed file systems should be improved. Virtual memory usage is greatly reduced.
  • Fixed a variety of issues in GFA output on unitigs, and added GFA output on contigs.
  • Added options onSuccess and onFailure to run a command when Canu terminates successfully or fails unexpectedly.
  • Added support for PBSPro.
  • Fixed the usual assortment of random bugs.
  • Added other minor improvements.

Known Issues

See the issues page for up-to-date open issues. The currently known issues are:

  • For AT/GC rich eukaryotic genomes, it is beneficial to increase the filtering stringency over the default. Specifying corMaxEvidenceErate=0.15 (from the default of 0.2) is generally sufficient.
  • As a computational optimization, you can decrease the error rate (errorRate=0.013), especially for inbred strains, on Oxford Nanopore R9 2D data and high-coverage P6 PacBio data.
  • LSF support has limited testing.
  • Large memory usage while unitig consensus calling on unitigs over 100MB in size (140Mb contig uses approximately 75GB).
  • Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.

@skoren skoren released this Jun 8, 2016

These are release notes for Canu version 1.3, which was released on June 8, 2016. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need (even the Perl modules!) to create a binary distribution for your own specific OS.

Citation

Installation

Requirements

  • Java SE 8+
  • GCC 4.5+ (for compilation only)
  • Filesys::Df Perl module (for binaries only, depending on perl version)
  • OS X 10.10 or newer (for binaries only)
  • Gnuplot (optional for generating HTML graphs)

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code:

gzip -dc canu-1.3.tar.gz | tar -xf -
cd canu-1.3/src
make -j8
cd ..

To install from a binary distribution:

bzip2 -dc canu-1.3*.tar.bz2 | tar -xf -

In both cases, canu is installed in the directory canu-1.3/<OS>-<architecture>, for example, canu-1.3/Linux-amd64. You can run the assembler with:

canu-1.3/*/bin/canu

Changes

  • Rewritten bogart algorithm to auto-set error rate and avoid false-breaks due to repeats.
  • Updated GFA output to include all edges in the graph.
  • Updated MHAP release to 2.1 for further speed improvements and improved repeat suppression.
  • Auto-set MHAP and other parameters based on genome coverage.
  • Fix slow 3-overlapErrorAdjustment runtime.
  • Fix memory request for 3-overlapErrorAdjustment.
  • Pipeline bug fixes

Known Issues

See the issues page for up-to-date open issues. The currently known issues are:

  • For AT/GC rich eukaryotic genomes, it is beneficial to increase the filtering stringency over the default. Specifying corMaxEvidenceErate=0.15 (from the default of 0.2) is generally sufficient.
  • As a computational optimization, you can decrease the error rate (errorRate=0.013), especially for inbred strains, on Oxford Nanopore R9 2D data and high-coverage P6 PacBio data.
  • LSF support has limited testing.
  • Large memory usage while unitig consensus calling on unitigs over 100MB in size (140Mb contig uses approximately 75GB).
  • Distributed file systems (such as GPFS) cause issues with memory mapped files, slowing down parts of Canu, including meryl (0-mercounts) and falcon-sense (2-correction).

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.

@skoren skoren released this Apr 7, 2016 · 1653 commits to master since this release

These are release notes for Canu version 1.2, which was released on April 7, 2016. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need (even the Perl modules!) to create a binary distribution for your own specific OS.

Citation

Installation

Requirements

  • Java SE 8+
  • GCC 4.5+ (for compilation only)

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code:

gzip -dc canu-1.2.tar.gz | tar -xf -
cd canu-1.2/src
make -j8
cd ..

To install from a binary distribution:

bzip2 -dc canu-1.2*.tar.bz2 | tar -xf -

In both cases, canu is installed in the directory canu-1.2/<OS>-<architecture>, for example, canu-1.2/Linux-amd64. You can run the assembler with:

canu-1.2/*/bin/canu

Changes

  • Fix bug of not filtering overlaps sufficiently before input to falcon_sense, leading to fewer corrected reads and a low-quality assembly.

Known Issues

See the issues page for up-to-date open issues. The currently known issues are:

  • For large high-coverage genomes it is beneficial to use the fast MHAP mode (generally over 70X).
  • Bogart (unitigger) has false positives in repeat breaking. Currently, the temporary workaround is to increase the minimum overlap size to avoid detecting false repeats caused by short overlaps. Canu will automatically do this for large (>10MB) genomes while the fixed algorithm is tested.
  • For AT/GC rich genomes, it is beneficial to increase the filtering stringency over the default. Specifying corMaxEvidenceErate=0.15 (from the default of 0.2) is generally sufficient.
  • LSF support has limited testing.
  • Large memory usage while unitig consensus calling on unitigs over 100MB in size (140Mb contig uses approximately 75GB).
  • Distributed file systems (such as GPFS) cause issues with memory mapped files, slowing down parts of Canu, including meryl (0-mercounts) and falcon-sense (2-correction).

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.

@skoren skoren released this Mar 11, 2016

These are release notes for Canu version 1.1, which was released on March 11, 2016. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need (even the Perl modules!) to create a binary distribution for your own specific OS.

Citation

Installation

Requirements

  • Java SE 8+
  • GCC 4.5+ (for compilation only)

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code:

gzip -dc canu-1.1.tar.gz | tar -xf -
cd canu-1.1/src
make -j8
cd ..

To install from a binary distribution:

bzip2 -dc canu-1.1*.tar.bz2 | tar -xf -

In both cases, canu is installed in the directory canu-1.1/<OS>-<architecture>, for example, canu-1.1/Linux-amd64. You can run the assembler with:

canu-1.1/*/bin/canu

Changes

  • Support for reads up to 2Mbp in size (up from 130Kbp).
  • Incorporate MHAP 2.0 which is 5X faster than previous version and has higher specificity
  • Add corMhapSensitivity=fast option which can generate correction overlaps for a human genome in < 2500 CPU hours (full assembly <25,000 CPU hours). This option is recommended for genomes with deeper coverage (60X+).
  • Add GFA output
  • Improve diploid-aware assembly by categorizing output as primary contigs or unmerged bubbles. Annotate repeat and unique contigs in the output.
  • Enable parallel overlap store construction on large genomes
  • Enable minimap as an option for generating overlaps during correction step. Corrected reads are generated as before with falcon_sense.
  • Fix bug using shorter rather than longer reads for corrected reads/consensus computation
  • Fix bug resuming without providing input sequences which would incorrectly set error rates
  • Fix bug in bogart which would demote contained sequences as spurs incorrectly
  • Fix bugs in falcon_sense which would hang when input had N bases and limit corrected reads to 65Kbp
  • Fix falcon_sense support on OSX <10.10.
  • Fix various pipeline bugs
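
The list above recommends corMhapSensitivity=fast for genomes with deeper coverage (60X+). Coverage is the total number of sequenced bases divided by the genome size, so the recommendation can be sketched as the check below. The function name `suggest_mhap_sensitivity` is hypothetical, and the 60X threshold is the one quoted in these notes for Canu 1.1.

```shell
# Print "corMhapSensitivity=fast" when coverage (total bases / genome size) is at least 60X.
# $1 = total sequenced bases, $2 = genome size in bases
suggest_mhap_sensitivity() {
    total_bases=$1
    genome_size=$2
    awk -v b="$total_bases" -v g="$genome_size" \
        'BEGIN { if (g > 0 && b / g >= 60) print "corMhapSensitivity=fast" }'
}

# Usage: 300 Mbp of reads for a 4.8 Mbp genome is 62.5X coverage.
suggest_mhap_sensitivity 300000000 4800000
```

At lower coverage the function prints nothing, leaving the default sensitivity in place.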

Known Issues

See the issues page for up-to-date open issues. The currently known issues are:

  • Canu 1.1 is not backwards compatible with Canu 1.0. If you have an in-process assembly, do not upgrade to Canu 1.1 until it completes.
  • There is a bug with filtering overlaps before passing them to falcon_sense for generating corrected reads on repetitive genomes. If you are assembling large/repetitive genomes (generally >500Mb), you must specify 'corMaxEvidenceErate=0.2' ('corMaxEvidenceErate=0.3' for low-coverage datasets) while the fix is tested.
  • For large high-coverage genomes it is beneficial to use the fast MHAP mode (generally over 70X).
  • Bogart (unitigger) has false positives in repeat breaking. Currently, the temporary workaround is to increase the minimum overlap size to avoid detecting false repeats caused by short overlaps. Canu will automatically do this for large (>100MB) genomes while the fixed algorithm is tested.
  • LSF support has limited testing.
  • Large memory usage while unitig consensus calling on unitigs over 100MB in size (140Mb contig uses approximately 75GB).
  • Distributed file systems (such as GPFS) cause issues with memory mapped files, slowing down parts of Canu, including meryl (0-mercounts) and falcon-sense (2-correction).

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.

@skoren skoren released this Dec 29, 2015 · 1898 commits to master since this release

These are release notes for Canu version 1.0, which was released on Dec 29th, 2015. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found online.

This distribution package provides a stable, tested, documented version of the software. The distribution is usable on most Unix-like platforms, and some platforms have pre-compiled binary distributions ready for installation.
The source code package includes full source code, Makefiles, and scripts.

Citation

Compilation and Installation

Requirements

  • Java SE 8+
  • GCC 4.5+ (for compilation only)

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.
To use the source code, execute these commands on any unix-like platform:

gzip -dc canu-1.0.tar.gz | tar -xf -
cd canu-1.0/src
make -j8
cd ..

To use the binary distributions, choose a platform, download that package, then unpack it with some unix command like this:

bzip2 -dc canu-1.0*.tar.bz2 | tar -xf -

In both cases, you can run the assembler with:

canu/*/bin/canu

Known Issues

See the issues page for up-to-date open issues. The currently known issues are:

  • LSF support is untested.
  • Large memory usage while unitig consensus calling on unitigs over 100MB in size (140Mb contig uses approximately 75GB).
  • Distributed file systems (such as GPFS) cause issues with memory mapped files, slowing down parts of Canu, including meryl (0-mercounts) and falcon-sense (2-correction).

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.