2.0.0

@nextstrain-bot nextstrain-bot released this 28 Jun 11:03

Nextclade 2.0.0

Rust

Nextclade core algorithms and command-line interface were reimplemented in Rust (replacing the C++ implementation).

Rust is a modern, high-performance programming language that is pleasant to read and write. Rust programs have runtime performance comparable to C++ while being easier to write. It should provide a serious productivity boost for the dev team.

Also, it is now much simpler to contribute to Nextclade. If you wanted to contribute, or simply to review and understand the codebase, but were scared off by the complexity of C++, then give it another try: the Rust version is much more enjoyable! Check our developer guide to get started. We are always open to contributions, reviews and ideas!

Alignment algorithm rewritten with adaptive bands

  • Feature: Previously, the alignment band width was constant throughout a given sequence. Now, band width is adaptive: narrow where seed matches indicate no indels, wide where seed matches indicate indels.

  • Performance is improved for sequences with indels

  • Fix: Terminal alignment errors, particularly common in BA.2, are fixed by a wider default band width between terminal seed matches and sequence ends

  • Fix: More robust seed matching allows some previously unalignable sequences to be aligned

  • Fix: Terminal indels for amino acid alignments are only free if the nucleotide alignment indicates a gap. Otherwise, they are penalized like internal gaps. This leads to more parsimonious alignment results.

  • Feature: Additional alignment parameters can now be tuned:

    • "Excess band width" parameter controls the extra band width that is necessary for correct alignment if both deletions and insertions occur between two seed matches.

    • "Terminal band width" controls the extra band width that is necessary for correct alignment if terminal indels occur.

  • Feature: "Min match rate" parameter is added, which sets the required rate of seed matches in a sequence (number of matched seeds divided by total number of attempted seeds). If the measured rate is below the required value, alignment will not be attempted, because such sequences are likely to have infeasible memory and computational requirements. The default value is 0.3.

  • Fix: 3' terminal insertions are now properly detected

  • Feature: "Retry reverse complement" alignment parameter is added. When enabled, an additional seed matching attempt is made after the initial attempt fails. The second attempt is performed on the reverse-complemented sequence.

    As a consequence:

    • the output alignment, peptides and analysis results correspond to this modified sequence and not to the original
    • sequence name gets a suffix appended to it for all output files (fasta, seqName column, node name on the tree etc.)
    • in output files, there is a new field/column: isReverseComplement, which contains true if the corresponding sequence underwent reverse-complement transformation

    This functionality is opt-in and the default behavior is unchanged: skip sequence and emit a warning.
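The interaction between "Min match rate" and "Retry reverse complement" can be illustrated with a minimal Python sketch. This is not Nextclade's implementation: the helper names, the seed length, the number of seeds and the exact-substring seed search are all assumptions made for illustration. Only the rate definition (matched seeds divided by attempted seeds), the 0.3 default and the opt-in retry come from the notes above.

```python
# Hypothetical sketch; real seed matching in Nextclade is more elaborate.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def seed_match_rate(query: str, reference: str,
                    seed_len: int = 6, n_seeds: int = 5) -> float:
    """Matched seeds divided by attempted seeds.

    Seeds are taken at evenly spaced positions of the query and count
    as matched when they occur verbatim anywhere in the reference
    (a simplification chosen for this sketch).
    """
    last_start = len(query) - seed_len
    if last_start < 0:
        return 0.0
    starts = [round(i * last_start / max(n_seeds - 1, 1))
              for i in range(n_seeds)]
    matched = sum(1 for s in starts if query[s:s + seed_len] in reference)
    return matched / n_seeds


def choose_orientation(query: str, reference: str,
                       min_match_rate: float = 0.3,
                       retry_reverse_complement: bool = False):
    """Return (sequence_to_align, is_reverse_complement), or None to skip."""
    if seed_match_rate(query, reference) >= min_match_rate:
        return query, False
    if retry_reverse_complement:
        flipped = reverse_complement(query)
        if seed_match_rate(flipped, reference) >= min_match_rate:
            return flipped, True
    # Default behavior: skip the sequence (the caller would emit a warning).
    return None
```

With the retry disabled, a reverse-complemented input returns `None` and is skipped. With it enabled, the flipped sequence is returned together with a flag that plays the role of the isReverseComplement column.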

Genes on reverse (negative) strand

Nextclade now correctly handles genes on reverse (negative) strand, which is particularly important for Monkeypox virus.
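To make concrete what reverse-strand handling involves, here is a small Python sketch of extracting a gene's nucleotides from a reference given GFF3-style coordinates (1-based, inclusive) and a strand. The function name is hypothetical and this is not Nextclade's code. It only shows that a gene on the negative strand has to be read as the reverse complement of the annotated range.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gene_sequence(reference: str, start: int, end: int, strand: str) -> str:
    """Return a gene's nucleotides in coding direction.

    `start` and `end` are 1-based and inclusive, as in GFF3.
    `strand` is "+" (forward) or "-" (reverse).
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unsupported strand: {strand!r}")
    segment = reference[start - 1:end]
    if strand == "-":
        # Reverse strand: complement each base and read right to left.
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment


reference = "AAACCCGGGTTT"
print(gene_sequence(reference, 1, 6, "+"))  # AAACCC
print(gene_sequence(reference, 1, 6, "-"))  # GGGTTT
```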

Nextclade Web

  • Feature: Nextclade Web is now substantially faster, both at startup and when analysing sequences, due to general algorithmic improvements.

  • Feature: Drag&drop box for fasta files now supports multiple files. The files are concatenated in this case.

  • Feature: Sequence view and peptide views now show insertions. They are denoted as purple triangles.

  • Fix: Tree view no longer shows duplicate clade annotations

Input files

  • Fix: gene map GFF3 file now correctly accepts "gene" and "locus_tag" attributes. This should allow using genome annotations from GenBank with little or no modification.

  • Feature: Nextclade now reads virus-specific alignment parameters from the virus_properties.json file in the dataset. This is equivalent to passing alignment tweaks using command-line flags, but is more convenient. If a parameter is provided both in virus_properties.json and as a flag, the flag takes precedence.
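The precedence rule (built-in default, then virus_properties.json, then command-line flag) can be sketched in Python as a layered merge. The JSON key `alignmentParams` and the parameter names `minMatchRate` and `excessBandwidth` are assumptions for illustration and are not specified in these notes. Only the 0.3 default for the minimum match rate is.

```python
import json

# Built-in defaults; 0.3 is the "Min match rate" default stated in the notes.
DEFAULTS = {"minMatchRate": 0.3}


def resolve_alignment_params(virus_properties_json: str, cli_flags: dict) -> dict:
    """Merge parameters so that later layers override earlier ones.

    Layer order: defaults < dataset file < command-line flags.
    A flag value of None means "not passed on the command line".
    """
    # "alignmentParams" is a hypothetical key name used for this sketch.
    dataset_params = json.loads(virus_properties_json).get("alignmentParams", {})
    resolved = dict(DEFAULTS)
    resolved.update(dataset_params)
    resolved.update({k: v for k, v in cli_flags.items() if v is not None})
    return resolved


props = '{"alignmentParams": {"minMatchRate": 0.1, "excessBandwidth": 9}}'
flags = {"minMatchRate": None, "excessBandwidth": 12}
print(resolve_alignment_params(props, flags))
# {'minMatchRate': 0.1, 'excessBandwidth': 12}
```

Here the dataset overrides the default match rate, while the explicitly passed flag overrides the dataset's band width.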

Nextclade CLI

  • Feature: BREAKING CHANGE Command-line interface was redesigned to make it more consistent and ergonomic. The following invocation should be sufficient for most users:

    nextclade run --input-dataset=dataset/ --output-all=out/ sequences.fasta

    short version:

    nextclade run -D dataset/ -O out/ sequences.fasta

    • Nextalign CLI and Nextclade CLI now require a command as the first argument. To reproduce the behavior of Nextclade v1, use nextalign run instead of nextalign and nextclade run instead of nextclade. See nextalign --help or nextclade --help for the full list of commands. Each command has its own --help menu, e.g. nextclade run --help.

    • --input-fasta flag is removed in favor of providing input sequence file names as positional arguments. Multiple input fasta files can be provided. Different compression formats are allowed:

      nextclade run -D dataset/ -O out/ 1.fasta 2.fasta.gz 3.fasta.xz 4.fasta.bz2 5.fasta.zst

    • If no fasta files are provided, sequences will be read from standard input (stdin). Reading from stdin does not support compression.

    • If a special filename (-) is provided for one of the individual output file flags (--output-*), the corresponding output will be printed to standard output (stdout). This allows integration into Unix-style pipelines. For example:

      curl $fasta_gz_url | gzip -cd | nextclade run -D dataset/ --output-tsv=- | my_nextclade_tsv_processor
      
      xzcat *.fasta.xz | nextalign run -r ref.fasta -m genemap.gff -o - | process_aligned_fasta

    • The flag --output-all (-O) replaces the --output-dir flag and conveniently outputs all files with a single flag.

    • The new flag --output-selection restricts which files are written by the --output-all flag.

    • If the --output-basename flag is not provided, the base name of output files will default to "nextclade" or "nextalign" respectively for Nextclade CLI and Nextalign CLI. They will no longer attempt to guess base file name from the input fasta.

    • The new flag --output-translations is a dedicated flag to provide a file path template which will be used to output translated gene fasta files. This flag accepts a template string with a template variable {gene}, which will be substituted with a gene name. Each gene therefore receives its own path. Additionally, the translations are now independent of the output directory and can be omitted if they are not necessary.

    Example:

    If the following is provided:

    --output-translations='output_dir/gene_{gene}.translation.fasta'

    then for SARS-CoV-2 Nextclade will write the following files:

    output_dir/gene_ORF1a.translation.fasta
    output_dir/gene_ORF1b.translation.fasta
    ...
    output_dir/gene_S.translation.fasta
    

    Make sure you properly quote and/or escape the curly braces in the variable {gene}, so that your shell, programming language or pipeline manager does not attempt to substitute the variable.

  • Feature: New --excess-bandwidth, --terminal-bandwidth, --min-match-rate, --retry-reverse-complement arguments are added (see "Alignment algorithm rewritten with adaptive bands" section for details)

  • Feature: Nextclade CLI and Nextalign CLI now accept compressed input files. If a compressed fasta file is provided, it will be transparently decompressed. Supported compression formats: gz, bz2, xz, zstd. Decompressor is chosen based on file extension.

  • Feature: Nextclade CLI and Nextalign CLI can now write compressed output files. If the output path has one of the supported file extensions, the output will be transparently compressed. Supported compression formats: gz, bz2, xz, zstd.

  • Feature: Nextclade can now write outputs in newline-delimited JSON format. Use the --output-ndjson flag for that. NDJSON output is equivalent to JSON output, but is not hierarchical, so it can be easily streamed and parsed one entry at a time.

  • Feature: The Nextclade dataset get and dataset list commands can now fetch the dataset index from a custom server. The root URL of the dataset server can be set using the --server=<URL> flag.

  • Feature: Nextclade dataset get command can output downloaded dataset in the form of a zip archive, using --output-zip flag. The dataset zip is simply the dataset directory, but compressed, and it can be used as a replacement in the --input-dataset flag of the run command.

  • Feature: Nextalign CLI and Nextclade CLI provide a command for generating shell completions: see nextclade completions --help for details.

  • Feature: Verbosity can be tuned using either the --verbosity=<severity> flag or one or more occurrences of the -v and -q flags. By default, Nextclade and Nextalign show messages with severity "warn" or above (i.e. only warnings and errors). Flag -v increases and flag -q decreases verbosity by one step, -vv and -qq by two steps, etc.
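For downstream scripts that consume these outputs, the following Python sketch shows three of the behaviors described in the list above: choosing a (de)compressor from the file extension, reading NDJSON one record at a time, and expanding the {gene} template into per-gene paths. All function names are hypothetical and none of this is Nextclade's own code. The record fields seqName and isReverseComplement are the ones named in these notes, and their presence in NDJSON records is assumed.

```python
import bz2
import gzip
import json
import lzma
from pathlib import Path

# Extension -> opener. zstd is left out of this sketch because it would
# need a third-party package on most Python versions.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(path, mode="rt"):
    """Open a text file, picking the codec from the file extension."""
    suffix = Path(path).suffix
    if suffix == ".zst":
        raise NotImplementedError("zstd is not covered by this sketch")
    return OPENERS.get(suffix, open)(path, mode)


def iter_ndjson(path):
    """Yield one parsed record per non-empty line, without loading the
    whole file into memory."""
    with open_text(path) as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def translation_paths(template: str, genes):
    """Substitute the {gene} template variable once per gene name."""
    return {gene: template.replace("{gene}", gene) for gene in genes}


print(translation_paths("output_dir/gene_{gene}.translation.fasta", ["ORF1a", "S"]))
# {'ORF1a': 'output_dir/gene_ORF1a.translation.fasta', 'S': 'output_dir/gene_S.translation.fasta'}
```

A consumer could then loop over `iter_ndjson("out/nextclade.ndjson.gz")` and, for example, collect the seqName of every record whose isReverseComplement field is true. The file name here is only an example.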

Feedback

If you found a bug or have a suggestion, feel free to:

We hope you enjoy using Nextclade 2.0.0 as much as we enjoyed building it!

Commit history

  • [2a150f7] fix(web): forbid entering /tree page on reload

The tree page is crashing without its inputs and users have no business navigating there themselves or reloading the page there, so let's redirect them to /.

In fact, let's redirect from anything that's not /.

But not in dev mode.

  • [cf5fada] Merge remote-tracking branch 'origin/master' into fix/web-reload-tree-crash

  • [b778006] docs: adjust user docs for v2

  • [cf4b143] docs: remove datasets-local file

This is now described in dev docs

  • [03e8d4b] docs: simplify readme

  • [c36eab8] fix(cli): correct template param in translations help example

  • [a067625] Merge pull request #893 from nextstrain/fix/cli-translations-help-example

  • [42bcb6b] docs: correct install instructions for v2

  • [00542bb] docs: correct csv column descriptions

  • [b44bd5b] docs: remove warning about windows binaries

They are now available

  • [2f272d3] docs: add a note about gnu vs musl

  • [a212544] docs: updated alignment algo description

  • [015bc29] fix(cli): avoid errors when certain subcommands are invoked

Resolves #899

This consolidates subcommand handling in one place and prevents inconsistencies when certain subcommands need to be handled in 2 places but it's not done, causing a crash.

  • [348e65c] Merge pull request #902 from nextstrain/fix/cli-subcommand-handling

  • [887f862] docs: fix url

  • [36dd0c5] docs: fix windows cli commands in docs

  • [f923f1c] Merge pull request #894 from nextstrain/docs/user-docs-v2

  • [0f7c523] Merge pull request #863 from nextstrain/fix/web-reload-tree-crash

  • [16d6b07] chore: remove cargo update from release script [skip ci]

  • [27aba89] chore: release cli 2.0.0