Release MMseqs2 Release 10-6d92c · soedinglab/MMseqs2

At a glance: The MMseqs2 command line interface is cleaner and validates user input. Many MMseqs2 modules use less memory and run faster.

Known Issues

High sensitivity searches (higher than -s 6) with precomputed indices can fail. As a workaround, pass --db-load-mode 3 to the MMseqs2 call.
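
A minimal sketch of the workaround, assuming a standard search call. The names queryDB, targetDB, resultDB and tmp are placeholders rather than anything from this release, and the script only prints the command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: queryDB, targetDB, resultDB and tmp are hypothetical names.
# DRY_RUN=1 (the default here) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# -s 7.5 is above the -s 6 threshold named in the known issue;
# --db-load-mode 3 is the workaround given in these notes.
run mmseqs search queryDB targetDB resultDB tmp -s 7.5 --db-load-mode 3
```

Run it with DRY_RUN=0 once the placeholder names point at real databases.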

Breaking Changes

The default taxonomy mode now assigns the same taxonomic label as the top hit. The previous "approximate 2bLCA" mode can be selected with --lca-mode 3, and the non-approximated 2bLCA with --lca-mode 2
MMseqs2 will refuse to compile on compilers without OpenMP support (use -DREQUIRE_OPENMP=0 to force a single-threaded build without OpenMP)
The confusingly named (and probably non-functional) --global-alignment parameter is gone
File names of the latest precompiled binaries changed. All archives contain a copy of the user guide and the MMseqs2 binary in the same subfolder (see further down for binaries of release 10-6d92c):

SIMD	Linux	macOS	Windows
SSE4.1	mmseqs-linux-sse41.tar.gz	mmseqs-osx-sse41.tar.gz	mmseqs-win64.zip
AVX2	mmseqs-linux-avx2.tar.gz	mmseqs-osx-avx2.tar.gz	-
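
The taxonomy and OpenMP changes listed above could be handled roughly as follows. This is a sketch under assumptions: the database names are placeholders, and the cmake call assumes it is issued from a build directory directly inside the MMseqs2 source tree. Commands are only printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: database names and the build directory layout are hypothetical.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Restore the previous default ("approximate 2bLCA") for a taxonomy run.
run mmseqs taxonomy queryDB targetDB taxonomyResult tmp --lca-mode 3

# Use the non-approximated 2bLCA instead.
run mmseqs taxonomy queryDB targetDB taxonomyResult tmp --lca-mode 2

# Configure a single-threaded build on a compiler without OpenMP.
run cmake -DREQUIRE_OPENMP=0 ..
```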

Known Issues

MMseqs2 on Windows does not seem to scale well across multiple threads
MMseqs2 on Windows can crash when built with AVX2 support (mostly on VMs)

Features

createindex can precompute split indices to improve runtime when searching against a database that is larger than the system memory. Precomputed databases also require less overhead RAM, since only the required parts are loaded
easy-search, easy-taxonomy, easy-linclust and easy-cluster workflows can take any number of query FASTA or FASTQ files
MMseqs2 validates database types. It will exit with an error message on wrong input, where it would previously crash
kmermatcher reports the diagonal with the most k-mer matches
kmermatcher scales the number of k-mers with sequence length (--kmer-per-seq-scale)
rescorediagonal got two new rescore modes, one for global alignment scoring and one for scoring a quasi-global alignment fulfilling a local window criterion
Peak memory usage for reading in very large databases is greatly reduced. 128GB nodes should comfortably be able to deal with up to the maximum of 4.2 billion entries
Parameters taking byte values support syntax with a SI suffix (e.g., --split-memory-limit 64G)
Nucleotide substitution matrices are user definable
Taxonomy report is compatible with Pavian. Thanks to Florian Breitwieser!
cluster workflow learned a reassignment mode --cluster-reassign. This mode corrects errors that occurred because of cascaded clustering
extractorfs can directly translate a nucleotide ORF to an amino acid sequence
result2stats can write TSV files
createsubdb supports softlinks instead of always hard copying the whole file to disk
Reduced hard disk space usage for all cascaded clusterings
easy-taxonomy reports the top hit alignment as a separate output file with the suffix tophit_aln
The createindex checks for whether an index needs to be recomputed were improved
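
Two of the features above, sketched as commands. The file and database names are placeholders, and passing --split-memory-limit to createindex is an assumption based on the feature descriptions rather than something these notes state explicitly. Commands are only printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: reads1.fasta, reads2.fastq, targetDB, result.m8 and tmp are
# hypothetical names. DRY_RUN=1 (the default here) prints each command
# instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Search several query files (FASTA and FASTQ) against one target in one call.
run mmseqs easy-search reads1.fasta reads2.fastq targetDB result.m8 tmp

# Precompute an index, giving the memory limit as a byte value with a suffix.
# Assumption: createindex accepts --split-memory-limit.
run mmseqs createindex targetDB tmp --split-memory-limit 64G
```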

Bug fixes

MMseqs2 did not compile on FreeBSD. Please let us know about free continuous integration options to make sure it will keep working in the future
proteinaln2nucl could return wrong coordinates
apply would deadlock when running with multiple threads
MPI searches are much more reliable. There were various issues around merging the separate results. The MPI logic of split and merge is also integrated into the regression test suite
prefilter splits nucleotide searches if not enough memory is available
kmermatcher could corrupt memory
rescorediagonal could produce wrong sequence identities when aligning mixed-case sequences
macOS builds were not actually static (they still dynamically link libsystem, however)
lca module could corrupt memory and crash
createdb does not crash on systems with only 4GB of RAM anymore
AVX2 and SSE4.1 builds could produce slightly different results
summarizeresults does not crash on empty alignments results anymore
easy-taxonomy wrote a wrong tophit_report
Precompiled Windows builds were broken
Precomputed indices of databases with very short sequences could truncate alignments if the query sequences were longer

Developers

Tools using MMseqs2 as a framework no longer need to export MMseqs2 modules again
MMseqs2 uses Azure Pipelines for all platforms to run our regression test suite and provide precompiled binaries
MMseqs2 runs under ASan without any issues. We fixed various small memory leaks

The regression suite is directly linked through a submodule

It can be used by running:

git submodule update --init
./util/regression/run_regression.sh $PATH_TO_MMSEQS/mmseqs $TMP_DIR
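
The two commands above leave PATH_TO_MMSEQS and TMP_DIR undefined. One possible way to set them is sketched below. The build/bin location is an assumption about a local checkout, not something these notes specify, and the commands are only printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: build/bin is a hypothetical location of the compiled binary.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Directory containing the mmseqs binary (override via the environment).
PATH_TO_MMSEQS="${PATH_TO_MMSEQS:-$PWD/build/bin}"
# Scratch directory for the regression run; a fresh one is created if unset.
TMP_DIR="${TMP_DIR:-$(mktemp -d)}"

run git submodule update --init
run ./util/regression/run_regression.sh "$PATH_TO_MMSEQS/mmseqs" "$TMP_DIR"
```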
