@martin-steinegger martin-steinegger released this Sep 1, 2020 · 36 commits to master since this release

Breaking changes

  • Remove --add-internal-id parameter from result2msa
  • filterdb --shuffle now shuffles randomly instead of deterministically
  • Taxonomy expressions in filtertax(seq)db now interpret , as || #320
  • convertalis pident output field now correctly reports percentage (0-100) sequence identity instead of fraction (0.00-1.00); use fident to print the fraction instead
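
For scripts that consumed the old fractional pident values, the change amounts to a factor of 100. A minimal sketch in Python, assuming a cutoff that was written against the old 0.00-1.00 scale; the helper names are illustrative and not part of MMseqs2:

```python
def fident_to_pident(fident: float) -> float:
    """Convert a fractional identity (0.00-1.00) to a percentage (0-100)."""
    if not 0.0 <= fident <= 1.0:
        raise ValueError(f"fraction out of range: {fident}")
    return fident * 100.0


def pident_to_fident(pident: float) -> float:
    """Convert a percentage identity (0-100) back to a fraction (0.00-1.00)."""
    if not 0.0 <= pident <= 100.0:
        raise ValueError(f"percentage out of range: {pident}")
    return pident / 100.0


# A cutoff of 0.5 on the old pident scale corresponds to 50.0 on the new one.
old_cutoff = 0.5
new_cutoff = fident_to_pident(old_cutoff)
```

Alternatively, a script can request fident in the convertalis output and keep its existing fractional cutoffs unchanged.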

Features

  • Support nucleotide clustering in cluster and easy-cluster
  • Support other architectures (SSE2/ARM64/POWER8/POWER9/etc) through SIMDe
  • Linclust is much faster on systems with a lot of CPU cores
  • Clustering update is faster, more stable and correctly deals with deleted sequences #272
  • Add easy workflow for reciprocal best hit searches easy-rbh
  • Add SILVA, Pfam-B, dbCAN2 to databases
  • databases produces taxonomy information for NR
  • Replace old greedy incremental clustering with new memory efficient version
  • Add result2dnamsa module to create MSAs of nucleotide sequences
  • Continued progress on profile-profile searching (result2pp, expandaln, expand2profile), stay tuned!
  • Add multi-parameter support to overwrite sequence-type-specific parameters, e.g. --gap-open "nucl:5,aa:11"
  • Add ORF information as output options to convertalis (qOrfStart, qOrfEnd, dbOrfStart, dbOrfEnd)
  • Speed up sorting using ips4o
  • Speed up masking through new version of tantan
  • Speed up multi-threaded writing of clustering results
  • Speed up reading of database indices and merging target split databases
  • Add memory tracking to account for index size when computing available memory (--split-memory-limit should be more reliable when searching/clustering billions of sequences).
  • Add --search-type 4 (translated/translated search) to createindex
  • Add convertalis --format-mode 3 HTML output based on MMseqs2 app (app.mmseqs.com)
  • Improve memory management in result2msa and result2profile modules
  • Add msa2result module to create an alignment result db from MSAs
  • Add filterresult to slim down result dbs with pairwise HHblits filtering #316
  • Add --kmers-per-sequence-scale to linsearch to extract a k-mer fraction instead of a fixed count
  • Add a random integer to --local-tmp path to avoid race conditions if multiple MMseqs2 instances run on the same machine
  • Add --max-seqs to ungappedprefilter
  • Add --tax-lineage-mode 2 parameter to print numeric taxids
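
The multi-parameter syntax from the list above (--gap-open "nucl:5,aa:11") pairs each sequence type with its own value. A rough Python sketch of how such a string could be split into per-type values; the function is hypothetical and only handles the type:value form shown in the example, not whatever other forms MMseqs2 itself may accept:

```python
from typing import Dict


def parse_multi_param(value: str) -> Dict[str, int]:
    """Split a string such as 'nucl:5,aa:11' into {'nucl': 5, 'aa': 11}."""
    result: Dict[str, int] = {}
    for part in value.split(","):
        key, sep, number = part.partition(":")
        if not sep:
            # Only the 'type:value' form from the release note is handled here.
            raise ValueError(f"expected 'type:value', got {part!r}")
        result[key.strip()] = int(number)
    return result


gap_open = parse_multi_param("nucl:5,aa:11")
```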

Bugs fixed

  • rbh workflow was broken due to issues with filterdb
  • Fix -a in RBH search to show alignments
  • Fix PDB70 database creation in databases
  • Fix aria2c download support
  • Fix memory issues and MPI in kmermatcher
  • Fix memory issues in extractorfs when using AVX2
  • Fix --cluster-reassign to respect --cov-mode
  • Set-cover supports up to 2^32 sequences (previously crashed with more than 2^31)
  • Exit correctly if there is not enough disk space instead of crashing in the next module
  • Fix prefilter order instability when searching very redundant databases
  • Correctly parse keys from data files in filterdb --filter-file, this was causing instability in linsearch
  • Allow overwriting string parameters with empty strings
  • Fix ASAN issue in extractorf when using AVX2
  • Microtar would try to seek backwards constantly resulting in horrible gzip read performance
  • Avoid lookup writing to corrupt memory if an accession is too long
  • Fix various inconsistencies and usability issues in alignall:
    • --alignment-mode inconsistent with align module
    • --add-backtrace did not do anything
  • Fix restart of clusterings using reassignment cluster --cluster-reassign
  • Fix createdb not correctly reading gz/bzip files with --createdb-mode 1 #323

@martin-steinegger martin-steinegger released this Feb 11, 2020 · 323 commits to master since this release

At a glance: The MMseqs2 command line interface is cleaner and validates user input. Many MMseqs2 modules use less memory and run faster. The new databases module helps to download and set up databases. We now have chat support at chat.mmseqs.com.

Known Issues

  • rbh crashes due to invalid sorting mode (#290)
  • Homebrew's macOS version does not use multiple cores (#289)
  • prefilter results can be unstable between different runs for extremely redundant databases (#277)
  • linclust/cluster can crash for very small input sets (#274)

Breaking Changes

  • kmermatcher --skip-n-repeat-kmer parameter was replaced with --ignore-multi-kmer
    Does not discard whole sequences anymore if a k-mer occurred too often; instead it skips the specific k-mers.
    Either mode is only used in Plass and not in Linclust
  • --lca-ranks from (easy-)taxonomy and lca has to be delimited with semicolons (;) instead of colons (:)
  • --dont-shuffle flag was renamed to --shuffle true/false
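
Scripts that still build a colon-delimited --lca-ranks value need the delimiter swapped. A small Python sketch of that migration; the function name is made up and the rank list is only an example value:

```python
def migrate_lca_ranks(old_value: str) -> str:
    """Rewrite a colon-delimited rank list to the semicolon-delimited form."""
    ranks = [rank.strip() for rank in old_value.split(":") if rank.strip()]
    return ";".join(ranks)


new_value = migrate_lca_ranks("genus:family:order")
```

Because ; is a command separator in most shells, the new value has to be quoted when it is passed on the command line.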

Features

  • new databases workflow to list and download common databases.
    Supported databases:
  Name                	Type      	Taxonomy	Url
- UniRef100           	Aminoacid 	     yes	https://www.uniprot.org/help/uniref
- UniRef90            	Aminoacid 	     yes	https://www.uniprot.org/help/uniref
- UniRef50            	Aminoacid 	     yes	https://www.uniprot.org/help/uniref
- UniProtKB           	Aminoacid 	     yes	https://www.uniprot.org/help/uniprotkb
- UniProtKB/TrEMBL    	Aminoacid 	     yes	https://www.uniprot.org/help/uniprotkb
- UniProtKB/Swiss-Prot	Aminoacid 	     yes	https://uniprot.org
- NR                  	Aminoacid 	       -	https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA
- NT                  	Nucleotide	       -	https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA
- PDB                 	Aminoacid 	       -	https://www.rcsb.org
- PDB70               	Profile   	       -	https://github.com/soedinglab/hh-suite
- Pfam-A.full         	Profile   	       -	https://pfam.xfam.org
- Pfam-A.seed         	Profile   	       -	https://pfam.xfam.org
- eggNOG              	Profile   	       -	http://eggnog5.embl.de
- Resfinder           	Nucleotide	       -	https://cge.cbs.dtu.dk/services/ResFinder
- Kalamari            	Nucleotide	     yes	https://github.com/lskatz/Kalamari
  • (easy-)search --slice-search is now usable. Slice search finds all hits that fulfill the alignment criteria while using only as much disk space as defined by --disk-space-limit
  • createdb and the various easy- workflows learned to read query input from STDIN
  • taxonomyreport learned to display the summarized taxonomy result with Krona
  • new filtertaxseqdb module for filtering sequence DBs with taxonomy information according to provided taxa
  • --taxon-list parameter understands expressions. E.g. get all bacterial and human sequences --taxon-list "2||9606"
  • easy-search and convertalis can now output taxonomic information using --format-output
taxid      Taxonomic identifier
taxname    Taxon Name
taxlineage Taxonomic lineage
  • speed up in (easy-)cluster/linclust by improving k-mer extraction
  • MMseqs2 consistently creates .source and .lookup files to record which input file each sequence came from
    E.g., after mmseqs createdb input1.fa input2.fa seqDB, each sequence in seqDB can be traced to input1.fa or input2.fa
  • createdb learned to index an existing (single-line-seq per entry) FASTA file without copying the FASTA content to a new database
  • align and rescorediagonal learned to align circular sequences
  • align exposes the z-drop parameter of its Banded Nucleotide alignment algorithm
  • reverseseq learned to reverse profiles
  • filterdb can filter rows with value within given percentage of first row
  • new aggregatetax module to assign a taxonomic label to contigs according to the fragments matched on the contig
  • Adjusting --max-seq-len is not required anymore, MMseqs2 automatically increases the length now.
  • MMseqs2 on Cygwin/Windows uses nedmalloc as its memory allocator now and does not massively slow down due to lock contention
  • new tar2db module to efficiently transform content of tar archives to MMseqs2 databases
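
The --taxon-list expression "2||9606" from the list above keeps every sequence whose lineage contains Bacteria (2) or human (9606). A toy Python sketch of that logic follows. It handles only the || operator shown in the example, the lineages are heavily abbreviated for illustration, and the function is not the MMseqs2 implementation:

```python
from typing import Dict, Set

# Toy lineages: taxid -> the taxid itself plus (some of) its ancestors.
LINEAGES: Dict[int, Set[int]] = {
    562: {562, 2},        # Escherichia coli, within Bacteria (2)
    9606: {9606, 2759},   # Homo sapiens, within Eukaryota (2759)
    4932: {4932, 2759},   # Saccharomyces cerevisiae, within Eukaryota (2759)
}


def matches_taxon_list(taxid: int, expression: str) -> bool:
    """Return True if the taxid falls under any taxon in an 'a||b||c' expression."""
    wanted = {int(term) for term in expression.split("||")}
    return bool(LINEAGES[taxid] & wanted)


# E. coli matches via Bacteria, human matches directly, yeast matches neither.
kept = [taxid for taxid in sorted(LINEAGES) if matches_taxon_list(taxid, "2||9606")]
```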

Bug fixes

  • createindex would create corrupted indices for profile target databases
  • rbh workflow would create its result DB at an unexpected (wrong) location
  • (easy)-taxonomy --lca-mode 3 (Approx. LCA) was aligning invalid sequences in the second iteration and producing bad results
  • lca (and (easy)-taxonomy) add empty columns for unclassified sequences to be valid TSVs
  • kmermatcher uses xxhash for hashing now (faster)
  • kmermatcher avoids crashing if the machine does not have enough memory to process the data at once (affects linclust/cluster)
  • kmermatcher correctly deals with sequences longer than MAX_SHRT now
  • kmermatcher fixed various edge cases (e.g. alignment of 1-char sequences)
  • kmermatcher hash-shift would be ignored
  • offsetalignment could produce wrong results in the minus-strand
  • clust now correctly and consistently handles alignment DB input
  • clusthash better deals with nucleotide input now and several multi-threaded inefficiencies were resolved
  • (easy-)cluster --single-step-clustering could cluster unrelated sequences due to hash collisions
  • prefilter --diag-score 0 respects --min-ungapped-score
  • createseqfiledb could print empty sequence lines
  • taxonomyreport could crash if no sequence was unclassified
  • result2flat could crash with long sequence input
  • result2msa, result2profile, msa2profile backport filtering fix from HHblits
  • align could produce bad alignments if all sequence lengths in the query DB were much shorter than in the target DB
  • splitsequence fixed issues when combined with compressed databases
  • result2profile fix Filter2 bug of HH-suite in MMseqs2
  • apply would crash due to reading wrong entry lengths
  • filterdb --filter-expression was not thread safe and could corrupt results
  • filterdb --extract-lines and --trim-to-one-column are compatible with each other

Developers

  • Internal representation of sequences changed from 4-byte per character to 1-byte per character
  • Compilation under AppleClang + libomp works now (see util/build_osx.sh)
  • Tools inheriting from MMseqs2 can now add their own citations
  • MMseqs2 on macOS compiles with the macOS 10.9 SDK (removed symlinkat call; relevant for bioconda)

@martin-steinegger martin-steinegger released this Aug 23, 2019 · 631 commits to master since this release

At a glance: The MMseqs2 command line interface is cleaner and validates user input. Many MMseqs2 modules use less memory and run faster.

Known Issues

  • High-sensitivity searches (higher than -s 6) with precomputed indices fail. Pass --db-load-mode 3 to the MMseqs2 call as a workaround.

Breaking Changes

  • Default taxonomy mode is assigning the same taxonomic label as the top hit. The previous "approximate 2bLCA" mode can be used with --lca-mode 3 or the non-approximated 2bLCA with --lca-mode 2
  • MMseqs2 will refuse to compile on compilers without OpenMP support (Use -DREQUIRE_OPENMP=0 to force a single-threaded no OpenMP build)
  • The confusingly named (and probably non-functional) --global-alignment parameter is gone
  • File names of the latest precompiled binaries changed. All archives contain a copy of the user guide and the MMseqs2 binary in the same subfolder (see further down for binaries of release 10-6d92c):
SIMD    Linux                      macOS                    Windows
SSE4.1  mmseqs-linux-sse41.tar.gz  mmseqs-osx-sse41.tar.gz  mmseqs-win64.zip
AVX2    mmseqs-linux-avx2.tar.gz   mmseqs-osx-avx2.tar.gz   -

Known Issues

  • MMseqs2 on Windows seems to not scale well on multiple threads
  • MMseqs2 on Windows can crash when built with AVX2 support (mostly on VMs)

Features

  • createindex can precompute split indices to improve runtime when searching against a database that is larger than the system memory. Precomputed databases also require less overhead RAM, since only the required parts are loaded
  • easy-search, easy-taxonomy, easy-linclust and easy-cluster workflows can take any number of query FASTA or FASTQ files
  • MMseqs2 validates database types. It will exit with an error message on wrong input, where it would previously crash
  • kmermatcher reports the diagonal with the most k-mer matches
  • kmermatcher scales the number of k-mers with sequence length (--kmer-per-seq-scale)
  • rescorediagonal got two new rescore modes, one for global alignment scoring and one for scoring a quasi-global alignment fulfilling a local window criterion
  • Peak memory usage for reading in very large databases is greatly reduced. 128GB nodes should comfortably be able to deal with up to the maximum of 4.2 billion entries
  • Parameters taking byte values support syntax with a SI suffix (e.g., --split-memory-limit 64G)
  • Nucleotide substitution matrices should be user definable
  • Taxonomy report is compatible with Pavian. Thanks to Florian Breitwieser!
  • cluster workflow learned a reassignment mode --cluster-reassign. This mode corrects errors that occurred because of cascaded clustering
  • extractorfs can directly translate a nucleotide ORF to an amino acid sequence
  • result2stats can write TSV files
  • createsubdb supports softlinks instead of always hard copying the whole file to disk
  • Reduced hard disk space usage for all cascaded clusterings
  • easy-taxonomy reports the top hit alignment as a separate output file with the suffix tophit_aln
  • createindex's checks for whether an index needs to be recomputed were improved
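
The suffix syntax for byte values (e.g., --split-memory-limit 64G) can be illustrated with a short Python sketch. The notes above do not say whether a suffix denotes powers of 1000 or of 1024, so the sketch assumes 1024 by default and exposes the base as a parameter. The function name and the set of accepted suffixes are likewise assumptions:

```python
def parse_byte_size(text: str, base: int = 1024) -> int:
    """Turn a string such as '64G' into a byte count.

    Assumption: K/M/G/T stand for base**1 .. base**4; a bare number is bytes.
    """
    exponents = {"K": 1, "M": 2, "G": 3, "T": 4}
    text = text.strip()
    if not text:
        raise ValueError("empty size string")
    suffix = text[-1].upper()
    if suffix in exponents:
        return int(text[:-1]) * base ** exponents[suffix]
    return int(text)


limit_bytes = parse_byte_size("64G")
```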

Bug fixes

  • MMseqs2 did not compile on FreeBSD. Please let us know about free continuous integration options to make sure it will keep working in the future
  • proteinaln2nucl could return wrong coordinates
  • apply would deadlock when running with multiple threads
  • MPI searches are way more reliable, there were various issues around merging the separate results. MPI logic of split and merge is also integrated into the regression tests suite
  • prefilter splits nucleotide searches if not enough memory is available
  • kmermatcher could corrupt memory
  • rescorediagonal could produce wrong sequence identities when aligning mixed-case sequences
  • macOS builds were not actually static (still dynamically link libsystem however)
  • lca module could corrupt memory and crash
  • createdb does not crash on systems with only 4GB of RAM anymore
  • AVX2 and SSE4.1 builds could produce slightly different results
  • summarizeresults does not crash on empty alignments results anymore
  • fix wrong tophit_report in easy-taxonomy
  • Precompiled Windows builds were broken
  • Precomputed indices of databases with very short sequences could truncate alignments if the query sequences were longer

Developers

  • Tools using MMseqs2 as a framework do not need to export MMseqs2 modules again anymore

  • MMseqs2 uses Azure Pipelines for all platforms to run our regression tests suite and provide precompiled binaries

  • MMseqs2 runs under ASan without any issues. We fixed various small memory leaks

  • The regression suite is directly linked through a submodule

    It can be used by running:

    git submodule update --init
    ./util/regression/run_regression.sh $PATH_TO_MMSEQS/mmseqs $TMP_DIR
    

@martin-steinegger martin-steinegger released this May 4, 2019 · 921 commits to master since this release

At a glance: Improved taxonomy, add colors to user output, improve computation progress bar, small speed ups and many bug fixes

Features

  • Add support for Kraken style taxonomy reports. Thanks to Florian Breitwieser
  • New easy-taxonomy workflow
  • New progress bar to reduce output
  • Colored errors and warnings

Bugs

@martin-steinegger martin-steinegger released this Apr 1, 2019 · 1058 commits to master since this release

At a glance: Faster searches and clustering through improved IO and better seeding. More search modes like tblastx, reciprocal best hit and linsearch. New output format SAM. Support for compressed databases to reduce hard disk and memory requirements.

Known Issues

  • Iterative search only works up to 2 iterations

Breaking Changes

  • MMseqs2 now saves a lot on IO by not merging result datafiles
    There is still a single .index file, but the corresponding data files are split into multiple parts (as many as threads were used previously)
  • MMseqs2 now uses the VTML80 [1,2] substitution matrix to speed up the prefiltering (changeable by --seed-sub-mat). The final alignment is still computed with BLOSUM62 (still changeable by --sub-mat)
  • All databases have now a .dbtype file
  • MMseqs2 Docker image is now based on Debian instead of Alpine
  • Changed Orf header format to be more space efficient. The new format is now orignIdentifer startPos(-/+)len flag
  • prefilter returns ungapped-alignment scores instead of e-values
  • createindex: the file extension is now .idx instead of the previous .[s]k[6,7] format

Features

  • Support for tblastx-style nucl-nucl translated searches
    mmseqs search nuclDB1 nuclDb2 aln tmp --search-mode 2
  • Support for nucleotide searches
    mmseqs search nuclDB1 nuclDb2 aln tmp --search-mode 3
  • convertalis has learned to return SAM formatted output (preview)
  • Database can be compressed by applying zstd on each entry (--compressed 1)
    • Also added compress and decompress modules
  • rbh workflow for reciprocal best hit searches added
  • linclust can now cluster nucleotide sequences on both forward and reverse strand
  • Added linsearch, a lightning fast search for proteins and nucleotide sequences (preview; easy workflow variant easy-linsearch also added)
  • createlinindex computes an index for linsearch
  • taxonomy uses --orf-start-mode 1 to annotate more sequences
  • Added approx. 2bLCA to speed up computation, this is now the new default. The old mode can be turned on by --lca-mode 2
  • createdb recognizes sequences containing Uracil as DNA sequences
  • createdb is now faster through speeding up its shuffle operations
  • view module to view single entry in an MMseqs2 database
  • align module has learned --min-aln-len parameter to filter by minimal alignment length
  • Alignment modules (rescorediagonal, align) can align longer sequences now (not limited to 2^15 length)
  • Input sequences can now be soft-masked (lowercase-letter masking) instead of only hard-masked (replaced with X) using --mask-lower-case. The masking only applies to the prefilter stages kmermatcher or prefilter and can be combined with --mask
  • filterdb has learned --filter-expression parameter and mode that allows filtering by simple mathematical expressions
  • alignbykmer can be used for nucleotide searches
  • MMseqs2 did-you-mean functionality gives better suggestions
  • MMseqs2 does not repeat the whole parameter list for each submodule call anymore

Bugs

  • Default parameters of map workflow are now set correctly
  • Some modules were using the wrong coverage parameter
  • Sliced profile search was losing high E-value hits
  • Sliced profile search is now stable
  • Profile-Sequence alignment E-values were slightly too high
  • result2msa was crashing with profiles on the target side
  • result2msa should not crash with --allow-deletion anymore
  • Some parameters were never visible (with or without -h)
  • Various issues with MPI were resolved

Developers

  • Continuous integration enforces no compile warnings now
  • Continuous integration now tries to build for AArch64 with Docker and QEMU
  • We added a first draft of our developer guide to the wiki

References

[1] Müller T, Vingron M. Modeling amino acid replacement. J Comput Biol. 2000;7:761–76. doi: 10.1089/10665270050514918

[2] Müller T, Spang R, Vingron M. Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method. Mol Biol Evol. 2002;19:8–13. doi: 10.1093/oxfordjournals.molbev.a003985


@martin-steinegger martin-steinegger released this Nov 29, 2018 · 1434 commits to master since this release

Changes since release 6-f5a1c

New features

  • Simplified taxonomy. We add createtaxdb to create a taxonomically annotated database. Result databases can be filtered by taxonomy with filtertaxdb, and addtaxonomy appends taxonomy information to result databases
  • index (createindex) support for translated target database searches
  • add nucleotide search (experimental)
  • support NEON CPU architecture (experimental)
  • improve performance of prefilter if the L2 cache is larger than 256K
  • easy-search automatically computes backtrace if requested by --format-output
  • Create search-2m workflow, similar to 2bLCA but without the LCA computation
  • We add a database preload mode with four settings: 0: auto, 1: fread, 2: mmap, 3: mmap+touch. With fread, the processing time per query is 15% faster but reading the database in is slower. mmap is used for the MMseqs2 webserver; it enables instant searches if the database is already in memory. mmap+touch uses mmap and touches every page.
  • We add a new tool, touchdb, that loads the database into memory. This can be useful for --db-load-mode 2.
  • add local hard disks support --local-tmp for MPI runs. This reduces pressure from the NFS
  • Introduce sortresult tool to sort an unordered sequence db (e.g. from mergeresult)
  • prefilter now supports indexes with k-mer ranges > 2^31
  • convertkb can read multiple files
  • speed up mmap memory touch function

breaking changes

  • new index version. Recomputation of old indexes is needed
  • --format-output is now comma separated
  • changed taxonomy database format, old taxonomy databases are not supported anymore

default parameter change

  • extractorfs default is now --orf-start-mode 1. This is important for translated searches in organisms with introns.

Bug fixes

  • Fix wrong alignment positions for translated searches
  • Fix off-by-one error in extractalignedregion
  • Fix bug in NcbiTaxonomy tool
  • Fix e-value threshold if -e < --e-profile

Developer

  • Update to newest ALP version

@martin-steinegger martin-steinegger released this Oct 9, 2018 · 1602 commits to master since this release

Changes since release 5-9375b

New features

  • Support user defined output format in convertalis.
  • Add parameters for gap open and gap extension costs.
  • Improve substitution matrix support. Letters of the alphabet can now be chosen freely.
  • Add a few PAM matrices to the data folder. Choose them with the --sub-mat parameter.
  • Support IUPAC codes in translated search.
  • Add parameter to define a spaced k-mer pattern.
  • Add a new module ungappedprefilter. It computes an optimal ungapped score using a vectorized algorithm.

Bug fixes

  • Fix easy-linclust parameter parsing issue.
  • Fix coverage filtering in align when the parameter --realign is set.
  • Fix sequence identity computation in rescorediagonal --rescore-mode 2.
  • Fix apply MPI support.
  • Fix representative sequence output bug in result2repseq.
  • Fix possible MPI issues in modules creating symlinks.
  • Fix slightly wrong E-value computed in alignall module.

Known Issues

  • easy-search output has only one column. Workaround: Add parameter --format-output "".

@martin-steinegger martin-steinegger released this Sep 4, 2018 · 1671 commits to master since this release

Changes since release 4-0b8cc

Bug fixes

  • bool flag parameters (e.g. -a) work again
  • swapresults will deterministically rank results
  • shellcompletion does not report run time anymore

@martin-steinegger martin-steinegger released this Sep 4, 2018 · 1674 commits to master since this release

Changes since release 3-be8f6

New features

  • Alternative alignments in search (--alt-ali). Find alignments by masking out previously found regions in the target sequence.
  • Added map workflow for fast near-exact mapping of reads
  • Added easy-linclust workflow, that works on FASTA files
  • Sequence lengths longer than 32k are now supported (default sequence length limit is now 65535)
  • createdb shuffles the order of entries by default (--dont-shuffle to disable), useful for database splits, where one split could take much longer than others
  • linclust now supports MPI
  • linclust adds one hash for the whole sequence, to improve exact sequence matching
  • New sequence identity computation modes, where the normalization happens on the query or target length instead of alignment length
  • New --cov-mode that computes the coverage only based on sequence lengths (--cov-mode 3)
  • search/cluster/linclust workflows have learned --alignment-mode 4 for faster ungapped alignments
  • Translated search now sorts results by E-value and aggregates all ORFs under the corresponding contig identifier
  • prefiltering can now sort hits with score > 255 correctly
  • convertalis now works with profiles
  • Added generalized database transposition tool swapdb (swapresults only makes sense for prefiltering/alignment results)

Performance

  • Speedup extractorf with vectorization
  • Many performance improvements to reduce overhead for web server mode
  • createtsv writes output in parallel
  • Avoid many unnecessary memory allocations in various modules

Bug fixes

  • convertmsa now correctly parses STOCKHOLM files without accession keys
  • In search, when using splits, the effective limit would be less than --max-seqs sequences; the limit is now computed correctly (max-seqs/Splits + 4*sqrt(max_seqs/Splits))
  • Fix bug in MsaFilter where wrong sequences would be filtered
  • swapresults will add an empty entry if a target entry has no corresponding query match, instead of no entry at all
  • createindex now correctly creates a tmp directory if no directory exists already
  • Fix query split runs for small input databases
  • result2stats was reading the wrong first sequence (from query instead of target database)
  • result2repseq now writes the correct .dbtype file
  • convertalis now reads the correct dbtype for the target sequence
  • Fix empty REG_EMPTY bug on macOS
  • Fix possible memory corruption when searching against database indexed by 'createindex'
  • Report error if -DHAVE_MPI was set and MPI is not installed on the system
  • Avoid race condition in kmermatcher (invalid parallel writing to vector)
  • Fix msa2profile header output format
  • msa2profile uses the FASTA readin mode by default now
  • Target profile databases and databases built with --exact-kmer-matching now correctly extract all k-mers
  • Fix identical score computation of alignment if clustering using profiles
  • Nucleotide backtranslation translateaa would produce invalid codons for X
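
The corrected limit for split searches quoted in the list above, max-seqs/Splits + 4*sqrt(max_seqs/Splits), can be worked through in a few lines of Python. Truncating the result to an integer is an assumption of this sketch, as is the function name:

```python
import math


def per_split_limit(max_seqs: int, splits: int) -> int:
    """Evaluate max_seqs/splits + 4*sqrt(max_seqs/splits) for one split."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    mean = max_seqs / splits
    # The square-root term adds headroom on top of the even share per split.
    return int(mean + 4 * math.sqrt(mean))


limit = per_split_limit(max_seqs=300, splits=3)
```

With --max-seqs 300 and 3 splits, the even share is 100 and the square-root term adds 4 * 10, so each split may report up to 140 hits.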

Others

  • removed --early-exit
  • Output name of program called

Experimental new modules

  • new fast alignment method alignbykmer

Developers

  • Cmake flag -DHAVE_GPROF for profiling MMseqs2 using gprof
  • Fixed most warnings
  • SSTR does not use stringstreams anymore
  • Refactored time measuring
  • Debug::INFO/WARNING/ERROR is now used consistently across the codebase
  • If available, shellcheck (https://github.com/koalaman/shellcheck) will critique shell scripts and fail the compilation

@martin-steinegger martin-steinegger released this May 28, 2018 · 1898 commits to master since this release

Changes since 2-23394 Release

New Features

  • Create simple workflows (FASTA/FASTQ in, flat file out) for clustering (easy-cluster) and searching (easy-search)
  • Add a new greedy incremental clustering algorithm to the clust module that needs less memory
  • Make the new low memory clustering algorithm default if --cov-mode 1 is used in linclust and cluster
  • Add alignall module for all-against-all alignments of e.g. clusters
  • Improved Windows support
  • filterdb learned new modes

Bug fixes

  • Fix wrong merging code in linclust
  • Fix e-value issues in target-split case
  • Fix seg. fault in rescore diagonal if 'z' is used
  • Fix seg. fault when using masking in kmermatcher
  • Fix wrong filterdb default mode
  • prefilter overestimated the required amount of memory and refused to run
  • prefilter scores would saturate too early, now they have the full 2^16 range

Others

  • Profile searches create fewer high-scoring false positives through better compositional bias correction and masking of low-complexity regions of profiles
  • Clustering now supports the whole 2^32 range instead of the previous 2^31
  • Speed up clustering when using --cov-mode 1
  • Rework symlinks to the header databases
  • Support profiles on query and target side in result2profile