Releases: knights-lab/BURST

BURST v1.00

30 Apr 17:37
1dc5639

This is BURST v1.00.

Although this looks like a "final" version number, it is virtually the same as v0.99.8 special edition PT 2.5 (yes, I know...), with no major changes.

That said, some minor code cleanup, commenting, and small fixes may have made it in since then.

The attached binaries work well on Linux and Windows. Someone with a Mac should make a (preferably static) compile for Mac and ping me to put it up here. 👍

Enjoy!

BURST v0.99.8

14 Jun 18:35
c2c9fef

New:

  • First release in the 0.99.8 series.
  • A general refinement release with miscellaneous performance and stability improvements.

BURST v0.99.7f

23 Nov 06:26
e1228f0

New:

  • Old prepass mode (-p) repealed and replaced; see below
  • You may add heuristics to the normal alignment engine with -hr (the purpose is to enable lower identities without falling back to the ultra-slow mode for shotgun DBs; identities >=95% are still guaranteed optimal)
  • Entirely new and separate heuristic alignment mode with -p (all bets are off on optimality)
  • -p can take an integer "effort" parameter. The default is 16; sensible values range from 1 to 999
  • -p can combine with -hr to make things even more heuristic (yuck!)
  • Some modest general speedups in default optimal modes, too
  • [0.99.7a] N's now penalized in query as well as reference (improved simulation results)
  • [0.99.7a] Improved building of accelerators with ambiguities (less "bad" when N's are penalized)
  • [0.99.7a] --taxacut (-bc) now accepts floating point "confidence" as parameter ([0-1]) as well
  • [0.99.7b] New database formats now produced; no conversion step
  • [0.99.7d] Miscellaneous fixes
  • [0.99.7f] Added ability to suppress the progress indicator

Binaries for old systems (pre-AVX) or Macs are available on request (if they aren't posted here soon).

BURST v0.99.5a

26 Oct 17:25

New:

  • Miscellaneous fixes
  • New CUBICLUST clustering engine (disabled by default; must build with -D CUBICLUST to enable). Much higher quality clusters in much less RAM, but very time-consuming.
  • New alignment mode: "ANY". This introduces a new guarantee: if any valid match above the desired threshold exists in the database, report any one such match, even if not a best or minimizing match. Slightly faster for contaminant screening.
  • Compressive database generation: Now reduces database size when many redundant genomes are present by dynamically adjusting shearing points and deduplicating repeats. This is a lossless abstraction -- all information about location and original references is retained, and no alignments are jeopardized. Additionally, this mode is faster than previous sheared DB methods if run without fingerprints/clustering. Enable with -d DNA -s (shearing is required for this mode).
  • The --dbpartition flag has been introduced to save memory when using compressive genomics databases. Using N partitions will consume ~1/N the RAM but at the expense of some duplication detection (the search range is limited to within partitions). Alignment quality is unaffected.
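The "ANY" guarantee described above can be pictured as an early-exit search. The following is only a toy sketch and not BURST's implementation: `toy_identity` is a hypothetical stand-in that scores equal-length strings by the fraction of matching positions, and the function names and sequences are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def toy_identity(query: str, ref: str) -> float:
    """Hypothetical scoring: fraction of matching positions (equal-length toy case only)."""
    if len(query) != len(ref) or not query:
        return 0.0
    matches = sum(q == r for q, r in zip(query, ref))
    return matches / len(query)


def find_any(query: str, refs: Iterable[Tuple[str, str]], threshold: float) -> Optional[Tuple[str, float]]:
    """ANY-style search: return the first reference meeting the threshold, then stop."""
    for name, seq in refs:
        score = toy_identity(query, seq)
        if score >= threshold:
            return name, score
    return None


def find_best(query: str, refs: Iterable[Tuple[str, str]], threshold: float) -> Optional[Tuple[str, float]]:
    """Best-hit search: scan every reference and keep the highest score above the threshold."""
    best = None
    for name, seq in refs:
        score = toy_identity(query, seq)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best


# Made-up references: ref1 is a valid but non-best match, ref2 is the best match.
refs = [("ref1", "ACGTACGA"), ("ref2", "ACGTACGT"), ("ref3", "TTTTTTTT")]
any_hit = find_any("ACGTACGT", refs, 0.85)
best_hit = find_best("ACGTACGT", refs, 0.85)
```

In this toy case `find_any` stops at `ref1` without ever looking at `ref2`, which is why such a mode can be somewhat faster when the only question is whether a read matches anything at all (contaminant screening).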

Binaries for old systems (pre-AVX) or Macs are available on request (if they don't come soon).

BURST v0.99.4a

31 Jul 18:49
Pre-release

New:

  • More robust fingerprint clustering engine (less RAM, faster alignments for large DBs)
  • Changed defaults:
    • --npenalize now default, disable with --nwildcard
    • CAPITALIST is default alignment mode
    • Some debug-centric options removed
    • Still compatible with old command lines

Note: this version has not been extensively tested for making DBs. Alignment with existing DBs should be fine.

BURST v0.99.3d

13 Jun 00:43
Pre-release

New: Now comes in 2 flavors: DB15 and DB12. The '15' flavor creates and uses accelerators intended for very large databases (think gigabytes), while the '12' flavor creates and uses accelerators suited to smaller databases (think megabytes).

15 database:
Pros: faster searches, up to 5x faster than DB12 on shotgun.
Cons: larger accelerator size (~4GB larger than DB12), only works for identities ~95% and up

12 database:
Pros: smaller accelerator size, works for identities of 93.5% and up. Great for amplicons
Cons: slower searches (especially on shotgun)

Aside from their respective DB12 or DB15 accelerator formats, everything else is interchangeable between the two variants and both behave identically when used without accelerators.
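The trade-off between the two flavors can be summarized as a small decision helper. This is a rough sketch only: the identity floors are the approximate figures quoted above, and `usable_flavors`, `suggest_flavor`, and the `huge_database` flag are hypothetical names introduced for illustration (whether a database counts as "huge" is the caller's judgment).

```python
from typing import List, Optional

# Approximate identity floors taken from the notes above; treat them as rough guides.
MIN_IDENTITY = {"DB12": 0.935, "DB15": 0.95}


def usable_flavors(identity: float) -> List[str]:
    """List the flavors whose stated identity floor is at or below `identity`."""
    return sorted(name for name, floor in MIN_IDENTITY.items() if identity >= floor)


def suggest_flavor(identity: float, huge_database: bool) -> Optional[str]:
    """Prefer DB15 for huge databases when the identity allows it, else fall back to DB12."""
    usable = usable_flavors(identity)
    if huge_database and "DB15" in usable:
        return "DB15"
    if "DB12" in usable:
        return "DB12"
    return None  # Below both floors: neither accelerator flavor is advertised to work.
```

For example, `suggest_flavor(0.94, huge_database=True)` falls back to DB12 because 94% is below the DB15 floor, at the cost of slower shotgun searches.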

New in 0.99.3:

  • Much faster searches with accelerator
  • New compressed database and accelerator formats (edx and acx which replace edb and acc respectively)
  • Improvements to output in CAPITALIST mode (some better minpath picks)
  • Much faster query parser and preprocessing for huge shotgun runs
  • Much reduced memory use
  • Introduced "--skipambig" parameter to automatically exclude query sequences with 5 or more ambiguous bases from analysis (serves both QC and speedup role; recommended in most cases)
  • Fixed: mixed memory model for ambiguous queries
  • Fixed: multiline loose-format fasta parser for references (queries still only accept linearized fasta)
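The effect of "--skipambig" on a query set can be approximated outside the program. This is a minimal sketch, assuming "ambiguous" means any character other than A, C, G, or T; the notes do not define it, and the function names here are hypothetical.

```python
from typing import List, Tuple


def count_ambiguous(seq: str) -> int:
    """Count characters other than A, C, G, T (case-insensitive). Assumed definition."""
    return sum(base not in "ACGT" for base in seq.upper())


def skip_ambiguous(records: List[Tuple[str, str]], max_ambiguous: int = 5) -> List[Tuple[str, str]]:
    """Keep (name, sequence) records with fewer than `max_ambiguous` ambiguous bases."""
    return [(name, seq) for name, seq in records if count_ambiguous(seq) < max_ambiguous]


# Made-up queries: "b" has exactly 5 N's and is dropped; "a" has 4 and is kept.
records = [("a", "ACGTNNNN"), ("b", "ACGTNNNNN"), ("c", "acgt")]
kept = skip_ambiguous(records)
```

A pre-filter like this is mainly useful for estimating how many reads the option would exclude before committing to a long shotgun run.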

Binaries (aka "executables" aka the program itself that you run):

  • Windows and Linux users can use the same binaries now. Windows users must have Windows 10 Creators Update and use the native WSL bash environment.

Older/custom editions may be available upon request (different DB accelerator formats for lower id/smaller accelerators, native Win64 .exe binaries, binaries for older computer processors, etc).

EMBALMER v0.99.2

02 Mar 20:00
Pre-release

Major new release. As before, download the 'embalm' program appropriate for your computer below.

  • ACCELERATOR support (-a). Order of magnitude or greater speedup on large shotgun databases.
  • clustradius now supports % similarity in cluster generation rather than fixed distance, interpreted as -cr (value - 1000 / 1000)
  • More binaries coming soon.
  • New RefSeq 16S database included (has fasta file, edb file, accelerator file, tree file, and taxonomy file in QIIME-compatible format):
    Example run (after extracting the RS170301.zip file), calling taxonomy down to 93.5% identity (genus-level cutoff):
    embalm -r RS170301/RS170301.edb -a RS170301/RS170301.acc -b RS170301/RS170301.tax -f -i 0.935 -bs -m CAPITALIST -n -o myaligns.b6 -q myqueries.fa

Note that this is an early release in the 0.99.2 branch. Some of the new features may lack polish.

EMBALMER v0.99.1c

11 Feb 05:48
Pre-release

Binaries for Linux, Mac, Windows available.

  • "Modern" computers are generally newer than 2011. They should have "AVX." These are the recommended versions (the filenames are uncontaminated with extra stuff).
  • "Older" computers are circa 2009 (the very first core-i7's). They should have "SSE4.2."
  • "Buzzard" computers are, well... newer than 2006 like the first "cores." They should have "SSSE3."
  • Even older computer? The code can compile for any 64-bit processor, but you're on your own (or contact me).
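If you are unsure which tier your machine falls into, the instruction-set names above map onto the CPU flags Linux reports in /proc/cpuinfo (`avx`, `sse4_2`, `ssse3`). The following is a sketch under that assumption; the function names and the returned tier labels are hypothetical and do not correspond to actual binary filenames.

```python
from typing import Iterable, Set


def pick_build(cpu_flags: Iterable[str]) -> str:
    """Map CPU feature flags to the tier names used in the list above."""
    flags = set(cpu_flags)
    if "avx" in flags:
        return "modern"
    if "sse4_2" in flags:
        return "older"
    if "ssse3" in flags:
        return "buzzard"
    return "custom"  # None of the three: compile it yourself.


def read_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Linux only: return the flags from the first 'flags' line, or an empty set."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(pick_build(read_cpu_flags()))
```

On systems without /proc/cpuinfo (Mac, native Windows) `read_cpu_flags` returns an empty set, so the result there says nothing about the hardware.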

The RefSeq archive is a 16S microbial database for embalmer. EMBALMER also works with plain FASTA files as references. Also included is a taxonomy file for the same RefSeq 16S genes. You should extract the database before using.

To try embalmer's new taxonomy interpolation features, run the DB and taxonomy file against some bacterial 16S reads as follows. This aligns every read to all of its best hits and calls a fully interpolated taxonomy for each:

embalm -r RefSeq_16S.edb -q MY_FASTA_QUERIES.fa -o alignments_and_taxonomy.b6 -f -n -m CAPITALIST -b RefSeq_16S.tax -bs -i 0 -p fast

For a more speculative taxonomy, you can try:

embalm -r RefSeq_16S.edb -q MY_FASTA_QUERIES.fa -o alignments_and_taxonomy.b6 -f -n -m CAPITALIST -b RefSeq_16S.tax -bc 3 -i 0 -p fast
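When scripting many runs, the two invocations above can be assembled as argument lists rather than typed by hand. This is a sketch only: it uses just the flags shown in the commands above as they appear in this release, the file names are the same placeholders, and `embalm_command` is a hypothetical helper.

```python
import shlex
from typing import List


def embalm_command(db: str, queries: str, output: str, taxonomy: str,
                   speculative: bool = False) -> List[str]:
    """Build the argument list for the example runs shown above."""
    # "-bs" is used in the first example; "-bc 3" in the more speculative one.
    tax_mode = ["-bc", "3"] if speculative else ["-bs"]
    return [
        "embalm", "-r", db, "-q", queries, "-o", output,
        "-f", "-n", "-m", "CAPITALIST", "-b", taxonomy,
        *tax_mode, "-i", "0", "-p", "fast",
    ]


cmd = embalm_command(
    "RefSeq_16S.edb", "MY_FASTA_QUERIES.fa",
    "alignments_and_taxonomy.b6", "RefSeq_16S.tax",
)
print(shlex.join(cmd))  # Shows the command as it would be typed in a shell.
```

The list can be passed to `subprocess.run(cmd, check=True)` once the `embalm` binary is on your PATH and the database has been extracted.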

EMBALMER v0.99.1b

04 Feb 00:52
Pre-release

See f12519a

Binaries for Linux, Mac, Windows available.

  • "Modern" computers are generally newer than 2011. They should have "AVX." These are the recommended versions (the filenames are uncontaminated with extra stuff).
  • "Older" computers are circa 2009 (the very first core-i7's). They should have "SSE4.2."
  • "Buzzard" computers are, well... newer than 2006 like the first "cores." They should have "SSSE3."
  • Even older computer? The code can compile for any 64-bit processor, but you're on your own (or contact me).

GG_97 is a database for embalmer. You can also use plain multi-fastas without making a database. Also included is a taxonomy file. You should extract the database before using.

To see some of embalmer's features, you can use the DB and taxonomy file like so on some bacterial 16S reads:
embalm -r GG_97_S320_id97.edb -q MY_FASTA_QUERIES.fa -o alignments_tax_clustered.b6 -f -n -m CAPITALIST -b taxonomy.txt