This is a minor update:

  • Added the ability to store the full sequence ID in the binary format (previously this was supported only for FASTA inputs).
  • Improved the allocation strategy for repeat filtering so that large sets of filtered k-mers no longer cause long startup times.

konstantinberlin released this Oct 4, 2016 · 12 commits to master since this release

Fixes for incorrectly processed tf-idf flags.

konstantinberlin released this Jul 13, 2016 · 22 commits to master since this release

Changelog:

  • Added a repeat aggression flag (--repeat-weight) that controls how sharply suppression of repeat k-mers ramps up from no suppression to maximum suppression.
  • Added an option to also suppress rare k-mers (--supress-noise), defined as k-mers not listed in the k-mer filter file (-f).
  • Various bug fixes.

skoren released this Mar 2, 2016 · 40 commits to master since this release

Changelog:

  • Up to 10x speedup (5x on average).
  • Second-stage filter is now a bottom-sketch rather than random sampling, improving memory usage and speed.
  • Distance (1 - identity) is now estimated from the Jaccard score using the Mash distance.
  • Complete switch to the fastutil collections API for speed and memory improvements.
  • Switched to the Maven build system, consolidating everything into a single jar and removing the lib directory dependency.
  • Code cleanup.
  • Bug fixes.
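To illustrate the bottom-sketch and Mash-distance items above, here is a minimal Python sketch. It is not MHAP's implementation (MHAP is a Java tool with its own hashing); the BLAKE2b k-mer hashing and the helper names `kmer_hashes`, `bottom_sketch`, `jaccard_estimate` and `mash_distance` are hypothetical. The distance formula is the one published for Mash, D = -(1/k) ln(2J / (1 + J)).

```python
import hashlib
import math


def kmer_hashes(seq: str, k: int) -> set[int]:
    """Hash every k-mer of seq to a 64-bit integer (hypothetical helper)."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "little"))
    return hashes


def bottom_sketch(hashes: set[int], s: int) -> list[int]:
    """Bottom-s sketch: keep only the s smallest hash values."""
    return sorted(hashes)[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches.

    The s smallest values of the merged sketches are the s smallest hashes of
    the union A | B; the fraction of them present in both sketches estimates
    |A & B| / |A | B|.
    """
    set_a, set_b = set(a), set(b)
    union_sample = sorted(set_a | set_b)[:s]
    if not union_sample:
        return 0.0
    shared = sum(1 for h in union_sample if h in set_a and h in set_b)
    return shared / len(union_sample)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2J / (1 + J)), an estimate of 1 - identity."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximal distance
    return math.log((1.0 + j) / (2.0 * j)) / k


# Two hypothetical reads that differ only in their last base.
reads = ["ACGTACGTTGCAGGCT", "ACGTACGTTGCAGGCA"]
sketches = [bottom_sketch(kmer_hashes(r, 5), 100) for r in reads]
j = jaccard_estimate(sketches[0], sketches[1], 100)
print(f"J ~ {j:.3f}, D ~ {mash_distance(j, 5):.3f}")
```

Here the two reads share 11 of their 13 distinct 5-mers, so with a sketch larger than the union the estimate is exactly J = 11/13. With smaller sketches the estimate becomes a random sample of the union, which is what keeps memory bounded.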

skoren released this May 25, 2015 · 1 commit to 1.6 since this release

Changelog:

  • Improved weighting (discretized tf-idf) in the first-stage filter.
  • Code cleanup, leading to a 15% speedup.
  • Support for bzip2/gzip input files
  • Bug fixes
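A minimal sketch of reading gzip- or bzip2-compressed inputs transparently, assuming FASTA/FASTQ text files. It detects compression from the leading magic bytes (1f 8b for gzip, "BZh" for bzip2) using Python's standard `gzip` and `bz2` modules; `open_sequence_file` is a hypothetical name, and this is not how MHAP itself (a Java tool) implements the feature.

```python
import bz2
import gzip


def open_sequence_file(path: str):
    """Open a FASTA/FASTQ file as text, decompressing gzip or bzip2 if needed.

    Compression is detected from the leading magic bytes, not the file name.
    """
    with open(path, "rb") as fh:
        magic = fh.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip signature
        return gzip.open(path, "rt")
    if magic == b"BZh":  # bzip2 signature
        return bz2.open(path, "rt")
    return open(path, "r")  # otherwise assume plain text


# Usage (hypothetical file name):
# with open_sequence_file("reads.fasta.gz") as fh:
#     for line in fh:
#         ...
```

Checking magic bytes rather than the extension keeps a misnamed file (for example, an uncompressed file ending in `.gz`) from failing to open.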

skoren released this Feb 23, 2015 · 9 commits to v1.5b1 since this release

Major updates:

  • Eliminated repetitive k-mer filtering in index lookup: why filter k-mers when you can down-weight them?
  • Improved performance of the ordered k-mer second-stage filter.

Changelog:

  • Implemented weighted (discretized tf-idf) MinHashing in the first-stage filter.
  • Random subsampling in second-stage filter.
  • k-mer size is now unlimited.
  • Reduced memory footprint and disk footprint of binary sketch representation, allowing a larger set of sequences to fit in memory.
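One standard way to MinHash with discretized (integer) weights is to replicate each k-mer by its weight and run ordinary MinHash on the expanded multiset; the fraction of matching signature positions then estimates the weighted Jaccard similarity. The Python sketch below shows that technique, assuming tf-idf scores have already been mapped to small positive integers. It is an illustration, not MHAP's code; the seeded BLAKE2b hashing, the helper names, and the example weights are hypothetical.

```python
import hashlib


def _hash(kmer: str, copy: int, seed: int) -> int:
    """Seeded 64-bit hash of one copy of a k-mer (hypothetical scheme)."""
    data = f"{seed}:{copy}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def weighted_minhash(weights: dict[str, int], num_hashes: int) -> list[int]:
    """MinHash signature of a multiset where k-mer x appears weights[x] times.

    Each k-mer is expanded into copies (x, 0), ..., (x, w - 1), and the minimum
    hash over the expanded set is taken for each of num_hashes seeds.
    At least one weight must be positive.
    """
    return [
        min(_hash(kmer, c, seed) for kmer, w in weights.items() for c in range(w))
        for seed in range(num_hashes)
    ]


def signature_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching positions; estimates the weighted Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def weighted_jaccard(a: dict[str, int], b: dict[str, int]) -> float:
    """Exact weighted Jaccard: sum of min weights over sum of max weights."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(x, 0), b.get(x, 0)) for x in keys)
    den = sum(max(a.get(x, 0), b.get(x, 0)) for x in keys)
    return num / den


# Hypothetical discretized weights: the repetitive k-mer is down-weighted to 1.
read_a = {"ACGTA": 3, "CGTAC": 3, "AAAAA": 1}
read_b = {"ACGTA": 3, "CGTAC": 2, "AAAAA": 1}
sig_a = weighted_minhash(read_a, 1000)
sig_b = weighted_minhash(read_b, 1000)
print(signature_similarity(sig_a, sig_b), weighted_jaccard(read_a, read_b))
```

Because a low-weight k-mer contributes fewer copies, it is less likely to supply the minimum hash, which is how down-weighting reduces the influence of repeats without discarding them.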

Known Issues:

  • If no repeat k-mer filter is specified, MHAP will use an experimental implementation of a count-min sketch to identify repeat k-mers and down-weight them. This option has not been fully tested and may not always work. Users should always specify a filter file using the -f option.
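As a hedged illustration of the count-min sketch data structure mentioned in the known issue (not MHAP's experimental implementation), the Python sketch below counts k-mers approximately in fixed memory and flags those whose estimated count reaches a threshold. `CountMinSketch`, `repeat_kmers`, and the default width, depth and threshold values are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates never undercount."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, item: str, row: int) -> int:
        # Each row uses an independently seeded hash to pick one column.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._bucket(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a cell, so the row minimum is the tightest bound.
        return min(self.table[row][self._bucket(item, row)] for row in range(self.depth))


def repeat_kmers(seqs: list[str], k: int, threshold: int,
                 width: int = 1 << 16, depth: int = 4) -> set[str]:
    """Return k-mers whose estimated count reaches threshold (may include false positives)."""
    sketch = CountMinSketch(width, depth)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            sketch.add(seq[i:i + k])
    return {
        seq[i:i + k]
        for seq in seqs
        for i in range(len(seq) - k + 1)
        if sketch.estimate(seq[i:i + k]) >= threshold
    }
```

Since a count-min sketch can only overestimate, every true repeat above the threshold is flagged, but hash collisions may also flag some non-repeats. That approximation may help explain why the notes recommend supplying an explicit filter file with -f.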

Please see documentation at http://mhap.readthedocs.org/en/

skoren released this Feb 23, 2015 · 1 commit to v1.0 since this release

Minor update. Changelog:

  • Fix issues #3 and #4
  • Update overlap validation in EstimateROC
  • Minor speed improvements

Please see documentation at http://mhap.readthedocs.org/en/

Pre-release

skoren released this Jul 13, 2014 · 130 commits to master since this release

First release of MHAP. Please see documentation at http://mhap.readthedocs.org/en/