This is a minor update:
- Added the ability to store the full sequence ID in the binary format (previously this was only supported with FASTA inputs).
- Improved the allocation strategy for repeat filtering so that large sets of filtered k-mers no longer cause long startup times.
- Added a repeat aggression flag (--repeat-weight) that controls how sharply k-mer suppression ramps up from no suppression to maximum suppression.
- Added an option (--supress-noise) to also suppress rare k-mers, defined as k-mers not listed in the k-mer filter file (-f).
- Various bug fixes.
- Up to 10x speedup (5x on average).
- The second-stage filter now uses a bottom sketch instead of random sampling, improving memory usage and speed.
- Distance (1 - identity) is now estimated from the Jaccard score using the Mash distance.
- Switched completely to the fastutil collections API to improve speed and memory usage.
- Switched to a Maven build system that packages everything into a single jar, removing the dependency on the lib directory.
- Code cleanup.
- Bug fixes.
- Improved weighting (discretized tf-idf) in the first-stage filter.
- Code cleanup, leading to a 15% speedup.
- Support for bzip2/gzip input files
- Bug fixes
- Eliminated filtering of repetitive k-mers during index lookup: instead of filtering them out, repetitive k-mers are now down-weighted.
- Improved performance of the ordered k-mer second-stage filter.
- Implemented weighted (discretized tf-idf) MinHashing in the first-stage filter.
- Random subsampling in second-stage filter.
- k-mer size is now unlimited.
- Reduced the memory and disk footprint of the binary sketch representation, allowing larger sets of sequences to fit in memory.
- If no repeat k-mer filter is specified, MHAP uses an experimental count-min sketch implementation to identify repeat k-mers and down-weight them. This option has not been fully tested and may not always work, so users should always specify a filter file with the -f option.
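The Mash distance mentioned in the release notes converts a k-mer Jaccard estimate j into an estimate of sequence divergence, D = -(1/k) ln(2j / (1 + j)), with identity then approximated as 1 - D. The Python below is a minimal sketch of that formula only; the function names and the convention of capping D at 1 when j is 0 (or very small) are assumptions for illustration, not MHAP's code.

```python
import math


def mash_distance(jaccard: float, k: int) -> float:
    """Estimate divergence (1 - identity) from a k-mer Jaccard estimate.

    Implements D = -(1/k) * ln(2j / (1 + j)). Capping D at 1.0 for j == 0
    (and for very small j) is an assumed convention in this sketch.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    if jaccard == 0.0:
        return 1.0  # no shared k-mers: the log would diverge, report max distance
    d = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, d))  # max(0.0, ...) turns -0.0 into 0.0 at j == 1


def identity(jaccard: float, k: int) -> float:
    """Approximate sequence identity as 1 - Mash distance."""
    return 1.0 - mash_distance(jaccard, k)
```

For example, with k = 16 a Jaccard score of 0.5 gives a distance of ln(1.5)/16, about 0.025, i.e. roughly 97.5% identity. Because the distance is divided by k, the same Jaccard score maps to a smaller divergence for longer k-mers.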
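A bottom sketch (bottom-s sketch) keeps only the s smallest hash values of a sequence's k-mers, so the per-sequence memory cost is fixed at s values regardless of sequence length. The Python below is a minimal sketch of the general technique with the standard bottom-s Jaccard estimator; the blake2b hash, function names, and parameters are illustrative assumptions. The release notes describe MHAP's second-stage filter as an ordered k-mer filter, and this sketch does not model k-mer order.

```python
import hashlib
import heapq


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (blake2b is an arbitrary choice for this sketch)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq: str, k: int, s: int) -> list[int]:
    """Keep the s smallest distinct k-mer hash values of seq, in ascending order."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return heapq.nsmallest(s, hashes)


def sketch_jaccard(a: list[int], b: list[int], s: int) -> float:
    """Bottom-s Jaccard estimate: among the s smallest hashes of the union of
    both sketches, the fraction that occur in both sketches."""
    union_bottom = heapq.nsmallest(s, set(a) | set(b))
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)
```

The estimate only needs the two stored sketches: the s smallest hashes of the union are always contained in the union of the two bottom-s sketches, so no original k-mers have to be kept.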
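The release notes do not spell out MHAP's exact weighting formula, so the following is only a hypothetical sketch of the general idea behind weighted (discretized tf-idf) MinHashing, where repetitive k-mers are down-weighted rather than filtered out. Each k-mer receives an integer weight (rounded tf-idf, clamped to [1, max_weight]), and a k-mer of weight w is hashed as w distinct copies, so the fraction of matching signature positions estimates a weighted Jaccard similarity. The weight formula, `max_weight`, and all function names are assumptions.

```python
import hashlib
import math
from collections import Counter


def seeded_hash(token: str, seed: int) -> int:
    """64-bit hash of token under hash function number `seed`."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def discretized_weights(seq: str, k: int, doc_freq: dict[str, int],
                        n_docs: int, max_weight: int = 3) -> dict[str, int]:
    """Hypothetical discretized tf-idf: integer weights in [1, max_weight].

    K-mers seen in many sequences get a low idf and hence a low weight,
    instead of being removed from the sketch entirely.
    """
    tf = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    weights = {}
    for kmer, count in tf.items():
        idf = math.log(n_docs / (1 + doc_freq.get(kmer, 0)))
        weights[kmer] = max(1, min(max_weight, round(count * idf)))
    return weights


def weighted_minhash(weights: dict[str, int], num_hashes: int) -> list[int]:
    """Integer-weighted MinHash: a k-mer of weight w enters as w distinct copies.

    Requires a non-empty weights dict.
    """
    return [min(seeded_hash(f"{kmer}#{copy}", seed)
                for kmer, w in weights.items()
                for copy in range(w))
            for seed in range(num_hashes)]


def estimate_similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature positions (a weighted Jaccard estimate)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Expanding a weight-w k-mer into w copies makes the match probability per hash function equal to the sum of the minimum weights divided by the sum of the maximum weights, which is why the weights are discretized to small integers here.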
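The count-min sketch fallback described in the last item can be illustrated with a minimal Python version of the data structure. It hashes each k-mer into one counter per row and reports the minimum across rows, so a k-mer's count is never underestimated: genuine repeats are always flagged, while hash collisions may flag some extra k-mers. The width, depth, threshold, and the `repeat_kmers` helper are hypothetical choices for illustration, not MHAP's implementation or defaults.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but collisions can inflate them."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independently salted 64-bit hash per row (blake2b salts are at most 16 bytes).
        digest = hashlib.blake2b(item.encode(), digest_size=8,
                                 salt=row.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # The minimum over rows is the counter least inflated by collisions.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


def repeat_kmers(seqs: list[str], k: int, threshold: int,
                 width: int = 1 << 16, depth: int = 4) -> set[str]:
    """Hypothetical helper: flag k-mers whose estimated total count reaches threshold."""
    cms = CountMinSketch(width, depth)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            cms.add(seq[i:i + k])
    return {seq[i:i + k]
            for seq in seqs
            for i in range(len(seq) - k + 1)
            if cms.estimate(seq[i:i + k]) >= threshold}
```

Memory is fixed at width × depth counters no matter how many distinct k-mers the input has, which is what makes this attractive when no filter file is available; the price is the one-sided error noted above.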
Please see documentation at http://mhap.readthedocs.org/en/