Folddisco Release 2 at a glance: new sparse index format, E‑value statistics for hits, structure-similarity metrics with better sorting
and filtering, new `analyze` command, column‑based output formatting, and faster hashing/retrieval. Folddisco has its own Marv now and was published:
Breaking changes
- New unified index format. The old `SimpleHashMap`/alloc indexing modes were replaced by a single `FolddiscoIndex` (built on the new `SparseIndex`). Not compatible with older indices (cb099b5, 157497d, ac1e91e, 9dd06f1).
- New hashes are limited to 30 bits, changing the on‑disk hash layout (21300eb).
- Default sort strategy is now IDF+RMSD (1a33f06, 4579cff).
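
To illustrate what a 30‑bit hash budget means in practice, here is a minimal sketch that packs four binned features into the low 30 bits of a `u32`. The field names and widths (5 + 5 + 8 + 12 bits) are hypothetical and chosen only so that they sum to 30; they are not Folddisco's actual on‑disk layout.

```rust
/// Bit budget for a packed hash, taken from the release note above.
const HASH_BITS: u32 = 30;
const HASH_MASK: u32 = (1u32 << HASH_BITS) - 1;

// Hypothetical field widths (assumption, not the real layout); they sum to 30.
const AA_BITS: u32 = 5; // residue type
const DIST_BITS: u32 = 8; // binned distance
const ANGLE_BITS: u32 = 12; // binned angle features

/// True if `value` is representable in `bits` bits.
fn fits(value: u32, bits: u32) -> bool {
    value < (1u32 << bits)
}

/// Packs the fields into the low 30 bits of a u32.
/// Returns None if any field exceeds its width.
fn pack_hash(aa1: u32, aa2: u32, dist_bin: u32, angle_bin: u32) -> Option<u32> {
    if !(fits(aa1, AA_BITS)
        && fits(aa2, AA_BITS)
        && fits(dist_bin, DIST_BITS)
        && fits(angle_bin, ANGLE_BITS))
    {
        return None;
    }
    let packed = (aa1 << (AA_BITS + DIST_BITS + ANGLE_BITS))
        | (aa2 << (DIST_BITS + ANGLE_BITS))
        | (dist_bin << ANGLE_BITS)
        | angle_bin;
    debug_assert!(packed <= HASH_MASK);
    Some(packed)
}

/// Recovers (aa1, aa2, dist_bin, angle_bin) from a packed hash.
fn unpack_hash(hash: u32) -> (u32, u32, u32, u32) {
    let angle_bin = hash & ((1u32 << ANGLE_BITS) - 1);
    let dist_bin = (hash >> ANGLE_BITS) & ((1u32 << DIST_BITS) - 1);
    let aa2 = (hash >> (ANGLE_BITS + DIST_BITS)) & ((1u32 << AA_BITS) - 1);
    let aa1 = (hash >> (ANGLE_BITS + DIST_BITS + AA_BITS)) & ((1u32 << AA_BITS) - 1);
    (aa1, aa2, dist_bin, angle_bin)
}
```

Because the two top bits of the `u32` are never set, a hash written under a wider scheme cannot be read back correctly. That is one way to see why indices from older releases have to be rebuilt.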
New features and enhancements
- E‑value estimation for matches based on the IDF score, with per‑match IDF and multiple fitting functions (#32, d395c8a, c452eca, fdfeadb, 5ab3881, 3043123, de68e15) by @jwyoon05.
- Structure‑similarity metrics exposed to the CLI, enabling result filtering and metric‑aware sorting (16696a4, 408ffa9, a99e45a, 013f5c8).
- Column‑based TSV/CSV output formatting and updated match‑result header (88dc390, 1e115a9, f92169f).
- `mmap` type now handles index files larger than available memory (765aeeb).
- Version handling via environment variable for build environments without git (716f25f).
- Two new binning schemes, including shift binning for higher sensitivity (217bb7d, d3700a8, c708dea).
- Improved foldcomp path resolution (95abc73, 8d7a776).
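
The E‑value estimate and the default sort both build on an IDF score. As a reminder of what such a score measures, the sketch below uses the textbook definition, ln(N / df), and sums it over the hashes a candidate shares with the query. This is an assumption for illustration: the exact weighting Folddisco applies and the functions it fits to obtain E‑values are not described in these notes, so neither is reproduced here. The function and parameter names are made up for the example.

```rust
use std::collections::HashMap;

/// Textbook inverse document frequency: ln(N / df).
/// A hash seen in no structure contributes nothing.
fn idf(total_structures: usize, structures_with_hash: usize) -> f64 {
    if structures_with_hash == 0 {
        return 0.0;
    }
    (total_structures as f64 / structures_with_hash as f64).ln()
}

/// Sums the IDF of every query hash that was found in one candidate structure.
/// `doc_freq` maps a hash to the number of indexed structures containing it.
fn match_idf(matched_hashes: &[u32], doc_freq: &HashMap<u32, usize>, total: usize) -> f64 {
    matched_hashes
        .iter()
        .map(|h| idf(total, doc_freq.get(h).copied().unwrap_or(0)))
        .sum()
}
```

Under this definition a hash present in every structure scores zero, so matches on rare geometric patterns dominate the ranking.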
Performance
- Faster hashing with `FxHasher` (23133c0).
- Accelerated residue mapping (5f3796b).
- Residue retrieval checks distances first, then fetches features, and drops an unnecessary distance‑map copy (c6f57e5, 60a70e0).
- Query residues are no longer sorted unnecessarily (d589728).
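
The "distances first, then features" ordering can be sketched as follows. This is a simplified illustration, not the code from the commits above: `expensive_features` is a hypothetical stand‑in for the real feature computation, and the coordinates and cutoff are placeholders.

```rust
/// Euclidean distance between two 3D points.
fn distance(a: [f32; 3], b: [f32; 3]) -> f32 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    let dz = a[2] - b[2];
    (dx * dx + dy * dy + dz * dz).sqrt()
}

/// Hypothetical stand-in for a costly per-pair feature computation.
fn expensive_features(i: usize, j: usize) -> (usize, usize) {
    (i, j)
}

/// Returns features only for residue pairs no farther apart than `cutoff`.
/// The cheap distance test runs first, so the costly step is skipped
/// for every pair that would be discarded anyway.
fn pairs_within(coords: &[[f32; 3]], cutoff: f32) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for i in 0..coords.len() {
        for j in (i + 1)..coords.len() {
            if distance(coords[i], coords[j]) > cutoff {
                continue; // rejected before any feature is fetched
            }
            out.push(expensive_features(i, j));
        }
    }
    out
}
```

Distances are computed on the fly from borrowed coordinates, so no separate copy of a distance map is needed in this sketch.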
Bug fixes
- Fixed CIF parsing for structures with numeric values (e.g. ligand code `919`) and relative paths; no longer truncates multi‑char chain IDs (#41, 1bab90c) by @adobles96.
- Fixed duplicated entry issue in the index (82cd989).
- Fixed a benchmark bug (#29, 3392bf2).
Documentation & developer notes
- Add `analyze` command for index distribution analysis: reports total/empty/density and amino‑acid pair counts, with a `--top-n` option (10d42cb, c28c502, 58aa0f7, 7d6c911, 10eb601).
- Add docker containers and conda/docker install instructions to README (9c13b9d, c6ee2ef).
- Updated preprint version, download links, and pre‑built indices; added Cargo and PDB dataset links (f928970, 335a295, 9375a2d).
- Removed obsolete doctests and the foldcomp feature test; test cleanup (2ebad32, 0f1d715, 5fedfb4).
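
For readers wondering what the total/empty/density figures from `analyze` might summarize, here is a rough sketch over a list of bucket sizes. The definitions are assumptions: density is taken to be the fraction of non‑empty buckets, and the top‑n list ranks buckets by entry count. The struct and function names are illustrative, and the real command may compute or report these differently.

```rust
/// Summary of how hash buckets are populated (names are illustrative).
#[derive(Debug, PartialEq)]
struct IndexSummary {
    total: usize,
    empty: usize,
    /// Assumed definition: fraction of buckets holding at least one entry.
    density: f64,
    /// (bucket id, entry count), largest first.
    top: Vec<(usize, usize)>,
}

/// Summarizes bucket occupancy and keeps the `top_n` fullest buckets.
fn summarize(bucket_sizes: &[usize], top_n: usize) -> IndexSummary {
    let total = bucket_sizes.len();
    let empty = bucket_sizes.iter().filter(|&&n| n == 0).count();
    let density = if total == 0 {
        0.0
    } else {
        (total - empty) as f64 / total as f64
    };
    let mut ranked: Vec<(usize, usize)> = bucket_sizes.iter().copied().enumerate().collect();
    // Sort by count descending; break ties by bucket id for stable output.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.truncate(top_n);
    IndexSummary { total, empty, density, top: ranked }
}
```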
