Release v0.8.0 · oritwoen/kangaroo

v0.8.0 brings multi-GPU support, a full performance sweep across the solver pipeline, and a lot less dead code.

👀 Highlights

🎮 Multi-GPU solving: kangaroo can now dispatch work across multiple GPUs simultaneously (#69). It has been the most requested feature for a while. If you have more than one card, the solver finally uses them all.
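For readers new to multi-device solving, here is a minimal sketch of the general fan-out pattern. It assumes work is split by kangaroo count across devices, with one worker thread per device reporting back over a channel. `split_evenly`, `run_on_devices`, and the worker body are hypothetical and do not come from kangaroo's code. The actual device scheduling in #69 may work differently, and a real worker would run walks on its GPU and send back distinguished points instead of echoing its share.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical: split `total` kangaroos across `devices` workers,
/// giving the first `total % devices` workers one extra each.
fn split_evenly(total: usize, devices: usize) -> Vec<usize> {
    assert!(devices > 0, "need at least one device");
    let base = total / devices;
    let extra = total % devices;
    (0..devices).map(|i| base + usize::from(i < extra)).collect()
}

/// Hypothetical fan-out: one thread per device, results collected on a channel.
/// Each stand-in worker only reports (device id, assigned share).
fn run_on_devices(total: usize, devices: usize) -> Vec<(usize, usize)> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = split_evenly(total, devices)
        .into_iter()
        .enumerate()
        .map(|(id, share)| {
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send((id, share)).expect("collector hung up");
            })
        })
        .collect();
    // Drop the original sender so the receiver ends once every worker is done.
    drop(tx);
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let mut results: Vec<(usize, usize)> = rx.into_iter().collect();
    results.sort();
    results
}
```

With this split, `split_evenly(10, 3)` yields `[4, 3, 3]`. Using a channel keeps the collector (in a real solver, the shared DP table) independent of how many devices are attached.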

⚡ GPU pipeline got significantly faster. Compute pipelines are now cached across repeated solves (#100), so you don't pay the compilation cost every time. The lock scope during pipeline compilation was way too broad and has been narrowed (#101). GPU dispatch and DP readback now run pipelined instead of sequentially (#95). Together these substantially cut GPU overhead for repeated or batched solves.
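The caching and lock-narrowing ideas can be sketched as a pattern: look up under the lock, compile with the lock released, then re-check on insert. This sketch uses only the standard library. `Pipeline`, `compile`, and `PipelineCache` are hypothetical stand-ins, not kangaroo's or wgpu's types, and the actual change in #100/#101 may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a compiled GPU compute pipeline.
#[derive(Debug)]
struct Pipeline {
    key: String,
}

/// Hypothetical stand-in for the expensive shader compilation step.
fn compile(key: &str) -> Pipeline {
    Pipeline { key: key.to_string() }
}

#[derive(Default)]
struct PipelineCache {
    pipelines: Mutex<HashMap<String, Arc<Pipeline>>>,
    /// Counts real compilations, only to make cache hits observable.
    compiles: AtomicUsize,
}

impl PipelineCache {
    fn get_or_compile(&self, key: &str) -> Arc<Pipeline> {
        // Critical section #1: the lock is held only for the lookup.
        if let Some(hit) = self.pipelines.lock().unwrap().get(key) {
            return Arc::clone(hit);
        }
        // Compile with no lock held, so other solves can still read the cache.
        let compiled = Arc::new(compile(key));
        self.compiles.fetch_add(1, Ordering::SeqCst);
        // Critical section #2: if another thread inserted the same key while
        // we were compiling, keep its entry and drop ours.
        let mut map = self.pipelines.lock().unwrap();
        let cached: &Arc<Pipeline> = map.entry(key.to_string()).or_insert(compiled);
        Arc::clone(cached)
    }
}
```

The trade-off is that two threads racing on the same cold key may both compile it, and one result gets discarded. In exchange, a slow compilation never blocks cache hits for other keys.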

CPU hot paths got attention too. The post-jump affine step was recomputing field inversions that don't change between walks; those are now cached (#97). The hot loop for x-coordinate extraction was hitting the allocator on every iteration; that allocation is gone (#91).
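The allocation fix in #91 follows a common Rust pattern: write into a caller-owned stack buffer instead of returning a fresh `Vec` on each call. Here is a minimal sketch, assuming a hypothetical field-element layout of four `u64` limbs stored least significant first. `Limbs`, `x_bytes_alloc`, `x_bytes_into`, and `sum_low_bytes` are illustrative names and not the crate's API. The inversion caching from #97 is not shown.

```rust
/// Hypothetical field element: four 64-bit limbs, least significant first.
type Limbs = [u64; 4];

/// Allocating version: builds a new Vec on every call.
fn x_bytes_alloc(x: &Limbs) -> Vec<u8> {
    let mut out = Vec::with_capacity(32);
    for limb in x.iter().rev() {
        out.extend_from_slice(&limb.to_be_bytes());
    }
    out
}

/// Allocation-free version: writes big-endian bytes into a caller-owned buffer.
fn x_bytes_into(x: &Limbs, out: &mut [u8; 32]) {
    for (i, limb) in x.iter().rev().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&limb.to_be_bytes());
    }
}

/// Hot-loop shape: one stack buffer reused for every point.
fn sum_low_bytes(xs: &[Limbs]) -> u64 {
    let mut buf = [0u8; 32]; // on the stack; no allocator calls inside the loop
    let mut acc = 0u64;
    for x in xs {
        x_bytes_into(x, &mut buf);
        acc = acc.wrapping_add(u64::from(buf[31]));
    }
    acc
}
```

Both functions produce the same bytes. The difference is that `x_bytes_into` does not touch the heap, which matters when it runs once per jump.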

Solver startup for small ranges used to run full initialization even when the search space was tiny; that overhead is now cut (#99). StoredDP distances are now fixed-size arrays instead of heap-allocated (#88), and the walk and DP readback logic is tighter (#87).
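Switching from a heap-allocated distance to an inline array changes a DP record in a few ways. It stops owning a separate allocation, it becomes `Copy`, and it sits contiguously in the table. The sketch below assumes a 32-byte distance and an extra one-byte tag. `StoredDpHeap`, `StoredDp`, `kind`, and `from_slice` are hypothetical and do not reflect the crate's actual `StoredDP` layout.

```rust
use std::convert::TryInto;

/// Before (hypothetical): each stored distance owns a separate heap buffer.
#[allow(dead_code)]
struct StoredDpHeap {
    dist: Vec<u8>,
    kind: u8,
}

/// After (hypothetical): the distance is inlined as a fixed 32-byte array,
/// so the whole record is `Copy` and needs no allocation of its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct StoredDp {
    dist: [u8; 32],
    /// Hypothetical tag, e.g. tame vs. wild kangaroo.
    kind: u8,
}

impl StoredDp {
    /// Returns `None` unless `dist` is exactly 32 bytes long.
    fn from_slice(dist: &[u8], kind: u8) -> Option<Self> {
        let dist: [u8; 32] = dist.try_into().ok()?;
        Some(StoredDp { dist, kind })
    }
}
```

A fixed length also moves the size check to a single point, `from_slice`, rather than leaving every reader to trust a `Vec` of the right length.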

🩹 GPU poll waits could hang indefinitely under certain device conditions; they are now bounded (#85). The DP counter wasn't reset between calibration probes, which produced wrong calibration numbers on repeated runs (#83). Provider bounds rejected exact-fit ranges that should have been valid (#76). Oversized U256 hex input panicked instead of returning an error (#75).
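The #75 fix follows a standard principle: input that is too long should produce an `Err`, not a panic. Here is a minimal sketch of a panic-free 256-bit hex parser, assuming an optional `0x` prefix, at most 64 hex digits, and a little-endian four-limb result. `parse_u256_hex` and `HexError` are hypothetical names, not the crate's `U256` API.

```rust
#[derive(Debug, PartialEq, Eq)]
enum HexError {
    Empty,
    TooLong(usize),
    BadDigit(char),
}

/// Parse up to 64 hex digits (optionally prefixed with "0x") into four
/// u64 limbs, least significant first. Oversized input returns an error.
fn parse_u256_hex(s: &str) -> Result<[u64; 4], HexError> {
    let digits = s.strip_prefix("0x").unwrap_or(s);
    if digits.is_empty() {
        return Err(HexError::Empty);
    }
    // Byte-length check up front: more than 64 digits cannot fit in 256 bits,
    // and it also guarantees `i / 16` below stays within 0..4.
    if digits.len() > 64 {
        return Err(HexError::TooLong(digits.len()));
    }
    let mut limbs = [0u64; 4];
    // Walk digits from least significant (rightmost) to most significant.
    for (i, c) in digits.chars().rev().enumerate() {
        let nibble = c.to_digit(16).ok_or(HexError::BadDigit(c))? as u64;
        limbs[i / 16] |= nibble << ((i % 16) * 4);
    }
    Ok(limbs)
}
```

This version rejects more than 64 digits even when the extra ones are leading zeros. That is a deliberate simplification, and a lenient parser could strip leading zeros first.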

💅 Swept dead code across the whole crate - dropped unused SharedResources, dead constructors, is_provider predicate, LE arithmetic helpers, DP mask helpers, stale dead_code allows, and unused dashmap/thiserror/futures deps (#84-#94). Net result: cleaner dependency tree and less surface area.

✅ Upgrading

cargo install kangaroo

👉 Changelog


🚀 Features

gpu: Allow multiple devices (#69)

⚡ Performance

gpu: Cache compute pipelines across repeated solves (#100)
gpu: Narrow pipeline cache lock scope during compilation (#101)
solver: Pipeline GPU dispatch and DP readback (#95)
solver: Cut startup overhead for small-range solves (#99)
solver: Tighten walk and DP readback (#87)
cpu: Cache post-jump affine to skip redundant field inversions (#97)
cpu: Avoid heap alloc in hot loop x-coordinate extraction (#91)
dp_table: Inline StoredDP dist as fixed-size array (#88)

🩹 Fixes

solver: Bound GPU poll waits to prevent indefinite hangs (#85)
solver: Reset DP counter between calibration probes (#83)
provider: Allow exact-fit search ranges for provider bounds (#76)
crypto: Avoid panic on oversized U256 hex input (#75)
cli: Make benchmark dispatch explicit (#77)
ci: Gate crates publish with release checks (#82)

💅 Refactors

deps: Drop unused dashmap, thiserror, and futures (#84)
solver: Drop unused SharedResources and dead constructors (#89)
crypto: Drop dead utils, remove stale dead_code allows (#90)
math: Drop dead LE arithmetic and DP mask helpers (#92)
provider: Drop unused is_provider predicate (#93)
dp_table: Drop dead len and is_empty methods (#94)
dp_table: Use Neg trait instead of Scalar::ZERO - x (#96)
