Release v0.4.2: Warp-Shuffle Reductions, __nanosleep, libcu++ Atomics · mivertowski/RustCompute

What's New

This release upgrades the CUDA code generator with practical findings from CUDA hardware research. It targets GPUs of compute capability (CC) 6.0 and later and keeps the existing cudarc 0.18.2 runtime.

Warp-Shuffle Block Reductions

Two-phase warp-shuffle reduction replaces tree reduction in all generated CUDA reduction code
Phase 1: Intra-warp __shfl_down_sync(0xFFFFFFFF, val, offset) — zero __syncthreads() calls
Phase 2: Cross-warp reduction via shared memory — one __syncthreads() call
Reduces barrier count from O(log N) to 1 per block reduction (e.g., 9 → 1 for 512-thread blocks)
Applied to: persistent FDTD energy reduction, standalone block/grid reduce helpers, and all inline reduction generators
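
The two phases above can be sketched as a small Rust-side template, in the spirit of a code generator that holds CUDA source as a string. This is a minimal sketch, not the output of `reduction_intrinsics.rs`: the names `BLOCK_REDUCE_SUM`, `block_reduce_sum`, `warp_sums`, and `tree_barrier_count` are hypothetical, and the CUDA body assumes a 1-D block whose size is a multiple of 32 (so the full `0xFFFFFFFF` mask is valid) and at most 1024 threads.

```rust
/// CUDA source for a two-phase block sum (hypothetical helper, not the crate's output).
/// Assumes a 1-D block whose size is a multiple of 32 and at most 1024 threads.
const BLOCK_REDUCE_SUM: &str = r#"
__device__ float block_reduce_sum(float val) {
    __shared__ float warp_sums[32];              // one slot per warp
    const unsigned lane = threadIdx.x & 31u;
    const unsigned warp = threadIdx.x >> 5;

    // Phase 1: intra-warp shuffle reduction, no block barrier needed.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xFFFFFFFF, val, offset);
    if (lane == 0) warp_sums[warp] = val;

    __syncthreads();                             // the only block-wide barrier

    // Phase 2: warp 0 combines the per-warp partial sums.
    if (warp == 0) {
        val = (lane < (blockDim.x >> 5)) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            val += __shfl_down_sync(0xFFFFFFFF, val, offset);
    }
    return val;                                  // total is valid in thread 0 only
}
"#;

/// Barriers used by a classic shared-memory tree reduction: one per halving step.
fn tree_barrier_count(block_size: u32) -> u32 {
    block_size.next_power_of_two().trailing_zeros()
}

fn main() {
    let shuffle_barriers = BLOCK_REDUCE_SUM.matches("__syncthreads()").count();
    // prints "tree: 9 barriers, warp-shuffle: 1"
    println!(
        "tree: {} barriers, warp-shuffle: {}",
        tree_barrier_count(512),
        shuffle_barriers
    );
}
```

For a 512-thread block the tree variant needs log2(512) = 9 halving steps, each followed by a barrier, which is where the 9 → 1 figure comes from. If a helper like this were called repeatedly in a loop, a further barrier before the next write to `warp_sums` would likely be needed.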

`__nanosleep()` Power Efficiency

Persistent FDTD idle spin-wait now uses __nanosleep() instead of a volatile counter loop
Software grid barrier spin-loop uses __nanosleep(100) to reduce power consumption
Configurable via PersistentFdtdConfig::with_idle_sleep(ns) (default: 1000ns)
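
A spin-wait of this kind might be rendered as follows. This is a sketch under assumptions: `emit_idle_wait`, `wait_for_flag`, and the `IDLE_SLEEP_NS` placeholder are hypothetical names, not the crate's API. According to the CUDA documentation, `__nanosleep()` is available only on compute capability 7.0 and later, so the sketch guards the call with `__CUDA_ARCH__`; the release notes do not say how the generator handles CC 6.x devices.

```rust
/// Render a CUDA spin-wait that sleeps between polls (hypothetical template).
fn emit_idle_wait(idle_sleep_ns: u32) -> String {
    const TEMPLATE: &str = r#"
__device__ void wait_for_flag(volatile unsigned int* flag, unsigned int expected) {
    while (*flag != expected) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
        __nanosleep(IDLE_SLEEP_NS);   // suggested sleep, in nanoseconds
#endif
    }
}
"#;
    // Substitute the configured sleep duration as an unsigned literal.
    TEMPLATE.replace("IDLE_SLEEP_NS", &format!("{}u", idle_sleep_ns))
}

fn main() {
    // 1000 ns matches the default stated for with_idle_sleep(ns).
    print!("{}", emit_idle_wait(1000));
}
```

Passing 100 instead of 1000 would give the shorter sleep the notes mention for the software grid barrier.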

libcu++ Ordered Atomics (opt-in)

Opt-in cuda::atomic_ref support for H2K/K2H queue operations and software barriers
Uses memory_order_acquire/memory_order_release instead of __threadfence_system() pairs
Software barrier uses cuda::thread_scope_device (narrower scope) with memory_order_acq_rel
Compile-time CUDA 11.0+ version guard
Enable via PersistentFdtdConfig::with_libcupp_atomics(true)
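
The difference between the two code paths can be illustrated with a toggle that selects one of two CUDA snippets for publishing a queue tail index. This is only a sketch: `emit_publish` and `publish_tail` are hypothetical names, the queue layout is assumed, and the ordering constants are spelled `cuda::std::memory_order_release` here, which may differ from the spelling the generator emits.

```rust
/// Render a CUDA "publish tail index" helper in one of two flavors (hypothetical).
fn emit_publish(use_libcupp_atomics: bool) -> &'static str {
    if use_libcupp_atomics {
        r#"
#include <cuda/atomic>
__device__ void publish_tail(unsigned int* tail, unsigned int next) {
    cuda::atomic_ref<unsigned int, cuda::thread_scope_system> t(*tail);
    // Release store: earlier payload writes become visible before the new tail.
    t.store(next, cuda::std::memory_order_release);
}
"#
    } else {
        r#"
__device__ void publish_tail(volatile unsigned int* tail, unsigned int next) {
    __threadfence_system();   // order earlier payload writes before the tail update
    *tail = next;
}
"#
    }
}

fn main() {
    // Default path (fence-based) followed by the opt-in libcu++ path.
    print!("{}", emit_publish(false));
    print!("{}", emit_publish(true));
}
```

On the consuming side, the matching operation would be a `load` with acquire ordering on the same location. A device-only software barrier could use `cuda::thread_scope_device` in place of `cuda::thread_scope_system`, as the notes describe.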

Files Changed

crates/ringkernel-cuda-codegen/src/persistent_fdtd.rs — config fields, nanosleep, warp-shuffle reduction, libcu++ atomics
crates/ringkernel-cuda-codegen/src/reduction_intrinsics.rs — warp-shuffle upgrade for all reduction helpers

Test Results

215 codegen unit tests + 12 integration tests — all passing
6 CUDA GPU execution tests — verified on RTX 2000 Ada (CC 8.9)
Full workspace — zero failures

Full Changelog: v0.4.1...v0.4.2
