fix(threads): rayon multithread verification + cgroup-aware thread cap (#263) #265

Merged: d-laub merged 13 commits into main from worktree-rayon-multithread-verification on Jul 1, 2026

@d-laub d-laub commented Jul 1, 2026

Collaborator

Summary

Makes gvl's rayon-parallel paths runnable and verified under real multithreaded load, fixes the cap_threads() oversubscription bugs behind #263, and releases the GIL around the rayon FFI calls so parallel workers no longer serialize on the GIL, oversubscribe, or park.

Closes #263.

What changed

Thread-count resolver (python/genvarloader/_threads.py)

  • GVL_FORCE_PARALLEL env knob bypasses the size gate so the multithreaded paths run on small inputs (tests, repro harnesses).
  • CFS-cpu-quota-aware detection: reads cgroup v2 cpu.max (falling back to v1 cpu.cfs_quota_us/cpu.cfs_period_us). A quota is invisible to sched_getaffinity, so a 15.3-core container previously reported the full host core count and oversubscribed. Detection now takes min(affinity, quota).
  • cap_threads() now overwrites RAYON_NUM_THREADS with GVL's resolved cap (was setdefault).
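
The quota-aware detection described above can be sketched roughly as follows. This is a minimal illustration, not the code in _threads.py: the function names, the unnested /sys/fs/cgroup paths, and the choice to round a fractional quota down are all assumptions made for the example.

```python
import math
import os
from pathlib import Path
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max contents: "<quota> <period>" or "max <period>"."""
    parts = text.split()
    if len(parts) != 2 or parts[0] == "max":
        return None  # no quota configured
    quota, period = int(parts[0]), int(parts[1])
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def parse_cfs_v1(quota_text: str, period_text: str) -> Optional[float]:
    """Parse cgroup v1 cpu.cfs_quota_us / cpu.cfs_period_us (-1 means unlimited)."""
    quota, period = int(quota_text), int(period_text)
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def read_quota_cores(root: Path = Path("/sys/fs/cgroup")) -> Optional[float]:
    """Try cgroup v2 first, then v1. Assumes the process is not in a nested cgroup."""
    try:
        return parse_cpu_max((root / "cpu.max").read_text())
    except (OSError, ValueError):
        pass
    try:
        return parse_cfs_v1(
            (root / "cpu" / "cpu.cfs_quota_us").read_text(),
            (root / "cpu" / "cpu.cfs_period_us").read_text(),
        )
    except (OSError, ValueError):
        return None


def affinity_cores() -> int:
    """Cores visible through the affinity mask; a CFS quota does not show up here."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def effective_cores(affinity: int, quota: Optional[float]) -> int:
    """min(affinity, quota), never below 1. Rounding down is an assumption."""
    if quota is None:
        return max(1, affinity)
    return max(1, min(affinity, math.floor(quota)))
```

Under these assumptions, a container with a 15.3-core quota on a 64-core host resolves to 15 threads via effective_cores(affinity_cores(), read_quota_cores()), rather than 64.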

FFI (src/ffi/mod.rs)

  • Releases the GIL around all 11 rayon parallel entry points via py.detach(...) (the current name for allow_threads in PyO3 0.28). Each PyReadonly*/PyReadwrite* guard is resolved to an ndarray view before the closure, the closure captures only Ungil views and plain-old-data values, and every into_pyarray call runs after the closure returns. Byte-identical serial == parallel == golden parity is preserved.

Tests

  • tests/integration/test_rayon_forced_parallel.py — forced-parallel dataset[:, :] is byte-identical to the serial path end-to-end (compares both ragged .data and .offsets).
  • tests/integration/test_rayon_stress.py (slow) — repeated spawn-worker waves, oversubscribed rayon, with a per-launch timeout as the deadlock detector.
  • tests/parity/*: added a trailing raise after pytest.skip in the except branches so they are typed NoReturn, clearing pre-existing pyrefly unbound-name errors and unblocking the project-wide pre-commit pyrefly hook.
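
The "per-launch timeout as the deadlock detector" idea from the stress test can be sketched as below. This is an illustrative harness only, not the contents of test_rayon_stress.py: launch_wave and its parameters are hypothetical names, and it uses plain subprocesses rather than multiprocessing spawn workers to keep the example self-contained.

```python
import subprocess
import sys
import time
from typing import List


def launch_wave(worker_code: str, n_workers: int, timeout_s: float) -> List[str]:
    """Start n_workers fresh interpreters and classify each as ok, failed, or hung.

    A worker that has not exited by the shared deadline is treated as deadlocked
    and killed, so a hang becomes a test failure instead of a stalled test run.
    """
    procs = [
        subprocess.Popen([sys.executable, "-c", worker_code])
        for _ in range(n_workers)
    ]
    deadline = time.monotonic() + timeout_s
    outcomes = []
    for proc in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            code = proc.wait(timeout=remaining)
            outcomes.append("ok" if code == 0 else "failed")
        except subprocess.TimeoutExpired:
            proc.kill()   # reclaim the stuck worker
            proc.wait()   # reap it so no zombie is left behind
            outcomes.append("hung")
    return outcomes
```

A stress test built this way would repeat launch_wave several times and assert that no outcome is "hung"; the timeout has to be generous enough that a slow but healthy worker is not misreported as a deadlock.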

⚠️ Behavior change — RAYON_NUM_THREADS is now overwritten

cap_threads() previously respected an externally-set RAYON_NUM_THREADS; it now clobbers it with GVL's cgroup-derived cap. This is the point of the fix — an inherited value (e.g. from a base image) must not defeat the cap and cause oversubscription (#263).

Escape hatch: users who want an explicit worker count should set GVL_NUM_THREADS (which becomes the resolved cap and is written to RAYON_NUM_THREADS). Setting a bare RAYON_NUM_THREADS alone no longer wins.
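
The difference between the old setdefault behavior and the new overwrite, including the GVL_NUM_THREADS escape hatch, can be sketched like this. The function names and the explicit env/detected parameters are assumptions for illustration; the real cap_threads() operates on the process environment and GVL's own detection.

```python
from typing import MutableMapping


def cap_threads_old(env: MutableMapping[str, str], detected: int) -> int:
    """Previous behavior: an inherited RAYON_NUM_THREADS wins over the detected cap."""
    env.setdefault("RAYON_NUM_THREADS", str(detected))
    return int(env["RAYON_NUM_THREADS"])


def cap_threads_new(env: MutableMapping[str, str], detected: int) -> int:
    """New behavior: GVL_NUM_THREADS (if set) or the detected cap always wins."""
    override = env.get("GVL_NUM_THREADS")
    resolved = max(1, int(override) if override else detected)
    env["RAYON_NUM_THREADS"] = str(resolved)  # unconditional overwrite
    return resolved


# An inherited value (e.g. from a base image) in a container capped at 15 cores.
inherited = {"RAYON_NUM_THREADS": "64"}
old_result = cap_threads_old(dict(inherited), detected=15)  # inherited 64 survives
new_result = cap_threads_new(dict(inherited), detected=15)  # clobbered to 15

# Escape hatch: an explicit GVL_NUM_THREADS becomes the resolved cap.
explicit = {"RAYON_NUM_THREADS": "64", "GVL_NUM_THREADS": "8"}
explicit_result = cap_threads_new(explicit, detected=15)
```

In this sketch GVL_NUM_THREADS is not clamped to the detected core count, which matches the stress test's use of GVL_NUM_THREADS=8 to get 8 rayon threads per worker regardless of host cores.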

Root cause / Task 8

The spawn-worker stress reproducer completed all launches cleanly (no deadlock) once the oversubscription fixes + GIL release were in place — so the contingent root-cause task was not needed. The #263 hang was driven by per-worker oversubscription, addressed by the cgroup-aware cap + unconditional RAYON_NUM_THREADS overwrite + GIL release.

Verification

  • cargo-test: 110 + 4 passed; cargo build --release: clean.
  • test_rayon_equivalence.py (byte-identical parity): 5 passed.
  • Full tree (pytest tests -q, slow excluded): 945 passed, 54 skipped, 4 xfailed.
  • Slow tier (stress reproducer): 1 passed.
  • pyrefly: 0 errors; ruff check/ruff format --check: clean. Pre-commit hooks pass without --no-verify.

No public API / __all__ / SKILL.md change (env-only knobs), so no skill update needed.

Note: this branch also carries the initiative's spec and plan docs (docs/superpowers/{specs,plans}/2026-06-30-rayon-multithread-verification*), which were committed to local main ahead of origin.

🤖 Generated with Claude Code

d-laub and others added 13 commits June 30, 2026 21:42
Design for issue #263: force-parallel override, cap_threads CFS-quota
fix + RAYON_NUM_THREADS overwrite, defensive allow_threads around rayon
FFI, and a spawn-worker stress reproducer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ation plan

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…263)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cap_threads() overwrites RAYON_NUM_THREADS with GVL's resolved count, so a
bare RAYON_NUM_THREADS=8 in the worker was clobbered by the host's detected
cpu count. Set GVL_NUM_THREADS=8 so each worker deterministically runs 8
rayon threads (N_WORKERS*8 oversubscription) regardless of host cores.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@d-laub d-laub merged commit 54a9c89 into main Jul 1, 2026
8 checks passed
@d-laub d-laub deleted the worktree-rayon-multithread-verification branch July 1, 2026 06:30

Development

Successfully merging this pull request may close these issues.

Nondeterministic rayon deadlock when iterating Dataset from concurrent (spawn) worker processes (v0.36.0)
