Post-PR-#62 follow-ups: env-setup gating + equivalence-test rewrite (closes #63, #64, #65) #66
Merged
lucapinello merged 2 commits into main · Apr 29, 2026
Conversation
…closes #64)

Two related fixes for issue #64:

1. chorus/core/base.py:_setup_environment now distinguishes transient validation timeouts (slow NFS / cold-cache `import jax` probes) from genuine missing-dep failures. validate_environment returns issues prefixed with "Timeout while checking dependency" for the timeout case and "Missing dependency" / "Error checking" for real failures. Previously, both paths set use_environment=False and recorded an _env_setup_error, leading to oracle.load_pretrained_model() falling through to _load_direct in the wrong env on cold-NFS lab boxes.

   New policy:
   - Timeout-only failures: log a warning, KEEP use_environment=True (don't set _env_setup_error). The actual subprocess invocation has its own per-call timeout and will surface a real error if the env is genuinely broken.
   - Genuine missing-dep failures: keep the v26 P1 #11 invariant — downgrade + raise EnvironmentNotReadyError on the next user call.
   - The conjunctive "all timeouts" check means a mixed timeout + missing-dep issue list is treated as a genuine failure (the missing dep is the real signal).

2. Each oracle's load_pretrained_model now calls self._check_env_ready() as its first action. predict() already does this (base.py:215); load_pretrained_model didn't, so a silent downgrade still surfaced as ModuleNotFoundError from _load_direct instead of the intended EnvironmentNotReadyError. 7 oracles, 1-line edit each: alphagenome.py, alphagenome_pt.py, borzoi.py, chrombpnet.py, sei.py, legnet.py, enformer.py. _check_env_ready is a no-op when _user_asked_for_env=False (base.py:180), so this is safe for tests that pass use_environment=False.

3. tests/test_environment_setup_gating.py — 4 fast-suite tests pinning:
   - Timeout-only: use_environment stays True, _env_setup_error stays None
   - Missing-dep: downgrade + EnvironmentNotReadyError on load_pretrained_model
   - Mixed timeouts + missing-dep: treated as a genuine failure
   - use_environment=False: validation path never runs, _check_env_ready never raises

   Tests monkeypatch EnvironmentManager.validate_environment so they don't need a real conda env or GPU. Run in <2 s.

The audit-script workaround at audits/2026-04-29_alphagenome_pt_stress_test/ (o.use_environment = True; o._env_setup_error = None) is no longer needed after this fix. Linux/CUDA spot-check should confirm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
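The two fixes above can be sketched roughly as follows. This is a hedged, self-contained sketch: `classify_validation_issues`, `OracleSketch`, and their bodies are illustrative stand-ins, not chorus's actual implementation (only the issue-prefix strings and the `EnvironmentNotReadyError` / `_check_env_ready` names come from the commit text).

```python
# Hedged sketch of the timeout-soft policy and the load_pretrained_model
# gate; names and class body are illustrative, not chorus's real code.
TIMEOUT_PREFIX = "Timeout while checking dependency"


def classify_validation_issues(issues):
    """Return 'ok', 'timeout_only', or 'genuine_failure'."""
    if not issues:
        return "ok"
    # Conjunctive check: a mixed timeout + missing-dep list counts as a
    # genuine failure, because the missing dep is the real signal.
    if all(i.startswith(TIMEOUT_PREFIX) for i in issues):
        return "timeout_only"
    return "genuine_failure"


class EnvironmentNotReadyError(RuntimeError):
    pass


class OracleSketch:
    def __init__(self, user_asked_for_env=True):
        self._user_asked_for_env = user_asked_for_env
        self.use_environment = user_asked_for_env
        self._env_setup_error = None

    def _apply_validation(self, issues):
        verdict = classify_validation_issues(issues)
        if verdict == "genuine_failure":
            # Downgrade now; raise on the next user call.
            self.use_environment = False
            self._env_setup_error = "; ".join(issues)
        # timeout_only: keep use_environment=True and no _env_setup_error;
        # the per-call subprocess timeout catches genuinely broken envs.

    def _check_env_ready(self):
        # No-op when the user never asked for an env (safe for tests
        # that pass use_environment=False).
        if not self._user_asked_for_env:
            return
        if self._env_setup_error is not None:
            raise EnvironmentNotReadyError(self._env_setup_error)

    def load_pretrained_model(self):
        self._check_env_ready()  # first action, mirroring predict()
```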
#65)

The pre-rewrite test compared JAX and PyTorch outputs at the *raw model head* and asserted shape parity. JAX's DNase head exposes 305 tracks and the PyTorch port's exposes 384 — different upstream filtering choices, not different weights. The shape assertion was bound to fail on any platform; it just didn't surface until the Linux/CUDA spot-check on `audit/linux-cuda-pr62`.

The rewrite uses oracle.predict(), which goes through chorus's local_index slicing in chorus/oracles/alphagenome.py:_predict — that selects the user-requested tracks by identifier from the shared 5,731-track metadata cache (alphagenome_tracks.json). Post-slicing arrays are shape-compatible across backends regardless of raw head shape.

Concrete changes:
- Drop the subprocess-driven JAX_RUNNER + PT_RUNNER heredocs (~70 lines) and the bare subprocess.run(["mamba", ...]) calls in _env_exists() and _run() (closes #65). The test now uses chorus's own oracle API, which spawns into per-oracle envs via EnvironmentRunner.run_code_in_environment — the canonical path that resolves mamba/conda binaries via EnvironmentManager's MAMBA_EXE / shutil.which / fallback chain.
- Use EnvironmentManager.environment_exists(oracle) for skip detection instead of rolling our own `mamba env list --json` shellout.
- Hardcode three canonical DNase identifiers (K562 EFO:0002067, HepG2 EFO:0001187, hepatocyte CL:0000182), verified against alphagenome_metadata.get_metadata().search_tracks() at write time. Same set as the v30 macOS Tier 1 audit, so equivalence numbers stay comparable across audits.
- Tolerances unchanged: max abs diff < 0.1, mean rel diff < 5%. Both the M3 Ultra and A100 audits reported 0.74–1.85% per-track rel error, well inside this bound.

The test stays @pytest.mark.integration and skips cleanly when either the env or hg38.fa is missing.

Run on Linux/CUDA via: mamba run -n chorus pytest tests/test_alphagenome_backends_equivalence.py -m integration -v — it no longer fails with FileNotFoundError on bare `mamba` (the inner subprocess goes through chorus's resolver, not the test's own subprocess.run). The #64 fix in the previous commit removes the need for the audit-script workaround (o.use_environment = True; o._env_setup_error = None).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
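The tolerance checks above can be expressed as a small helper. This is a hedged sketch operating on flat float lists for illustration; the real test presumably compares the arrays returned by oracle.predict(), and the helper name is hypothetical.

```python
# Hedged sketch of the equivalence tolerances from the commit message
# (max abs diff < 0.1, mean rel diff < 5%); `check_backend_equivalence`
# is a hypothetical helper, not part of the actual test file.
def check_backend_equivalence(pt_vals, jax_vals,
                              max_abs_tol=0.1, mean_rel_tol=0.05):
    # Post-slicing outputs should already have matching shapes.
    assert len(pt_vals) == len(jax_vals), "shape mismatch after slicing"
    abs_diffs = [abs(p - j) for p, j in zip(pt_vals, jax_vals)]
    max_abs = max(abs_diffs)
    # Mean relative diff, guarded against near-zero reference values.
    mean_rel = sum(d / max(abs(j), 1e-8)
                   for d, j in zip(abs_diffs, jax_vals)) / len(jax_vals)
    assert max_abs < max_abs_tol, f"max abs diff {max_abs:.4f}"
    assert mean_rel < mean_rel_tol, f"mean rel diff {mean_rel:.2%}"
    return max_abs, mean_rel
```

The audited 0.74–1.85% per-track relative errors sit comfortably inside the 5% mean-relative bound, so these tolerances leave headroom without masking a real divergence.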
Closes #63, #64, #65 — the three follow-ups from PR #62's Linux/CUDA spot-check audit. Tightly coupled, so handled together.
Summary
Two commits, three issues:
bc923b4 — fix(env-setup): timeout-soft policy + plug load_pretrained_model gate (#64)

Two fixes that should have been one:

- chorus/core/base.py:_setup_environment now distinguishes transient validation timeouts (slow NFS / cold-cache `import jax` probes) from genuine missing-dep failures. Timeout-only failures keep use_environment=True; genuine failures keep the v26 P1 fix's invariant (#11, AlphaGenome JAX Metal guard + macOS fresh-install audit v4): downgrade + raise on next user call.
- Each oracle's load_pretrained_model now calls self._check_env_ready() as its first action (7 oracles, 1-line edit each). predict() already did this; load_pretrained_model didn't, so a silent downgrade still surfaced as ModuleNotFoundError from _load_direct instead of the intended EnvironmentNotReadyError.
- tests/test_environment_setup_gating.py — 4 fast-suite tests pinning both branches without building real conda envs. Run in <2 s.
- The audit-script workaround (o.use_environment = True; o._env_setup_error = None) is no longer needed.

fe9b08d — test(alphagenome): rewrite equivalence test via chorus API (#63, #65)

The pre-rewrite test compared JAX and PyTorch at the raw model head and asserted shape parity. JAX exposes 305 DNase tracks and PyTorch exposes 384 — different upstream filtering, not different weights. The shape assertion was bound to fail on any platform.
Rewrite:

- Use oracle.predict(), which goes through chorus's local_index slicing — post-slicing arrays are shape-compatible across backends regardless of raw head shape.
- Drop the JAX_RUNNER + PT_RUNNER subprocess heredocs and the bare subprocess.run(["mamba", ...]) calls (closes #65: tests/test_alphagenome_backends_equivalence.py's subprocess.run(["mamba", ...]) fails inside "mamba run -n chorus pytest").
- Use EnvironmentManager.environment_exists() for skip detection (chorus's own resolver handles the MAMBA_EXE / shutil.which / fallback chain).
- Hardcode three canonical DNase identifiers, verified against alphagenome_metadata.search_tracks() at write time.
- Tolerances unchanged: max(|pt - jax|) < 0.1, mean rel < 5%.

Test plan

- pytest tests/test_alphagenome_backends_equivalence.py -m integration -v — both backends must produce equivalent outputs at SORT1 (audit numbers were 0.74–1.85% per-track rel diff)
- No o.use_environment = True; o._env_setup_error = None workaround needed
- chorus.create_oracle('alphagenome', use_environment=True) should log a single timeout warning and oracle.use_environment should still be True afterward; a subsequent oracle.load_pretrained_model() should NOT raise ModuleNotFoundError

🤖 Generated with Claude Code
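The fast-suite test pattern mentioned above (monkeypatching validation so no conda env or GPU is needed) can be sketched as follows. `FakeManager` and `FakeOracle` are minimal stand-ins, not chorus's real classes; only the "Timeout while checking dependency" prefix and the keep-vs-downgrade behavior follow the PR text.

```python
# Hedged, self-contained sketch of one fast-suite test: patch the
# validation call so the timeout-only branch can be pinned without a
# real environment. Stand-in classes, not chorus internals.
from unittest import mock


class FakeManager:
    def validate_environment(self, oracle_name):
        raise NotImplementedError  # patched per test below


class FakeOracle:
    def __init__(self, manager):
        self.manager = manager
        self.use_environment = True
        self._env_setup_error = None

    def setup(self):
        issues = self.manager.validate_environment("alphagenome")
        timeouts_only = bool(issues) and all(
            i.startswith("Timeout while checking dependency") for i in issues
        )
        if issues and not timeouts_only:
            # Genuine failure: downgrade and record the error.
            self.use_environment = False
            self._env_setup_error = "; ".join(issues)


def test_timeout_only_keeps_env():
    with mock.patch.object(
        FakeManager, "validate_environment",
        return_value=["Timeout while checking dependency jax"],
    ):
        o = FakeOracle(FakeManager())
        o.setup()
    assert o.use_environment is True
    assert o._env_setup_error is None
```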