Post-PR-#62 follow-ups: env-setup gating + equivalence-test rewrite (closes #63, #64, #65) #66
Merged
lucapinello merged 2 commits into main · Apr 29, 2026
Conversation
…closes #64)

Two related fixes for issue #64:

1. chorus/core/base.py:_setup_environment now distinguishes transient validation timeouts (slow NFS / cold-cache `import jax` probes) from genuine missing-dep failures. validate_environment returns issues prefixed with "Timeout while checking dependency" for the timeout case and "Missing dependency" / "Error checking" for real failures. Previously, both paths set use_environment=False and recorded an _env_setup_error, leading to oracle.load_pretrained_model() falling through to _load_direct in the wrong env on cold-NFS lab boxes.

   New policy:
   - Timeout-only failures: log a warning, KEEP use_environment=True (don't set _env_setup_error). The actual subprocess invocation has its own per-call timeout and will surface a real error if the env is genuinely broken.
   - Genuine missing-dep failures: keep the v26 P1 #11 invariant — downgrade + raise EnvironmentNotReadyError on the next user call.
   - The conjunctive "all timeouts" check means a mixed timeout + missing-dep issue list is treated as a genuine failure (the missing dep is the real signal).

2. Each oracle's load_pretrained_model now calls self._check_env_ready() as its first action. predict() already does this (base.py:215); load_pretrained_model didn't, so a silent downgrade still surfaced as ModuleNotFoundError from _load_direct instead of the intended EnvironmentNotReadyError. 7 oracles, 1-line edit each: alphagenome.py, alphagenome_pt.py, borzoi.py, chrombpnet.py, sei.py, legnet.py, enformer.py. _check_env_ready is a no-op when _user_asked_for_env=False (base.py:180), so this is safe for tests that pass use_environment=False.

3. tests/test_environment_setup_gating.py — 4 fast-suite tests pinning:
   - Timeout-only: use_environment stays True, _env_setup_error stays None
   - Missing-dep: downgrade + EnvironmentNotReadyError on load_pretrained_model
   - Mixed timeouts + missing-dep: treated as a genuine failure
   - use_environment=False: validation path never runs, _check_env_ready never raises

   Tests monkeypatch EnvironmentManager.validate_environment so they don't need a real conda env or GPU. Run in <2 s.

The audit-script workaround at audits/2026-04-29_alphagenome_pt_stress_test/ (o.use_environment = True; o._env_setup_error = None) is no longer needed after this fix. Linux/CUDA spot-check should confirm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
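The two fixes above can be sketched roughly as follows. This is a hedged, self-contained sketch: `classify_validation_issues`, `OracleSketch`, and their bodies are illustrative stand-ins, not chorus's actual implementation (only the issue-prefix strings and the `EnvironmentNotReadyError` / `_check_env_ready` names come from the commit text).

```python
# Hedged sketch of the timeout-soft policy and the load_pretrained_model
# gate; names and class body are illustrative, not chorus's real code.
TIMEOUT_PREFIX = "Timeout while checking dependency"


def classify_validation_issues(issues):
    """Return 'ok', 'timeout_only', or 'genuine_failure'."""
    if not issues:
        return "ok"
    # Conjunctive check: a mixed timeout + missing-dep list counts as a
    # genuine failure, because the missing dep is the real signal.
    if all(i.startswith(TIMEOUT_PREFIX) for i in issues):
        return "timeout_only"
    return "genuine_failure"


class EnvironmentNotReadyError(RuntimeError):
    pass


class OracleSketch:
    def __init__(self, user_asked_for_env=True):
        self._user_asked_for_env = user_asked_for_env
        self.use_environment = user_asked_for_env
        self._env_setup_error = None

    def _apply_validation(self, issues):
        verdict = classify_validation_issues(issues)
        if verdict == "genuine_failure":
            # Downgrade now; raise on the next user call.
            self.use_environment = False
            self._env_setup_error = "; ".join(issues)
        # timeout_only: keep use_environment=True and no _env_setup_error;
        # the per-call subprocess timeout catches genuinely broken envs.

    def _check_env_ready(self):
        # No-op when the user never asked for an env (safe for tests
        # that pass use_environment=False).
        if not self._user_asked_for_env:
            return
        if self._env_setup_error is not None:
            raise EnvironmentNotReadyError(self._env_setup_error)

    def load_pretrained_model(self):
        self._check_env_ready()  # first action, mirroring predict()
```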
#65)

The pre-rewrite test compared JAX and PyTorch outputs at the *raw model head* and asserted shape parity. JAX's DNase head exposes 305 tracks and the PyTorch port's exposes 384 — different upstream filtering choices, not different weights. The shape assertion was bound to fail on any platform; it just didn't surface until the Linux/CUDA spot-check on `audit/linux-cuda-pr62`.

The rewrite uses oracle.predict(), which goes through chorus's local_index slicing in chorus/oracles/alphagenome.py:_predict — that selects the user-requested tracks by identifier from the shared 5,731-track metadata cache (alphagenome_tracks.json). Post-slicing arrays are shape-compatible across backends regardless of raw head shape.

Concrete changes:
- Drop the subprocess-driven JAX_RUNNER + PT_RUNNER heredocs (~70 lines) and the bare subprocess.run(["mamba", ...]) calls in _env_exists() and _run() (closes #65). The test now uses chorus's own oracle API, which spawns into per-oracle envs via EnvironmentRunner.run_code_in_environment — the canonical path that resolves mamba/conda binaries via EnvironmentManager's MAMBA_EXE / shutil.which / fallback chain.
- Use EnvironmentManager.environment_exists(oracle) for skip detection instead of rolling our own `mamba env list --json` shellout.
- Hardcode three canonical DNase identifiers (K562 EFO:0002067, HepG2 EFO:0001187, hepatocyte CL:0000182), verified against alphagenome_metadata.get_metadata().search_tracks() at write time. Same set as the v30 macOS Tier 1 audit, so equivalence numbers stay comparable across audits.
- Tolerances unchanged: max abs diff < 0.1, mean rel diff < 5%. Both the M3 Ultra and A100 audits reported 0.74–1.85% per-track rel error, well inside this bound.

The test stays @pytest.mark.integration and skips cleanly when either the env or hg38.fa is missing.

Run on Linux/CUDA via: mamba run -n chorus pytest tests/test_alphagenome_backends_equivalence.py -m integration -v — it no longer fails with FileNotFoundError on bare `mamba` (the inner subprocess goes through chorus's resolver, not the test's own subprocess.run). The #64 fix in the previous commit removes the need for the audit-script workaround (o.use_environment = True; o._env_setup_error = None).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
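The tolerance checks above can be expressed as a small helper. This is a hedged sketch operating on flat float lists for illustration; the real test presumably compares the arrays returned by oracle.predict(), and the helper name is hypothetical.

```python
# Hedged sketch of the equivalence tolerances from the commit message
# (max abs diff < 0.1, mean rel diff < 5%); `check_backend_equivalence`
# is a hypothetical helper, not part of the actual test file.
def check_backend_equivalence(pt_vals, jax_vals,
                              max_abs_tol=0.1, mean_rel_tol=0.05):
    # Post-slicing outputs should already have matching shapes.
    assert len(pt_vals) == len(jax_vals), "shape mismatch after slicing"
    abs_diffs = [abs(p - j) for p, j in zip(pt_vals, jax_vals)]
    max_abs = max(abs_diffs)
    # Mean relative diff, guarded against near-zero reference values.
    mean_rel = sum(d / max(abs(j), 1e-8)
                   for d, j in zip(abs_diffs, jax_vals)) / len(jax_vals)
    assert max_abs < max_abs_tol, f"max abs diff {max_abs:.4f}"
    assert mean_rel < mean_rel_tol, f"mean rel diff {mean_rel:.2%}"
    return max_abs, mean_rel
```

The audited 0.74–1.85% per-track relative errors sit comfortably inside the 5% mean-relative bound, so these tolerances leave headroom without masking a real divergence.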
Closes #63, #64, #65 — the three follow-ups from PR #62's Linux/CUDA spot-check audit. Tightly coupled, so handled together.
Summary
Two commits, three issues:
bc923b4 — fix(env-setup): timeout-soft policy + plug load_pretrained_model gate (#64)

Two fixes that should have been one:

- chorus/core/base.py:_setup_environment now distinguishes transient validation timeouts (slow NFS / cold-cache `import jax` probes) from genuine missing-dep failures. Timeout-only failures keep use_environment=True; genuine failures keep the v26 P1 fix's invariant (#11, AlphaGenome JAX Metal guard + macOS fresh-install audit v4): downgrade + raise on next user call.
- Each oracle's load_pretrained_model now calls self._check_env_ready() as its first action (7 oracles, 1-line edit each). predict() already did this; load_pretrained_model didn't, so a silent downgrade still surfaced as ModuleNotFoundError from _load_direct instead of the intended EnvironmentNotReadyError.
- tests/test_environment_setup_gating.py — 4 fast-suite tests pinning both branches without building real conda envs. Run in <2 s.
- The audit-script workaround (o.use_environment = True; o._env_setup_error = None) is no longer needed.

fe9b08d — test(alphagenome): rewrite equivalence test via chorus API (#63, #65)

The pre-rewrite test compared JAX and PyTorch at the raw model head and asserted shape parity. JAX exposes 305 DNase tracks and PyTorch exposes 384 — different upstream filtering, not different weights. The shape assertion was bound to fail on any platform.
Rewrite:

- Use oracle.predict(), which goes through chorus's local_index slicing — post-slicing arrays are shape-compatible across backends regardless of raw head shape.
- Drop the JAX_RUNNER + PT_RUNNER subprocess heredocs and the bare subprocess.run(["mamba", ...]) calls (closes #65: tests/test_alphagenome_backends_equivalence.py's subprocess.run(["mamba", ...]) fails inside "mamba run -n chorus pytest").
- Use EnvironmentManager.environment_exists() for skip detection (chorus's own resolver handles the MAMBA_EXE / shutil.which / fallback chain).
- Hardcode three canonical DNase identifiers, verified against alphagenome_metadata.search_tracks() at write time.
- Tolerances unchanged: max(|pt - jax|) < 0.1, mean rel < 5%.

Test plan

- pytest tests/test_alphagenome_backends_equivalence.py -m integration -v — both backends must produce equivalent outputs at SORT1 (audit numbers were 0.74–1.85% per-track rel diff)
- No o.use_environment = True; o._env_setup_error = None workaround needed
- chorus.create_oracle('alphagenome', use_environment=True) should log a single timeout warning and oracle.use_environment should still be True afterward; a subsequent oracle.load_pretrained_model() should NOT raise ModuleNotFoundError

🤖 Generated with Claude Code
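The fast-suite test pattern mentioned above (monkeypatching validation so no conda env or GPU is needed) can be sketched as follows. `FakeManager` and `FakeOracle` are minimal stand-ins, not chorus's real classes; only the "Timeout while checking dependency" prefix and the keep-vs-downgrade behavior follow the PR text.

```python
# Hedged, self-contained sketch of one fast-suite test: patch the
# validation call so the timeout-only branch can be pinned without a
# real environment. Stand-in classes, not chorus internals.
from unittest import mock


class FakeManager:
    def validate_environment(self, oracle_name):
        raise NotImplementedError  # patched per test below


class FakeOracle:
    def __init__(self, manager):
        self.manager = manager
        self.use_environment = True
        self._env_setup_error = None

    def setup(self):
        issues = self.manager.validate_environment("alphagenome")
        timeouts_only = bool(issues) and all(
            i.startswith("Timeout while checking dependency") for i in issues
        )
        if issues and not timeouts_only:
            # Genuine failure: downgrade and record the error.
            self.use_environment = False
            self._env_setup_error = "; ".join(issues)


def test_timeout_only_keeps_env():
    with mock.patch.object(
        FakeManager, "validate_environment",
        return_value=["Timeout while checking dependency jax"],
    ):
        o = FakeOracle(FakeManager())
        o.setup()
    assert o.use_environment is True
    assert o._env_setup_error is None
```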