Skip to content

Post-PR-#62 follow-ups: env-setup gating + equivalence-test rewrite (closes #63, #64, #65)#66

Merged
lucapinello merged 2 commits intomainfrom
fix/post-pr62-followups
Apr 29, 2026
Merged

Post-PR-#62 follow-ups: env-setup gating + equivalence-test rewrite (closes #63, #64, #65)#66
lucapinello merged 2 commits intomainfrom
fix/post-pr62-followups

Conversation

@lucapinello
Copy link
Copy Markdown
Contributor

Closes #63, #64, #65 — the three follow-ups from PR #62's Linux/CUDA spot-check audit. Tightly coupled, so handled together.

Summary

Two commits, three issues:

bc923b4 — fix(env-setup): timeout-soft policy + plug load_pretrained_model gate (#64)

Two fixes that should have been one:

  1. chorus/core/base.py:_setup_environment now distinguishes transient validation timeouts (slow NFS / cold-cache import jax probes) from genuine missing-dep failures. Timeout-only failures keep use_environment=True; genuine failures keep the v26 P1 fix: AlphaGenome JAX Metal guard + macOS fresh-install audit v4 #11 invariant (downgrade + raise on next user call).

  2. Each oracle's load_pretrained_model now calls self._check_env_ready() as its first action (7 oracles, 1-line edit each). predict() already did this; load_pretrained_model didn't, so a silent downgrade still surfaced as ModuleNotFoundError from _load_direct instead of the intended EnvironmentNotReadyError.

  3. tests/test_environment_setup_gating.py — 4 fast-suite tests pinning both branches without building real conda envs. Run in <2 s.

The audit-script workaround (o.use_environment = True; o._env_setup_error = None) is no longer needed.

fe9b08d — test(alphagenome): rewrite equivalence test via chorus API (#63, #65)

The pre-rewrite test compared JAX and PyTorch at the raw model head and asserted shape parity. JAX exposes 305 DNase tracks and PyTorch exposes 384 — different upstream filtering, not different weights. Shape assertion was bound to fail on any platform.

Rewrite:

  • Uses oracle.predict() which goes through chorus's local_index slicing — post-slicing arrays are shape-compatible across backends regardless of raw head shape.
  • Drops the ~70-line JAX_RUNNER + PT_RUNNER subprocess heredocs.
  • Drops bare subprocess.run([\"mamba\", ...]) calls (closes tests/test_alphagenome_backends_equivalence.py: subprocess.run(["mamba", ...]) fails inside "mamba run -n chorus pytest" #65).
  • Uses EnvironmentManager.environment_exists() for skip detection (chorus's own resolver handles MAMBA_EXE / shutil.which / fallback chain).
  • Hardcodes 3 canonical DNase identifiers (K562, HepG2, hepatocyte) verified against alphagenome_metadata.search_tracks() at write time.
  • Tolerances unchanged: max(|pt - jax|) < 0.1, mean rel < 5%.

Test plan

  • Fast suite locally: 368 passed, 1 skipped (was 364) — +4 new gating tests
  • Linux CI: should pass on the same fast-suite criterion
  • macOS integration: pytest tests/test_alphagenome_backends_equivalence.py -m integration -v — both backends must produce equivalent outputs at SORT1 (audit numbers were 0.74–1.85% per-track rel diff)
  • Linux/CUDA integration on user's other box: same command, no o.use_environment = True; o._env_setup_error = None workaround needed
  • Manual smoke for chorus.core.base: env-validation timeout silently flips use_environment=True → False, then crashes on import in wrong env #64: on a slow-NFS box, chorus.create_oracle('alphagenome', use_environment=True) should log a single timeout warning and oracle.use_environment should still be True afterward; subsequent oracle.load_pretrained_model() should NOT raise ModuleNotFoundError

🤖 Generated with Claude Code

lp698 and others added 2 commits April 29, 2026 19:20
…closes #64)

Two related fixes for issue #64:

1. chorus/core/base.py:_setup_environment now distinguishes transient
   validation timeouts (slow NFS / cold-cache `import jax` probes) from
   genuine missing-dep failures. validate_environment returns issues
   prefixed with "Timeout while checking dependency" for the timeout
   case and "Missing dependency" / "Error checking" for real failures.

   Previously, BOTH paths set use_environment=False and recorded an
   _env_setup_error, leading to oracle.load_pretrained_model() falling
   through to _load_direct in the wrong env on cold-NFS lab boxes.

   New policy:
   - Timeout-only failures: log a warning, KEEP use_environment=True
     (don't set _env_setup_error). The actual subprocess invocation has
     its own per-call timeout and will surface a real error if the env
     is genuinely broken.
   - Genuine missing-dep failures: keep the v26 P1 #11 invariant —
     downgrade + raise EnvironmentNotReadyError on next user call.

   The conjunctive 'all timeouts' check means a mixed timeout+missing-dep
   issue list is treated as a genuine failure (the missing-dep is the
   real signal).

2. Each oracle's load_pretrained_model now calls self._check_env_ready()
   as its first action. predict() already does this (base.py:215);
   load_pretrained_model didn't, so a silent downgrade still surfaced
   as ModuleNotFoundError from _load_direct instead of the intended
   EnvironmentNotReadyError. 7 oracles, 1-line edit each:

   - alphagenome.py
   - alphagenome_pt.py
   - borzoi.py
   - chrombpnet.py
   - sei.py
   - legnet.py
   - enformer.py

   _check_env_ready is a no-op when _user_asked_for_env=False
   (base.py:180), so this is safe for tests that pass
   use_environment=False.

3. tests/test_environment_setup_gating.py — 4 fast-suite tests pinning:
   - Timeout-only: use_environment stays True, _env_setup_error stays None
   - Missing-dep: downgrade + EnvironmentNotReadyError on load_pretrained_model
   - Mixed timeouts + missing-dep: treated as genuine failure
   - use_environment=False: validation path never runs, _check_env_ready
     never raises

   Tests monkeypatch EnvironmentManager.validate_environment so they
   don't need a real conda env or GPU. Run in <2 s.

Audit-script workaround at audits/2026-04-29_alphagenome_pt_stress_test/
(o.use_environment = True; o._env_setup_error = None) is no longer
needed after this fix. Linux/CUDA spot-check should confirm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#65)

The pre-rewrite test compared JAX and PyTorch outputs at the *raw model
head* and asserted shape parity. JAX's DNase head exposes 305 tracks
and the PyTorch port's exposes 384 — different upstream filtering
choices, not different weights. The shape assertion was bound to fail
on any platform; it just didn't surface until the Linux/CUDA spot-check
on `audit/linux-cuda-pr62`.

Rewrite uses oracle.predict() which goes through chorus's local_index
slicing in chorus/oracles/alphagenome.py:_predict — that selects the
user-requested tracks by identifier from the shared 5,731-track
metadata cache (alphagenome_tracks.json). Post-slicing arrays are
shape-compatible across backends regardless of raw head shape.

Concrete changes:

- Drop subprocess-driven JAX_RUNNER + PT_RUNNER heredocs (~70 lines)
  and the bare subprocess.run(["mamba", ...]) calls in _env_exists()
  and _run() (closes #65). The test now uses chorus's own oracle API
  which spawns into per-oracle envs via EnvironmentRunner.run_code_in_environment
  — the canonical path that resolves mamba/conda binaries via
  EnvironmentManager's MAMBA_EXE / shutil.which / fallback chain.
- Use EnvironmentManager.environment_exists(oracle) for skip detection
  instead of rolling our own `mamba env list --json` shellout.
- Hardcode three canonical DNase identifiers (K562 EFO:0002067,
  HepG2 EFO:0001187, hepatocyte CL:0000182), verified against
  alphagenome_metadata.get_metadata().search_tracks() at write time.
  Same set as v30 macOS Tier 1 audit so equivalence numbers stay
  comparable across audits.
- Tolerances unchanged: max abs diff < 0.1, mean rel diff < 5%.
  Both M3 Ultra and A100 audits reported 0.74–1.85% per-track rel
  error, well inside this bound.

Test stays @pytest.mark.integration; skips cleanly when either env
or hg38.fa is missing. Run on Linux/CUDA via:

    mamba run -n chorus pytest tests/test_alphagenome_backends_equivalence.py -m integration -v

— no longer fails with FileNotFoundError on bare `mamba` (the
inner subprocess goes through chorus's resolver, not the test's
own subprocess.run). #64 fix in the previous commit removes the
need for the audit-script workaround
(o.use_environment = True; o._env_setup_error = None).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lucapinello lucapinello merged commit b3bf94a into main Apr 29, 2026
1 check passed
@lucapinello lucapinello deleted the fix/post-pr62-followups branch April 29, 2026 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant