Add AlphaGenome oracle with cross-platform support #5

Merged
lucapinello merged 10 commits into main from alphagenome-oracle on Mar 18, 2026
lucapinello commented on Mar 13, 2026

Summary

  • AlphaGenome oracle: Full implementation of Google DeepMind's AlphaGenome model (JAX-based, 5,930 tracks, single-bp resolution from a 1 Mb input) with HuggingFace-hosted gated weights
  • Cross-platform environment support: Platform-aware conda environment setup that auto-adapts dependencies for Linux x86_64 (CUDA), macOS Intel, and macOS ARM64 (Apple Silicon)
  • Comprehensive test suite: 80 tests covering unit, oracle initialization, prediction methods, and smoke tests for all 6 oracles
  • Bug fixes: Sei oracle fixes, Borzoi metadata, environment runner improvements, coolbox/gtfsort macOS compatibility, AlphaGenome JAX Metal workaround

Key changes

  • New chorus/oracles/alphagenome.py + metadata/templates
  • New chorus/core/platform.py — platform detection and per-oracle dependency adaptation
  • Fixed environment.yml — coolbox moved to pip (ARM64 compat), gtfsort made optional with Python fallback
  • AlphaGenome auto-selects CPU on macOS (JAX Metal lacks default_memory_space support)
  • Comprehensive notebook (examples/comprehensive_oracle_showcase.ipynb) demonstrating all 6 oracles
  • Fresh install validation guide (FRESH_INSTALL_TEST.md)
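
The macOS CPU fallback listed above can be sketched as follows. This is a minimal illustration, assuming the switch is made through the `JAX_PLATFORMS` environment variable before JAX is imported (as the later commit message describes). The helper name `force_cpu_on_macos` and its injectable `env`/`system` parameters are hypothetical, and keeping an existing user override via `setdefault` is a design choice of this sketch, not something the PR states.

```python
import os
import platform


def force_cpu_on_macos(env=None, system=None):
    """Default JAX to the CPU backend on macOS.

    Hypothetical helper: `env` and `system` are injectable only to make the
    logic testable without touching the real process environment.
    """
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    if system == "Darwin":
        # setdefault leaves an explicit user choice of backend untouched.
        env.setdefault("JAX_PLATFORMS", "cpu")
    return env.get("JAX_PLATFORMS")


# Call this before the first `import jax` so the setting is in place
# when JAX initialises its backends.
force_cpu_on_macos()
```

On Linux the call is a no-op, so the same entry point can be used unconditionally on every platform.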

Test plan

  • 80/80 pytest tests pass on macOS ARM64
  • 80/80 pytest tests pass on Linux x86_64
  • Fresh install from scratch validated on both platforms
  • All 6 oracles healthy on both platforms
  • Comprehensive notebook: 25 cells, 0 errors, 8 visualizations on both platforms
  • chorus setup, chorus health, chorus list all work cross-platform

🤖 Generated with Claude Code

lp698 and others added 10 commits March 12, 2026 20:21
Detect system architecture at runtime and adapt oracle environment
YAML configs before creating conda environments. This allows the
canonical Linux x86_64 YAML files to work on Apple Silicon by
substituting incompatible packages (e.g. TensorFlow 2.8 -> 2.15.1,
removing CUDA packages, pre-building igraph/leidenalg via conda).

- New chorus/core/platform.py: PlatformInfo detection, declarative
  adaptation rules per oracle+platform, YAML config transformer
- Modified manager.py: applies adaptations in create_environment(),
  runs post-install pip steps (e.g. modisco-lite --no-deps)
- Adaptations defined for chrombpnet, enformer, borzoi, sei, legnet
  on macos_arm64; all other oracle+platform combos pass through unchanged

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
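
The detect-then-adapt flow described in this commit could look roughly like the sketch below. It is not the code in `chorus/core/platform.py`: the `RULES` table, the `adapt_config` function, and the example package pins are assumptions made for illustration, and the config is modelled as an already-parsed dict so the example needs no YAML library. Only `PlatformInfo` and the `macos_arm64` key are names taken from the commit messages.

```python
import copy
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformInfo:
    system: str   # e.g. "Darwin" or "Linux"
    machine: str  # e.g. "arm64" or "x86_64"

    @property
    def key(self):
        # Simplified key scheme; the real project may distinguish more cases.
        if self.system == "Darwin":
            return "macos_arm64" if self.machine == "arm64" else "macos_x86_64"
        return f"{self.system.lower()}_{self.machine}"


def detect_platform():
    return PlatformInfo(platform.system(), platform.machine())


# Hypothetical declarative rules keyed by (oracle, platform key).
RULES = {
    ("enformer", "macos_arm64"): {
        "remove": {"cudatoolkit"},
        "replace": {"tensorflow": "tensorflow=2.15.1"},
    },
}


def adapt_config(config, oracle, info):
    """Return an adapted copy of a parsed environment config."""
    rule = RULES.get((oracle, info.key))
    if rule is None:
        return config  # no rule: pass through unchanged
    adapted = copy.deepcopy(config)
    kept = []
    for dep in adapted.get("dependencies", []):
        if isinstance(dep, str):
            # Only conda-style "name=version" pins are handled in this sketch.
            name = dep.split("=", 1)[0].strip()
            if name in rule["remove"]:
                continue
            dep = rule["replace"].get(name, dep)
        kept.append(dep)  # nested sections such as {"pip": [...]} are kept as-is
    adapted["dependencies"] = kept
    return adapted
```

Keeping the rules as data rather than branching code means a new oracle or platform only needs a new table entry, which matches the "declarative adaptation rules" wording above.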
- Fix _pkg_name to handle conda single '=' version pins (e.g. cudatoolkit=11.7)
- Remove 'pytorch' conda channel for sei/borzoi/legnet on macOS ARM
  (pytorch packages available on conda-forge; pytorch channel blocked on some networks)
- Relax sei PyTorch <2.0 upper bound on ARM (PyTorch 2.x is compatible)
- Add setuptools<81 for enformer (tensorflow_hub needs pkg_resources)

Tested all 5 oracles on Apple Silicon:
  ✓ borzoi: Healthy + prediction OK
  ✓ chrombpnet: Healthy + prediction OK
  ✓ enformer: Healthy + prediction OK
  ✓ legnet: Healthy + prediction OK
  ✓ sei: Healthy + prediction OK

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
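
To illustrate the `_pkg_name` fix for single-`=` pins, here is a hedged sketch of one way to extract a package name from a dependency spec. The function `pkg_name` below is a stand-in written for this example, not the project's `_pkg_name`; stripping a `channel::` prefix and lowercasing the result are assumptions of the sketch.

```python
import re

# The name is the leading run of characters that cannot be part of a
# version operator (=, ==, >=, <, !=, ~=) or an extras bracket.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)")


def pkg_name(spec):
    """Return the bare package name from a conda or pip style spec."""
    # Drop an optional conda channel prefix such as "conda-forge::".
    spec = spec.split("::")[-1]
    match = _NAME_RE.match(spec)
    if match is None:
        raise ValueError(f"Cannot parse package spec: {spec!r}")
    return match.group(1).lower()
```

Matching the name positively, rather than splitting on a fixed operator such as `==`, is what lets `cudatoolkit=11.7` and `numpy>=1.20` go through the same code path.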
End-to-end tests that instantiate each oracle in its conda environment,
load a pretrained model, and run predict() on a genomic region from chr1.
Covers chrombpnet, enformer, borzoi, sei, and legnet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add AlphaGenome oracle (JAX, 1Mb context, 5,731 tracks at 1bp resolution)
  with metadata, conda environment, platform adaptations, and templates
- Fix OraclePrediction.end property (was returning .start)
- Fix OraclePredictionTrack.score() (was returning None)
- Fix PATH corruption in environment runner
- Add SPLICE_SITES and PRO_CAP track types to result.py
- Add comprehensive oracle showcase notebook (all 6 oracles, all operations)
- Expand test suite to 80 tests covering score(), subset(), platform, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix wrong notebook filename in README (gata1_comprehensive_analysis -> comprehensive_oracle_showcase)
- Remove outdated "Coming soon" comment for Borzoi (fully implemented)
- Fix "Enchanced" typo in README
- Fix Sei error message saying "Enformer" instead of "Sei"
- Fix Sei _cl2ind() using self._classes_list when already loaded
- Fix Sei download: re-download corrupt/truncated archives instead of failing
- Fix "Dowloading" typos in Sei log messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
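
The "re-download instead of failing" behaviour for Sei can be sketched as below. This assumes the weights ship as a tar archive, which the commit message does not state, and both `archive_is_intact` and `ensure_archive` are hypothetical names. The check is a basic readability test, not a checksum, so it will not catch every form of corruption.

```python
import tarfile
import zlib
from pathlib import Path


def archive_is_intact(path):
    """Return True if `path` opens as a tar archive and its members can be walked."""
    path = Path(path)
    if not path.is_file():
        return False
    try:
        with tarfile.open(path) as tar:
            for _ in tar:  # walking the members forces the headers to be read
                pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False


def ensure_archive(path, download, attempts=2):
    """Validate the archive, re-downloading with `download(path)` when needed."""
    path = Path(path)
    for _ in range(attempts):
        if archive_is_intact(path):
            return path
        path.unlink(missing_ok=True)  # drop the corrupt or partial file
        download(path)
    if archive_is_intact(path):
        return path
    raise RuntimeError(f"Archive at {path} is still unreadable after {attempts} downloads.")
```

Passing the downloader in as a callable keeps the retry logic independent of how the file is actually fetched.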
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes 9 issues discovered during a complete fresh-install validation on
Linux with NVIDIA A100 GPUs (CUDA 13.1):

Environment & dependency fixes:
- Add nvidia channel to chorus-borzoi.yml for CUDA package resolution
- Add borzoi linux_x86_64_cuda platform adaptation to remove conflicting
  cudatoolkit/cuda-nvcc (PyTorch bundles its own CUDA runtime)
- Add setuptools<81 to chorus-enformer.yml (tensorflow_hub needs pkg_resources)
- Add compilers to chorus-alphagenome.yml (sorted_nearest requires C build
  on Python 3.11 where no pre-built conda package exists)
- Add oxbow to base environment.yml (required by coolbox for tab file reading)

Runtime fixes:
- Add LD_PRELOAD for env's libstdc++ in runner.py run_script_in_environment()
  (health checks failed with CXXABI_1.3.15 not found on system libstdc++)
- Increase dependency check timeout from 30s to 120s (TensorFlow import >30s)
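
The two runtime fixes above can be combined in one sketch: build a child environment that preloads the conda env's own `libstdc++`, then run the check with the longer timeout. The helper names `build_env` and `run_check`, the `lib/libstdc++.so.6` location, and the `is_linux` test hook are assumptions for illustration, not the contents of `runner.py`.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_env(env_prefix, base_env=None, is_linux=None):
    """Copy the environment and preload the conda env's libstdc++ on Linux."""
    env = dict(os.environ if base_env is None else base_env)
    if is_linux is None:
        is_linux = sys.platform.startswith("linux")
    libstdcxx = Path(env_prefix) / "lib" / "libstdc++.so.6"
    if is_linux and libstdcxx.exists():
        existing = env.get("LD_PRELOAD")
        # Put the env's library first so it wins over an older system copy.
        env["LD_PRELOAD"] = f"{libstdcxx}:{existing}" if existing else str(libstdcxx)
    return env


def run_check(python_exe, script, env_prefix, timeout=120):
    """Run a script in the target env; 120 s leaves room for slow TensorFlow imports."""
    return subprocess.run(
        [str(python_exe), str(script)],
        env=build_env(env_prefix),
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
```

Only setting `LD_PRELOAD` when the library file actually exists avoids loader warnings in environments that do not ship their own `libstdc++`.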

AlphaGenome HuggingFace auth:
- Add pre-auth check in both _load_direct() and load_template.py to read
  HF_TOKEN env var before model download (prevents EOFError in subprocess)
- Add Step 1b to LINUX_TEST_INSTRUCTIONS.md documenting HF auth setup
- Fix AlphaGenome assay ID format in GPU validation script
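
A pre-auth check of the kind described above might look like this sketch. The idea is to fail early with an actionable message rather than let a non-interactive subprocess reach a login prompt and die with `EOFError`. `MissingTokenError` and `require_hf_token` are hypothetical names; only the `HF_TOKEN` variable and the token settings URL come from this page.

```python
import os


class MissingTokenError(RuntimeError):
    """Hypothetical error raised when no Hugging Face token is configured."""


def require_hf_token(env=None):
    """Return the token from HF_TOKEN, or raise before any download starts."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise MissingTokenError(
            "HF_TOKEN is not set. Create a token at "
            "https://huggingface.co/settings/tokens and export it "
            "before loading AlphaGenome."
        )
    return token
```

Calling such a check at the top of both the direct and the subprocess load paths keeps the two failure modes identical for the user.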

Validated: 6/6 envs installed, 6/6 healthy, 80/80 tests pass, notebook
executes (25 cells, 0 errors, 8 visualizations) on both macOS and Linux.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Prerequisites section (Miniforge, disk space, platform support)
- Add verification step to installation instructions
- List all 6 oracle setup commands (was only showing 2)
- Fix AlphaGenome auth to recommend HF_TOKEN env var over conda activate
- Add missing use_environment=True to AlphaGenome code examples
- Fix Borzoi specs (524,288 bp input, 7,610 tracks)
- Improve CUDA/GPU troubleshooting for cross-platform accuracy
- Clarify first-run timing for model weight downloads
- Fix typo: vizualization -> visualization

Validated from scratch: 6/6 envs installed, 6/6 healthy, 80/80 tests
passed, notebook 25 cells / 0 errors / 8 visualizations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move coolbox/oxbow to pip in environment.yml (conda version requires
  pybbi which has no ARM64 build)
- Remove gtfsort from conda deps (Linux-only bioconda package) and add
  Python fallback in sort_gtf() for macOS
- Fix AlphaGenome JAX Metal crash (default_memory_space not supported)
  by setting JAX_PLATFORMS=cpu on macOS before import
- Update README to document Metal limitations for AlphaGenome

Validated: 80/80 tests pass, notebook 25 cells/0 errors/8 plots.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
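
As a rough idea of what a pure-Python fallback for `gtfsort` can do, the sketch below orders GTF records by chromosome, start, and end. It is deliberately simpler than `gtfsort` itself (no gene/transcript/exon grouping, and chromosomes sort lexicographically, so `chr10` comes before `chr2`); `sort_gtf_lines` is a hypothetical name and not the project's `sort_gtf()`.

```python
def sort_gtf_lines(lines):
    """Sort GTF records by (seqname, start, end); comment lines stay on top."""
    header, records = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        (header if line.startswith("#") else records).append(line)

    def sort_key(line):
        fields = line.rstrip("\n").split("\t")
        # GTF columns: seqname, source, feature, start, end, ...
        return (fields[0], int(fields[3]), int(fields[4]))

    return header + sorted(records, key=sort_key)
```

Because `sorted` is stable, records with identical coordinates keep their original relative order.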
FRESH_INSTALL_TEST.md covers both Linux x86_64 and macOS ARM64 in a
single document, making LINUX_TEST_INSTRUCTIONS.md redundant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lucapinello changed the title from "Add AlphaGenome oracle, fix core bugs, comprehensive notebook" to "Add AlphaGenome oracle with cross-platform support" on Mar 15, 2026
lucapinello merged commit 5e0384f into main on Mar 18, 2026
lucapinello deleted the alphagenome-oracle branch on April 22, 2026 18:06
lucapinello added a commit that referenced this pull request Apr 24, 2026
Uniform treatment of user-facing error messages across the CLI and
oracle surfaces, closing the last v26 audit item. No behavioural
changes — just readability and actionability.

CLI (cli/_tokens.py, cli/main.py, cli/_setup_prefetch.py):
- All `logger.error(...)` messages now end with a period.
- HF-token-rejected errors point at
  https://huggingface.co/settings/tokens (retry hint).
- `chorus remove --oracle`, `chorus genome download`, `chorus setup`
  errors include the exact follow-up command to try.
- `_setup_prefetch.py` return-tuple error strings are capitalised
  and period-terminated for uniform rendering under main.py's
  `"  - {err}"` loop.

Oracles (oracles/*.py, not _source/):
- All "Failed to load X model in environment" errors now name the
  conda env (`chorus-X`) and point at `chorus health --oracle X`.
- All "Failed to load X model: {e}" errors end with a period
  (dropping superfluous `str(e)` since f-string formatting handles
  __str__ automatically).
- ChromBPNet's 6 `ValueError` calls for bad assay/cell/fold combos
  become `InvalidAssayError`, matching how Enformer / Borzoi / SEI /
  AlphaGenome handle the same class of user error. Dual
  `ChorusError, ValueError` inheritance (from v26 P2 #19) means
  `except ValueError` still works.
- AlphaGenome's HF-auth error message ends with a period.
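
The backward-compatibility point about dual inheritance can be shown with a small sketch. The class names `ChorusError` and `InvalidAssayError` come from the commit message; their bodies, the `pick_assay` helper, and the assay names are illustrative assumptions.

```python
class ChorusError(Exception):
    """Base class for library-specific errors (body is illustrative)."""


class InvalidAssayError(ChorusError, ValueError):
    """Bad assay/cell/fold combination; still a ValueError for older callers."""


def pick_assay(assay, available):
    """Hypothetical validator that raises the new error type."""
    if assay not in available:
        choices = ", ".join(sorted(available))
        raise InvalidAssayError(f"Unknown assay {assay!r}. Choose one of: {choices}.")
    return assay


# Callers written against the old behaviour keep working unchanged.
try:
    pick_assay("XYZ", {"ATAC", "DNASE"})
except ValueError as err:
    caught = err  # an InvalidAssayError, which is also a ChorusError
```

New code can catch `ChorusError` to handle every library error in one place, while existing `except ValueError` blocks need no changes.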

Tests: 340 passed, 1 skipped on fast suite.

Co-authored-by: lp698 <lp698@dimm2fv07n65x.partners.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>