Add multi-architecture support for ARM macOS#4

Merged
lucapinello merged 3 commits into main from multi-architecture
Mar 18, 2026

@lucapinello
Contributor

Summary

  • Platform detection: New chorus/core/platform.py module detects system architecture (ARM/x86_64), OS (macOS/Linux), and CUDA availability at runtime
  • Automatic environment adaptation: EnvironmentManager.create_environment() now adapts conda YAML configs on the fly for the detected platform. No separate YAML files are needed; the Linux x86_64 configs remain the canonical source
  • All 5 oracles supported on Apple Silicon: chrombpnet (TF 2.15.1), enformer (TF >=2.15 + setuptools pin), borzoi (CUDA removal), sei (channel/version fixes), legnet (pytorch-gpu → pytorch)
  • Smoke tests: End-to-end predict() tests for all 5 oracles (tests/test_smoke_predict.py)
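
As a rough illustration of the platform detection described above, here is a minimal sketch using only the standard library. The class layout, the `normalize` and `detect_platform` helpers, and the use of `nvidia-smi` as a CUDA proxy are assumptions made for this example; the actual chorus/core/platform.py may work differently.

```python
import platform
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformInfo:
    """Hypothetical stand-in for the PlatformInfo described in the PR."""
    system: str    # "macos", "linux", or a lowercased platform.system() value
    arch: str      # "arm64" or "x86_64"
    has_cuda: bool

    @property
    def key(self) -> str:
        # e.g. "macos_arm64" or "linux_x86_64"
        return f"{self.system}_{self.arch}"


def normalize(system: str, machine: str) -> tuple[str, str]:
    """Map raw platform strings onto the short names used in the key."""
    system_name = {"Darwin": "macos", "Linux": "linux"}.get(system, system.lower())
    arch_name = {
        "arm64": "arm64",
        "aarch64": "arm64",
        "x86_64": "x86_64",
        "AMD64": "x86_64",
    }.get(machine, machine.lower())
    return system_name, arch_name


def detect_platform() -> PlatformInfo:
    system, arch = normalize(platform.system(), platform.machine())
    # Assumption: a visible nvidia-smi binary is treated as a proxy for CUDA.
    has_cuda = shutil.which("nvidia-smi") is not None
    return PlatformInfo(system=system, arch=arch, has_cuda=has_cuda)
```

On an Apple Silicon Mac, `detect_platform().key` would evaluate to `macos_arm64` under these assumptions, which is the key the adaptation rules match against.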

Key design decisions

  • Zero impact on Linux x86_64 — adaptations only apply when PlatformInfo.key matches (e.g., macos_arm64)
  • Environment YAML files are unchanged; all platform-specific logic lives in PLATFORM_ADAPTATIONS dict
  • Support for post-install steps (e.g., pip install --no-deps for modisco-lite on ARM)
  • Handles blocked anaconda.org by removing pytorch/nvidia channels (conda-forge has ARM builds)
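
The design above (unchanged YAML files plus a declarative rule table) could look roughly like the sketch below. The rule schema (`remove_channels`, `replace`, `remove`), the `pkg_name` helper, and the `adapt_config` function are hypothetical; the PR does not show the real PLATFORM_ADAPTATIONS layout. The config is assumed to be an environment YAML already parsed into a dict.

```python
import copy
import re

# Hypothetical rule table; the real PLATFORM_ADAPTATIONS schema is not shown in the PR.
PLATFORM_ADAPTATIONS = {
    ("legnet", "macos_arm64"): {
        "remove_channels": ["pytorch", "nvidia"],
        "replace": {"pytorch-gpu": "pytorch"},
        "remove": ["cudatoolkit"],
    },
}


def pkg_name(spec: str) -> str:
    """Return the bare package name from a spec such as 'cudatoolkit=11.7'."""
    return re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0].lower()


def adapt_config(config: dict, oracle: str, platform_key: str) -> dict:
    """Return an adapted copy of a parsed environment config.

    If no rule matches (e.g. on Linux x86_64), the original object is
    returned untouched.
    """
    rules = PLATFORM_ADAPTATIONS.get((oracle, platform_key))
    if rules is None:
        return config

    adapted = copy.deepcopy(config)
    drop_channels = set(rules.get("remove_channels", []))
    adapted["channels"] = [
        c for c in adapted.get("channels", []) if c not in drop_channels
    ]

    remove = set(rules.get("remove", []))
    replace = rules.get("replace", {})
    deps = []
    for dep in adapted.get("dependencies", []):
        if not isinstance(dep, str):
            deps.append(dep)  # nested {"pip": [...]} blocks are left alone here
            continue
        name = pkg_name(dep)
        if name in remove:
            continue
        deps.append(replace.get(name, dep))
    adapted["dependencies"] = deps
    return adapted
```

Because the function returns the input object itself when no rule applies, the pass-through case on Linux x86_64 is easy to assert in a test.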

Test plan

  • All 5 oracle environments created successfully on Apple Silicon (M-series Mac)
  • All 5 oracles pass health checks
  • All 5 oracles return valid predictions from predict() on genomic regions
  • pytest tests/test_smoke_predict.py -v -s — 5/5 passed
  • Verify no regression on Linux x86_64 (pass-through, no adaptation applied)

🤖 Generated with Claude Code

lp698 and others added 3 commits March 12, 2026 20:21
Detect system architecture at runtime and adapt oracle environment
YAML configs before creating conda environments. This allows the
canonical Linux x86_64 YAML files to work on Apple Silicon by
substituting incompatible packages (e.g. TensorFlow 2.8 -> 2.15.1,
removing CUDA packages, pre-building igraph/leidenalg via conda).

- New chorus/core/platform.py: PlatformInfo detection, declarative
  adaptation rules per oracle+platform, YAML config transformer
- Modified manager.py: applies adaptations in create_environment(),
  runs post-install pip steps (e.g. modisco-lite --no-deps)
- Adaptations defined for chrombpnet, enformer, borzoi, sei, legnet
  on macos_arm64; all other oracle+platform combos pass through unchanged
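
A post-install step of the kind mentioned above could be expressed as data and turned into commands run inside the new environment. This is only a sketch: the `POST_INSTALL` table, the `post_install_commands` helper, the environment name, and the pairing of modisco-lite with chrombpnet are assumptions for illustration. `conda run -n <env>` is the standard way to execute a command in a named conda environment.

```python
# Hypothetical post-install table; the PR only names
# "modisco-lite --no-deps" as an example step.
POST_INSTALL = {
    ("chrombpnet", "macos_arm64"): [
        ["pip", "install", "--no-deps", "modisco-lite"],
    ],
}


def post_install_commands(oracle: str, platform_key: str, env_name: str) -> list[list[str]]:
    """Build the argv lists for every post-install step of an oracle/platform pair."""
    steps = POST_INSTALL.get((oracle, platform_key), [])
    # Prefix each step so it runs inside the freshly created environment.
    return [["conda", "run", "-n", env_name, *step] for step in steps]


# Each returned list can be passed to subprocess.run(cmd, check=True).
commands = post_install_commands("chrombpnet", "macos_arm64", "chorus-chrombpnet")
```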

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix _pkg_name to handle conda single '=' version pins (e.g. cudatoolkit=11.7)
- Remove 'pytorch' conda channel for sei/borzoi/legnet on macOS ARM
  (pytorch packages available on conda-forge; pytorch channel blocked on some networks)
- Relax sei PyTorch <2.0 upper bound on ARM (PyTorch 2.x is compatible)
- Add setuptools<81 for enformer (tensorflow_hub needs pkg_resources)

Tested all 5 oracles on Apple Silicon:
  ✓ borzoi: Healthy + prediction OK
  ✓ chrombpnet: Healthy + prediction OK
  ✓ enformer: Healthy + prediction OK
  ✓ legnet: Healthy + prediction OK
  ✓ sei: Healthy + prediction OK

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
End-to-end tests that instantiate each oracle in its conda environment,
load a pretrained model, and run predict() on a genomic region from chr1.
Covers chrombpnet, enformer, borzoi, sei, and legnet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lucapinello merged commit 2df2711 into main on Mar 18, 2026
lucapinello added a commit that referenced this pull request Apr 15, 2026
The 2026-04-16 deep application audit proposed 6 fixes; commit 5ebb328
implemented 5 of them. This commit adds the missing LOW-priority Fix #4:
a short application note on why AlphaGenome DNASE and ChromBPNet ATAC
can report different effects for the same variant in the same cell type.

Three reasons documented: different training data (DNase vs Tn5),
different receptive fields (1 Mb vs 2 kb), different effect aggregation
(binned sum vs peak height).
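
The third reason can be made concrete with a toy example. The numbers below are invented and do not come from either model, and the two functions are simplified stand-ins for the aggregations named above: a variant that flattens a peak without changing total signal shows no effect under a summed score but a clear loss under a peak-height score.

```python
def summed_effect(ref: list[float], alt: list[float]) -> float:
    """Effect as the change in total signal across all bins."""
    return sum(alt) - sum(ref)


def peak_height_effect(ref: list[float], alt: list[float]) -> float:
    """Effect as the change in the maximum (peak) signal."""
    return max(alt) - max(ref)


# Toy profiles (made up): the alt allele broadens and flattens the peak.
ref_profile = [0.0, 1.0, 8.0, 1.0, 0.0]
alt_profile = [1.0, 2.0, 4.0, 2.0, 1.0]

sum_delta = summed_effect(ref_profile, alt_profile)        # 0.0
peak_delta = peak_height_effect(ref_profile, alt_profile)  # -4.0
```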

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
lucapinello pushed a commit that referenced this pull request Apr 17, 2026
Fresh-install audit at e99fd66 verifying all 4 v10 fixes on a truly
clean slate. Teardown: 14.2 GB including tfhub_modules/ this time.

All 4 v10 fixes verified live:
- Fix #1 (tfhub recovery): code path exists + first-install smoke
  passes on wiped tfhub cache.
- Fix #2 (IGV HF fallback): 0/16 HTMLs fell back to CDN on the same
  SSL-MITM network that had 6/16 fallbacks in v10.
- Fix #3 (FTO README): accurate HepG2 framing + adipose assay_ids
  block for the ideal run.
- Fix #4 (bgzip PATH): 0 'bgzip is not installed' lines across 235
  notebook cells (v10 had 20/34/60 per notebook).

One minor regression exposed: Fix #4 makes tabix findable, which
reveals a pre-existing bug where download_gencode leaves a stale
.tbi file that coolbox's `tabix -p gff` rejects with "index file
exists". Workaround = delete .tbi; NB1 retry succeeded. Proposed
3-line follow-up fix to annotations.py documented in the report.

Also verified:
- 308/308 pytest on fresh env (17.3 s)
- 6/6 oracle smoke (7 min 2 s) — first Enformer fresh-install with
  wiped tfhub cache
- 12/12 regen within AlphaGenome CPU non-determinism tolerance
- 0 orphan HTMLs after parallel regen
- 3 notebooks: 0 errors, 0 warnings, 0 bgzip spam
- 16/16 HTMLs clean in Selenium
- FTO README spot-check confirms Fix #3 committed correctly

After 11 audit passes, the last two have surfaced no actual chorus
bugs, only environmental quirks (tfhub cache, SSL MITM, PATH
inheritance, stale .tbi).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lucapinello pushed a commit that referenced this pull request Apr 17, 2026
…efreshes GTF

v11 audit exposed a pre-existing bug masked by the pre-Fix-#4 state where
tabix was not on PATH: when download_annotation refreshes a GTF, leftover
coolbox artefacts (file.gtf.bgz + file.gtf.bgz.tbi from a previous session)
point at byte offsets in the old .bgz that no longer match the new one.
coolbox then calls tabix -p gff file.bgz on its next GTF() read, tabix
refuses to overwrite without -f, and the notebook cell crashes with:

    CalledProcessError: Command '['tabix', '-p', 'gff', ...]'
    returned non-zero exit status 1.

Fix: in AnnotationManager.download_annotation, after sort_annotation
writes the fresh GTF, unlink any stale .bgz / .bgz.tbi / .gz.tbi
sharing the same stem. coolbox then regenerates them cleanly on first
GTF() call. Three extra unlink() calls on a not-hot path.
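
The cleanup described in the fix can be sketched as follows. The helper name `remove_stale_indexes` is hypothetical, and the assumption that stale files are named by appending the suffix to the full GTF file name (file.gtf.bgz, file.gtf.bgz.tbi, file.gtf.gz.tbi) is inferred from the example paths in this message, not from annotations.py itself.

```python
from pathlib import Path

# Suffixes of derived files that may be left over from a previous session.
STALE_SUFFIXES = (".bgz", ".bgz.tbi", ".gz.tbi")


def remove_stale_indexes(gtf_path: Path) -> list[Path]:
    """Delete leftover compressed/index files derived from gtf_path.

    Returns the paths that were actually removed; the GTF itself is kept.
    """
    removed = []
    for suffix in STALE_SUFFIXES:
        stale = gtf_path.with_name(gtf_path.name + suffix)
        if stale.exists():
            stale.unlink()
            removed.append(stale)
    return removed
```

Calling this right after the fresh GTF is written means the next reader finds no index at all and rebuilds one from the current file, rather than failing on an index that belongs to an older download.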

Unit test: TestStaleGTFIndexCleanup in tests/test_error_recovery.py
mocks requests.get + sort_annotation, primes the annotations dir with
stale .bgz and .tbi, and verifies both are removed after
download_annotation returns.

Verified: pytest -m "not integration" → 309 passed (was 308).

Without this fix, a notebook run after any annotation refresh
(download_gencode() called twice across sessions, or a newer GENCODE
version pulled) hits the tabix error on first coolbox visualization cell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>