Add multi-architecture support for ARM macOS by lucapinello · Pull Request #4 · pinellolab/chorus

lucapinello · 2026-03-13T03:11:57Z

Summary

Platform detection: New chorus/core/platform.py module detects system architecture (ARM/x86_64), OS (macOS/Linux), and CUDA availability at runtime
Automatic environment adaptation: EnvironmentManager.create_environment() now adapts conda YAML configs on-the-fly for the detected platform. No separate YAML files are needed; the Linux x86_64 configs remain the canonical source
All 5 oracles supported on Apple Silicon: chrombpnet (TF 2.15.1), enformer (TF >=2.15 + setuptools pin), borzoi (CUDA removal), sei (channel/version fixes), legnet (pytorch-gpu → pytorch)
Smoke tests: End-to-end predict() tests for all 5 oracles (tests/test_smoke_predict.py)
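
As a rough illustration of the platform detection described above, the sketch below builds a key such as macos_arm64 from the standard library. Only the PlatformInfo name and its key attribute come from this PR; the field names, the normalize() helper, and the nvidia-smi check for CUDA are assumptions and may not match chorus/core/platform.py.

```python
import platform
import shutil
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PlatformInfo:
    """Illustrative stand-in; the real class in chorus may have other fields."""
    os_name: str    # "macos", "linux", or the lowercased platform.system() value
    arch: str       # "arm64", "x86_64", or the lowercased platform.machine() value
    has_cuda: bool

    @property
    def key(self) -> str:
        # Key used to look up platform-specific rules, e.g. "macos_arm64".
        return f"{self.os_name}_{self.arch}"


def normalize(system: str, machine: str) -> Tuple[str, str]:
    """Map raw platform strings onto the short names used in the key."""
    os_name = {"Darwin": "macos", "Linux": "linux"}.get(system, system.lower())
    arch = {
        "arm64": "arm64",
        "aarch64": "arm64",
        "x86_64": "x86_64",
        "AMD64": "x86_64",
    }.get(machine, machine.lower())
    return os_name, arch


def detect_platform() -> PlatformInfo:
    os_name, arch = normalize(platform.system(), platform.machine())
    # Assumption: treat a visible nvidia-smi binary as "CUDA available".
    has_cuda = shutil.which("nvidia-smi") is not None
    return PlatformInfo(os_name, arch, has_cuda)
```

Calling detect_platform().key on an M-series Mac would be expected to give macos_arm64 under these assumptions.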

Key design decisions

Zero impact on Linux x86_64 — adaptations only apply when PlatformInfo.key matches (e.g., macos_arm64)
Environment YAML files are unchanged; all platform-specific logic lives in PLATFORM_ADAPTATIONS dict
Supports post-install steps (e.g., pip install --no-deps for modisco-lite on ARM)
Handles blocked anaconda.org by removing pytorch/nvidia channels (conda-forge has ARM builds)
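
The design decisions above can be sketched as a lookup keyed on oracle and platform, with a pass-through when no rule matches. The PLATFORM_ADAPTATIONS name comes from this PR; the rule schema (remove_channels, post_install), the adapt_config() function, and the pairing of the modisco-lite step with a particular oracle are hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical rule schema; the real dict in chorus/core/platform.py may differ.
PLATFORM_ADAPTATIONS = {
    ("legnet", "macos_arm64"): {
        "remove_channels": ["pytorch", "nvidia"],
        "post_install": [],
    },
    # Placeholder oracle name: the PR does not say which oracle needs this step.
    ("example_oracle", "macos_arm64"): {
        "remove_channels": [],
        "post_install": [["pip", "install", "--no-deps", "modisco-lite"]],
    },
}


def adapt_config(config, oracle, platform_key):
    """Return (config, post_install_steps) for one oracle on one platform.

    `config` is an already-parsed conda environment YAML (a plain dict).
    """
    rules = PLATFORM_ADAPTATIONS.get((oracle, platform_key))
    if rules is None:
        # No rule for this combination: hand back the canonical config untouched.
        return config, []
    adapted = copy.deepcopy(config)  # never mutate the canonical config
    blocked = set(rules.get("remove_channels", []))
    adapted["channels"] = [c for c in adapted.get("channels", []) if c not in blocked]
    return adapted, list(rules.get("post_install", []))
```

Because unmatched combinations return the original object, a Linux x86_64 run in this sketch never touches the adaptation path at all.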

Test plan

All 5 oracle environments created successfully on Apple Silicon (M-series Mac)
All 5 oracles pass health checks
All 5 oracles return valid predictions from predict() on genomic regions
pytest tests/test_smoke_predict.py -v -s — 5/5 passed
Verify no regression on Linux x86_64 (pass-through, no adaptation applied)

🤖 Generated with Claude Code

Detect system architecture at runtime and adapt oracle environment YAML configs before creating conda environments. This allows the canonical Linux x86_64 YAML files to work on Apple Silicon by substituting incompatible packages (e.g. TensorFlow 2.8 -> 2.15.1, removing CUDA packages, pre-building igraph/leidenalg via conda).

- New chorus/core/platform.py: PlatformInfo detection, declarative adaptation rules per oracle+platform, YAML config transformer
- Modified manager.py: applies adaptations in create_environment(), runs post-install pip steps (e.g. modisco-lite --no-deps)
- Adaptations defined for chrombpnet, enformer, borzoi, sei, legnet on macos_arm64; all other oracle+platform combos pass through unchanged

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix _pkg_name to handle conda single '=' version pins (e.g. cudatoolkit=11.7)
- Remove 'pytorch' conda channel for sei/borzoi/legnet on macOS ARM (pytorch packages available on conda-forge; pytorch channel blocked on some networks)
- Relax sei PyTorch <2.0 upper bound on ARM (PyTorch 2.x is compatible)
- Add setuptools<81 for enformer (tensorflow_hub needs pkg_resources)

Tested all 5 oracles on Apple Silicon:
✓ borzoi: Healthy + prediction OK
✓ chrombpnet: Healthy + prediction OK
✓ enformer: Healthy + prediction OK
✓ legnet: Healthy + prediction OK
✓ sei: Healthy + prediction OK

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
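
To illustrate why single '=' pins matter for the package substitutions, here is a minimal sketch of name extraction from a conda dependency spec. The pkg_name() and replace_packages() functions are hypothetical stand-ins and are not the _pkg_name helper from this PR; the example rules mirror the cudatoolkit removal and pytorch-gpu → pytorch swap mentioned in the description.

```python
import re


def pkg_name(spec: str) -> str:
    """Extract the bare package name from a conda dependency spec.

    Handles 'name', 'name=1.2' (conda single '='), 'name==1.2',
    'name>=1,<2', 'name 1.2' and 'channel::name=1.2'.
    """
    spec = spec.strip()
    if "::" in spec:
        # Drop an explicit channel prefix such as 'conda-forge::'.
        spec = spec.split("::", 1)[1]
    # Cut at the first version operator, whitespace, or bracket.
    return re.split(r"[=<>!~\s\[]", spec, maxsplit=1)[0].lower()


def replace_packages(deps, replacements, removals):
    """Apply name-based replacements and removals to a dependency list."""
    adapted = []
    for dep in deps:
        if not isinstance(dep, str):
            # Nested blocks such as {"pip": [...]} are kept as-is in this sketch.
            adapted.append(dep)
            continue
        name = pkg_name(dep)
        if name in removals:
            continue
        adapted.append(replacements.get(name, dep))
    return adapted
```

A parser that only split on '==' would leave cudatoolkit=11.7 unmatched, so a removal rule keyed on cudatoolkit would silently do nothing.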

End-to-end tests that instantiate each oracle in its conda environment, load a pretrained model, and run predict() on a genomic region from chr1. Covers chrombpnet, enformer, borzoi, sei, and legnet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The 2026-04-16 deep application audit proposed 6 fixes; commit 5ebb328 implemented 5 of them. This commit adds the missing LOW-priority Fix #4: a short application note on why AlphaGenome DNASE and ChromBPNet ATAC can report different effects for the same variant in the same cell type.

Three reasons documented:
- different training data (DNase vs Tn5)
- different receptive fields (1 Mb vs 2 kb)
- different effect aggregation (binned sum vs peak height)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fresh-install audit at e99fd66 verifying all 4 v10 fixes on a truly clean slate. Teardown: 14.2 GB including tfhub_modules/ this time.

All 4 v10 fixes verified live:
- Fix #1 (tfhub recovery): code path exists + first-install smoke passes on wiped tfhub cache.
- Fix #2 (IGV HF fallback): 0/16 HTMLs fell back to CDN on the same SSL-MITM network that had 6/16 fallbacks in v10.
- Fix #3 (FTO README): accurate HepG2 framing + adipose assay_ids block for the ideal run.
- Fix #4 (bgzip PATH): 0 'bgzip is not installed' lines across 235 notebook cells (v10 had 20/34/60 per notebook).

One minor regression exposed: Fix #4 makes tabix findable, which reveals a pre-existing bug where download_gencode leaves a stale .tbi file that coolbox's `tabix -p gff` rejects with "index file exists". Workaround = delete .tbi; NB1 retry succeeded. Proposed 3-line follow-up fix to annotations.py documented in the report.

Also verified:
- 308/308 pytest on fresh env (17.3 s)
- 6/6 oracle smoke (7 min 2 s), first Enformer fresh-install with wiped tfhub cache
- 12/12 regen within AlphaGenome CPU non-determinism tolerance
- 0 orphan HTMLs after parallel regen
- 3 notebooks: 0 errors, 0 warnings, 0 bgzip spam
- 16/16 HTMLs clean in Selenium
- FTO README spot-check confirms Fix #3 committed correctly

After 11 audit passes, the last two have surfaced no actual chorus bugs, only environmental quirks (tfhub cache, SSL MITM, PATH inheritance, stale .tbi).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…efreshes GTF

v11 audit exposed a pre-existing bug masked by the pre-Fix-#4 state where tabix was not on PATH: when download_annotation refreshes a GTF, leftover coolbox artefacts (file.gtf.bgz + file.gtf.bgz.tbi from a previous session) point at byte offsets in the old .bgz that no longer match the new one. coolbox then calls tabix -p gff file.bgz on its next GTF() read, tabix refuses to overwrite without -f, and the notebook cell crashes with:

CalledProcessError: Command '['tabix', '-p', 'gff', ...]' returned non-zero exit status 1.

Fix: in AnnotationManager.download_annotation, after sort_annotation writes the fresh GTF, unlink any stale .bgz / .bgz.tbi / .gz.tbi sharing the same stem. coolbox then regenerates them cleanly on first GTF() call. Three extra unlink() calls on a not-hot path.

Unit test: TestStaleGTFIndexCleanup in tests/test_error_recovery.py mocks requests.get + sort_annotation, primes the annotations dir with stale .bgz and .tbi, and verifies both are removed after download_annotation returns.

Verified: pytest -m "not integration" → 309 passed (was 308).

Without this fix, a notebook run after any annotation refresh (download_gencode() called twice across sessions, or a newer GENCODE version pulled) hits the tabix error on first coolbox visualization cell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
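
The stale-index cleanup described in this commit message could look roughly like the sketch below. The remove_stale_indexes() function is hypothetical and is not the code in annotations.py; it assumes the derived files are named by appending each suffix to the full GTF file name (file.gtf → file.gtf.bgz, file.gtf.bgz.tbi, file.gtf.gz.tbi).

```python
from pathlib import Path
from typing import List

# Suffixes of derived files that can go stale when the GTF is rewritten.
STALE_SUFFIXES = (".bgz", ".bgz.tbi", ".gz.tbi")


def remove_stale_indexes(gtf_path: Path) -> List[Path]:
    """Delete compressed/index artefacts left next to a refreshed GTF.

    Returns the paths that were actually removed, which is handy for logging.
    """
    removed = []
    for suffix in STALE_SUFFIXES:
        stale = gtf_path.with_name(gtf_path.name + suffix)
        if stale.exists():
            stale.unlink()
            removed.append(stale)
    return removed
```

Checking exists() before unlink() keeps the call safe on a first download, when no artefacts have been generated yet.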

lp698 and others added 3 commits March 12, 2026 20:21

lucapinello merged commit 2df2711 into main Mar 18, 2026

lucapinello mentioned this pull request Apr 15, 2026

fixes: audit 2026-04-16 follow-up (remove 2 examples + normalizer/report hardening) #10

Merged

4 tasks

lucapinello mentioned this pull request Apr 17, 2026

Fix v10 audit findings: tfhub cache + IGV HF fallback + FTO doc + bgzip PATH #21

Open

2 tasks

lucapinello mentioned this pull request Apr 17, 2026

audit: 2026-04-17 v11 post-v10 verification audit #22

Open

This was referenced Apr 17, 2026

Fix v11 regression: remove stale .bgz/.tbi when download_annotation refreshes GTF #23

Open

audit: 2026-04-20 v12 full UX consistency audit (cross-modality) #25

Open

lucapinello merged 3 commits into main from multi-architecture
