
feat: unify track-rescale across IGV/matplotlib/CoolBox/notebooks + uniform DHS-augmented chrombpnet CDF #78

Merged: lucapinello merged 18 commits into main from fix/post-v040-followups, May 10, 2026
Conversation


@lucapinello commented May 6, 2026

Summary

Builds on @lorenzoruggerii's PR #79 (mixed-resolution IGV rendering + LegNet normalization fallback) with:

  1. Unified track-rescale across all four rendering paths. chorus.analysis._igv_report.rescale_for_display() is now the single source of truth — IGV variant reports, multi-oracle reports, causal reports, matplotlib figures, CoolBox panels, and notebooks all go through it. Default-call behaviour (no kwargs) auto-loads the per-track normalizer so a fresh user calling track.get_coolbox_representation() or render_track_figures(...) gets CDF-rescaled output without thinking about it. normalize=False opts out.
  2. Symmetric signed rescale. Borzoi RNA / Sei / LentiMPRA tracks now rescale to [-3, +3] using p99(|cdf|) as the unit (signed_floor_rescale_batch) so repressive (negative) effects are visible. Previously they clipped to 0.
  3. Auto-load defaults wired for matplotlib + CoolBox.
  4. OraclePrediction.add() backfills track.assay_id from the dict key so ChromBPNet tracks (which previously left assay_id=None) participate cleanly in the unified rescale.
  5. ChromBPNetOracle.predict_sliding() — slides the 2114-bp model across arbitrary intervals so the multi-oracle IGV panel actually shows ChromBPNet across AlphaGenome's 1 Mb window. _predict() auto-routes wide queries to it (also fixes a pre-existing IndexError in _predict_direct's sliding formula that PR #79's wider region triggered).
  6. _calculate_track_bin_size corrected: chrombpnet now uses agg="max" (PR #79's docstring said max, code returned mean — diluting 1 bp peaks 20×).
  7. Lower per-layer floors so peak base/shoulder is visible: chromatin_accessibility 0.95→0.90, promoter_activity 0.95→0.85.
  8. DHS-augmented chrombpnet CDF on HuggingFace — uniformly across all 786 tracks (42 ATAC/DNASE + 744 BPNet/CHIP). DHS-anchored sampling (~10K SNPs at random offsets within ±150 bp of Meuleman 2020 DHS summits) makes percentiles more discriminating for cell-type-specific peaks. Auto-fetched on first install.
  9. DHS vocabulary mirrored to chorus-backgrounds HF dataset with auto-fetch in load_dhs_vocabulary(). No more gdown step needed for users who want to rebuild CDFs.
  10. Per-layer CDF sampling guide added to docs/NORMALIZATION_GUIDE.md for new-oracle developers (recipes for chromatin / TF / histone / CAGE / RNA / splicing / MPRA / Sei).
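For orientation, the default-call semantics in item 1 can be sketched roughly like this. This is a hypothetical simplification: `rescale_for_display_sketch`, the `cdf_samples` argument, and the inline p99 math are illustrative stand-ins for the real `chorus.analysis._igv_report.rescale_for_display()` and the auto-loaded per-track normalizer.

```python
import numpy as np

DISPLAY_MAX = 3.0  # display cap; 1.0 == genome-wide p99 per item 1 above

def rescale_for_display_sketch(values, cdf_samples=None, normalize=True):
    """Sketch of the default-call behaviour: CDF-rescaled output by
    default, normalize=False opts out and returns raw values."""
    values = np.asarray(values, dtype=float)
    if not normalize or cdf_samples is None:
        return values  # raw, unscaled values
    # Scale so the background p99 lands at 1.0, capped at DISPLAY_MAX.
    p99 = np.percentile(cdf_samples, 99)
    return np.clip(values / max(p99, 1e-9), 0.0, DISPLAY_MAX)
```

A fresh user calling a renderer with no kwargs would hit the `normalize=True` branch transparently.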

Verification

Three audit docs in audits/:

  • 2026-05-08_post_pr79_merge_audit.md — code-level merge of PR #79 + our follow-ups; CDF flow per oracle ("is this a hack or principled?" question); 376 tests passed warm
  • 2026-05-09_dhs_chrombpnet_full_rebuild.md — DHS-augmented CDF rebuild on ml007 (744 CHIP rows, ~6 h on 2× A100), Mac splice (42 ATAC/DNASE rows from local), uniform 786-track NPZ uploaded to HF, round-trip sha-verified
  • 2026-05-09_scorched_earth_fresh_install.md — wiped all 8 chorus envs + caches + genomes, then ran README Steps 1–3 literally as a brand-new user. No P0/P1 friction.

End-to-end on a clean install (M3 Ultra Metal, warm conda cache):

  • Step 1 (env + pip install): 1m 54s
  • Step 2 (chorus setup): 18m 35s (README's 55–75 min estimate is conservative)
  • Step 3 (β-globin SNP via Enformer): 49s
  • pytest cold: 5m 39s, 376 passed, 1 skipped, 5 deselected
  • SORT1 multi-oracle cold: 6m 52s — chrombpnet +1.241 + alphagenome +1.336, both ↑, agree on direction
  • All 18 walkthrough HTMLs IGV-inspected: 0 issues, every panel has data
  • New uniform-DHS chrombpnet NPZ (sha 526beb2c…f83183eb, 78.5 MB) auto-fetched from HF on first install — sha matches upload

Test plan

  • pytest -m "not integration" → 376 passed cold
  • chorus setup from scratch → all 7 oracles ready, ~19 min on M3 Ultra
  • README Step 3 prediction snippet → output matches docs
  • SORT1 multi-oracle regen (chrombpnet + legnet + alphagenome + consolidate) → IGV renders all panels with peaks
  • All 18 walkthroughs programmatically IGV-inspected → 0 issues
  • HF round-trip for chrombpnet_pertrack.npz → sha verified
  • HF round-trip for dhs_vocabulary_hg38.txt.gz → sha verified
  • Signed-layer guard fires for LegNet LentiMPRA:HepG2 → autoscale-symmetric

Closes / supersedes

Closes #79 (Lorenzo's IGV multi-resolution normalization PR — merged into this branch as commit 63df601 with our follow-ups layered on top).

HuggingFace artifacts updated

  • lucapinello/chorus-backgrounds/chrombpnet_pertrack.npz — uniform-DHS 786-track CDF (was 786-track non-DHS)
  • lucapinello/chorus-backgrounds/dhs_vocabulary_hg38.txt.gz — Meuleman 2020 DHS index, mirrored from gdown

🤖 Generated with Claude Code

lp698 and others added 4 commits May 6, 2026 11:21
…F alias

ChromBPNet CHIP predictions emit track IDs like `CHIP:K562:REST:+`/`:-`
but the background CDF stores `CHIP:K562:REST` (no strand). All CHIP
normalization lookups silently returned None, falling back to raw
unscaled values in IGV reports and percentile output.

Fix: strip `:+`/`:-` suffix as a fallback in `_lookup`, `_lookup_batch`,
and `perbin_floor_rescale_batch` in `PerTrackNormalizer`. Both strand
predictions correctly share the single background distribution row.

Also alias `alphagenome_pt` → `alphagenome` in `_ensure_loaded` and
`get_pertrack_normalizer`: since both backends produce identical
predictions (same model + weights), no separate CDF file is needed.
The alias is bypassed automatically if a dedicated
`alphagenome_pt_pertrack.npz` appears in the cache later.

Adds 8 unit tests covering strand-suffixed lookups and the strandless
key fallthrough.
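The strandless-key fallthrough can be sketched as follows (hypothetical standalone helper; the real logic lives inside `PerTrackNormalizer._lookup` and friends):

```python
def lookup_with_strand_fallback(cdf_rows, track_id):
    """Sketch of the fix above: CHIP predictions emit
    'CHIP:K562:REST:+' / ':-' but the background CDF stores the
    strandless 'CHIP:K562:REST', so strip the suffix and retry."""
    if track_id in cdf_rows:
        return cdf_rows[track_id]
    if track_id.endswith((":+", ":-")):
        # Both strand predictions share one background distribution row.
        return cdf_rows.get(track_id[:-2])
    return None
```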

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`chorus setup --oracle X` previously had no timeout for either the
`mamba env create` subprocess or the weight download phase. On slow or
unstable connections (e.g. remote lab servers) a stalled download would
hang indefinitely.

Changes:
- `chorus/cli/main.py`: add `--setup-timeout SECONDS` flag (default:
  unlimited). Passes through to both `create_environment()` and
  `prefetch_for_oracle()`.
- `chorus/core/environment/manager.py`: `create_environment()` gains a
  `timeout` parameter. A `threading.Timer` kills the `mamba env create`
  subprocess after N seconds and raises a descriptive `RuntimeError`.
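The timeout mechanism can be sketched like this (hypothetical simplification of `create_environment()`'s timeout path; `run_with_timeout` is an illustrative name):

```python
import subprocess
import threading

def run_with_timeout(cmd, timeout=None):
    """Sketch of the --setup-timeout mechanism: a threading.Timer kills
    the subprocess after N seconds and a descriptive RuntimeError is
    raised. timeout=None (the default) means unlimited."""
    proc = subprocess.Popen(cmd)
    timed_out = threading.Event()

    def _kill():
        timed_out.set()
        proc.kill()

    timer = threading.Timer(timeout, _kill) if timeout else None
    if timer:
        timer.start()
    try:
        proc.wait()
    finally:
        if timer:
            timer.cancel()
    if timed_out.is_set():
        raise RuntimeError(f"Command {cmd!r} exceeded {timeout}s timeout")
    return proc.returncode
```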

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Uninstalling chorus previously required manually removing 7 conda envs,
downloaded weights, background CDFs, and reference genomes. `chorus cleanup`
handles this in one command.

Usage:
  chorus cleanup --oracle {name|all}   # env + weights
  chorus cleanup --backgrounds         # ~/.chorus/backgrounds/*.npz
  chorus cleanup --genomes             # downloaded reference genomes
  chorus cleanup --all                 # everything above
  chorus cleanup --all --dry-run       # preview without deleting

- Missing paths silently skipped (idempotent)
- Dry-run prints [DRY RUN] prefix on every action
- Summary line at end: "Removed N environment(s), M weight dir(s), K file(s)"
- README: Upgrading section updated to use `chorus cleanup --all`;
  new Uninstalling subsection added; `--setup-timeout` usage note added
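The idempotent skip-missing / dry-run loop can be sketched as (hypothetical helper, not the actual CLI implementation):

```python
import shutil
from pathlib import Path

def cleanup_paths(paths, dry_run=False):
    """Sketch of the cleanup semantics above: missing paths are
    silently skipped (idempotent) and dry-run previews every action
    with a [DRY RUN] prefix instead of deleting."""
    removed = 0
    for p in map(Path, paths):
        if not p.exists():
            continue  # already gone: nothing to do
        if dry_run:
            print(f"[DRY RUN] would remove {p}")
            continue
        shutil.rmtree(p) if p.is_dir() else p.unlink()
        removed += 1
    print(f"Removed {removed} path(s)")
    return removed
```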

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…robe

Two bugs found during scorched-earth teardown/reinstall test:

1. `chorus/utils/genome.py`: `download_with_resume` releases its lock
   after writing `hg38.fa.gz` but before decompression. A concurrent
   `chorus setup` process could decompress+delete the `.gz` between
   those steps, leaving the first process with a FileNotFoundError.
   Fix: check if `fasta_path` already exists before decompressing; use
   `unlink(missing_ok=True)` to tolerate concurrent deletions.

2. `chorus/core/weights_probe.py`: `_probe_chrombpnet` checked for
   `CHORUS_DOWNLOADS_DIR/chrombpnet/DNASE_K562` — the old ENCODE tarball
   path. Since v0.3 the default path (fold=0, chrombpnet_nobias) downloads
   via the HF slim mirror into `~/.cache/huggingface/`, so the probe
   always reported "Not installed" even after a successful setup.
   Fix: switch to `_probe_library_cached` (trust the setup marker),
   matching enformer and borzoi.
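The decompression race fix (bug 1) can be sketched as follows. This is a hypothetical simplification of `download_with_resume`'s post-download step; `safe_decompress` and the injected `decompress` callable are illustrative:

```python
from pathlib import Path

def safe_decompress(gz_path, fasta_path, decompress):
    """Sketch of the race fix: a concurrent `chorus setup` may have
    already decompressed and deleted the .gz, so check for the final
    FASTA first and tolerate a concurrent unlink of the archive."""
    gz_path, fasta_path = Path(gz_path), Path(fasta_path)
    if not fasta_path.exists():       # another process may have won the race
        decompress(gz_path, fasta_path)
    gz_path.unlink(missing_ok=True)   # tolerate concurrent deletion
    return fasta_path
```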

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lorenzoruggerii and others added 13 commits May 8, 2026 09:35
…GV. Add adaptive downsampling with max pooling for 1bp tracks (ChromBPNet), per-bin size calculation based on native resolution, IGV windowFunction hints, and CDF fallback for models without per-bin distributions (LegNet)
Resolves conflict in chorus/analysis/normalization.py:
- Adopt Lorenzo's _match_track_id / _find_matching_cdf helpers
- Extend _match_track_id to also strip CHIP strand suffix (:+/:-)
  so per-strand track IDs match merged CDF rows
- Move _has_samples guard inside _find_matching_cdf so failed-build
  perbin rows fall through to summary CDF instead of saturating
- Set perbin_floor_rescale_batch max_value default to 3.0 (matches
  _DISPLAY_MAX in IGV)

Lorenzo's other files (_igv_report.py, multi_oracle_report.py, scripts/
regenerate_*.py, example artefacts) auto-merged.
…tebooks

Single source of truth for normalization semantics: every renderer now
goes through `chorus.analysis._igv_report.rescale_for_display()`. By
default (no extra params) all four paths produce CDF-rescaled output —
1.0 = genome-wide p99, 3.0 cap, signed layers symmetric around 0.

Key changes
- New `rescale_for_display(values, layer, normalizer, oracle_name,
  assay_id) → (out, cfg)` returns rescaled values + display config
  (ymin, ymax, signed flag) usable by any renderer.
- `apply_floor_rescale` (the IGV ref/alt wrapper) now returns a
  4-tuple (rescaled, ref, alt, signed) so callers can pick symmetric
  vs unsigned scale_cfg.
- New `signed_floor_rescale_batch` rescales signed values to
  [-DISPLAY_MAX, +DISPLAY_MAX] using p99(|cdf|) so Borzoi RNA / Sei /
  LentiMPRA repressive effects are visible (was clipped to 0 before).
- `is_signed()` and `_match_track_id()` now share fuzzy track-id
  matching incl. CHIP `:+`/`:-` strand suffix stripping, so LegNet
  (`LentiMPRA:HepG2` → CDF row `HepG2`) correctly registers as signed.
- `OraclePrediction.add()` backfills `track.assay_id` from the dict
  key so CoolBox/matplotlib autoload paths can find the right CDF row
  on tracks that left assay_id None (notably ChromBPNet).
- CoolBox `get_coolbox_representation()` and matplotlib
  `render_track_figures()` now auto-load the per-track normalizer
  from `~/.chorus/backgrounds/` when called with no kwargs; pass
  `normalize=False` to opt out for raw values.
- `ChromBPNetOracle.predict_sliding()` slides the 2114-bp model across
  arbitrary intervals with cigar substitutions preserved, so the
  multi-oracle IGV panel covers the full AlphaGenome 1 Mb locus
  instead of 0.2 % of it.  `_predict()` auto-routes wide queries to
  it (PR #79's wider region had been triggering a pre-existing
  IndexError in `_predict_direct`'s sliding formula).
- `_calculate_track_bin_size` uses `(20, "max")` for ChromBPNet (was
  `(20, "mean")` — code/description mismatch in PR #79); max-pool
  preserves 1-bp peak heights instead of diluting them by 20x.
- Lower per-layer floors so peaks have visible base/shoulder:
  `chromatin_accessibility 0.95→0.90`, `promoter_activity 0.95→0.85`.
- Causal-report IGV (`causal._build_causal_igv`) now goes through the
  same helper as variant + multi-oracle reports.

Lorenzo's PR #79 changes preserved
- `_match_track_id` / `_find_matching_cdf` (with CHIP-strand fuzzy
  match added on top), `_calculate_track_bin_size`,
  `windowFunction: "max"` IGV hint, `(per-track norm)` LegNet label
  suffix, `get_max_output_size()` multi-oracle region width.

Tests: 376 passed, 1 skipped (env-gating), 5 deselected (integration).
Updated `test_perbin_none_for_scalar_oracles` (perbin → summary
fallback now succeeds for LegNet) and `test_apply_floor_rescale_passthrough`
(4-tuple).  Added `test_rescale_for_display_unified_helper`.

Annotations directory + screenshot sweeps gitignored.
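The symmetric signed rescale described above can be sketched as (hypothetical simplification of `signed_floor_rescale_batch`; function and variable names here are illustrative):

```python
import numpy as np

DISPLAY_MAX = 3.0

def signed_rescale_sketch(values, cdf_samples):
    """Sketch of the signed rescale: p99(|cdf|) becomes the unit so
    repressive (negative) effects map into [-3, +3] symmetrically
    instead of being clipped at 0."""
    unit = np.percentile(np.abs(cdf_samples), 99)
    out = np.asarray(values, dtype=float) / max(unit, 1e-9)
    return np.clip(out, -DISPLAY_MAX, DISPLAY_MAX)
```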

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Output of the code unification — same model predictions, new IGV
display semantics:
- chrombpnet panel: max-pool over 20 bp/bin so peaks have body, not
  just spikes
- legnet panel: symmetric [-3, +3] scale shows repressive (negative)
  half of LentiMPRA effects that was clipped to 0 before
- chromatin_accessibility floor lowered p95 → p90, promoter_activity
  p95 → p85 so peak base / shoulder is visible

Variant scoring (Effect/Activity percentile values) is unchanged —
the build script's window matches the live scoring window; only the
display layer was touched.
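The max-pool vs mean-pool difference for the chrombpnet panel can be sketched as (hypothetical helper illustrating the effect of the `_calculate_track_bin_size` fix):

```python
import numpy as np

def downsample(values, bin_size=20, agg="max"):
    """Sketch of per-bin downsampling: ChromBPNet's 1-bp tracks are
    max-pooled into 20-bp display bins so a single 1-bp peak keeps
    its height instead of being diluted 20x by mean-pooling."""
    values = np.asarray(values, dtype=float)
    n = (len(values) // bin_size) * bin_size
    binned = values[:n].reshape(-1, bin_size)
    return binned.max(axis=1) if agg == "max" else binned.mean(axis=1)
```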

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents what was tested, what passed, and one deferred follow-up
(DHS-augmented chrombpnet CDF needs to be rebuilt for all 786 tracks
incl. BPNet/CHIP before uploading to HuggingFace).

Also fixes two README stale claims that survived the unification work:
display-rescale range is [0, 3.0] not [0, 1.5] (matches _DISPLAY_MAX
in _igv_report.py).  Adds a row for the new signed-layer symmetric
[-3, +3] rescale semantics.

Re-regenerates SORT1 chrombpnet + multi-oracle artefacts against the
HF-shipped 786-track CDF (the production CDF every fresh install
gets) — drops the local DHS-only 42-track CDF that the previous
regen run had used.  SORT1 chrombpnet effect under HF CDF:
+0.318 log2FC, ≥99th %ile, Activity %ile 0.603 (same qualitative
interpretation as the local-DHS run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…et CDF

Augments the chrombpnet/BPNet CDF build with DHS-vocabulary-anchored
samples in addition to the random-genome reservoirs:

- ``--n-dhs-variants`` (default 10000): SNPs at random offsets within
  ±150 bp of Meuleman DHS peak summits, added to the effect CDF
  reservoir alongside the existing ~10 K random SNPs.
- ``--n-dhs-peaks`` (default 5000): DHS summit positions added to the
  baseline (activity) and per-bin reservoirs alongside the existing
  random/cCRE/TSS positions.

Hooks live in ``build_all_models()`` so both ATAC_DNASE (42 models)
and CHIP (1259 BPNet/JASPAR models) build paths pick up the DHS
augmentation when the script is run with ``--assay all``.

Requires ``annotations/dhs_vocabulary_hg38.txt.gz`` (Meuleman 2020
hg38 DHS Index, ~90 MB).  Download with::

    gdown --id 16wbuNmHnwsek3USWM04nR535vPavNZka \
          -O annotations/dhs_vocabulary_hg38.txt.gz

Run with shards on a multi-GPU host (the canonical 6-shard split was
used for the existing 786-track HF release; see PR #53).  Single-GPU
runs are also supported but take ~6 days serial on M3 Ultra Metal.
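The DHS-anchored variant sampling can be sketched like this (hypothetical helper; the real build path is driven by `--n-dhs-variants`, and `summits` here is assumed to be a list of `(chrom, pos)` tuples parsed from the Meuleman index):

```python
import random

BASES = "ACGT"

def sample_dhs_anchored_snps(summits, n, offset=150, seed=0):
    """Sketch of DHS-anchored sampling: SNPs at random offsets within
    +/-150 bp of DHS peak summits, added to the effect-CDF reservoir."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        chrom, pos = rng.choice(summits)
        p = pos + rng.randint(-offset, offset)
        ref, alt = rng.sample(BASES, 2)  # two distinct bases
        out.append((chrom, p, ref, alt))
    return out
```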

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ChromBPNet/BPNet CDF rebuild pipeline (and any caller of
``load_dhs_vocabulary()``) now auto-fetches the Meuleman et al. 2020
DHS Index from
``huggingface.co/datasets/lucapinello/chorus-backgrounds`` if not
already cached at ``annotations/dhs_vocabulary_hg38.txt.gz``.  The
mirror is a verbatim copy of the original Meuleman distribution
(sha256 ``0a4d2150…1c1c48``, 86.6 MB).

Why: a multi-GPU CDF rebuild needs every shard to consume
byte-identical input.  The previous gdown step worked but required a
manual download per machine and depended on Google Drive remaining
reachable.  Mirroring to HF gives every chorus install (every shard
of every rebuild, every fresh-clone audit) the same input without
extra steps.

Falls back to a clear error pointing at the manual gdown command if
``huggingface_hub`` isn't installed.
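The auto-fetch-with-fallback flow can be sketched as (hypothetical simplification of `load_dhs_vocabulary()`'s download path; the repo id, filename, and gdown id come from this commit message):

```python
import shutil
from pathlib import Path

def fetch_dhs_vocabulary(dest="annotations/dhs_vocabulary_hg38.txt.gz"):
    """Sketch: return the cached DHS index, auto-fetching the HF
    mirror on a miss, with a clear error if huggingface_hub is absent."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already cached locally
    try:
        from huggingface_hub import hf_hub_download
    except ImportError:
        # Fall back to a clear error pointing at the manual gdown step.
        raise RuntimeError(
            "huggingface_hub not installed; fetch manually with:\n"
            "  gdown --id 16wbuNmHnwsek3USWM04nR535vPavNZka "
            f"-O {dest}"
        )
    cached = hf_hub_download(
        repo_id="lucapinello/chorus-backgrounds",
        filename=dest.name,
        repo_type="dataset",
    )
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)  # leave the HF cache copy intact
    return dest
```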

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expands the "Adding a new oracle" walkthrough's CDF build step from
a single 35-line code stub into a per-layer recipe so external
contributors don't have to reverse-engineer what to sample for which
layer.

New subsections under "Step 4: Write the CDF build script":

- **What goes in each CDF** — what effect_cdfs / summary_cdfs /
  perbin_cdfs each capture and target sample counts (~20K / ~30-35K
  / ~1M).

- **Reservoir sampling in practice** — points at the canonical
  ReservoirSampler in build_backgrounds_chrombpnet.py for bounded
  memory + ``to_cdf_matrix(n_points=10000)``.

- **What to sample, layer by layer** — concrete recipes for all
  eight LAYER_CONFIGS layers, each with:
    * the right scoring window / formula / signed flag
    * which positions to sample for the variant reservoir (random
      vs DHS-anchored vs splice-site vs etc.)
    * which positions for the baseline reservoir (random vs cCRE
      vs DHS vs TSS vs exon midpoints vs splice sites)
    * pointers to the canonical example build script for that layer
      (chrombpnet for chromatin/ChIP-TF/histone, borzoi for RNA exon-
      based, alphagenome for CAGE-with-TSS routing, legnet for
      element-level signed, sei for sequence-class signed)

- **Common pitfalls** — five gotchas we've actually hit (all-random
  baseline producing inflated percentiles; missing signed_flags for
  RNA/MPRA/Sei silently clipping repressive halves; build-vs-live
  scoring window mismatch; perbin samples drawn only from peak
  centers; chromosome-edge boundary effects).

- **Build script skeleton** — a 50-line template that mixes the
  position sources from chorus.utils.annotations (sample_ccre_positions,
  sample_dhs_positions, get_gene_tss, get_gene_exons, get_screen_ccres)
  showing the canonical pattern for any new oracle.
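The bounded-memory reservoir sampling referenced above can be sketched as (hypothetical simplification of the canonical `ReservoirSampler` in build_backgrounds_chrombpnet.py; this is textbook Algorithm R):

```python
import random

class ReservoirSampler:
    """Sketch: keep a uniform random subset of size k from a stream
    of unknown length in O(k) memory."""

    def __init__(self, k, seed=0):
        self.k = k
        self.n_seen = 0
        self.samples = []
        self._rng = random.Random(seed)

    def add(self, value):
        self.n_seen += 1
        if len(self.samples) < self.k:
            self.samples.append(value)  # reservoir not yet full
        else:
            # Replace an existing sample with probability k / n_seen.
            j = self._rng.randrange(self.n_seen)
            if j < self.k:
                self.samples[j] = value
```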

This is the documentation a new ChromBPNet-class oracle developer
needed to build the 786-track DHS-augmented CDF without copy-pasting
build_backgrounds_chrombpnet.py and editing in the dark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rebuilt the CHIP rows of chrombpnet_pertrack.npz with the new
--n-dhs-variants / --n-dhs-peaks DHS-vocabulary augmentation. Kept the
42 ATAC/DNASE rows from the v29 production NPZ untouched (random-genome
sampling already covers them well; DHS augmentation matters most for
sparse CHIP binding-site data).

Approach:
- Sliced production NPZ to 42-row ATAC/DNASE base
- Ran --assay CHIP --shard-of 2 variants + baselines on ml007 (2× A100-40)
- merge-shards stitched 744 new CHIP rows onto the 42-row base via
  PerTrackNormalizer.append_tracks (dedup on track_id keeps ATAC/DNASE
  intact)

Compute: ~6 h total wall on ml007 alone (variants 2h 51min + baselines
3h 12min). User estimate of 6 days serial on M3 Ultra Metal vs ~6 h on
2× A100 = ~24× faster per model. ml008 + ml003 fanout aborted due to
another user's job on ml008 + V100 cuDNN errors on ml003.

Final NPZ on HF:
  repo:       lucapinello/chorus-backgrounds (dataset)
  HF commit:  47908dcdc36ab13b5cc1edbb1e3aafc0482d4d29
  sha256:     b8f8148453e8285195b77430970a2187ecd8df2d2a2b0074c5a0a68f37cb9906
  size:       78.5 MB
  rows:       786 (42 ATAC/DNASE preserved + 744 new DHS-augmented CHIP)

CHIP rows: effect_counts=18672 (10K random + 8.7K DHS, exact match to
spec), summary_counts=34004, perbin_counts=1088128, signed_flags False,
CDFs monotonic.

Round-trip verified: rm local + get_pertrack_normalizer('chrombpnet')
re-downloads from HF and matches sha byte-for-byte.
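The byte-for-byte round-trip check amounts to hashing the re-downloaded file and comparing against the sha recorded at upload time; a minimal chunked-hash sketch (illustrative helper name):

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk=1 << 20):
    """Sketch of the round-trip verification: stream the NPZ in 1 MiB
    chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()
```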

Stage 6 scorched-earth (wipe envs + reinstall on ml007) was deferred —
needs explicit user confirm before destructive run.

Lessons documented in the audit re: --gpu flag overriding
CUDA_VISIBLE_DEVICES at line 206 of build_backgrounds_chrombpnet.py
(scripts/build_backgrounds_chrombpnet.py).

Audit + numbers: audits/2026-05-09_dhs_chrombpnet_full_rebuild.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "ATAC/DNASE rows still v29 non-DHS" caveat from the GPU-
machine handoff (commit 315a0be) by splicing the local 42-row Mac
ATAC/DNASE rebuild (May 7, sha 896e72f1…b151431) into the 786-row HF
NPZ produced on ml007 today, in-place at the existing track-id
positions.  No further model recompute needed.

The new HF NPZ:
- repo:    huggingface.co/datasets/lucapinello/chorus-backgrounds
- file:    chrombpnet_pertrack.npz
- sha256:  526beb2ce8310f6fdb331f766eac55ce3262b67f1a43416532d8bad8f83183eb
- size:    78.5 MB
- 786 tracks, all effect_counts=18672, summary_counts=34004,
  perbin_counts=1088128 (uniform DHS coverage end-to-end)

Verification against the new NPZ (warm Mac M3 Ultra):
- pytest: 376 passed, 1 skipped, 5 deselected
- SORT1 chrombpnet single-oracle (DNASE:HepG2): +0.318 log2FC,
  %ile=0.96 (down from ≥99th under v29 non-DHS — expected
  conservative shift since the augmented background now includes
  ~8.7K DHS-anchored SNPs per track)
- 18 walkthrough HTMLs programmatically IGV-inspected: 0 issues, all
  panels show real data
- HF round-trip sha matches upload

Multi-oracle SORT1 + single-oracle SORT1 chrombpnet artefacts
regenerated against the uniform NPZ.  Audit doc updated with the
"Follow-up — 2026-05-09" section documenting the splice + verification.

Branch is mergeable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "DHS-augmented chrombpnet CDF — rebuild all 786 tracks then
upload" item is no longer deferred — it shipped on 2026-05-09 via
the hybrid build (744 CHIP on ml007 + local 42 ATAC/DNASE splice).
Update the audit doc so the trail is self-consistent for Lorenzo:
deferred-list item 1 marked ✅ closed, header caveat removed, and
both inline references point to audits/2026-05-09_dhs_chrombpnet_full_rebuild.md
for the sub-audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wiped 8 chorus envs + ~/.chorus + genomes/ + annotations/, then
followed README Steps 1-3 literally as a brand-new user.  All
documented commands worked verbatim, no P0/P1 friction.

Timing on M3 Ultra Metal w/ warm conda cache:
- Step 1 (env create + pip install -e .)  : 1m 54s
- Step 2 (chorus setup --hf-token …)       : 18m 35s
- Step 3 (β-globin SNP via Enformer)       : 49s
- pytest -m "not integration" cold         : 5m 39s, 376 passed
- SORT1 multi-oracle cold                  : 6m 52s
- Total wall                               : ~33 min

Round-trip verifications:
- New uniform-DHS chrombpnet NPZ (sha 526beb2c…f83183eb, 78.5 MB)
  auto-fetched from HF on first install — sha matches upload ✓
- DHS vocabulary auto-fetched via load_dhs_vocabulary()'s HF
  auto-download (verified earlier today)
- All 18 walkthrough HTMLs IGV-inspected, 0 issues, every panel
  has data
- Cross-oracle SORT1 consensus: chrombpnet +1.241 + alphagenome
  +1.336 on chromatin accessibility, both ↑, agree on direction

Branch fix/post-v040-followups is safe to merge into main per this
audit + the two preceding (2026-05-08 PR-79-merge audit, 2026-05-09
DHS chrombpnet rebuild audit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lucapinello changed the title from "fix/feat: CHIP normalization, setup timeout, cleanup command, genome race fix" to "feat: unify track-rescale across IGV/matplotlib/CoolBox/notebooks + uniform DHS-augmented chrombpnet CDF" on May 10, 2026
PR #78 brings substantial changes that justify a minor bump:
- Unified track-rescale across IGV / matplotlib / CoolBox / notebooks
  (single `rescale_for_display()` helper; default-call auto-rescales)
- Symmetric signed rescale for Borzoi RNA / Sei / LentiMPRA
- Uniform DHS-augmented chrombpnet CDF on HuggingFace (786 tracks)
- DHS vocabulary auto-fetch from HF
- ChromBPNet predict_sliding for arbitrary-width regions
- Per-layer CDF-sampling guide for new oracle developers

Verification trail: audits/2026-05-08_post_pr79_merge_audit.md,
audits/2026-05-09_dhs_chrombpnet_full_rebuild.md,
audits/2026-05-09_scorched_earth_fresh_install.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lucapinello merged commit 3311351 into main May 10, 2026
1 check passed