
prep: ChromBPNet CDF rebuild against chrombpnet_nobias (script + handoff) #67

Merged: lucapinello merged 2 commits into main from prep/chrombpnet-cdf-rebuild on Apr 30, 2026


@lucapinello
Contributor

Prep PR for the ChromBPNet CDF rebuild that the 0.3.0 audit (audits/2026-04-28_chrombpnet_slim_mirror/) flagged as a deferred follow-up.

Why

The CDFs on huggingface.co/datasets/lucapinello/chorus-backgrounds were built in 0.2.x against the bias-aware chrombpnet variant. 0.3.0 flipped the default to chrombpnet_nobias, so user-facing percentile lookups now map chrombpnet_nobias predictions through chrombpnet empirical CDFs, and the bias systematically shifts that mapping.
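To make the mismatch concrete, here is a toy sketch of an empirical-CDF percentile lookup. The background scores are invented, and the "bias-aware" background is modeled as the nobias background plus a constant offset, which is a deliberate oversimplification; this is not how chorus computes or stores its CDFs.

```python
from bisect import bisect_right
from typing import Sequence


def empirical_percentile(sorted_background: Sequence[float], score: float) -> float:
    """Percent of background scores <= score (the empirical CDF at score, x100)."""
    if not sorted_background:
        raise ValueError("background must be non-empty")
    return 100.0 * bisect_right(sorted_background, score) / len(sorted_background)


# Toy backgrounds: 100 evenly spaced "nobias" scores, and a "bias-aware"
# background modeled as the same scores shifted by +10 (an invented offset).
nobias_background = [float(x) for x in range(100)]         # 0.0 .. 99.0
biased_background = [x + 10.0 for x in nobias_background]  # 10.0 .. 109.0

query = 50.0  # a score from the nobias model
print(empirical_percentile(nobias_background, query))  # 51.0 (matched CDF)
print(empirical_percentile(biased_background, query))  # 41.0 (mismatched CDF)
```

The same query lands at the 51st percentile against a matched background but the 41st against the shifted one: the kind of systematic shift that rebuilding the CDFs against chrombpnet_nobias is meant to remove.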

What's in this PR

No compute yet. Two prep changes:

  1. scripts/build_backgrounds_chrombpnet.py gets a --model-type argparse flag (default chrombpnet_nobias). Previously the script inherited the oracle's default model_type, which was chrombpnet in 0.2.x. The flag makes 0.3+ runs unambiguous and lets a future maintainer pin --model-type chrombpnet for ablation against the legacy CDF.

  2. audits/2026-04-29_chrombpnet_cdf_rebuild/HANDOFF.md — full agent handoff doc for running the rebuild on a CUDA box. Estimated wall-clock: ~3–5 h ATAC/DNase + ~10–20 h CHIP/BPNet on A100. Includes --shard / --shard-of commands for 2-GPU parallelism, spot-check before upload, and the HfApi upload command for lucapinello/chorus-backgrounds.
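A minimal sketch of what the new flag could look like, assuming the choices are exactly the two variants named in this PR (chrombpnet, chrombpnet_nobias); the real script's choices and help text may differ. `pretrained_model_kwargs` is a hypothetical helper that illustrates the point of the change: model_type is passed explicitly instead of being inherited from the oracle's default.

```python
import argparse

# Assumption: the two ChromBPNet variants named in this PR are the only choices.
MODEL_TYPES = ("chrombpnet", "chrombpnet_nobias")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Build ChromBPNet background CDFs (illustrative sketch)."
    )
    parser.add_argument(
        "--model-type",
        choices=MODEL_TYPES,
        default="chrombpnet_nobias",  # matches the 0.3+ chorus default
        help="ChromBPNet variant to score with (default: %(default)s). "
        "Pin 'chrombpnet' only to ablate against the legacy CDF.",
    )
    return parser


def pretrained_model_kwargs(args: argparse.Namespace, fold: int, spec: dict) -> dict:
    """Hypothetical helper: keyword arguments for load_pretrained_model with
    model_type set explicitly; the flag wins over anything already in spec."""
    return {**spec, "fold": fold, "model_type": args.model_type}


if __name__ == "__main__":
    args = build_parser().parse_args([])
    print(args.model_type)  # chrombpnet_nobias
```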

Why split prep + compute

The CDF rebuild needs ~13–25 h on CUDA, which is not feasible on the macOS dev machine (~78 h on M3 Ultra Metal). Splitting the work separates prep (mergeable now, so maintainers can verify the script and the plan) from compute (runs on the lab box; the output is a single NPZ plus an audit report).
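The handoff's 2-GPU parallelism goes through --shard / --shard-of. The sketch below shows one plausible way such a split can work, assuming 0-indexed shards and a round-robin assignment of tracks; the script's actual indexing and assignment may differ, so check HANDOFF.md before launching. `shard_items`, `merge_shards`, and the track IDs are hypothetical.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def shard_items(items: Sequence[T], shard: int, shard_of: int) -> List[T]:
    """Round-robin split: shard k of n takes items k, k+n, k+2n, ... (0-indexed)."""
    if shard_of < 1 or not 0 <= shard < shard_of:
        raise ValueError(f"need 0 <= shard < shard_of, got {shard}/{shard_of}")
    return list(items)[shard::shard_of]


def merge_shards(parts: Sequence[Dict[str, object]]) -> Dict[str, object]:
    """Combine per-shard results into one mapping (e.g. before writing a single
    NPZ), refusing overlapping keys so a mis-sharded run fails loudly."""
    merged: Dict[str, object] = {}
    for part in parts:
        overlap = merged.keys() & part.keys()
        if overlap:
            raise ValueError(f"tracks computed twice: {sorted(overlap)}")
        merged.update(part)
    return merged


tracks = [f"track_{i}" for i in range(5)]  # hypothetical track IDs
gpu0 = shard_items(tracks, 0, 2)  # ['track_0', 'track_2', 'track_4']
gpu1 = shard_items(tracks, 1, 2)  # ['track_1', 'track_3']
```

Round-robin rather than contiguous blocks keeps the two GPUs roughly balanced when tracks of one assay type are listed together, which matters here because the CHIP/BPNet portion of the estimate is several times longer than the ATAC/DNase portion.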

Test plan

  • mamba run -n chorus python scripts/build_backgrounds_chrombpnet.py --help shows the new --model-type flag with the expected choices and default.
  • Run the actual rebuild on the lab CUDA box per HANDOFF.md (separate PR).
  • Verify the resulting NPZ passes the spot-check criteria in HANDOFF.md (monotone CDFs, all effect_counts > 0, no NaN/Inf).
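The spot-check criteria can be expressed as a short function. This is a sketch that assumes the per-track CDFs form a 2-D array (one row per track, non-decreasing along the last axis) and that the counts are stored under the key effect_counts; the CDF key is left as a parameter because the NPZ layout isn't given here. HANDOFF.md remains the authoritative checklist.

```python
import numpy as np


def spot_check(cdfs, effect_counts) -> list:
    """Return a list of problems; an empty list means every check passed."""
    cdfs = np.asarray(cdfs, dtype=float)
    counts = np.asarray(effect_counts, dtype=float)
    problems = []
    # No NaN/Inf anywhere.
    for name, arr in (("cdfs", cdfs), ("effect_counts", counts)):
        if not np.all(np.isfinite(arr)):
            problems.append(f"{name} contains NaN or Inf")
    # Monotone: every row non-decreasing along the last axis (checked only if finite).
    if np.all(np.isfinite(cdfs)) and np.any(np.diff(cdfs, axis=-1) < 0):
        problems.append("cdfs has a non-monotone row")
    # Every track must have at least one effect sample.
    if np.any(counts <= 0):
        problems.append("effect_counts has entries <= 0")
    return problems


def spot_check_npz(path: str, cdf_key: str) -> list:
    """Load a background NPZ and run spot_check on it (key layout assumed)."""
    with np.load(path) as data:
        return spot_check(data[cdf_key], data["effect_counts"])
```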

🤖 Generated with Claude Code

lp698 and others added 2 commits April 29, 2026 19:48
Prep work for the deferred CDF rebuild against `chrombpnet_nobias`
that the 0.3.0 audit at `audits/2026-04-28_chrombpnet_slim_mirror/`
flagged as a follow-up.

Background: chrombpnet_pertrack.npz on
huggingface.co/datasets/lucapinello/chorus-backgrounds was built in
0.2.x against the bias-aware `chrombpnet` variant. After 0.3.0 flipped
the default to `chrombpnet_nobias` (bias-corrected), user-facing
percentile lookups go: `chrombpnet_nobias` predictions → `chrombpnet`
empirical CDFs. The bias systematically shifts the mapping.

Two changes in this prep commit, no compute yet:

1. scripts/build_backgrounds_chrombpnet.py: add `--model-type` argparse
   flag (default `chrombpnet_nobias`, matches 0.3+ chorus default).
   The build script previously called `oracle.load_pretrained_model(
   fold=args.fold, **spec)` without specifying model_type, so it
   inherited the oracle's default — which was `chrombpnet` in 0.2.x and
   is `chrombpnet_nobias` post-0.3. The new flag makes this explicit:
   re-running today produces the right CDF, and a future maintainer
   can pin `--model-type chrombpnet` for ablation against the legacy
   variant. Help text references the rebuild audit dir.

2. audits/2026-04-29_chrombpnet_cdf_rebuild/HANDOFF.md: agent handoff
   for running the rebuild on a CUDA box. Estimated wall-clock:
   ~3-5 h ATAC/DNase + ~10-20 h CHIP/BPNet on A100 (vs ~78 h on M3
   Ultra Metal — too slow for the macOS dev machine). Includes the
   exact `--shard / --shard-of` commands for 2-GPU parallelism, the
   spot-check before upload, and the upload command (HfApi to
   lucapinello/chorus-backgrounds, dataset repo).

The actual rebuild (compute + upload) gets done on the user's lab CUDA
box in a separate run. This commit is just the prep so the script is
unambiguous and the handoff is documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>