Added
Real-recipe e2e training benchmarks for ColQwen2 and PyLate
(bench_colpali_e2e.py, bench_pylate_e2e.py). Both instrument the loss
head to record per-MaxSim-call VRAM in-train, replay each recorded shape on
an isolated graph (exact forward/saved/backward brackets), and treat OOM as
a recorded sweep outcome rather than a crash; --variant vanilla|lik
toggles the patch, with summarize_*_e2e.py + scripts/sky_*_e2e.yaml
driving the sweep (fresh process per cell). Measured on 1×H100 80 GB:
ColQwen2's MaxSim op costs 7.81 GiB vanilla vs 61 MiB with LIK at B=128
(~130×), step time at parity, and vanilla OOMs at B=128 (a 1.81 GiB request
with 25 GiB reserved-but-unallocated) where LIK trains it — 2× batch
headroom; PyLate (grad-ckpt regime) drops step peak 54.1 → 29.7 GiB at
B=512, runs 1.07–1.12× faster per step, and trains B=1024 where vanilla
OOMs. The ColQwen2 bench targets released colpali-engine 0.3.16 and shims
its two ContrastiveTrainer bugs under transformers 5.x (fixed upstream in colpali#412, unreleased).
Tables in docs/benchmarks.md.
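
The "OOM as a recorded sweep outcome" and "fresh process per cell" ideas can be sketched roughly as below. This is a minimal illustration, not the code in bench_colpali_e2e.py or the sky_*_e2e.yaml jobs: run_cell, sweep, and OOM_MARKER are hypothetical names, and classifying a cell by looking for "out of memory" in the child's stderr is an assumption about how a failing cell might be recognized.

```python
import subprocess
import sys

# Assumption: the child process reports an OOM on stderr, and the text contains
# "out of memory" (as PyTorch's CUDA OOM message does).
OOM_MARKER = "out of memory"


def run_cell(cmd, timeout=None):
    """Run one sweep cell in a fresh process and classify how it ended."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode == 0:
        status = "ok"
    elif OOM_MARKER in proc.stderr.lower():
        status = "oom"    # a recorded outcome; the sweep keeps going
    else:
        status = "error"  # any other non-zero exit
    return {"cmd": cmd, "status": status, "returncode": proc.returncode}


def sweep(cells):
    """Run every cell in its own process so allocator state never leaks between cells."""
    return [run_cell(cmd) for cmd in cells]


if __name__ == "__main__":
    # Stand-in child commands; a real sweep would launch the benchmark script instead.
    cells = [
        [sys.executable, "-c", "print('trained')"],
        [sys.executable, "-c", "import sys; sys.exit('CUDA out of memory (simulated)')"],
    ]
    for row in sweep(cells):
        print(row["status"], row["returncode"])
```

Because each cell runs in its own process, a cell that dies from an allocation failure leaves no reserved GPU memory behind for the next one, so one OOM cannot skew later measurements.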
Changed
patch_pylate() / patch_colpali_engine() defer to the native LIK
backends. PyLate ≥ 1.5.1 (pylate#222)
and colpali-engine ≥ 0.3.17 (colpali#412)
now ship their own LIK dispatch (pip install "pylate[lik]" / "colpali-engine[lik]", via auto / PYLATE_SCORES_BACKEND / COLPALI_SCORES_BACKEND). On those versions the patches are deprecated
no-ops that detect native support by package version and step aside (patching
PyLate would also break ColBERTScores, which forwards backend=); older
versions are unaffected. The native backends call maxsim / maxsim_pairs
/ maxsim_mps by keyword, so those signatures are now pinned by a test.
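
The "detect native support by package version and step aside" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the actual patch_pylate() / patch_colpali_engine() code: parse_release, has_native_backend, and patch_if_needed are hypothetical helpers, and the version parsing is deliberately simplified (it ignores pre-release ordering, so 1.5.1rc1 would count as 1.5.1).

```python
import re
import warnings
from importlib import metadata


def parse_release(version):
    """Turn '1.5.1', '0.3.17.dev0' or '1.5.1+cu121' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_native_backend(dist_name, minimum, installed=None):
    """True if the installed distribution is at least `minimum` (a tuple of ints)."""
    if installed is None:
        try:
            installed = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return False
    return parse_release(installed) >= minimum


def patch_if_needed(dist_name, minimum, apply_patch, installed=None):
    """Apply the monkeypatch only on versions that lack a native backend."""
    if has_native_backend(dist_name, minimum, installed):
        warnings.warn(
            f"{dist_name} ships a native backend; the patch is a deprecated no-op",
            DeprecationWarning,
            stacklevel=2,
        )
        return False  # step aside, nothing was patched
    apply_patch()
    return True


if __name__ == "__main__":
    # `installed` is passed explicitly here so the demo runs without PyLate installed.
    print(patch_if_needed("pylate", (1, 5, 1), lambda: None, installed="1.4.0"))
```

Gating on the package version rather than probing for attributes keeps the check cheap and avoids importing the scoring modules just to decide whether to patch.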
benchmarks/ is grouped per comparison stack — kernels/ (incl. the
platform-specific bench_mps.py), plaid/, colpali/, and pylate/, each
e2e bench next to its summarizer. Pure moves: --only tags and JSON output
names are unchanged, so existing results stay comparable. bench_lateon.py
→ kernels/bench_longdoc.py (the value is the long-document regime, Ld up
to 16 384), and the sky_run_all_benchmarks.yaml RUN_ONLY tag lateon → longdoc.
Fixed
patch_pylate() works on PyLate 1.5 again. 1.5 renamed the scoring
module (pylate.scores.scores → pylate.scores.colbert) and rerouted the
contrastive losses through ColBERTScores; the patch now detects the
layout, patches the defining module (covering the loss path), and rewrites
only Distillation's import-time capture on 1.5. The pylate extra's >=1.3.3,<2 range is accurate again — no more 1.3.3 pin.
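
Layout detection of this kind can be sketched with importlib, trying the newer module path first and falling back to the older one. The two module names come from the entry above; find_first_module and PYLATE_SCORE_MODULES are hypothetical names, and the real patch may detect the layout differently.

```python
import importlib.util


def find_first_module(candidates):
    """Return the first dotted module name that can be located, or None."""
    for name in candidates:
        try:
            # Note: for a dotted name, find_spec imports the parent packages.
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised when a parent package is missing or fails to import.
            spec = None
        if spec is not None:
            return name
    return None


# Newest layout first (1.5 renamed pylate.scores.scores to pylate.scores.colbert).
PYLATE_SCORE_MODULES = ("pylate.scores.colbert", "pylate.scores.scores")

if __name__ == "__main__":
    # Prints None when PyLate is not installed.
    print(find_first_module(PYLATE_SCORE_MODULES))
```

Patching the module that defines the scoring function, rather than each importer, is what lets a single patch also cover callers that look the function up at call time; only names captured at import time need to be rewritten separately.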
Removed
The previous e2e training benches (bench_colpali_training.py, bench_colpali_realdata.py, bench_pylate_training.py, bench_pylate_realdata.py, bench_pylate_lateon.py), their shared _bench_common.py, and the sky_colpali_benchmark.yaml / sky_pylate_benchmark.yaml jobs — superseded by the e2e harnesses above
(bench_colpali_loss.py is kept; historical numbers stay in docs/benchmarks.md). Also removed are four stale one-offs: bench_backward_0_5.py, bench_fastplaid.py, bench_training.py, and the autotune-persistence
reproducer (scripts/_bench_autotune_persistence.py + scripts/sky_bench_autotune_persistence.yaml).