v0.4.2: Native LIK in PyLate and colpali-engine

@tonywu71 released this 09 Jun 13:29
0a89a6b

Added

  • Real-recipe e2e training benchmarks for ColQwen2 and PyLate
    (bench_colpali_e2e.py, bench_pylate_e2e.py). Both instrument the loss
    head to record the VRAM of each MaxSim call during training, replay each
    recorded shape on an isolated graph (exact forward/saved/backward
    brackets), and record OOM as a sweep outcome rather than a crash.
    --variant vanilla|lik toggles the patch; summarize_*_e2e.py and
    scripts/sky_*_e2e.yaml drive the sweep, with a fresh process per cell.
    Measured on 1×H100 80 GB:
    ColQwen2: the MaxSim op costs 7.81 GiB vanilla vs 61 MiB with LIK at
    B=128 (~130×), with step time at parity. Vanilla OOMs at B=128 (a
    1.81 GiB request with 25 GiB reserved but unallocated) where LIK trains
    it, giving 2× batch headroom.
    PyLate (gradient-checkpointing regime): step peak drops from 54.1 to
    29.7 GiB at B=512, steps run 1.07–1.12× faster, and LIK trains B=1024
    where vanilla OOMs.
    The ColQwen2 bench targets the released colpali-engine 0.3.16 and shims
    its two ContrastiveTrainer bugs under transformers 5.x (fixed upstream
    in colpali#412, not yet released). Full tables are in docs/benchmarks.md.
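The "OOM is a recorded outcome, not a crash" idea can be sketched as below. This is a minimal illustration, not the harness's code: `run_sweep`, `CellResult`, and `toy_step` are hypothetical names, and the exception to treat as OOM is a parameter so the sketch runs without a GPU (on PyTorch one would pass `torch.cuda.OutOfMemoryError`). The real sweep also runs each cell in a fresh process, which this sketch does not.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Type


@dataclass
class CellResult:
    variant: str
    batch_size: int
    status: str                       # "ok" or "oom"
    peak_gib: Optional[float] = None  # set only when the step completed


def run_sweep(
    train_step: Callable[[str, int], float],
    variants: List[str],
    batch_sizes: List[int],
    oom_error: Type[BaseException] = MemoryError,
) -> List[CellResult]:
    """Run every (variant, batch_size) cell, recording OOM as an outcome.

    train_step is a hypothetical callable that runs one training step and
    returns its peak memory in GiB, raising oom_error when memory runs out.
    """
    results = []
    for variant in variants:
        for bs in batch_sizes:
            try:
                peak = train_step(variant, bs)
            except oom_error:
                # An OOM is a data point for the sweep, not a reason to abort it.
                results.append(CellResult(variant, bs, "oom"))
            else:
                results.append(CellResult(variant, bs, "ok", peak))
    return results


def toy_step(variant: str, bs: int) -> float:
    # Toy memory model invented for illustration; not measured data.
    if variant == "vanilla" and bs > 64:
        raise MemoryError
    return bs / 16


if __name__ == "__main__":
    for r in run_sweep(toy_step, ["vanilla", "lik"], [64, 128]):
        print(r.variant, r.batch_size, r.status)
```

With the toy model, the vanilla B=128 cell is recorded as "oom" while the other three cells complete, and the sweep carries on instead of dying on the first failure.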

Changed

  • patch_pylate() / patch_colpali_engine() now defer to the native LIK
    backends. PyLate ≥ 1.5.1 (pylate#222) and colpali-engine ≥ 0.3.17
    (colpali#412) ship their own LIK dispatch (install with
    pip install "pylate[lik]" / "colpali-engine[lik]"; select it via auto or
    PYLATE_SCORES_BACKEND / COLPALI_SCORES_BACKEND). On those versions the
    patches are deprecated no-ops: they detect native support from the
    package version and step aside. Patching PyLate there would also break
    ColBERTScores, which forwards backend=. Older versions are unaffected.
    The native backends call maxsim / maxsim_pairs / maxsim_mps by keyword,
    so those signatures are now pinned by a test.
  • benchmarks/ is now grouped by comparison stack: kernels/ (including the
    platform-specific bench_mps.py), plaid/, colpali/, and pylate/, with each
    e2e bench next to its summarizer. These are pure moves: --only tags and
    JSON output names are unchanged, so existing results stay comparable.
    bench_lateon.py is renamed to kernels/bench_longdoc.py (its value is the
    long-document regime, Ld up to 16 384), and the
    sky_run_all_benchmarks.yaml RUN_ONLY tag lateon becomes longdoc.

Fixed

  • patch_pylate() works on PyLate 1.5 again. 1.5 renamed the scoring
    module (pylate.scores.scores → pylate.scores.colbert) and rerouted the
    contrastive losses through ColBERTScores; the patch now detects the
    layout, patches the defining module (which covers the loss path), and on
    1.5 rewrites only Distillation's import-time capture. The pylate extra's
    >=1.3.3,<2 range is accurate again; the 1.3.3 pin is gone.
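The two mechanics behind this fix can be sketched as below, assuming nothing about the patch's real code: `find_defining_module` and `demo_import_time_capture` are hypothetical helpers, and only the two module paths come from this note. The first probes which scoring-module layout is installed; the second uses two throwaway modules to show why a name bound by `from x import y` at import time (as Distillation's capture is described here) keeps the old function unless it is rewritten as well.

```python
import importlib.util
import types
from typing import Optional, Sequence, Tuple

# Where PyLate's scoring module lives, newest layout first
# (per these notes, 1.5 renamed pylate.scores.scores to pylate.scores.colbert).
PYLATE_SCORE_MODULES = ("pylate.scores.colbert", "pylate.scores.scores")


def find_defining_module(candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate module that can be found, else None.

    find_spec imports parent packages but not the module itself; a missing
    parent package raises ModuleNotFoundError, which is treated as "absent".
    """
    for name in candidates:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            continue
        if spec is not None:
            return name
    return None


def demo_import_time_capture() -> Tuple[str, str]:
    """Why patching the defining module is not always enough."""
    defining = types.ModuleType("defining")
    defining.score = lambda: "vanilla"
    # `from defining import score` copies the reference into the importer.
    consumer = types.ModuleType("consumer")
    consumer.score = defining.score
    defining.score = lambda: "lik"  # patch only the defining module
    return defining.score(), consumer.score()  # consumer still sees "vanilla"


if __name__ == "__main__":
    print(find_defining_module(PYLATE_SCORE_MODULES))  # None if PyLate is absent
    print(demo_import_time_capture())
```

Callers that look the function up on the module at call time see the patch automatically; only copies made at import time need the extra rewrite.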

Removed

  • The previous e2e training benches (bench_colpali_training.py,
    bench_colpali_realdata.py, bench_pylate_training.py,
    bench_pylate_realdata.py, bench_pylate_lateon.py), their shared
    _bench_common.py, and the sky_colpali_benchmark.yaml /
    sky_pylate_benchmark.yaml jobs, all superseded by the e2e harnesses
    above. bench_colpali_loss.py is kept, and historical numbers stay in
    docs/benchmarks.md. Also removed are four stale one-offs:
    bench_backward_0_5.py, bench_fastplaid.py, bench_training.py, and the
    autotune-persistence reproducer (scripts/_bench_autotune_persistence.py +
    scripts/sky_bench_autotune_persistence.yaml).