v0.7.0

@smirnovlad released this 04 May 10:07
6d3a574

What's New

New Features

  • Single-pass native hidden-state (HS) capture with prefix-cache fill and multi-step accumulator (#453)
    • Hook-based capture inside VLLMWithUncertainty._raw_generate (replaces the prior re-encode pass)
    • Two-phase _fill_prefix_gaps:
      • Phase 1 (within-group): copies missing prefix from the highest-prefill donor across requests with identical prompts
      • Phase 2 (cross-group): token-level longest-common-prefix against a global donor, which handles vLLM's automatic prefix caching across distinct prompts
    • Multi-step accumulator: correct per-request hidden states across iterative TTC strategies (beam search, MUR, phi-decoding) where capture spans multiple generate() calls
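
The cross-group fill described above can be illustrated with a minimal sketch. This is not the lm-polygraph implementation: `common_prefix_len` and `fill_prefix_gap` are hypothetical helpers, and the sketch assumes that a request whose prompt prefix was served from the prefix cache captures hidden states only for its trailing tokens, so the leading positions have to be copied from a donor request that shares those tokens.

```python
from typing import List, Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common token prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def fill_prefix_gap(
    tokens: Sequence[int],
    captured: Sequence[object],
    donor_tokens: Sequence[int],
    donor_states: Sequence[object],
) -> Optional[List[object]]:
    """Hypothetical helper: complete `captured` with donor states.

    Assumption: `captured` holds one state per token for the *last*
    len(captured) positions of `tokens`; the leading positions are missing.
    Returns the completed per-token list, or None if the donor cannot
    cover the gap.
    """
    missing = len(tokens) - len(captured)
    if missing <= 0:
        return list(captured)  # nothing to fill
    shared = common_prefix_len(tokens, donor_tokens)
    # The donor is usable only if it shares every missing position
    # and actually holds states for all of them.
    if shared < missing or len(donor_states) < missing:
        return None
    return list(donor_states[:missing]) + list(captured)
```

Under the same assumption, the within-group phase is the special case where the donor has an identical prompt, so the shared prefix always covers the whole prompt and only the donor's own captured length can limit the fill.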

Compatibility

  • Downstream packages using VLLMWithUncertainty for hidden-state attribution should pin lm-polygraph>=0.7.0. The HS-capture contract has expanded to include per-request attribution and prefix-cache-aware filling.
  • ThinkBooster's hook_hs_extension v2 requires this release.
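
A downstream package could also guard against an older install at import time. The following is a rough sketch, assuming the distribution name is `lm-polygraph` as written above and that the installed version is a plain `X.Y.Z` string; `meets_minimum` and `check_installed` are hypothetical helper names, not part of any released API.

```python
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed: str, minimum: str = "0.7.0") -> bool:
    """Compare plain numeric X.Y.Z versions (assumes no pre-release suffix)."""
    def parse(v: str) -> tuple:
        return tuple(int(part) for part in v.split(".")[:3])
    return parse(installed) >= parse(minimum)


def check_installed() -> bool:
    """Return True if an lm-polygraph install satisfying >=0.7.0 is present."""
    try:
        return meets_minimum(version("lm-polygraph"))
    except PackageNotFoundError:
        return False
```

Declaring `lm-polygraph>=0.7.0` in the package's dependency metadata remains the primary mechanism; a runtime check like this only produces a clearer error when that constraint is bypassed.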

Full Changelog: v0.6.0...v0.7.0