
fix: normalize past_key_values across transformers DynamicCache API (#210)#246

Open
ousamabenyounes wants to merge 1 commit into microsoft:main from ousamabenyounes:fix/issue-210

Conversation

@ousamabenyounes

What does this PR do?

Fixes #210

llmlingua's iterative_compress_prompt relies on past_key_values being a tuple of (k, v) pairs per layer: it reads past_key_values[0][0].shape[2] to get the cached length and iterates for k, v in past_key_values in four separate places. Since transformers 4.36 the model returns a DynamicCache instead, which is why issue #210 shows AttributeError: 'list' object has no attribute 'get_seq_length' and why the existing baseline tests fail with ValueError: too many values to unpack (expected 2) in iterative_compress_prompt.
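The unpacking failure is easy to reproduce without transformers at all: the legacy loop assumes each layer entry is exactly a (k, v) pair, and anything else breaks it (a toy illustration, not llmlingua's actual code):

```python
# Toy illustration of the legacy assumption in iterative_compress_prompt.
legacy = ((("k0",), ("v0",)),)       # tuple of (key, value) pairs: unpacks fine
for k, v in legacy:
    pass

not_pairs = [("k0", "v0", "extra")]  # anything else breaks the unpacking
try:
    for k, v in not_pairs:
        pass
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)
```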

This PR normalizes past_key_values to the legacy tuple format after every model forward pass (via DynamicCache.to_legacy_cache() on older transformers, and via DynamicCache.layers on transformers >= 5.0 where to_legacy_cache was removed) and converts it back to a DynamicCache (via DynamicCache.from_legacy_cache() or the ddp_cache_data= constructor arg on newer versions) right before the next forward pass. The rest of the slicing/iteration code in iterative_compress_prompt is untouched.
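A minimal sketch of the two-way conversion described above. The helper names (cache_to_legacy, legacy_to_cache) and the per-layer attribute names (layer.keys, layer.values) are illustrative assumptions, not the PR's exact code:

```python
# Sketch of the normalization helpers described in the PR (assumed names).

def cache_to_legacy(past_key_values):
    """Normalize to the legacy tuple-of-(key, value)-per-layer format."""
    if past_key_values is None or isinstance(past_key_values, tuple):
        return past_key_values  # already legacy (or empty)
    if hasattr(past_key_values, "to_legacy_cache"):
        return past_key_values.to_legacy_cache()  # transformers < 5.0
    # transformers >= 5.0: to_legacy_cache was removed; rebuild the tuple
    # from the per-layer tensors exposed via DynamicCache.layers
    # (attribute names here are assumptions)
    return tuple((layer.keys, layer.values) for layer in past_key_values.layers)

def legacy_to_cache(legacy, cache_cls):
    """Convert legacy tuples back to a cache object before the next forward pass."""
    if hasattr(cache_cls, "from_legacy_cache"):
        return cache_cls.from_legacy_cache(legacy)  # transformers < 5.0
    return cache_cls(ddp_cache_data=legacy)  # newer constructor arg per the PR
```

Keeping the conversion at the forward-pass boundary means the slicing/iteration code inside iterative_compress_prompt never has to know which transformers version is installed.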

Before submitting

Verification

  • Baseline (before this PR): 2 tests pass, 3 tests fail with ValueError: too many values to unpack (expected 2) in iterative_compress_prompt at llmlingua/prompt_compressor.py:1659.
  • Post-fix: 13 tests pass (5 existing + 8 new regression tests in tests/test_issue_210.py), 0 regressions.

Tested against transformers 5.5.3 (latest on PyPI), which is what triggered the original issue. The helpers fall back to the legacy to_legacy_cache/from_legacy_cache methods when present, so older transformers releases keep working too.

Generated by Claude Code
Vibe coded by ousamabenyounes

…icrosoft#210)

Older llmlingua code assumed past_key_values was a tuple of (k, v) pairs.
Transformers >= 4.36 returns a DynamicCache object, breaking the
`for k, v in past_key_values` loops and the `past_key_values[0][0].shape[2]`
access in iterative_compress_prompt with `AttributeError: 'list' object has
no attribute 'get_seq_length'` (as reported in microsoft#210) or the unpacking
ValueError that the test suite catches.

Normalize to the legacy tuple format after every model forward pass and
convert back to DynamicCache before any call that feeds it to the model.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[Question]: error when compressing long prompt with LLMLingua1 & LongLLMLingua
