
fix: normalize past_key_values across transformers DynamicCache API (#210)#246

Open
ousamabenyounes wants to merge 1 commit into microsoft:main from ousamabenyounes:fix/issue-210

Conversation

@ousamabenyounes

What does this PR do?

Fixes #210

llmlingua's iterative_compress_prompt relies on past_key_values being a tuple of (k, v) pairs per layer: it reads past_key_values[0][0].shape[2] to get the cached length and iterates for k, v in past_key_values in four separate places. Since transformers 4.36 the model returns a DynamicCache instead, which is why issue #210 shows AttributeError: 'list' object has no attribute 'get_seq_length' and why the existing baseline tests fail with ValueError: too many values to unpack (expected 2) in iterative_compress_prompt.
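The unpacking failure is easy to reproduce without transformers at all: the legacy loop assumes each layer entry is exactly a (k, v) pair, and anything else breaks it (a toy illustration, not llmlingua's actual code):

```python
# Toy illustration of the legacy assumption in iterative_compress_prompt.
legacy = ((("k0",), ("v0",)),)       # tuple of (key, value) pairs: unpacks fine
for k, v in legacy:
    pass

not_pairs = [("k0", "v0", "extra")]  # anything else breaks the unpacking
try:
    for k, v in not_pairs:
        pass
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)
```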

This PR normalizes past_key_values to the legacy tuple format after every model forward pass (via DynamicCache.to_legacy_cache() on older transformers, and via DynamicCache.layers on transformers >= 5.0 where to_legacy_cache was removed) and converts it back to a DynamicCache (via DynamicCache.from_legacy_cache() or the ddp_cache_data= constructor arg on newer versions) right before the next forward pass. The rest of the slicing/iteration code in iterative_compress_prompt is untouched.
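A minimal sketch of the two-way conversion described above. The helper names (cache_to_legacy, legacy_to_cache) and the per-layer attribute names (layer.keys, layer.values) are illustrative assumptions, not the PR's exact code:

```python
# Sketch of the normalization helpers described in the PR (assumed names).

def cache_to_legacy(past_key_values):
    """Normalize to the legacy tuple-of-(key, value)-per-layer format."""
    if past_key_values is None or isinstance(past_key_values, tuple):
        return past_key_values  # already legacy (or empty)
    if hasattr(past_key_values, "to_legacy_cache"):
        return past_key_values.to_legacy_cache()  # transformers < 5.0
    # transformers >= 5.0: to_legacy_cache was removed; rebuild the tuple
    # from the per-layer tensors exposed via DynamicCache.layers
    # (attribute names here are assumptions)
    return tuple((layer.keys, layer.values) for layer in past_key_values.layers)

def legacy_to_cache(legacy, cache_cls):
    """Convert legacy tuples back to a cache object before the next forward pass."""
    if hasattr(cache_cls, "from_legacy_cache"):
        return cache_cls.from_legacy_cache(legacy)  # transformers < 5.0
    return cache_cls(ddp_cache_data=legacy)  # newer constructor arg per the PR
```

Keeping the conversion at the forward-pass boundary means the slicing/iteration code inside iterative_compress_prompt never has to know which transformers version is installed.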

Before submitting

Verification

  • Baseline (before this PR): 2 tests pass, 3 tests fail with ValueError: too many values to unpack (expected 2) in iterative_compress_prompt at llmlingua/prompt_compressor.py:1659.
  • Post-fix: 13 tests pass (5 existing + 8 new regression tests in tests/test_issue_210.py), 0 regressions.

Tested against transformers 5.5.3 (latest on PyPI), which is what triggered the original issue. The helpers fall back to the legacy to_legacy_cache/from_legacy_cache methods when present, so older transformers releases keep working too.

Generated by Claude Code
Vibe coded by ousamabenyounes

…icrosoft#210)

Older llmlingua code assumed past_key_values was a tuple of (k, v) pairs.
Transformers >= 4.36 returns a DynamicCache object, breaking the
`for k, v in past_key_values` loops and the `past_key_values[0][0].shape[2]`
access in iterative_compress_prompt with `AttributeError: 'list' object has
no attribute 'get_seq_length'` (as reported in microsoft#210) or the unpacking
ValueError that the test suite catches.

Normalize to the legacy tuple format after every model forward pass and
convert back to DynamicCache before any call that feeds it to the model.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[Question]: error when compressing long prompt with LLMLingua1 & LongLLMLingua
