Fix vLLM v0.19 MLA merge validation and CacheOnly KV cache registration #76
Merged
yubofredwang merged 1 commit into main on Apr 15, 2026
Conversation
Add two new patches to the vLLM v0.19.0 patch set:
- MLAAttentionSpec.merge() now validates that all fields match, preventing CacheOnlyAttentionLayer from being silently merged with MLA layers into a single KV cache group with wrong dimensions.
- Register _CacheOnlyKVCacheSpec in spec_manager_map so extract_hidden_states doesn't hit a KeyError during engine init.

Update the integration test to default to Kimi-K2.5 and add CLI flags for --load-format, --enforce-eager, and --max-model-len.
Contributor
Pull request overview
This PR updates the vLLM v0.19 patchset to prevent incorrect KV-cache grouping between MLA attention layers and CacheOnly attention (used by extract_hidden_states), and to ensure the CacheOnly KV-cache spec is properly registered during engine initialization. It also extends the vLLM integration test CLI to better match the intended model/config defaults and allow additional runtime configuration.
Changes:
- Add stricter merge-time validation for MLAAttentionSpec.merge() to prevent merging incompatible attention specs into the same KV cache group.
- Register _CacheOnlyKVCacheSpec in vLLM's KV cache manager spec_manager_map to avoid a KeyError at engine init.
- Enhance the vLLM engine integration test script with new CLI flags (--load-format, --[no-]enforce-eager, --max-model-len) and update the default model/TP.
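The merge-time validation described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual code: the class shapes and field names (block_size, num_kv_heads, head_size) are assumptions; only the class names and the merge-all-fields-must-match behavior come from the PR description.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AttentionSpec:
    """Illustrative stand-in for vLLM's AttentionSpec (fields are assumed)."""
    block_size: int
    num_kv_heads: int
    head_size: int


@dataclass(frozen=True)
class MLAAttentionSpec(AttentionSpec):
    def merge(self, specs: list) -> "MLAAttentionSpec":
        # Stricter merge-time validation: every field of every spec being
        # merged into this KV cache group must match exactly, so a
        # cache-only layer with different dimensions can no longer be
        # silently folded in with the MLA layers.
        for other in specs:
            for f in fields(self):
                if getattr(self, f.name) != getattr(other, f.name):
                    raise ValueError(
                        f"cannot merge attention specs into one KV cache "
                        f"group: field {f.name!r} differs "
                        f"({getattr(self, f.name)!r} vs {getattr(other, f.name)!r})"
                    )
        return self
```

With this check in place, an attempt to group a spec with mismatched head count or head size fails loudly at merge time instead of producing a KV cache with the wrong shape.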
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_vllm_engine_integration.py | Adds CLI/config plumbing for load_format, eager mode, and max model length; updates defaults for the integration test script. |
| patches/vllm/v0.19.0/vllm.patch | Extends the vLLM v0.19.0 patchset to validate MLA merge compatibility and register CacheOnly KV-cache spec to the appropriate manager. |
Summary
- CacheOnlyAttentionLayer (used by extract_hidden_states) gets incorrectly merged with MLA attention layers into a single KV cache group, causing the CacheOnly KV cache to be reshaped with MLA dimensions (1 head, 576 dim) instead of hidden-state dimensions (num_aux_layers heads, hidden_size dim). MLAAttentionSpec.merge() now validates that all fields match before merging.
- Register _CacheOnlyKVCacheSpec in spec_manager_map: extract_hidden_states uses _CacheOnlyKVCacheSpec (a subclass of AttentionSpec), which is not in vLLM's spec_manager_map, causing a KeyError during engine init. Routes it to FullAttentionManager.
- Extend the integration test CLI with --load-format, --enforce-eager, and --max-model-len.

Test plan
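The CLI plumbing added to the integration test could look roughly like this; the option names come from the PR, while the defaults and help text are illustrative assumptions:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the new flags in
    # tests/test_vllm_engine_integration.py.
    parser = argparse.ArgumentParser(description="vLLM engine integration test")
    parser.add_argument("--load-format", default="auto",
                        help="weight load format passed through to vLLM")
    # BooleanOptionalAction gives the --[no-]enforce-eager pair noted
    # in the review summary.
    parser.add_argument("--enforce-eager", action=argparse.BooleanOptionalAction,
                        default=False,
                        help="disable CUDA graphs")
    parser.add_argument("--max-model-len", type=int, default=None,
                        help="maximum model context length")
    return parser
```

A run exercising all three flags would then be something like `python tests/test_vllm_engine_integration.py --load-format dummy --enforce-eager --max-model-len 4096`.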