fix(test): apply aux-layer id fix to integration test (follow-up to #89) #90
Conversation
`tests/test_vllm_engine_integration.py` is a script-style runner (uses `main()` with argparse, not pytest-collected) that exercises the full extract_hidden_states + MooncakeHiddenStatesConnector path. It duplicated the same off-by-one fixed in #89 in two places:

- `get_aux_layer_ids` built `[1, num_layers // 2 - 1, num_layers - 4]` in TorchSpec post-layer semantics but appended `num_layers - 1` for the final-layer slot, missing the +1 shift entirely.
- `main()` did the same when the user overrode aux layers via CLI.

Without this fix, any end-to-end run of the integration script would have captured the wrong final-layer slot too, hiding the issue from manual validation.

This commit translates both call sites to the corrected convention: TorchSpec post-layer ids are shifted +1 to match vllm's capture-after-layer hook, and `num_hidden_layers` (the pre-`norm` slot) is appended for `last_hidden_states`. Behavior now matches the fixed `VllmEngine.init`.

Follow-up to #89.

Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
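The off-by-one described above can be sketched with a small standalone helper. The function names below are illustrative, not from the repository; only the index arithmetic mirrors the commit:

```python
def buggy_aux_ids(num_layers: int) -> list[int]:
    # Pre-fix convention: TorchSpec post-layer ids used verbatim, with
    # num_layers - 1 appended for the "final layer" slot (no +1 shift).
    aux_ids = [1, num_layers // 2 - 1, num_layers - 4]
    if num_layers - 1 not in aux_ids:
        aux_ids.append(num_layers - 1)
    return aux_ids


def fixed_aux_ids(num_layers: int) -> list[int]:
    # Corrected convention: shift +1 to vllm's capture-after-layer
    # indices and append num_layers, the pre-norm slot, for
    # last_hidden_states.
    aux_ids = [lid + 1 for lid in [1, num_layers // 2 - 1, num_layers - 4]]
    if num_layers not in aux_ids:
        aux_ids.append(num_layers)
    return aux_ids


# For a 32-layer model the buggy version requests capture slot 31 (the
# output of decoder layer 30), while the fix requests slot 32, the
# pre-norm hidden state.
print(buggy_aux_ids(32))  # [1, 15, 28, 31]
print(fixed_aux_ids(32))  # [2, 16, 29, 32]
```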
Pull request overview
Updates the integration runner to match vLLM’s capture-after-layer indexing so last_hidden_states resolves to the pre-norm slot (follow-up to #89).
Changes:
- Shift default Eagle3 TorchSpec post-layer IDs by `+1` to match vLLM capture indices.
- When `--aux-layers` is provided, shift user-provided post-layer IDs by `+1`, filter out-of-range, and append `num_hidden_layers` for `last_hidden_states`.
- Expand the `get_aux_layer_ids` docstring to document the indexing convention and final-slot behavior.
```diff
 def get_aux_layer_ids(model_path: str) -> list[int]:
-    """Replicate VllmEngine's aux layer resolution: default Eagle3 layers + final layer."""
+    """Replicate VllmEngine's aux layer resolution: default Eagle3 layers + final layer.
+
+    See ``VllmEngine.init`` for the canonical implementation. vllm's hidden
+    state hook captures index ``layer_idx + 1`` after each layer runs, so
+    valid capture indices are ``[0, num_hidden_layers]``; index
+    ``num_hidden_layers`` is the pre-``norm`` slot used as
+    ``last_hidden_states`` for target logit computation.
+    """
     cfg = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
     cfg = getattr(cfg, "text_config", cfg)
     num_layers = cfg.num_hidden_layers
-    # Default Eagle3 aux layers
-    aux_ids = [1, num_layers // 2 - 1, num_layers - 4]
-    # VllmEngine appends the final layer for last_hidden_states capture
-    final_layer = num_layers - 1
-    if final_layer not in aux_ids:
-        aux_ids.append(final_layer)
+    # TorchSpec post-layer ids for default Eagle3 aux layers, shifted +1 to
+    # match vllm's capture-after-layer convention.
+    aux_ids = [lid + 1 for lid in [1, num_layers // 2 - 1, num_layers - 4]]
+    # Append the post-last-layer / pre-norm slot for last_hidden_states capture.
+    if num_layers not in aux_ids:
+        aux_ids.append(num_layers)
     return aux_ids, cfg.hidden_size, num_layers
```
get_aux_layer_ids is annotated as returning list[int], but it returns a 3-tuple (aux_ids, hidden_size, num_layers) (and is unpacked that way in main). Update the return type annotation (and ideally the docstring summary) to match the actual return value to avoid misleading type usage.
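One possible shape for the annotation fix the reviewer suggests, sketched with a stubbed body (the exact docstring wording is up to the author):

```python
def get_aux_layer_ids(model_path: str) -> tuple[list[int], int, int]:
    """Resolve aux capture indices plus hidden size and layer count.

    Returns ``(aux_ids, hidden_size, num_hidden_layers)``, matching how
    ``main()`` unpacks the result.
    """
    ...  # body unchanged; stubbed here for illustration
```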
```python
# User passes TorchSpec post-layer ids; shift +1 to vllm's
# capture-after-layer convention and append the pre-norm slot.
aux_layer_ids = [lid + 1 for lid in args.aux_layers if lid < num_layers]
if num_layers not in aux_layer_ids:
    aux_layer_ids.append(num_layers)
The CLI help for --aux-layers still reads as generic “aux layer IDs”, but this block now explicitly interprets them as TorchSpec post-layer IDs and shifts them by +1. Please update the argparse help= text to state the expected convention (post-layer IDs) and that the script auto-appends the pre-norm slot (num_hidden_layers) so users don’t pass already-shifted vLLM capture indices by mistake.
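One candidate wording for the `help=` update the reviewer asks for; the flag name comes from the script, but the text itself is a suggestion, not the committed change:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--aux-layers",
    type=int,
    nargs="+",
    default=None,
    help=(
        "TorchSpec post-layer aux layer ids (NOT vllm capture indices). "
        "The script shifts each id by +1 to vllm's capture-after-layer "
        "convention, drops out-of-range ids, and always appends "
        "num_hidden_layers, the pre-norm slot, for last_hidden_states."
    ),
)
```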
Summary
Follow-up to #89. The integration runner `tests/test_vllm_engine_integration.py` (script-style, not pytest-collected) duplicated the same off-by-one in two places:

- `get_aux_layer_ids` built post-layer ids `[1, num_layers // 2 - 1, num_layers - 4]` but appended `num_layers - 1` for the final slot, never applying the `+1` shift.
- `main()` did the same when `--aux-layers` was passed.

Without this fix, any manual end-to-end run of the integration script would have captured the wrong final-layer slot too, masking the issue from spot validation.
Changes
Both call sites now translate TorchSpec post-layer ids to vllm's capture-after-layer convention via the `+1` shift, and append `num_hidden_layers` (the pre-`norm` slot) for `last_hidden_states`. Behavior matches the fixed `VllmEngine.init` from #89.

Test plan
- `last_hidden_states` matches `outputs.hidden_states[-1]` from a plain HF forward (currently the script only dumps shapes / norms / stats, no reference comparison). Out of scope for this PR but worth scheduling.

Refs: #87, #89
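The deferred reference comparison could take roughly this shape. This is a dependency-free sketch: `check_last_hidden_states`, the `captured` mapping, and the flat-list tensors are all hypothetical stand-ins for the real connector output and `torch.allclose`:

```python
import math


def allclose(a, b, rtol=1e-3, atol=1e-5):
    """Elementwise closeness over flat float lists (stand-in for
    torch.allclose in this dependency-free sketch)."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol, abs_tol=atol) for x, y in zip(a, b)
    )


def check_last_hidden_states(captured, hf_hidden_states, num_layers):
    """Proposed follow-up check: the capture slot at index ``num_layers``
    (the pre-norm slot used as last_hidden_states) should match the final
    entry of a plain HF forward's ``hidden_states`` tuple.

    ``captured`` maps vllm capture index -> flat float list;
    ``hf_hidden_states`` plays the role of ``outputs.hidden_states``.
    """
    return allclose(captured[num_layers], hf_hidden_states[-1])
```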