(bugfix): Clarify CausalLM decode-only flag handling and fix PL=1 specialization naming by vbaddi · Pull Request #1006 · quic/efficient-transformers

vbaddi · 2026-05-22T08:27:13Z

This PR verifies the reported Llama decode-only compile behaviour.

Findings:

- `decode_only=True` is currently not supported by `QEFFAutoModelForCausalLM.export()`.
- `retain_full_kv=True` is only applicable to specialized disaggregated-serving models such as GPT-OSS/Kimi, and has no effect for Llama.
- `prefill_seq_len=1` with `prefill_only=False` was incorrectly tagged as Prefill in the generated specialization metadata.

Changes:

- Raise a clear `NotImplementedError` when `decode_only=True` is passed to CausalLM export.
- Warn and ignore `retain_full_kv=True` for non-specialized models such as Llama.
- Tag `prefill_seq_len=1` and `prefill_only=False` specializations as Decode.
- Add quickcheck unit coverage for unsupported `decode_only`, ignored Llama `retain_full_kv`, and PL=1 decode specialization naming.
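
For readers skimming the change list, here is a rough sketch of the flag handling described above. It is not the code in this PR: `check_export_flags` and `SPECIALIZED_MODEL_TYPES` are hypothetical names chosen for illustration, and the real check in the library may be structured differently.

```python
import warnings

# Hypothetical stand-in for the models that honour retain_full_kv.
# The real list lives in the library and is not shown in this PR description.
SPECIALIZED_MODEL_TYPES = {"gpt_oss", "kimi"}


def check_export_flags(model_type, decode_only=False, retain_full_kv=False):
    """Validate export flags and return the effective retain_full_kv value."""
    if decode_only:
        # Fail loudly instead of silently producing a misleading artifact.
        raise NotImplementedError(
            "decode_only=True is not supported for CausalLM export."
        )
    if retain_full_kv and model_type not in SPECIALIZED_MODEL_TYPES:
        # The flag has no effect here, so warn and drop it.
        warnings.warn(
            f"retain_full_kv=True has no effect for {model_type!r}; ignoring it.",
            stacklevel=2,
        )
        return False
    return retain_full_kv
```

Under these assumptions, `check_export_flags("llama", retain_full_kv=True)` warns and returns `False`, while passing `decode_only=True` raises before any export work starts.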

Validation:

- Verified that PL=1 now emits `"name": "Decode"` in the specialization output.
- Added unit tests and ran the focused quickcheck tests successfully:
  - `PYENV_VERSION=qeff pytest -q tests/unit_test/models/test_model_quickcheck.py::TestCausalLMFlagDiagnostics`
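
The naming rule being validated can be illustrated with a small standalone sketch. `build_specialization` is a hypothetical helper, and the `seq_len` and `ctx_len` keys are assumptions about the metadata layout; only the `"name"` key and its `Decode` value for PL=1 come from this PR. Treating every other combination as `Prefill` is likewise an assumption.

```python
import json


def build_specialization(prefill_seq_len, ctx_len, prefill_only=False):
    """Build one specialization entry with a Prefill/Decode name tag."""
    # PL=1 without prefill_only is a decode step, so it is tagged Decode.
    # Assumption: everything else is tagged Prefill.
    is_decode = prefill_seq_len == 1 and not prefill_only
    return {
        "name": "Decode" if is_decode else "Prefill",
        "seq_len": str(prefill_seq_len),  # assumed key, for illustration
        "ctx_len": str(ctx_len),          # assumed key, for illustration
    }


# PL=1 with the default prefill_only=False should be named Decode.
print(json.dumps(build_specialization(1, 128), indent=2))
```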

cc: @quic-hemagnih @quic-rishinr

quic-rishinr

LGTM Thanks!!

vbaddi self-assigned this May 22, 2026

vbaddi added the bugfix label May 22, 2026

quic-rishinr approved these changes May 22, 2026

fix(0522): Clarify unsupported CausalLM decode-only and retain_full_k…

9aa8cf2

…v flags Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

quic-rishinr force-pushed the bugfix/decode-only-artifact-names branch from fd748d0 to 9aa8cf2 Compare May 22, 2026 15:03

quic-rishinr merged commit 2d07b6e into quic:main May 22, 2026
5 checks passed
