
(bugfix): Clarify CausalLM decode-only flag handling and fix PL=1 specialization naming #1006

Merged
quic-rishinr merged 1 commit into quic:main from vbaddi:bugfix/decode-only-artifact-names
May 22, 2026


@vbaddi vbaddi commented May 22, 2026

This PR investigates the reported Llama decode-only compile behaviour.

Findings:

  • decode_only=True is currently not supported by QEFFAutoModelForCausalLM.export().
  • retain_full_kv=True is only applicable to specialized disaggregated-serving models such as GPT-OSS/Kimi, and has no
    effect for Llama.
  • prefill_seq_len=1 with prefill_only=False was incorrectly tagged as Prefill in the generated specialization metadata.
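The third finding comes down to the rule that picks a specialization's name. A minimal sketch of the corrected rule described under Changes, using a hypothetical helper `specialization_name` (this is not the QEfficient function, and treating a one-token shape from a prefill-only export as "Prefill" is an assumption):

```python
def specialization_name(seq_len: int, prefill_only: bool) -> str:
    # Hypothetical helper, not the QEfficient implementation.
    # A one-token shape that is not part of a prefill-only export is a
    # decode step, so it must not be labelled "Prefill".
    if seq_len == 1 and not prefill_only:
        return "Decode"
    # Multi-token shapes, and (by assumption) any shape from a
    # prefill-only export, are tagged as prefill.
    return "Prefill"


if __name__ == "__main__":
    # PL=1 with prefill_only=False: the case reported in this PR.
    print(specialization_name(seq_len=1, prefill_only=False))   # Decode
    print(specialization_name(seq_len=32, prefill_only=False))  # Prefill
    print(specialization_name(seq_len=1, prefill_only=True))    # Prefill
```

The point of the sketch is that the name depends on both the sequence length and `prefill_only`, not on the sequence length alone.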

Changes:

  • Raise a clear NotImplementedError when decode_only=True is passed to CausalLM export.
  • Warn and ignore retain_full_kv=True for non-specialized models such as Llama.
  • Tag prefill_seq_len=1 and prefill_only=False specializations as Decode.
  • Add quickcheck unit coverage for unsupported decode_only, ignored Llama retain_full_kv, and PL=1 decode specialization
    naming.
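The first two changes amount to a pre-export flag check. A minimal sketch, assuming a hypothetical `check_causal_lm_export_flags` helper and placeholder model-type strings for the disaggregated-serving models (GPT-OSS/Kimi); the real check lives inside `QEFFAutoModelForCausalLM.export()` and may be structured differently:

```python
import warnings

# Placeholder identifiers (assumption) for models where retain_full_kv
# has an effect; the PR names GPT-OSS and Kimi as examples.
DISAGG_SPECIALIZED_MODEL_TYPES = {"gpt_oss", "kimi"}


def check_causal_lm_export_flags(
    model_type: str, decode_only: bool = False, retain_full_kv: bool = False
) -> bool:
    """Hypothetical pre-export validation; returns the effective retain_full_kv."""
    # Unsupported mode: fail loudly instead of silently exporting something else.
    if decode_only:
        raise NotImplementedError(
            "decode_only=True is not supported by QEFFAutoModelForCausalLM.export()."
        )
    # retain_full_kv only matters for specialized models; warn and drop it otherwise.
    if retain_full_kv and model_type not in DISAGG_SPECIALIZED_MODEL_TYPES:
        warnings.warn(
            f"retain_full_kv=True has no effect for model_type={model_type!r}; ignoring it.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return retain_full_kv


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        effective = check_causal_lm_export_flags("llama", retain_full_kv=True)
    print(effective, len(caught))  # False 1
    try:
        check_causal_lm_export_flags("llama", decode_only=True)
    except NotImplementedError as exc:
        print("NotImplementedError:", exc)
```

Raising for `decode_only` while only warning for `retain_full_kv` reflects the difference described above: the first flag asks for a mode that export cannot produce, while the second is simply a no-op for Llama.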

Validation:

  • Verified that PL=1 now emits "name": "Decode" in the specialization output.
  • Added unit tests and ran the focused quickcheck tests successfully:
    • PYENV_VERSION=qeff pytest -q tests/unit_test/models/test_model_quickcheck.py::TestCausalLMFlagDiagnostics

cc: @quic-hemagnih @quic-rishinr

@vbaddi vbaddi self-assigned this May 22, 2026
@vbaddi vbaddi added the bugfix label May 22, 2026

@quic-rishinr quic-rishinr left a comment


LGTM Thanks!!

…v flags

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@quic-rishinr quic-rishinr force-pushed the bugfix/decode-only-artifact-names branch from fd748d0 to 9aa8cf2 May 22, 2026 15:03
@quic-rishinr quic-rishinr merged commit 2d07b6e into quic:main May 22, 2026
5 checks passed
quic-mohmeh pushed a commit to quic-mohmeh/efficient-transformers that referenced this pull request May 25, 2026
…cialization naming (quic#1006)

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Mohit Mehta <mohmeh@qti.qualcomm.com>
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request May 25, 2026
…cialization naming (quic#1006)

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>