updating Subfn, Prefill only logic in Disagg mode #820

Merged
ochougul merged 5 commits into quic:qwen3_vl_mainline from tv-karthikeya:qwen_disagg_prefill on Mar 5, 2026

tv-karthikeya commented on Mar 2, 2026

Added Subfn support for Qwen 3 VL dense and MoE models.
Updated the prefill-only logic for disagg mode.

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
tv-karthikeya marked this pull request as ready for review on March 2, 2026 08:22
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
tv-karthikeya changed the title from "updating Prefill only logic in Disagg mode" to "updating Subfn, Prefill only logic in Disagg mode" on Mar 2, 2026
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
  Path to the generated ONNX graph file for the language decoder.
  """
- if prefill_only:
+ if prefill_only is not None:

Can we rewrite this as:

        if prefill_only:
            assert prefill_seq_len > 1
            if not enable_chunking and self.continuous_batching:
                raise NotImplementedError(
                    "Looks like you are trying to run prefix-caching without chunking, this feature is not available yet!"
                )
            self.hash_params["prefill_only"] = True
            self.prefill(enable=True, enable_chunking=enable_chunking)
        else:
            self.hash_params["prefill_only"] = False
            self.prefill(False, retain_full_kv=kwargs.get("retain_full_kv", False))
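
To make the suggested control flow easier to try out, here is a minimal, self-contained sketch. `StubModel`, its `configure` method, and the recording `prefill` stub are hypothetical stand-ins introduced only for illustration; they are not the real classes or signatures from this repository, and the default `prefill_seq_len=32` is an arbitrary assumption.

```python
class StubModel:
    """Hypothetical stand-in used only to exercise the suggested branch logic."""

    def __init__(self, continuous_batching=False):
        self.continuous_batching = continuous_batching
        self.hash_params = {}
        self.prefill_calls = []

    def prefill(self, enable=True, enable_chunking=False, retain_full_kv=False):
        # Assumed signature: this stub only records its arguments.
        self.prefill_calls.append((enable, enable_chunking, retain_full_kv))

    def configure(self, prefill_only=False, prefill_seq_len=32,
                  enable_chunking=False, **kwargs):
        # Mirrors the structure of the reviewer's suggestion.
        if prefill_only:
            assert prefill_seq_len > 1
            if not enable_chunking and self.continuous_batching:
                raise NotImplementedError(
                    "prefix-caching without chunking is not available yet"
                )
            self.hash_params["prefill_only"] = True
            self.prefill(enable=True, enable_chunking=enable_chunking)
        else:
            # Reached for False and for None alike, because the check is truthy.
            self.hash_params["prefill_only"] = False
            self.prefill(False, retain_full_kv=kwargs.get("retain_full_kv", False))


prefill_model = StubModel()
prefill_model.configure(prefill_only=True, enable_chunking=True)

decode_model = StubModel()
decode_model.configure(prefill_only=None, retain_full_kv=True)

print(prefill_model.hash_params, prefill_model.prefill_calls)
print(decode_model.hash_params, decode_model.prefill_calls)
```

In this sketch, `None` and `False` both land in the `else` branch, so `hash_params["prefill_only"]` is always a plain boolean.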

Comment on lines +1498 to +1499
+ or prefill_only
+ or prefill_seq_len == 1 # to export for prefill and decode

Remove both lines and try using:

        if (
            vision_onnx_path is None
            or lang_onnx_path is None
        ):
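
As a rough illustration of the simplified guard, the sketch below wraps the condition in a hypothetical helper, `needs_export`, which is not part of the repository. It assumes that a missing ONNX path is represented as `None`, as the suggested condition implies.

```python
from typing import Optional


def needs_export(vision_onnx_path: Optional[str],
                 lang_onnx_path: Optional[str]) -> bool:
    """Return True when either ONNX path is missing (hypothetical helper)."""
    return vision_onnx_path is None or lang_onnx_path is None


# Example paths are placeholders, not real files.
print(needs_export(None, "lang.onnx"))
print(needs_export("vision.onnx", "lang.onnx"))
```

With this form, the decision depends only on whether the paths were supplied, not on `prefill_only` or `prefill_seq_len`.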

  **compiler_options,
  )
- if skip_vision and prefill_only: # for disagg serving
+ if skip_vision and (prefill_only or prefill_seq_len == 1): # for disagg serving

remove

  self,
  export_dir: Optional[str] = None,
- prefill_only: Optional[bool] = False,
+ prefill_only: Optional[bool] = None,

revert
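
For context on why the default matters, the following sketch compares the two condition styles that appear in this PR for an `Optional[bool]` flag. The function names are hypothetical and exist only for this comparison.

```python
from typing import Optional


def enters_branch_truthy(prefill_only: Optional[bool] = False) -> bool:
    # Style: `if prefill_only:` with a default of False.
    return bool(prefill_only)


def enters_branch_not_none(prefill_only: Optional[bool] = None) -> bool:
    # Style: `if prefill_only is not None:` with a default of None.
    return prefill_only is not None


for value in (None, False, True):
    print(value, enters_branch_truthy(value), enters_branch_not_none(value))
```

The two styles differ only for an explicit `False`: the truthiness check skips the branch, while the `is not None` check enters it.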

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
ochougul merged commit 8bb1e3e into quic:qwen3_vl_mainline on Mar 5, 2026
3 checks passed
qcdipankar pushed a commit that referenced this pull request Mar 10, 2026
Added Support for Subfn for Qwen 3 VL dense, MOE.
Updated prefill only logic for disagg mode

---------

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
qcdipankar pushed a commit that referenced this pull request Mar 11, 2026
Added Support for Subfn for Qwen 3 VL dense, MOE.
Updated prefill only logic for disagg mode

---------

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
2 participants