[Bug] Several patched models activate all experts in forward #1369

@wenhuach21

Description

Problem Description

Affected models: Qwen3 VL, gpt-oss, Llama 4.
The patched forward activates all experts for every token. Because the same patch remains applied during inference, the decoding phase becomes very slow.
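To illustrate the cost difference being reported, here is a minimal toy MoE sketch (hypothetical class and parameter names, not the project's actual patch): the standard decode path runs only each token's top-k experts, while an "activate all experts" path, of the kind a calibration patch might install, runs every expert on every token.

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Minimal mixture-of-experts layer (illustrative only)."""

    def __init__(self, dim=8, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k
        self.expert_calls = 0  # counts how many expert forwards actually run

    def forward(self, x, activate_all=False):
        probs = self.router(x).softmax(dim=-1)  # [tokens, num_experts]
        if not activate_all:
            # Standard decode path: zero out all but the top-k weights per token.
            _, topi = probs.topk(self.top_k, dim=-1)
            mask = torch.zeros_like(probs).scatter_(-1, topi, 1.0)
            probs = probs * mask
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            w = probs[..., e : e + 1]
            # With activate_all=True every expert runs, regardless of routing.
            if activate_all or w.any():
                self.expert_calls += 1
                out = out + w * expert(x)
        return out

torch.manual_seed(0)
moe = ToyMoE()
x = torch.randn(1, 8)          # a single decoded token
moe(x)                          # top-k path: only top_k experts run
calls_topk = moe.expert_calls
moe.expert_calls = 0
moe(x, activate_all=True)       # patched path: all experts run
calls_all = moe.expert_calls
print(calls_topk, calls_all)
```

For a single decoded token this sketch runs 2 experts on the normal path versus 4 on the all-experts path; in a real model with dozens or hundreds of experts, keeping the all-experts patch active at decode time multiplies per-token compute accordingly.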

Reproduction Steps

~

Environment Information

No response

Error Logs

Additional Context

No response
