Add Glm4MoeForCausalLM Support by quic-shagun · Pull Request #619 · quic/efficient-transformers

quic-shagun · 2025-11-14T09:40:52Z

This PR adds support for the zai-org/GLM-4.5-Air model, an open-source MoE model with performance and accuracy better than many closed-source models.

quic-rishinr · 2025-11-18T09:16:20Z

@shagsood Do we have approval for this model? Also, please add this model to the validated model list.

quic-rishinr · 2025-11-18T09:17:25Z

@vbaddi can you please review this PR?

quic-sgunnala · 2025-11-18T22:29:43Z

> @shagsood Do we have approval for this model? Also, please add this model to the validated model list.

Yes, we have legal approval for this model.

vbaddi · 2025-11-19T17:31:18Z

+
+class QEffGlm4MoeMoE(Glm4MoeMoE):
+    """
+    MoE Block


nit: Can we start using our optimized MoE block for the prefill/decode use case here?

quic-shagun · 2025-12-05

Is there a specific PR I should refer to for this?

asmigosw · 2025-12-05T06:19:29Z

+            key_states,
+            value_states,
+            attention_mask,
+            dropout=0.0 if not self.training else self.attention_dropout,


Can we remove this dropout argument, since we are not using it in eager_attention_forward?

asmigosw · 2025-12-05T06:19:54Z

+    value: torch.Tensor,
+    attention_mask: Optional[torch.Tensor],
+    scaling: float,
+    dropout: float = 0.0,


Please remove this parameter, since it's not used.
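To illustrate the two review comments above, here is a minimal, framework-free sketch of an eager attention function whose signature has no `dropout` parameter. It is not the code from this PR: the name `eager_attention_sketch`, the single-head list-of-vectors layout, and the additive mask convention are all assumptions made for illustration.

```python
import math


def eager_attention_sketch(query, key, value, attention_mask, scaling):
    """Single-head attention over lists of vectors (hypothetical helper).

    There is deliberately no dropout argument: export-time inference never
    applies dropout, so the parameter would be dead code.
    """
    output = []
    for q_idx, q in enumerate(query):
        # Scaled dot-product score of this query against every key position.
        scores = [scaling * sum(a * b for a, b in zip(q, k)) for k in key]
        if attention_mask is not None:
            # Assumed additive mask: 0.0 keeps a position, a large negative hides it.
            scores = [s + m for s, m in zip(scores, attention_mask[q_idx])]
        # Numerically stable softmax over the key positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        denom = sum(exps)
        probs = [e / denom for e in exps]
        # Probability-weighted sum of the value vectors.
        dim = len(value[0])
        output.append([sum(p * v[d] for p, v in zip(probs, value)) for d in range(dim)])
    return output
```

With the parameter gone from the signature, the call site no longer needs to pass `dropout=0.0 if not self.training else self.attention_dropout`.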

vbaddi · 2026-01-27T08:33:10Z

+        topk_weights = router_scores.gather(1, topk_indices)  # [T, 8]
+
+        if self.norm_topk_prob:
+            topk_weights = topk_weights / (topk_weights.sum(dim=-1, keepdim=True) + 1e-20)


nit: Is the subfunction export verified? I feel this `.sum()` needs to be replaced with einsum.
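As a rough illustration of the suggestion above, the row sum used for top-k normalization can be written as a contraction with a ones vector, which is what an einsum equation such as `"tk,k->t"` computes. The sketch below is plain Python rather than the PR's PyTorch code; the function name `normalize_topk` and the nested-list layout are assumptions, and the form the reduction takes in the PR's later commit is not shown in this thread.

```python
def normalize_topk(router_scores, topk_indices, eps=1e-20):
    """Gather each token's top-k router scores and rescale them to sum to ~1.

    Hypothetical helper: router_scores is [T][E], topk_indices is [T][K].
    """
    normalized = []
    for scores, indices in zip(router_scores, topk_indices):
        # Equivalent of router_scores.gather(1, topk_indices) for one token.
        picked = [scores[i] for i in indices]
        ones = [1.0] * len(picked)
        # Row sum expressed as a dot product with a ones vector, i.e. the
        # contraction an einsum like "tk,k->t" would perform per token.
        total = sum(w * o for w, o in zip(picked, ones))
        normalized.append([w / (total + eps) for w in picked])
    return normalized
```

Both forms give the same numbers; the difference only matters for which reduction operator ends up in the exported graph.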

anujgupt-github · 2026-04-30T10:00:13Z

@shagsood Can you rebase this and bring it onto the main branch?

quic-rishinr · 2026-05-08T07:27:43Z

@asmigosw @mamtsing please review the PR

quic-rishinr · 2026-05-08T07:28:48Z

@ochougul Please fix the lint, DCO, and other failures.

quic-hemagnih · 2026-05-13T08:00:37Z

@ochougul Can you please fix the lint errors?

quic-rishinr · 2026-05-15T10:08:05Z

We will close this PR and proceed with PR #988, a new PR that has been raised with the following features:
• GLM4-MOE prefill/decode path
• Chunked prefill MoE path with packed expert dispatch
• KV-blocked attention path, with headpar_offline as the default for all blocking combinations
• Disaggregated prefill/decode serving example
• ONNX subfunction export for decode and prefill

quic-shagun requested review from ochougul, quic-amitraj, quic-hemagnih and quic-rishinr as code owners November 14, 2025 09:40

quic-rishinr requested a review from vbaddi November 18, 2025 09:17

quic-rishinr requested a review from asmigosw November 19, 2025 10:33

vbaddi requested changes Nov 19, 2025

asmigosw suggested changes Dec 5, 2025

ochougul force-pushed the glm_air branch from 53832b0 to b687450 Compare January 5, 2026 08:59

vbaddi requested changes Jan 27, 2026

shagsood and others added 12 commits May 7, 2026 15:40

Onbaord GLM model

09a1d02

Signed-off-by: shagsood <shagsood@qti.qualcomm.com>

Fix modeling file issue

58d725a

Signed-off-by: shagsood <shagsood@qti.qualcomm.com>

Fix modeling file issue

40c6944

Signed-off-by: shagsood <shagsood@qti.qualcomm.com>

Add Glm4MoeForCausalLM support

9559e80

Signed-off-by: shagsood <shagsood@qti.qualcomm.com>

added MOE differentiation for prefill/decode

48075af

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

added example and verified MOE

5a84254

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

added ffn, KV blocking

b7c8893

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

fixed

1ab3fe3

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

changes for export to float16

bd1b02d

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

removed cliptransform and added simple router for glm

5b1e898

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

removing reducesum by einsum

c776603

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

decode only 24 KV heads

6e2d2ea

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

ochougul force-pushed the glm_air branch from 96c4192 to 6e2d2ea Compare May 7, 2026 10:20

quic-rishinr requested review from asmigosw and vbaddi May 8, 2026 07:27

quic-rishinr requested review from quic-mamta and removed request for ochougul, quic-amitraj, quic-hemagnih and quic-rishinr May 8, 2026 07:27

Merge branch 'main' into glm_air

ad85236

fixed linter errors

1f3da04

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

quic-rishinr closed this May 15, 2026
