
Fix deep-gemm alignment: per-group alignment -> output alignment #35

Merged
MasterJH5574 merged 1 commit into mlc-ai:main from haok1402:0421-fix-deepgemm-alignment on Apr 28, 2026

Conversation

@haok1402 (Collaborator)

Drop redundant 1024-row over-rounding; per-group 128-alignment already covers FP8 grouped-GEMM tile requirements.
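The arithmetic behind this change can be sketched as follows. This is an illustrative example, not the actual `token_scatter.py` code: the helper `align_up`, the constant names `PER_GROUP_ALIGNMENT`, and the sample `group_sizes` are assumptions; only `_GEMM_ALLOC_ALIGNMENT` and `actual_M` are names taken from the PR text.

```python
# Sketch of why the 1024-row over-rounding is redundant, assuming each
# expert group is already padded to a multiple of 128 rows as the PR states.

def align_up(n: int, alignment: int) -> int:
    """Round n up to the nearest multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment

PER_GROUP_ALIGNMENT = 128     # per-group tile requirement for FP8 grouped GEMM
_GEMM_ALLOC_ALIGNMENT = 1024  # old output-level over-rounding, now dropped

group_sizes = [37, 200, 5]    # hypothetical per-expert token counts

# Each group is padded to a multiple of 128; actual_M is their sum.
aligned_group_sizes = [align_up(g, PER_GROUP_ALIGNMENT) for g in group_sizes]
actual_M = sum(aligned_group_sizes)  # 128 + 256 + 128 = 512

# Old behaviour: additionally round the output slice up to 1024 rows,
# leaving extra rows beyond the sum of the group sizes.
old_M = align_up(actual_M, _GEMM_ALLOC_ALIGNMENT)  # 1024

# New behaviour: slice the output directly to actual_M, so the row count
# exactly equals the sum of the already-aligned group sizes, which is what
# the FP8 kernels expect.
assert actual_M % PER_GROUP_ALIGNMENT == 0
print(actual_M, old_M)  # 512 1024
```

With the old rounding, the output slice could contain up to 1023 padding rows that no group accounts for; slicing to `actual_M` removes that mismatch.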


@gemini-code-assist (Bot) left a comment

Code Review

This pull request simplifies token slicing and removes redundant truncation logic. In gpt_oss.py, the manual truncation of input tensors based on group sizes was removed. In token_scatter.py, the rounding of the output token slice up to _GEMM_ALLOC_ALIGNMENT was replaced with a direct slice to actual_M, since the group sizes are already guaranteed to be correctly aligned. This ensures compatibility with DeepGEMM's FP8 kernels, which require the data row count to exactly match the sum of the group sizes. No review comments were submitted.

@MasterJH5574 MasterJH5574 merged commit 23db182 into mlc-ai:main Apr 28, 2026
1 check passed
