Support fp4 blockwise load#96

Merged
Jintao-Huang merged 4 commits into
modelscope:mainfrom
Jintao-Huang:support_fp4_blockwise_load
May 26, 2026

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors weight loading in GPTBridge by extracting _set_param and introduces state dict key conversions for DeepSeek-V4. It also adds a utility function fp4_to_fp8 to unpack FP4 tensors. However, the newly added fp4_to_fp8 function and _check_fp4 flag are currently unused in gpt_bridge.py, which will lead to shape mismatch errors when loading packed FP4 tensors. Additionally, creating the LUT tensor on the target device during every call to fp4_to_fp8 introduces unnecessary overhead and should be cached.

Comment thread src/mcore_bridge/bridge/gpt_bridge.py
Comment thread src/mcore_bridge/utils/dequantizer.py
@Jintao-Huang
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors parameter setting logic by introducing a helper method _set_param and adds support for FP4-to-FP8 dequantization, specifically for DeepSeek V4 models. The feedback highlights critical improvements to ensure runtime stability and performance: adding a defensive check to prevent an AttributeError when scale_inv is None, caching the LUT tensor in fp4_to_fp8 to avoid redundant host-to-device transfers, and adding checks to prevent division issues during block size calculation.
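
The defensive checks mentioned above could look roughly like the following sketch. The review does not show the actual code, so the function name `infer_block_size`, its signature, and the strict divisibility requirement are all assumptions made for illustration; the real bridge may handle a missing `scale_inv` or ragged blocks differently.

```python
from typing import Optional, Sequence, Tuple


def infer_block_size(weight_shape: Sequence[int],
                     scale_shape: Optional[Sequence[int]]) -> Optional[Tuple[int, int]]:
    """Derive the per-block size from a 2-D weight shape and its scale shape.

    Returns None when no scale is present, so callers can skip
    dequantization instead of failing on an attribute access.
    """
    if scale_shape is None:
        # Hypothetical handling: treat the weight as not block-quantized.
        return None
    if len(weight_shape) != 2 or len(scale_shape) != 2:
        raise ValueError('expected 2-D weight and scale shapes')

    block = []
    for w_dim, s_dim in zip(weight_shape, scale_shape):
        if s_dim <= 0:
            # Guards the division below against a zero-sized scale dimension.
            raise ValueError(f'invalid scale dimension: {s_dim}')
        if w_dim % s_dim != 0:
            # Assumption: blocks tile the weight exactly (no padded last block).
            raise ValueError(f'weight dim {w_dim} not divisible by scale dim {s_dim}')
        block.append(w_dim // s_dim)
    return block[0], block[1]
```

Raising a `ValueError` with the offending dimensions is one option; returning a sentinel or falling back to a default block size would be equally plausible depending on how the loader is expected to behave.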

Comment thread src/mcore_bridge/model/gpts/deepseek_v4.py
Comment thread src/mcore_bridge/utils/dequantizer.py
Comment thread src/mcore_bridge/utils/dequantizer.py
@Jintao-Huang Jintao-Huang merged commit b7fab88 into modelscope:main May 26, 2026
1 check passed

3 participants