Support fp4 blockwise load #96
Code Review
This pull request refactors weight loading in GPTBridge by extracting _set_param and introduces state dict key conversions for DeepSeek-V4. It also adds a utility function fp4_to_fp8 to unpack FP4 tensors. However, the newly added fp4_to_fp8 function and _check_fp4 flag are currently unused in gpt_bridge.py, which will lead to shape mismatch errors when loading packed FP4 tensors. Additionally, creating the LUT tensor on the target device during every call to fp4_to_fp8 introduces unnecessary overhead and should be cached.
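The two concerns in this review can be illustrated with a small framework-free sketch. It is not the PR's fp4_to_fp8: the names unpack_fp4 and fp4_lut are hypothetical, the E2M1 value table and the low-nibble-first packing order are assumptions, and the device argument is only a cache key standing in for a real tensor device. It shows why a packed tensor has half the expected last dimension (two 4-bit codes per byte) and how building the lookup table once per device avoids recreating it on every call.

```python
from functools import lru_cache

# Assumed FP4 layout (E2M1): the low 3 bits select a magnitude, bit 3 is the sign.
_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


@lru_cache(maxsize=None)
def fp4_lut(device: str) -> tuple:
    """Build the 16-entry decode table once per device key, then reuse it."""
    # In a tensor framework this is where the table would be moved to `device`.
    return tuple(_E2M1_MAGNITUDES) + tuple(-m for m in _E2M1_MAGNITUDES)


def unpack_fp4(packed: bytes, device: str = "cpu") -> list:
    """Decode packed FP4 bytes into floats; the output is twice as long."""
    lut = fp4_lut(device)  # cached: no table rebuild after the first call
    values = []
    for byte in packed:
        values.append(lut[byte & 0x0F])  # low nibble first (assumed order)
        values.append(lut[byte >> 4])    # then the high nibble
    return values
```

Because every byte yields two values, a loader that copies the packed data without unpacking it first would see a last dimension half the size the parameter expects, which matches the shape mismatch described above.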
/gemini review
Code Review
This pull request refactors parameter setting logic by introducing a helper method _set_param and adds support for FP4-to-FP8 dequantization, specifically for DeepSeek V4 models. The feedback highlights critical improvements to ensure runtime stability and performance: adding a defensive check to prevent an AttributeError when scale_inv is None, caching the LUT tensor in fp4_to_fp8 to avoid redundant host-to-device transfers, and adding checks to prevent division issues during block size calculation.
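A minimal sketch of the defensive checks mentioned in this review, using plain Python lists rather than tensors. The helper names infer_block_size and apply_block_scale are hypothetical, and the review does not say which division problem it means; this sketch assumes the block size is derived by dividing a weight dimension by the matching scale dimension, and guards against a zero or non-dividing scale dimension. It also assumes that scale_inv is multiplied into the values and that a missing scale_inv means the values are left unscaled.

```python
def infer_block_size(weight_dim: int, scale_dim: int) -> int:
    """Return how many weight elements share one scale entry."""
    if scale_dim <= 0:
        raise ValueError("scale dimension must be positive")
    if weight_dim % scale_dim != 0:
        raise ValueError(
            f"weight dim {weight_dim} is not a multiple of scale dim {scale_dim}"
        )
    return weight_dim // scale_dim


def apply_block_scale(values, scale_inv):
    """Scale each block of `values` by its entry in `scale_inv`."""
    if scale_inv is None:
        # Guard: without it, an attribute or length access on None would raise.
        return list(values)
    block = infer_block_size(len(values), len(scale_inv))
    return [v * scale_inv[i // block] for i, v in enumerate(values)]
```

For example, apply_block_scale([1.0, 2.0, 3.0, 4.0], [2.0, 0.5]) scales the first two values by 2.0 and the last two by 0.5, while passing None returns the values unchanged instead of failing.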
No description provided.