Signed-off-by: yiliu30 <yi4.liu@intel.com>
Pull request overview
This PR patches the FP8 expert-replacement functionality in the transformers library: it disables the automatic conversion of expert modules during FP8 quantization while preserving standard linear layer conversion.
Changes:
- Adds a version check utility to determine if transformers >= 5.0.0 is installed
- Introduces a custom FP8 linear replacement function that explicitly disables expert module conversion
- Automatically applies the patch at import time for compatible transformers versions
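The three pieces listed above might look like the following sketch. The identifiers `is_transformers_ge_5`, `replace_with_fp8_linear_no_experts`, and the `convert_experts` flag are illustrative assumptions, not the PR's actual names:

```python
def is_transformers_ge_5(installed_version: str) -> bool:
    """True if the major version is >= 5 (simplified semver compare)."""
    major = installed_version.split(".")[0]
    digits = "".join(ch for ch in major if ch.isdigit())
    return int(digits or "0") >= 5


def replace_with_fp8_linear_no_experts(replace_fn):
    """Wrap the stock FP8 linear replacement so that expert modules
    are explicitly excluded from conversion (hypothetical flag name)."""
    def wrapper(model, *args, **kwargs):
        kwargs["convert_experts"] = False  # assumed keyword, for illustration
        return replace_fn(model, *args, **kwargs)
    return wrapper
```

At import time, the patched wrapper would then be installed over the library's original replacement function only when `is_transformers_ge_5(...)` returns True, so older transformers versions are left untouched.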
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `auto_round/utils/common.py` | Adds utility function to check transformers version against 5.0.0 |
| `auto_round/modeling/fp8_quant.py` | Implements patched FP8 linear replacement without expert conversion and applies it automatically |
| `auto_round/modeling/__init__.py` | Imports `fp8_quant` module to ensure patch is applied at package initialization |
Does this PR work for A100 and B200? For B200, transformers will keep the FP8 layers, while for A100 it will dequantize the model to BF16.
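The A100-vs-B200 split described here comes down to a device-capability gate. A minimal sketch of that decision, assuming the cutoff is compute capability (8, 9), which is where FP8 tensor-core support arrived with Ada/Hopper-class devices (A100 is sm_80, B200 is sm_100):

```python
def keeps_fp8(capability: tuple) -> bool:
    """Whether FP8 weights can stay in FP8 on a GPU with the given
    (major, minor) compute capability; otherwise the model is
    dequantized to BF16. The (8, 9) threshold is an assumption,
    not a value taken from the transformers source."""
    return capability >= (8, 9)


print(keeps_fp8((8, 0)))   # A100 (sm_80)  -> False, dequantize to BF16
print(keeps_fp8((10, 0)))  # B200 (sm_100) -> True, keep FP8 layers
```

In practice the capability tuple would come from `torch.cuda.get_device_capability()`.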
Signed-off-by: yiliu30 <yi4.liu@intel.com>
I have verified it on A100 and B200; it works on both nodes. Generated Output:
Explain the theory of relativity in simple terms. The theory of relativity, developed by Albert Einstein, is a fundamental concept in physics that explains how
Thanks, nice work!
wenhuach21 left a comment
Another concern is that transformers may change this behavior in a future release. Shall we add a try/except around the core code, or use some other approach to avoid the potential issue?
I agree. Currently, only `FineGrainedFP8HfQuantizer` is imported when initializing AutoRound; the other imports are already inside a try/except block.
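The guarded import discussed here could be sketched as follows. The exact module path is an assumption based on where transformers keeps its quantizer classes, and `fp8_patch_available` is an illustrative helper, not the repository's actual code:

```python
# Guard an import whose location may move between transformers releases,
# so AutoRound stays importable even if the class disappears or is renamed.
try:
    from transformers.quantizers.quantizer_finegrained_fp8 import (  # assumed path
        FineGrainedFP8HfQuantizer,
    )
except ImportError:
    FineGrainedFP8HfQuantizer = None  # feature disabled when unavailable


def fp8_patch_available() -> bool:
    """True when the quantizer class was importable and the patch can apply."""
    return FineGrainedFP8HfQuantizer is not None
```

The patch-application code would then check `fp8_patch_available()` before monkey-patching, degrading gracefully instead of raising at import time.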
Resolve the FP8 part of #1248