
Disable replace FP8Expert #1379

Merged
chensuyue merged 13 commits into main from fp8-experts on Feb 3, 2026

Conversation

yiliu30 (Contributor) commented on Feb 2, 2026

Resolve the FP8 part of #1248

  • patch fp8 experts replacement
  • add fp8 linear

Description

Please briefly describe your main changes and the motivation.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Copilot AI review requested due to automatic review settings February 2, 2026 01:31
yiliu30 changed the title from "Disable replace FP8Experts" to "Disable replace FP8Expert" on Feb 2, 2026
Copilot AI left a comment


Pull request overview

This PR patches the FP8 experts replacement functionality in the transformers library by disabling the automatic conversion of expert modules during FP8 quantization, while preserving standard linear layer conversion.

Changes:

  • Adds a version check utility to determine if transformers >= 5.0.0 is installed
  • Introduces a custom FP8 linear replacement function that explicitly disables expert module conversion
  • Automatically applies the patch at import time for compatible transformers versions

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Files changed:

  • auto_round/utils/common.py: Adds a utility function to check the transformers version against 5.0.0
  • auto_round/modeling/fp8_quant.py: Implements the patched FP8 linear replacement without expert conversion and applies it automatically
  • auto_round/modeling/__init__.py: Imports the fp8_quant module to ensure the patch is applied at package initialization
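
To make the mechanism concrete, here is a minimal sketch of the idea: gate on the installed transformers version, then convert plain nn.Linear layers while skipping expert modules. The helper names, the FP8 linear class passed in, and the name-based expert filter are illustrative assumptions; the merged code patches transformers' own fine-grained FP8 replacement helper rather than re-implementing it.

```python
# Illustrative sketch only -- not the code merged in this PR.
from packaging import version
import torch.nn as nn
import transformers


def is_transformers_ge_5() -> bool:
    """Version-check utility: True if transformers >= 5.0.0 is installed."""
    return version.parse(transformers.__version__) >= version.parse("5.0.0")


def replace_linears_skip_experts(model: nn.Module, fp8_linear_cls) -> nn.Module:
    """Swap nn.Linear submodules for `fp8_linear_cls`, leaving expert blocks untouched."""
    for name, child in model.named_children():
        if "expert" in type(child).__name__.lower():
            # Skip MoE expert containers entirely -- the conversion this PR
            # disables for transformers >= 5.0.0.
            continue
        if isinstance(child, nn.Linear):
            setattr(model, name, fp8_linear_cls(
                child.in_features, child.out_features, bias=child.bias is not None))
        else:
            replace_linears_skip_experts(child, fp8_linear_cls)
    return model
```

Running such a replacement only when `is_transformers_ge_5()` is true, triggered from `auto_round/modeling/__init__.py`, matches the "applied at import time" behavior described above.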

Signed-off-by: yiliu30 <yi4.liu@intel.com>
wenhuach21 (Contributor) commented:
Does this PR work on A100 and B200? On B200, transformers will keep the FP8 layers, while on A100 it will dequantize the model to BF16.

Signed-off-by: yiliu30 <yi4.liu@intel.com>
yiliu30 (Contributor, Author) commented on Feb 3, 2026

Does this PR work on A100 and B200? On B200, transformers will keep the FP8 layers, while on A100 it will dequantize the model to BF16.

I have verified it on A100 and B200; it works on both nodes.

Generated Output:
Explain the theory of relativity in simple terms. The theory of relativity, developed by Albert Einstein, is a fundamental concept in physics that explains how
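
For context on the hardware difference mentioned above, a rough illustration of how one might detect native FP8 support on the current device; the capability threshold is an assumption for illustration and is not the check transformers itself performs.

```python
import torch


def device_has_native_fp8() -> bool:
    """Rough heuristic: assume FP8 tensor cores need compute capability >= 8.9."""
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)


# A100 reports compute capability (8, 0), so FP8 weights are dequantized to
# BF16; B200 reports a higher capability and keeps the FP8 layers.
print("native FP8 support:", device_has_native_fp8())
```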

Signed-off-by: yiliu30 <yi4.liu@intel.com>
yiliu30 requested a review from wenhuach21 on February 3, 2026 06:09
wenhuach21 (Contributor) commented:
Does this PR work on A100 and B200? On B200, transformers will keep the FP8 layers, while on A100 it will dequantize the model to BF16.

I have verified it on A100 and B200; it works on both nodes.

Generated Output:
Explain the theory of relativity in simple terms. The theory of relativity, developed by Albert Einstein, is a fundamental concept in physics that explains how

Thanks, nice work!

wenhuach21 (Contributor) left a comment


Another concern is that transformers may change its behavior. Shall we add a try/except around the core code, or use some other way to avoid the potential issue?

yiliu30 (Contributor, Author) commented on Feb 3, 2026

Another concern is that transformers may change its behavior. Shall we add a try/except around the core code, or use some other way to avoid the potential issue?

I agree. Currently, only the FineGrainedFP8HfQuantizer is imported when initializing AutoRound; the other imports are inside a try/except block.
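
A minimal sketch of one way to do that, assuming the patch lives in `auto_round/modeling/fp8_quant.py`: wrap the transformers-dependent pieces in try/except so a future API change downgrades to a warning instead of breaking AutoRound at import time. The helper name and the import path are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def _apply_fp8_replacement_patch() -> None:
    # Hypothetical stand-in for the patch described in this PR. The import path
    # below is an assumption and may move between transformers releases, which
    # is exactly why the caller guards it.
    from transformers.quantizers.quantizer_finegrained_fp8 import (  # noqa: F401
        FineGrainedFP8HfQuantizer,
    )
    # ... monkey-patch the FP8 linear replacement helper here ...


try:
    _apply_fp8_replacement_patch()
except (ImportError, AttributeError) as exc:
    # Degrade gracefully if transformers renames or removes what the patch relies on.
    logger.warning("Skipping FP8 replacement patch: %s", exc)
```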

chensuyue merged commit 137da55 into main on Feb 3, 2026
28 of 29 checks passed
chensuyue deleted the fp8-experts branch on February 3, 2026 08:08
lvliang-intel pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: yiliu30 <yi4.liu@intel.com>
