Add static FP8 attention support #1045

Conversation
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Pull Request Overview
This PR adds static FP8 attention support to the auto-round library. The implementation enables quantization of attention mechanisms using FP8 format, building on existing KV cache quantization infrastructure.
Key Changes
- Introduces a `QuantizedAttentionImpl` module to handle FP8 quantized attention operations
- Refactors shared quantization utilities into a new `experimental/utils.py` module
- Updates the test suite to validate FP8 attention quantization and correct scale tensor shapes
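To make the overview above concrete, here is a minimal, hedged sketch of the general idea behind static FP8 attention quantization: query, key, and value activations are fake-quantized (quantize/dequantize) to FP8 with per-tensor scales fixed at calibration time, before attention is computed. The helper name, scale values, and tensor shapes below are illustrative assumptions, not the PR's actual code.

```python
# Illustrative sketch only; names, scales, and shapes are assumptions,
# not the PR's actual implementation.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3

def static_fp8_qdq(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Fake-quantize x to FP8 e4m3 with a fixed (static) per-tensor scale."""
    q = (x.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return (q.float() * scale).to(x.dtype)

# Toy query/key/value tensors and made-up calibrated per-tensor scales.
q = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
k = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
v = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
scales = {name: torch.tensor(0.05) for name in ("q", "k", "v")}

out = torch.nn.functional.scaled_dot_product_attention(
    static_fp8_qdq(q, scales["q"]),
    static_fp8_qdq(k, scales["k"]),
    static_fp8_qdq(v, scales["v"]),
)
```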
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| test/test_cpu/test_export.py | Adds test for static FP8 attention, updates model path and corrects expected scale tensor shapes |
| auto_round/experimental/utils.py | New utility module consolidating shared FP8 quantization and module manipulation functions |
| auto_round/experimental/qmodules/fp8_static.py | Preserves original dtype in forward pass for FP8 linear operations |
| auto_round/experimental/kv_cache.py | Refactors utilities to new module, initializes scale parameters, removes debug logging |
| auto_round/experimental/attention.py | New module implementing hooked attention mechanism for FP8 quantization |
| auto_round/compressors/base.py | Adds static_attention_dtype parameter and integrates attention quantization context |
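The "hooked attention mechanism" described for `auto_round/experimental/attention.py` can be pictured roughly as follows. This is a conceptual sketch using a PyTorch forward pre-hook on a stand-in attention module, not the PR's `QuantizedAttentionImpl`; the scale value and module choice are assumptions.

```python
# Conceptual sketch, not the PR's QuantizedAttentionImpl; the scale is a
# made-up calibration result and nn.MultiheadAttention is only a stand-in.
import torch
from torch import nn

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

def fp8_qdq(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    q = (x.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return (q.float() * scale).to(x.dtype)

def make_fp8_input_hook(scale: torch.Tensor):
    """Return a forward pre-hook that fake-quantizes all tensor inputs."""
    def hook(module: nn.Module, args):
        return tuple(fp8_qdq(a, scale) if torch.is_tensor(a) else a for a in args)
    return hook

attn = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
handle = attn.register_forward_pre_hook(make_fp8_input_hook(torch.tensor(0.05)))
x = torch.randn(2, 16, 64)
out, _ = attn(x, x, x)  # inputs are fake-quantized by the hook before forward
handle.remove()
```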
auto_round/experimental/utils.py (outdated)

    from auto_round.utils import logger

    def fp8_per_tensor_qdq(
The function name is a bit strange; I suggest changing it to a more easily understood name.
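For illustration, here is an assumption-only sketch of what a more descriptively named helper could look like; the real signature and behavior of `fp8_per_tensor_qdq` are not visible in this diff, so everything below is hypothetical.

```python
# Hypothetical rename sketch; the actual fp8_per_tensor_qdq in the PR may
# differ in signature and behavior.
from typing import Optional

import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

def quant_dequant_fp8_per_tensor(
    tensor: torch.Tensor, scale: Optional[torch.Tensor] = None
):
    """Fake-quantize `tensor` to FP8 e4m3 using a single per-tensor scale.

    If `scale` is None it is derived from the tensor's absolute maximum;
    otherwise the provided (e.g. calibrated) scale is reused. Returns the
    dequantized tensor and the scale so callers can store it.
    """
    if scale is None:
        scale = (tensor.abs().max().float() / FP8_MAX).clamp(min=1e-12)
    q = (tensor.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return (q.float() * scale).to(tensor.dtype), scale
```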
auto_round/compressors/base.py

    enable_deterministic_algorithms = kwargs.pop("enable_deterministic_algorithms", False)
    self.momentum = kwargs.pop("momentum", 0.0)
    static_kv_dtype = kwargs.pop("static_kv_dtype", None)
    static_attention_dtype = kwargs.pop("static_attention_dtype", None)
Does `__main__.py` also need to add this parameter?
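For reference, a hedged usage sketch of how the new option might be passed from the Python API. The accepted dtype value ("fp8"), the placeholder model, and everything beyond the kwargs shown in the diff above are assumptions.

```python
# Usage sketch; "fp8" as the accepted value is an assumption based on the PR
# title, and the model below is just a small placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    static_kv_dtype="fp8",          # existing static KV cache option (value assumed)
    static_attention_dtype="fp8",   # option introduced by this PR (value assumed)
)
autoround.quantize()
```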
LGTM
Resolves #938.
LLMC format export is not supported for now, as vLLM loading is still a work in progress, but it can be used on HPU.