Conversation

@yiliu30 yiliu30 commented Nov 19, 2025

Resolves #938.

LLMC format export is not supported for now, as vLLM loading is still a work in progress, but the resulting model can be used on HPU.

@yiliu30 yiliu30 marked this pull request as draft November 19, 2025 06:34
@yiliu30 yiliu30 marked this pull request as ready for review November 19, 2025 07:12
@yiliu30 yiliu30 requested a review from Copilot November 20, 2025 06:36

Copilot AI left a comment

Pull Request Overview

This PR adds static FP8 attention support to the auto-round library. The implementation enables quantization of attention mechanisms using FP8 format, building on existing KV cache quantization infrastructure.
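For orientation, here is a minimal usage sketch of what this enables, assuming the AutoRound entry point forwards extra keyword arguments to the compressor (the static_kv_dtype and static_attention_dtype kwargs shown in the diff further down); the model id, output path, and exact constructor arguments are placeholders, not part of this PR:

```python
from auto_round import AutoRound

# Illustrative only: enable static FP8 quantization of the KV cache and of
# the attention (SDPA) inputs via the new static_attention_dtype kwarg.
# Exact arguments may differ; the model id and output dir are placeholders.
ar = AutoRound(
    "meta-llama/Llama-3.1-8B-Instruct",
    static_kv_dtype="fp8",                # existing KV-cache quantization knob
    static_attention_dtype="fp8",         # new knob added in this PR
)
ar.quantize_and_save("./qmodel-fp8-attn")
```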

Key Changes

  • Introduces QuantizedAttentionImpl module to handle FP8 quantized attention operations
  • Refactors shared quantization utilities into a new experimental/utils.py module
  • Updates test suite to validate FP8 attention quantization and correct scale tensor shapes

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Summary of changes per file:

  • test/test_cpu/test_export.py: adds a test for static FP8 attention, updates the model path, and corrects the expected scale tensor shapes
  • auto_round/experimental/utils.py: new utility module consolidating shared FP8 quantization and module-manipulation functions
  • auto_round/experimental/qmodules/fp8_static.py: preserves the original dtype in the forward pass for FP8 linear operations
  • auto_round/experimental/kv_cache.py: moves shared utilities to the new module, initializes scale parameters, and removes debug logging
  • auto_round/experimental/attention.py: new module implementing a hooked attention mechanism for FP8 quantization (see the sketch after this list)
  • auto_round/compressors/base.py: adds the static_attention_dtype parameter and integrates the attention quantization context
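As background for the attention.py entry above: a hooked FP8 attention wrapper typically fake-quantizes the query, key, and value tensors with static per-tensor scales before calling scaled_dot_product_attention. The sketch below is illustrative only; the class name QuantizedAttentionImpl comes from this PR, but the buffer names and structure are assumptions rather than the actual implementation.

```python
import torch
from torch.nn.functional import scaled_dot_product_attention


class QuantizedAttentionImpl(torch.nn.Module):
    """Illustrative wrapper that fake-quantizes Q/K/V to FP8 (per-tensor,
    static scales) before running SDPA. Names and structure are assumptions,
    not the actual auto_round/experimental/attention.py implementation."""

    def __init__(self):
        super().__init__()
        # Static per-tensor scales, expected to be calibrated beforehand.
        self.register_buffer("q_scale", torch.tensor(1.0))
        self.register_buffer("k_scale", torch.tensor(1.0))
        self.register_buffer("v_scale", torch.tensor(1.0))

    @staticmethod
    def _qdq(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        # Quantize to float8_e4m3fn with a single scale, then dequantize.
        fp8_max = torch.finfo(torch.float8_e4m3fn).max
        q = (x / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
        return q.to(x.dtype) * scale

    def forward(self, query, key, value, **kwargs):
        query = self._qdq(query, self.q_scale)
        key = self._qdq(key, self.k_scale)
        value = self._qdq(value, self.v_scale)
        return scaled_dot_product_attention(query, key, value, **kwargs)
```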


@yiliu30 yiliu30 added this to the 0.9.1 milestone Nov 21, 2025
from auto_round.utils import logger


def fp8_per_tensor_qdq(

The function name is a bit confusing. I suggest changing it to a more easily understood name.
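For context, a per-tensor FP8 quantize-dequantize helper of this shape usually computes one scale from the tensor's maximum magnitude, casts to FP8, and casts back. A minimal sketch follows; the real signature and return values in auto_round/experimental/utils.py may differ.

```python
import torch


def fp8_per_tensor_qdq(tensor: torch.Tensor):
    """Quantize a tensor to FP8 (e4m3) with one per-tensor scale, then dequantize.

    Returns the dequantized tensor and the scale so callers can store the
    scale as a static quantization parameter. Illustrative only.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = tensor.abs().amax().clamp(min=1e-12) / fp8_max
    q = (tensor / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return q.to(tensor.dtype) * scale, scale
```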

enable_deterministic_algorithms = kwargs.pop("enable_deterministic_algorithms", False)
self.momentum = kwargs.pop("momentum", 0.0)
static_kv_dtype = kwargs.pop("static_kv_dtype", None)
static_attention_dtype = kwargs.pop("static_attention_dtype", None)

@n1ck-guo n1ck-guo Nov 24, 2025

Does __main__.py also need to add this parameter?
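If __main__.py follows the pattern of the existing KV-cache option, exposing the new knob might look roughly like this (hypothetical argparse sketch; the actual flag names and wiring in auto_round/__main__.py may differ):

```python
import argparse

parser = argparse.ArgumentParser()
# Existing option (assumed) for the static KV-cache quantization dtype.
parser.add_argument("--static_kv_dtype", type=str, default=None,
                    help="Dtype for static KV-cache quantization, e.g. fp8")
# Mirror of the new static_attention_dtype kwarg added in this PR.
parser.add_argument("--static_attention_dtype", type=str, default=None,
                    help="Dtype for static attention (SDPA) quantization, e.g. fp8")
args = parser.parse_args()
```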

@n1ck-guo

LGTM

@yiliu30 yiliu30 enabled auto-merge (squash) November 24, 2025 07:04
@yiliu30 yiliu30 merged commit c5b1c41 into main Nov 24, 2025
17 of 22 checks passed
@yiliu30 yiliu30 deleted the quant-attn branch November 24, 2025 07:39
yiliu30 added a commit that referenced this pull request Nov 24, 2025
@yiliu30 yiliu30 restored the quant-attn branch November 24, 2025 08:52
chensuyue pushed a commit that referenced this pull request Nov 24, 2025

Development

Successfully merging this pull request may close these issues: support fp8 sdpa (#938).