
Conversation


@Kaihui-intel Kaihui-intel commented Oct 11, 2025

User description

Type of Change

documentation

Description

Transfer the MX quantization documentation to the AutoRound quantization API.

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed


PR Type

Documentation, Enhancement


Description

  • Updated documentation for AutoRound Quantization API

  • Added example using Hugging Face models

  • Included code snippet for model quantization and inference
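The quantization-and-inference flow the PR documents can be sketched roughly like this. This is a pseudocode-level sketch, assuming neural-compressor 3.x's `AutoRoundConfig`/`prepare`/`convert` entry points; the `facebook/opt-125m` model name is a hypothetical placeholder, and the exact snippet in the doc may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor.torch.quantization import AutoRoundConfig, prepare, convert

model_name = "facebook/opt-125m"  # hypothetical example model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

quant_config = AutoRoundConfig()      # default AutoRound settings
model = prepare(model, quant_config)  # wrap modules for calibration
# ... feed a few calibration batches through `model` here ...
model = convert(model)                # quantize the model

# run inference with the quantized model
inputs = tokenizer("Hello, my dog is", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0]))
```

The two-step prepare/convert shape mirrors the 3.x PyTorch workflow; consult the updated PT_MXQuant.md for the authoritative snippet.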


Diagram Walkthrough

flowchart LR
  A["MXQuantConfig"] -- "updated to" --> B["AutoRoundConfig"]
  B -- "added example" --> C["Hugging Face models"]
  C -- "included code" --> D["Model quantization and inference"]

File Walkthrough

Relevant files

Documentation: docs/source/3x/PT_MXQuant.md (+30/-6)
Updated to AutoRound Quantization API

  • Updated API usage from MXQuant to AutoRound
  • Added example with Hugging Face models
  • Included code for quantization and inference

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@PRAgent4INC
Collaborator

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Typo

There is a typo in the comment `# quantize the model and save to output_dir`. It should be `# quantize the model and save to output_dir`.

# quantize the model and save to output_dir
Example Link

The link provided in the examples section points to a non-existent path. Ensure the path is correct and the example exists.

- PyTorch [huggingface models](/examples/pytorch/multimodal-modeling/quantization/auto_round/llama4)

@PRAgent4INC
Collaborator

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Update API name consistency

Correct the API name to match the description.

docs/source/3x/PT_MXQuant.md [86]

-To get a model quantized with Microscaling Data Types, users can use the Microscaling Quantization API as follows.
+To get a model quantized with AutoRound Data Types, users can use the AutoRound Quantization API as follows.

Suggestion importance [1-10]: 7

Why: The suggestion correctly updates the API name to match the description, improving clarity and consistency. However, it does not address a critical issue and offers only a minor improvement.

Impact: Medium

@thuang6
Contributor

thuang6 commented Oct 11, 2025

"It adapts a granularity falling between per-channel and per-tensor to balance accuracy and memory consumption." in the introduction section does not look right: a block size of 32 is normally smaller than the channel dimension. @mengniwang95, should we remove this sentence?
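To make the granularity point concrete: block-wise scaling at block size 32 yields many more scales than per-channel scaling, so it is finer than per-channel rather than sitting between per-channel and per-tensor. A minimal Python illustration (the 4096 channel length is an assumed example, not from the doc):

```python
# Scale counts for one 4096-element weight row under different granularities.
channel_len = 4096   # assumed example channel dimension
block_size = 32      # MX-style shared-scale block size

per_tensor_scales = 1                     # one scale for the whole tensor
per_channel_scales = 1                    # one scale for this whole row
block_scales = channel_len // block_size  # one scale per 32-element block

print(per_channel_scales, block_scales)   # block-wise is 128x finer here
```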

@thuang6
Contributor

thuang6 commented Oct 11, 2025

Also the formula "The exponent (exp) is equal to torch.floor(torch.log2(amax))" in the introduction section is not right. According to the recipe document, the formula is: clamp(floor(log2(amax)) - maxExp, -127, 127), where maxExp is the largest power-of-two representable in the element data type, e.g. for FP8 E4M3 maxExp is 8, for FP4 E2M1 maxExp is 2. @mengniwang95, please double-confirm whether it is the default option used in auto-round.

@mengniwang95
Contributor

"It adapts a granularity falling between per-channel and per-tensor to balance accuracy and memory consumption." in the introduction section does not look right: a block size of 32 is normally smaller than the channel dimension. @mengniwang95, should we remove this sentence?

Yes, you are right.

@mengniwang95
Contributor

Also the formula "The exponent (exp) is equal to torch.floor(torch.log2(amax))" in the introduction section is not right. According to the recipe document, the formula is: clamp(floor(log2(amax)) - maxExp, -127, 127), where maxExp is the largest power-of-two representable in the element data type, e.g. for FP8 E4M3 maxExp is 8, for FP4 E2M1 maxExp is 2. @mengniwang95, please double-confirm whether it is the default option used in auto-round.

clamp(floor(log2(amax)) - maxExp, -127, 127) is used in AutoRound.
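The confirmed formula can be sketched in plain Python. This is a minimal sketch; `shared_exponent` is a hypothetical helper name, and the maxExp values 8 and 2 are the ones quoted above for FP8 E4M3 and FP4 E2M1:

```python
import math

def shared_exponent(amax: float, max_exp: int) -> int:
    """Shared block scale exponent: clamp(floor(log2(amax)) - maxExp, -127, 127)."""
    e = math.floor(math.log2(amax)) - max_exp
    return max(-127, min(127, e))

# FP8 E4M3: maxExp = 8; FP4 E2M1: maxExp = 2 (per the recipe document)
print(shared_exponent(448.0, 8))      # floor(log2(448)) = 8, 8 - 8 = 0
print(shared_exponent(6.0, 2))        # floor(log2(6)) = 2, 2 - 2 = 0
print(shared_exponent(2.0**-200, 8))  # clamped to -127
```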

@thuang6
Contributor

thuang6 commented Oct 11, 2025

Also the formula "The exponent (exp) is equal to torch.floor(torch.log2(amax))" in the introduction section is not right. According to the recipe document, the formula is: clamp(floor(log2(amax)) - maxExp, -127, 127), where maxExp is the largest power-of-two representable in the element data type, e.g. for FP8 E4M3 maxExp is 8, for FP4 E2M1 maxExp is 2. @mengniwang95, please double-confirm whether it is the default option used in auto-round.

clamp(floor(log2(amax)) - maxExp, -127, 127) is used in AutoRound.

@Kaihui-intel, please help to update the formula as well.

@Kaihui-intel
Contributor Author

Also the formula "The exponent (exp) is equal to torch.floor(torch.log2(amax))" in the introduction section is not right. According to the recipe document, the formula is: clamp(floor(log2(amax)) - maxExp, -127, 127), where maxExp is the largest power-of-two representable in the element data type, e.g. for FP8 E4M3 maxExp is 8, for FP4 E2M1 maxExp is 2. @mengniwang95, please double-confirm whether it is the default option used in auto-round.

clamp(floor(log2(amax)) - maxExp, -127, 127) is used in AutoRound.

@Kaihui-intel, please help to update the formula as well.

2d9c95d

@Kaihui-intel Kaihui-intel merged commit e36230e into master Oct 13, 2025
11 checks passed
@Kaihui-intel Kaihui-intel deleted the kaihui/mx_doc branch October 13, 2025 01:10
