fix packing nvfp/mxfp max_workers & extend xpu ut #1555
Merged
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Pull request overview
Fixes NVFP/MXFP packing thread worker selection and expands XPU unit tests to cover more quantization schemes and model types.
Changes:
- Fix `max_workers` selection logic during NVFP/MXFP packing to avoid problematic concurrency on CUDA.
- Update XPU tests to load quantized models with `device_map="xpu"` instead of `"auto"`.
- Add new XPU tests for multiple schemes, VLM quantization/inference, and `lm_head` quantization.
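As a rough illustration of the first item, the sketch below shows one plausible shape for a worker-count check that stays single-threaded whenever an accelerator is present. It is not the PR's actual diff: the helper names `pick_max_workers` and `pack_all`, the worker counts 1 and 2, and the boolean flags standing in for device-availability checks are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def pick_max_workers(cuda_available: bool, xpu_available: bool) -> int:
    """Hypothetical helper: use 2 threads only when no accelerator is present."""
    # Both backends must be absent before multi-threaded packing is enabled.
    # Writing `or` instead of `and` here would return 2 on a CUDA-only machine,
    # because `not xpu_available` is True there.
    if not cuda_available and not xpu_available:
        return 2
    return 1


def pack_all(layers, pack_fn, max_workers):
    """Apply pack_fn to every layer with a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(pack_fn, layers))


# CUDA-only setup: packing should stay single-threaded.
workers = pick_max_workers(cuda_available=True, xpu_available=False)
# str.upper is a stand-in for a real per-layer packing function.
packed = pack_all(["layer0", "layer1"], str.upper, workers)
```

The point of the sketch is the condition itself: requiring that neither backend is available keeps CUDA-only and XPU-only machines on the single-threaded path.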
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `test/test_xpu/test_autoround.py` | Extends XPU tests (schemes/VLM/lm_head) and adjusts `device_map` usage when reloading quantized models. |
| `auto_round/export/export_to_autoround/export_to_nvfp_mxfp.py` | Fixes the `max_workers` condition for packing to avoid unintended multi-thread packing on CUDA-only setups. |
Comments suppressed due to low confidence (1)
test/test_xpu/test_autoround.py:11
`save_tiny_model` is imported here but never used in this test module. Please remove it or use it (e.g., to build a tiny model for the new large-model XPU tests) to avoid dead imports and keep intent clear.
from ..helpers import get_model_path
chensuyue reviewed Mar 20, 2026
Signed-off-by: chensuyue <suyue.chen@intel.com>
chensuyue approved these changes Mar 21, 2026
Description
Fix packing nvfp/mxfp `max_workers` and extend the XPU unit tests (formats/schemes/vlm/lm_head).
Related Issues
#1490