
fix packing nvfp/mxfp max_workers & extend xpu ut #1555

Merged
chensuyue merged 11 commits into main from kaihui/xpu_ut on Mar 21, 2026

Conversation

Contributor

@Kaihui-intel Kaihui-intel commented Mar 17, 2026

Description

fix packing nvfp/mxfp max_workers & extend xpu ut (covering formats, schemes, VLM, and lm_head tests)

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify): xpu ut

Related Issues

#1490

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Copilot AI review requested due to automatic review settings March 17, 2026 08:19
pre-commit-ci bot and others added 2 commits March 17, 2026 08:19
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

Copilot AI left a comment


Pull request overview

Fixes NVFP/MXFP packing thread worker selection and expands XPU unit tests to cover more quantization schemes and model types.

Changes:

  • Fix max_workers selection logic during NVFP/MXFP packing to avoid problematic concurrency on CUDA.
  • Update XPU tests to load quantized models with device_map="xpu" instead of "auto".
  • Add new XPU tests for multiple schemes, VLM quantization/inference, and lm_head quantization.
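The worker-selection fix described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea (the helper name and signature are invented for clarity and do not appear in export_to_nvfp_mxfp.py):

```python
def choose_max_workers(requested: int, cuda_only: bool) -> int:
    """Pick the thread count used to pack quantized layers.

    Hypothetical sketch of the fix, not the actual code in
    export_to_nvfp_mxfp.py: multi-threaded packing can interleave badly
    with CUDA kernel launches, so on a CUDA-only setup we fall back to
    a single worker; otherwise the requested parallelism is used.
    """
    if cuda_only:
        return 1
    return max(1, requested)
```

With this shape of guard, packing on CUDA-only hosts stays single-threaded while CPU/XPU hosts keep the requested parallelism.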

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File: test/test_xpu/test_autoround.py
  Extends XPU tests (schemes/VLM/lm_head) and adjusts device_map usage when reloading quantized models.
File: auto_round/export/export_to_autoround/export_to_nvfp_mxfp.py
  Fixes the max_workers condition for packing to avoid unintended multi-thread packing on CUDA-only setups.
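The test-side device_map change amounts to pinning the reloaded quantized model to the XPU instead of letting automatic placement decide. A hypothetical one-line diff, assuming the tests reload via transformers' `from_pretrained` (the variable names are illustrative):

```diff
-model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto")
+model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="xpu")
```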
Comments suppressed due to low confidence (1)

test/test_xpu/test_autoround.py:11

  • save_tiny_model is imported here but never used in this test module. Please remove it or use it (e.g., to build a tiny model for the new large-model XPU tests) to avoid dead imports and keep intent clear.
from ..helpers import get_model_path

chensuyue and others added 2 commits March 19, 2026 09:12
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
chensuyue and others added 4 commits March 20, 2026 21:06
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
@chensuyue chensuyue merged commit 79fa1a9 into main Mar 21, 2026
30 checks passed
@chensuyue chensuyue deleted the kaihui/xpu_ut branch March 21, 2026 13:18
