
refactor eval and add UT#1324

Merged
xin3he merged 20 commits into main from xinhe/eval
Jan 26, 2026
Conversation

@xin3he
Contributor

@xin3he xin3he commented Jan 23, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please describe):

Related Issues

Fixes #1319 #1050
Relates to #1134

Changes Made

Refactored evaluation functionality into a dedicated auto_round/eval module and added comprehensive unit tests for CPU and GPU backends.

Module Structure

  • auto_round/eval/eval_cli.py (486 lines): CLI argument parsing, vLLM/HF backend orchestration, and custom argument parsing (used for pure evaluation via --eval)
  • auto_round/eval/evaluation.py (439 lines): core evaluation wrappers around the lm_eval library (used for evaluation after quantization)

Unit Tests

  • test/test_cpu/advanced/test_evaluation_functions.py (121 lines): Tests for parse_vllm_args() with various types (int, float, bool, string, mixed)
  • test/test_cuda/advanced/test_evaluation.py (115 lines): Integration tests for vLLM backend with custom args, quantization workflows
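The parse_vllm_args() behavior exercised by these tests can be sketched roughly as follows. This is an illustrative re-implementation only, assuming the function splits comma-separated key=value pairs and infers bool/int/float/string types; the actual code lives in auto_round/eval/eval_cli.py and may differ in details:

```python
# Hypothetical sketch of a --vllm_args parser; NOT the actual auto_round
# implementation, which lives in auto_round/eval/eval_cli.py.
def parse_vllm_args(arg_string):
    """Parse "k1=v1,k2=v2" into a dict, inferring bool/int/float/str types."""
    result = {}
    if not arg_string:
        return result
    for pair in arg_string.split(","):
        key, _, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        # Type inference order: bool first, then int, then float, else string.
        if value.lower() in ("true", "false"):
            result[key] = value.lower() == "true"
        else:
            try:
                result[key] = int(value)
            except ValueError:
                try:
                    result[key] = float(value)
                except ValueError:
                    result[key] = value
    return result
```

For example, the string used in the PR's usage examples, "tensor_parallel_size=1,gpu_memory_utilization=0.8,max_model_len=8096", would yield an int, a float, and another int respectively, ready to be forwarded to the vLLM engine constructor as keyword arguments.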

Compatibility & Optimization

  • Added allow_deprecated_quantization: True for vLLM 0.14.0 compatibility
  • Added comprehensive unit tests covering common scenarios

Example Usage

# vLLM backend supports VLM evaluation and custom engine args via --vllm_args
CUDA_VISIBLE_DEVICES=1 auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --eval_backend vllm --vllm_args "tensor_parallel_size=1,gpu_memory_utilization=0.8,max_model_len=8096"
CUDA_VISIBLE_DEVICES=1,2 auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --eval_backend vllm --vllm_args "tensor_parallel_size=2,gpu_memory_utilization=0.8"

# --eval evaluates the model without quantization
1. HF backend
auto-round facebook/opt-125m --eval --tasks lambada_openai --limit 100
2. HF backend + task-by-task evaluation
auto-round facebook/opt-125m --eval --eval_task_by_task --tasks lambada_openai,piqa --limit 100
3. vLLM backend
auto-round facebook/opt-125m --eval --tasks lambada_openai --limit 100 --eval_backend vllm

# evaluation after quantization
1. HF backend
auto-round facebook/opt-125m --iters 0 --tasks lambada_openai --limit 100
2. HF backend + task-by-task evaluation
auto-round facebook/opt-125m --iters 0 --eval_task_by_task --tasks lambada_openai,piqa --limit 100
3. vLLM backend
auto-round facebook/opt-125m --iters 0 --tasks lambada_openai --limit 100 --eval_backend vllm

Testing

  • Tested locally
  • Added/updated unit tests
  • All existing tests pass
  • Tested on specific hardware/environment (please specify):

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Context

root and others added 2 commits January 22, 2026 10:04
Copilot AI review requested due to automatic review settings January 23, 2026 02:25
@xin3he xin3he changed the title Xinhe/eval refactor eval and add UT Jan 23, 2026
@xin3he
Contributor Author

xin3he commented Jan 23, 2026

@copilot please help update the description

Contributor

Copilot AI commented Jan 23, 2026

@xin3he I've opened a new pull request, #1325, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the evaluation system by extracting evaluation logic into separate functions, adding vLLM backend support with custom arguments, and improving code organization.

Changes:

  • Removed unused imports and deprecated VLLM integration test
  • Replaced verbose VLLM argument parser with a single --vllm_args parameter accepting comma-separated key-value pairs
  • Extracted evaluation logic into dedicated functions in evaluation.py for better modularity

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • test/test_cuda/integrations/test_vllm.py — removed unused imports and the deprecated test_vllm_lm_eval test function
  • test/test_cuda/advanced/test_evaluation.py — added new tests for vLLM and HF evaluation backends with custom arguments
  • test/test_cpu/advanced/test_evaluation_functions.py — added unit tests for vLLM argument parsing and GGUF model loading utilities
  • auto_round/utils/model.py — added support for Qwen3Next and Qwen3VLMoeText MoE blocks
  • auto_round/eval/evaluation.py — added helper functions for diffusion models, GGUF loading, and model evaluation routing
  • auto_round/eval/eval_cli.py — refactored vLLM argument handling to use parse_vllm_args and extracted GGUF loading logic
  • auto_round/main.py — replaced inline evaluation code with a call to run_model_evaluation
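The routing that main.py now delegates to could look roughly like the sketch below. Everything here is illustrative: the backend stubs and the exact signature of run_model_evaluation are assumptions, not the real auto_round API, which wraps lm_eval in auto_round/eval/evaluation.py:

```python
# Illustrative sketch of --eval_backend dispatch; the real run_model_evaluation
# in auto_round wraps lm_eval, and these backend functions are stand-in stubs.
def _eval_with_hf(model_path, tasks, **kwargs):
    # Placeholder for the HF/lm_eval evaluation path.
    return {"backend": "hf", "model": model_path, "tasks": tasks}

def _eval_with_vllm(model_path, tasks, **kwargs):
    # Placeholder for the vLLM evaluation path (would consume parsed vllm_args).
    return {"backend": "vllm", "model": model_path, "tasks": tasks}

BACKENDS = {"hf": _eval_with_hf, "vllm": _eval_with_vllm}

def run_model_evaluation(model_path, tasks, eval_backend="hf", **backend_args):
    """Route evaluation to the backend selected via --eval_backend."""
    try:
        backend_fn = BACKENDS[eval_backend]
    except KeyError:
        raise ValueError(f"Unknown eval backend: {eval_backend!r}") from None
    return backend_fn(model_path, tasks, **backend_args)
```

Keeping the dispatch in one function, rather than inline in main.py, is what lets the CPU and CUDA test suites exercise each backend path independently.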


xin3he and others added 5 commits January 23, 2026 11:39
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he xin3he requested review from n1ck-guo and wenhuach21 January 23, 2026 06:08
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he
Contributor Author

xin3he commented Jan 23, 2026

  • 126.61s call test/test_cuda/advanced/test_evaluation.py::TestVllmEvaluation::test_vllm_backend_with_custom_args[OPEA/Qwen2.5-0.5B-Instruct-int4-sym-inc]
  • 112.68s call test/test_cuda/advanced/test_evaluation.py::TestVllmEvaluation::test_vllm_backend_with_quantization_iters_0
  • 85.39s call test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_eval_mode_hf_backend[OPEA/Qwen2.5-0.5B-Instruct-int4-sym-inc]
  • 83.31s call test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_task_by_task
  • 66.82s call test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_hf_backend
  • 1.00s setup test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_hf_backend
  • 0.17s teardown test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_task_by_task

…e limit

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…val_with_vllm

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…cesses

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…trings

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…m_args

Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he xin3he requested review from n1ck-guo and wenhuach21 January 23, 2026 07:58
@chensuyue chensuyue added this to the 0.10.0 milestone Jan 23, 2026
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@wenhuach21
Contributor

Please make sure this use case is supported:
auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --device_map 0.1 --eval_backend vllm

@wenhuach21 wenhuach21 self-requested a review January 26, 2026 01:41
@xin3he
Contributor Author

xin3he commented Jan 26, 2026

Please make sure this use case is supported: auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --device_map 0.1 --eval_backend vllm

Sure, verified with auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --device_map 5,6 --eval_backend vllm

Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he
Contributor Author

xin3he commented Jan 26, 2026

HPU and XPU are verified.
One oddity: without this fix, CI hits a Dynamo failure:
FAILED test_cpu/utils/test_alg_ext.py::TestAlgExt::test_all_support_dtype - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <functio..

@xin3he xin3he mentioned this pull request Jan 26, 2026
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he xin3he merged commit 14eacb1 into main Jan 26, 2026
28 checks passed
@xin3he xin3he deleted the xinhe/eval branch January 26, 2026 12:26
lvliang-intel pushed a commit that referenced this pull request Feb 2, 2026
Signed-off-by: He, Xin3 <xin3.he@intel.com>


Development

Successfully merging this pull request may close these issues.

[Bug]: limit doesn't work in task-by-task mode

6 participants