
refactor eval and add UT#1324

Merged
xin3he merged 20 commits into main from xinhe/eval
Jan 26, 2026
Conversation

@xin3he
Contributor

@xin3he xin3he commented Jan 23, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please describe):

Related Issues

Fixes #1319 #1050
Relates to #1134

Changes Made

Refactored evaluation functionality into a dedicated auto_round/eval module and added comprehensive unit tests for CPU and GPU backends.

Module Structure

  • auto_round/eval/eval_cli.py (486 lines): CLI argument parsing, vLLM/HF backend orchestration, and custom argument parsing (used for pure evaluation via --eval)
  • auto_round/eval/evaluation.py (439 lines): core evaluation wrappers around the lm_eval library (used for evaluation after quantization)

Unit Tests

  • test/test_cpu/advanced/test_evaluation_functions.py (121 lines): Tests for parse_vllm_args() with various types (int, float, bool, string, mixed)
  • test/test_cuda/advanced/test_evaluation.py (115 lines): Integration tests for vLLM backend with custom args, quantization workflows
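The parse_vllm_args() behavior exercised by these tests can be sketched roughly as follows. This is an illustrative re-implementation only, assuming the function splits comma-separated key=value pairs and infers bool/int/float/string types; the actual code lives in auto_round/eval/eval_cli.py and may differ in details:

```python
# Hypothetical sketch of a --vllm_args parser; NOT the actual auto_round
# implementation, which lives in auto_round/eval/eval_cli.py.
def parse_vllm_args(arg_string):
    """Parse "k1=v1,k2=v2" into a dict, inferring bool/int/float/str types."""
    result = {}
    if not arg_string:
        return result
    for pair in arg_string.split(","):
        key, _, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        # Type inference order: bool first, then int, then float, else string.
        if value.lower() in ("true", "false"):
            result[key] = value.lower() == "true"
        else:
            try:
                result[key] = int(value)
            except ValueError:
                try:
                    result[key] = float(value)
                except ValueError:
                    result[key] = value
    return result
```

For example, the string used in the PR's usage examples, "tensor_parallel_size=1,gpu_memory_utilization=0.8,max_model_len=8096", would yield an int, a float, and another int respectively, ready to be forwarded to the vLLM engine constructor as keyword arguments.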

Compatibility & Optimization

  • Added allow_deprecated_quantization: True for vLLM 0.14.0 compatibility
  • Added comprehensive unit tests covering common scenarios

Example Usage

# vLLM backend supports VLM evaluation and custom engine args via --vllm_args
CUDA_VISIBLE_DEVICES=1 auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --eval_backend vllm --vllm_args "tensor_parallel_size=1,gpu_memory_utilization=0.8,max_model_len=8096"
CUDA_VISIBLE_DEVICES=1,2 auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --eval_backend vllm --vllm_args "tensor_parallel_size=2,gpu_memory_utilization=0.8"

# --eval evaluates the model without quantization
1. HF backend
auto-round facebook/opt-125m --eval --tasks lambada_openai --limit 100
2. HF backend + task-by-task evaluation
auto-round facebook/opt-125m --eval --eval_task_by_task --tasks lambada_openai,piqa --limit 100
3. vLLM backend
auto-round facebook/opt-125m --eval --tasks lambada_openai --limit 100 --eval_backend vllm

# evaluation after quantization
1. HF backend
auto-round facebook/opt-125m --iters 0 --tasks lambada_openai --limit 100
2. HF backend + task-by-task evaluation
auto-round facebook/opt-125m --iters 0 --eval_task_by_task --tasks lambada_openai,piqa --limit 100
3. vLLM backend
auto-round facebook/opt-125m --iters 0 --tasks lambada_openai --limit 100 --eval_backend vllm

Testing

  • Tested locally
  • Added/updated unit tests
  • All existing tests pass
  • Tested on specific hardware/environment (please specify):

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Context

root and others added 2 commits January 22, 2026 10:04
Copilot AI review requested due to automatic review settings January 23, 2026 02:25
@xin3he xin3he changed the title Xinhe/eval refactor eval and add UT Jan 23, 2026
@xin3he
Contributor Author

xin3he commented Jan 23, 2026

@copilot please help update the description

Contributor

Copilot AI commented Jan 23, 2026

@xin3he I've opened a new pull request, #1325, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the evaluation system by extracting evaluation logic into separate functions, adding vLLM backend support with custom arguments, and improving code organization.

Changes:

  • Removed unused imports and deprecated VLLM integration test
  • Replaced verbose VLLM argument parser with a single --vllm_args parameter accepting comma-separated key-value pairs
  • Extracted evaluation logic into dedicated functions in evaluation.py for better modularity

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • test/test_cuda/integrations/test_vllm.py — removed unused imports and the deprecated test_vllm_lm_eval test function
  • test/test_cuda/advanced/test_evaluation.py — added new tests for vLLM and HF evaluation backends with custom arguments
  • test/test_cpu/advanced/test_evaluation_functions.py — added unit tests for vLLM argument parsing and GGUF model loading utilities
  • auto_round/utils/model.py — added support for Qwen3Next and Qwen3VLMoeText MoE blocks
  • auto_round/eval/evaluation.py — added helper functions for diffusion models, GGUF loading, and model evaluation routing
  • auto_round/eval/eval_cli.py — refactored vLLM argument handling to use parse_vllm_args and extracted GGUF loading logic
  • auto_round/main.py — replaced inline evaluation code with a call to run_model_evaluation
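The routing that main.py now delegates to could look roughly like the sketch below. Everything here is illustrative: the backend stubs and the exact signature of run_model_evaluation are assumptions, not the real auto_round API, which wraps lm_eval in auto_round/eval/evaluation.py:

```python
# Illustrative sketch of --eval_backend dispatch; the real run_model_evaluation
# in auto_round wraps lm_eval, and these backend functions are stand-in stubs.
def _eval_with_hf(model_path, tasks, **kwargs):
    # Placeholder for the HF/lm_eval evaluation path.
    return {"backend": "hf", "model": model_path, "tasks": tasks}

def _eval_with_vllm(model_path, tasks, **kwargs):
    # Placeholder for the vLLM evaluation path (would consume parsed vllm_args).
    return {"backend": "vllm", "model": model_path, "tasks": tasks}

BACKENDS = {"hf": _eval_with_hf, "vllm": _eval_with_vllm}

def run_model_evaluation(model_path, tasks, eval_backend="hf", **backend_args):
    """Route evaluation to the backend selected via --eval_backend."""
    try:
        backend_fn = BACKENDS[eval_backend]
    except KeyError:
        raise ValueError(f"Unknown eval backend: {eval_backend!r}") from None
    return backend_fn(model_path, tasks, **backend_args)
```

Keeping the dispatch in one function, rather than inline in main.py, is what lets the CPU and CUDA test suites exercise each backend path independently.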


xin3he and others added 5 commits January 23, 2026 11:39
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he xin3he requested review from n1ck-guo and wenhuach21 January 23, 2026 06:08
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he
Contributor Author

xin3he commented Jan 23, 2026

  • 126.61s call test/test_cuda/advanced/test_evaluation.py::TestVllmEvaluation::test_vllm_backend_with_custom_args[OPEA/Qwen2.5-0.5B-Instruct-int4-sym-inc]
  • 112.68s call test/test_cuda/advanced/test_evaluation.py::TestVllmEvaluation::test_vllm_backend_with_quantization_iters_0
  • 85.39s call test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_eval_mode_hf_backend[OPEA/Qwen2.5-0.5B-Instruct-int4-sym-inc]
  • 83.31s call test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_task_by_task
  • 66.82s call test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_hf_backend
  • 1.00s setup test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_hf_backend
  • 0.17s teardown test/test_cuda/advanced/test_evaluation.py::TestHFEvaluation::test_iters_0_task_by_task

…e limit

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…val_with_vllm

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…cesses

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…trings

Signed-off-by: He, Xin3 <xin3.he@intel.com>
…m_args

Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he xin3he requested review from n1ck-guo and wenhuach21 January 23, 2026 07:58
@chensuyue chensuyue added this to the 0.10.0 milestone Jan 23, 2026
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@wenhuach21
Contributor

Please make sure this use case is supported:
auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --device_map 0.1 --eval_backend vllm

@wenhuach21 wenhuach21 self-requested a review January 26, 2026 01:41
@xin3he
Contributor Author

xin3he commented Jan 26, 2026

Please make sure this use case is supported: auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --device_map 0.1 --eval_backend vllm

Sure, verified with auto-round "/models/Qwen3-VL-30B-A3B-Instruct" --eval --tasks lambada_openai --limit 100 --device_map 5,6 --eval_backend vllm

Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he
Contributor Author

xin3he commented Jan 26, 2026

HPU and XPU are verified.
One oddity: without this fix, CI hits a Dynamo failure:
FAILED test_cpu/utils/test_alg_ext.py::TestAlgExt::test_all_support_dtype - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <functio..

@xin3he xin3he mentioned this pull request Jan 26, 2026
Signed-off-by: He, Xin3 <xin3.he@intel.com>
@xin3he xin3he merged commit 14eacb1 into main Jan 26, 2026
28 checks passed
@xin3he xin3he deleted the xinhe/eval branch January 26, 2026 12:26
lvliang-intel pushed a commit that referenced this pull request Feb 2, 2026
Signed-off-by: He, Xin3 <xin3.he@intel.com>


Development

Successfully merging this pull request may close these issues.

[Bug]: limit doesn't work in task-by-task mode

6 participants