Add chat-template hooks to LMEvalORTGenAIEvaluator by ykhrustalev · Pull Request #2462 · microsoft/Olive

ykhrustalev · 2026-05-12T21:06:34Z

Describe your changes

Implement tokenizer_name and apply_chat_template on LMEvalORTGenAIEvaluator so the backend supports lm_eval.simple_evaluate(apply_chat_template=True). Without these, lm-eval raises NotImplementedError at task setup for any chat-formatted task.

This brings the backend to parity with the HuggingFace backend in lm_eval/models/huggingface.py. The HF tokenizer is loaded lazily on the first apply_chat_template call, so model directories without HF tokenizer files still work for non-chat evaluation. Generation continues to go through og.Tokenizer.
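The two hooks described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: the class name `ChatTemplateHooksSketch`, the `tokenizer_loader` parameter, and the `_hf_tokenizer` attribute are hypothetical names introduced here, and the method signature is assumed to follow the HFLM convention. The only real library call assumed is `transformers.AutoTokenizer.from_pretrained`, which is imported inside the loader so that nothing from HF is touched until chat templating is actually used.

```python
import re


class ChatTemplateHooksSketch:
    """Hypothetical stand-in for the chat-template part of the evaluator."""

    def __init__(self, model_path, tokenizer_loader=None):
        self.model_path = str(model_path)
        # The loader is injectable (an assumption of this sketch) so the
        # lazy-loading behaviour can be exercised without HF files on disk.
        self._tokenizer_loader = tokenizer_loader or self._load_hf_tokenizer
        self._hf_tokenizer = None  # nothing is loaded at construction time

    @staticmethod
    def _load_hf_tokenizer(model_path):
        # Imported lazily: only needed when chat templating is enabled.
        from transformers import AutoTokenizer

        return AutoTokenizer.from_pretrained(model_path)

    @property
    def tokenizer_name(self):
        # Replace both POSIX and Windows separators so the identifier
        # used for caching looks the same on every platform.
        return re.sub(r"[\\/]", "__", self.model_path)

    def apply_chat_template(self, chat_history, add_generation_prompt=True):
        # Load the HF tokenizer on first use, then reuse the cached instance.
        if self._hf_tokenizer is None:
            self._hf_tokenizer = self._tokenizer_loader(self.model_path)
        return self._hf_tokenizer.apply_chat_template(
            chat_history,
            tokenize=False,
            add_generation_prompt=add_generation_prompt,
        )
```

Constructing the object never touches the tokenizer files, which is what keeps non-chat evaluation working for model directories that ship only the ORT GenAI assets.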

Checklist before requesting a review

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? Release note: Enable apply_chat_template=True in lm-eval for ortgenai-backed evaluators.

(Optional) Issue link

N/A

lm-eval's `simple_evaluate(..., apply_chat_template=True)` requires the underlying LM class to implement `tokenizer_name` and `apply_chat_template`. The HFLM backend has both; the ORT GenAI backend does not, so any attempt to evaluate a chat-tuned ONNX model with chat-formatted prompts raises `NotImplementedError: To use this model with chat templates, please implement the 'tokenizer_name' property.`

This adds the two members with the minimum surface area:

- `tokenizer_name` returns the model path (for lm-eval's chat-aware result caching), matching the HFLM convention of slash-replacement.
- `apply_chat_template` defers to the model's HF tokenizer via `AutoTokenizer.apply_chat_template`, mirroring HFLM's implementation.

The HF tokenizer is loaded once at `__init__` purely for chat-template rendering; token-level encode/decode still goes through `og.Tokenizer` and the runtime, so there is no change to generation behavior or any existing code path.

Verified end-to-end on LFM2.5-350M (int4, k_quant_mixed) MBPP: without chat-template hooks the eval raised at task start; with them plus `num_fewshot=0` and a chat-friendly stop list, pass@1 went from 0.0/500 to 67/500 (13.4%) -- the original 0.0 was a prompt-format artifact (instruct model + completion-style few-shot), not a conversion regression.

Copilot

Pull request overview

This PR adds lm-eval “chat template” integration hooks to LMEvalORTGenAIEvaluator so ORT GenAI–backed models can be evaluated via lm_eval.simple_evaluate(..., apply_chat_template=True) (matching the capability available in the HuggingFace backend).

Changes:

Add an HF tokenizer instance to LMEvalORTGenAIEvaluator for rendering chat templates.
Implement tokenizer_name for lm-eval chat-template-aware caching.
Implement apply_chat_template(...) by delegating to the HF tokenizer.

… key, tests

- Lazy-load the HF tokenizer on the first ``apply_chat_template`` call rather than at ``__init__``. Callers that never enable chat templating no longer need HF tokenizer files (``tokenizer_config.json`` etc.) in the model directory; eager loading would have regressed those workflows.
- ``tokenizer_name`` now replaces both POSIX and Windows path separators with ``__`` so the lm-eval cache identifier is stable across platforms. The previous implementation only handled forward slashes, leaving backslashes in the key on Windows because ``str(Path(...))`` preserves the native separator.
- Add unit tests for both behaviours:
  - ``tokenizer_name`` parametrised over POSIX, relative, and Windows-style paths to lock in the normalisation contract.
  - ``apply_chat_template`` verified to (a) not load the HF tokenizer at construction, (b) load once on first call, and (c) reuse the cached tokenizer on subsequent calls. ``AutoTokenizer`` is patched so the tests run without any HF tokenizer files on disk.

All four new tests pass; ``test_olive_evaluator.py`` as a whole stays green (85 passed). ``lintrunner`` reports no new warnings on the changed files.
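The separator issue mentioned above can be reproduced on any platform with the pure path classes from `pathlib`. The sketch below is illustrative only: `normalize_cache_key` is a hypothetical helper, not a function from the PR, and the example path is made up.

```python
from pathlib import PurePosixPath, PureWindowsPath


def normalize_cache_key(path_str: str) -> str:
    # Hypothetical helper: map both separator styles to "__".
    return path_str.replace("\\", "__").replace("/", "__")


# The same logical path stringifies differently per platform flavour.
posix_str = str(PurePosixPath("models/lfm/int4"))
windows_str = str(PureWindowsPath("models/lfm/int4"))

# Replacing only "/" leaves the Windows form untouched ...
slash_only_key = windows_str.replace("/", "__")

# ... while handling both separators yields one key for both flavours.
posix_key = normalize_cache_key(posix_str)
windows_key = normalize_cache_key(windows_str)
```

With forward-slash replacement alone, the Windows string keeps its backslashes, so the same model directory would produce different identifiers on different platforms.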

shaahji · 2026-05-20T18:17:49Z

@ykhrustalev Please add the following snippet to tests that depend on the lm-eval package:

@pytest.mark.skipif(
    importlib.util.find_spec("lm_eval") is None,
    reason="lm_eval not installed",
)

Copilot AI review requested due to automatic review settings May 12, 2026 21:06

Copilot started reviewing on behalf of ykhrustalev May 12, 2026 21:07

Copilot AI reviewed May 12, 2026

Comment thread olive/evaluator/lmeval_ort.py Outdated

Comment thread olive/evaluator/lmeval_ort.py Outdated

github-advanced-security AI found potential problems May 12, 2026

Comment thread test/evaluator/test_olive_evaluator.py Fixed

ykhrustalev added 2 commits May 12, 2026 18:30

Trim comments and docstrings on chat-template hooks

e174f59

Use object.__new__ in chat-template test helper to silence pylint E1120

34ff372

shaahji enabled auto-merge (squash) May 20, 2026 17:46

Skip chat-template tests when lm_eval is not installed

cb21551

auto-merge was automatically disabled May 20, 2026 18:32
Head branch was pushed to by a user without write access

Merge branch 'main' into lmeval-ort-chat-template

55d7892

xiaoyu-work approved these changes May 21, 2026

xiaoyu-work merged commit 5ff3a59 into microsoft:main May 21, 2026
15 checks passed

xiaoyu-work merged 6 commits into microsoft:main from ykhrustalev:lmeval-ort-chat-template

5 participants
