@rolshoven rolshoven commented Sep 15, 2025

As part of a community task I have been collaborating on, I encountered several challenges with the litellm judge backend (#962):

  • Judge outputs are currently not cached. If the evaluation script fails, the judge responses have to be regenerated, leading to higher evaluation costs.
  • Not all inference providers support the same set of parameters when generating chat completions.
  • The maximum number of generated tokens is currently hardcoded to 512 in the litellm backend.
  • 100 concurrent requests are currently performed, which can cause the client to run into rate limits.

These challenges are addressed in this PR as follows:

The JudgeLM and JudgeLLM constructors now accept a backend_options parameter (a dict). For the litellm backend, this dict is converted into a new LitellmBackendOptions dataclass, which lets you specify whether to use caching, how many concurrent requests to perform, and whether to increase the number of output tokens for reasoning models.

Additionally, the litellm backend now ignores by default any chat completion arguments that the current inference provider does not support, and it respects the max_tokens parameter instead of using a hardcoded value.
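To make the new interface concrete, here is a minimal sketch of how a backend_options dict could map onto the LitellmBackendOptions dataclass. The field names and defaults below (`enable_cache`, `max_concurrent_requests`, `raise_max_tokens_for_reasoning`) are illustrative assumptions, not the exact names used in the PR.

```python
# Illustrative sketch only: field names and defaults are assumptions,
# not the actual LitellmBackendOptions definition from this PR.
from dataclasses import dataclass


@dataclass
class LitellmBackendOptions:
    """Options for the litellm judge backend (hypothetical field names)."""
    enable_cache: bool = True                     # cache judge responses to avoid re-billing on reruns
    max_concurrent_requests: int = 10             # lower this if you hit provider rate limits
    raise_max_tokens_for_reasoning: bool = False  # grant a larger output budget to reasoning models


# A JudgeLLM-style constructor would take a plain dict and convert it internally:
backend_options = {
    "enable_cache": True,
    "max_concurrent_requests": 20,
    "raise_max_tokens_for_reasoning": True,
}
options = LitellmBackendOptions(**backend_options)  # dict -> dataclass conversion
```

Accepting a plain dict keeps the judge constructors backend-agnostic, while each backend converts and validates only the options it understands, which is what lets the mechanism be extended to other backends later.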

I'm looking forward to discussing the current solution or potential alternatives!


@NathanHB NathanHB left a comment


hey! Thanks for the PR, it looks great. Only a few nits and it will be good to be merged.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NathanHB NathanHB merged commit 9ba430f into huggingface:main Sep 16, 2025
4 checks passed
NathanHB added a commit that referenced this pull request Sep 19, 2025
* Added `backend_options` parameter to llm judges.

Currently only used for litellm backend but can be extended to other backends as well. Allows to specify whether to use caching or not, the number of concurrent requests, and whether the token output budget should be increased for reasoning models.

* Implemented changes from code review

* Ran pre-commit hooks

---------

Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>