
Conversation

cpcdoy
Contributor

@cpcdoy cpcdoy commented Jan 15, 2025

Description

While implementing a custom task with lighteval, I needed to use constrained grammar generation through TGI, but the TGI integration turned out to be out of date and not working.

Fixes for TGI Endpoint Inference

  • The /info route of TGI 3.0.1 doesn't always return required fields such as model_dtype, so model_dtype now defaults to None when the field is missing:
$ curl http://localhost:8080/info
{"model_id":"unsloth/Qwen2.5-0.5B-Instruct","model_sha":"6a7b5090fc11df0706c796b7ba76762d7beb688b","model_pipeline_tag":"text-generation","max_concurrent_requests":128,"max_best_of":2,"max_stop_sequences":4,"max_input_tokens":32767,"max_total_tokens":32768,"validation_workers":2,"max_client_batch_size":4,"router":"text-generation-router","version":"3.0.1","sha":"bb9095aae339579fbf3b4e7be3909932de26a7ee","docker_label":"sha-bb9095a"}
  • TGI's AsyncClient.generate expects individual keyword arguments, not a parameters structure.
    • I've set the do_sample, return_full_text and watermark parameters to False by default: they come from huggingface_hub, which accepts None as their default, but TGI doesn't.
      • Question for a maintainer: should they be set this way by default? I don't see them being provided to _async_process_request anyway, so maybe this should be fixed in another PR. Same for adapter_id for LoRA heads.
  • ModelClient is now instantiated from config: TGIModelConfig instead of named parameters
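A minimal sketch of the resulting defaults (the helper below is illustrative, not lighteval's actual code; the keyword names match the parameters discussed above):

```python
# Illustrative helper: collect the keyword arguments passed to TGI's
# AsyncClient.generate, pinning the booleans TGI rejects as null to False.
def build_generate_kwargs(prompt: str, max_new_tokens: int = 128) -> dict:
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        # huggingface_hub defaults these to None, but TGI's request
        # validation does not accept null for them, hence False:
        "do_sample": False,
        "return_full_text": False,
        "watermark": False,
    }

kwargs = build_generate_kwargs("Hello")
```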

Fixes for TGI JSON Grammar Generation

  • Updated text_generation to 0.7.0
  • Added support for the grammar field to enable JSON grammar generation
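On the wire, the grammar rides inside the request's parameters using TGI's {"type": "json", "value": <schema>} shape, as visible in the server logs quoted later in this thread. A sketch of building such a /generate payload (the helper name and the schema are ours, for illustration only):

```python
def build_grammar_payload(inputs: str, schema: dict, max_new_tokens: int = 128) -> dict:
    # Request body for TGI's /generate route; the JSON grammar constraint
    # goes under "parameters" -> "grammar".
    return {
        "inputs": inputs,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "grammar": {"type": "json", "value": schema},
        },
    }

# Hypothetical schema, loosely modeled on the entity-extraction grammar
# shown in the TGI logs of this PR:
schema = {
    "type": "object",
    "properties": {"entities": {"type": "array"}},
    "required": ["entities"],
}
payload = build_grammar_payload("Extract entities:", schema)
```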

Environment

Command

uv run lighteval endpoint tgi tgi.yaml "custom|...|0|0" --custom-tasks "ner_eval.py" --output-dir "results" --max-samples 10 --override-batch-size 1 --use-chat-template --save-details --no-public-run

Dependencies

dependencies = [
    "datasets>=3.2.0",
    "huggingface-hub>=0.27.1",
    "lighteval[tgi]>=0.7.0",
    "numpy>=1.26.4",
    "pandas>=2.2.3",
    "pydantic>=1.10.21",
    "text-generation==0.6.0",
    "torch>=2.4.1",
    "torchvision>=0.19.1",
]

[tool.uv.sources]
lighteval = { path = "../../../../lighteval", editable = true } # This branch

model_config_path argument for TGI

tgi.yaml:

model:
  instance:
    inference_server_address: "http://localhost:8080"
    inference_server_auth: null
    model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory

Test Results

It works, as the logs below show.

TGI Logs with JSON Grammar Generation

2025-01-15T17:09:34.811955Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3060"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(128), return_full_text: Some(false), stop: ["\n\n", "<|im_end|>"], truncate: None, watermark: false, details: true, decoder_input_details: true, seed: None, top_n_tokens: None, grammar: Some(Json(Object {"type": String("object"), "properties": Object {"entities": Object {"type": String("array"), "items": Object {"type": String("object"), "properties": Object {"entity": Object {"type": String("string")}, "classification": Object {"type": String("string"), "enum": Array [String("merchant"), String("bank"), String("individual"), String("date"), String("location"), String("unknown")]}}, "required": Array [String("entity"), String("classification")]}}}, "required": Array [String("entities")]})), adapter_id: None } total_time="428.587752ms" validation_time="716.935µs" queue_time="82.504µs" inference_time="427.788413ms" time_per_token="25.164024ms" seed="None"}: text_generation_router::server: router/src/server.rs:422: Success

Lighteval Logs

(py3.11.3) cpcdoy@cpcdoy-desktop:~/projects/.../llm_tasks_eval$ uv run lighteval endpoint tgi tgi.yaml "custom|...|0|0" --custom-tasks "ner_eval.py" --output-dir "results" --max-samples 10 --override-batch-size 1 --use-chat-template --save-details --no-public-run
warning: `VIRTUAL_ENV=/home/cpcdoy/py3.11.3` does not match the project environment path `.venv` and will be ignored
[2025-01-15 15:11:24,861] [    INFO]: PyTorch version 2.4.1 available. (config.py:54)
[2025-01-15 15:11:28,418] [ WARNING]: --max_samples WAS SET. THESE NUMBERS ARE ONLY PARTIAL AND SHOULD NOT BE USED FOR COMPARISON UNLESS YOU KNOW WHAT YOU ARE DOING. (pipeline.py:132)
[2025-01-15 15:11:28,418] [    INFO]: --- LOADING MODEL --- (pipeline.py:168)
[2025-01-15 15:11:28,418] [    INFO]: Load model from inference server: http://localhost:8080 (model_loader.py:110)
[2025-01-15 15:11:28,846] [    INFO]: --- LOADING TASKS --- (pipeline.py:195)
[2025-01-15 15:11:28,858] [ WARNING]: If you want to use extended_tasks, make sure you installed their dependencies using `pip install -e .[extended_tasks]`. (registry.py:136)
[2025-01-15 15:11:28,858] [    INFO]: Found 1 custom tasks in /home/cpcdoy/.cache/huggingface/modules/datasets_modules/datasets/ner_eval/1739d6fd80c40f11df64fba54bf39bd05b1b1408659c4325f28f0ca9ee2a04b0/ner_eval.py (registry.py:141)
[2025-01-15 15:11:28,861] [    INFO]: ... default (lighteval_task.py:187)
[2025-01-15 15:11:28,861] [ WARNING]: Careful, the task ... is using evaluation data to build the few shot examples. (lighteval_task.py:261)
[2025-01-15 15:11:28,898] [    INFO]: --- INIT SEEDS --- (pipeline.py:224)
[2025-01-15 15:11:28,899] [    INFO]: --- RUNNING MODEL --- (pipeline.py:267)
[2025-01-15 15:11:28,899] [    INFO]: Running RequestType.GREEDY_UNTIL requests (pipeline.py:271)
[2025-01-15 15:11:28,903] [ WARNING]: You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring. (data.py:260)
Splits: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.90s/it]
[2025-01-15 15:11:33,800] [    INFO]: --- COMPUTING METRICS --- (pipeline.py:299)                                                                  
[2025-01-15 15:11:33,802] [    INFO]: --- DISPLAYING RESULTS --- (pipeline.py:342)
|            Task             |Version|        Metric         |Value|   |Stderr|
|-----------------------------|------:|-----------------------|----:|---|-----:|
...

[2025-01-15 15:11:33,824] [    INFO]: --- SAVING AND PUSHING RESULTS --- (pipeline.py:332)
[2025-01-15 15:11:33,825] [    INFO]: Saving experiment tracker (evaluation_tracker.py:154)
[2025-01-15 15:11:33,848] [    INFO]: Saving results to ... (evaluation_tracker.py:208)
[2025-01-15 15:11:33,851] [    INFO]: Saving details to ... (evaluation_tracker.py:216)
Creating parquet from Arrow format: 100%|████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 82.46ba/s]

Note: I have anonymized parts of the logs

@cpcdoy cpcdoy changed the title Fix TGI (Text Generation Inference) Endpoint Inference Fix TGI (Text Generation Inference) Endpoint Inference and TGI JSON Grammar Generation Jan 15, 2025
@cpcdoy
Contributor Author

cpcdoy commented Jan 15, 2025

Updated the PR to add support for JSON Grammar Constrained Generation for TGI

Member

@NathanHB NathanHB left a comment


Thanks for the PR! A few things I'm not sure I understand:

@naufalso

naufalso commented Feb 7, 2025

UP! I encountered a similar issue where the bug prevented us from using the TGI endpoint. The key issues I found are:

  • Lines 111-113 in `src/lighteval/models/model_loader.py`:
    The current implementation:

    model = ModelClient(address=config.inference_server_address, auth_token=config.inference_server_auth, model_id=config.model_id)  

    should be updated to:

    model = ModelClient(config=config)  

    This ensures that the initialization parameters are correctly passed to ModelClient, resolving configuration-related issues.

  • model_dtype issue:
    The model_dtype is not consistently available on the /info route of TGI, which leads to errors when the field is required. To address this, model_dtype should be set to None by default.
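A minimal sketch of that defensive default (field names taken from the /info output quoted earlier in this thread; the snippet is illustrative, not lighteval code):

```python
import json

# Simulated /info response from TGI 3.x, where model_dtype may be absent.
info = json.loads('{"model_id": "unsloth/Qwen2.5-0.5B-Instruct", "version": "3.0.1"}')

model_id = info["model_id"]            # always present
model_dtype = info.get("model_dtype")  # falls back to None when omitted
```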

@cpcdoy
Contributor Author

cpcdoy commented Feb 7, 2025

Exactly @naufalso , this is already solved in this PR!

@ZQ-Dev8

ZQ-Dev8 commented Apr 2, 2025

+1 is this going to be merged @NathanHB ? Would really like to use lighteval with locally hosted TGI, but I'm seeing the same TypeError: ModelClient.__init__() got an unexpected keyword argument 'address' error described above.

@cpcdoy cpcdoy requested a review from NathanHB April 14, 2025 13:30
@HuggingFaceDocBuilderDev
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NathanHB
Member

Hey! Thanks for the PR, it seems good to merge; just need to fix the tests.

@cpcdoy
Contributor Author

cpcdoy commented Jun 11, 2025

@NathanHB Apologies for the delay, I missed your approval comment!

I've fixed the unit tests in two files that simply needed the new grammar field to be added.

I've also noticed that the langcodes dependency was missing from the multilingual extra when I ran the tests, so I added it there.

I've tried both without and with --runslow:

  • without --runslow: everything passes
➜  lighteval git:(fix/tgi_inference) ✗ uv run --extra tests --extra dev pytest -xvvs /home/cpcdoy/projects/abwab.ai/lighteval/tests/

...

================================================== 634 passed, 7 skipped, 5 warnings in 38.11s ==================================================
  • with --runslow: accuracy seems to have increased in this run, but since it's a vLLM run I expect it's unrelated? Lmk what you think.
➜  lighteval git:(fix/tgi_inference) ✗ uv run --extra tests --extra dev pytest -xvvs /home/cpcdoy/projects/abwab.ai/lighteval/tests/

...
FAILED tests/slow_tests/test_vllm_model.py::test_vllm_model[examples/model_configs/vllm_model_config.yaml] - AssertionError: Differences found: {'values_changed': {"root['lighteval:agieval:logiqa-en:0']['acc']": {'new_value': 0.3, 'old_value': 0.2}}}
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================== 1 failed, 593 passed, 4 skipped, 8 warnings in 1637.09s (0:27:17) ==================================================

@cpcdoy cpcdoy requested a review from NathanHB June 23, 2025 14:43
@cpcdoy
Contributor Author

cpcdoy commented Jul 8, 2025

Hello @NathanHB , just checking if there's any news on this PR? Lmk if I need to provide any support

@NathanHB
Member

Hey! Sorry for the late review. I took another look: there's been a refactor of the codebase. It doesn't seem to affect your code much, but you would need to rename the request variable in the endpoint_model.py file, for example,
and overall make sure it all still works :)

@cpcdoy
Contributor Author

cpcdoy commented Aug 15, 2025

Hey, no worries @NathanHB! I've adapted the code to use the doc variable instead of request after the refactor. I've fixed the tests to include the new grammar field (all of them pass), and I've re-run my own benchmark suite that uses lighteval with TGI as a backend to check that everything still works after the refactor. Everything looks good :)

@NathanHB
Member

Thanks! Last thing: can you provide a config in which you use the grammar arg? I'll test locally to make sure everything is fine on this side.

@cpcdoy
Contributor Author

cpcdoy commented Aug 20, 2025

@NathanHB I noticed that my uv env was using older versions of some lighteval files in my benchmarking suite, so I had to make a few more changes to accommodate your refactoring. I've also adapted generation_parameters to work the same way you now do it in other endpoints.

Also, all tests pass.

Furthermore, I have also created an example usage of a custom task that uses a publicly available dataset (emotion dataset) from HF Hub on a classification task that demonstrates the newly implemented constrained grammar generation feature using TGI. I added this example in examples/custom_tasks_templates/custom_task_classification_grammar_task.py and updated examples/model_configs/tgi_model.yaml accordingly.
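The JSON schema behind that example can be reconstructed from the grammar visible in the TGI logs further down; treat this as an approximation of what the task file contains:

```python
# Approximate reconstruction of the classification grammar from the TGI
# logs of the example run; the exact schema lives in
# examples/custom_tasks_templates/custom_task_classification_grammar_task.py.
emotion_schema = {
    "type": "object",
    "properties": {
        "classification": {
            "type": "string",
            "description": "Emotion classification from the provided list",
            "enum": ["sadness", "joy", "love", "anger", "fear", "surprise"],
        }
    },
    "required": ["classification"],
    "additionalProperties": False,
}
```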

How to run the example

Here's how to run it from the root of the lighteval directory:

  • [Optional] Remove lighteval cache before the run: rm -rf ~/.cache/huggingface/lighteval/*
  • Start a TGI server first:
model="unsloth/Qwen2.5-0.5B-Instruct"
volume=./data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:3.3.4 --model-id $model
  • Run the lighteval task:
uv run --active --extra tgi lighteval endpoint tgi examples/model_configs/tgi_model.yaml "custom|emotion_classification|0|0" --custom-tasks examples/custom_tasks_templates/custom_task_classification_grammar_task.py --output-dir results --save-details --no-public-run --max-samples 10

Logs from the example run

TGI Logs

While running the lighteval task, you'll notice that TGI logs both the request and the grammar, for example:

2025-08-20T13:50:20.563969Z  INFO text_generation_router_v3::radix: backends/v3/src/radix.rs:108: Prefix 0 - Suffix 325
2025-08-20T13:50:20.665852Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3060")) context=Extension(None)}:generate{parameters=GenerateParameters { best_of: None, temperature: Some(0.1), repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: Some(0.9), typical_p: None, do_sample: false, max_new_tokens: Some(64), return_full_text: Some(false), stop: ["\n\n"], truncate: None, watermark: false, details: true, decoder_input_details: true, seed: None, top_n_tokens: None, grammar: Some(Json(Object {"type": String("object"), "properties": Object {"classification": Object {"type": String("string"), "description": String("Emotion classification from the provided list"), "enum": Array [String("sadness"), String("joy"), String("love"), String("anger"), String("fear"), String("surprise")]}}, "required": Array [String("classification")], "additionalProperties": Bool(false)})), adapter_id: None } total_time="102.706171ms" validation_time="778.556µs" queue_time="104.307µs" inference_time="101.823408ms" time_per_token="12.727926ms" seed="Some(5420590878626193495)"}: text_generation_router::server: router/src/server.rs:432: Success

lighteval logs

And lighteval will show logs such as:

[2025-08-20 15:50:20,810] [    INFO]: - Prediction: {'classification': 'joy'} (custom_task_classification_grammar_task.py:189)
[2025-08-20 15:50:20,810] [    INFO]: - Expected: joy (index: 1) (custom_task_classification_grammar_task.py:190)
[2025-08-20 15:50:20,811] [    INFO]: - Metrics: {'exact_match': 1.0, 'unknown_prediction': 0.0, 'total_samples': 1.0} (custom_task_classification_grammar_task.py:202)
[2025-08-20 15:50:20,811] [    INFO]: ✓ Correct prediction (custom_task_classification_grammar_task.py:204)
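Scoring such a grammar-constrained prediction reduces to parsing the returned JSON and comparing labels. A hedged sketch (the metric names mirror the log lines above; the function itself is ours, not the task's actual code):

```python
import json

def score_prediction(raw_output: str, expected_label: str) -> dict:
    """Parse a grammar-constrained JSON prediction and compute exact_match.

    Illustrative only; the real task's metric code may differ.
    """
    labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
    try:
        predicted = json.loads(raw_output).get("classification")
    except json.JSONDecodeError:
        predicted = None
    return {
        "exact_match": 1.0 if predicted == expected_label else 0.0,
        "unknown_prediction": 0.0 if predicted in labels else 1.0,
    }

metrics = score_prediction('{"classification": "joy"}', "joy")
```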

Please lmk if you have any questions!

Member

@NathanHB NathanHB left a comment


Hey! Thanks for the detailed message. I rechecked and would still change/remove a few things; otherwise a much clearer PR :)

@cpcdoy cpcdoy requested a review from NathanHB August 20, 2025 16:06
@cpcdoy
Contributor Author

cpcdoy commented Aug 21, 2025

Thank you for the reviews @NathanHB, I've applied everything! I've also improved a unit test for TGI caching by mocking the HTTP request to the /info route of the TGI server.

@NathanHB NathanHB merged commit da8466b into huggingface:main Aug 25, 2025
4 checks passed
NathanHB added a commit that referenced this pull request Sep 19, 2025
… Grammar Generation (#502)

* fix: Lighteval communication with TGI

* fix: JSON grammar constrained generation

* fix: unit tests +
add: dep in extra

* fix: request var => doc var after refactor

* fix: update test to support the new grammar field

* fix: TGI endpoint with the new refactor

* update: TGI model config in examples with the latest parameters

* add: example custom task on a classification dataset to demonstrate the usage of constrained grammar generation using TGI

* add: format example task

* fix: unit test

* add: adapt the  in the yaml config to use  similarly to the other endpoints

* clean: moved new task to community_tasks

* fix: format

* clean: delete unused grammar field

* del: grammar

* add: copyright at the top

* del: langcodes dep isn't needed anymore

* add: use load from file directly in the main endpoint

* del: newlines

* add: mock HTTP request for info to TGI server

---------

Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>