
Conversation

cpcdoy
Contributor

@cpcdoy cpcdoy commented Jan 15, 2025

Description

While implementing a custom task with lighteval, I needed to use constrained grammar generation through TGI, but the TGI integration turned out to be out of date and not working.

Fixes for TGI Endpoint Inference

  • The /info route of TGI 3.0.1 doesn't always return required fields such as model_dtype, so model_dtype now defaults to None when the field is missing:
$ curl http://localhost:8080/info
{"model_id":"unsloth/Qwen2.5-0.5B-Instruct","model_sha":"6a7b5090fc11df0706c796b7ba76762d7beb688b","model_pipeline_tag":"text-generation","max_concurrent_requests":128,"max_best_of":2,"max_stop_sequences":4,"max_input_tokens":32767,"max_total_tokens":32768,"validation_workers":2,"max_client_batch_size":4,"router":"text-generation-router","version":"3.0.1","sha":"bb9095aae339579fbf3b4e7be3909932de26a7ee","docker_label":"sha-bb9095a"}
  • TGI's AsyncClient.generate expects individual keyword arguments, not a parameters structure.
    • I've set the do_sample, return_full_text and watermark parameters to False by default: they come from huggingface_hub, which accepts None as their default, but TGI doesn't.
      • Question for a maintainer: should they be set this way by default? I don't see them being provided to _async_process_request anyway, so maybe this should be fixed in another PR. Same for adapter_id for LoRA heads.
  • ModelClient is now instantiated from config: TGIModelConfig instead of named parameters
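A minimal sketch of the resulting defaults (the helper below is illustrative, not lighteval's actual code; the keyword names match the parameters discussed above):

```python
# Illustrative helper: collect the keyword arguments passed to TGI's
# AsyncClient.generate, pinning the booleans TGI rejects as null to False.
def build_generate_kwargs(prompt: str, max_new_tokens: int = 128) -> dict:
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        # huggingface_hub defaults these to None, but TGI's request
        # validation does not accept null for them, hence False:
        "do_sample": False,
        "return_full_text": False,
        "watermark": False,
    }

kwargs = build_generate_kwargs("Hello")
```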

Fixes for TGI JSON Grammar Generation

  • Updated text_generation to 0.7.0
  • Added support for the grammar field to enable JSON grammar generation
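On the wire, the grammar rides inside the request's parameters using TGI's {"type": "json", "value": <schema>} shape, as visible in the server logs quoted later in this thread. A sketch of building such a /generate payload (the helper name and the schema are ours, for illustration only):

```python
def build_grammar_payload(inputs: str, schema: dict, max_new_tokens: int = 128) -> dict:
    # Request body for TGI's /generate route; the JSON grammar constraint
    # goes under "parameters" -> "grammar".
    return {
        "inputs": inputs,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "grammar": {"type": "json", "value": schema},
        },
    }

# Hypothetical schema, loosely modeled on the entity-extraction grammar
# shown in the TGI logs of this PR:
schema = {
    "type": "object",
    "properties": {"entities": {"type": "array"}},
    "required": ["entities"],
}
payload = build_grammar_payload("Extract entities:", schema)
```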

Environment

Command

uv run lighteval endpoint tgi tgi.yaml "custom|...|0|0" --custom-tasks "ner_eval.py" --output-dir "results" --max-samples 10 --override-batch-size 1 --use-chat-template --save-details --no-public-run

Dependencies

dependencies = [
    "datasets>=3.2.0",
    "huggingface-hub>=0.27.1",
    "lighteval[tgi]>=0.7.0",
    "numpy>=1.26.4",
    "pandas>=2.2.3",
    "pydantic>=1.10.21",
    "text-generation==0.6.0",
    "torch>=2.4.1",
    "torchvision>=0.19.1",
]

[tool.uv.sources]
lighteval = { path = "../../../../lighteval", editable = true } # This branch

model_config_path argument for TGI

tgi.yaml:

model:
  instance:
    inference_server_address: "http://localhost:8080"
    inference_server_auth: null
    model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory

Test Results

It works, as the logs below show.

TGI Logs with JSON Grammar Generation

2025-01-15T17:09:34.811955Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3060"))}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(128), return_full_text: Some(false), stop: ["\n\n", "<|im_end|>"], truncate: None, watermark: false, details: true, decoder_input_details: true, seed: None, top_n_tokens: None, grammar: Some(Json(Object {"type": String("object"), "properties": Object {"entities": Object {"type": String("array"), "items": Object {"type": String("object"), "properties": Object {"entity": Object {"type": String("string")}, "classification": Object {"type": String("string"), "enum": Array [String("merchant"), String("bank"), String("individual"), String("date"), String("location"), String("unknown")]}}, "required": Array [String("entity"), String("classification")]}}}, "required": Array [String("entities")]})), adapter_id: None } total_time="428.587752ms" validation_time="716.935µs" queue_time="82.504µs" inference_time="427.788413ms" time_per_token="25.164024ms" seed="None"}: text_generation_router::server: router/src/server.rs:422: Success

Lighteval Logs

(py3.11.3) cpcdoy@cpcdoy-desktop:~/projects/.../llm_tasks_eval$ uv run lighteval endpoint tgi tgi.yaml "custom|...|0|0" --custom-tasks "ner_eval.py" --output-dir "results" --max-samples 10 --override-batch-size 1 --use-chat-template --save-details --no-public-run
warning: `VIRTUAL_ENV=/home/cpcdoy/py3.11.3` does not match the project environment path `.venv` and will be ignored
[2025-01-15 15:11:24,861] [    INFO]: PyTorch version 2.4.1 available. (config.py:54)
[2025-01-15 15:11:28,418] [ WARNING]: --max_samples WAS SET. THESE NUMBERS ARE ONLY PARTIAL AND SHOULD NOT BE USED FOR COMPARISON UNLESS YOU KNOW WHAT YOU ARE DOING. (pipeline.py:132)
[2025-01-15 15:11:28,418] [    INFO]: --- LOADING MODEL --- (pipeline.py:168)
[2025-01-15 15:11:28,418] [    INFO]: Load model from inference server: http://localhost:8080 (model_loader.py:110)
[2025-01-15 15:11:28,846] [    INFO]: --- LOADING TASKS --- (pipeline.py:195)
[2025-01-15 15:11:28,858] [ WARNING]: If you want to use extended_tasks, make sure you installed their dependencies using `pip install -e .[extended_tasks]`. (registry.py:136)
[2025-01-15 15:11:28,858] [    INFO]: Found 1 custom tasks in /home/cpcdoy/.cache/huggingface/modules/datasets_modules/datasets/ner_eval/1739d6fd80c40f11df64fba54bf39bd05b1b1408659c4325f28f0ca9ee2a04b0/ner_eval.py (registry.py:141)
[2025-01-15 15:11:28,861] [    INFO]: ... default (lighteval_task.py:187)
[2025-01-15 15:11:28,861] [ WARNING]: Careful, the task ... is using evaluation data to build the few shot examples. (lighteval_task.py:261)
[2025-01-15 15:11:28,898] [    INFO]: --- INIT SEEDS --- (pipeline.py:224)
[2025-01-15 15:11:28,899] [    INFO]: --- RUNNING MODEL --- (pipeline.py:267)
[2025-01-15 15:11:28,899] [    INFO]: Running RequestType.GREEDY_UNTIL requests (pipeline.py:271)
[2025-01-15 15:11:28,903] [ WARNING]: You cannot select the number of dataset splits for a generative evaluation at the moment. Automatically inferring. (data.py:260)
Splits: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.90s/it]
[2025-01-15 15:11:33,800] [    INFO]: --- COMPUTING METRICS --- (pipeline.py:299)                                                                  
[2025-01-15 15:11:33,802] [    INFO]: --- DISPLAYING RESULTS --- (pipeline.py:342)
|            Task             |Version|        Metric         |Value|   |Stderr|
|-----------------------------|------:|-----------------------|----:|---|-----:|
...

[2025-01-15 15:11:33,824] [    INFO]: --- SAVING AND PUSHING RESULTS --- (pipeline.py:332)
[2025-01-15 15:11:33,825] [    INFO]: Saving experiment tracker (evaluation_tracker.py:154)
[2025-01-15 15:11:33,848] [    INFO]: Saving results to ... (evaluation_tracker.py:208)
[2025-01-15 15:11:33,851] [    INFO]: Saving details to ... (evaluation_tracker.py:216)
Creating parquet from Arrow format: 100%|████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 82.46ba/s]

Note: I have anonymized parts of the logs

@cpcdoy cpcdoy changed the title Fix TGI (Text Generation Inference) Endpoint Inference Fix TGI (Text Generation Inference) Endpoint Inference and TGI JSON Grammar Generation Jan 15, 2025
@cpcdoy
Contributor Author

cpcdoy commented Jan 15, 2025

Updated the PR to add support for JSON Grammar Constrained Generation for TGI

Member

@NathanHB NathanHB left a comment


Thanks for the PR! A few things I'm not sure I understand:

@naufalso

naufalso commented Feb 7, 2025

UP! I encountered a similar issue where the bug prevented us from using the TGI endpoint. The key issues I found are:

  • Lines 111-113 in `src/lighteval/models/model_loader.py`:
    The current implementation:

    model = ModelClient(address=config.inference_server_address, auth_token=config.inference_server_auth, model_id=config.model_id)  

    should be updated to:

    model = ModelClient(config=config)  

    This ensures that the initialization parameters are correctly passed to ModelClient, resolving configuration-related issues.

  • model_dtype issue:
    The model_dtype is not consistently available on the /info route of TGI, which leads to errors when the field is required. To address this, model_dtype should be set to None by default.
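A minimal sketch of that defensive default (field names taken from the /info output quoted earlier in this thread; the snippet is illustrative, not lighteval code):

```python
import json

# Simulated /info response from TGI 3.x, where model_dtype may be absent.
info = json.loads('{"model_id": "unsloth/Qwen2.5-0.5B-Instruct", "version": "3.0.1"}')

model_id = info["model_id"]            # always present
model_dtype = info.get("model_dtype")  # falls back to None when omitted
```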

@cpcdoy
Contributor Author

cpcdoy commented Feb 7, 2025

Exactly @naufalso , this is already solved in this PR!

@ZQ-Dev8

ZQ-Dev8 commented Apr 2, 2025

+1 is this going to be merged @NathanHB ? Would really like to use lighteval with locally hosted TGI, but I'm seeing the same TypeError: ModelClient.__init__() got an unexpected keyword argument 'address' error described above.

@cpcdoy cpcdoy requested a review from NathanHB April 14, 2025 13:30
@HuggingFaceDocBuilderDev
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NathanHB
Member

Hey! Thanks for the PR, it seems good to merge; just need to fix the tests.

@cpcdoy
Contributor Author

cpcdoy commented Jun 11, 2025

@NathanHB Apologies for the delay, I missed your approval comment!

I've fixed the unit tests in two files that simply needed the new grammar field to be added.

I've also noticed that the langcodes dependency was missing from the multilingual extra when I ran the tests, so I added it there.

I've tried both without and with --runslow:

  • without --runslow: everything passes
➜  lighteval git:(fix/tgi_inference) ✗ uv run --extra tests --extra dev pytest -xvvs /home/cpcdoy/projects/abwab.ai/lighteval/tests/

...

================================================== 634 passed, 7 skipped, 5 warnings in 38.11s ==================================================
  • with --runslow: accuracy seems to have increased in this run, but since it's a vLLM run I expect it's unrelated? Lmk what you think.
➜  lighteval git:(fix/tgi_inference) ✗ uv run --extra tests --extra dev pytest -xvvs /home/cpcdoy/projects/abwab.ai/lighteval/tests/

...
FAILED tests/slow_tests/test_vllm_model.py::test_vllm_model[examples/model_configs/vllm_model_config.yaml] - AssertionError: Differences found: {'values_changed': {"root['lighteval:agieval:logiqa-en:0']['acc']": {'new_value': 0.3, 'old_value': 0.2}}}
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================== 1 failed, 593 passed, 4 skipped, 8 warnings in 1637.09s (0:27:17) ==================================================

@cpcdoy cpcdoy requested a review from NathanHB June 23, 2025 14:43
@cpcdoy
Contributor Author

cpcdoy commented Jul 8, 2025

Hello @NathanHB , just checking if there's any news on this PR? Lmk if I need to provide any support

@NathanHB
Member

Hey! Sorry for the late review. I took another look: there's been a refactor of the codebase. It doesn't seem to affect your code much, but you would need to rename the request variable in the endpoint_model.py file, for example,
and overall make sure it all still works :)

@cpcdoy
Contributor Author

cpcdoy commented Aug 15, 2025

Hey, no worries @NathanHB! I've adapted the code to use the doc variable instead of request after the refactor. I've fixed the tests to include the new grammar field (all of them pass), and I've re-run my own benchmark suite that uses lighteval with TGI as a backend to check that everything still works after the refactor. Everything looks good :)

@NathanHB
Member

Thanks! Last thing: can you provide a config in which you use the grammar arg? I'll test locally to make sure everything is fine on this side.

@cpcdoy
Contributor Author

cpcdoy commented Aug 20, 2025

@NathanHB I noticed that my uv env was using older versions of some lighteval files in my benchmarking suite, so I had to make a few more changes to accommodate your refactoring. I've also adapted generation_parameters to work the same way you now do it in other endpoints.

Also, all tests pass.

Furthermore, I have also created an example usage of a custom task that uses a publicly available dataset (emotion dataset) from HF Hub on a classification task that demonstrates the newly implemented constrained grammar generation feature using TGI. I added this example in examples/custom_tasks_templates/custom_task_classification_grammar_task.py and updated examples/model_configs/tgi_model.yaml accordingly.
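The JSON schema behind that example can be reconstructed from the grammar visible in the TGI logs further down; treat this as an approximation of what the task file contains:

```python
# Approximate reconstruction of the classification grammar from the TGI
# logs of the example run; the exact schema lives in
# examples/custom_tasks_templates/custom_task_classification_grammar_task.py.
emotion_schema = {
    "type": "object",
    "properties": {
        "classification": {
            "type": "string",
            "description": "Emotion classification from the provided list",
            "enum": ["sadness", "joy", "love", "anger", "fear", "surprise"],
        }
    },
    "required": ["classification"],
    "additionalProperties": False,
}
```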

How to run the example

Here's how to run it from the root of the lighteval directory:

  • [Optional] Remove lighteval cache before the run: rm -rf ~/.cache/huggingface/lighteval/*
  • Start a TGI server first:
model="unsloth/Qwen2.5-0.5B-Instruct"
volume=./data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:3.3.4 --model-id $model
  • Run the lighteval task:
uv run --active --extra tgi lighteval endpoint tgi examples/model_configs/tgi_model.yaml "custom|emotion_classification|0|0" --custom-tasks examples/custom_tasks_templates/custom_task_classification_grammar_task.py --output-dir results --save-details --no-public-run --max-samples 10

Logs from the example run

TGI Logs

While running the lighteval task, you'll notice that TGI logs both the request and the grammar, for example:

2025-08-20T13:50:20.563969Z  INFO text_generation_router_v3::radix: backends/v3/src/radix.rs:108: Prefix 0 - Suffix 325
2025-08-20T13:50:20.665852Z  INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("1-nvidia-geforce-rtx-3060")) context=Extension(None)}:generate{parameters=GenerateParameters { best_of: None, temperature: Some(0.1), repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: Some(0.9), typical_p: None, do_sample: false, max_new_tokens: Some(64), return_full_text: Some(false), stop: ["\n\n"], truncate: None, watermark: false, details: true, decoder_input_details: true, seed: None, top_n_tokens: None, grammar: Some(Json(Object {"type": String("object"), "properties": Object {"classification": Object {"type": String("string"), "description": String("Emotion classification from the provided list"), "enum": Array [String("sadness"), String("joy"), String("love"), String("anger"), String("fear"), String("surprise")]}}, "required": Array [String("classification")], "additionalProperties": Bool(false)})), adapter_id: None } total_time="102.706171ms" validation_time="778.556µs" queue_time="104.307µs" inference_time="101.823408ms" time_per_token="12.727926ms" seed="Some(5420590878626193495)"}: text_generation_router::server: router/src/server.rs:432: Success

lighteval logs

And lighteval will show logs such as:

[2025-08-20 15:50:20,810] [    INFO]: - Prediction: {'classification': 'joy'} (custom_task_classification_grammar_task.py:189)
[2025-08-20 15:50:20,810] [    INFO]: - Expected: joy (index: 1) (custom_task_classification_grammar_task.py:190)
[2025-08-20 15:50:20,811] [    INFO]: - Metrics: {'exact_match': 1.0, 'unknown_prediction': 0.0, 'total_samples': 1.0} (custom_task_classification_grammar_task.py:202)
[2025-08-20 15:50:20,811] [    INFO]: ✓ Correct prediction (custom_task_classification_grammar_task.py:204)
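Scoring such a grammar-constrained prediction reduces to parsing the returned JSON and comparing labels. A hedged sketch (the metric names mirror the log lines above; the function itself is ours, not the task's actual code):

```python
import json

def score_prediction(raw_output: str, expected_label: str) -> dict:
    """Parse a grammar-constrained JSON prediction and compute exact_match.

    Illustrative only; the real task's metric code may differ.
    """
    labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
    try:
        predicted = json.loads(raw_output).get("classification")
    except json.JSONDecodeError:
        predicted = None
    return {
        "exact_match": 1.0 if predicted == expected_label else 0.0,
        "unknown_prediction": 0.0 if predicted in labels else 1.0,
    }

metrics = score_prediction('{"classification": "joy"}', "joy")
```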

Please lmk if you have any questions!

Member

@NathanHB NathanHB left a comment


Hey! Thanks for the detailed message. I rechecked and would still change/remove a few things; otherwise a much clearer PR :)

@cpcdoy cpcdoy requested a review from NathanHB August 20, 2025 16:06
@cpcdoy
Contributor Author

cpcdoy commented Aug 21, 2025

Thank you for the reviews @NathanHB, I've applied everything! I've also improved a unit test for TGI caching by mocking the HTTP request to the /info route of the TGI server.

@NathanHB NathanHB merged commit da8466b into huggingface:main Aug 25, 2025
4 checks passed
NathanHB added a commit that referenced this pull request Sep 19, 2025
… Grammar Generation (#502)

* fix: Lighteval communication with TGI

* fix: JSON grammar constrained generation

* fix: unit tests +
add: dep in extra

* fix: request var => doc var after refactor

* fix: update test to support the new grammar field

* fix: TGI endpoint with the new refactor

* update: TGI model config in examples with the latest parameters

* add: example custom task on a classification dataset to demonstrate the usage of constrained grammar generation using TGI

* add: format example task

* fix: unit test

* add: adapt the  in the yaml config to use  similarly to the other endpoints

* clean: moved new task to community_tasks

* fix: format

* clean: delete unused grammar field

* del: grammar

* add: copyright at the top

* del: langcodes dep isn't needed anymore

* add: use load from file directly in the main endpoint

* del: newlines

* add: mock HTTP request for info to TGI server

---------

Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>