Priority
Undecided
OS type
Ubuntu
Hardware type
Gaudi2
Installation method
Deploy method
Running nodes
Single Node
What's the version?
commit id: a3f9811
Description
The CodeGen UI shows meaningless output for the input "Write a python program to add two numbers and return the result".

The RESTful API call also returns an effectively empty response (every chunk is a single newline):
$ curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{"messages": "Write a python program to add two numbers and return the result."}'
data: b'\n'
data: b'\n'
data: b'\n'
data: b'\n'
data: b'\n'
data: b'\n'
data: b'\n'
<... omit ...>
data: [DONE]
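To check a captured response for this symptom without reading it by eye, the stream can be decoded and tested for non-whitespace content. The following is a minimal sketch, assuming the response has been saved as lines in the `data: b'...'` form shown above; the helper names `decode_sse_chunks` and `is_blank_response` are hypothetical and are not part of the CodeGen service.

```python
import ast


def decode_sse_chunks(lines):
    """Yield the decoded text of each `data:` line, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        # Assumption: chunks are printed as Python bytes literals, e.g. b'\n',
        # as in the curl output above.
        chunk = ast.literal_eval(payload)
        if isinstance(chunk, bytes):
            chunk = chunk.decode("utf-8", errors="replace")
        yield str(chunk)


def is_blank_response(lines):
    """Return True when the whole stream contains only whitespace."""
    return "".join(decode_sse_chunks(lines)).strip() == ""


# Sample mirroring the response captured in this report.
sample = [
    r"data: b'\n'",
    r"data: b'\n'",
    r"data: b'\n'",
    "data: [DONE]",
]
print(is_blank_response(sample))
```

For the captured response in this report, every chunk decodes to a newline, so the check reports the stream as blank.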
Reproduce steps
Launch CodeGen using docker-compose or Helm, following the guide, then enter "Write a python program to add two numbers and return the result" in the UI.
Raw log
$ sudo -E docker compose -f compose.yaml logs tgi-service
<... omit ...>
tgi-gaudi-server | 2024-10-23T05:48:39.521479Z INFO text_generation_router: router/src/main.rs:383: Setting max batch total tokens to 16000
tgi-gaudi-server | 2024-10-23T05:48:39.521481Z INFO text_generation_router: router/src/main.rs:384: Connected
tgi-gaudi-server | 2024-10-23T05:48:39.521483Z WARN text_generation_router: router/src/main.rs:398: Invalid hostname, defaulting to 0.0.0.0
tgi-gaudi-server | 2024-10-23T05:50:49.348226Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("gpu+optimized"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.01), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="13.222245291s" validation_time="300.277µs" queue_time="31.093µs" inference_time="13.221913991s" time_per_token="12.912025ms" seed="Some(4910319861900315300)"}: text_generation_router::server: router/src/server.rs:513: Success
tgi-gaudi-server | 2024-10-23T05:52:25.297449Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("gpu+optimized"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.01), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="12.236049418s" validation_time="211.502µs" queue_time="26.238µs" inference_time="12.235811747s" time_per_token="11.949034ms" seed="Some(575827429751891741)"}: text_generation_router::server: router/src/server.rs:513: Success
tgi-gaudi-server | 2024-10-23T05:57:48.390920Z INFO compat_generate{default_return_full_text=true compute_type=Extension(ComputeType("gpu+optimized"))}:generate_stream{parameters=GenerateParameters { best_of: None, temperature: Some(0.01), repetition_penalty: Some(1.0), frequency_penalty: None, top_k: Some(10), top_p: Some(0.95), typical_p: None, do_sample: false, max_new_tokens: Some(1024), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: None } total_time="12.230389139s" validation_time="354.88µs" queue_time="17.14µs" inference_time="12.230017194s" time_per_token="11.943376ms" seed="Some(9846498353026713877)"}: text_generation_router::server: router/src/server.rs:513: Success
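The timing fields in the TGI log lines above give a rough estimate of how many tokens each request generated. The sketch below assumes that `time_per_token` is `inference_time` divided by the number of generated tokens, and that the two fields are logged in seconds and milliseconds as they are here; `estimate_generated_tokens` is a hypothetical helper, not a TGI API.

```python
import re


def estimate_generated_tokens(log_line):
    """Estimate the generated token count from a TGI `Success` log line.

    Assumes inference_time is logged in seconds ("...s") and time_per_token
    in milliseconds ("...ms"), as in the log lines of this report.
    """
    inference = re.search(r'inference_time="([\d.]+)s"', log_line)
    per_token = re.search(r'time_per_token="([\d.]+)ms"', log_line)
    if inference is None or per_token is None:
        return None
    seconds = float(inference.group(1))
    seconds_per_token = float(per_token.group(1)) / 1000.0
    return round(seconds / seconds_per_token)


# Timing fields copied from the first Success line above.
line = 'inference_time="13.221913991s" time_per_token="12.912025ms"'
print(estimate_generated_tokens(line))  # 1024
```

Under that assumption, all three requests come out at about 1024 tokens, which matches `max_new_tokens: Some(1024)`. This suggests the model generated newlines until it hit the token limit rather than stopping early.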
$ sudo -E docker compose -f compose.yaml logs codegen-gaudi-backend-server
codegen-gaudi-backend-server | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field "model_name_or_path" in Audio2TextDoc has conflict with protected namespace "model_".
codegen-gaudi-backend-server |
codegen-gaudi-backend-server | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
codegen-gaudi-backend-server | warnings.warn(
codegen-gaudi-backend-server | [2024-10-23 05:48:06,227] [ INFO] - Base service - CORS is enabled.
codegen-gaudi-backend-server | [2024-10-23 05:48:06,228] [ INFO] - Base service - Setting up HTTP server
codegen-gaudi-backend-server | [2024-10-23 05:48:06,229] [ INFO] - Base service - Uvicorn server setup on port 7778
codegen-gaudi-backend-server | INFO: Waiting for application startup.
codegen-gaudi-backend-server | INFO: Application startup complete.
codegen-gaudi-backend-server | INFO: Uvicorn running on http://0.0.0.0:7778 (Press CTRL+C to quit)
codegen-gaudi-backend-server | [2024-10-23 05:48:06,239] [ INFO] - Base service - HTTP server setup successful
codegen-gaudi-backend-server | INFO: 100.83.122.254:57922 - "OPTIONS /v1/codegen HTTP/1.1" 200 OK
codegen-gaudi-backend-server | INFO: 100.83.122.254:57922 - "POST /v1/codegen HTTP/1.1" 200 OK
codegen-gaudi-backend-server | INFO: 100.83.122.254:57230 - "POST /v1/codegen HTTP/1.1" 200 OK
codegen-gaudi-backend-server | INFO: 100.83.122.254:37096 - "POST /v1/codegen HTTP/1.1" 200 OK