Description
Sorry, I am not sure whether this is a bug report or a feature request, so I have marked it with the specific type "Upstream dependency changes".😇
What happened?
An upstream dependency of this repo, hf-hub, changed part of its implementation on Dec 30, 2024 (https://github.com/huggingface/hf-hub/pull/85/files#). As a result, in some corner cases 🤗TGI does not use the environment variables HF_ENDPOINT and HF_HOME.
What's the impact?
As of 🤗TGI version 3.1.1, users CAN download model files, and HF_ENDPOINT and HF_HOME work well, because that step is handled by huggingface_hub.
But after huggingface_hub finishes downloading, 🤗TGI uses hf-hub to download some JSON files, such as tokenizer_config.json, WITHOUT using the environment variables HF_ENDPOINT and HF_HOME. (IMO this second download is also pointless, because all the files have already been downloaded locally.)
This looks bad to me for 2 reasons:
- This behavior is inconsistent with that of huggingface_hub, which respects HF_ENDPOINT and HF_HOME.
- For the impact of HF_ENDPOINT: In some countries and regions where HuggingFace cannot be accessed normally (such as China), users set this variable to download model files from mirror sites. After the models are downloaded, users might think everything is fine. Little do they know that, because of this loophole, subtle bugs may emerge unnoticed, starting with warnings such as the following:
2025-03-08T10:34:46.108918Z INFO text_generation_router::server: router/src/server.rs:1748: Using the Hugging Face API
2025-03-08T10:36:31.518562Z WARN text_generation_router::server: router/src/server.rs:1791: Could not retrieve model info from the Hugging Face hub.
2025-03-08T10:36:31.521185Z WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
2025-03-08T10:36:34.956591Z INFO text_generation_router::server: router/src/server.rs:1881: Using config None
2025-03-08T10:36:35.122247Z WARN text_generation_router::server: router/src/server.rs:1941: no pipeline tag found for model Qwen/Qwen2.5-1.5B-Instruct
These warnings lead to further problems. One of them is shown below.
How to prove this is true?
For the impact of HF_ENDPOINT, there is a corner case that proves it, and it is also a problem that has troubled me for a long time:
- Let's download and run a small model from https://hf-mirror.com, a mirror site of HuggingFace that mainly serves Chinese users. Note that I've set HF_ENDPOINT.
docker run --gpus all --shm-size 1g -p 8080:80 -v huggingface-models:/data -e HF_HUB_ENABLE_HF_TRANSFER=0 -e HF_ENDPOINT=https://hf-mirror.com ghcr.io/huggingface/text-generation-inference:3.1.1 --model-id Qwen/Qwen2.5-1.5B-Instruct
- The logs will show some warnings, which are caused by this problem:
2025-03-08T10:34:46.108918Z INFO text_generation_router::server: router/src/server.rs:1748: Using the Hugging Face API
2025-03-08T10:36:31.518562Z WARN text_generation_router::server: router/src/server.rs:1791: Could not retrieve model info from the Hugging Face hub.
2025-03-08T10:36:31.521185Z WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
2025-03-08T10:36:34.956591Z INFO text_generation_router::server: router/src/server.rs:1881: Using config None
2025-03-08T10:36:35.122247Z WARN text_generation_router::server: router/src/server.rs:1941: no pipeline tag found for model Qwen/Qwen2.5-1.5B-Instruct
- Use the following command to query the LLM through the OpenAI-compatible API:
curl localhost:8080/v1/chat/completions \
-X POST \
-d '{
"model": "tgi",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is deep learning?"
}
],
"stream": true,
"max_tokens": 20
}' \
-H 'Content-Type: application/json'
- You will see the error:
{"error":"Template error: template not found","error_type":"template_error"}
I've searched the issues, and this happens because 🤗TGI cannot find tokenizer_config.json, so it sets chat_template to null. As the warning says:
2025-03-08T10:36:31.521185Z WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
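Since huggingface_hub has already fetched the files, one way to confirm that tokenizer_config.json really is on disk is to look for it in the local cache. The following is only a minimal sketch: the helper find_tokenizer_config is hypothetical (it is not TGI or hf-hub code), it assumes the usual huggingface_hub cache layout of models--<org>--<name>/snapshots/<revision>/, and the default root /data is an assumption taken from the volume mounted in the docker command above.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of TGI or hf-hub): returns the first
/// `tokenizer_config.json` found under
/// `<cache_root>/models--<org>--<name>/snapshots/<revision>/`.
fn find_tokenizer_config(cache_root: &Path, model_id: &str) -> Option<PathBuf> {
    // "Qwen/Qwen2.5-1.5B-Instruct" -> "models--Qwen--Qwen2.5-1.5B-Instruct"
    let repo_dir = format!("models--{}", model_id.replace('/', "--"));
    let snapshots = cache_root.join(repo_dir).join("snapshots");

    // Each sub-directory of `snapshots` is one downloaded revision.
    // Unreadable entries are skipped; a missing directory yields `None`.
    for entry in fs::read_dir(snapshots).ok()?.flatten() {
        let candidate = entry.path().join("tokenizer_config.json");
        if candidate.is_file() {
            return Some(candidate);
        }
    }
    None
}

fn main() {
    // Assumption: the cache root is given as the first argument,
    // falling back to the volume path used in the docker command.
    let root = std::env::args().nth(1).unwrap_or_else(|| "/data".to_string());
    match find_tokenizer_config(Path::new(&root), "Qwen/Qwen2.5-1.5B-Instruct") {
        Some(path) => println!("found locally: {}", path.display()),
        None => println!("tokenizer_config.json not found under {}", root),
    }
}
```

If the file is reported as found while the router still logs "Could not find tokenizer config locally", that suggests the router is looking in a different cache directory than the one huggingface_hub wrote to.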
What to do?
Luckily, we have at least 4 solutions:
- For 🤗TGI: DO NOT use hf-hub to download JSON files after huggingface_hub has downloaded all the files, because they are already available locally. It does not make sense and may lead to problems.
- For hf-hub: Revert "Adding options for environment variables." (hf-hub#85) for consistency with huggingface_hub.
- For 🤗TGI: The router needs an ApiBuilder instance, but it creates one with ApiBuilder::new() (text-generation-inference/router/src/server.rs, line 1714 in c34bd9d: let mut builder = ApiBuilder::new()). It would be better to use ApiBuilder::from_env(), which calls ApiBuilder::new() and also applies the environment variables.
- For unlucky users: Use an older version of 🤗TGI that predates the merge of "Adding options for environment variables." (hf-hub#85).
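To make the difference between the two constructors concrete, here is a small standard-library-only model of the behavior described above. It is a sketch under stated assumptions, not hf-hub's actual code: HubSettings and its fields are hypothetical stand-ins for ApiBuilder, the environment values are passed in as plain arguments so the example is deterministic, and the cache is assumed to live in the hub sub-directory of HF_HOME as it does for huggingface_hub.

```rust
use std::path::PathBuf;

const DEFAULT_ENDPOINT: &str = "https://huggingface.co";

/// Hypothetical stand-in for the settings a hub client needs.
/// This is NOT hf-hub's real type; it only models the behavior discussed here.
#[derive(Debug, PartialEq)]
struct HubSettings {
    endpoint: String,
    cache_dir: PathBuf,
}

impl HubSettings {
    /// Models `ApiBuilder::new()` as described in this issue:
    /// fixed defaults only, the environment is never consulted.
    fn new(user_home: &str) -> Self {
        HubSettings {
            endpoint: DEFAULT_ENDPOINT.to_string(),
            cache_dir: PathBuf::from(user_home)
                .join(".cache")
                .join("huggingface")
                .join("hub"),
        }
    }

    /// Models `ApiBuilder::from_env()`: start from the defaults, then let
    /// HF_HOME and HF_ENDPOINT override them when they are set.
    fn from_env(user_home: &str, hf_home: Option<&str>, hf_endpoint: Option<&str>) -> Self {
        let mut settings = Self::new(user_home);
        if let Some(home) = hf_home {
            // Assumption: the cache is the `hub` sub-directory of HF_HOME.
            settings.cache_dir = PathBuf::from(home).join("hub");
        }
        if let Some(endpoint) = hf_endpoint {
            settings.endpoint = endpoint.to_string();
        }
        settings
    }
}

fn main() {
    // Read the real environment once, then compare both constructors.
    let hf_home = std::env::var("HF_HOME").ok();
    let hf_endpoint = std::env::var("HF_ENDPOINT").ok();

    let ignoring_env = HubSettings::new("/root");
    let honoring_env =
        HubSettings::from_env("/root", hf_home.as_deref(), hf_endpoint.as_deref());

    println!("new():      {:?}", ignoring_env);
    println!("from_env(): {:?}", honoring_env);
}
```

With HF_ENDPOINT=https://hf-mirror.com set, only the second line of output would show the mirror, which matches the symptom in this report: the part of the pipeline that ignores the environment keeps talking to the default endpoint.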
As for which solution to adopt, or even whether to address this issue at all, opinions may vary, because this is just a corner case.
Lastly, I would like to point out that I do not know whether the changes in hf-hub were communicated to or coordinated with the maintainers of 🤗TGI or other downstream projects. If not, this issue could be quite serious: there is a good chance it will happen again, and next time it might not be just a simple corner case.
Best wishes to you all.🙏