[Upstream dependency change] Environment variable handling in hf-hub has changed. #3088

@HairlessVillager

Sorry, I am not sure whether this is a bug report or a feature request, so I have marked it with its own type: upstream dependency change. 😇

What happened?

hf-hub, an upstream dependency of this repo, changed part of its implementation on Dec 30, 2024 (https://github.com/huggingface/hf-hub/pull/85/files#). As a result, in some corner cases 🤗TGI does not use the environment variables HF_ENDPOINT and HF_HOME.

What's the impact?

Currently (🤗TGI version 3.1.1), users CAN download model files, and HF_ENDPOINT and HF_HOME work as expected for that step, because huggingface_hub is doing the work.

But after huggingface_hub finishes downloading, 🤗TGI uses hf-hub to download some JSON files, such as tokenizer_config.json, WITHOUT using the environment variables HF_ENDPOINT and HF_HOME. (imo this second download is also pointless, because all the files have already been downloaded locally.)

This looks bad to me for two reasons:

  1. This behavior is inconsistent with that of huggingface_hub, which honors HF_ENDPOINT and HF_HOME.
  2. For the impact of HF_ENDPOINT: in some countries and regions where HuggingFace cannot be accessed normally (such as China), users set this variable to download model files from mirror sites. After the models are downloaded, users might think everything is fine. Little do they know that, because of this loophole, subtle bugs can emerge unnoticed, starting with warnings like the following:
2025-03-08T10:34:46.108918Z  INFO text_generation_router::server: router/src/server.rs:1748: Using the Hugging Face API
2025-03-08T10:36:31.518562Z  WARN text_generation_router::server: router/src/server.rs:1791: Could not retrieve model info from the Hugging Face hub.
2025-03-08T10:36:31.521185Z  WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
2025-03-08T10:36:34.956591Z  INFO text_generation_router::server: router/src/server.rs:1881: Using config None
2025-03-08T10:36:35.122247Z  WARN text_generation_router::server: router/src/server.rs:1941: no pipeline tag found for model Qwen/Qwen2.5-1.5B-Instruct

These warnings lead to real problems. One of them is described below.

How to prove this is true?

For the impact of HF_ENDPOINT, there is a corner case that demonstrates it, which is also a problem that has troubled me for a long time:

  1. Let's download and run a small model from https://hf-mirror.com, a mirror site of HuggingFace that mainly serves Chinese users. Note that I've set HF_ENDPOINT.
docker run --gpus all --shm-size 1g -p 8080:80 -v huggingface-models:/data -e HF_HUB_ENABLE_HF_TRANSFER=0 -e HF_ENDPOINT=https://hf-mirror.com ghcr.io/huggingface/text-generation-inference:3.1.1 --model-id Qwen/Qwen2.5-1.5B-Instruct
  1. The logs will show some warnings, which are caused by this problem:
2025-03-08T10:34:46.108918Z  INFO text_generation_router::server: router/src/server.rs:1748: Using the Hugging Face API
2025-03-08T10:36:31.518562Z  WARN text_generation_router::server: router/src/server.rs:1791: Could not retrieve model info from the Hugging Face hub.
2025-03-08T10:36:31.521185Z  WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
2025-03-08T10:36:34.956591Z  INFO text_generation_router::server: router/src/server.rs:1881: Using config None
2025-03-08T10:36:35.122247Z  WARN text_generation_router::server: router/src/server.rs:1941: no pipeline tag found for model Qwen/Qwen2.5-1.5B-Instruct
  1. Use the following command to query the LLM through the OpenAI-compatible endpoint:
curl localhost:8080/v1/chat/completions \
    -X POST \
    -d '{
  "model": "tgi",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is deep learning?"
    }
  ],
  "stream": true,
  "max_tokens": 20
}' \
    -H 'Content-Type: application/json'
  1. You will see the error:
{"error":"Template error: template not found","error_type":"template_error"}

I've searched the issues, and this happens because 🤗TGI cannot find tokenizer_config.json, so it sets chat_template to null. As the warning says:

2025-03-08T10:36:31.521185Z WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
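
To confirm that tokenizer_config.json really is already on disk after the huggingface_hub download, one can look for it in the local hub cache. The following is a minimal std-only sketch, not TGI or hf-hub code. The function names are hypothetical, and it assumes the usual hub cache layout of `models--<org>--<name>/snapshots/<revision>/<file>`. The chat_template check is a crude substring test rather than real JSON parsing.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Folder name the hub cache uses for a model repo (assumed layout):
/// "org/name" becomes "models--org--name".
pub fn repo_folder_name(model_id: &str) -> String {
    format!("models--{}", model_id.replace('/', "--"))
}

/// Collects every snapshot copy of `filename` for `model_id` under `cache_dir`.
/// Returns an empty list if the repo or its snapshots folder does not exist.
pub fn find_cached_file(cache_dir: &Path, model_id: &str, filename: &str) -> Vec<PathBuf> {
    let snapshots = cache_dir.join(repo_folder_name(model_id)).join("snapshots");
    let mut hits = Vec::new();
    if let Ok(entries) = fs::read_dir(&snapshots) {
        // Each entry under snapshots/ is one revision directory.
        for entry in entries.flatten() {
            let candidate = entry.path().join(filename);
            // is_file() follows symlinks, which the cache uses for snapshot files.
            if candidate.is_file() {
                hits.push(candidate);
            }
        }
    }
    hits
}

/// Crude check (substring match, not a JSON parse): does the file
/// mention a "chat_template" key at all?
pub fn mentions_chat_template(path: &Path) -> bool {
    fs::read_to_string(path)
        .map(|text| text.contains("\"chat_template\""))
        .unwrap_or(false)
}
```

Calling `find_cached_file` with the directory that holds the `models--…` folders, the model id `Qwen/Qwen2.5-1.5B-Instruct`, and `tokenizer_config.json` should return at least one path if the file was downloaded. If it does, the "Could not find tokenizer config locally" warning is not explained by a missing file.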

What to do?

Luckily, we have at least 4 solutions:

  1. For 🤗TGI: DO NOT use hf-hub to download JSON files after huggingface_hub has downloaded everything, because all the files are already available locally. It does not make sense and may lead to problems.
  2. For hf-hub: Revert "Adding options for environment variables" (hf-hub#85) for consistency with huggingface_hub.
  3. For 🤗TGI: the line
    let mut builder = ApiBuilder::new()
    needs an ApiBuilder instance but creates it with ApiBuilder::new(). It would be better to use ApiBuilder::from_env(), which calls ApiBuilder::new() and also applies the environment variables.
  4. For unlucky users: Use an older version of 🤗TGI that predates the merge of "Adding options for environment variables" (hf-hub#85).
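
The difference behind solution 3 can be illustrated without depending on hf-hub. The sketch below is a std-only model of the two construction styles, not hf-hub's actual code: `HubSettings`, `defaults`, `from_lookup`, and `from_env` are hypothetical names. The default endpoint and the `<home>/.cache/huggingface/hub` and `$HF_HOME/hub` paths follow the huggingface_hub convention and are assumptions here.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the settings a hub client needs.
#[derive(Debug, Clone, PartialEq)]
pub struct HubSettings {
    pub endpoint: String,
    pub cache_dir: PathBuf,
}

/// Assumed default endpoint when HF_ENDPOINT is not consulted.
const DEFAULT_ENDPOINT: &str = "https://huggingface.co";

impl HubSettings {
    /// Defaults only: models a constructor that never reads the environment.
    pub fn defaults(home_dir: &str) -> Self {
        HubSettings {
            endpoint: DEFAULT_ENDPOINT.to_string(),
            cache_dir: PathBuf::from(home_dir)
                .join(".cache")
                .join("huggingface")
                .join("hub"),
        }
    }

    /// Starts from the defaults, then applies HF_ENDPOINT and HF_HOME if the
    /// lookup provides them. The lookup is injected so the logic can be
    /// exercised without touching the real process environment.
    pub fn from_lookup<F>(home_dir: &str, lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let mut settings = Self::defaults(home_dir);
        if let Some(endpoint) = lookup("HF_ENDPOINT") {
            settings.endpoint = endpoint;
        }
        if let Some(hf_home) = lookup("HF_HOME") {
            settings.cache_dir = PathBuf::from(hf_home).join("hub");
        }
        settings
    }

    /// Models an environment-aware constructor: same as `from_lookup`,
    /// backed by the real process environment.
    pub fn from_env(home_dir: &str) -> Self {
        Self::from_lookup(home_dir, |key| std::env::var(key).ok())
    }
}
```

With HF_ENDPOINT=https://hf-mirror.com set, `defaults` still points at the default endpoint while `from_env` points at the mirror. This is the kind of mismatch between the two download steps that this issue describes.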

As for which solution to adopt, or even whether to address this issue at all, opinions may vary, because this is just a corner case.


Lastly, I would like to point out that I do not know whether the changes in hf-hub were communicated to or coordinated with the maintainers of 🤗TGI or other downstream projects. If not, this issue could be quite serious: there is a good chance it will happen again, and next time it might not be just a simple corner case.

Best wishes to you all. 🙏
