[Upstream dependency changes] The environment variable behavior of `hf-hub` has changed.

Sorry, I am not sure whether this is a bug report or a feature request, so I have marked it with the specific type `Upstream dependency changes`.😇

## What happened?

This repo's upstream dependency, `hf-hub`, changed part of its implementation on Dec 30, 2024 (https://github.com/huggingface/hf-hub/pull/85/files#). As a result, in some corner cases 🤗TGI does not use the environment variables `HF_ENDPOINT` and `HF_HOME`.

## What's the impact?

Currently (🤗TGI version 3.1.1), users CAN download model files, and `HF_ENDPOINT` and `HF_HOME` work well for that step, because the download is handled by `huggingface_hub`.

However, after `huggingface_hub` finishes downloading, 🤗TGI uses `hf-hub` to download some JSON files, such as `tokenizer_config.json`, WITHOUT using the environment variables `HF_ENDPOINT` and `HF_HOME`. (IMO this extra download is also pointless, because all the files have already been downloaded locally.)

This looks bad to me for 2 reasons:
1. This behavior is inconsistent with that of [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) ([`HF_ENDPOINT`](https://github.com/huggingface/huggingface_hub/blob/06c4fbac44fb7f2b81e26ed17917e7a9258b1dff/src/huggingface_hub/constants.py#L66) and [`HF_HOME`](https://github.com/huggingface/huggingface_hub/blob/06c4fbac44fb7f2b81e26ed17917e7a9258b1dff/src/huggingface_hub/constants.py#L116)).
2. For the impact of `HF_ENDPOINT`: in some countries and regions where Hugging Face cannot be accessed normally (such as China), users set this variable to download model files from mirror sites. After the models are downloaded, users might think everything is fine. Little do they know that, because of this loophole, some insidious bugs may emerge unnoticed, such as the following warnings:
```text
2025-03-08T10:34:46.108918Z  INFO text_generation_router::server: router/src/server.rs:1748: Using the Hugging Face API
2025-03-08T10:36:31.518562Z  WARN text_generation_router::server: router/src/server.rs:1791: Could not retrieve model info from the Hugging Face hub.
2025-03-08T10:36:31.521185Z  WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
2025-03-08T10:36:34.956591Z  INFO text_generation_router::server: router/src/server.rs:1881: Using config None
2025-03-08T10:36:35.122247Z  WARN text_generation_router::server: router/src/server.rs:1941: no pipeline tag found for model Qwen/Qwen2.5-1.5B-Instruct
```
These warnings lead to some problems. One of them is reproduced in the next section.

## How to prove this is true?

For the impact of `HF_ENDPOINT`, here is a corner case that demonstrates it. It is also a problem that has troubled me for a long time:

1. Let's download and run a small model from https://hf-mirror.com, a mirror site of Hugging Face that mainly serves Chinese users. Note that I've set `HF_ENDPOINT`.
```bash
docker run --gpus all --shm-size 1g -p 8080:80 -v huggingface-models:/data -e HF_HUB_ENABLE_HF_TRANSFER=0 -e HF_ENDPOINT=https://hf-mirror.com ghcr.io/huggingface/text-generation-inference:3.1.1 --model-id Qwen/Qwen2.5-1.5B-Instruct
```
2. The logs will show some warnings, which are caused by this problem:
```text
2025-03-08T10:34:46.108918Z  INFO text_generation_router::server: router/src/server.rs:1748: Using the Hugging Face API
2025-03-08T10:36:31.518562Z  WARN text_generation_router::server: router/src/server.rs:1791: Could not retrieve model info from the Hugging Face hub.
2025-03-08T10:36:31.521185Z  WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
2025-03-08T10:36:34.956591Z  INFO text_generation_router::server: router/src/server.rs:1881: Using config None
2025-03-08T10:36:35.122247Z  WARN text_generation_router::server: router/src/server.rs:1941: no pipeline tag found for model Qwen/Qwen2.5-1.5B-Instruct
```
3. Use the following command to query the LLM through the OpenAI-compatible API:
```bash
curl localhost:8080/v1/chat/completions \
    -X POST \
    -d '{
  "model": "tgi",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is deep learning?"
    }
  ],
  "stream": true,
  "max_tokens": 20
}' \
    -H 'Content-Type: application/json'
```
4. You will see the error:
```text
{"error":"Template error: template not found","error_type":"template_error"}%
```

I've searched the issues, and this happens because 🤗TGI cannot find `tokenizer_config.json`, so it sets `chat_template` to null. As the warning says:

> 2025-03-08T10:36:31.521185Z  WARN text_generation_router::server: router/src/server.rs:1826: Could not find tokenizer config locally and no API specified
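
To check whether the file is in fact already on disk, here is a minimal Python sketch that looks for `tokenizer_config.json` in a local hub cache and reports whether it carries a `chat_template`. It assumes the cache follows the `huggingface_hub` layout (`models--<org>--<name>/snapshots/<revision>/`). The helper names `repo_folder_name`, `find_cached_file`, and `has_chat_template` are hypothetical and are not part of 🤗TGI or `hf-hub`.

```python
import json
from pathlib import Path


def repo_folder_name(model_id: str) -> str:
    """Map 'Qwen/Qwen2.5-1.5B-Instruct' to its folder name in the hub cache layout."""
    return "models--" + model_id.replace("/", "--")


def find_cached_file(cache_root, model_id: str, filename: str):
    """Return every cached copy of `filename` for `model_id`, one per snapshot."""
    snapshots = Path(cache_root) / repo_folder_name(model_id) / "snapshots"
    return sorted(snapshots.glob(f"*/{filename}"))


def has_chat_template(config_path) -> bool:
    """True if the tokenizer config at `config_path` defines a non-empty chat_template."""
    with open(config_path, encoding="utf-8") as f:
        return bool(json.load(f).get("chat_template"))
```

Pass the cache root your deployment actually uses. With the Docker command above, the volume mounted at `/data` would be the natural directory to try, but that is an assumption about how the image configures its cache. If `find_cached_file` returns a path and `has_chat_template` is true for it, the template was downloaded correctly and only the later lookup failed.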

## What to do?

Luckily, we have at least 4 solutions:
1. For 🤗TGI: DO NOT use `hf-hub` to download JSON files after `huggingface_hub` has downloaded all the files, because they are already available locally. It does not make sense and may lead to problems.
2. For `hf-hub`: Revert https://github.com/huggingface/hf-hub/pull/85 for consistency with [`huggingface_hub`](https://github.com/huggingface/huggingface_hub).
3. For 🤗TGI: https://github.com/huggingface/text-generation-inference/blob/c34bd9d8d97f7b93e55772e44ced4e27c649977a/router/src/server.rs#L1714 needs an `ApiBuilder` instance, but it creates one with [`ApiBuilder::new()`](https://github.com/huggingface/hf-hub/blob/357331db2195bacf5f3eb43544958171770b41ff/src/api/tokio.rs#L232). It would be better to use [`ApiBuilder::from_env()`](https://github.com/huggingface/hf-hub/blob/357331db2195bacf5f3eb43544958171770b41ff/src/api/tokio.rs#L245), which calls `ApiBuilder::new()` and also applies the environment variables.
4. For unlucky users: Use an older version of 🤗TGI that predates the merge of https://github.com/huggingface/hf-hub/pull/85.
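
To illustrate the difference that solution 3 relies on, here is a small Python model of a builder that ignores the environment versus one that honors `HF_ENDPOINT` and `HF_HOME`. It is only a sketch of the behavior, not the Rust code in `hf-hub`. `HubSettings`, `settings_default`, and `settings_from_env` are hypothetical names, and the default values (`https://huggingface.co`, `~/.cache/huggingface`, and a `hub` subfolder) are assumptions based on the usual `huggingface_hub` conventions.

```python
import os
from dataclasses import dataclass

# Assumed defaults, following the usual huggingface_hub conventions.
DEFAULT_ENDPOINT = "https://huggingface.co"
DEFAULT_HF_HOME = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")


@dataclass(frozen=True)
class HubSettings:
    endpoint: str
    cache_dir: str


def settings_default() -> HubSettings:
    """Environment-agnostic settings: always the official endpoint and default cache."""
    return HubSettings(DEFAULT_ENDPOINT, os.path.join(DEFAULT_HF_HOME, "hub"))


def settings_from_env(env=None) -> HubSettings:
    """Start from the defaults, then let HF_ENDPOINT and HF_HOME override them."""
    env = os.environ if env is None else env
    base = settings_default()
    endpoint = env.get("HF_ENDPOINT") or base.endpoint
    hf_home = env.get("HF_HOME")
    cache_dir = os.path.join(hf_home, "hub") if hf_home else base.cache_dir
    return HubSettings(endpoint.rstrip("/"), cache_dir)
```

With `HF_ENDPOINT=https://hf-mirror.com` set, `settings_default()` still points at the official endpoint, which matches the failure described in this issue, while `settings_from_env()` points at the mirror.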

As for which solution to adopt, or even whether to address this issue at all, opinions may vary, because this is just a corner case.

---

Lastly, I would like to point out that I do not know whether the changes in `hf-hub` were communicated to or coordinated with the maintainers of 🤗TGI or other downstream projects. If not, this issue could be quite serious: there is a good chance it will happen again, and next time it might not be just a simple corner case.

Best wishes to you all.🙏
