[Bug] Seems to be a Problem Installing to Docker on WSL2/Ubuntu #884

### Priority

Undecided

### OS type

Ubuntu

### Hardware type

GPU-Nvidia

### Installation method

- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source

### Deploy method

- [ ] Docker compose
- [ ] Docker
- [ ] Kubernetes
- [ ] Helm

### Running nodes

Single Node

### What's the version?

I used the instructions on this web page to install: https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7

The local Docker version is 27.3.1, build ce12230. The host OS is Windows 11, running WSL2 Ubuntu (Linux LAPTOP-60F4I00F 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux). The GPU is an NVIDIA GeForce RTX 4070.
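Because the containers run inside WSL2, the RAM they can use is whatever the WSL2 VM exposes (configurable through `.wslconfig` on the Windows side), which can be less than the laptop's physical RAM. A minimal sketch for reporting that number from inside Ubuntu follows. The helper name `mem_total_gib` is hypothetical, and whether memory is actually the limiting factor for the failure described below is an assumption, not something the log confirms.

```python
def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB.

    Hypothetical helper for illustration; not part of Docker, WSL, or TGI.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Expected line shape: "MemTotal:       16303428 kB"
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found in the supplied text")
```

Inside the WSL2 shell, calling `mem_total_gib(open("/proc/meminfo").read())` gives the total RAM visible to the VM, which is a useful number to include in a report like this one.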

### Description

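The Hugging Face TGI container appears to fail during the model download step. The download process is killed with signal 9 shortly after it starts converting the PyTorch weights to safetensors: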

2024-09-26T00:47:24.391621Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
2024-09-26T00:47:24.393715Z  INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
2024-09-26T00:47:24.393727Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
2024-09-26T00:47:24.393730Z  INFO text_generation_launcher: Default `max_total_tokens` to 4096
2024-09-26T00:47:24.393730Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
2024-09-26T00:47:24.393851Z  INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
2024-09-26T00:47:34.970340Z  WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
2024-09-26T00:47:34.970364Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
Error: DownloadError
2024-09-26T00:49:39.199001Z ERROR download: text_generation_launcher: Download process was signaled to shutdown with signal 9:
2024-09-26 00:47:29.556 | INFO     | text_generation_server.utils.import_utils:<module>:75 - Detected system ipex
/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
  warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
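The key line in the log above is the `ERROR` entry reporting that the download process was shut down with signal 9. On Linux, signal 9 is `SIGKILL`, which a process cannot catch, so it was terminated from outside. A common cause during a memory-heavy step such as weight conversion is the kernel's out-of-memory killer, but that is an assumption here; the log itself does not say why the signal was sent. The sketch below pulls the signal number out of a log like this one. The function names and the regular expression are hypothetical, written against the exact wording of the line above.

```python
import re
import signal
from typing import Optional

# Matches the launcher wording seen in this report's log (assumed stable).
_SIGNAL_RE = re.compile(r"signaled to shutdown with signal (\d+)")


def find_shutdown_signal(log_text: str) -> Optional[int]:
    """Return the signal number reported in the log, or None if absent."""
    match = _SIGNAL_RE.search(log_text)
    return int(match.group(1)) if match else None


def signal_name(number: int) -> str:
    """Map a signal number to its symbolic name on the current platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a named signal on this platform.
        return f"signal {number}"
```

If the name comes back as `SIGKILL`, checking the kernel log inside WSL2 (for example with `dmesg`) for out-of-memory messages around the same timestamp would confirm or rule out the memory explanation.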

### Reproduce steps

Follow the guide at this URL (I did include my Hugging Face key in the .env file, etc.): https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7
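Since the log reports that no token file was found, it may be worth double-checking that the `.env` file really defines a non-empty token under the name the compose setup expects. The sketch below is a rough check under stated assumptions: the variable names in `CANDIDATE_KEYS` are guesses at commonly used names, not taken from the guide, and the parser only handles simple `KEY=value` lines. It reports which keys are set without printing their values.

```python
from typing import Dict, List, Sequence

# Assumed candidate names; the guide may use a different variable.
CANDIDATE_KEYS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN", "HUGGINGFACEHUB_API_TOKEN")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding quotes so '' and "" count as empty.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def token_keys_set(text: str, keys: Sequence[str] = CANDIDATE_KEYS) -> List[str]:
    """Return the candidate keys that are present with a non-empty value."""
    env = parse_env(text)
    return [key for key in keys if env.get(key)]
```

Running `token_keys_set(open(".env").read())` from the directory that holds the `.env` file returns an empty list if none of the assumed names carries a value.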


### Raw log

_No response_
