
[BUG] win11 wsl2 docker MiniChat-2-3B "legacy=False" startup error #325

Open
ye-jeck opened this issue May 10, 2024 · 1 comment


ye-jeck commented May 10, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched the FAQ

Current Behavior

qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
qanything-container-local |                                  Dload  Upload   Total   Spent    Left  Speed
100    13  100    13    0     0   5993      0 --:--:-- --:--:-- --:--:--  6500
[the startup/progress block above repeats several times while the script polls the service]
qanything-container-local | Timed out starting the LLM service; automatically checking /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log for errors...
qanything-container-local | No clear error message was detected in /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log. Please inspect /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log manually for more information.
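When the startup script reports no clear error, a quick way to narrow things down is to grep the worker log it names for common failure signatures. A minimal sketch (the log path comes from the output above; the function name and keyword list are my own, not part of QAnything):

```shell
# scan_worker_log: hypothetical helper that surfaces likely failure lines in a
# FastChat model-worker log. The keyword list is a guess; extend as needed.
scan_worker_log() {
  # -i: case-insensitive, -n: show line numbers, -E: extended regex
  grep -inE 'error|exception|traceback|out of memory|cuda' "$1" | tail -n 20
}

# Example, using the path from the timeout message above:
# scan_worker_log /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log
```

If nothing matches, the worker may simply be hanging (e.g. stuck downloading or loading weights) rather than crashing, in which case the tail of the log is more informative than a keyword search.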

Expected Behavior

How can this error be resolved?

Environment

- OS: Ubuntu-22.04 / Windows 11 WSL2
- NVIDIA Driver: 525.105.17
- CUDA: 11.8
- docker: 25.0.2
- docker-compose: 2.24.3
- NVIDIA GPU: RTX 4080
- NVIDIA GPU Memory: 16G

QAnything logs

WARNING 05-10 15:16:59 config.py:457] Casting torch.float16 to torch.bfloat16.
INFO 05-10 15:16:59 llm_engine.py:70] Initializing an LLM engine with config: model='/model_repos/CustomLLM/MiniChat-2-3B', tokenizer='/model_repos/CustomLLM/MiniChat-2-3B', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565

Steps To Reproduce

1. bash ./run.sh -c local -i 0 -b vllm -m MiniChat-2-3B -t minichat -p 1 -r 0.81

Anything else?

No response


wldgntlmn commented Jun 29, 2024

Same problem here.

Terminal output:

qanything-container-local | No clear error message was detected in /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log. Please inspect /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log manually for more information.

fschat_model_worker_7801.log:

2024-06-29 23:10:31 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/MiniChat-2-3B', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='minichat', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-06-29 23:10:31 | INFO | model_worker | Loading the model ['MiniChat-2-3B'] on worker e0f4ff25 ...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-06-29 23:10:32 | ERROR | stderr | 
  0%|          | 0/1 [00:00<?, ?it/s]

Environment

- OS: Windows 11 with WSL2 Ubuntu-22.04
- NVIDIA Driver: 556.12
- docker: 26.1.4
- docker-compose: 2.27.1-desktop.1
- GPU: NVIDIA GeForce RTX 4060 Laptop GPU
- CUDA cores: 3072
- Dedicated video memory: 8188 MB GDDR6

2 participants