csrc/layer_norm fails to compile #305

Open
2 tasks done
wuyangjiazhi opened this issue Apr 28, 2024 · 2 comments

@wuyangjiazhi

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

flash-attention/csrc/layer_norm# python setup.py install

torch.version = 2.3.0+cu121

running install
/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` directly.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

(py3.10) root@DESKTOP-US6L41J:/mnt/e/lujian/flash-attention/csrc/layer_norm#
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************

!!
self.initialize_options()
/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` and ``easy_install``.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://github.com/pypa/setuptools/issues/917 for details.
    ********************************************************************************

!!
self.initialize_options()
running bdist_egg
running egg_info
writing dropout_layer_norm.egg-info/PKG-INFO
writing dependency_links to dropout_layer_norm.egg-info/dependency_links.txt
writing top-level names to dropout_layer_norm.egg-info/top_level.txt
reading manifest file 'dropout_layer_norm.egg-info/SOURCES.txt'
writing manifest file 'dropout_layer_norm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py:428: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'dropout_layer_norm' extension
Emitting ninja build file /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
g++ -pthread -B /root/anaconda3/envs/py3.10/compiler_compat -shared -Wl,-rpath,/root/anaconda3/envs/py3.10/lib -Wl,-rpath-link,/root/anaconda3/envs/py3.10/lib -L/root/anaconda3/envs/py3.10/lib -Wl,-rpath,/root/anaconda3/envs/py3.10/lib -Wl,-rpath-link,/root/anaconda3/envs/py3.10/lib -L/root/anaconda3/envs/py3.10/lib /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_api.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_1280.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_4096.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_768.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_8192.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_1280.o 
/mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_4096.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_768.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_8192.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_1280.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_4096.o 
/mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_768.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_8192.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1280.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_4096.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_768.o 
/mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_8192.o -L/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/torch/lib -L/usr/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/dropout_layer_norm.cpython-310-x86_64-linux-gnu.so
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_2048.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_3072.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_4096.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_512.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_5120.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_7168.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_7168.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_768.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_8192.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1024.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1280.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1536.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2048.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_256.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2560.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_3072.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_4096.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_512.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_5120.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_6144.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_7168.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_768.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_8192.o: No such file or directory
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1
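
The link step above fails because several object files listed on the g++ command line were never written, which suggests that the compile steps for those sources did not finish. The log does not show why. A minimal sketch for listing exactly which objects are missing, assuming the build output has been saved as text (the function name `missing_objects` and the `sample` variable are hypothetical helpers, not part of flash-attention):

```python
import re

# Matches linker lines such as
# ".../ld: cannot find <path>/ln_bwd_2048.o: No such file or directory".
MISSING_OBJ = re.compile(r"ld: cannot find (\S+\.o): No such file or directory")


def missing_objects(build_log: str) -> list[str]:
    """Return the base names of object files the linker could not find, in log order."""
    names = []
    for line in build_log.splitlines():
        match = MISSING_OBJ.search(line)
        if match:
            # Keep only the file name, dropping the build directory prefix.
            names.append(match.group(1).rsplit("/", 1)[-1])
    return names


# Two linker lines copied from the log above, plus the final collect2 line.
sample = """\
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_2048.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_3072.o: No such file or directory
collect2: error: ld returned 1 exit status
"""

print(missing_objects(sample))
```

Since the build log itself notes that the number of ninja workers can be overridden with the `MAX_JOBS` environment variable, rerunning with a small value such as `MAX_JOBS=1` is one thing to try, so that the first failing compile step is easier to spot. Whether that resolves the failure is not confirmed in this thread.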

Expected Behavior

No response

Environment

- OS:wsl2+Ubuntu20.04
- NVIDIA Driver:546.29
- CUDA:12.1
- docker:docker2
- docker-compose:docker2
- NVIDIA GPU:RTX 3090
- NVIDIA GPU Memory:24G
- GCC G++ 12.2.0

QAnything logs

qanything-container-local |
qanything-container-local | =============================
qanything-container-local | == Triton Inference Server ==
qanything-container-local | =============================
qanything-container-local |
qanything-container-local | NVIDIA Release 23.05 (build 61161506)
qanything-container-local | Triton Server Version 2.34.0
qanything-container-local |
qanything-container-local | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
qanything-container-local | By pulling and using the container, you accept the terms and conditions of this license:
qanything-container-local | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
qanything-container-local |
qanything-container-local | llm_api is set to [local]
qanything-container-local | device_id is set to [0]
qanything-container-local | runtime_backend is set to [hf]
qanything-container-local | model_name is set to [Qwen-7B-QAnything]
qanything-container-local | conv_template is set to [qwen-7b-qanything]
qanything-container-local | tensor_parallel is set to [1]
qanything-container-local | gpu_memory_utilization is set to [0.81]
qanything-container-local | checksum 8a3fe055906d2f09875bc5a0631de64f
qanything-container-local | default_checksum 8a3fe055906d2f09875bc5a0631de64f
qanything-container-local | GPU ID: 0, 0
qanything-container-local | GPU1 Model: NVIDIA GeForce RTX 3090
qanything-container-local | Compute Capability: 8.6
qanything-container-local | OCR_USE_GPU=True because 8.6 >= 7.5
qanything-container-local | ====================================================
qanything-container-local | ******************** 重要提示 ********************
qanything-container-local | ====================================================
qanything-container-local |
qanything-container-local | 您当前的显存为 24576 MiB 推荐部署7B模型
qanything-container-local | The triton server for embedding and reranker will start on 0 GPUs
qanything-container-local | Executing hf runtime_backend
qanything-container-local | The rerank service is ready! (2/8)
qanything-container-local | rerank服务已就绪! (2/8)
qanything-container-local | The ocr service is ready! (3/8)
qanything-container-local | OCR服务已就绪! (3/8)
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | /workspace/qanything_local/scripts/run_for_local_option.sh: line 401: 166 Killed CUDA_VISIBLE_DEVICES=$gpus nohup python3 -m fastchat.serve.model_worker --host 0.0.0.0 --port 7801 --controller-address http://0.0.0.0:7800 --worker-address http://0.0.0.0:7801 --model-path /model_repos/CustomLLM/$LLM_API_SERVE_MODEL --load-8bit --gpus $gpus --num-gpus $tensor_parallel --dtype bfloat16 --conv-template $LLM_API_SERVE_CONV_TEMPLATE > /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 2>&1 (wd: /workspace/qanything_local/logs/debug_logs/fastchat_logs)
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | The qanything backend service is ready! (4/8)
qanything-container-local | qanything后端服务已就绪! (4/8)
qanything-container-local | I0428 11:45:41.220726 152 grpc_server.cc:377] Thread started for CommonHandler
qanything-container-local | I0428 11:45:41.221539 152 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local | I0428 11:45:41.221886 152 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local | I0428 11:45:41.222256 152 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local | I0428 11:45:41.222535 152 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local | I0428 11:45:41.222877 152 stream_infer_handler.cc:122] New request handler for ModelStreamInferHandler, 0
qanything-container-local | I0428 11:45:41.223303 152 infer_handler.h:1025] Thread started for ModelStreamInferHandler
qanything-container-local | I0428 11:45:41.223685 152 grpc_server.cc:2450] Started GRPCInferenceService at 0.0.0.0:9001
qanything-container-local | I0428 11:45:41.224531 152 http_server.cc:3555] Started HTTPService at 0.0.0.0:9000
qanything-container-local | I0428 11:45:41.266425 152 http_server.cc:185] Started Metrics Service at 0.0.0.0:9002
qanything-container-local | I0428 11:46:27.277619 152 http_server.cc:3449] HTTP request: 0 /v2/health/ready
qanything-container-local | The embedding and rerank service is ready!. (7.5/8)
qanything-container-local | Embedding 和 Rerank 服务已准备就绪!(7.5/8)
qanything-container-local | 2024-04-28 19:45:41 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/Qwen-7B-QAnything', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='qwen-7b-qanything', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
qanything-container-local | 2024-04-28 19:45:41 | INFO | model_worker | Loading the model ['Qwen-7B-QAnything'] on worker b83b3047 ...
qanything-container-local | 2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get better performance https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
qanything-container-local | 2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn fail, please install FlashAttention https://github.com/Dao-AILab/flash-attention
0%| | 0/2 [00:00<?, ?it/s]28 19:45:44 | ERROR | stderr |
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 64 0 --:--:-- --:--:-- --:--:-- 64
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 3561 0 --:--:-- --:--:-- --:--:-- 4333
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8713 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8977 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8754 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9084 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8831 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8119 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8575 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8430 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8044 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9090 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9826 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8513 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8150 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7497 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9034 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7606 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 10124 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8119 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 6086 0 --:--:-- --:--:-- --:--:-- 6500
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8089 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8130 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7099 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 10815 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7970 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7921 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7779 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8222 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7382 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | 启动 LLM 服务超时,自动检查 /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 中是否存在Error...
qanything-container-local | /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 中未检测到明确的错误信息。请手动排查 /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 以获取更多信息。
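
The last two log lines report that the LLM service start timed out and that the automatic check found no explicit error message in fschat_model_worker_7801.log. The container log earlier does contain a shell message saying that process 166 (the fastchat model worker) was `Killed`, which the error check would not catch. A minimal sketch for finding such messages in a saved container log (the names `KILLED`, `find_killed`, and `sample` are hypothetical; the sample is a shortened copy of the line from the log above):

```python
import re

# Shell job-control message of the form
# "<script>: line <N>: <pid> Killed <command ...>".
KILLED = re.compile(r"line (\d+):\s+(\d+)\s+Killed\s+(.*)")


def find_killed(log_text: str) -> list[dict]:
    """Return one record per 'Killed' message: script line, pid, and the command text."""
    hits = []
    for line in log_text.splitlines():
        match = KILLED.search(line)
        if match:
            hits.append(
                {
                    "script_line": int(match.group(1)),
                    "pid": int(match.group(2)),
                    "command": match.group(3).strip(),
                }
            )
    return hits


# Shortened copy of the 'Killed' line from the container log above.
sample = (
    "qanything-container-local | /workspace/qanything_local/scripts/run_for_local_option.sh: "
    "line 401: 166 Killed CUDA_VISIBLE_DEVICES=$gpus nohup python3 -m fastchat.serve.model_worker "
    "--host 0.0.0.0 --port 7801\n"
    "qanything-container-local | Waiting for the backend service to start...\n"
)

for hit in find_killed(sample):
    print(hit["pid"], hit["command"][:60])
```

A process reported as `Killed` was terminated by SIGKILL. Inside a container or WSL2 this is often the kernel's out-of-memory killer, but nothing in this thread confirms that cause, so it is only a direction to investigate.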

Container log: /workspace/qanything_local/logs/debug_logs/fastchat_logs# vi fschat_model_worker_7801.log
2024-04-28 19:45:41 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/Qwen-7B-QAnything', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='qwen-7b-qanything', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-04-28 19:45:41 | INFO | model_worker | Loading the model ['Qwen-7B-QAnything'] on worker b83b3047 ...
2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get better performance https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn fail, please install FlashAttention https://github.com/Dao-AILab/flash-attention
2024-04-28 19:45:44 | ERROR | stderr | ^M 0%| | 0/2 [00:00<?, ?it/s]
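
The two warnings in this log say that `flash_attn` and its layer_norm extension could not be imported; the wording ("to get better performance") suggests they are optional accelerations rather than the cause of the timeout, though the thread does not confirm this. A minimal sketch for checking, inside the same Python environment, whether the modules can be found (the helper name `is_importable` is hypothetical; `dropout_layer_norm` is the extension name shown in the build output above):

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Module names taken from the warnings and the build output in this issue.
for name in ("flash_attn", "dropout_layer_norm"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

This only reports whether the modules are installed; it does not verify that the compiled extension loads correctly on the GPU.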

Steps To Reproduce

No response

Anything else?

No response

@Frank1212123

Same problem here, still unresolved.

@ye-jeck

ye-jeck commented May 13, 2024

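Have you solved this yet? I have been running it for several days and keep getting this same timeout error, and I can't find any error message either.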
