csrc/layer_norm fails to compile #305

wuyangjiazhi · 2024-04-28T12:13:34Z

是否已有关于该错误的issue或讨论？ | Is there an existing issue / discussion for this?

我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

该问题是否在FAQ中有解答？ | Is there an existing answer for this in FAQ?

我已经搜索过FAQ | I have searched FAQ

当前行为 | Current Behavior

flash-attention/csrc/layer_norm# python setup.py install

torch.version = 2.3.0+cu121

running install
/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` directly.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
    ********************************************************************************

!!
self.initialize_options()
/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` and ``easy_install``.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://github.com/pypa/setuptools/issues/917 for details.
    ********************************************************************************

!!
self.initialize_options()
running bdist_egg
running egg_info
writing dropout_layer_norm.egg-info/PKG-INFO
writing dependency_links to dropout_layer_norm.egg-info/dependency_links.txt
writing top-level names to dropout_layer_norm.egg-info/top_level.txt
reading manifest file 'dropout_layer_norm.egg-info/SOURCES.txt'
writing manifest file 'dropout_layer_norm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py:428: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'dropout_layer_norm' extension
Emitting ninja build file /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
g++ -pthread -B /root/anaconda3/envs/py3.10/compiler_compat -shared -Wl,-rpath,/root/anaconda3/envs/py3.10/lib -Wl,-rpath-link,/root/anaconda3/envs/py3.10/lib -L/root/anaconda3/envs/py3.10/lib -Wl,-rpath,/root/anaconda3/envs/py3.10/lib -Wl,-rpath-link,/root/anaconda3/envs/py3.10/lib -L/root/anaconda3/envs/py3.10/lib /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_api.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_1280.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_4096.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_768.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_8192.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_1280.o 
/mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_4096.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_768.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_8192.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_1280.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_4096.o 
/mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_768.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_8192.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1024.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1280.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1536.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2048.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_256.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2560.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_3072.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_4096.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_512.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_5120.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_6144.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_7168.o /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_768.o 
/mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_8192.o -L/root/anaconda3/envs/py3.10/lib/python3.10/site-packages/torch/lib -L/usr/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/dropout_layer_norm.cpython-310-x86_64-linux-gnu.so
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_bwd_2048.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_3072.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_4096.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_512.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_5120.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_fwd_7168.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_7168.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_768.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_bwd_8192.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1024.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1280.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_1536.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2048.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_256.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_2560.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_3072.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_4096.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_512.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_5120.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_6144.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_7168.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_768.o: No such file or directory
/root/anaconda3/envs/py3.10/compiler_compat/ld: cannot find /mnt/e/lujian/flash-attention/csrc/layer_norm/build/temp.linux-x86_64-cpython-310/ln_parallel_fwd_8192.o: No such file or directory
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1
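
The linker messages above each name an object file that `ld` could not find in the build directory. As a minimal sketch for summarising them, assuming the build output has been saved to a text file (the function name `missing_objects` and the file name `build.log` are hypothetical, not part of flash-attention), the missing file names can be extracted like this:

```python
import re
from pathlib import PurePosixPath

# Matches GNU ld messages of the form
# "<path to ld>: cannot find <file>: No such file or directory".
_MISSING = re.compile(r"ld: cannot find (?P<path>\S+): No such file or directory")


def missing_objects(log_text: str) -> list[str]:
    """Return the base names of the files the linker reported as missing."""
    names = []
    for line in log_text.splitlines():
        match = _MISSING.search(line)
        if match:
            # Keep only the file name, e.g. "ln_bwd_2048.o".
            names.append(PurePosixPath(match.group("path")).name)
    return names


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python missing_objects.py build.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for name in missing_objects(handle.read()):
                print(name)
```

Comparing this list with the compile output for the same files may show whether those compilation steps failed or were interrupted. The log above also notes that the number of ninja workers can be overridden with `MAX_JOBS=N`; whether lowering it helps in this case is an untested assumption.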

期望行为 | Expected Behavior

No response

运行环境 | Environment

- OS:wsl2+Ubuntu20.04
- NVIDIA Driver:546.29
- CUDA:12.1
- docker:docker2
- docker-compose:docker2
- NVIDIA GPU:RTX 3090
- NVIDIA GPU Memory:24G
- GCC G++ 12.2.0

QAnything日志 | QAnything logs

qanything-container-local |
qanything-container-local | =============================
qanything-container-local | == Triton Inference Server ==
qanything-container-local | =============================
qanything-container-local |
qanything-container-local | NVIDIA Release 23.05 (build 61161506)
qanything-container-local | Triton Server Version 2.34.0
qanything-container-local |
qanything-container-local | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
qanything-container-local | By pulling and using the container, you accept the terms and conditions of this license:
qanything-container-local | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
qanything-container-local |
qanything-container-local | llm_api is set to [local]
qanything-container-local | device_id is set to [0]
qanything-container-local | runtime_backend is set to [hf]
qanything-container-local | model_name is set to [Qwen-7B-QAnything]
qanything-container-local | conv_template is set to [qwen-7b-qanything]
qanything-container-local | tensor_parallel is set to [1]
qanything-container-local | gpu_memory_utilization is set to [0.81]
qanything-container-local | checksum 8a3fe055906d2f09875bc5a0631de64f
qanything-container-local | default_checksum 8a3fe055906d2f09875bc5a0631de64f
qanything-container-local | GPU ID: 0, 0
qanything-container-local | GPU1 Model: NVIDIA GeForce RTX 3090
qanything-container-local | Compute Capability: 8.6
qanything-container-local | OCR_USE_GPU=True because 8.6 >= 7.5
qanything-container-local | ====================================================
qanything-container-local | ******************** 重要提示 ********************
qanything-container-local | ====================================================
qanything-container-local |
qanything-container-local | 您当前的显存为 24576 MiB 推荐部署7B模型
qanything-container-local | The triton server for embedding and reranker will start on 0 GPUs
qanything-container-local | Executing hf runtime_backend
qanything-container-local | The rerank service is ready! (2/8)
qanything-container-local | rerank服务已就绪! (2/8)
qanything-container-local | The ocr service is ready! (3/8)
qanything-container-local | OCR服务已就绪! (3/8)
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | /workspace/qanything_local/scripts/run_for_local_option.sh: line 401: 166 Killed CUDA_VISIBLE_DEVICES=$gpus nohup python3 -m fastchat.serve.model_worker --host 0.0.0.0 --port 7801 --controller-address http://0.0.0.0:7800 --worker-address http://0.0.0.0:7801 --model-path /model_repos/CustomLLM/$LLM_API_SERVE_MODEL --load-8bit --gpus $gpus --num-gpus $tensor_parallel --dtype bfloat16 --conv-template $LLM_API_SERVE_CONV_TEMPLATE > /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 2>&1 (wd: /workspace/qanything_local/logs/debug_logs/fastchat_logs)
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | 等待启动后端服务
qanything-container-local | The qanything backend service is ready! (4/8)
qanything-container-local | qanything后端服务已就绪! (4/8)
qanything-container-local | I0428 11:45:41.220726 152 grpc_server.cc:377] Thread started for CommonHandler
qanything-container-local | I0428 11:45:41.221539 152 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local | I0428 11:45:41.221886 152 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local | I0428 11:45:41.222256 152 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local | I0428 11:45:41.222535 152 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local | I0428 11:45:41.222877 152 stream_infer_handler.cc:122] New request handler for ModelStreamInferHandler, 0
qanything-container-local | I0428 11:45:41.223303 152 infer_handler.h:1025] Thread started for ModelStreamInferHandler
qanything-container-local | I0428 11:45:41.223685 152 grpc_server.cc:2450] Started GRPCInferenceService at 0.0.0.0:9001
qanything-container-local | I0428 11:45:41.224531 152 http_server.cc:3555] Started HTTPService at 0.0.0.0:9000
qanything-container-local | I0428 11:45:41.266425 152 http_server.cc:185] Started Metrics Service at 0.0.0.0:9002
qanything-container-local | I0428 11:46:27.277619 152 http_server.cc:3449] HTTP request: 0 /v2/health/ready
qanything-container-local | The embedding and rerank service is ready!. (7.5/8)
qanything-container-local | Embedding 和 Rerank 服务已准备就绪！(7.5/8)
qanything-container-local | 2024-04-28 19:45:41 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/Qwen-7B-QAnything', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='qwen-7b-qanything', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
qanything-container-local | 2024-04-28 19:45:41 | INFO | model_worker | Loading the model ['Qwen-7B-QAnything'] on worker b83b3047 ...
qanything-container-local | 2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get better performance https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
qanything-container-local | 2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn fail, please install FlashAttention https://github.com/Dao-AILab/flash-attention
qanything-container-local | 2024-04-28 19:45:44 | ERROR | stderr | 0%| | 0/2 [00:00<?, ?it/s]
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 64 0 --:--:-- --:--:-- --:--:-- 64
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 3561 0 --:--:-- --:--:-- --:--:-- 4333
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8713 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8977 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8754 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9084 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8831 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8119 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8575 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8430 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8044 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9090 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9826 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8513 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8150 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7497 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 9034 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7606 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 10124 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8119 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 6086 0 --:--:-- --:--:-- --:--:-- 6500
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8089 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8130 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7099 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 10815 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7970 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7921 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7779 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 8222 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 13 100 13 0 0 7382 0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动，可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | 启动 LLM 服务超时，自动检查 /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 中是否存在Error...
qanything-container-local | /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 中未检测到明确的错误信息。请手动排查 /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 以获取更多信息。

Container log: /workspace/qanything_local/logs/debug_logs/fastchat_logs# vi fschat_model_worker_7801.log
2024-04-28 19:45:41 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/Qwen-7B-QAnything', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='qwen-7b-qanything', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-04-28 19:45:41 | INFO | model_worker | Loading the model ['Qwen-7B-QAnything'] on worker b83b3047 ...
2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get better performance https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
2024-04-28 19:45:42 | INFO | stdout | Warning: import flash_attn fail, please install FlashAttention https://github.com/Dao-AILab/flash-attention
2024-04-28 19:45:44 | ERROR | stderr | ^M 0%| | 0/2 [00:00<?, ?it/s]
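
The worker log lines above follow a `timestamp | LEVEL | source | message` layout. A minimal sketch for filtering such a file by level, assuming that layout holds for every line of interest (the names `LogRecord`, `parse_line` and `records_at_level` are hypothetical helpers, not part of QAnything or FastChat):

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    source: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one 'timestamp | LEVEL | source | message' line; None if it does not fit."""
    # maxsplit=3 keeps any further " | " sequences inside the message field.
    parts = line.rstrip("\n").split(" | ", 3)
    if len(parts) != 4:
        return None
    timestamp, level, source, message = (part.strip() for part in parts)
    return LogRecord(timestamp, level, source, message)


def records_at_level(log_text: str, level: str = "ERROR") -> list[LogRecord]:
    """Return every parsed record whose level equals the requested one."""
    records = []
    for line in log_text.splitlines():
        record = parse_line(line)
        if record is not None and record.level == level:
            records.append(record)
    return records
```

In the log shown above, the only `ERROR` record is a progress bar written to stderr, so an `ERROR` level here does not by itself indicate a failure. Lines that carry the `qanything-container-local | ` prefix from the container output would need that prefix stripped first, since it shifts the fields by one position.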

复现方法 | Steps To Reproduce

No response

备注 | Anything else?

No response


Frank1212123 · 2024-04-28T12:25:34Z

Same problem here, still unresolved.

ye-jeck · 2024-05-13T06:28:12Z

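Have you solved this yet? I have been running it for several days and keep hitting this timeout error, and I cannot find any error message either.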
