Hello, I'm running this repo on an RTX A6000 inside the Docker image pytorch:23.12. I've verified that my environment is set up correctly:
print(torch.cuda.get_arch_list())
['sm_52', 'sm_60', 'sm_61', 'sm_70', 'sm_72', 'sm_75', 'sm_80', 'sm_86', 'sm_87', 'sm_90', 'compute_90']
nvcc --version
Build cuda_12.3.r12.3/compiler.33567101_0
I've followed the installation guide, but I run into an error when building:
cd awq/kernels
python setup.py install
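For context, here is my rough understanding of where the -gencode flags in the log below come from (a sketch of my own assumption about what torch.utils.cpp_extension does with an architecture list, not this repo's actual code):

```python
# Sketch (assumption): how an arch list such as "7.5;8.6+PTX" is expanded
# into the nvcc -gencode flags that show up in the build log.
def arch_to_gencode(arch_list):
    flags = []
    for arch in arch_list.split(";"):
        ptx = arch.endswith("+PTX")  # "+PTX" also embeds forward-compatible PTX
        num = arch.replace("+PTX", "").replace(".", "")
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

print(arch_to_gencode("7.5;8.6+PTX"))
```

For "7.5;8.6+PTX" this yields compute_75/sm_75 plus the compute_86 pair, which roughly matches the three flags in my log.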
It seems the code tries to build itself for compute_75, while my GPU supports a much higher compute capability (the RTX A6000 is sm_86).
Can you please help me configure the right compute capability (sm) for my machine when building inside Docker?
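The workaround I'm considering is pinning the architecture myself before building (my assumption: the build honors the TORCH_CUDA_ARCH_LIST environment variable when choosing -gencode flags):

```python
# Sketch of a possible workaround (assumption: torch.utils.cpp_extension
# reads TORCH_CUDA_ARCH_LIST when it generates the nvcc -gencode flags).
import os

os.environ["TORCH_CUDA_ARCH_LIST"] = "8.6"  # RTX A6000 is compute capability 8.6
print(os.environ["TORCH_CUDA_ARCH_LIST"])
# Then rebuild in the same environment, e.g. from a shell:
#   TORCH_CUDA_ARCH_LIST="8.6" python setup.py install
```

Is this the right way to target sm_86, or is there a setting in the repo itself?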
Error:
ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 711; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
Thanks
Full console log for running python setup.py install:
python setup.py install
running install /usr/local/lib/python3.10/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. warnings.warn( /usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools. warnings.warn( running bdist_egg running egg_info writing awq_inference_engine.egg-info/PKG-INFO writing dependency_links to awq_inference_engine.egg-info/dependency_links.txt writing requirements to awq_inference_engine.egg-info/requires.txt writing top-level names to awq_inference_engine.egg-info/top_level.txt reading manifest file 'awq_inference_engine.egg-info/SOURCES.txt' writing manifest file 'awq_inference_engine.egg-info/SOURCES.txt' installing library code to build/bdist.linux-x86_64/egg running install_lib running build_ext building 'awq_inference_engine' extension creating /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new creating /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm creating /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemv Emitting ninja build file /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 FAILED: /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include 
-I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "ls" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "l" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "dev" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "nvidia" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(92): error: expected an expression __asm__ __volatile__( ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(96): warning #12-D: parsing restarts here after 
previous syntax error : "r"(addr)); ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 5 errors detected in the compilation of "/tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu". [2/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 FAILED: /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
/tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(34): warning #177-D: variable "ZERO" was declared but never referenced static constexpr uint32_t ZERO = 0x0; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced int blockIdx_x = 0; ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(65): warning #177-D: variable "ld_zero_flag" was declared but never referenced bool ld_zero_flag 
= (threadIdx.y * 32 + threadIdx.x) * 8 < 64; ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(21): warning #177-D: function "__pack_half2" was declared but never referenced __pack_half2(const half x, const half y) { ^ ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 711; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 715; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 719; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 723; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 727; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 731; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 735; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 739; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 743; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 747; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 751; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 755; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 759; error : 
Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 763; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 767; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 771; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 823; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 827; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 831; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 835; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 839; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 843; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 847; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 851; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 855; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 859; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 863; error : Feature '.m16n8k16' requires .target sm_80 or higher 
ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 867; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 871; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 875; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 879; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 883; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2187; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2191; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2195; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2199; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2203; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2207; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2211; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2215; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2219; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas 
/tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2223; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2227; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2231; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2235; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2239; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2243; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2247; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2299; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2303; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2307; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2311; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2315; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2319; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2323; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas 
/tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2327; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2331; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2335; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2339; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2343; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2347; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2351; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2355; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2359; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas fatal : Ptx assembly aborted due to errors /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(34): warning #177-D: variable "ZERO" was declared but never referenced static constexpr uint32_t ZERO = 0x0; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced int blockIdx_x = 0; ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(65): warning #177-D: variable "ld_zero_flag" was declared but never referenced bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 64; ^ 
/tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(21): warning #177-D: function "__pack_half2" was declared but never referenced __pack_half2(const half x, const half y) { ^ [3/9] c++ -MMD -MF /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/ft_attention.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/ft_attention.o -g -O3 -fopenmp -lgomp -std=c++17 -DENABLE_BF16 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp: In instantiation of ‘void set_params(Masked_multihead_attention_params<T>&, size_t, size_t, size_t, size_t, size_t, int, int, float, float, bool, int, T*, T*, T*, T*, T*, int*, float*, T*) [with T = short unsigned int; Masked_multihead_attention_params<T> = Multihead_attention_params<short unsigned int, false>; size_t = long unsigned int]’: /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:166:5: required from here /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:73:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘Masked_multihead_attention_params<short 
unsigned int>’ {aka ‘struct Multihead_attention_params<short unsigned int, false>’}; use assignment or value-initialization instead [-Wclass-memaccess] 73 | memset(&params, 0, sizeof(params)); | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:8: /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.h:122:8: note: ‘Masked_multihead_attention_params<short unsigned int>’ {aka ‘struct Multihead_attention_params<short unsigned int, false>’} declared here 122 | struct Multihead_attention_params: public Multihead_attention_params_base<T> { | ^~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp: In instantiation of ‘void set_params(Masked_multihead_attention_params<T>&, size_t, size_t, size_t, size_t, size_t, int, int, float, float, bool, int, T*, T*, T*, T*, T*, int*, float*, T*) [with T = __nv_bfloat16; Masked_multihead_attention_params<T> = Multihead_attention_params<__nv_bfloat16, false>; size_t = long unsigned int]’: /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:166:5: required from here /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:73:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘Masked_multihead_attention_params<__nv_bfloat16>’ {aka ‘struct Multihead_attention_params<__nv_bfloat16, false>’}; use assignment or value-initialization instead [-Wclass-memaccess] 73 | memset(&params, 0, sizeof(params)); | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:8: /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.h:122:8: note: ‘Masked_multihead_attention_params<__nv_bfloat16>’ {aka ‘struct Multihead_attention_params<__nv_bfloat16, false>’} declared here 122 | struct Multihead_attention_params: public Multihead_attention_params_base<T> { 
| ^~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp: In instantiation of ‘void set_params(Masked_multihead_attention_params<T>&, size_t, size_t, size_t, size_t, size_t, int, int, float, float, bool, int, T*, T*, T*, T*, T*, int*, float*, T*) [with T = float; Masked_multihead_attention_params<T> = Multihead_attention_params<float, false>; size_t = long unsigned int]’: /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:166:5: required from here /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:73:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘Masked_multihead_attention_params<float>’ {aka ‘struct Multihead_attention_params<float, false>’}; use assignment or value-initialization instead [-Wclass-memaccess] 73 | memset(&params, 0, sizeof(params)); | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:8: /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.h:122:8: note: ‘Masked_multihead_attention_params<float>’ {aka ‘struct Multihead_attention_params<float, false>’} declared here 122 | struct Multihead_attention_params: public Multihead_attention_params_base<T> { | ^~~~~~~~~~~~~~~~~~~~~~~~~~ [4/9] c++ -MMD -MF /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/pybind.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c 
/tmp/pycharm_project_505/awq/kernels/csrc/pybind.cpp -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/pybind.o -g -O3 -fopenmp -lgomp -std=c++17 -DENABLE_BF16 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 [5/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/position_embedding/pos_encoding_kernels.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/position_embedding/pos_encoding_kernels.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/position_embedding/pos_encoding_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 [6/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemv_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemv_cuda.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemv_cuda.cu(224): warning #177-D: variable "blockDim_z" was declared but never referenced int blockDim_z = num_out_feats; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemv_cuda.cu(224): warning #177-D: variable "blockDim_z" was declared but never referenced int blockDim_z = num_out_feats; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" [7/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/layernorm/layernorm.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/layernorm/layernorm.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/layernorm/layernorm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 [8/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemv/gemv_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemv/gemv_cuda.cu -o 
/tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemv/gemv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemv/gemv_cuda.cu(83): warning #177-D: variable "kShuffleSize" was declared but never referenced static constexpr int kShuffleSize = 32; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemv/gemv_cuda.cu(83): warning #177-D: variable "kShuffleSize" was declared but never referenced static constexpr int kShuffleSize = 32; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" [9/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/decoder_masked_multihead_attention.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c 
/tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/decoder_masked_multihead_attention.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(989): warning #177-D: variable "v_offset" was declared but never referenced int v_offset = k_offset; ^ detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 
/tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(995): warning #177-D: variable "v_bias_offset" was declared but never referenced int v_bias_offset = k_bias_offset; ^ detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(989): warning #177-D: variable "v_offset" was declared but never referenced int v_offset = k_offset; ^ detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(995): warning #177-D: variable "v_bias_offset" was declared but never referenced int v_bias_offset = k_bias_offset; ^ 
detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2105, in _run_ninja_build subprocess.run( File "/usr/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. 
The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/lib/python3.10/distutils/core.py", line 148, in setup dist.run_commands() File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/install.py", line 74, in run self.do_egg_install() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/install.py", line 116, in do_egg_install self.run_command('bdist_egg') File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_egg.py", line 164, in run cmd = self.call_command('install_lib', warn_dir=0) File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_egg.py", line 150, in call_command self.run_command(cmdname) File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/install_lib.py", line 11, in run self.build() File "/usr/lib/python3.10/distutils/command/install_lib.py", line 109, in build self.run_command('build_ext') File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 79, in run _build_ext.run(self) File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run self.build_extensions() File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", 
line 876, in build_extensions build_ext.build_extensions(self) File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions self._build_extensions_serial() File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial self.build_extension(ext) File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension _build_ext.build_extension(self, ext) File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 135, in build_extension super(build_ext, self).build_extension(ext) File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension objects = self.compiler.compile(sources, File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 689, in unix_wrap_ninja_compile _write_ninja_file_and_compile_objects( File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1777, in _write_ninja_file_and_compile_objects _run_ninja_build( File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2121, in _run_ninja_build raise RuntimeError(message) from e RuntimeError: Error compiling objects for extension python-BaseException
Following #93, I was able to force sm_90 by running: TORCH_CUDA_ARCH_LIST="9.0" python setup.py install.
TORCH_CUDA_ARCH_LIST="9.0" python setup.py install
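For Docker setups, one way to avoid hard-coding the value is to derive TORCH_CUDA_ARCH_LIST from the GPU that is actually present before invoking setup.py. A minimal sketch of that idea, assuming torch is installed (the helper names capability_to_arch and arch_to_gencode are illustrative, not part of this repo):

```python
import os


def capability_to_arch(cap):
    """Convert a (major, minor) capability tuple, e.g. (9, 0),
    into the '9.0' format that TORCH_CUDA_ARCH_LIST expects."""
    major, minor = cap
    return f"{major}.{minor}"


def arch_to_gencode(arch):
    """Convert '9.0' into the -gencode flag nvcc would receive,
    shown only to illustrate what the env var translates to."""
    sm = arch.replace(".", "")
    return f"-gencode=arch=compute_{sm},code=sm_{sm}"


if __name__ == "__main__":
    # Query the local GPU and export the arch before building.
    import torch

    if torch.cuda.is_available():
        arch = capability_to_arch(torch.cuda.get_device_capability(0))
        os.environ["TORCH_CUDA_ARCH_LIST"] = arch
        print(f'TORCH_CUDA_ARCH_LIST="{arch}" -> {arch_to_gencode(arch)}')
```

Running this (or exporting the variable manually, as above) before python setup.py install should keep the build from falling back to older default architectures.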