Hello, I'm running this repo on an RTX A6000 inside the Docker image pytorch:23.12. I've verified that my environment is set up correctly:
print(torch.cuda.get_arch_list())
['sm_52', 'sm_60', 'sm_61', 'sm_70', 'sm_72', 'sm_75', 'sm_80', 'sm_86', 'sm_87', 'sm_90', 'compute_90']
nvcc --version
Build cuda_12.3.r12.3/compiler.33567101_0
I've followed the installation guide, but I run into an error when building:
cd awq/kernels
python setup.py install
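For context, here is my rough understanding of where the -gencode flags in the log below come from (a sketch of my own assumption about what torch.utils.cpp_extension does with an architecture list, not this repo's actual code):

```python
# Sketch (assumption): how an arch list such as "7.5;8.6+PTX" is expanded
# into the nvcc -gencode flags that show up in the build log.
def arch_to_gencode(arch_list):
    flags = []
    for arch in arch_list.split(";"):
        ptx = arch.endswith("+PTX")  # "+PTX" also embeds forward-compatible PTX
        num = arch.replace("+PTX", "").replace(".", "")
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

print(arch_to_gencode("7.5;8.6+PTX"))
```

For "7.5;8.6+PTX" this yields compute_75/sm_75 plus the compute_86 pair, which roughly matches the three flags in my log.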
It seems the code tries to build itself for compute_75, while my GPU supports a much higher compute capability (the RTX A6000 is sm_86).
Can you please help me configure the right compute capability (sm) for my machine when building inside Docker?
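The workaround I'm considering is pinning the architecture myself before building (my assumption: the build honors the TORCH_CUDA_ARCH_LIST environment variable when choosing -gencode flags):

```python
# Sketch of a possible workaround (assumption: torch.utils.cpp_extension
# reads TORCH_CUDA_ARCH_LIST when it generates the nvcc -gencode flags).
import os

os.environ["TORCH_CUDA_ARCH_LIST"] = "8.6"  # RTX A6000 is compute capability 8.6
print(os.environ["TORCH_CUDA_ARCH_LIST"])
# Then rebuild in the same environment, e.g. from a shell:
#   TORCH_CUDA_ARCH_LIST="8.6" python setup.py install
```

Is this the right way to target sm_86, or is there a setting in the repo itself?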
Error:
ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 711; error : Feature '.m16n8k16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
Thanks
Full console log for running python setup.py install:
python setup.py install
running install /usr/local/lib/python3.10/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. warnings.warn( /usr/local/lib/python3.10/dist-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools. warnings.warn( running bdist_egg running egg_info writing awq_inference_engine.egg-info/PKG-INFO writing dependency_links to awq_inference_engine.egg-info/dependency_links.txt writing requirements to awq_inference_engine.egg-info/requires.txt writing top-level names to awq_inference_engine.egg-info/top_level.txt reading manifest file 'awq_inference_engine.egg-info/SOURCES.txt' writing manifest file 'awq_inference_engine.egg-info/SOURCES.txt' installing library code to build/bdist.linux-x86_64/egg running install_lib running build_ext building 'awq_inference_engine' extension creating /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new creating /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm creating /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemv Emitting ninja build file /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 FAILED: /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include 
-I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemm/gemm_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "ls" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "l" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "dev" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(91): error: identifier "nvidia" is undefined {ls -l /dev/nvidia* ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(92): error: expected an expression __asm__ __volatile__( ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu(96): warning #12-D: parsing restarts here after 
previous syntax error : "r"(addr)); ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 5 errors detected in the compilation of "/tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemm/gemm_cuda.cu". [2/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 FAILED: /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
/tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemm_cuda_gen.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(34): warning #177-D: variable "ZERO" was declared but never referenced static constexpr uint32_t ZERO = 0x0; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced int blockIdx_x = 0; ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(65): warning #177-D: variable "ld_zero_flag" was declared but never referenced bool ld_zero_flag 
= (threadIdx.y * 32 + threadIdx.x) * 8 < 64; ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(21): warning #177-D: function "__pack_half2" was declared but never referenced __pack_half2(const half x, const half y) { ^ ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 711; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 715; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 719; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 723; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 727; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 731; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 735; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 739; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 743; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 747; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 751; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 755; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 759; error : 
Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 763; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 767; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 771; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 823; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 827; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 831; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 835; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 839; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 843; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 847; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 851; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 855; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 859; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 863; error : Feature '.m16n8k16' requires .target sm_80 or higher 
ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 867; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 871; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 875; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 879; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 883; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2187; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2191; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2195; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2199; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2203; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2207; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2211; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2215; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2219; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas 
/tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2223; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2227; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2231; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2235; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2239; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2243; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2247; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2299; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2303; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2307; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2311; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2315; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2319; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2323; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas 
/tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2327; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2331; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2335; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2339; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2343; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2347; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2351; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2355; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas /tmp/tmpxft_0000240b_00000000-7_gemm_cuda_gen.compute_75.ptx, line 2359; error : Feature '.m16n8k16' requires .target sm_80 or higher ptxas fatal : Ptx assembly aborted due to errors /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(34): warning #177-D: variable "ZERO" was declared but never referenced static constexpr uint32_t ZERO = 0x0; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(44): warning #177-D: variable "blockIdx_x" was declared but never referenced int blockIdx_x = 0; ^ /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(65): warning #177-D: variable "ld_zero_flag" was declared but never referenced bool ld_zero_flag = (threadIdx.y * 32 + threadIdx.x) * 8 < 64; ^ 
/tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemm_cuda_gen.cu(21): warning #177-D: function "__pack_half2" was declared but never referenced __pack_half2(const half x, const half y) { ^ [3/9] c++ -MMD -MF /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/ft_attention.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/ft_attention.o -g -O3 -fopenmp -lgomp -std=c++17 -DENABLE_BF16 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp: In instantiation of ‘void set_params(Masked_multihead_attention_params<T>&, size_t, size_t, size_t, size_t, size_t, int, int, float, float, bool, int, T*, T*, T*, T*, T*, int*, float*, T*) [with T = short unsigned int; Masked_multihead_attention_params<T> = Multihead_attention_params<short unsigned int, false>; size_t = long unsigned int]’: /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:166:5: required from here /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:73:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘Masked_multihead_attention_params<short 
unsigned int>’ {aka ‘struct Multihead_attention_params<short unsigned int, false>’}; use assignment or value-initialization instead [-Wclass-memaccess] 73 | memset(&params, 0, sizeof(params)); | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:8: /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.h:122:8: note: ‘Masked_multihead_attention_params<short unsigned int>’ {aka ‘struct Multihead_attention_params<short unsigned int, false>’} declared here 122 | struct Multihead_attention_params: public Multihead_attention_params_base<T> { | ^~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp: In instantiation of ‘void set_params(Masked_multihead_attention_params<T>&, size_t, size_t, size_t, size_t, size_t, int, int, float, float, bool, int, T*, T*, T*, T*, T*, int*, float*, T*) [with T = __nv_bfloat16; Masked_multihead_attention_params<T> = Multihead_attention_params<__nv_bfloat16, false>; size_t = long unsigned int]’: /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:166:5: required from here /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:73:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘Masked_multihead_attention_params<__nv_bfloat16>’ {aka ‘struct Multihead_attention_params<__nv_bfloat16, false>’}; use assignment or value-initialization instead [-Wclass-memaccess] 73 | memset(&params, 0, sizeof(params)); | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:8: /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.h:122:8: note: ‘Masked_multihead_attention_params<__nv_bfloat16>’ {aka ‘struct Multihead_attention_params<__nv_bfloat16, false>’} declared here 122 | struct Multihead_attention_params: public Multihead_attention_params_base<T> { 
| ^~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp: In instantiation of ‘void set_params(Masked_multihead_attention_params<T>&, size_t, size_t, size_t, size_t, size_t, int, int, float, float, bool, int, T*, T*, T*, T*, T*, int*, float*, T*) [with T = float; Masked_multihead_attention_params<T> = Multihead_attention_params<float, false>; size_t = long unsigned int]’: /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:166:5: required from here /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:73:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘Masked_multihead_attention_params<float>’ {aka ‘struct Multihead_attention_params<float, false>’}; use assignment or value-initialization instead [-Wclass-memaccess] 73 | memset(&params, 0, sizeof(params)); | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /tmp/pycharm_project_505/awq/kernels/csrc/attention/ft_attention.cpp:8: /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.h:122:8: note: ‘Masked_multihead_attention_params<float>’ {aka ‘struct Multihead_attention_params<float, false>’} declared here 122 | struct Multihead_attention_params: public Multihead_attention_params_base<T> { | ^~~~~~~~~~~~~~~~~~~~~~~~~~ [4/9] c++ -MMD -MF /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/pybind.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c 
/tmp/pycharm_project_505/awq/kernels/csrc/pybind.cpp -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/pybind.o -g -O3 -fopenmp -lgomp -std=c++17 -DENABLE_BF16 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 [5/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/position_embedding/pos_encoding_kernels.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/position_embedding/pos_encoding_kernels.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/position_embedding/pos_encoding_kernels.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 [6/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemv_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemv_cuda.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization/gemv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemv_cuda.cu(224): warning #177-D: variable "blockDim_z" was declared but never referenced int blockDim_z = num_out_feats; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization/gemv_cuda.cu(224): warning #177-D: variable "blockDim_z" was declared but never referenced int blockDim_z = num_out_feats; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" [7/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/layernorm/layernorm.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/layernorm/layernorm.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/layernorm/layernorm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 [8/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemv/gemv_cuda.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemv/gemv_cuda.cu -o 
/tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/quantization_new/gemv/gemv_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemv/gemv_cuda.cu(83): warning #177-D: variable "kShuffleSize" was declared but never referenced static constexpr int kShuffleSize = 32; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/quantization_new/gemv/gemv_cuda.cu(83): warning #177-D: variable "kShuffleSize" was declared but never referenced static constexpr int kShuffleSize = 32; ^ Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" [9/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/decoder_masked_multihead_attention.o.d -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c 
/tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu -o /tmp/pycharm_project_505/awq/kernels/build/temp.linux-x86_64-3.10/csrc/attention/decoder_masked_multihead_attention.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -DENABLE_BF16 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads=8 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=awq_inference_engine -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(989): warning #177-D: variable "v_offset" was declared but never referenced int v_offset = k_offset; ^ detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 
/tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(995): warning #177-D: variable "v_bias_offset" was declared but never referenced int v_bias_offset = k_bias_offset; ^ detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(989): warning #177-D: variable "v_offset" was declared but never referenced int v_offset = k_offset; ^ detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention_template.hpp(995): warning #177-D: variable "v_bias_offset" was declared but never referenced int v_bias_offset = k_bias_offset; ^ 
detected during: instantiation of "void mmha_launch_kernel<T,Dh,Dh_MAX,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, Dh=32, Dh_MAX=32, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 70 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu instantiation of "void multihead_attention_<T,KERNEL_PARAMS_TYPE>(const KERNEL_PARAMS_TYPE &, const cudaStream_t &) [with T=float, KERNEL_PARAMS_TYPE=Multihead_attention_params<float, false>]" at line 111 of /tmp/pycharm_project_505/awq/kernels/csrc/attention/decoder_masked_multihead_attention.cu ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2105, in _run_ninja_build subprocess.run( File "/usr/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. 
The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/lib/python3.10/distutils/core.py", line 148, in setup dist.run_commands() File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/install.py", line 74, in run self.do_egg_install() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/install.py", line 116, in do_egg_install self.run_command('bdist_egg') File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_egg.py", line 164, in run cmd = self.call_command('install_lib', warn_dir=0) File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_egg.py", line 150, in call_command self.run_command(cmdname) File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/install_lib.py", line 11, in run self.build() File "/usr/lib/python3.10/distutils/command/install_lib.py", line 109, in build self.run_command('build_ext') File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 79, in run _build_ext.run(self) File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run self.build_extensions() File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", 
line 876, in build_extensions build_ext.build_extensions(self) File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions self._build_extensions_serial() File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial self.build_extension(ext) File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension _build_ext.build_extension(self, ext) File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 135, in build_extension super(build_ext, self).build_extension(ext) File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension objects = self.compiler.compile(sources, File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 689, in unix_wrap_ninja_compile _write_ninja_file_and_compile_objects( File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1777, in _write_ninja_file_and_compile_objects _run_ninja_build( File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2121, in _run_ninja_build raise RuntimeError(message) from e RuntimeError: Error compiling objects for extension python-BaseException
Following #93, I was able to force sm_90 by running: TORCH_CUDA_ARCH_LIST="9.0" python setup.py install.
TORCH_CUDA_ARCH_LIST="9.0" python setup.py install
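For Docker setups, one way to avoid hard-coding the value is to derive TORCH_CUDA_ARCH_LIST from the GPU that is actually present before invoking setup.py. A minimal sketch of that idea, assuming torch is installed (the helper names capability_to_arch and arch_to_gencode are illustrative, not part of this repo):

```python
import os


def capability_to_arch(cap):
    """Convert a (major, minor) capability tuple, e.g. (9, 0),
    into the '9.0' format that TORCH_CUDA_ARCH_LIST expects."""
    major, minor = cap
    return f"{major}.{minor}"


def arch_to_gencode(arch):
    """Convert '9.0' into the -gencode flag nvcc would receive,
    shown only to illustrate what the env var translates to."""
    sm = arch.replace(".", "")
    return f"-gencode=arch=compute_{sm},code=sm_{sm}"


if __name__ == "__main__":
    # Query the local GPU and export the arch before building.
    import torch

    if torch.cuda.is_available():
        arch = capability_to_arch(torch.cuda.get_device_capability(0))
        os.environ["TORCH_CUDA_ARCH_LIST"] = arch
        print(f'TORCH_CUDA_ARCH_LIST="{arch}" -> {arch_to_gencode(arch)}')
```

Running this (or exporting the variable manually, as above) before python setup.py install should keep the build from falling back to older default architectures.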