Obtaining file:///home/lmzheng/flashinfer/python
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Installing collected packages: flashinfer
Running setup.py develop for flashinfer
error: subprocess-exited-with-error
× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [992 lines of output]
running develop
/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer or other standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
easy_install.initialize_options(self)
/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer or other standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
running egg_info
writing flashinfer.egg-info/PKG-INFO
writing dependency_links to flashinfer.egg-info/dependency_links.txt
writing top-level names to flashinfer.egg-info/top_level.txt
reading manifest file 'flashinfer.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching 'tests/'
no previously-included directories found matching '*/__pycache__'
warning: no previously-included files matching '*.so' found anywhere in distribution
writing manifest file 'flashinfer.egg-info/SOURCES.txt'
running build_ext
/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py:414: UserWarning: The detected CUDA version (12.0) has a minor version mismatch with the version that was used to compile PyTorch (12.1). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.0
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'flashinfer._kernels' extension
Emitting ninja build file /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers...
(overridable by setting the environment variable MAX_JOBS=N) [1/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[2/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[3/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[4/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[5/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[6/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[7/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[8/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[9/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[10/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[11/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[12/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[13/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[14/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[15/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[16/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[17/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[18/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[19/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[20/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[21/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[22/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[23/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[24/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[25/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[26/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[27/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[28/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[29/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[30/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[31/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[32/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[33/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[34/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[35/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[36/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[37/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[38/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[39/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[40/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[41/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[42/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[43/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[44/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[45/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[46/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[47/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[48/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[49/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[50/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[51/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[52/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[53/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[54/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[55/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[56/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[57/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[58/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[59/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[60/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[61/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[62/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[63/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[64/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[65/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[66/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[67/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[68/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[69/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[70/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[71/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[72/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[73/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[74/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[75/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[76/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[77/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[78/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[79/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[80/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[81/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[82/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[83/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[84/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[85/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[86/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[87/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[88/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[89/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[90/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[91/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[92/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[93/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[94/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[95/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[96/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[97/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[98/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/batch_prefill.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/batch_prefill.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/batch_prefill.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' 
-DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) Error limit reached. 100 errors detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu". Compilation terminated. ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build subprocess.run( File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/subprocess.py", line 528, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. 
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "", line 2, in
  File "", line 34, in
  File "/home/lmzheng/flashinfer/python/setup.py", line 210, in
    setuptools.setup(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/develop.py", line 109, in install_for_development
    self.run_command('build_ext')
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1 ╰─> [992 lines of output] running develop /home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated. !! ******************************************************************************** Please avoid running ``setup.py`` and ``easy_install``. Instead, use pypa/build, pypa/installer or other standards-based tools. See https://github.com/pypa/setuptools/issues/917 for details. ******************************************************************************** !! easy_install.initialize_options(self) /home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated. !! ******************************************************************************** Please avoid running ``setup.py`` directly. Instead, use pypa/build, pypa/installer or other standards-based tools. See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details. ******************************************************************************** !! self.initialize_options() running egg_info writing flashinfer.egg-info/PKG-INFO writing dependency_links to flashinfer.egg-info/dependency_links.txt writing top-level names to flashinfer.egg-info/top_level.txt reading manifest file 'flashinfer.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no previously-included files found matching 'tests/' no previously-included directories found matching '*/__pycache__' warning: no previously-included files matching '*.so' found anywhere in distribution writing manifest file 'flashinfer.egg-info/SOURCES.txt' running build_ext /home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py:414: UserWarning: The detected CUDA version (12.0) has a minor version mismatch with the version that was used to compile PyTorch (12.1). 
Most likely this shouldn't be a problem. warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda)) /home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.0 warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}') building 'flashinfer._kernels' extension Emitting ninja build file /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template 
definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". [2/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC 
-I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[3/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[4/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[5/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[6/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[7/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[8/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[9/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[10/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[11/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[12/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[13/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[14/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[15/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[16/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[17/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[18/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[19/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[20/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[21/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[22/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[23/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[24/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[25/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[26/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[27/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[28/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[29/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[30/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[31/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[32/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[33/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[34/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[35/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[36/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[37/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[38/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[39/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[40/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[41/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[42/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[43/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[44/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[45/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[46/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[47/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[48/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[49/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[50/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[51/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[52/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[53/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[54/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[55/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[56/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[57/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[58/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[59/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[60/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[61/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_bf16.cu". 
[62/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[63/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[64/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[65/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[66/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[67/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[68/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[69/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[70/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[71/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_bf16.cu". 
[72/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[73/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[74/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[75/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[76/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[77/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_fp16.cu". 
[78/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_bf16.cu". 
[79/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[80/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_fp16.cu". 
[81/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[82/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutHND_rotaryNone_fp16.cu". 
[83/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[84/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head64_causalFalse_fp16qkFalse_layoutHND_rotaryLlama_bf16.cu". 
[85/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutHND_rotaryNone_bf16.cu". 
[86/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[87/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[88/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryLlama_fp16.cu". 
[89/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[90/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryLlama_bf16.cu". 
[91/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu 
-o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_fp16.cu". 
[92/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutHND_rotaryNone_fp16.cu". 
[93/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalFalse_fp16qkFalse_layoutNHD_rotaryNone_fp16.cu". 
[94/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalFalse_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[95/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kHND, GROUP_SIZE=4U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kLlama, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=false, DTypeIn=nv_half, DTypeOut=nv_half, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group4_head128_causalFalse_fp16qkTrue_layoutHND_rotaryLlama_fp16.cu". 
[96/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c 
/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=128U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=false, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head128_causalTrue_fp16qkFalse_layoutNHD_rotaryNone_bf16.cu". 
[97/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu -o 
/home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu(7): error: function "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched(flashinfer::BatchPrefillHandler *, DTypeIn *, IdType *, flashinfer::paged_kv_t, DTypeOut *, float *, float, float, cudaStream_t) [with page_storage=flashinfer::PageStorage::kIndices, kv_layout=flashinfer::QKVLayout::kNHD, GROUP_SIZE=1U, HEAD_DIM=64U, ROTARY_MODE=flashinfer::RotaryMode::kNone, ALLOW_FP16_QK_REDUCTION=true, CAUSAL=true, DTypeIn=nv_bfloat16, DTypeOut=nv_bfloat16, IdType=int32_t]" cannot be instantiated -- no template definition was supplied 1 error detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/generated/paged_batch_prefill_group1_head64_causalTrue_fp16qkTrue_layoutNHD_rotaryNone_bf16.cu". 
[98/580] /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/batch_prefill.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/batch_prefill.o /usr/local/cuda/bin/nvcc -I/home/lmzheng/flashinfer/python/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/TH -I/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/lmzheng/anaconda3/envs/fastchat/include/python3.9 -c -c /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu -o /home/lmzheng/flashinfer/python/build/temp.linux-x86_64-cpython-39/csrc/batch_prefill.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' 
-DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template 
"flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) 
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, 
int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template 
"flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) 
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, 
int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template 
"flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) 
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, 
int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template 
"flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) 
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, 
int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template 
"flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) 
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, 
int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template 
"flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) /home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t) 
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list
            argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t)
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list
            argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t)
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list
            argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t)
/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu(99): error: no instance of function template "flashinfer::BatchPrefillWithPagedKVCacheWrapperDispatched" matches the argument list
            argument types are: (flashinfer::BatchPrefillHandler *, c_type *, int32_t *, std::nullptr_t, flashinfer::paged_kv_t, c_type *, float *, float, float, cudaStream_t)
Error limit reached.
100 errors detected in the compilation of "/home/lmzheng/flashinfer/python/csrc/batch_prefill.cu".
Compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "", line 2, in
  File "", line 34, in
  File "/home/lmzheng/flashinfer/python/setup.py", line 210, in
    setuptools.setup(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/develop.py", line 109, in install_for_development
    self.run_command('build_ext')
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/lmzheng/anaconda3/envs/fastchat/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.