Bug: Compilation failure with CUDA support on Windows: 1 error detected in the compilation of ggml-cuda/sum.cu.

Commit [202084d](https://github.com/ggerganov/llama.cpp/commit/202084d31d4247764fc6d6d40d2e2bda0c89a73a) introduced a compilation failure on Windows with CUDA enabled: 1 error detected in the compilation of "llama.cpp/ggml/src/ggml-cuda/sum.cu".

- Windows 11 26100.1591
- CUDA Toolkit 12.6.1
- Visual Studio 2022 version 17.8.13 (Desktop development with C++), the latest version that still works with AVX1 support
- The same issue also occurs with Visual Studio 2019 version 16.11.39


Logs (from the Visual Studio 2019 build):

<details>

<summary>cmake -B build -DGGML_CUDA=ON</summary>

```
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.26100.
-- The C compiler identification is MSVC 19.29.30154.0
-- The CXX compiler identification is MSVC 19.29.30154.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Enterprise/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.46.0.windows.1")
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP_CXX: -openmp (found version "2.0")
-- Found OpenMP: TRUE (found version "2.0")
-- OpenMP found
-- Using llamafile
-- Found CUDAToolkit: F:/LLM/Apps/Cuda/include (found version "12.6.68")
-- CUDA found
-- Using CUDA architectures: 52;61;70;75
-- The CUDA compiler identification is NVIDIA 12.6.68
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: F:/LLM/Apps/Cuda/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM:
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Failed
-- Performing Test HAS_AVX2_2
-- Performing Test HAS_AVX2_2 - Failed
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Failed
-- Performing Test HAS_FMA_2
-- Performing Test HAS_FMA_2 - Failed
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: F:/LLM/llama.cpp/build
```
</details>

<details>
<summary>cmake --build build --config Release</summary>

```
Microsoft (R) Build Engine version 16.11.2+f32259642 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

  Checking Build System
  Building Custom Rule F:/LLM/llama.cpp/examples/gguf-hash/CMakeLists.txt
  Building Custom Rule F:/LLM/llama.cpp/examples/gguf-hash/CMakeLists.txt
  sha256.c
  xxhash.c
  sha256.vcxproj -> F:\LLM\llama.cpp\build\examples\gguf-hash\sha256.dir\Release\sha256.lib
  Generating build details from Git
  xxhash.vcxproj -> F:\LLM\llama.cpp\build\examples\gguf-hash\xxhash.dir\Release\xxhash.lib
  Building Custom Rule F:/LLM/llama.cpp/ggml/src/CMakeLists.txt
  -- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.46.0.windows.1")
  Building Custom Rule F:/LLM/llama.cpp/examples/gguf-hash/CMakeLists.txt
  Building Custom Rule F:/LLM/llama.cpp/common/CMakeLists.txt
  sha1.c
  build-info.cpp
  build_info.vcxproj -> F:\LLM\llama.cpp\build\common\build_info.dir\Release\build_info.lib
  sha1.vcxproj -> F:\LLM\llama.cpp\build\examples\gguf-hash\sha1.dir\Release\sha1.lib
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\argsort.cu...
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\binbcast.cu...
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\arange.cu...
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\acc.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\argsort.obj "F:\L
  LM\llama.cpp\ggml\src\ggml-cuda\argsort.cu"

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\binbcast.obj "F:\
  LLM\llama.cpp\ggml\src\ggml-cuda\binbcast.cu"

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\acc.obj "F:\LLM\l
  lama.cpp\ggml\src\ggml-cuda\acc.cu"

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\arange.obj "F:\LL
  M\llama.cpp\ggml\src\ggml-cuda\arange.cu"
  acc.cu
  arange.cu
  argsort.cu
  tmpxft_00000cb8_00000000-7_acc.compute_75.cudafe1.cpp
  tmpxft_000024e8_00000000-7_arange.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\clamp.cu...
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\concat.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\clamp.obj "F:\LLM
  \llama.cpp\ggml\src\ggml-cuda\clamp.cu"

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\concat.obj "F:\LL
  M\llama.cpp\ggml\src\ggml-cuda\concat.cu"
  tmpxft_000026d0_00000000-7_argsort.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\conv-transpose-1d.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\conv-transpose-1d
  .obj "F:\LLM\llama.cpp\ggml\src\ggml-cuda\conv-transpose-1d.cu"
  binbcast.cu
  tmpxft_00001f6c_00000000-7_binbcast.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\convert.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\convert.obj "F:\L
  LM\llama.cpp\ggml\src\ggml-cuda\convert.cu"
  clamp.cu
  concat.cu
  conv-transpose-1d.cu
  tmpxft_0000487c_00000000-7_clamp.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\cpy.cu...
  tmpxft_000017fc_00000000-7_concat.compute_75.cudafe1.cpp
  tmpxft_000047b0_00000000-7_conv-transpose-1d.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\cross-entropy-loss.cu...
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\diagmask.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\cpy.obj "F:\LLM\l
  lama.cpp\ggml\src\ggml-cuda\cpy.cu"

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\cross-entropy-los
  s.obj "F:\LLM\llama.cpp\ggml\src\ggml-cuda\cross-entropy-loss.cu"

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\diagmask.obj "F:\
  LLM\llama.cpp\ggml\src\ggml-cuda\diagmask.cu"
  convert.cu
  tmpxft_00004c50_00000000-7_convert.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\dmmv.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\dmmv.obj "F:\LLM\
  llama.cpp\ggml\src\ggml-cuda\dmmv.cu"
  diagmask.cu
  cross-entropy-loss.cu
  tmpxft_0000478c_00000000-7_diagmask.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\fattn-tile-f16.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\fattn-tile-f16.ob
  j "F:\LLM\llama.cpp\ggml\src\ggml-cuda\fattn-tile-f16.cu"
  tmpxft_000010b8_00000000-7_cross-entropy-loss.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\fattn-tile-f32.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\fattn-tile-f32.ob
  j "F:\LLM\llama.cpp\ggml\src\ggml-cuda\fattn-tile-f32.cu"
  cpy.cu
  tmpxft_000040c8_00000000-7_cpy.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\fattn.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\fattn.obj "F:\LLM
  \llama.cpp\ggml\src\ggml-cuda\fattn.cu"
  dmmv.cu
  tmpxft_000054c4_00000000-7_dmmv.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\getrows.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\getrows.obj "F:\L
  LM\llama.cpp\ggml\src\ggml-cuda\getrows.cu"
  getrows.cu
  tmpxft_00005734_00000000-7_getrows.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\im2col.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\im2col.obj "F:\LL
  M\llama.cpp\ggml\src\ggml-cuda\im2col.cu"
  im2col.cu
  tmpxft_0000464c_00000000-7_im2col.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\mmq.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\mmq.obj "F:\LLM\l
  lama.cpp\ggml\src\ggml-cuda\mmq.cu"
  mmq.cu
  fattn.cu
  tmpxft_000054ec_00000000-7_fattn.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\mmvq.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\mmvq.obj "F:\LLM\
  llama.cpp\ggml\src\ggml-cuda\mmvq.cu"
  fattn-tile-f16.cu
  tmpxft_000052c0_00000000-7_fattn-tile-f16.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\norm.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\norm.obj "F:\LLM\
  llama.cpp\ggml\src\ggml-cuda\norm.cu"
  fattn-tile-f32.cu
  tmpxft_000038b4_00000000-7_fattn-tile-f32.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\pad.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\pad.obj "F:\LLM\l
  lama.cpp\ggml\src\ggml-cuda\pad.cu"
  tmpxft_000056e0_00000000-7_mmq.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\pool2d.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\pool2d.obj "F:\LL
  M\llama.cpp\ggml\src\ggml-cuda\pool2d.cu"
  norm.cu
  tmpxft_00005840_00000000-7_norm.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\quantize.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\quantize.obj "F:\
  LLM\llama.cpp\ggml\src\ggml-cuda\quantize.cu"
  pad.cu
  tmpxft_00001990_00000000-7_pad.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\rope.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\rope.obj "F:\LLM\
  llama.cpp\ggml\src\ggml-cuda\rope.cu"
  pool2d.cu
  tmpxft_000020b8_00000000-7_pool2d.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\scale.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\scale.obj "F:\LLM
  \llama.cpp\ggml\src\ggml-cuda\scale.cu"
  quantize.cu
  rope.cu
  tmpxft_000056b8_00000000-7_rope.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\softmax.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\softmax.obj "F:\L
  LM\llama.cpp\ggml\src\ggml-cuda\softmax.cu"
  scale.cu
  tmpxft_000047c4_00000000-7_scale.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\sum.cu...

  F:\LLM\llama.cpp\build\ggml\src>"F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch=compute_52,code=\"comput
  e_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_61,code=\"c
  ompute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencode=arch=compute_70,cod
  e=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gencode=arch=compute_7
  5,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -c
  cbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\Hos
  tX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -IF:\LLM\Apps\C
  uda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregcount=0   --m
  achine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -DGGML_USE_
  CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP
  -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_
  PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml
  _EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNI
  NGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA
  _MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPE
  N_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nolo
  go /O2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\sum.obj "F:\LLM\l
  lama.cpp\ggml\src\ggml-cuda\sum.cu"
C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\um\oaidl.h(804): error : expected an identif
ier [F:\LLM\llama.cpp\build\ggml\src\ggml.vcxproj]
            600 = CC_MSCPASCAL,
            ^

  tmpxft_00000ab8_00000000-7_quantize.compute_75.cudafe1.cpp
  Compiling CUDA source file ..\..\..\ggml\src\ggml-cuda\sumrows.cu...
  1 error detected in the compilation of "F:/LLM/llama.cpp/ggml/src/ggml-cuda/sum.cu".
  sum.cu
C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Microsoft\VC\v160\BuildCustomizat
ions\CUDA 12.6.targets(799,9): error MSB3721: The command ""F:\LLM\Apps\Cuda\bin\nvcc.exe" -gencode=arch
=compute_52,code=\"compute_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=a
rch=compute_61,code=\"compute_61,compute_61\" -gencode=arch=compute_61,code=\"sm_61,compute_61\" -gencod
e=arch=compute_70,code=\"compute_70,compute_70\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -gen
code=arch=compute_75,code=\"compute_75,compute_75\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" -
-use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.2
9.30133\bin\HostX64\x64" -x cu   -IF:\LLM\llama.cpp\ggml\src\..\include -IF:\LLM\llama.cpp\ggml\src\. -I
F:\LLM\Apps\Cuda\include -IF:\LLM\Apps\Cuda\include     --keep-dir x64\Release -use_fast_math -maxrregco
unt=0   --machine 64 --compile -cudart static -Xcompiler="/EHsc -Ob2 /arch:AVX"   -D_WINDOWS -DNDEBUG -D
GGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARNINGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_
OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_
CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_SOURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dg
gml_EXPORTS -DWIN32 -D_WINDOWS -DNDEBUG -DGGML_USE_CUDA -DGGML_SHARED -DGGML_BUILD -D_CRT_SECURE_NO_WARN
INGS -DGGML_SCHED_MAX_COPIES=4 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_
MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -D_XOPEN_S
OURCE=600 -D"CMAKE_INTDIR=\"Release\"" -Dggml_EXPORTS -D_WINDLL -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O
2 /FS   /MD /GR" -Xcompiler "/Fdggml.dir\Release\vc142.pdb" -o ggml.dir\Release\sum.obj "F:\LLM\llama.cp
p\ggml\src\ggml-cuda\sum.cu"" exited with code 1. [F:\LLM\llama.cpp\build\ggml\src\ggml.vcxproj]
```
</details>

I know nothing about coding, so I don't know how to fix this. With Visual Studio 2022 version 17.8.13, I tried several versions of the Windows SDK without success. I then installed Visual Studio 2019 in the hope that it would fix the problem, but got the same error.

Hope you can fix this, thanks.

Bug: Compilation failure with CUDA support on Windows: 1 error detected in the compilation of ggml-cuda/sum.cu. #9376