
whisper.cpp ggml-cuda was compiled without support for the current GPU architecture #1513

Closed · bdqfork opened this issue Dec 30, 2023 · 1 comment · Fixed by #1846
Labels: bug (Something isn't working)

bdqfork commented Dec 30, 2023

LocalAI version:
quay.io/go-skynet/local-ai:v2.3.0-cublas-cuda12-ffmpeg-core

Environment, CPU architecture, OS, and Version:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2    |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID RTX6000-24Q               On  | 00000000:06:10.0 Off |                  N/A |
| N/A   N/A    P8              N/A /  N/A |      0MiB / 21504MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Describe the bug
ERROR: ggml-cuda was compiled without support for the current GPU architecture.
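
For context on where this message comes from: ggml-cuda compiles architecture-specialized kernels, and when the code that actually runs was built for a `__CUDA_ARCH__` below a kernel's guard threshold, execution falls through to an error stub that prints a line like this. A minimal sketch of that pattern (illustrative only, assuming a guard of the shape ggml uses, not its verbatim source):

```cuda
// Illustrative sketch of an architecture-guarded kernel with an error stub.
// If the code that actually runs was compiled for a __CUDA_ARCH__ below the
// guard's threshold (e.g. only compute_60 PTX, JIT-compiled on this 7.5
// device), the #else branch is what executes -- hence the repeated errors
// in the log below.
#include <cstdio>
#include <cuda_runtime.h>

static __device__ void bad_arch() {
    printf("ERROR: compiled without support for the current GPU architecture.\n");
    __trap();  // abort the kernel
}

__global__ void arch_guarded_kernel() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 750
    // fast path using Turing (compute 7.5) features would live here
#else
    bad_arch();
#endif
}

int main() {
    arch_guarded_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```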

To Reproduce

Expected behavior

Logs
Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:whisper/ggml-large-v3-q5_0.bin ContextSize:0 Seed:0 NBatch:0 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:0 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/whisper/ggml-large-v3-q5_0.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_from_file_with_params_no_state: loading model from '/models/whisper/ggml-large-v3-q5_0.bin'
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: loading model
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_vocab = 51866
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_ctx = 1500
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_state = 1280
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_head = 20
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_layer = 32
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_ctx = 448
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_state = 1280
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_head = 20
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_layer = 32
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_mels = 128
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: ftype = 8
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: qntvr = 2
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: type = 5 (large v3)
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: adding 1609 extra tokens
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_langs = 100
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: found 1 CUDA devices:
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr Device 0: GRID RTX6000-24Q, compute capability 7.5
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_backend_init: using CUDA backend
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: CUDA buffer size = 1080.97 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: model size = 1080.47 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_backend_init: using CUDA backend
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: kv self size = 220.20 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: kv cross size = 245.76 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (conv) = 32.42 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (encode) = 212.42 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (cross) = 9.38 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (decode) = 99.24 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
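
Note that the log confirms the driver sees the card as compute capability 7.5 (Turing), so the ggml-cuda build inside the image needs sm_75 code or PTX for a compatible virtual architecture. A standalone way to check what the CUDA runtime reports (a hypothetical helper for diagnosis, not part of LocalAI):

```cuda
// Standalone probe: print each device's compute capability so you know which
// architecture the ggml-cuda build has to include. For the GPU in this issue
// it should print "compute capability 7.5", matching the log line above.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```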

Additional context

bdqfork added the bug label on Dec 30, 2023
devwearsprada commented Mar 10, 2024

Having the same "ggml-cuda was compiled without support for the current GPU architecture" error on v2.9.0. Did you manage to fix this or find a workaround? I am using the ggml-tiny.en-q5_1.bin model.
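
One way to narrow this down (a suggestion under the assumption that the image's kernels were built for the wrong architectures, not a confirmed LocalAI fix): probe which `__CUDA_ARCH__` the kernels were actually compiled for, and if it never matches the device, rebuild with the device's architecture in the list (for example, CMake's `CMAKE_CUDA_ARCHITECTURES=75`):

```cuda
// Probe which __CUDA_ARCH__ a kernel was actually compiled for. Built with
// e.g. `nvcc -gencode arch=compute_60,code=compute_60 probe.cu`, the embedded
// PTX is JIT-compiled on a 7.5 GPU but still reports 600 -- the mismatch that
// makes arch-guarded kernels fall back to their error stub.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void arch_probe() {
#ifdef __CUDA_ARCH__
    printf("kernel compiled for __CUDA_ARCH__ = %d\n", __CUDA_ARCH__);
#endif
}

int main() {
    arch_probe<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```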

mudler linked a pull request on Mar 18, 2024 that will close this issue.