
whisper.cpp ggml-cuda was compiled without support for the current GPU architecture #1513

Closed · bdqfork opened this issue Dec 30, 2023 · 1 comment · Fixed by #1846
Labels: bug (Something isn't working)

bdqfork commented Dec 30, 2023

LocalAI version:
quay.io/go-skynet/local-ai:v2.3.0-cublas-cuda12-ffmpeg-core

Environment, CPU architecture, OS, and Version:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2    |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID RTX6000-24Q               On  | 00000000:06:10.0 Off |                  N/A |
| N/A   N/A    P8              N/A /  N/A |      0MiB / 21504MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Describe the bug
ERROR: ggml-cuda was compiled without support for the current GPU architecture.
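
For context on where this message comes from: ggml-cuda compiles architecture-specialized kernels, and when the code that actually runs was built for a `__CUDA_ARCH__` below a kernel's guard threshold, execution falls through to an error stub that prints a line like this. A minimal sketch of that pattern (illustrative only, assuming a guard of the shape ggml uses, not its verbatim source):

```cuda
// Illustrative sketch of an architecture-guarded kernel with an error stub.
// If the code that actually runs was compiled for a __CUDA_ARCH__ below the
// guard's threshold (e.g. only compute_60 PTX, JIT-compiled on this 7.5
// device), the #else branch is what executes -- hence the repeated errors
// in the log below.
#include <cstdio>
#include <cuda_runtime.h>

static __device__ void bad_arch() {
    printf("ERROR: compiled without support for the current GPU architecture.\n");
    __trap();  // abort the kernel
}

__global__ void arch_guarded_kernel() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 750
    // fast path using Turing (compute 7.5) features would live here
#else
    bad_arch();
#endif
}

int main() {
    arch_guarded_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```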

To Reproduce

Expected behavior

Logs
Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:whisper/ggml-large-v3-q5_0.bin ContextSize:0 Seed:0 NBatch:0 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:0 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/whisper/ggml-large-v3-q5_0.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_from_file_with_params_no_state: loading model from '/models/whisper/ggml-large-v3-q5_0.bin'
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: loading model
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_vocab = 51866
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_ctx = 1500
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_state = 1280
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_head = 20
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_layer = 32
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_ctx = 448
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_state = 1280
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_head = 20
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_layer = 32
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_mels = 128
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: ftype = 8
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: qntvr = 2
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: type = 5 (large v3)
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: adding 1609 extra tokens
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_langs = 100
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: found 1 CUDA devices:
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr Device 0: GRID RTX6000-24Q, compute capability 7.5
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_backend_init: using CUDA backend
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: CUDA buffer size = 1080.97 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: model size = 1080.47 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_backend_init: using CUDA backend
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: kv self size = 220.20 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: kv cross size = 245.76 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (conv) = 32.42 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (encode) = 212.42 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (cross) = 9.38 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (decode) = 99.24 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
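
Note that the log confirms the driver sees the card as compute capability 7.5 (Turing), so the ggml-cuda build inside the image needs sm_75 code or PTX for a compatible virtual architecture. A standalone way to check what the CUDA runtime reports (a hypothetical helper for diagnosis, not part of LocalAI):

```cuda
// Standalone probe: print each device's compute capability so you know which
// architecture the ggml-cuda build has to include. For the GPU in this issue
// it should print "compute capability 7.5", matching the log line above.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```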

Additional context

bdqfork added the bug label on Dec 30, 2023
devwearsprada commented Mar 10, 2024

Having the same "ggml-cuda was compiled without support for the current GPU architecture" error on v2.9.0. Did you manage to fix this or find a workaround? I am using the ggml-tiny.en-q5_1.bin model.
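
One way to narrow this down (a suggestion under the assumption that the image's kernels were built for the wrong architectures, not a confirmed LocalAI fix): probe which `__CUDA_ARCH__` the kernels were actually compiled for, and if it never matches the device, rebuild with the device's architecture in the list (for example, CMake's `CMAKE_CUDA_ARCHITECTURES=75`):

```cuda
// Probe which __CUDA_ARCH__ a kernel was actually compiled for. Built with
// e.g. `nvcc -gencode arch=compute_60,code=compute_60 probe.cu`, the embedded
// PTX is JIT-compiled on a 7.5 GPU but still reports 600 -- the mismatch that
// makes arch-guarded kernels fall back to their error stub.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void arch_probe() {
#ifdef __CUDA_ARCH__
    printf("kernel compiled for __CUDA_ARCH__ = %d\n", __CUDA_ARCH__);
#endif
}

int main() {
    arch_probe<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```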

mudler linked a pull request on Mar 18, 2024 that will close this issue.