Environment, CPU architecture, OS, and Version:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 GRID RTX6000-24Q On | 00000000:06:10.0 Off | N/A |
| N/A N/A P8 N/A / N/A | 0MiB / 21504MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Describe the bug
ERROR: ggml-cuda was compiled without support for the current GPU architecture.
To Reproduce
Expected behavior
Logs
Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:} sizeCache:0 unknownFields:[] Model:whisper/ggml-large-v3-q5_0.bin ContextSize:0 Seed:0 NBatch:0 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:0 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/whisper/ggml-large-v3-q5_0.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0}
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_from_file_with_params_no_state: loading model from '/models/whisper/ggml-large-v3-q5_0.bin'
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: loading model
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_vocab = 51866
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_ctx = 1500
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_state = 1280
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_head = 20
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_audio_layer = 32
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_ctx = 448
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_state = 1280
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_head = 20
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_text_layer = 32
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_mels = 128
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: ftype = 8
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: qntvr = 2
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: type = 5 (large v3)
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: adding 1609 extra tokens
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: n_langs = 100
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr ggml_init_cublas: found 1 CUDA devices:
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr Device 0: GRID RTX6000-24Q, compute capability 7.5
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_backend_init: using CUDA backend
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: CUDA buffer size = 1080.97 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_model_load: model size = 1080.47 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_backend_init: using CUDA backend
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: kv self size = 220.20 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: kv cross size = 245.76 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (conv) = 32.42 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (encode) = 212.42 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (cross) = 9.38 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stderr whisper_init_state: compute buffer (decode) = 99.24 MB
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
localai-localai-1 | 9:11AM DBG GRPC(whisper/ggml-large-v3-q5_0.bin-127.0.0.1:36495): stdout ERROR: ggml-cuda was compiled without support for the current GPU architecture.
Additional context
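For context on the failure above: ggml's CUDA backend is compiled for an explicit list of compute architectures, and it refuses to run kernels when the device's compute capability (7.5 for the GRID RTX6000-24Q, per the log) is not covered by that list. A minimal sketch of that compatibility check, as a hypothetical helper rather than actual LocalAI/ggml code:

```python
def arch_supported(compiled_archs, device_cap):
    """Return True if a binary built for `compiled_archs` (e.g. [60, 61, 70],
    i.e. sm_60/sm_61/sm_70) can run on a device with capability `device_cap`
    (e.g. 75 for compute capability 7.5).

    Simplification: without embedded PTX for JIT recompilation, a cubin only
    runs on the exact architecture it targets, so we check for membership.
    """
    return device_cap in compiled_archs

# An image built without sm_75 cannot serve this GPU:
print(arch_supported([60, 61, 70], 75))      # False -> the ERROR in the log
print(arch_supported([60, 61, 70, 75], 75))  # True  -> would work
```

This is why the model loads fine (buffer allocation succeeds) but every kernel launch afterwards prints the same ERROR line: the check happens at kernel dispatch, not at model load.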
Having the same "ggml-cuda was compiled without support for the current GPU architecture" error on v2.9.0. Did you manage to fix this or find a workaround? I am using the ggml-tiny.en-q5_1.bin model.
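One possible workaround (untested here; the build invocation is an assumption based on LocalAI's source-build instructions, not something confirmed in this thread): verify the card's compute capability, then rebuild with that architecture included via the standard CMake variable `CMAKE_CUDA_ARCHITECTURES`.

```shell
# Query the device's compute capability (supported on recent drivers,
# including the 535.x driver shown above):
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# Expected to report 7.5 for the GRID RTX6000-24Q.

# Assumed rebuild from source with sm_75 included; CMAKE_CUDA_ARCHITECTURES
# is the standard CMake variable that CUDA builds honor:
CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=75" make BUILD_TYPE=cublas build
```

If you stay on the prebuilt image instead, the fix would have to land in the image's build flags, since the architecture list is baked in at compile time.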
LocalAI version:
quay.io/go-skynet/local-ai:v2.3.0-cublas-cuda12-ffmpeg-core