CLBlast build failing on q3 model #1725

Closed
TheTerrasque opened this issue Jun 6, 2023 · 2 comments


TheTerrasque commented Jun 6, 2023

When trying to run wizardlm-30b.ggmlv3.q3_K_M.bin from https://huggingface.co/TheBloke/WizardLM-30B-GGML with the CLBlast build, it fails with GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-opencl.cpp:1009: to_fp32_cl != nullptr

PS H:\Files\Downloads\llama-master-2d7bf11-bin-win-clblast-x64> .\main.exe -m C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin -ngl 20
main: build = 631 (2d7bf11)
main: seed  = 1686095068
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3080'
ggml_opencl: device FP16 support: false
llama.cpp: loading model from C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 12 (mostly Q3_K - Medium)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 12303.88 MB (+ 3124.00 MB per state)
llama_model_load_internal: offloading 20 layers to GPU
llama_model_load_internal: total VRAM used: 4913 MB
..................................
llama_init_from_file: kv self size  =  780.00 MB

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-opencl.cpp:1009: to_fp32_cl != nullptr

PS H:\Files\Downloads\llama-master-2d7bf11-bin-win-clblast-x64> certutil -hashfile C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin SHA256
SHA256 hash of C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin:
65e3770689b388c50bf39406484cd5755854b57d57d802380bedfb4d31a63e8b
CertUtil: -hashfile command completed successfully.

Running without GPU layers works
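
For context on what the assertion points at: a minimal sketch of the failure pattern, assuming the OpenCL path looks up a dequantize-to-fp32 routine per tensor type and the newer k-quant formats (Q3_K, Q4_K, ...) have no such routine yet. This is not llama.cpp's actual OpenCL code; all names below are hypothetical.

#include <cassert>

// Hypothetical subset of quantization types; names are illustrative only.
enum class tensor_type { F16, Q4_0, Q4_1, Q3_K, Q4_K };

// Hypothetical signature of a "dequantize block to fp32" routine.
using to_fp32_fn = void (*)(const void * src, float * dst, int n);

static void dequantize_q4_0(const void *, float *, int) { /* ... */ }
static void dequantize_q4_1(const void *, float *, int) { /* ... */ }

// Lookup of the dequantization routine for a given type; types without an
// OpenCL implementation fall through to nullptr.
static to_fp32_fn get_to_fp32(tensor_type type) {
    switch (type) {
        case tensor_type::Q4_0: return dequantize_q4_0;
        case tensor_type::Q4_1: return dequantize_q4_1;
        default:                return nullptr; // e.g. Q3_K / Q4_K not handled
    }
}

int main() {
    // Offloading a k-quant tensor selects no kernel, so the assert fires,
    // mirroring "GGML_ASSERT: ... to_fp32_cl != nullptr" in the log above.
    to_fp32_fn to_fp32_cl = get_to_fp32(tensor_type::Q3_K);
    assert(to_fp32_cl != nullptr);
    return 0;
}

If that is what is happening, the workaround until the OpenCL backend gains k-quant kernels would be to keep all layers on the CPU (omit -ngl, as noted above) or, assuming the older formats are the ones with OpenCL support, to use a non-k-quant file such as a q4_0 variant.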

arch-btw commented Jun 7, 2023

Same issue on guanaco-7B.ggmlv3.q4_K_S.bin and guanaco-7B.ggmlv3.q4_K_M.bin

Indeed, running without the GPU works.

This issue was closed because it has been inactive for 14 days since being marked as stale.
