invalid configuration argument #1732
Comments
Same here.
Same here. It seems to happen only when splitting the load across two GPUs. If I use the …
Same here with main.exe and server.
This issue seems to occur only on Windows systems with multiple graphics cards.
Still happening on the latest build, 0bf7cf1.
Seems to be fixed, at least as of 303f580.
Getting this error on Linux after compiling with cuBLAS.
@JoseConseco funny enough, it was the exact same model too.
Yes, this is a problem with the model, not with llama.cpp, so it is not related to the issue in this thread.
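Several of the comments above point to multi-GPU Windows setups. For anyone trying to narrow that down, here is a small standalone probe, independent of llama.cpp, that uses only the CUDA runtime API; it lists each visible device together with the launch-configuration limits relevant to this class of error (the device names and limits are whatever your system reports):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("found %d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Kernel launches whose block/grid sizes exceed these limits (or are zero)
        // fail with cudaErrorInvalidConfiguration ("invalid configuration argument").
        printf("Device %d: %s\n", i, prop.name);
        printf("  maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
        printf("  maxGridSize: %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    }
    return 0;
}

Building it is just nvcc probe.cu -o probe. Restricting the process to a single card with the standard CUDA_VISIBLE_DEVICES environment variable (e.g. CUDA_VISIBLE_DEVICES=0) is also a quick way to check whether the error really only appears when the load is split across devices.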
E:\tools\llama>main.exe -m ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin -ngl 32
main: build = 632 (35a8491)
main: seed = 1686234538
ggml_init_cublas: found 4 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti
Device 1: NVIDIA GeForce RTX 2080 Ti
Device 2: NVIDIA GeForce RTX 2080 Ti
Device 3: NVIDIA GeForce RTX 2080 Ti
llama.cpp: loading model from ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 2080 Ti) as main device
llama_model_load_internal: mem required = 3756.23 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: total VRAM used: 6564 MB
...............................................................................
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 24 / 48 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
CUDA error 9 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
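For context, CUDA error 9 is cudaErrorInvalidConfiguration, whose message is exactly "invalid configuration argument": the runtime returns it when a kernel is launched with grid or block dimensions that are zero or exceed the device's limits. One plausible way a multi-GPU split could hit this is if one device's slice of the work rounds down to nothing, yielding a zero-sized launch, though that is speculation about the cause here. The following is a minimal standalone sketch that triggers the same message; it is not llama.cpp's actual kernel or launch code:

#include <cstdio>
#include <cuda_runtime.h>

// The kernel body is irrelevant here; only the launch configuration matters.
__global__ void noop() {}

int main() {
    // Deliberately invalid launch: 2048 threads per block exceeds the
    // 1024-thread-per-block limit of current NVIDIA GPUs, so the runtime
    // rejects the launch. A grid dimension of 0 fails the same way.
    noop<<<1, 2048>>>();

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        // Prints: CUDA error 9: invalid configuration argument
        fprintf(stderr, "CUDA error %d: %s\n", (int)err, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

The error-checking macro in ggml-cuda.cu reports failures in this same style, which is why the log above shows the error number, the file, and the line of the failing launch.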