invalid configuration argument #1732
Comments
Same here.
Same here. It seems to happen only when splitting the load across two GPUs. If I use the …
Same here with main.exe and server.
This issue seems to occur only on Windows systems with multiple graphics cards.
Still happening on the latest build, 0bf7cf1.
Seems to be fixed, at least as of 303f580.
Getting this error on Linux after compiling with cuBLAS.
@JoseConseco funny enough, it was the exact same model too.
Yes, this is a problem with the model, not with llama.cpp, so it is not related to the issue in this thread.
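Several of the comments above point to multi-GPU Windows setups. For anyone trying to narrow that down, here is a small standalone probe, independent of llama.cpp, that uses only the CUDA runtime API; it lists each visible device together with the launch-configuration limits relevant to this class of error (the device names and limits are whatever your system reports):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("found %d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Kernel launches whose block/grid sizes exceed these limits (or are zero)
        // fail with cudaErrorInvalidConfiguration ("invalid configuration argument").
        printf("Device %d: %s\n", i, prop.name);
        printf("  maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
        printf("  maxGridSize: %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    }
    return 0;
}

Building it is just nvcc probe.cu -o probe. Restricting the process to a single card with the standard CUDA_VISIBLE_DEVICES environment variable (e.g. CUDA_VISIBLE_DEVICES=0) is also a quick way to check whether the error really only appears when the load is split across devices.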
E:\tools\llama>main.exe -m ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin -ngl 32
main: build = 632 (35a8491)
main: seed = 1686234538
ggml_init_cublas: found 4 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti
Device 1: NVIDIA GeForce RTX 2080 Ti
Device 2: NVIDIA GeForce RTX 2080 Ti
Device 3: NVIDIA GeForce RTX 2080 Ti
llama.cpp: loading model from ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 2080 Ti) as main device
llama_model_load_internal: mem required = 3756.23 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: total VRAM used: 6564 MB
...............................................................................
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 24 / 48 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
CUDA error 9 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
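For context, CUDA error 9 is cudaErrorInvalidConfiguration, whose message is exactly "invalid configuration argument": the runtime returns it when a kernel is launched with grid or block dimensions that are zero or exceed the device's limits. One plausible way a multi-GPU split could hit this is if one device's slice of the work rounds down to nothing, yielding a zero-sized launch, though that is speculation about the cause here. The following is a minimal standalone sketch that triggers the same message; it is not llama.cpp's actual kernel or launch code:

#include <cstdio>
#include <cuda_runtime.h>

// The kernel body is irrelevant here; only the launch configuration matters.
__global__ void noop() {}

int main() {
    // Deliberately invalid launch: 2048 threads per block exceeds the
    // 1024-thread-per-block limit of current NVIDIA GPUs, so the runtime
    // rejects the launch. A grid dimension of 0 fails the same way.
    noop<<<1, 2048>>>();

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        // Prints: CUDA error 9: invalid configuration argument
        fprintf(stderr, "CUDA error %d: %s\n", (int)err, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

The error-checking macro in ggml-cuda.cu reports failures in this same style, which is why the log above shows the error number, the file, and the line of the failing launch.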