
invalid configuration argument #1732

Closed
kingminsvn opened this issue Jun 7, 2023 · 10 comments

Comments

@kingminsvn

E:\tools\llama>main.exe -m ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin -ngl 32
main: build = 632 (35a8491)
main: seed = 1686234538
ggml_init_cublas: found 4 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti
Device 1: NVIDIA GeForce RTX 2080 Ti
Device 2: NVIDIA GeForce RTX 2080 Ti
Device 3: NVIDIA GeForce RTX 2080 Ti
llama.cpp: loading model from ....\GPT_MOD\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 2080 Ti) as main device
llama_model_load_internal: mem required = 3756.23 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: total VRAM used: 6564 MB
...............................................................................
llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 24 / 48 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0

CUDA error 9 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
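For context, CUDA error 9 is cudaErrorInvalidConfiguration: the runtime rejects a kernel launch whose grid or block dimensions are invalid (zero, or above the device limits), which usually points to a launch-geometry problem rather than running out of VRAM. A minimal standalone sketch of this error class (illustrative only, not the actual kernel or check in ggml-cuda.cu):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
    // 2048 threads per block exceeds the 1024-threads-per-block limit of
    // current NVIDIA GPUs, so the runtime rejects the launch.
    noop<<<1, 2048>>>();
    cudaError_t err = cudaGetLastError();
    // Prints "error 9: invalid configuration argument", the same message
    // that the error check in ggml-cuda.cu reports above.
    printf("error %d: %s\n", (int)err, cudaGetErrorString(err));
    return 0;
}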

@Vencibo

Vencibo commented Jun 7, 2023

same here

@ThioJoe

ThioJoe commented Jun 7, 2023

Same here. It seems to happen only when splitting the load across two GPUs. If I use the -ts (tensor split) parameter to force everything onto one GPU, such as -ts 1,0 or even -ts 0,1, it works. So that's at least a workaround in the meantime, just without multi-GPU; a sketch of the modified invocation follows the log below.

>main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct -m Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin --n-gpu-layers 40
main: build = 635 (5c64a09)
main: seed  = 1686175494
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090
  Device 1: NVIDIA GeForce RTX 3090
llama.cpp: loading model from Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 4090) as main device
llama_model_load_internal: mem required  = 2380.14 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 40 layers to GPU
llama_model_load_internal: total VRAM used: 13370 MB
...................................................................................................
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Human:'
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 64, repeat_penalty = 1.200000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 CUDA error 9 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:1574: invalid configuration argument
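For reference, a sketch of the workaround invocation described above, i.e. the same command with the tensor split forced onto device 0 (illustrative, output not shown):

>main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --instruct -m Wizard-Vicuna-13B-Uncensored.ggmlv3.q8_0.bin --n-gpu-layers 40 -ts 1,0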

@goyanx

goyanx commented Jun 7, 2023

same here on main.exe and server

@kingminsvn (Author)

This issue seems to occur only on Windows systems with multiple graphics cards.

@ThioJoe

ThioJoe commented Jun 8, 2023

Still happening on latest build 0bf7cf1

@ThioJoe

ThioJoe commented Jun 10, 2023

Seems to be fixed at least as of 303f580

@dillfrescott

Getting this error on Linux after compiling with cuBLAS.
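For reference, a typical cuBLAS build of llama.cpp at the time looked roughly like this (illustrative; the exact commands and commit used here are not shown):

make clean
make LLAMA_CUBLAS=1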

@JoseConseco

@dillfrescott

@JoseConseco Funny enough, it was that exact same model too.

@JoseConseco

Yes, this is a problem with the model, not with llama.cpp, so it is not related to the issue in this thread.
