Description
Name and Version
version: 6745 (a31cf36)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA H100
Models
Meta-Llama-3.2-3B-Instruct.gguf with a fine-tuned LoRA adapter
Problem description & steps to reproduce
I've tested several recent versions of llama.cpp, including:
- b6745
- b6739
- b6715
- b6710
- b6708
- b6672
However, I consistently encounter the same error:
llama.cpp/ggml/src/ggml-backend.cpp:1718: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs) failed
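For context, the assert fires in ggml_backend_sched_alloc_graph: the scheduler's hash set must be at least as large as the graph's node count plus leaf count. My (unverified) reading is that the scheduler is sized for the base model's graph, and the extra ops a LoRA adapter adds push the graph past that reserved size. The sketch below only models that invariant; FakeSched, FakeGraph, and alloc_graph are hypothetical stand-ins, not the real ggml internals, and the node counts are made up:

#include <cassert>
#include <cstdio>

// Hypothetical model of the failing invariant, assuming the scheduler's
// hash set is sized once at creation and never regrown.
struct FakeSched { int hash_set_size; };
struct FakeGraph { int n_nodes; int n_leafs; };

void alloc_graph(const FakeSched &sched, const FakeGraph &graph) {
    // Mirrors: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs)
    assert(sched.hash_set_size >= graph.n_nodes + graph.n_leafs && "graph outgrew the scheduler");
}

int main() {
    FakeSched sched { /*hash_set_size=*/2048 };            // reserved for the base graph
    FakeGraph base  { /*n_nodes=*/1500, /*n_leafs=*/400 };
    alloc_graph(sched, base);                              // fine: 1900 <= 2048
    printf("base graph fits\n");

    // A LoRA adapter adds extra ops per adapted weight (the low-rank
    // mat-muls plus an add), so n_nodes grows with the same scheduler size.
    FakeGraph with_lora { /*n_nodes=*/1800, /*n_leafs=*/400 };
    alloc_graph(sched, with_lora);                         // aborts: 2200 > 2048
    return 0;
}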
The CLI command I'm running is:
build/bin/llama-cli -m model/Meta-Llama-3.2-3B-Instruct.gguf --lora model/TEST_3B_1.1.20_192-NB_lora.gguf -sys "You are a help assistant"
If I run the base model without the LoRA adapter, everything works fine:
build/bin/llama-cli -m model/Meta-Llama-3.2-3B-Instruct.gguf -sys "You are a help assistant"
But as soon as I include the LoRA adapter, the same error occurs.
Any idea what might be causing this or how to resolve it?
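In case it helps narrow things down, here is a minimal standalone reproducer sketch using the public llama.h API instead of llama-cli. The function names follow recent llama.cpp headers (llama_adapter_lora_init, llama_set_adapter_lora, llama_init_from_model) and may differ slightly between builds; the paths are the ones from this report. It is untested against the failing build, but it should follow the same code path the backtrace shows (warmup decode -> ggml_backend_sched_alloc_graph):

#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(
        "model/Meta-Llama-3.2-3B-Instruct.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // Loading the adapter alone succeeds (the log reports 392 tensors loaded).
    llama_adapter_lora * adapter = llama_adapter_lora_init(
        model, "model/TEST_3B_1.1.20_192-NB_lora.gguf");
    if (!adapter) { fprintf(stderr, "failed to load adapter\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // Attaching the adapter and then decoding is where llama-cli aborts.
    llama_set_adapter_lora(ctx, adapter, 1.0f);

    const llama_vocab * vocab = llama_model_get_vocab(model);
    llama_token bos = llama_vocab_bos(vocab);
    llama_batch batch = llama_batch_get_one(&bos, 1);
    llama_decode(ctx, batch);  // expected to hit the GGML_ASSERT with the adapter set

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}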
First Bad Commit
No response
Relevant log output
llama_adapter_lora_init_impl: loading lora adapter from 'model/TEST_3B_1.1.20_192-NB_lora.gguf' ...
llama_adapter_lora_init_impl: Dumping metadata keys/values.
llama_adapter_lora_init_impl: - kv 0: general.architecture str = llama
llama_adapter_lora_init_impl: - kv 1: general.type str = adapter
llama_adapter_lora_init_impl: - kv 2: adapter.type str = lora
llama_adapter_lora_init_impl: - kv 3: general.name str = Checkpoint 55
llama_adapter_lora_init_impl: - kv 4: general.version str = 55
llama_adapter_lora_init_impl: - kv 5: general.basename str = checkpoint
llama_adapter_lora_init_impl: - kv 6: general.base_model.count u32 = 1
llama_adapter_lora_init_impl: - kv 7: general.base_model.0.name str = Meta Llama 3.2 3B Instruct
llama_adapter_lora_init_impl: - kv 8: adapter.lora.alpha f32 = 32.000000
llama_adapter_lora_init_impl: - kv 9: general.quantization_version u32 = 2
llama_adapter_lora_init_impl: CUDA0 LoRA buffer size = 185.50 MiB
llama_adapter_lora_init_impl: loaded 392 tensors from lora file
common_init_from_params: added <|end_of_text|> logit bias = -inf
common_init_from_params: added <|eom_id|> logit bias = -inf
common_init_from_params: added <|eot_id|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
/root/llama.cpp_gpu_b6708/llama.cpp/ggml/src/ggml-backend.cpp:1718: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs) failed
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(+0x183cb)[0x7f153224f3cb]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(ggml_print_backtrace+0x21f)[0x7f153224f82f]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(ggml_abort+0x152)[0x7f153224fa02]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(ggml_backend_sched_alloc_graph+0x1c3)[0x7f153226a113]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libllama.so(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xf3)[0x7f15323897f3]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libllama.so(_ZN13llama_context6decodeERK11llama_batch+0x2e7)[0x7f15323900b7]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libllama.so(llama_decode+0x10)[0x7f1532390f50]
build/bin/llama-cli(+0x158935)[0x558cbaa5b935]
build/bin/llama-cli(+0x442a3)[0x558cba9472a3]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f1531c55d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f1531c55e40]
build/bin/llama-cli(+0x49c15)[0x558cba94cc15]
Aborted (core dumped)