Description
Name and Version
version: 6745 (a31cf36)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA H100
Models
Meta-Llama-3.2-3B-Instruct.gguf with a fine-tuned LoRA adapter
Problem description & steps to reproduce
I've tested several recent versions of llama.cpp, including:
- b6745
- b6739
- b6715
- b6710
- b6708
- b6672
However, I consistently encounter the same error:
llama.cpp/ggml/src/ggml-backend.cpp:1718: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs) failed
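For context, the assert fires in ggml_backend_sched_alloc_graph: the scheduler's hash set must be at least as large as the graph's node count plus leaf count. My (unverified) reading is that the scheduler is sized for the base model's graph, and the extra ops a LoRA adapter adds push the graph past that reserved size. The sketch below only models that invariant; FakeSched, FakeGraph, and alloc_graph are hypothetical stand-ins, not the real ggml internals, and the node counts are made up:

#include <cassert>
#include <cstdio>

// Hypothetical model of the failing invariant, assuming the scheduler's
// hash set is sized once at creation and never regrown.
struct FakeSched { int hash_set_size; };
struct FakeGraph { int n_nodes; int n_leafs; };

void alloc_graph(const FakeSched &sched, const FakeGraph &graph) {
    // Mirrors: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs)
    assert(sched.hash_set_size >= graph.n_nodes + graph.n_leafs && "graph outgrew the scheduler");
}

int main() {
    FakeSched sched { /*hash_set_size=*/2048 };            // reserved for the base graph
    FakeGraph base  { /*n_nodes=*/1500, /*n_leafs=*/400 };
    alloc_graph(sched, base);                              // fine: 1900 <= 2048
    printf("base graph fits\n");

    // A LoRA adapter adds extra ops per adapted weight (the low-rank
    // mat-muls plus an add), so n_nodes grows with the same scheduler size.
    FakeGraph with_lora { /*n_nodes=*/1800, /*n_leafs=*/400 };
    alloc_graph(sched, with_lora);                         // aborts: 2200 > 2048
    return 0;
}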
The CLI command I'm running is:
build/bin/llama-cli -m model/Meta-Llama-3.2-3B-Instruct.gguf --lora model/TEST_3B_1.1.20_192-NB_lora.gguf -sys "You are a help assistant"
If I run the base model without the LoRA adapter, everything works fine:
build/bin/llama-cli -m model/Meta-Llama-3.2-3B-Instruct.gguf -sys "You are a help assistant"
But as soon as I include the LoRA adapter, the same error occurs.
Any idea what might be causing this or how to resolve it?
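In case it helps narrow things down, here is a minimal standalone reproducer sketch using the public llama.h API instead of llama-cli. The function names follow recent llama.cpp headers (llama_adapter_lora_init, llama_set_adapter_lora, llama_init_from_model) and may differ slightly between builds; the paths are the ones from this report. It is untested against the failing build, but it should follow the same code path the backtrace shows (warmup decode -> ggml_backend_sched_alloc_graph):

#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(
        "model/Meta-Llama-3.2-3B-Instruct.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // Loading the adapter alone succeeds (the log reports 392 tensors loaded).
    llama_adapter_lora * adapter = llama_adapter_lora_init(
        model, "model/TEST_3B_1.1.20_192-NB_lora.gguf");
    if (!adapter) { fprintf(stderr, "failed to load adapter\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // Attaching the adapter and then decoding is where llama-cli aborts.
    llama_set_adapter_lora(ctx, adapter, 1.0f);

    const llama_vocab * vocab = llama_model_get_vocab(model);
    llama_token bos = llama_vocab_bos(vocab);
    llama_batch batch = llama_batch_get_one(&bos, 1);
    llama_decode(ctx, batch);  // expected to hit the GGML_ASSERT with the adapter set

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}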
First Bad Commit
No response
Relevant log output
llama_adapter_lora_init_impl: loading lora adapter from 'model/TEST_3B_1.1.20_192-NB_lora.gguf' ...
llama_adapter_lora_init_impl: Dumping metadata keys/values.
llama_adapter_lora_init_impl: - kv 0: general.architecture str = llama
llama_adapter_lora_init_impl: - kv 1: general.type str = adapter
llama_adapter_lora_init_impl: - kv 2: adapter.type str = lora
llama_adapter_lora_init_impl: - kv 3: general.name str = Checkpoint 55
llama_adapter_lora_init_impl: - kv 4: general.version str = 55
llama_adapter_lora_init_impl: - kv 5: general.basename str = checkpoint
llama_adapter_lora_init_impl: - kv 6: general.base_model.count u32 = 1
llama_adapter_lora_init_impl: - kv 7: general.base_model.0.name str = Meta Llama 3.2 3B Instruct
llama_adapter_lora_init_impl: - kv 8: adapter.lora.alpha f32 = 32.000000
llama_adapter_lora_init_impl: - kv 9: general.quantization_version u32 = 2
llama_adapter_lora_init_impl: CUDA0 LoRA buffer size = 185.50 MiB
llama_adapter_lora_init_impl: loaded 392 tensors from lora file
common_init_from_params: added <|end_of_text|> logit bias = -inf
common_init_from_params: added <|eom_id|> logit bias = -inf
common_init_from_params: added <|eot_id|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
/root/llama.cpp_gpu_b6708/llama.cpp/ggml/src/ggml-backend.cpp:1718: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs) failed
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(+0x183cb)[0x7f153224f3cb]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(ggml_print_backtrace+0x21f)[0x7f153224f82f]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(ggml_abort+0x152)[0x7f153224fa02]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libggml-base.so(ggml_backend_sched_alloc_graph+0x1c3)[0x7f153226a113]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libllama.so(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xf3)[0x7f15323897f3]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libllama.so(_ZN13llama_context6decodeERK11llama_batch+0x2e7)[0x7f15323900b7]
/root/llama.cpp_gpu_b6708/llama.cpp/build/bin/libllama.so(llama_decode+0x10)[0x7f1532390f50]
build/bin/llama-cli(+0x158935)[0x558cbaa5b935]
build/bin/llama-cli(+0x442a3)[0x558cba9472a3]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f1531c55d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f1531c55e40]
build/bin/llama-cli(+0x49c15)[0x558cba94cc15]
Aborted (core dumped)