Eval bug: [Mac] LoRA inference crashes with GGML_ASSERT graph size error (works on Windows) #16475

@raja-gamevrs

Description

Name and Version

llama-cli version 6710 (74b8fc17)

Also tested with:

  • Official xcframework (latest release from GitHub)
  • Homebrew llama.cpp version 6710 and the last 2 builds

Both exhibit the same crash.

Operating systems

Mac

GGML backends

Metal, CPU

Hardware

Apple M2 Ultra

Models

Base Model:

LoRA Adapter:

  • Name: Hermes-3-Llama-3.2-3B_adapter
  • Size: 93 MB
  • Format: GGUF
  • Trained and tested on Windows (works fine with llama-cli there)

Problem description & steps to reproduce

LoRA adapter inference crashes on Mac with a graph size assertion error, but works on Windows. Base model inference and LoRA adapter loading both succeed on Mac; only inference with an active LoRA adapter crashes.

STEPS TO REPRODUCE:

  1. Test base model (this works):
     llama-cli -m Hermes-3-Llama-3.2-3B_q4_0.gguf -n 20 -ngl 99 -c 2048 -b 256
     Result: Works perfectly, generates text

  2. Test with LoRA adapter (this crashes):
     llama-cli -m Hermes-3-Llama-3.2-3B_q4_0.gguf --lora gandalf_Hermes-3-Llama-3.2-3B_adapter.gguf -n 20 -ngl 99 -c 2048 -b 256
     Result: Crashes immediately with GGML_ASSERT
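
For intuition on why the LoRA graph can outgrow a scheduler sized for the base model, here is a hedged back-of-envelope count. The layer and projection counts below are illustrative assumptions, not values measured from this model: each adapted weight roughly turns one matmul `W @ x` into `W @ x + s * (B @ (A @ x))`, i.e. about three extra graph ops per adapted tensor.

```python
# Hedged back-of-envelope: extra graph nodes a LoRA adapter can add.
# All counts below are illustrative assumptions, not measured values.

layers = 28               # transformer blocks in Llama-3.2-3B (approx.)
adapted_per_layer = 7     # q/k/v/o + gate/up/down projections, if all adapted
extra_ops_per_weight = 3  # A matmul, B matmul, scaled add: W@x + s*(B@(A@x))

extra_nodes = layers * adapted_per_layer * extra_ops_per_weight
print(extra_nodes)  # -> 588 extra nodes the scheduler must have room for
```

Whatever the exact figures, the LoRA graph is strictly larger than the base graph, which matches the base-works/LoRA-crashes symptom.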

Workarounds attempted (all failed):

  • Reduced context: -c 512, -c 1024
  • Reduced batch: -b 128
  • CPU only: -ngl 0
  • Limited threads: -t 4
  • Different prompts
  • Latest xcframework (just updated)
  • Homebrew build (v6710)

Results across environments:

  • Windows: Base + LoRA works perfectly
  • Mac (xcframework): Base works, LoRA crashes
  • Mac (Homebrew): Base works, LoRA crashes

First Bad Commit

Unable to determine; the issue is present in the latest build and the previous 2 versions tested.

Relevant log output

Command executed:
llama-cli -m models/hermes3/base/Hermes-3-Llama-3.2-3B_q4_0.gguf --lora models/hermes3/adapters/Hermes-3-Llama-3.2-3B_adapter.gguf -p "Tell me about wizards." -n 20 -ngl 99 -c 2048 -b 256

Output:
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name:   Apple M2 Ultra
ggml_metal_device_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 115448.73 MB
build: 6710 (74b8fc17) with Apple clang version 17.0.0 (clang-1700.3.19.1) for arm64-apple-darwin25.0.0
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device Metal (Apple M2 Ultra) (unknown id) - 110100 MiB free
[... model loading succeeds ...]
llama_lora_adapter_init_impl: applying lora adapter from 'models/hermes3/adapters/gandalf_Hermes-3-Llama-3.2-3B_adapter.gguf'
[... LoRA loading succeeds ...]

Process 40788 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x000000018d35142c libsystem_kernel.dylib`__wait4 + 8

Stack trace:
frame #0: libsystem_kernel.dylib`__wait4 + 8
frame #1: libggml-base.dylib`ggml_abort + 156
frame #2: libggml-base.dylib`ggml_backend_sched_alloc_graph + 464
frame #3: libllama.dylib`llama_context::process_ubatch + 516
frame #4: libllama.dylib`llama_context::decode + 1148
frame #5: libllama.dylib`llama_decode + 20
frame #6: llama-cli`common_init_from_params + 2168
frame #7: llama-cli`main + 636

Error message:
/Users/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-backend.cpp:1718: GGML_ASSERT((int)sched->hash_set.size >= graph->n_nodes + graph->n_leafs) failed

Exit: signal SIGABRT (abort)

Metadata

Labels: bug