
Bug: ggml_backend_metal_buffer_type_alloc_buffer: error: failed to allocate buffer #9460

Description

@sunnsi

What happened?

  1. Build with
cmake .. -DGGML_METAL=on -DGGML_RPC=ON
cmake --build . --config Release
  2. Logs of rpc-server
(base) ➜  build-rpc ./bin/rpc-server -p 50051 -H 0.0.0.0 -m 154000

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
         Never expose the RPC server to an open network!
         This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

create_backend: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Ultra
ggml_metal_init: picking default device: Apple M2 Ultra
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M2 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 154618.82 MB
Starting RPC server on 0.0.0.0:50051, backend memory: 154000 MB
  3. Logs of llama-cli
(base) ➜  build-rpc ~/Softwares/llama.cpp-b3720/build-rpc/bin/llama-cli -m ~/Softwares/llm-models/NousResearch/Hermes-3-Llama-3.1-8B-GGUF/Hermes-3-Llama-3.1-8B.Q8_0.gguf -p "Hello ,my name is" --repeat-penalty 1.0 -n 64 --rpc 169.254.151.33:50051 -ngl 99

...
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size =    0.27 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:        CPU buffer size =   532.31 MiB
llm_load_tensors: RPC[169.254.151.33:50051] buffer size =  7605.34 MiB
.........................................................................................
llama_new_context_with_model: n_ctx      = 131072
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Ultra
ggml_metal_init: picking default device: Apple M2 Ultra
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M2 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 154618.82 MB
llama_kv_cache_init: RPC[169.254.151.33:50051] KV buffer size = 16384.00 MiB
llama_new_context_with_model: KV self size  = 16384.00 MiB, K (f16): 8192.00 MiB, V (f16): 8192.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
ggml_backend_metal_buffer_type_alloc_buffer: error: failed to allocate buffer, size =     0.00 MiB
ggml_gallocr_reserve_n: failed to allocate Metal buffer of size 0
llama_new_context_with_model: failed to allocate compute buffers
ggml_metal_free: deallocating
llama_init_from_gpt_params: error: failed to create context with model '/Users/mac527a/Softwares/llm-models/NousResearch/Hermes-3-Llama-3.1-8B-GGUF/Hermes-3-Llama-3.1-8B.Q8_0.gguf'
main: error: unable to load model
  4. Further information:
    When I remove '--rpc 169.254.151.33:50051' or set '-ngl 0', llama-cli runs correctly and produces the generated text (see the sketches below).
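
For reference, the 16384.00 MiB KV buffer in the logs is exactly what the default n_ctx of 131072 implies, so the context size (not the RPC backend) sets that number. A minimal sketch of the arithmetic, assuming the published Llama-3.1-8B attention shape (32 layers, 8 KV heads of dimension 128, f16 elements); these parameters come from the model architecture, not from the logs:

```swift
// KV cache bytes = 2 tensors (K and V) * layers * context length
//                  * KV heads * head dim * bytes per f16 element.
// Assumed Llama-3.1-8B shape; not read from the GGUF file itself.
let layers = 32, nCtx = 131_072, kvHeads = 8, headDim = 128, f16Bytes = 2
let perTensorBytes = layers * nCtx * kvHeads * headDim * f16Bytes
let totalMiB = 2 * perTensorBytes / (1 << 20)
print("KV cache: \(totalMiB) MiB")  // prints 16384, matching the log line
```

Passing a smaller context (e.g. '-c 8192') shrinks this buffer proportionally, which keeps RPC experiments lighter.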
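
The failing allocation itself is for a zero-byte buffer: with '-ngl 99' and a single RPC endpoint, every offloaded layer lands on the remote backend, so the local Metal compute buffer has nothing left to hold. A minimal sketch of the suspected trigger, assuming (unverified) that Metal rejects zero-length buffers the way the 'size = 0.00 MiB' log line suggests; this is a hypothesis to help triage, not a confirmed root cause:

```swift
import Metal

// Suspected trigger: ggml requests a 0-byte local Metal compute buffer once
// all layers are offloaded to the RPC backend. makeBuffer(length:options:)
// fails for a zero length and returns nil, which ggml would then report as
// "failed to allocate buffer". (Assumption based on the log, not confirmed.)
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("no Metal device found")
}
let buf = device.makeBuffer(length: 0, options: .storageModeShared)
print("zero-length buffer:", buf == nil ? "nil -> allocation fails" : "ok")
```

If that is the path, skipping the allocation when the requested size is 0 (or clamping it to a minimal non-zero size) on the ggml side would avoid the error, which would also explain why '-ngl 0' runs cleanly.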

Name and Version

(base) ➜ build-rpc ./bin/llama-cli --version
version: 0 (unknown)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

llama-cli was built from the llama.cpp-b3720 source code, which likely explains why the version string reads "0 (unknown)" (no git metadata in a release tarball).

What operating system are you seeing the problem on?

Mac

Relevant log output

No response
