Closed
Labels: bug-unconfirmed, medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features that are still usable)
Description
What happened?
- Build with
cmake .. -DGGML_METAL=on -DGGML_RPC=ON
cmake --build . --config Release
- Logs of rpc-server
(base) ➜ build-rpc ./bin/rpc-server -p 50051 -H 0.0.0.0 -m 154000
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Host ('0.0.0.0') is != '127.0.0.1'
Never expose the RPC server to an open network!
This is an experimental feature and is not secure!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
create_backend: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Ultra
ggml_metal_init: picking default device: Apple M2 Ultra
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M2 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 154618.82 MB
Starting RPC server on 0.0.0.0:50051, backend memory: 154000 MB
- Logs of llama-cli
(base) ➜ build-rpc ~/Softwares/llama.cpp-b3720/build-rpc/bin/llama-cli -m ~/Softwares/llm-models/NousResearch/Hermes-3-Llama-3.1-8B-GGUF/Hermes-3-Llama-3.1-8B.Q8_0.gguf -p "Hello ,my name is" --repeat-penalty 1.0 -n 64 --rpc 169.254.151.33:50051 -ngl 99
...
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.27 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 532.31 MiB
llm_load_tensors: RPC[169.254.151.33:50051] buffer size = 7605.34 MiB
.........................................................................................
llama_new_context_with_model: n_ctx = 131072
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Ultra
ggml_metal_init: picking default device: Apple M2 Ultra
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M2 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 154618.82 MB
llama_kv_cache_init: RPC[169.254.151.33:50051] KV buffer size = 16384.00 MiB
llama_new_context_with_model: KV self size = 16384.00 MiB, K (f16): 8192.00 MiB, V (f16): 8192.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
ggml_backend_metal_buffer_type_alloc_buffer: error: failed to allocate buffer, size = 0.00 MiB
ggml_gallocr_reserve_n: failed to allocate Metal buffer of size 0
llama_new_context_with_model: failed to allocate compute buffers
ggml_metal_free: deallocating
llama_init_from_gpt_params: error: failed to create context with model '/Users/mac527a/Softwares/llm-models/NousResearch/Hermes-3-Llama-3.1-8B-GGUF/Hermes-3-Llama-3.1-8B.Q8_0.gguf'
main: error: unable to load model
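As a sanity check on the log above, the reported KV cache size is consistent with the context length: assuming the standard Llama-3.1-8B attention shape (32 layers, 8 KV heads, head dimension 128, f16 cache; these values are assumed, not read from the GGUF), a back-of-the-envelope calculation reproduces the 16384 MiB figure for n_ctx = 131072:

```cpp
#include <cstdio>

// Back-of-the-envelope check of the log line
// "KV self size = 16384.00 MiB" at n_ctx = 131072.
// Assumed model shape (standard Llama-3.1-8B, not read from the GGUF):
// 32 layers, 8 KV heads, head dim 128, f16 cache (2 bytes/element).
int main() {
    const long long n_ctx     = 131072;
    const long long n_layer   = 32;
    const long long n_head_kv = 8;
    const long long head_dim  = 128;
    const long long bytes_f16 = 2;

    // K and V each store n_head_kv * head_dim elements per token per layer.
    const long long kv_bytes = 2 /* K + V */ * n_ctx * n_layer
                             * n_head_kv * head_dim * bytes_f16;

    std::printf("KV self size = %.2f MiB\n", kv_bytes / (1024.0 * 1024.0));
    // Prints: KV self size = 16384.00 MiB
    return 0;
}
```

So the KV cache itself is sized as expected; the failure happens afterwards, when the compute buffers are reserved.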
- Further information:
If I drop '--rpc 169.254.151.33:50051' or set '-ngl 0', llama-cli runs correctly and produces the generated text.
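For what it's worth, the failure pattern (all 33 layers offloaded to the RPC backend, then a "size = 0.00 MiB" Metal compute-buffer allocation failing) suggests the local Metal backend is asked to reserve a zero-byte buffer and treats that as an error rather than a no-op. A hypothetical stand-alone sketch of that suspected guard logic, where alloc_compute_buffer is an invented name and this is not the actual ggml code:

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a backend compute-buffer allocation; the real
// calls in the log are ggml_backend_metal_buffer_type_alloc_buffer() and
// ggml_gallocr_reserve_n(). This only illustrates the suspected zero-size
// failure mode; it is not llama.cpp code.
static void * alloc_compute_buffer(size_t size) {
    if (size == 0) {
        // An allocator that rejects zero-byte requests outright would
        // produce exactly the "size = 0.00 MiB" error shown above.
        return nullptr;
    }
    return std::malloc(size);
}

int main() {
    // With every layer living on the RPC backend, the locally computed
    // Metal compute-buffer size can come out as 0 bytes.
    void * buf = alloc_compute_buffer(0);
    if (buf == nullptr) {
        std::fprintf(stderr, "failed to allocate compute buffer of size 0\n");
        return 1;
    }
    std::free(buf);
    return 0;
}
```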
Name and Version
(base) ➜ build-rpc ./bin/llama-cli --version
version: 0 (unknown)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0
llama-cli was built from the llama.cpp b3720 source code (a source archive without git metadata, which is likely why the version string reads '0 (unknown)').
What operating system are you seeing the problem on?
Mac
Relevant log output
No response