Name and Version
Operating systems
Linux
GGML backends
CPU
Hardware
Mac M4
Models
llama3.2
Problem description & steps to reproduce
When running llama-server through ramalama (which runs llama.cpp inside a container) with the argument --slot-save-path /tmp to enable the slots feature, the command curl -X POST "http://localhost:8080/slots/0?action=erase" hangs until I press Ctrl-C; only then does the response show up on the server side, and it is never received by curl. I tried issuing the request from inside the container as well to rule out networking issues, but it still hangs.
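Roughly how this is set up, as a minimal sketch outside ramalama (the model path and port are illustrative; in practice ramalama launches llama-server inside the container):

# start llama-server with the slots feature enabled, as in the report
# (model path is illustrative)
llama-server -m ./llama3.2.gguf --port 8080 --slot-save-path /tmp

# in another shell, this request hangs until curl is interrupted with Ctrl-C
curl -X POST "http://localhost:8080/slots/0?action=erase"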
My goal is to clear the prompt cache for a summarization feature: when the context size is reached, clear the cache, summarize the history, and feed the summary back in. The workaround is to specify a small client-side timeout on the request (sketched below), but this seems like a bug.
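A sketch of that workaround (the 5-second value is arbitrary):

# bound the erase request with a client-side timeout so the
# summarization flow is not blocked indefinitely
curl -X POST --max-time 5 "http://localhost:8080/slots/0?action=erase"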
ramalama latest; llama.cpp commit b52edd2
First Bad Commit
No response
Relevant log output
bmahabir@bmahabir-mac ramalama % curl -X POST "http://localhost:8080/slots/0?action=erase"
^C
bmahabir@bmahabir-mac ramalama %
srv remove_waiti: remove task 9 from waiting list. current waiting = 1 (before remove)
srv log_server_r: request: POST /slots/0 192.168.127.1 200
srv log_server_r: request:
srv log_server_r: response: {"id_slot":0,"n_erased":43}
The server log output only appears after the Ctrl-C; something is hanging inside llama-server.