Bug
Labels
bug (Something isn't working) · low severity (Used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches) · server/webui
Description
The /v1/chat/completions response HTTP headers are duplicated when I run llama-server in multi-model mode (with the --models-dir parameter). For the following request:
POST /v1/chat/completions HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:140.0) Gecko/20100101 Firefox/140.0
Accept: */*
Accept-Language: en-US,en;q=0.7,cs;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://127.0.0.1:8080/
Content-Type: application/json
Content-Length: 522
Origin: http://127.0.0.1:8080
DNT: 1
Connection: keep-alive
Cookie: sidebar:state=true
Priority: u=4
{"messages":[{"role":"user","content":"test"}],"stream":true,"model":"Qwen3-0.6B-Q8_0","reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["top_k","typ_p","top_p","min_p","temperature"],"timings_per_token":true}
the server returns:
HTTP/1.1 200 OK
Server: llama.cpp
Server: llama.cpp
Access-Control-Allow-Origin: http://127.0.0.1:8080
Access-Control-Allow-Origin: http://127.0.0.1:8080
Content-Type: application/json; charset=utf-8
Content-Type: text/event-stream
Transfer-Encoding: chunked
Transfer-Encoding: chunked
Keep-Alive: timeout=5, max=100
Keep-Alive: timeout=5, max=100
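For reference, the duplication can also be confirmed without a browser. The following is a minimal repro sketch in Python (not part of the original report; the host, port, model name, and payload are taken from the request above, while the script itself and the shortened max_tokens are assumptions) that prints the raw response header lines, where the repeated Server, Access-Control-Allow-Origin, Transfer-Encoding, and Keep-Alive entries show up:
# repro_duplicate_headers.py - minimal sketch, assumes llama-server is running
# on 127.0.0.1:8080 in multi-model mode (--models-dir) with Qwen3-0.6B-Q8_0 available.
import http.client
import json

payload = json.dumps({
    "model": "Qwen3-0.6B-Q8_0",
    "messages": [{"role": "user", "content": "test"}],
    "stream": True,
    "max_tokens": 16,
})

conn = http.client.HTTPConnection("127.0.0.1", 8080)
conn.request("POST", "/v1/chat/completions", body=payload,
             headers={"Content-Type": "application/json"})
resp = conn.getresponse()

# getheaders() preserves repeated header lines instead of merging them,
# so any duplicated headers are printed as separate entries.
print(resp.status, resp.reason)
for name, value in resp.getheaders():
    print(f"{name}: {value}")

conn.close()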
Everything seems to work if I connect directly; however, it becomes problematic when a proxy server (nginx in my case) is used:
2025/12/02 14:42:25 [error] 21668#0: *2408 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 172.0.0.1, server: llm.home, request: "POST /v1/chat/completions HTTP/1.1", upstream: "http://127.0.0.1:8080/v1/chat/completions", host: "llm.home", referrer: "http://llm.home/"
Similarly, the llm tool does not work either, failing with the following error:
Error: Connection error.
The issue is related to PR #17470.