
server: Doubled HTTP headers in /v1/chat/completions response in multiple models mode #17693

@arkamar

Description

The /v1/chat/completions response HTTP headers are doubled when I run llama-server in multiple models mode (with the --models-dir parameter). For the following request:

POST /v1/chat/completions HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:140.0) Gecko/20100101 Firefox/140.0
Accept: */*
Accept-Language: en-US,en;q=0.7,cs;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://127.0.0.1:8080/
Content-Type: application/json
Content-Length: 522
Origin: http://127.0.0.1:8080
DNT: 1
Connection: keep-alive
Cookie: sidebar:state=true
Priority: u=4

{"messages":[{"role":"user","content":"test"}],"stream":true,"model":"Qwen3-0.6B-Q8_0","reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["top_k","typ_p","top_p","min_p","temperature"],"timings_per_token":true}

the server returns:

HTTP/1.1 200 OK
Server: llama.cpp
Server: llama.cpp
Access-Control-Allow-Origin: http://127.0.0.1:8080
Access-Control-Allow-Origin: http://127.0.0.1:8080
Content-Type: application/json; charset=utf-8
Content-Type: text/event-stream
Transfer-Encoding: chunked
Transfer-Encoding: chunked
Keep-Alive: timeout=5, max=100
Keep-Alive: timeout=5, max=100
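
For what it's worth, the duplication is easy to confirm with a client that does not fold repeated header lines (curl -i shows them too, but Python's requests, for example, silently merges them). A minimal sketch, assuming the host, port, and model from this report:

import http.client
import json
from collections import Counter

# Host, port, and model taken from the report above; adjust as needed.
conn = http.client.HTTPConnection("127.0.0.1", 8080)
payload = json.dumps({
    "model": "Qwen3-0.6B-Q8_0",
    "messages": [{"role": "user", "content": "test"}],
    "stream": True,
})
conn.request("POST", "/v1/chat/completions", payload,
             headers={"Content-Type": "application/json"})

# getresponse() returns once the status line and headers have been read,
# so the SSE body never needs to be consumed. http.client keeps each
# header line as a separate entry, so duplicates are visible here.
resp = conn.getresponse()
counts = Counter(name.lower() for name, _ in resp.getheaders())
for name, n in sorted(counts.items()):
    print(f"{name}: seen {n}x" + ("  <-- duplicated" if n > 1 else ""))
conn.close()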

Everything seems to work when I connect directly; however, it becomes a problem when a proxy server (nginx in my case) is used:

2025/12/02 14:42:25 [error] 21668#0: *2408 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 172.0.0.1, server: llm.home, request: "POST /v1/chat/completions HTTP/1.1", upstream: "http://127.0.0.1:8080/v1/chat/completions", host: "llm.home", referrer: "http://llm.home/"

Similarly, the llm tool does not work either, failing with the following error:

Error: Connection error.

The issue is related to PR #17470.
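
For context on how this class of bug typically arises: llama-server's HTTP layer (cpp-httplib) stores response headers in a multimap, and set_header() appends an entry rather than replacing an existing one, so a header "set" both by a shared hook and by the per-route handler goes out twice. A purely hypothetical Python sketch of that failure mode (not llama-server code; the actual change is in the PR above):

# Toy header container whose "set" appends, as std::multimap does.
headers: list[tuple[str, str]] = []

def set_header(name: str, value: str) -> None:
    # Appends unconditionally; a second "set" of the same header
    # becomes a second header line on the wire.
    headers.append((name, value))

# Layer 1: a shared hook decorating every response.
set_header("Server", "llama.cpp")
set_header("Access-Control-Allow-Origin", "http://127.0.0.1:8080")

# Layer 2: the per-model route handler sets the same headers again.
set_header("Server", "llama.cpp")
set_header("Access-Control-Allow-Origin", "http://127.0.0.1:8080")

for name, value in headers:
    print(f"{name}: {value}")  # each header printed twice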

Labels

bug, low severity, server/webui
