@ServeurpersoCom
Collaborator


Approach: Filter at source

This patch filters headers before forwarding them to avoid duplication.

Why headers get duplicated:
When the router proxies child process responses, both the router (via
set_default_headers) and the child send the same headers (Server,
Transfer-Encoding, Keep-Alive, CORS). The proxy was forwarding everything,
resulting in duplicates.

Solution:

  1. Skip headers that will be added by the router or httplib:

    • server, transfer-encoding, keep-alive (duplicated by router defaults)
    • access-control-* (CORS headers injected by router)
  2. Handle Content-Type separately via msg_t.content_type to avoid duplication
    when httplib calls set_chunked_content_provider() or set_content().
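The filtering described above could be sketched roughly as follows. This is an illustration only, not the code from the merged patch: the names `should_strip_proxy_header`, `filter_child_headers`, and `filtered_headers` are hypothetical, and the sketch assumes headers arrive as a plain `std::map` of name/value pairs.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: lowercase a header name, since HTTP header names
// are case-insensitive.
static std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical predicate: true for headers the router (or httplib) is
// expected to add itself, so forwarding the child's copy would duplicate them.
static bool should_strip_proxy_header(const std::string & name) {
    const std::string key = to_lower_copy(name);
    if (key == "server" || key == "transfer-encoding" || key == "keep-alive") {
        return true;
    }
    if (key == "content-type") {
        return true; // carried separately, see filter_child_headers()
    }
    // Any CORS header: prefix match on "access-control-".
    return key.rfind("access-control-", 0) == 0;
}

// Hypothetical result type: the headers that are safe to forward, plus the
// child's Content-Type kept apart so it is set exactly once.
struct filtered_headers {
    std::map<std::string, std::string> headers;
    std::string content_type;
};

static filtered_headers filter_child_headers(const std::map<std::string, std::string> & child) {
    filtered_headers out;
    for (const auto & kv : child) {
        if (to_lower_copy(kv.first) == "content-type") {
            out.content_type = kv.second; // remember it, do not forward as a header
            continue;
        }
        if (should_strip_proxy_header(kv.first)) {
            continue; // the router adds its own copy
        }
        out.headers.emplace(kv.first, kv.second);
    }
    return out;
}
```

In this sketch the caller would pass `content_type` to whichever httplib call sets the body, and copy only `headers` onto the outgoing response.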

Tested with:

curl -v http://127.0.0.1:8082/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "Codestral-22B-v0.1-i1-GGUF",
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 5
  }'
*   Trying 127.0.0.1:8082...
* Connected to 127.0.0.1 (127.0.0.1) port 8082 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: 127.0.0.1:8082
> User-Agent: curl/7.88.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 121
>
< HTTP/1.1 200 OK
< Keep-Alive: timeout=5, max=100
< Transfer-Encoding: chunked
< Access-Control-Allow-Origin:
< Server: llama.cpp
< Content-Length: 591
< Content-Type: application/json; charset=utf-8
< Connection: close
<
* Closing connection 0
{"choices":[{"finish_reason":"length","index":0,"message":{"role":"assistant","content":" Hello! I'm"}}],"created":1764699011,"model":"Codestral-22B-v0.1-i1-GGUF","system_fingerprint":"b7274-f57a165eb","object":"chat.completion","usage":{"completion_tokens":5,"prompt_tokens":5,"total_tokens":10},"id":"chatcmpl-6WGgrUU4EWxLfKAXfzUq9hVDLrbybfJm","timings":{"cache_n":4,"prompt_n":1,"prompt_ms":21.409,"prompt_per_token_ms":21.409,"prompt_per_second":46.7093278527722,"predicted_n":5,"predicted_ms":51.717,"predicted_per_token_ms":10.343399999999999,"predicted_per_second":96.68000850784075}}

Before: duplicate Server, Transfer-Encoding, Keep-Alive, Access-Control-Allow-Origin, Content-Type
After: all headers appear exactly once

Fixes #17693

@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Dec 2, 2025

Note: I first tried an is_proxied flag approach, but it required more code

struct server_http_res {
     int status = 200;
     std::string data;
     std::map<std::string, std::string> headers;
+    bool is_proxied = false;

with logic split between modules. Filtering at source is simpler.

Collaborator

@ngxson ngxson left a comment


looks good overall! would appreciate it if you could address some small comments

@arkamar

arkamar commented Dec 2, 2025

I just want to confirm that this PR solves my issue: the nginx errors are gone and the LLM works as well.

- restrict scope of header after std::move
- simplify header check (remove unordered_set)
Collaborator

@ngxson ngxson left a comment


nice, thanks! (merging once the CI passes)

@ngxson ngxson merged commit 5ceed62 into ggml-org:master Dec 3, 2025
65 of 74 checks passed
Development

Successfully merging this pull request may close these issues.

server: Doubled http headers in /v1/chat/completions response in multiple models mode
