llama-server: fix duplicate HTTP headers in multiple models mode #17698
+47
−10
Approach: Filter at source
This patch filters headers before forwarding them to avoid duplication.
Why headers get duplicated:
When the router proxies child process responses, both the router (via
set_default_headers) and the child send the same headers (Server,
Transfer-Encoding, Keep-Alive, CORS). The proxy was forwarding everything,
resulting in duplicates.
Solution:
Skip the headers that the router or httplib will add on its own, i.e. the ones named above (Server, Transfer-Encoding, Keep-Alive, CORS).
Handle Content-Type separately via msg_t.content_type to avoid duplication
when httplib calls set_chunked_content_provider() or set_content().
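The filtering described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from the patch: the names `should_forward_header`, `filter_proxy_headers` and `header_list` are hypothetical, the skip list is assumed from the headers named in this description, and httplib itself is left out so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical alias: headers as ordered (name, value) pairs.
using header_list = std::vector<std::pair<std::string, std::string>>;

// HTTP header names are case-insensitive, so compare them in lowercase.
static std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns false for headers that the router or the HTTP library is assumed
// to add itself. The list is an assumption based on the PR description.
static bool should_forward_header(const std::string & name) {
    static const std::unordered_set<std::string> skipped = {
        "server", "transfer-encoding", "keep-alive", "content-type",
    };
    const std::string key = to_lower_copy(name);
    if (skipped.count(key) > 0) {
        return false;
    }
    // Treat every Access-Control-* header as a CORS header owned by the router.
    return key.rfind("access-control-", 0) != 0;
}

// Copies the child's headers, dropping the ones the router will re-add.
// Content-Type is not forwarded as a header; its value is returned separately
// so the caller can pass it through a dedicated content-type field.
static header_list filter_proxy_headers(const header_list & child_headers,
                                        std::string & content_type) {
    header_list out;
    for (const auto & h : child_headers) {
        if (to_lower_copy(h.first) == "content-type") {
            content_type = h.second;
            continue;
        }
        if (should_forward_header(h.first)) {
            out.push_back(h);
        }
    }
    return out;
}
```

Filtering on the proxy side keeps the child's response handling unchanged; only the router needs to know which headers it owns.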
Tested with:
Before: duplicate Server, Transfer-Encoding, Keep-Alive, Access-Control-Allow-Origin, Content-Type
After: all headers appear exactly once
Fixes #17693