@ngxson
Collaborator

@ngxson ngxson commented Dec 3, 2025

fix #17710

Note: an alternative fix would be to let server_http_proxy choose whether to return a stream or not.
But IMO that solution is not elegant, because in the non-stream case:

  • it is difficult to track timeout errors
  • proxying a large amount of data becomes slow
  • memory usage is higher
  • it adds a lot more code
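
To illustrate the memory point in the list above, here is a rough sketch that contrasts the two approaches. It is not the server code: `sink_fn`, `proxy_buffered`, and `proxy_streaming` are hypothetical names, and the upstream body is modelled as a plain vector of chunks rather than a real HTTP connection.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback that delivers one piece of the body to the client.
using sink_fn = std::function<void(const std::string &)>;

// Non-stream variant: collect the whole upstream body, then send it once.
// Returns the peak number of body bytes held by the proxy at one time.
std::size_t proxy_buffered(const std::vector<std::string> & upstream_chunks, const sink_fn & sink) {
    assert(static_cast<bool>(sink)); // the callback must be set
    std::string body;
    for (const auto & chunk : upstream_chunks) {
        body += chunk;
    }
    sink(body);
    return body.size();
}

// Stream variant: forward each chunk as soon as it arrives.
// The proxy never holds more than one chunk of the body.
std::size_t proxy_streaming(const std::vector<std::string> & upstream_chunks, const sink_fn & sink) {
    assert(static_cast<bool>(sink)); // the callback must be set
    std::size_t peak = 0;
    for (const auto & chunk : upstream_chunks) {
        peak = std::max(peak, chunk.size());
        sink(chunk);
    }
    return peak;
}
```

In this toy model the buffered variant holds the full body before the client sees the first byte, while the streaming variant is bounded by the largest single chunk.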

@pwilkin
Collaborator

pwilkin commented Dec 3, 2025

Just an idea - wouldn't it be better to just sanitize the headers at the end and then remove the duplicates? Or does the proxy add duplicates on its own?

@ngxson
Collaborator Author

ngxson commented Dec 3, 2025

> Just an idea - wouldn't it be better to just sanitize the headers at the end and then remove the duplicates? Or does the proxy add duplicates on its own?

some headers are added by httplib, so we don't actually have direct control over what headers are added.

btw, the headers type is already std::set, so duplicates are not even allowed.

@ngxson
Collaborator Author

ngxson commented Dec 3, 2025

just to note: adding content-length doesn't result in a duplicated header, but some HTTP clients are quite strict and don't allow content-length to be specified in stream mode (chunked encoding), so we have to remove it here

for example, most browsers don't care about this, but nginx throws an error
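
The idea described above could be sketched roughly as follows. This is not the code from the PR: `header_map`, `iequals`, and `strip_content_length` are hypothetical names, and the real server may store and forward headers differently. The sketch only assumes that the upstream headers are copied into the proxied response and that the body is then sent with chunked encoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical header container for this sketch; the real server type may differ.
using header_map = std::map<std::string, std::string>;

// Case-insensitive comparison, since HTTP header names are case-insensitive.
bool iequals(const std::string & a, const std::string & b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
               return std::tolower(x) == std::tolower(y);
           });
}

// Return a copy of the upstream headers without Content-Length, so the
// proxied response can be streamed with chunked encoding.
header_map strip_content_length(const header_map & upstream) {
    header_map out;
    for (const auto & kv : upstream) {
        if (iequals(kv.first, "Content-Length")) {
            continue; // must not be sent together with a chunked body
        }
        out.emplace(kv.first, kv.second);
    }
    return out;
}

// Small usage example: only the non-framing header survives.
void strip_content_length_example() {
    const header_map upstream = {
        {"Content-Length", "42"},
        {"Content-Type", "text/event-stream"},
    };
    const header_map forwarded = strip_content_length(upstream);
    assert(forwarded.size() == 1);
    assert(forwarded.count("Content-Type") == 1);
}
```

Matching the name case-insensitively matters here because an upstream server may send `content-length` in any casing.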

@ngxson ngxson merged commit 9d02299 into ggml-org:master Dec 4, 2025
75 of 80 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Dec 4, 2025
* origin/master:
server: strip content-length header on proxy (ggml-org#17734)
server: move msg diffs tracking to HTTP thread (ggml-org#17740)
examples : add missing code block end marker [no ci] (ggml-org#17756)
common : skip model validation when --help is requested (ggml-org#17755)
ggml-cpu : remove asserts always evaluating to false (ggml-org#17728)
convert: use existing local chat_template if mistral-format model has one. (ggml-org#17749)
cmake : simplify build info detection using standard variables (ggml-org#17423)
ci : disable ggml-ci-x64-amd-* (ggml-org#17753)
common: use native MultiByteToWideChar (ggml-org#17738)
metal : use params per pipeline instance (ggml-org#17739)
llama : fix sanity checks during quantization (ggml-org#17721)
build : move _WIN32_WINNT definition to headers (ggml-org#17736)
build: enable parallel builds in msbuild using MTT (ggml-org#17708)
ggml-cpu: remove duplicate conditional check 'iid' (ggml-org#17650)
Add a couple of file types to the text section (ggml-org#17670)
convert : support latest mistral-common (fix conversion with --mistral-format) (ggml-org#17712)
Use OpenAI-compatible `/v1/models` endpoint by default (ggml-org#17689)
webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (ggml-org#17445)

Development

Successfully merging this pull request may close these issues.

Misc. bug: Server multi-model mode add invalid Transfer-Encoding: chunked header to response

3 participants