
Conversation

@ngxson (Collaborator) commented Dec 3, 2025

Fix #17726

The proposed approach: delegate state tracking to the HTTP thread.

  1. When creating a new task, we also create a new task_result_state that holds the state of the current generation "session". This object is owned by the HTTP thread.
  2. Each time we receive a new result, we call result.update(state); this allows the result to populate partial returns while also updating the state object (see the sketch below).
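A minimal, self-contained sketch of this flow (illustrative types only; the real server defines task_result_state and the result types in its own source):

#include <string>

// State of one generation "session", owned by the HTTP thread.
struct task_result_state {
    std::string accumulated_text;   // everything generated so far
};

// One result coming back from the inference side.
struct task_result {
    std::string new_text;   // text produced since the previous result
    std::string partial;    // the partial return, filled in by update()

    // Populate the partial return and advance the session state.
    void update(task_result_state & state) {
        partial = new_text;                   // diff vs. the previous state
        state.accumulated_text += new_text;   // state now includes it
    }
};

// HTTP thread loop, roughly:
//   task_result_state state;   // created together with the task
//   while (more results) {
//       result.update(state);
//       send(result.partial);
//   }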


llama_token sampled;

common_chat_format chat_format = COMMON_CHAT_FORMAT_CONTENT_ONLY;
@ngxson (Collaborator, Author) commented on the diff above:
chat_format seems to be unused, so I removed it. No idea why it was added in the first place.

@ngxson ngxson changed the title from "Xsn/server improve msg diff" to "server: move msg diffs tracking to HTTP thread" on Dec 3, 2025
@ngxson ngxson marked this pull request as ready for review December 3, 2025 15:33
@ngxson ngxson requested a review from ggerganov as a code owner December 3, 2025 15:33
@ngxson (Collaborator, Author) commented Dec 3, 2025

Server tests should be OK now, running from the mirror PR: ngxson#49

+ states.emplace_back(task.params.oaicompat_chat_syntax);
  tasks.push_back(std::move(task));
- states.push_back(task.params.oaicompat_chat_syntax);
@ngxson (Collaborator, Author):
task is already moved at this point, so task.params must be read before the std::move
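For context, the same use-after-move pattern in a minimal standalone example (generic code, not the server's):

#include <string>
#include <utility>
#include <vector>

struct task_t { std::string params; };

void buggy(std::vector<task_t> & tasks, std::vector<std::string> & states) {
    task_t t{"syntax"};
    tasks.push_back(std::move(t));
    states.push_back(t.params);     // t is moved-from: params is likely empty
}

void fixed(std::vector<task_t> & tasks, std::vector<std::string> & states) {
    task_t t{"syntax"};
    states.emplace_back(t.params);  // read params before the move
    tasks.push_back(std::move(t));
}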

// send the results
json res_json = result->to_json();
if (result->is_error()) {
res->next = [res_this = res.get(), res_type, &should_stop, states = std::move(states)](std::string & output) mutable -> bool {
A reviewer (Member) commented:

Instead of making the lambda mutable, wouldn't it be cleaner to maintain the states in the server_res_generator instance (i.e. res)?

@ngxson (Collaborator, Author) replied:
Yes, that's a good idea. I moved it to server_response_reader instead; update() is now called automatically whenever a result is received.

Also added a safeguard in ba2af58 so that forgetting to call update() is caught.
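An illustrative sketch of the resulting shape (everything beyond the names server_response_reader and update() is an assumption, and the actual safeguard in ba2af58 may differ):

#include <cassert>
#include <cstddef>
#include <vector>

struct task_result_state { /* accumulated message, parser state, ... */ };

struct task_result {
    bool updated = false;

    void update(task_result_state & /*state*/) {
        updated = true;   // compute the message diffs here
    }

    // Safeguard idea: refuse to serialize a result whose update() was
    // skipped, so the mistake fails loudly in debug builds.
    void to_json() const {
        assert(updated && "task_result::update() was not called");
    }
};

struct server_response_reader {
    std::vector<task_result_state> states;   // one state per task, owned here

    // Every result is routed through the reader, so update() is applied
    // automatically and callers cannot forget it.
    void on_result(task_result & r, std::size_t idx) {
        r.update(states.at(idx));
    }
};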

@ggerganov (Member) left a comment:
Note that a87d9a4 is already merged in master, so this probably needs a rebase.

@ngxson ngxson force-pushed the xsn/server_improve_msg_diff branch from aad20a6 to 1c30c28 on December 4, 2025 13:13
@ngxson ngxson merged commit c4c10bf into ggml-org:master Dec 4, 2025
69 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Dec 4, 2025
* origin/master:
server: strip content-length header on proxy (ggml-org#17734)
server: move msg diffs tracking to HTTP thread (ggml-org#17740)
examples : add missing code block end marker [no ci] (ggml-org#17756)
common : skip model validation when --help is requested (ggml-org#17755)
ggml-cpu : remove asserts always evaluating to false (ggml-org#17728)
convert: use existing local chat_template if mistral-format model has one. (ggml-org#17749)
cmake : simplify build info detection using standard variables (ggml-org#17423)
ci : disable ggml-ci-x64-amd-* (ggml-org#17753)
common: use native MultiByteToWideChar (ggml-org#17738)
metal : use params per pipeline instance (ggml-org#17739)
llama : fix sanity checks during quantization (ggml-org#17721)
build : move _WIN32_WINNT definition to headers (ggml-org#17736)
build: enable parallel builds in msbuild using MTT (ggml-org#17708)
ggml-cpu: remove duplicate conditional check 'iid' (ggml-org#17650)
Add a couple of file types to the text section (ggml-org#17670)
convert : support latest mistral-common (fix conversion with --mistral-format) (ggml-org#17712)
Use OpenAI-compatible `/v1/models` endpoint by default (ggml-org#17689)
webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (ggml-org#17445)


Development

Successfully merging this pull request may close these issues.

refactor: (server) move server_context::update_chat_msg to HTTP thread
