-
Notifications
You must be signed in to change notification settings - Fork 13.9k
server: move msg diffs tracking to HTTP thread #17740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| llama_token sampled; | ||
|
|
||
| common_chat_format chat_format = COMMON_CHAT_FORMAT_CONTENT_ONLY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chat_format seems to be unused, so I removed this. no idea why it was added in the first place
|
Server tests should be OK now, running from the mirror PR: ngxson#49 |
tools/server/server-context.cpp
Outdated
| states.emplace_back(task.params.oaicompat_chat_syntax); | ||
|
|
||
| tasks.push_back(std::move(task)); | ||
| states.push_back(task.params.oaicompat_chat_syntax); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
task is already moved at this point
tools/server/server-context.cpp
Outdated
| // send the results | ||
| json res_json = result->to_json(); | ||
| if (result->is_error()) { | ||
| res->next = [res_this = res.get(), res_type, &should_stop, states = std::move(states)](std::string & output) mutable -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of making the lambda mutable, wouldn't it be cleaner to maintain the states in the server_res_generator instance (i.e. res)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's good idea. I moved it to server_response_reader instead. update() is now called automatically whenever the result is received.
Also added a safe-guard ba2af58 to avoid someone forget to call update()
ggerganov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that a87d9a4 is already merged in master, so probably need to rebase
aad20a6 to
1c30c28
Compare
* origin/master: server: strip content-length header on proxy (ggml-org#17734) server: move msg diffs tracking to HTTP thread (ggml-org#17740) examples : add missing code block end marker [no ci] (ggml-org#17756) common : skip model validation when --help is requested (ggml-org#17755) ggml-cpu : remove asserts always evaluating to false (ggml-org#17728) convert: use existing local chat_template if mistral-format model has one. (ggml-org#17749) cmake : simplify build info detection using standard variables (ggml-org#17423) ci : disable ggml-ci-x64-amd-* (ggml-org#17753) common: use native MultiByteToWideChar (ggml-org#17738) metal : use params per pipeline instance (ggml-org#17739) llama : fix sanity checks during quantization (ggml-org#17721) build : move _WIN32_WINNT definition to headers (ggml-org#17736) build: enable parallel builds in msbuild using MTT (ggml-org#17708) ggml-cpu: remove duplicate conditional check 'iid' (ggml-org#17650) Add a couple of file types to the text section (ggml-org#17670) convert : support latest mistral-common (fix conversion with --mistral-format) (ggml-org#17712) Use OpenAI-compatible `/v1/models` endpoint by default (ggml-org#17689) webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (ggml-org#17445)
Fix #17726
The proposed approach: Delegate state tracking to HTTP thread
task_result_statethat holds the state of the current generation "session". This object will be owned by the HTTP threadresult.update(state); this allow the result to populate partial returns, at the same time also update thestateobject