[OpenAI] Support stateful chat completion #330

CharlieFRuan · 2024-03-12T02:50:26Z

This PR adds the stateful option to ChatCompletionRequest. When set to true, we preserve previous chat history, allowing multi-round chat, essentially behaving like generate(). Note that a stateful chat can only have n=1.
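As a rough illustration of the semantics described above, here is a toy sketch. `SketchEngine`, `SketchRequest`, and `SketchMessage` are hypothetical stand-ins, not WebLLM's actual classes, and the "reply" is just a count of visible messages so that the effect of `stateful` is observable. Only the `stateful` flag and the `n = 1` restriction are taken from the description.

```typescript
// Hypothetical, simplified types; the real request type has many more fields.
interface SketchMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface SketchRequest {
  messages: SketchMessage[];
  stateful?: boolean;
  n?: number;
}

// Toy stand-in for a chat engine. It "replies" by reporting how many
// messages it could see when generating.
class SketchEngine {
  private history: SketchMessage[] = [];

  chatCompletion(request: SketchRequest): string {
    const n = request.n ?? 1;
    if (request.stateful && n !== 1) {
      throw new Error("A stateful chat completion requires n = 1.");
    }
    if (!request.stateful) {
      // Non-stateful requests start from a clean slate.
      this.history = [];
    }
    this.history.push(...request.messages);
    const reply = `seen ${this.history.length} message(s)`;
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }
}

const engine = new SketchEngine();
// Round 1: only the new user message is visible.
const first = engine.chatCompletion({
  stateful: true,
  messages: [{ role: "user", content: "Hi" }],
});
// Round 2: the earlier user message and reply are still in the history.
const second = engine.chatCompletion({
  stateful: true,
  messages: [{ role: "user", content: "And then?" }],
});
```

In this toy model the second call sees three messages (the first user message, the first reply, and the new user message) even though the caller only sent one.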

In addition, we add `getMessage()` to `ChatInterface`. This lets callers of streaming chat completion requests retrieve the final response directly, rather than manually concatenating the deltas.
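To make the convenience concrete, the sketch below contrasts manual delta concatenation with a `getMessage()`-style accessor. `SketchStreamingChat` and `streamReply()` are hypothetical and exist only for this illustration; the real method's signature may differ (for instance, it may be asynchronous).

```typescript
// Hypothetical stand-in for a streaming chat object; only the idea of
// getMessage() comes from the PR description.
class SketchStreamingChat {
  private message = "";

  // Yields the reply piece by piece while accumulating it internally.
  async *streamReply(pieces: string[]): AsyncGenerator<string> {
    this.message = "";
    for (const delta of pieces) {
      this.message += delta;
      yield delta;
    }
  }

  // Returns the full reply accumulated so far.
  getMessage(): string {
    return this.message;
  }
}

async function demo(): Promise<{ manual: string; viaGetter: string }> {
  const chat = new SketchStreamingChat();
  let manual = "";
  for await (const delta of chat.streamReply(["Hel", "lo", "!"])) {
    manual += delta; // what callers would otherwise have to do themselves
  }
  // With the accessor, the caller can skip the bookkeeping above.
  return { manual, viaGetter: chat.getMessage() };
}
```

Both strings end up identical; the accessor simply moves the accumulation out of the caller's loop.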

Changes in WebLLM:
- Stateful chat completion: #330
- OpenAI's `logit_bias`: #331
- OpenAI's `logprobs` and `top_logprobs`: #333

Changes in TVMjs:
- apache/tvm#16650
- Fix param download issues (already reflected in 0.2.26, but at the time this PR was not merged yet)
- Expose `sampleTopPFromProb` to support `logprobs` (new in 0.2.27)

…tCompletion (#359)

We introduced the field `stateful` in `chatCompletion()` earlier, in #330, to allow easier multi-round chatting. However, this is not ideal, since we would prefer APIs that are functional in behavior, which gives us various benefits (e.g. better fault tolerance for future use cases).

Therefore, in this PR:
- We disable `chatCompletionRequest.stateful` and ask users to maintain the chat history explicitly
- Instead, we introduce implicit KVCache reuse for multi-round chatting
  - When we detect that users are doing multi-round chatting, we do not reset the KV cache, so only the new message is prefilled
  - To detect multi-round chatting, we instantiate a `Conversation` instance for each request and compare it with the current internal `Conversation`. If they match, we can safely skip resetting the internal state and prefill only the new input.

To see the behavior, check out `mainMultiroundChat()` in `examples/openai-api/src/openai_api.ts`.

Implementation details:
- Instantiate the `Conversation` object in `ChatModule.prefill()`, since this is the place where the various workflows meet (streaming, non-streaming, n > 1, etc.)
- The object's state is determined by the system prompt, the message history, and function calling usages
- Inside `prefill()`, we then compare the two objects with `compareConversationObject()` and reset all internal states if the result is false
- Another detail is that, instead of overriding `conversation.config.system_message`, we add a field `conversation.override_system_message`, making `conversation.config` protected
- We further remove all methods in `ChatModule` that override `this.getPipeline().conversation` by changing `updateConversationWithChatCompletionMessages()` to `getConversationFromChatCompletionRequest()`, keeping things more functional internally
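The detection step described in this commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual implementation: `SketchConversation`, `sameConversation()`, and `SketchPipeline` are hypothetical names, function calling usages are left out of the comparison, and "prefilling" is modeled as a list of strings rather than real token processing.

```typescript
interface SketchTurn {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical, reduced conversation state: system prompt plus history.
interface SketchConversation {
  systemPrompt: string;
  turns: SketchTurn[];
}

// Structural equality over the system prompt and every past turn.
function sameConversation(a: SketchConversation, b: SketchConversation): boolean {
  if (a.systemPrompt !== b.systemPrompt) return false;
  if (a.turns.length !== b.turns.length) return false;
  return a.turns.every((turn, i) => {
    const other = b.turns[i];
    return other !== undefined && other.role === turn.role && other.content === turn.content;
  });
}

class SketchPipeline {
  private internal: SketchConversation = { systemPrompt: "", turns: [] };

  // `request` holds everything before the new user input.
  chat(request: SketchConversation, newInput: string): { prefilled: string[]; reply: string } {
    let prefilled: string[];
    if (sameConversation(request, this.internal)) {
      // History matches the internal state: keep the cache, prefill only the new input.
      prefilled = [newInput];
    } else {
      // Mismatch: reset the internal state and prefill everything again.
      this.internal = { systemPrompt: request.systemPrompt, turns: [...request.turns] };
      prefilled = [request.systemPrompt, ...request.turns.map((t) => t.content), newInput];
    }
    const reply = `reply to: ${newInput}`; // toy decoding step
    this.internal.turns.push(
      { role: "user", content: newInput },
      { role: "assistant", content: reply },
    );
    return { prefilled, reply };
  }
}

const pipeline = new SketchPipeline();
const round1 = pipeline.chat({ systemPrompt: "Be brief.", turns: [] }, "Hi");
// The caller maintains the history explicitly and sends it back unchanged.
const round2 = pipeline.chat(
  {
    systemPrompt: "Be brief.",
    turns: [
      { role: "user", content: "Hi" },
      { role: "assistant", content: round1.reply },
    ],
  },
  "And then?",
);
```

In this model, `round2.prefilled` contains only the new input because the history sent by the caller matches the pipeline's internal conversation. Editing any earlier turn, or changing the system prompt, makes the comparison fail and forces a full prefill.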

[OpenAI] Support stateful chat completion

a14d10a

CharlieFRuan merged commit 212ae18 into mlc-ai:main Mar 12, 2024

CharlieFRuan mentioned this pull request Mar 14, 2024

[Version] Bump version to 0.2.27 #334

Merged

CharlieFRuan mentioned this pull request Apr 3, 2024

[KVCache] Add implicit KVCache reuse, disable stateful option for chatCompletion #359

Merged
