.Net Single source of truth for chat message content (#5088)
### Motivation, Context and Description

Today, the ChatMessageContent class has two sources of truth for its content: the Content property and the Items property. This may be acceptable for now, as all SK and industry chat completion services follow the same protocol. They use the Content property for system, user, assistant, and tool messages, and alternatively allow passing image and text content via the Items collection for user messages only.

However, this will not work for a chat completion service that doesn't follow the protocol described above. For example, it could be a new advanced chat completion service with multimodal support for assistant messages. In this case, consumer code working through the IChatCompletionService interface won't be able to handle content for assistant messages polymorphically, and all consumers that need to work with both current and new chat completion services will have to use code like this:

```C#
var message = await chatCompletionService.GetChatMessageContentAsync(...);

if (message.Content != null)
{
    // Handle content specified in the Content property of the assistant message
}
else if (message.Items is { Count: > 0 } items)
{
    // Handle content specified in the Items property of the assistant message
}

// or check the Items property first and then the Content one?
```

The problem becomes more apparent and manifests itself immediately in the agent space. Each agent needs logic like the one above to identify the source of the content and map it to an internal API data model. For example, here's the code for a ChatCompletion agent that we would need to write:

```C#
async Task<ChatMessageContent[]> InvokeAsync(ChatMessageContent[] messages)
{
    var chat = new ChatHistory();

    foreach (var message in messages)
    {
        if (message.Role == AuthorRole.User)
        {
            // User messages can have content in either the 'Items' property or the 'Content' property.
            // Assuming one of the two properties has content, add the message to the chat and continue.
            chat.Add(message);
            continue;
        }

        // The system, assistant and tool messages are expected to have content in the 'Content' property.
        // This expectation is specific to OpenAI chat completion services and may not be relevant to other chat
        // completion service types, e.g., multimodality for assistant messages, where content would be expected
        // to be provided via the 'Items' collection.
        if (!string.IsNullOrEmpty(message.Content))
        {
            chat.Add(message);
            continue;
        }

        // Doing our best to identify content for the message and add it to the chat.
        if (message.Items is { Count: > 0 } items && items[0] is TextContent textContent)
        {
            // The problem with the clone code below is that we lose the original message type, which the
            // underlying API could have interpreted differently than the type of the clone, ChatMessageContent.
            var clone = new ChatMessageContent(
                role: message.Role,
                content: textContent.Text,
                modelId: message.ModelId,
                innerContent: message.InnerContent,
                encoding: message.Encoding,
                metadata: message.Metadata
            );

            chat.Add(clone);
            continue;
        }

        // If we get here, all the conditions above failed, so we add the message as-is
        // and let the underlying API handle it.
        chat.Add(message);
    }

    var chatMessageContent = await chatCompletionService.GetChatMessageContentsAsync(chat, ...);

    return chatMessageContent.Select(m => { m.Source = this; return m; }).ToArray();
}
```

To avoid all the unnecessary 'if/else' mapping logic needed to identify the source of content depending on the message role, it would be beneficial to have only one source of content: Items. Ideally, the 'Content' property should be removed, but doing so would unnecessarily break a lot of consumer code.

As a middle-ground solution, this PR changes the purpose of the 'Content' property from being a separate source of content to a shortcut for the first item of text content type in the 'Items' collection.
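
The shortcut behavior described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `SketchContent`, `SketchTextContent`, `SketchImageContent`, and `SketchChatMessage` are hypothetical, simplified stand-ins for the Semantic Kernel types, and details such as what a `null` assignment should do are assumptions.

```C#
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for illustration only. These are NOT the
// Semantic Kernel types, and the real implementation may differ.
public abstract class SketchContent { }

public sealed class SketchTextContent : SketchContent
{
    public SketchTextContent(string? text) { Text = text; }

    public string? Text { get; set; }
}

public sealed class SketchImageContent : SketchContent
{
    public SketchImageContent(Uri uri) { Uri = uri; }

    public Uri Uri { get; }
}

public sealed class SketchChatMessage
{
    // The single source of truth: every piece of content lives in this list.
    public List<SketchContent> Items { get; } = new();

    // Shortcut for the text of the first text item in Items; holds no state of its own.
    public string? Content
    {
        get => Items.OfType<SketchTextContent>().FirstOrDefault()?.Text;
        set
        {
            var first = Items.OfType<SketchTextContent>().FirstOrDefault();
            if (first is not null)
            {
                // A text item already exists: update it in place.
                first.Text = value;
            }
            else if (value is not null)
            {
                // No text item yet: add one so that Items stays authoritative.
                // (Assumption: assigning null when there is no text item is a no-op.)
                Items.Add(new SketchTextContent(value));
            }
        }
    }
}
```

With a shape like this, a consumer can iterate `Items` for every message role and never needs to decide whether `Content` or `Items` holds the data, while existing code that reads or assigns `Content` keeps working.
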
This way, the 'Content' property becomes nothing more than a convenient way to add, update, or return the text of the first item of text content type. The 'Items' collection, on the other hand, becomes the only source of content that can be used polymorphically by consumer code.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄