[.Net] Bring Semantic-Kernel to AutoGen && Propose a new message abstraction to support multi-modal/parallel function call && potential breaking change from openai api #1676
Conversation
@@ -162,4 +163,190 @@ public static ChatRequestFunctionMessage ToChatRequestFunctionMessage(this Messa
        return functionMessage;
    }

    public static IEnumerable<ChatRequestMessage> ToOpenAIChatRequestMessage(this IAgent agent, IMessage message)
Maybe we need to break it into smaller pieces.
Code-wise we can break it up into a chain of message handlers. Though functionality-wise we don't need to do it now. Unless you have some other considerations?
This might be a better implementation, but yes, functionality-wise there's no need to change that code. That said, we can probably make it internal (no need to expose this as a public method).
https://github.com/microsoft/autogen/pull/1676/files#diff-99bde4c9640d0c773927d1462185e5bca0dae555ad22df353645469b755c921aR164
}

[Fact]
public void ToOpenAIChatRequestMessageTest()
Usage examples for TextMessage, MultiModalMessage, and so on.
ChatRequestMessage[] messages =
[
    new ChatRequestUserMessage("Hello"),
GPTAgent allows you to use messages from the original AOAI SDK as well.
{
    public static async Task RunAsync()
    {
        var openAIKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? throw new Exception("Please set OPENAI_API_KEY environment variable.");
@matthewbolanos Any suggestion on making a better example with AutoGen + SemanticKernel?
{
    // might be a plain text return or a function call return
    var msg = reply.First();
    if (msg is OpenAIChatMessageContent oaiContent)
@matthewbolanos and @SergeyMenshykh Could you review the conversion from AutoGen messages to SK message contents here?
}
else
{
    throw new InvalidOperationException("Unsupported return type, multiple messages are not supported.");
Actually, the chat completion service can return many messages. By default, it returns one, but it can be configured to return many choices/completions, each of which will be represented by a separate chat message content. This behavior is controlled by the ResultsPerPrompt property of the OpenAIPromptExecutionSettings settings class.
This example demonstrates how to configure it to return 2 completions - Example36_MultiCompletion
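For reference, a minimal sketch of what that configuration might look like (assuming the Microsoft.SemanticKernel OpenAI connector, SK v1.x-era API; `chatService` and `chatHistory` are assumed to already be in scope):

```csharp
// Sketch only: assumes Microsoft.SemanticKernel.Connectors.OpenAI.
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    // Ask the service for two completions; each choice comes back
    // as a separate ChatMessageContent in the returned list.
    ResultsPerPrompt = 2,
};

var replies = await chatService.GetChatMessageContentsAsync(chatHistory, settings);
// With ResultsPerPrompt = 2, replies contains two entries, so calling
// reply.First() as the agent does above would silently drop the second choice.
```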
{
    // might be a plain text return or a function call return
    var msg = reply.First();
    if (msg is OpenAIChatMessageContent oaiContent)
Handling only the OpenAIChatMessageContent message will not allow this agent to work with any other non-OpenAI existing or future implementations of the IChatCompletionService interface, for example AzureOpenAIWithDataChatMessageContent.cs.
Additionally, referencing the OpenAIChatMessageContent type in the agent unnecessarily couples it with the OpenAI message. This is acceptable in the short term for tool-calling purposes; however, in the long term, the agent's dependency on LLM-specific messages should be dropped. Instead, an LLM-agnostic model for tool-calling (which we plan to start working on soon) should be used.
So, I suggest slightly changing the message handling logic:
if (msg is OpenAIChatMessageContent oaiContent && oaiContent.ToolCalls is { Count: > 0 } toolCalls)
{
    // Get tool-call info only - id, name, arguments
    return new function message
}

foreach (var content in msg.Items)
{
    switch (content)
    {
        TextContent => text message
        ImageContent => image message
    }
}
{
    return new Message(Role.Assistant, content, this.Name);
}
else if (oaiContent.ToolCalls is { Count: 1 } && oaiContent.ToolCalls.First() is ChatCompletionsFunctionToolCall toolCall)
- LLM can easily request more than one tool call. Is there any reason to handle only the first one?
- Just FYI: Tool calling is controlled by the ToolCallBehavior property. If the auto behavior is configured, the tool(s) will be called automatically behind the scenes, with no tool-calling messages returned by the GetChatMessageContentsAsync method. This example demonstrates this in action: AgentAutoFunctionInvocationAsync. To get the GetChatMessageContentsAsync method to return an OpenAIChatMessageContent with a filled ToolCalls collection, the behavior should be set to manual, as demonstrated in this example: AgentManualFunctionInvocationAsync.
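The two behaviors can be sketched roughly as follows (assuming the Microsoft.SemanticKernel OpenAI connector; the two ToolCallBehavior presets named here exist in SK v1.x):

```csharp
// Sketch only: assumes Microsoft.SemanticKernel.Connectors.OpenAI.
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Auto: tools are invoked behind the scenes by the connector;
// no tool-call message is surfaced to the caller.
var autoSettings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
};

// Manual: GetChatMessageContentsAsync returns an OpenAIChatMessageContent
// whose ToolCalls collection is populated, and the caller is responsible
// for invoking the tools and feeding the results back.
var manualSettings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
};
```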
@SergeyMenshykh I just couldn't find an example that processes more than one tool call, so I don't know how to implement the conversion from Semantic Kernel to AutoGen. Could you share an example of multi-tool usage in Semantic Kernel?
/// <summary>
/// The agent that integrates with the Semantic Kernel.
/// </summary>
public class SemanticKernelAgent : IStreamingAgent
BTW: We are already working on ChatCompletionAgent in the feature-agents-abstraction branch. Streaming is in progress as well - .Net: [Agents] Abstraction for streaming
{
    this.Role = role;
    this.From = from;
    this.Url = url;
Is this the URL to the image for this message? Would a message contain more than one image?
Yes, and for multi-image situations, MultiModalMessage is the message to use.
Great PR. Perhaps it is a time to think about the next steps, in addition to documentation.
Update on 2024/03/03
This PR is ready to be reviewed and merged. Apologies for creating such a large PR; I wish I could have split it into smaller pieces. But considering that this PR introduces several API breaking changes, it's probably better to make them all together in one PR.
So, here are a few scenarios with the new IMessage abstraction and built-in message types.
Scenario 1: type-matching in processing messages
The scenario below shows how to use type-matching to convert an image message to a text message so it can be passed to an agent that only understands text.
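A hypothetical sketch of such type-matching (TextMessage, ImageMessage, and IMessage are the types proposed in this PR; the exact constructor shapes and the placeholder text are assumptions for illustration):

```csharp
// Hypothetical sketch using the message types proposed in this PR.
IMessage ConvertForTextOnlyAgent(IMessage message) => message switch
{
    // Replace an image with a textual placeholder so a text-only
    // agent can still participate in the conversation.
    ImageMessage img => new TextMessage(Role.User, $"[image: {img.Url}]", img.From),

    // Pass all other message types through unchanged.
    _ => message,
};
```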
Original post
Why are these changes needed?
This PR proposes a new abstraction for the message type in AutoGen.Net, plus some built-in Message classes. The general idea behind this is to avoid crafting a general, all-in-one message for all agents and replace it with multiple simple, specific messages. This PR is also required by #1647
The original Message type is kept for backward compatibility alongside the newly added built-in message types. Agents can decide which messages they accept and consume. For example, a DALL-E agent would probably support TextMessage and generate ImageMessage. GPT agents would support TextMessage, ImageMessage, MultiModalMessage, and so on.
In an agent workflow like two-agent chat, if the messages supported by the two agents differ, there is no magic happening and the conversation will default to an error. However, that behavior can be overridden with middleware. For example, if a multi-modal message flows into a gpt-3.5-turbo agent, the default behavior would be a 400 error from the OpenAI side, but the user can use middleware to check the messages and return a default reply when a multi-modal message is detected.
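That middleware check might look roughly like the following (a sketch only: the delegate shape passed to RegisterMiddleware, and the gptAgent variable, are assumptions, not taken verbatim from this PR):

```csharp
// Hypothetical middleware sketch; the middleware API shape is assumed.
var safeAgent = gptAgent.RegisterMiddleware(async (messages, options, agent, ct) =>
{
    // If any multi-modal message would reach a text-only model,
    // short-circuit with a default reply instead of letting the
    // OpenAI API return a 400 error.
    if (messages.Any(m => m is MultiModalMessage))
    {
        return new TextMessage(Role.Assistant, "multi-modal message detected", agent.Name);
    }

    // Otherwise, forward the conversation to the inner agent unchanged.
    return await agent.GenerateReplyAsync(messages, options, ct);
});
```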
So, anyway, I feel like I'm typing too much; thanks for your patience if you've read this far. Please review the PR and leave feedback, greatly appreciated!
Related issue number
Checks