[.Net] Bring Semantic-Kernel to AutoGen && Propose a new message abstraction to support multi-modal/parallel function call && potential breaking change from openai api #1676

Merged: 32 commits merged into dotnet on Mar 3, 2024

Conversation

@LittleLittleCloud (Collaborator) commented Feb 14, 2024

Update on 2024/03/03

This PR is ready to be reviewed and merged. Apologies for creating such a large PR; I wish I could have split it into smaller pieces, but since it introduces several API breaking changes, it is probably better to land them all together in one PR.

So, here are a few scenarios with the new IMessage abstraction and the built-in message types.

Scenario 1: type-matching in processing messages

The scenario below shows how to use type matching to convert an image message into a text message so it can be passed to an agent that only understands text.

var chatHistory = new IMessage[]
{
    new TextMessage(Role.User, "Hey explain the image below"),
    new ImageMessage(Role.User, url: "xxx"),
};

IAgent imageAgent = ...; // a client or agent that can understand images
IAgent textAgent = ...;  // an agent that can only process text information
var agent = textAgent
    .RegisterMiddleware(async (messages, option, innerAgent, ct) =>
    {
        var msgs = messages.Select(msg =>
        {
            if (msg is ImageMessage)
            {
                // use imageAgent to explain what's inside the image and
                // return the explanation as a TextMessage instead
                return ExplainImageAsText(imageAgent, msg); // hypothetical helper, construction elided
            }

            return msg;
        });

        return await innerAgent.GenerateReplyAsync(msgs);
    });

var reply = await agent.SendAsync(chatHistory);

Original post

Why are these changes needed?

This PR proposes a new abstraction for the message type in AutoGen.Net, plus some built-in message classes. The general idea is to avoid crafting one general, all-in-one message for all agents and to replace it with multiple simple, specific messages. This PR is also required by #1647.

The original Message type is kept for backward compatibility. The newly added built-in messages are listed below (a small construction sketch follows the list):

  • TextMessage: a plain text message
  • ImageMessage: an image message
  • MultiModalMessage: a combination of [text message, image message, ...]
  • ToolCallMessage: a message that contains one or several tool calls
  • ToolCallResultMessage: a tool/function message that contains one or several tool call results
  • AggregateMessage<M1, M2>: an aggregate message of M1 + M2
  • MessageEnvelope: a message wrapper for an arbitrary object
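
As a rough construction sketch (not code from this PR; the constructor shapes are assumptions inferred from the discussion, e.g. the Role/Url/From members of ImageMessage that appear later in the diff):

// a plain text message (assumed constructor: role + content)
var text = new TextMessage(Role.User, "Describe the picture below");

// an image message referencing the image by URL (Role/Url/From are shown in this PR's diff)
var image = new ImageMessage(Role.User, url: "https://example.com/cat.png");

// a multi-modal message combining the text and image parts (assumed to take a list of messages)
var multiModal = new MultiModalMessage(Role.User, new IMessage[] { text, image });

// wrap an arbitrary payload, e.g. an SDK-native message, so it can flow between agents (assumed generic wrapper)
var envelope = new MessageEnvelope<string>("any payload");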

Agents can decide which messages they accept and consume. For example, a DALL-E agent would probably accept TextMessage and produce ImageMessage, while a GPT agent would accept TextMessage, ImageMessage, MultiModalMessage, and so on.

If, in an agent workflow such as a two-agent chat, the two agents support different message types, no implicit conversion happens and the conversation simply results in an error. That behavior can be overridden with middleware, though. For example, if a multi-modal message flows into a gpt-3.5-turbo agent, the default outcome is a 400 error from the OpenAI side, but the user can add middleware that inspects the messages and substitutes a default message whenever a multi-modal message is detected.
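
A minimal sketch of such a guard middleware, following the same RegisterMiddleware signature as Scenario 1 (the gpt35Agent variable and the placeholder text are illustrative assumptions):

var guardedAgent = gpt35Agent
    .RegisterMiddleware(async (messages, option, innerAgent, ct) =>
    {
        // replace any multi-modal or image message with a plain-text placeholder
        // so the text-only model never receives content it cannot handle
        var safeMessages = messages.Select(msg =>
            msg is MultiModalMessage or ImageMessage
                ? new TextMessage(Role.User, "[non-text content omitted]")
                : msg);

        return await innerAgent.GenerateReplyAsync(safeMessages);
    });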

Anyway, I feel like I have been typing too much; thanks for your patience if you have read this far. Please review the PR and leave feedback, greatly appreciated!

Related issue number

Checks

@LittleLittleCloud changed the base branch from main to dotnet on February 14, 2024 at 08:01
@@ -162,4 +163,190 @@ public static ChatRequestFunctionMessage ToChatRequestFunctionMessage(this Messa

return functionMessage;
}

public static IEnumerable<ChatRequestMessage> ToOpenAIChatRequestMessage(this IAgent agent, IMessage message)
Collaborator Author:

Maybe we need to break it into small pieces

Collaborator:

Code-wise we can break it up into a chain of message handlers. Though functionality-wise we don't need to do it now. Unless you have some other considerations?

Collaborator Author:

This might be a better implementation, but yes, functionality-wise there is no need to change that code, other than we can probably make it internal (no need to expose it as a public method):
https://github.com/microsoft/autogen/pull/1676/files#diff-99bde4c9640d0c773927d1462185e5bca0dae555ad22df353645469b755c921aR164
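
For reference, a chain of message handlers could look roughly like the sketch below; this is purely illustrative (the IMessageHandler interface and member names such as TextMessage.Content are assumptions, not code from this PR):

// each handler converts the message types it understands and
// delegates everything else to the next handler in the chain
interface IMessageHandler
{
    ChatRequestMessage? Handle(IMessage message);
}

class TextMessageHandler : IMessageHandler
{
    private readonly IMessageHandler? next;

    public TextMessageHandler(IMessageHandler? next = null) => this.next = next;

    public ChatRequestMessage? Handle(IMessage message) =>
        message is TextMessage text
            ? new ChatRequestUserMessage(text.Content)
            : next?.Handle(message);
}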

}

[Fact]
public void ToOpenAIChatRequestMessageTest()
Collaborator Author:

Usage examples for TextMessage, MultiModalMessage, and so on.


ChatRequestMessage[] messages =
[
new ChatRequestUserMessage("Hello"),
Collaborator Author:

GPTAgent also allows you to use messages from the original AOAI (Azure OpenAI) SDK, as sketched below.
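
A hedged sketch of what that can look like; whether the raw SDK message must be wrapped (e.g. in a MessageEnvelope) before calling SendAsync is an assumption here, not something this comment specifies:

// build a message with the Azure.AI.OpenAI SDK types and hand it to a GPTAgent
var sdkMessage = new ChatRequestUserMessage("Hello");

// MessageEnvelope is assumed to adapt the SDK message to the IMessage abstraction
var reply = await gptAgent.SendAsync(new MessageEnvelope<ChatRequestMessage>(sdkMessage));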

@LittleLittleCloud changed the title from "[.Net] Propose a new message abstraction to support multi-modal/parallel function call && potential breaking change from openai api" to "[.Net] Bring Semantic-Kernel to AutoGen && Propose a new message abstraction to support multi-modal/parallel function call && potential breaking change from openai api" on Feb 14, 2024
{
public static async Task RunAsync()
{
var openAIKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? throw new Exception("Please set OPENAI_API_KEY environment variable.");
Collaborator Author:

@matthewbolanos Any suggestions for making a better example with AutoGen + Semantic Kernel?

{
// might be a plain text return or a function call return
var msg = reply.First();
if (msg is OpenAIChatMessageContent oaiContent)
Collaborator Author:

@matthewbolanos and @SergeyMenshykh Could you review the conversion from AutoGen messages to SK message content here?

@LittleLittleCloud added the AutoGen.Net label (issues related to AutoGen.Net) on Feb 28, 2024
}
else
{
throw new InvalidOperationException("Unsupported return type, multiple messages are not supported.");
Member:

Actually, the chat completion service can return many messages. By default it returns one, but it can be configured to return many choices/completions, each of which will be represented by a separate chat message content. This behavior is controlled by the ResultsPerPrompt property of the OpenAIPromptExecutionSettings class.

This example demonstrates how to configure it to return 2 completions - Example36_MultiCompletion
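
A minimal configuration sketch of that behavior, assuming the Semantic Kernel OpenAI connector of that time (chatService stands in for any IChatCompletionService, e.g. OpenAIChatCompletionService):

using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    // ask the service for two choices/completions per prompt
    ResultsPerPrompt = 2,
};

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("Summarize AutoGen in one sentence.");

// with ResultsPerPrompt = 2 the returned list contains two ChatMessageContent items
var completions = await chatService.GetChatMessageContentsAsync(chatHistory, settings);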

{
// might be a plain text return or a function call return
var msg = reply.First();
if (msg is OpenAIChatMessageContent oaiContent)
Member:

Handling only the OpenAIChatMessageContent message will not allow this agent to work with other existing or future non-OpenAI implementations of the IChatCompletionService interface, for example AzureOpenAIWithDataChatMessageContent.cs.

Additionally, referencing the OpenAIChatMessageContent type in the agent unnecessarily couples it to OpenAI messages. This is acceptable in the short term for tool-calling purposes; however, in the long term, the agent's dependency on LLM-specific messages should be dropped in favor of an LLM-agnostic model for tool calling (which we plan to start working on soon).

So, I suggest slightly changing the message handling logic:

if (msg is OpenAIChatMessageContent oaiContent && oaiContent.ToolCalls is { Count: > 0 } toolCalls)
{
    // get the tool call info only (id, name, arguments)
    // and return a new function/tool-call message built from it
}

foreach (var content in msg.Items)
{
    switch (content)
    {
        case TextContent textContent:
            // map to a text message
            break;
        case ImageContent imageContent:
            // map to an image message
            break;
    }
}

{
return new Message(Role.Assistant, content, this.Name);
}
else if (oaiContent.ToolCalls is { Count: 1 } && oaiContent.ToolCalls.First() is ChatCompletionsFunctionToolCall toolCall)
Member:

  1. An LLM can easily request more than one tool call. Is there any reason to handle only the first one?
  2. Just FYI: tool calling is controlled by the ToolCallBehavior property. If the auto behavior is configured, the tool(s) will be called automatically behind the scenes, with no tool-calling messages returned by the GetChatMessageContentsAsync method. This example demonstrates this in action: AgentAutoFunctionInvocationAsync. To get the GetChatMessageContentsAsync method to return an OpenAIChatMessageContent with a populated ToolCalls collection, the behavior should be set to manual, as demonstrated in this example: AgentManualFunctionInvocationAsync. (A minimal settings sketch follows this list.)
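
For context, a minimal sketch of the two modes with the SK OpenAI connector of that time (only the execution settings differ):

using Microsoft.SemanticKernel.Connectors.OpenAI;

// auto: SK invokes the requested tools behind the scenes and returns the final answer
var autoSettings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
};

// manual: GetChatMessageContentsAsync returns an OpenAIChatMessageContent whose
// ToolCalls collection is populated, and the caller decides how to invoke them
var manualSettings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
};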

Collaborator Author:

@SergeyMenshykh I just couldn't find an example that processes more than one tool call, so I don't know how to implement the conversion from Semantic Kernel to AutoGen. Could you share an example of multi-tool usage in Semantic Kernel?

/// <summary>
/// The agent that integrates with the Semantic Kernel.
/// </summary>
public class SemanticKernelAgent : IStreamingAgent
Member:

BTW: We are already working on ChatCompletionAgent in the feature-agents-abstraction branch. Streaming is in progress as well - .Net: [Agents] Abstraction for streaming

@ekzhu requested a review from BeibinLi on March 3, 2024 at 03:40
{
this.Role = role;
this.From = from;
this.Url = url;
Collaborator:

Is this the URL of the image for this message? Would a message contain more than one image?

Collaborator Author:

Yes, and for multi-image situations, MultiModalMessage is the message to use.

@ekzhu (Collaborator) left a comment:

Great PR. Perhaps it is time to think about the next steps, in addition to documentation.

@LittleLittleCloud merged commit e12a824 into dotnet on Mar 3, 2024
8 checks passed
@LittleLittleCloud deleted the u/xiaoyun/dotnet/message branch on May 2, 2024 at 00:21