[.Net] Bring Semantic-Kernel to AutoGen && Propose a new message abstraction to support multi-modal/parallel function call && potential breaking change from openai api #1676

Merged: 32 commits merged into dotnet on Mar 3, 2024

Conversation

@LittleLittleCloud (Collaborator) commented Feb 14, 2024

Update on 2024/03/03

This PR is ready to be reviewed and merged. Apologies for creating such a large PR; I wish I could have split it into smaller pieces, but since it introduces several API breaking changes, it is probably better to land them all together in one PR.

So, here are a few scenarios with the new IMessage abstraction and the built-in message types.

Scenario 1: type-matching in processing messages

The scenario below shows how to use type matching to convert an image message into a text message so it can be passed to an agent that only understands text.

var chatHistory = new IMessage[]
{
    new TextMessage(Role.User, "Hey explain the image below"),
    new ImageMessage(Role.User, url: "xxx"),
};

IAgent imageAgent = ...; // a client or agent that can understand images
IAgent textAgent = ...;  // an agent that can only process text information
var agent = textAgent
    .RegisterMiddleware(async (messages, option, innerAgent, ct) =>
    {
        var msgs = messages.Select(msg =>
        {
            if (msg is ImageMessage)
            {
                // use imageAgent to explain what's inside the image and
                // return the explanation as a TextMessage instead
                return ExplainImageAsText(imageAgent, msg); // hypothetical helper, construction elided
            }

            return msg;
        });

        return await innerAgent.GenerateReplyAsync(msgs);
    });

var reply = await agent.SendAsync(chatHistory);

Original post

Why are these changes needed?

This PR proposes a new abstraction for the message type in AutoGen.Net, plus some built-in message classes. The general idea is to avoid crafting one general, all-in-one message for all agents and to replace it with multiple simple, specific messages. This PR is also required by #1647.

The original Message type is kept for backward compatibility. The newly added built-in messages are listed below (a small construction sketch follows the list):

  • TextMessage: a plain text message
  • ImageMessage: an image message
  • MultiModalMessage: a combination of [text message, image message, ...]
  • ToolCallMessage: a message that contains one or several tool calls
  • ToolCallResultMessage: a tool/function message that contains one or several tool call results
  • AggregateMessage<M1, M2>: an aggregate message of M1 + M2
  • MessageEnvelope: a message wrapper for an arbitrary object
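
As a rough construction sketch (not code from this PR; the constructor shapes are assumptions inferred from the discussion, e.g. the Role/Url/From members of ImageMessage that appear later in the diff):

// a plain text message (assumed constructor: role + content)
var text = new TextMessage(Role.User, "Describe the picture below");

// an image message referencing the image by URL (Role/Url/From are shown in this PR's diff)
var image = new ImageMessage(Role.User, url: "https://example.com/cat.png");

// a multi-modal message combining the text and image parts (assumed to take a list of messages)
var multiModal = new MultiModalMessage(Role.User, new IMessage[] { text, image });

// wrap an arbitrary payload, e.g. an SDK-native message, so it can flow between agents (assumed generic wrapper)
var envelope = new MessageEnvelope<string>("any payload");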

Agents can decide which messages they accept and consume. For example, a DALL-E agent would probably accept TextMessage and produce ImageMessage, while a GPT agent would accept TextMessage, ImageMessage, MultiModalMessage, and so on.

If, in an agent workflow such as a two-agent chat, the two agents support different message types, no implicit conversion happens and the conversation simply results in an error. That behavior can be overridden with middleware, though. For example, if a multi-modal message flows into a gpt-3.5-turbo agent, the default outcome is a 400 error from the OpenAI side, but the user can add middleware that inspects the messages and substitutes a default message whenever a multi-modal message is detected.
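
A minimal sketch of such a guard middleware, following the same RegisterMiddleware signature as Scenario 1 (the gpt35Agent variable and the placeholder text are illustrative assumptions):

var guardedAgent = gpt35Agent
    .RegisterMiddleware(async (messages, option, innerAgent, ct) =>
    {
        // replace any multi-modal or image message with a plain-text placeholder
        // so the text-only model never receives content it cannot handle
        var safeMessages = messages.Select(msg =>
            msg is MultiModalMessage or ImageMessage
                ? new TextMessage(Role.User, "[non-text content omitted]")
                : msg);

        return await innerAgent.GenerateReplyAsync(safeMessages);
    });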

Anyway, I feel like I have been typing too much; thanks for your patience if you have read this far. Please review the PR and leave feedback, greatly appreciated!

Related issue number

Checks

@LittleLittleCloud changed the base branch from main to dotnet on February 14, 2024 at 08:01
@@ -162,4 +163,190 @@ public static ChatRequestFunctionMessage ToChatRequestFunctionMessage(this Messa

return functionMessage;
}

public static IEnumerable<ChatRequestMessage> ToOpenAIChatRequestMessage(this IAgent agent, IMessage message)
Collaborator Author:

Maybe we need to break it into small pieces

Collaborator:

Code-wise we can break it up into a chain of message handlers. Though functionality-wise we don't need to do it now. Unless you have some other considerations?

Collaborator Author:

This might be a better implementation, but yes, functionality-wise there is no need to change that code, other than we can probably make it internal (no need to expose it as a public method):
https://github.com/microsoft/autogen/pull/1676/files#diff-99bde4c9640d0c773927d1462185e5bca0dae555ad22df353645469b755c921aR164
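
For reference, a chain of message handlers could look roughly like the sketch below; this is purely illustrative (the IMessageHandler interface and member names such as TextMessage.Content are assumptions, not code from this PR):

// each handler converts the message types it understands and
// delegates everything else to the next handler in the chain
interface IMessageHandler
{
    ChatRequestMessage? Handle(IMessage message);
}

class TextMessageHandler : IMessageHandler
{
    private readonly IMessageHandler? next;

    public TextMessageHandler(IMessageHandler? next = null) => this.next = next;

    public ChatRequestMessage? Handle(IMessage message) =>
        message is TextMessage text
            ? new ChatRequestUserMessage(text.Content)
            : next?.Handle(message);
}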

}

[Fact]
public void ToOpenAIChatRequestMessageTest()
Collaborator Author:

Usage examples for TextMessage, MultiModalMessage, and so on.


ChatRequestMessage[] messages =
[
new ChatRequestUserMessage("Hello"),
Collaborator Author:

GPTAgent also allows you to use messages from the original AOAI (Azure OpenAI) SDK, as sketched below.
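
A hedged sketch of what that can look like; whether the raw SDK message must be wrapped (e.g. in a MessageEnvelope) before calling SendAsync is an assumption here, not something this comment specifies:

// build a message with the Azure.AI.OpenAI SDK types and hand it to a GPTAgent
var sdkMessage = new ChatRequestUserMessage("Hello");

// MessageEnvelope is assumed to adapt the SDK message to the IMessage abstraction
var reply = await gptAgent.SendAsync(new MessageEnvelope<ChatRequestMessage>(sdkMessage));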

@LittleLittleCloud changed the title from "[.Net] Propose a new message abstraction to support multi-modal/parallel function call && potential breaking change from openai api" to "[.Net] Bring Semantic-Kernel to AutoGen && Propose a new message abstraction to support multi-modal/parallel function call && potential breaking change from openai api" on Feb 14, 2024
{
public static async Task RunAsync()
{
var openAIKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? throw new Exception("Please set OPENAI_API_KEY environment variable.");
Collaborator Author:

@matthewbolanos Any suggestions for making a better example with AutoGen + Semantic Kernel?

{
// might be a plain text return or a function call return
var msg = reply.First();
if (msg is OpenAIChatMessageContent oaiContent)
Collaborator Author:

@matthewbolanos and @SergeyMenshykh Could you review the conversion from AutoGen messages to SK message content here?

@LittleLittleCloud added the AutoGen.Net label (issues related to AutoGen.Net) on Feb 28, 2024
}
else
{
throw new InvalidOperationException("Unsupported return type, multiple messages are not supported.");
Member:

Actually, the chat completion service can return many messages. By default it returns one, but it can be configured to return many choices/completions, each of which will be represented by a separate chat message content. This behavior is controlled by the ResultsPerPrompt property of the OpenAIPromptExecutionSettings class.

This example demonstrates how to configure it to return 2 completions - Example36_MultiCompletion
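
A minimal configuration sketch of that behavior, assuming the Semantic Kernel OpenAI connector of that time (chatService stands in for any IChatCompletionService, e.g. OpenAIChatCompletionService):

using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    // ask the service for two choices/completions per prompt
    ResultsPerPrompt = 2,
};

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("Summarize AutoGen in one sentence.");

// with ResultsPerPrompt = 2 the returned list contains two ChatMessageContent items
var completions = await chatService.GetChatMessageContentsAsync(chatHistory, settings);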

{
// might be a plain text return or a function call return
var msg = reply.First();
if (msg is OpenAIChatMessageContent oaiContent)
Member:

Handling only the OpenAIChatMessageContent message will not allow this agent to work with other existing or future non-OpenAI implementations of the IChatCompletionService interface, for example AzureOpenAIWithDataChatMessageContent.cs.

Additionally, referencing the OpenAIChatMessageContent type in the agent unnecessarily couples it to OpenAI messages. This is acceptable in the short term for tool-calling purposes; however, in the long term, the agent's dependency on LLM-specific messages should be dropped in favor of an LLM-agnostic model for tool calling (which we plan to start working on soon).

So, I suggest slightly changing the message handling logic:

if (msg is OpenAIChatMessageContent oaiContent && oaiContent.ToolCalls is { Count: > 0 } toolCalls)
{
    // get the tool call info only (id, name, arguments)
    // and return a new function/tool-call message built from it
}

foreach (var content in msg.Items)
{
    switch (content)
    {
        case TextContent textContent:
            // map to a text message
            break;
        case ImageContent imageContent:
            // map to an image message
            break;
    }
}

{
return new Message(Role.Assistant, content, this.Name);
}
else if (oaiContent.ToolCalls is { Count: 1 } && oaiContent.ToolCalls.First() is ChatCompletionsFunctionToolCall toolCall)
Member:

  1. An LLM can easily request more than one tool call. Is there any reason to handle only the first one?
  2. Just FYI: tool calling is controlled by the ToolCallBehavior property. If the auto behavior is configured, the tool(s) will be called automatically behind the scenes, with no tool-calling messages returned by the GetChatMessageContentsAsync method. This example demonstrates this in action: AgentAutoFunctionInvocationAsync. To get the GetChatMessageContentsAsync method to return an OpenAIChatMessageContent with a populated ToolCalls collection, the behavior should be set to manual, as demonstrated in this example: AgentManualFunctionInvocationAsync. (A minimal settings sketch follows this list.)
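
For context, a minimal sketch of the two modes with the SK OpenAI connector of that time (only the execution settings differ):

using Microsoft.SemanticKernel.Connectors.OpenAI;

// auto: SK invokes the requested tools behind the scenes and returns the final answer
var autoSettings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
};

// manual: GetChatMessageContentsAsync returns an OpenAIChatMessageContent whose
// ToolCalls collection is populated, and the caller decides how to invoke them
var manualSettings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
};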

Collaborator Author:

@SergeyMenshykh I just couldn't find an example that processes more than one tool call, so I don't know how to implement the conversion from Semantic Kernel to AutoGen. Could you share an example of multi-tool usage in Semantic Kernel?

/// <summary>
/// The agent that integrates with the Semantic Kernel.
/// </summary>
public class SemanticKernelAgent : IStreamingAgent
Member:

BTW: We are already working on ChatCompletionAgent in the feature-agents-abstraction branch. Streaming is in progress as well - .Net: [Agents] Abstraction for streaming

@ekzhu requested a review from BeibinLi on March 3, 2024 at 03:40
{
this.Role = role;
this.From = from;
this.Url = url;
Collaborator:

Is this the URL of the image for this message? Would a message contain more than one image?

Collaborator Author:

Yes, and for multi-image situations, MultiModalMessage is the message to use.

@ekzhu (Collaborator) left a comment:

Great PR. Perhaps it is time to think about the next steps, in addition to documentation.

@LittleLittleCloud merged commit e12a824 into dotnet on Mar 3, 2024
8 checks passed
@LittleLittleCloud deleted the u/xiaoyun/dotnet/message branch on May 2, 2024 at 00:21