
agent that both proposes and executes tools #2223

Open
sonichi opened this issue Mar 31, 2024 · 23 comments
Labels
function/tool suggestion and execution of function/tool call nested-chat nested chat and society of mind agent

Comments

@sonichi
Collaborator

sonichi commented Mar 31, 2024

I like the concept of Autogen and I'd like to use a couple of its features. But currently I just need a simple tool executor.

I notice that Autogen leans heavily on the concept of generating code and executing it. Does it not have a simple way of executing a tool directly after it has been "selected", basically the same as OpenAI function calling? Does it use function calling at all?

I ask because, for my use case, it's sometimes enough to just select the correct tool; there's no need for extra checks, etc.

Originally posted by @WebsheetPlugin in #2208

Suggestion: Create a reply function `propose_and_execute_tools_nested_reply` which uses a nested chat between an AssistantAgent and a UserProxyAgent with human_input_mode="NEVER".

cc @qingyun-wu

@sonichi sonichi added the function/tool suggestion and execution of function/tool call label Mar 31, 2024
@shippy
Collaborator

shippy commented Mar 31, 2024

Doesn't this work already? I frequently use register_function with the same agent assigned to both caller and executor; the latter is a little misleadingly named, because such an agent doesn't even need a code-execution configuration to use said tool.
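A minimal stand-alone sketch of the pattern shippy describes, where one agent holds both the tool schema (to propose calls) and the callable (to execute them). The class below is a hypothetical toy, not the AutoGen API; in AutoGen 0.2 the equivalent wiring would be roughly `register_function(f, caller=agent, executor=agent)`.

```python
# Toy stand-in (not AutoGen) for a single agent that both proposes and
# executes its own tool calls.

class SoloToolAgent:
    def __init__(self, name: str):
        self.name = name
        self.function_map = {}  # tool name -> callable

    def register_tool(self, func):
        # the same agent keeps both the proposal side and the execution side
        self.function_map[func.__name__] = func

    def propose(self, tool_name: str, **kwargs):
        # what the LLM side would emit as a tool-call message
        return {"tool_calls": [{"function": {"name": tool_name, "arguments": kwargs}}]}

    def execute(self, message):
        # the "executor" side: run the proposed call from its own function_map
        call = message["tool_calls"][0]["function"]
        return self.function_map[call["name"]](**call["arguments"])


def add(a: int, b: int) -> int:
    return a + b

agent = SoloToolAgent("solo")
agent.register_tool(add)
proposal = agent.propose("add", a=2, b=3)
print(agent.execute(proposal))  # → 5
```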

@GeorgSatyros

GeorgSatyros commented Apr 6, 2024

It works in an unorthodox way. In your case the same agent would be selected to speak twice, given that it is the only one with access to the tool in question. So assuming the default groupchat flexibility in agent selection, it works. The moment you start dictating agent order, though, it will fail. I have implemented a workaround to this issue with Society Of Mind agents (a single agent that is composed of multiple under the hood). One SoM agent is called that under the hood calls a tool caller and then a tool executor, returning the result. I could make a PR with it, but I feel a more integrated solution to this would be preferable as SoM is experimental.

@sonichi
Collaborator Author

sonichi commented Apr 6, 2024

@shippy it works in a group chat, but it takes two messages in the group chat for tool proposal and tool execution. @WebsheetPlugin probably wants to encapsulate tool proposal and execution inside one agent's inner conversation and return only a single message with the result to the outer chat.

@GeorgSatyros thanks for sharing your experience. SoM Agent is experimental, while nested chat is in the core library. I suggest using nested chat to implement a new agent with the same functionality: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns#nested-chats
It should be very easy to use a single register_nested_chat to implement it. Would you like to give it a try?
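A plain-Python toy illustration (not the AutoGen API) of what the suggested nested chat would do: the inner proposer/executor exchange happens out of sight, and only one summary message reaches the outer chat. In AutoGen itself this would presumably be wired up with `register_nested_chats` and a summary method, per the tutorial linked above.

```python
# Toy sketch of the nested-chat idea: propose and execute inside an inner
# exchange, return a single message to the outer conversation.

def propose_and_execute_nested(task: dict, function_map: dict) -> dict:
    # inner turn 1: the AssistantAgent proposes a tool call
    proposal = {"name": task["tool"], "arguments": task["args"]}
    # inner turn 2: the UserProxyAgent (human_input_mode="NEVER") executes it
    result = function_map[proposal["name"]](**proposal["arguments"])
    # only this single summary message reaches the outer chat
    return {"role": "assistant", "content": f"{proposal['name']} returned {result}"}

reply = propose_and_execute_nested(
    {"tool": "add", "args": {"a": 2, "b": 3}},
    {"add": lambda a, b: a + b},
)
print(reply["content"])  # → add returned 5
```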

@sonichi sonichi added the nested-chat nested chat and society of mind agent label Apr 6, 2024
@shippy
Collaborator

shippy commented Apr 7, 2024

Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)

@GeorgSatyros
Copy link

> Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)

@shippy I could have definitely elaborated more above, so let me fix that!
Basically, assuming you are using the speaker_selection_method parameter in group chats and not overriding/shadowing the speaker-selection method itself, the system should work.
But the core of the problem is this: under the hood, the system will not actually follow your defined transitions. If you want agent_A->tool_agent_B->agent_C, the actual order will be agent_A->tool_agent_B->tool_agent_B->agent_C. This may be a non-issue for projects with looser requirements on agent order, but for other projects it will be a problem. It is also a relatively unintuitive and inexplicit pattern, as is evident in how one may disallow agents speaking twice through the allow_repeat_speaker flag and the system will ignore that when it comes to tool execution. Here's the relevant code snippet for the above as reference: snippet
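The reordering GeorgSatyros describes can be simulated in a few lines (a toy model, not AutoGen internals): a tool-proposing turn is always followed by an extra tool-executing turn by the same agent.

```python
# Toy simulation of the group-chat speaker order when an agent proposes a
# tool call it can also execute: a declared A -> B -> C path plays out as
# A -> B -> B -> C.

def effective_order(declared_order, tool_agents):
    actual = []
    for agent in declared_order:
        actual.append(agent)
        if agent in tool_agents:
            actual.append(agent)  # extra turn to execute its own tool call
    return actual

print(effective_order(["agent_A", "tool_agent_B", "agent_C"], {"tool_agent_B"}))
# → ['agent_A', 'tool_agent_B', 'tool_agent_B', 'agent_C']
```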

@sonichi Sure, I was looking for an excuse to dive deeper into nested chats anyway! Is there any planned support for SoM going forward or will the "agent composed of agents" niche be fulfilled by nested chats? Asking because I was considering contributing to SoM and that effort may be better spent on the core component instead.

@ChristianWeyer
Collaborator

> @shippy it works in group chat while it takes two messages in the group chat for tool proposal and tool execution. @WebsheetPlugin probably wants to encapsulate tool proposal and execution inside one agent's inner conversation and only returns a single message of the result to the outer chat.
>
> @GeorgSatyros thanks for sharing your experience. SoM Agent is experimental, while nested chat is in the core library. I suggest using nested chat to implement a new agent with the same functionality: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns#nested-chats It should be very easy to use a single register_nested_chat to implement it. Would you like to give it a try?

Would be great to have a sample for this 👍🏼.

@ChristianWeyer
Collaborator

> > @shippy it works in group chat while it takes two messages in the group chat for tool proposal and tool execution. @WebsheetPlugin probably wants to encapsulate tool proposal and execution inside one agent's inner conversation and only returns a single message of the result to the outer chat.
> > @GeorgSatyros thanks for sharing your experience. SoM Agent is experimental, while nested chat is in the core library. I suggest using nested chat to implement a new agent with the same functionality: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns#nested-chats It should be very easy to use a single register_nested_chat to implement it. Would you like to give it a try?
>
> Would be great to have a sample for this 👍🏼.

I guess we already have it ☺️
https://microsoft.github.io/autogen/docs/notebooks/agentchat_nested_chats_chess

@sonichi
Collaborator Author

sonichi commented Apr 15, 2024

@GeorgSatyros it'll be great if you could make a PR to reimplement the SoM agent using nested chat. It'll be easier to maintain. The current SoM agent can retire after feature parity.

@GeorgSatyros

@sonichi I agree, that would be a more graceful solution than deprecation. Will be opening a PR with a solution as soon as my schedule allows!

@WebsheetPlugin
Collaborator

> > Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)
>
> @shippy I could have definitely elaborated more above, so let me fix that! Basically, assuming you are using the speaker_selection_method parameter in group chats and not overriding/shadowing the speaker_selection method itself, then the system should work. But the core of the problem is this: The system will actually, under the hood, not follow your defined transitions. If you want agent_A->tool_agent_B->agent_C, the actual order will be agent_A->tool_agent_B->tool_agent_B->agent_C. This could be a non-issue for many projects that have looser requirements on agent order, but for other projects, it will be a problem. It is also a relatively unintuitive and inexplicit pattern, as is evident in how one may disallow agents speaking twice through the allow_repeat_speaker flag and the system will ignore that when it comes to tool execution. Here's the relevant code snippet for the above as reference: snippet
>
> @sonichi Sure, I was looking for an excuse to dive deeper into nested chats anyway! Is there any planned support for SoM going forward or will the "agent composed of agents" niche be fulfilled by nested chats? Asking because I was considering contributing to SoM and that effort may be better spent on the core component instead.

Yes, I use speaker selection to decide the order of Agents.
And for me it seemed unintuitive to spend a round here just to execute the tool with the same agent again, as I already had some logic to "select" the next agent in order. So my solution was: if the last message is a tool selection, just allow the same agent (the one with the tool attached) to speak again.

But again, all of this seemed unintuitive to me. It's still not clear why another agent should execute the tool, or why it should not be executed at all once selected.

Is there a case where you want to select a tool by Agent A and execute it by Agent B or C or not execute it at all?

@sonichi
Collaborator Author

sonichi commented Apr 18, 2024

> > Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)
>
> > @shippy I could have definitely elaborated more above, so let me fix that! Basically, assuming you are using the speaker_selection_method parameter in group chats and not overriding/shadowing the speaker_selection method itself, then the system should work. But the core of the problem is this: The system will actually, under the hood, not follow your defined transitions. If you want agent_A->tool_agent_B->agent_C, the actual order will be agent_A->tool_agent_B->tool_agent_B->agent_C. This could be a non-issue for many projects that have looser requirements on agent order, but for other projects, it will be a problem. It is also a relatively unintuitive and inexplicit pattern, as is evident in how one may disallow agents speaking twice through the allow_repeat_speaker flag and the system will ignore that when it comes to tool execution. Here's the relevant code snippet for the above as reference: snippet
> > @sonichi Sure, I was looking for an excuse to dive deeper into nested chats anyway! Is there any planned support for SoM going forward or will the "agent composed of agents" niche be fulfilled by nested chats? Asking because I was considering contributing to SoM and that effort may be better spent on the core component instead.
>
> Yes, I use speaker selection to decide the order of Agents. And for me, it seemed unintuitive to pass a round here just to execute the tool by the same agent again. As I had some logic to already "select" the next agent in order. So my solution was that if the last message is tool selection just allow the same agent again, which had the tool attached.
>
> But again, for me, all this seemed unintuitive. And it's still not clear to me why another agent should execute the tool. Or why it should not be executed if selected.
>
> Is there a case where you want to select a tool by Agent A and execute it by Agent B or C or not execute it at all?

For example, it makes it possible for Agent B to perform extra conversations with other agents or humans before executing.

@WebsheetPlugin
Collaborator

Ahhh, now I get it. I am actually counting each token twice :) so this would not be my use case, but finally I understand it. Makes sense.

Just an observation: I feel that Autogen is geared toward use cases where many tokens are being used, and it kind of leaves out simpler use cases like the one mentioned above.

@sonichi
Collaborator Author

sonichi commented Apr 18, 2024

> Ahhh, now I get it. I am actually counting each token twice :) so this would not be my use case, but finally I understand it. Makes sense.
>
> Just an observation: I feel that Autogen is geared toward use cases where many tokens are being used, and it kind of leaves out simpler use cases like the one mentioned above.

I think the self-executing agent based on nested chat is what you need. What do you think?

@ChristianWeyer
Collaborator

> > Ahhh, now I get it. I am actually counting each token twice :) so this would not be my use case, but finally I understand it. Makes sense.
> > Just an observation: I feel that Autogen is geared toward use cases where many tokens are being used, and it kind of leaves out simpler use cases like the one mentioned above.
>
> I think the self-executing agent based on nested chat is what you need. What do you think?

Can you point us to docs or a sample for this @sonichi ? Thx.

@ekzhu
Collaborator

ekzhu commented Apr 20, 2024

@ChristianWeyer see https://microsoft.github.io/autogen/docs/notebooks/agentchat_nested_chats_chess

@ChristianWeyer
Collaborator

> @ChristianWeyer see https://microsoft.github.io/autogen/docs/notebooks/agentchat_nested_chats_chess

OK, the same one I linked to above 🤓 - thx!

@sonichi
Collaborator Author

sonichi commented Apr 21, 2024

Maybe we can make a special agent class that does self-execution using nested chat out of the box.

@scruffynerf

scruffynerf commented Jun 19, 2024

By default, tool calls are sent onward to be processed. The alternative (non-nested) is to catch the tool calls before send, run them, and add the results back to the original message. Nested chat is cleaner, but it is not the minimal case.

The only negative is if you want the agent to loop: invoke tool, react to tool results, invoke tool, react to tool results...
I'd be cautious of doing that without a way for some other agent/user to have a say in there, or you risk a runaway agent.

But "invoke tool, add the tool results to the message that invoked them (clearing the tool calls from the message as processed so the next agent won't redo them), and let someone else have a turn" could be the true minimal 'self-tool-executing' agent.
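The minimal self-tool-executing step described above can be sketched in plain Python (a hypothetical helper, not AutoGen code): run any tool_calls carried by an outgoing message, fold the results into its content, and drop tool_calls so the next agent won't redo them.

```python
# Sketch: execute a message's own tool_calls and return a single clean message.
import json

def self_execute(message: dict, function_map: dict) -> dict:
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # OpenAI-style JSON-encoded arguments
        results.append(f"{fn['name']} -> {function_map[fn['name']](**args)}")
    content = message.get("content") or ""
    if results:
        content = "\n".join(([content] if content else []) + results)
    # tool_calls intentionally omitted from the result: they are now processed
    return {"role": message["role"], "content": content}

msg = {
    "role": "assistant",
    "content": "Checking availability.",
    "tool_calls": [{"function": {"name": "next_free_date", "arguments": "{}"}}],
}
out = self_execute(msg, {"next_free_date": lambda: "7/1/2024"})
print(out["content"])
```

In AutoGen 0.2, one plausible place to attach logic like this is the 'process_message_before_send' hook that scruffynerf mentions later in the thread.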

@GeorgSatyros

@scruffynerf that is exactly what I'm trying to do at the moment. It's quite easy if you are only interested in the tool output being within the "context" of the response. But adding and executing proper "tool_calls" in the message is much trickier, as OpenAI effectively requires two messages in the chat history per tool execution. As such, you would need an agent that injects multiple messages into the conversation per call, and so they cannot be included in a single "send". That is probably why a nested chat may be the more graceful solution in that case, where two agents do the execution under the hood instead.
I am still working on a good, generic solution to this, as it has been quite a persistent thorn in my team's side.
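What "two messages per tool execution" means concretely in the OpenAI chat-completions protocol: the assistant message carrying tool_calls, and a matching role="tool" result message linked by tool_call_id. The ids and values below are made up for illustration.

```python
# The two history entries OpenAI expects for a single tool execution.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",  # made-up id for illustration
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }],
}
tool_msg = {
    "role": "tool",
    "tool_call_id": "call_abc123",  # must match the id above
    "content": '{"temp_c": 18}',
}
# Both entries have to land in the history together, which is why a single
# "send" from one agent cannot carry a complete tool execution.
history = [assistant_msg, tool_msg]
```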

@scruffynerf

scruffynerf commented Jun 20, 2024

> that is exactly what I'm trying to do at the moment. Quite easy if you are only interested in the tool output being within the "context" of the response. But adding and executing proper "tool_calls" into the message is much trickier as openai effectively requires 2 messages in the chat history per tool execution. As such you would need an agent that injects multiple messages in the conversation per call, and so they cannot be included in a "send". That is probably why a nested chat may be the more graceful solution in that case, where two agents are doing the execution under the hood instead. I am still working on a good, generic solution to this as it has been a quite persistent thorn in my team's side.

The method I'm using for 'toolsfortoolless' (adding tool response processing in pre/post API for models/services that don't support tool_calls) can be used for this, if it fits your use case. Unsure.

#2966 (comment)
is my flow of the parts (I haven't committed my PR yet, soon I hope, still tweaking it)

In it, I hook 'process_message_before_send' to add the tool_calls. You could do the same to process the tool calls, that is, to self-execute them, and remove the tool_call so nobody else sees it.
Yes, you then have multiple messages (the content, if any, and the results of the tool calls), and then you can merge the results into ONE complete message (which I do in the fix_messages part of the flow, and which might need to be self-standing to allow other LLMs in the chat to handle the message flow in their templates).

While the OpenAI API has a trained-in 'tool call' OR 'content' behavior, they admit it's not actually a hardcoded OR, just trained in. Officially, the spec does allow for both at once; I linked to a discussion saying so elsewhere. So you could even prompt it to do something like this:

LLM chatting here, explaining and rambling on why and what it will do...
[tool_call id or other way to know where tool call results belong, generated by LLM itself]
and then when important thing happened, that date matters:
[Insert tool use: searchtool(argument'date of important thing')]
etc etc etc.

or maybe you want 2 known tool_calls merged, e.g. the prompt:
'call the 'next free date' tool to get an available date, and reserve it.
Also use the tool that books catered lunches, but set the date to 1/1/1970'

and then you get two tool_calls, and process the 'next free date' function, which returns
{'next unscheduled day': "7/1/2024", 'Status': "now reserved"}
and catch the second tool call, drop in the date, and THEN process it, and now you have a lunch delivery on 7/1/2024. You delete the tool_calls and insert text into the final message before send to indicate the results:

Next free date: 7/1/2024
Lunch booked for 7/1/2024, confirmation number XJ234

And THAT text would be what everyone including the calling LLM would see, and it would look like that LLM did as asked. Seamlessly self-executed.

That would otherwise have been at least 2-3 LLM calls:
request asking to reserve a date and book a lunch
reply to use the reservation tool
request back with the date now reserved (tool processed)
reply back to use the lunch tool with the date
(optional but likely) request back to confirm lunch is scheduled (tool processed)
reply back "OK, on 7/1/2024, you are now booked and lunch will be delivered"

As I said, NOT ideal, but it could be a huge saving if you can cut LLM calls in half, or more, for very fixed processes you can predict and merge together without the LLM managing it.

It's not ideal; the call/response method of even a nested chat with an LLM-less tool-executor proxy allows the LLM to take the results and pretty them up, but yes, it's extra calls back and forth. If you don't need that, you can short-circuit it.

@scruffynerf

I may even add an option to add 'self-executing' for toolsfortoolless, just to see how it works.

@scruffynerf

> I may even add an option to add 'self-executing' for toolsfortoolless, just to see how it works.

Actually, mulling it over, making this a separate capability makes more sense. It's far easier to add 3-4 capabilities than to have one with stuff you don't want, even if they all have flags to disable them.

@CallumMcMahon

CallumMcMahon commented Jul 7, 2024

I've been trying the state-transitions feature and wanted to keep the graph as simple as possible, keeping possible transitions to 1 for much of the graph. My thoughts were:

  • avoid encoding the possibility of calling the same agent twice in the graph (which I believe would require an LLM call each time to decide how to proceed, and would make the transition LLM-dependent, so unreliable);
  • add an extra "tool execution" node after each tool-calling agent. This works, but would mean programmatically augmenting the transition graph, making it harder to follow where you are in the graph.

I came up with this solution

from autogen import Agent, GroupChat

# where applicable, make the same agent able to both invoke and execute the same function
agent.register_for_execution()(function)
agent.register_for_llm()(function)

def state_transition(last_speaker: Agent, groupchat: GroupChat):
    messages = groupchat.messages
    # if the last message proposes a tool that the last speaker itself can
    # execute, let the same agent speak again to run it
    if "tool_calls" in messages[-1]:
        called = messages[-1]["tool_calls"][0]["function"]["name"]
        if called in last_speaker.function_map:
            return last_speaker
    # otherwise fall back to the built-in "auto" selection
    return "auto"

groupchat = GroupChat(
    ...
    allowed_or_disallowed_speaker_transitions=allowed_transitions,
    speaker_transitions_type="allowed",
    speaker_selection_method=state_transition,
)

It's

  • not a solution for the simple single-agent case, since it leverages the group-chat system
  • hard-coding the fallback agent selection to a specific value (in this case, "auto")
  • something that could also check for both tool_calls and function_call modes

I'm new to autogen, but I would be hesitant to try nested chats given the big jump in complexity of managing state for web apps. This solution works well for me, keeping complexity low for both function calling and managing messages.
Let me know if I'm missing anything obvious! Thanks
