[PROPOSAL] Communication structure #474
Closed
JanPokorny started this conversation in Ideas
Replies: 1 comment
Given that ACP already formed some opinion on the communication, I think it's fair to close this discussion. More info available in the new docs page: https://agentcommunicationprotocol.dev/introduction/welcome.
Communication structure
Motivation
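Communication with a modern LLM agent is quite complex. While the most basic use case is "send request, receive reply", we quickly run into more complex scenarios, such as those covered in the examples below: function calls executed on the client, "human in the loop" confirmation, interrupting an agent mid-response, and long-running agents that report progress.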
This proposal aims to provide a general framework for communication with agents, without imposing a rigid structure on how the agents are supposed to work.
Proposal
Data structures
We define these data structures:
`idle` state, awaiting user message.

It needs to be understood that these data structures are quite "virtual" / "abstract". They do not correspond directly to any physical communication channels, as these will always have limitations and protocol-related details, like refused messages, timeouts, etc. A "Run" can be thought of as a shared/distributed data structure which is kept in sync between the two parties using some sort of communication channel (which is not defined in this proposal).
This is how we define the types:
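A minimal sketch of what such types could look like, inferred only from the JSON examples in this proposal and from the statement that each run starts with no messages in the `idle` state. The class names `PartSchema`, `MessageSchema` and `Message`, the `parts` field, and the snake_case spellings are assumptions made for illustration, not definitions from the proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PartSchema:
    """One part a message may carry (mirrors an entry of "schema" in the JSON)."""
    content_type: str            # "contentType" in the JSON, e.g. "text/plain"
    required: bool
    name: Optional[str] = None   # "name" in the JSON, e.g. "/sources/*"


@dataclass(frozen=True)
class MessageSchema:
    """A message that is valid in some state, and the state it leads to."""
    party: str                   # "client" or "server"
    type: str                    # e.g. "user_message"
    schema: Tuple[PartSchema, ...]
    next_state: str              # "nextState" in the JSON


# Maps each state name to the messages that are valid in that state.
CommunicationSchema = Dict[str, List[MessageSchema]]


@dataclass(frozen=True)
class Message:
    """A concrete message in a run; `parts` is a hypothetical payload container."""
    party: str
    type: str
    parts: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Run:
    """Shared run: starts with no messages and in the `idle` state."""
    state: str = "idle"
    messages: List[Message] = field(default_factory=list)


# The simple chat agent schema expressed with these types.
text_part = PartSchema(content_type="text/plain", required=True)
chat_schema: CommunicationSchema = {
    "idle": [MessageSchema("client", "user_message", (text_part,), "running")],
    "running": [MessageSchema("server", "agent_message", (text_part,), "idle")],
}
```

Keeping `CommunicationSchema` a plain mapping from state name to allowed messages matches the shape of the JSON examples one-to-one, so a schema could be loaded from JSON without any restructuring.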
High-level overview
The client and server (caller and agent) both know the `CommunicationSchema`. It's understood that each `Run` starts with no messages and in the `idle` state. The `idle` state of the `CommunicationSchema` then defines which messages are valid to send in that state, essentially starting the conversation.

A simple agent may just switch between the `idle` and `running` states, but more complex agents may have many more possible states, like `done`, `waiting_for_user`, `waiting_for_function_call`, etc. The idea is that at each point in the conversation, only some messages are valid. For example, when waiting for "human in the loop" confirmation, the only valid message is one that confirms or denies the request, and not e.g. another query for the agent.

Examples
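As a first illustration, the idea that only some messages are valid in each state can be sketched as a small lookup over a schema. The schema below is hypothetical: the `waiting_for_user` state name comes from the overview, but the message types `confirmation_request` and `confirmation`, the helper `next_state`, and the omission of the per-part `schema` lists are assumptions made to keep the sketch short.

```python
from typing import Any, Dict, List, Optional

Schema = Dict[str, List[Dict[str, Any]]]

# Hypothetical schema with a "human in the loop" step. The message types
# "confirmation_request" and "confirmation" are made up for this sketch.
HITL_SCHEMA: Schema = {
    "idle": [
        {"party": "client", "type": "user_message", "nextState": "running"},
    ],
    "running": [
        {"party": "server", "type": "agent_message", "nextState": "idle"},
        {"party": "server", "type": "confirmation_request",
         "nextState": "waiting_for_user"},
    ],
    "waiting_for_user": [
        {"party": "client", "type": "confirmation", "nextState": "running"},
    ],
}


def next_state(schema: Schema, state: str, party: str, msg_type: str) -> Optional[str]:
    """Return the state a message leads to, or None if it is not valid in `state`."""
    for allowed in schema.get(state, []):
        if allowed["party"] == party and allowed["type"] == msg_type:
            return allowed["nextState"]
    return None


# While waiting for confirmation, another user query is rejected...
print(next_state(HITL_SCHEMA, "waiting_for_user", "client", "user_message"))  # None
# ...but a confirm/deny message is accepted and resumes the agent.
print(next_state(HITL_SCHEMA, "waiting_for_user", "client", "confirmation"))  # running
```

Because validity depends only on the current state, the sending party and the message type, both sides can run the same check locally before a message is ever put on the wire.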
Chat agent
This is a communication schema for a simple chat agent. It has two states: `idle` and `running`. In the `idle` state, the agent can receive a message from the user. In the `running` state, the agent generates a response and sends it back.

```json
{
  "idle": [
    {
      "party": "client",
      "type": "user_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "running"
    }
  ],
  "running": [
    {
      "party": "server",
      "type": "agent_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "idle"
    }
  ]
}
```

Researcher agent
Researcher agent does not support long-lasting conversations, so after providing the reply, it transitions to an empty `done` state.

```json
{
  "idle": [
    {
      "party": "client",
      "type": "user_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "running"
    }
  ],
  "running": [
    {
      "party": "server",
      "type": "agent_message",
      "schema": [
        { "contentType": "text/plain", "required": true },
        { "name": "/sources/*", "contentType": "text/x-uri", "required": false }
      ],
      "nextState": "done"
    }
  ],
  "done": []
}
```

Function-calling agent
Function-calling means executing code on the client. This agent acts like a chat agent, but has a special state for function calls, which request the client to execute a function with the given arguments. The client must respond with the result of the function call in order for the agent to continue.
```json
{
  "idle": [
    {
      "party": "client",
      "type": "user_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "running"
    }
  ],
  "running": [
    {
      "party": "server",
      "type": "agent_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "idle"
    },
    {
      "party": "server",
      "type": "function_call",
      "schema": [
        { "name": "/function", "contentType": "application/json", "required": true }
      ],
      "nextState": "running"
    }
  ]
}
```

Interruptable chat agent
This agent can be interrupted by the user while generating a message.
```json
{
  "idle": [
    {
      "party": "client",
      "type": "user_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "running"
    }
  ],
  "running": [
    {
      "party": "server",
      "type": "agent_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "idle"
    },
    {
      "party": "client",
      "type": "cancel",
      "schema": [],
      "nextState": "idle"
    }
  ]
}
```

Long-running agent
A long-running agent still needs to be started up by a client message, but then it can run indefinitely and report information about its progress.
```json
{
  "idle": [
    {
      "party": "client",
      "type": "user_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "running"
    }
  ],
  "running": [
    {
      "party": "server",
      "type": "agent_message",
      "schema": [{ "contentType": "text/plain", "required": true }],
      "nextState": "running"
    }
  ]
}
```

SDK support
Given that we have defined a shared / distributed data structure, the problem of communication reduces to synchronizing that data structure. The SDK would support this over several protocols (HTTP SSE, WebSocket, etc.), so that the caller could use the best protocol for the given situation, while the agent would not have to implement all of them. From the agent's implementation point of view, the SDK would do the heavy lifting, and the agent would be directly provided with the run messages and state.
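To make the "synchronization of a shared data structure" view concrete, here is a rough sketch in which each party holds its own replica of a run and catches up by replaying messages it has not seen. Everything in it is an assumption for illustration: `RunReplica`, `apply` and `sync_from` are hypothetical names, the direct call to `sync_from` stands in for whatever transport an SDK would use, the `text` key is a made-up payload field, and the per-part `schema` lists are left out for brevity.

```python
from typing import Any, Dict, List

Schema = Dict[str, List[Dict[str, Any]]]

# The simple chat agent schema, without the per-part "schema" lists.
CHAT_SCHEMA: Schema = {
    "idle": [{"party": "client", "type": "user_message", "nextState": "running"}],
    "running": [{"party": "server", "type": "agent_message", "nextState": "idle"}],
}


class RunReplica:
    """One party's local copy of a run: current state plus ordered messages."""

    def __init__(self, schema: Schema) -> None:
        self.schema = schema
        self.state = "idle"
        self.messages: List[Dict[str, Any]] = []

    def apply(self, message: Dict[str, Any]) -> None:
        """Append a message if the schema allows it in the current state."""
        for allowed in self.schema.get(self.state, []):
            if (allowed["party"], allowed["type"]) == (message["party"], message["type"]):
                self.messages.append(message)
                self.state = allowed["nextState"]
                return
        raise ValueError(
            f"{message['type']!r} from {message['party']!r} "
            f"is not valid in state {self.state!r}"
        )

    def sync_from(self, other: "RunReplica") -> None:
        """Catch up by replaying the messages this replica has not seen yet."""
        for message in other.messages[len(self.messages):]:
            self.apply(message)


client, server = RunReplica(CHAT_SCHEMA), RunReplica(CHAT_SCHEMA)

client.apply({"party": "client", "type": "user_message", "text": "Hi"})
server.sync_from(client)  # stand-in for a real transport (HTTP SSE, WebSocket, ...)

server.apply({"party": "server", "type": "agent_message", "text": "Hello!"})
client.sync_from(server)
```

After the two exchanges both replicas hold the same two messages and are back in the `idle` state. The agent code only ever touches its own replica, which is the sense in which the transport could be swapped without the agent noticing.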
Let's discuss
To reemphasize, this proposal only provides a general communication framework over a shared "run" data structure, not a way to map messages to actual communication channels like HTTP / WebSocket etc.