Define client/agent protocol #1652

clementguidi · 2023-03-20T13:59:24Z

Let's define the communication protocol between the client and the agent.

From previous discussions (see #1643), and upcoming pull requests (#1644, #1645, #1646, #1647 ATM), we can outline the following points.

Context
- Connection handling
Protocol phases
Internal definition
Message types
Debug info
- Agent output
- Client output

Context

When the agent is started (--agent/-g) in libmcount, it can receive messages from a client at runtime. We're introducing new features in the agent, so it can modify tracing parameters e.g. tracing state (on/off), tracing depth and triggers, but also patch and unpatch the running process.

Connection handling

Currently, the agent can sequentially handle multiple client connections. The next one is processed when the current one is terminated.

Some operations like patching are heavy and require some time. Any incoming connection will hang until heavy operations are achieved.

The agent will support many independent features e.g. changing tracing depth and patching, which have no reason to be done in a particular order.

Do we need to handle multiple concurrent connections? → Not for now

Protocol phases

The aim of the message exchange is to ultimately forward options from a client to the agent. To make this robust, we introduce control phases as follows.

Handshake phase (once)

The handshake phase initializes the connection.

The client checks is the agent is ready and operational
The client checks the agent's capabilities (supported features)

If the handshake fails, the client aborts.

Data exchange phase (repeated)

When the handshake succeeds, the client and the agent can exchange data.

The client is aware of the capabilities of the agent.

Client to agent

The client can instruct the agent to apply user options, as mentioned earlier.

The client skips actions that the agent is known not to support.

Agent to client

The client can fetch info from the agent. The agent can send requested info about its state (e.g. parameter values) to the agent.

Validation phase (repeated)

After exchanging data, the client checks the status of the agent. The agent indicates whether it succeeded or failed, giving a reason in case of failure (error code).

Internal definition

Message type

We use an enumeration to define the agent messages types.

Do we need to define a structure that is holding agent data? → No, use simple data types - e.g. integer for depth or string for filter

Agent capabilities

The agent supports given features. Currently they are defined as bit-shifted flags.

You suggested to return a list of supported features. I am not sure what you mean regarding the list.

How do we define agent features? → Use bit-shifted flags, even if it limits the number of values. Easy to set and check

For example, a capability query would return

FEATURE1 | FEATURE2 | ... if we use bit-shifted flags
{FEATURE1} -> {FEATURE2} -> ... if we use a list

Naming

We use the UFTRACE_AGENT_MSG_ prefix for messages.

This is similar to the following definition:

enum uftrace_msg_type {
    UFTRACE_MSG_REC_START = 1,
    UFTRACE_MSG_REC_END,
    [...]
};

Using abbreviations to make names shorter is confusion IMO.

"Each message should handle variable-length data then."

Message types

Protocol control

We define two messages to handle the state of the protocol:

A handshake message which queries agent capabilities
A close message which terminates the connection

State query

We define a GET_OPT message for the client to fetch info about the agent.

It holds an option name as data. The response holds the value as data.

Option forwarding

Similarly, we define a SET_OPT message to apply user options in the agent.

The agent supports multiple options and related data types. Partial definition of supported options:

struct uftrace_opts {
    char *trigger;
    char *args;
    char *patch;
    int depth;
    uint64_t threshold;
    enum uftrace_pattern_type patt_type;
    enum uftrace_trace_state trace;
};

The options are sent as the data load of the SET_OPT message.

Error checking

After every operation, the agent returns a status message. It sets an error code depending on the situation:

success: the operation succeeded
failure: the agent encountered an error
not supported: the requested action is not applicable

If the agent doesn't answer, it is up to the client to determine when to abort its operation. For example we can use a timeout threshold.

Failure error codes

We can have multiple failure error codes, for example:

invalid input
timeout
blocked concurrent access
runtime error
unknown error

Definition

We ues the following definition for message types:

enum uftrace_agent_msg {
    UFTRACE_AGENT_MSG_CLOSE, /* close the connection */
    UFTRACE_AGENT_MSG_QUERY, /* query supported features/options */
    UFTRACE_AGENT_MSG_GET_OPT, /* get current option value */
    UFTRACE_AGENT_MSG_SET_OPT, /* set new option value */
    UFTRACE_AGENT_MSG_OK, /* ack previous message */
    UFTRACE_AGENT_MSG_ERR, /* signal error on previous message */
};

Sequence diagram

      client                              agent
-------------------            ----------------------------
QUERY []            --->
                          <--- OK [features (DEPTH | TIME)]

SET_OPT [DEPTH, 3]  --->
                          <--- OK []

SET_OPT [SIZE, 100] --->
                          <--- ERR [NOT SUPPORTED]

CLOSE []            --->
                          <--- OK []

Debug info

Agent output

Level 1

Global state changes:
- tracing is toggled
- functions are (un)patched

Level 2

Client connection status
Failures
Parameter change (e.g. tracing depth)

Level 3

Verbose input received from the client

Client output

Level 1

Features supported by the agent
Failures

Level 2

Connection status

Level 3

Verbose data send to the agent

The text was updated successfully, but these errors were encountered:

namhyung · 2023-03-21T06:02:33Z

Do we need to handle multiple concurrent connections?

I don't think we need it, at least for now.

namhyung · 2023-03-21T06:04:50Z

Do we need to define a structure that is holding agent data?

Maybe.. but it could be a simple data type depending on the message or option. For example, setting depth filter needs a integer for the depth, while setting function name filter needs a string for the name.

namhyung · 2023-03-21T06:07:39Z

You suggested to return a list of supported features. I am not sure what you mean regarding the list.

It doesn't need to be a list. It could be a bit-shifted flags.. then it'd be limited by the number of bits. It could an array.. then you also need to pass the length.

namhyung · 2023-03-21T06:18:02Z

I'm not sure the handshake is needed for every message exchange. I don't think we need to maintain a connection state.

I think we can have a simple exchange flow like below:

     client                                         agent
------------------                          --------------------
QUERY []              --->
                                    <---    OK [features (DEPTH, TIME)]

SET_OPT [DEPTH, 3]    --->
                                    <---    OK []

SET_OPT [SIZE, 100]   --->
                                    <---    ERR [NOT SUPPORTED]

clementguidi · 2023-03-21T12:49:36Z

Do we need to define a structure that is holding agent data?

Maybe.. but it could be a simple data type depending on the message or option. For example, setting depth filter needs a integer for the depth, while setting function name filter needs a string for the name.

Agree, this keeps things simple. That's the solution I'm currently using and it works well.

clementguidi · 2023-03-21T13:35:12Z

You suggested to return a list of supported features. I am not sure what you mean regarding the list.

It doesn't need to be a list. It could be a bit-shifted flags.. then it'd be limited by the number of bits. It could an array.. then you also need to pass the length.

I like the bit-shifted flags because they are easy to understand and to manipulate. We already use this for enum trigger_flag.

clementguidi · 2023-03-21T13:37:57Z

I'm not sure the handshake is needed for every message exchange. I don't think we need to maintain a connection state.

Sure, my explanation wasn't clear. The handshake only happens once, when initializing the connection. I do agree with your proposed diagram.

namhyung · 2023-03-23T05:45:00Z

Sounds good. I guess you're gonna update the existing PRs according to this, right?

clementguidi · 2023-03-23T13:53:45Z

Of course. I'm updating #1643 first, and when it's done I'll rebase the others.

clementguidi · 2023-03-28T19:31:31Z

Hi @namhyung,

I have a few questions. Is it ok to use struct uftrace_msg to send messages to the agent? It is used for pipe communication in recv and record commands. The definition is:

/* msg format for communicating by pipe */
struct uftrace_msg {
	unsigned short magic; /* UFTRACE_MSG_MAGIC */
	unsigned short type; /* UFTRACE_MSG_* */
	unsigned int len;
	unsigned char data[];
};

Then I would directly add agent messages to enum uftrace_msg_type:

enum uftrace_msg_type {
	UFTRACE_MSG_REC_START = 1,
	UFTRACE_MSG_REC_END,
	[...]

	UFTRACE_MSG_SEND_START = 100,
	UFTRACE_MSG_SEND_DIR_NAME,
	[...]

+	UFTRACE_MSG_AGENT_CLOSE = 200,
+	UFTRACE_MSG_AGENT_QUERY,
+	UFTRACE_MSG_AGENT_GET_OPT,
+ 	UFTRACE_MSG_AGENT_SET_OPT,
+	UFTRACE_MSG_AGENT_OK,
+	UFTRACE_MSG_AGENT_ERR,
};

Second thing, I need a simple way to tell the agent which option to apply. For example in SET_OPT [DEPTH 3], I cannot rely on strings to send the DEPTH info because I would have to have it at the end of the message, so I wouldn't know how to read the 3. Are you OK with having an enum of the options? Like

enum uftrace_agent_option {
	UFTRACE_AGENT_OPT_DEPTH,
	UFTRACE_AGENT_OPT_TIME,
	UFTRACE_AGENT_OPT_FILTER,
	[...]
};

Or maybe I can use enum uftrace_short_options that getopt uses?

namhyung · 2023-03-29T06:11:03Z

Using and extending the existing uftrace_msg format looks good. I'm ok with having a new enum for agent options. I also thought about the short options but I think it'd be better to split since we don't support all options for agent and the value might be changed later.

clementguidi mentioned this issue Mar 31, 2023

Implement client/agent protocol #1665

Merged

namhyung closed this as completed in #1665 Apr 12, 2023

namhyung added this to the v0.14 milestone Jun 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define client/agent protocol #1652

Define client/agent protocol #1652

clementguidi commented Mar 20, 2023 •

edited

namhyung commented Mar 21, 2023

namhyung commented Mar 21, 2023

namhyung commented Mar 21, 2023

namhyung commented Mar 21, 2023

clementguidi commented Mar 21, 2023

clementguidi commented Mar 21, 2023

clementguidi commented Mar 21, 2023

namhyung commented Mar 23, 2023

clementguidi commented Mar 23, 2023

clementguidi commented Mar 28, 2023

namhyung commented Mar 29, 2023

Define client/agent protocol #1652

Define client/agent protocol #1652

Comments

clementguidi commented Mar 20, 2023 • edited

Context

Connection handling

Protocol phases

Handshake phase (once)

Data exchange phase (repeated)

Client to agent

Agent to client

Validation phase (repeated)

Internal definition

Message type

Agent capabilities

Naming

Message types

Protocol control

State query

Option forwarding

Error checking

Failure error codes

Definition

Sequence diagram

Debug info

Agent output

Level 1

Level 2

Level 3

Client output

Level 1

Level 2

Level 3

namhyung commented Mar 21, 2023

namhyung commented Mar 21, 2023

namhyung commented Mar 21, 2023

namhyung commented Mar 21, 2023

clementguidi commented Mar 21, 2023

clementguidi commented Mar 21, 2023

clementguidi commented Mar 21, 2023

namhyung commented Mar 23, 2023

clementguidi commented Mar 23, 2023

clementguidi commented Mar 28, 2023

namhyung commented Mar 29, 2023

clementguidi commented Mar 20, 2023 •

edited