Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define client/agent protocol #1652

Closed
3 tasks done
clementguidi opened this issue Mar 20, 2023 · 11 comments · Fixed by #1665
Closed
3 tasks done

Define client/agent protocol #1652

clementguidi opened this issue Mar 20, 2023 · 11 comments · Fixed by #1665
Milestone

Comments

@clementguidi
Copy link
Contributor

clementguidi commented Mar 20, 2023

Let's define the communication protocol between the client and the agent.

From previous discussions (see #1643), and upcoming pull requests (#1644, #1645, #1646, #1647 ATM), we can outline the following points.

Context

When the agent is started (--agent/-g) in libmcount, it can receive messages from a client at runtime. We're introducing new features in the agent, so it can modify tracing parameters e.g. tracing state (on/off), tracing depth and triggers, but also patch and unpatch the running process.

Connection handling

Currently, the agent can sequentially handle multiple client connections. The next one is processed when the current one is terminated.

Some operations like patching are heavy and require some time. Any incoming connection will hang until heavy operations are achieved.

The agent will support many independent features e.g. changing tracing depth and patching, which have no reason to be done in a particular order.

  • Do we need to handle multiple concurrent connections? → Not for now

Protocol phases

The aim of the message exchange is to ultimately forward options from a client to the agent. To make this robust, we introduce control phases as follows.

Handshake phase (once)

The handshake phase initializes the connection.

  1. The client checks is the agent is ready and operational
  2. The client checks the agent's capabilities (supported features)

If the handshake fails, the client aborts.

Data exchange phase (repeated)

When the handshake succeeds, the client and the agent can exchange data.

The client is aware of the capabilities of the agent.

Client to agent

The client can instruct the agent to apply user options, as mentioned earlier.

The client skips actions that the agent is known not to support.

Agent to client

The client can fetch info from the agent. The agent can send requested info about its state (e.g. parameter values) to the agent.

Validation phase (repeated)

After exchanging data, the client checks the status of the agent. The agent indicates whether it succeeded or failed, giving a reason in case of failure (error code).

Internal definition

Message type

We use an enumeration to define the agent messages types.

  • Do we need to define a structure that is holding agent data? → No, use simple data types - e.g. integer for depth or string for filter

Agent capabilities

The agent supports given features. Currently they are defined as bit-shifted flags.

You suggested to return a list of supported features. I am not sure what you mean regarding the list.

  • How do we define agent features? → Use bit-shifted flags, even if it limits the number of values. Easy to set and check

For example, a capability query would return

  1. FEATURE1 | FEATURE2 | ... if we use bit-shifted flags
  2. {FEATURE1} -> {FEATURE2} -> ... if we use a list

Naming

We use the UFTRACE_AGENT_MSG_ prefix for messages.

This is similar to the following definition:

enum uftrace_msg_type {
    UFTRACE_MSG_REC_START = 1,
    UFTRACE_MSG_REC_END,
    [...]
};

Using abbreviations to make names shorter is confusion IMO.

"Each message should handle variable-length data then."

Message types

Protocol control

We define two messages to handle the state of the protocol:

  1. A handshake message which queries agent capabilities
  2. A close message which terminates the connection

State query

We define a GET_OPT message for the client to fetch info about the agent.

It holds an option name as data. The response holds the value as data.

Option forwarding

Similarly, we define a SET_OPT message to apply user options in the agent.

The agent supports multiple options and related data types. Partial definition of supported options:

struct uftrace_opts {
    char *trigger;
    char *args;
    char *patch;
    int depth;
    uint64_t threshold;
    enum uftrace_pattern_type patt_type;
    enum uftrace_trace_state trace;
};

The options are sent as the data load of the SET_OPT message.

Error checking

After every operation, the agent returns a status message. It sets an error code depending on the situation:

  • success: the operation succeeded
  • failure: the agent encountered an error
  • not supported: the requested action is not applicable

If the agent doesn't answer, it is up to the client to determine when to abort its operation. For example we can use a timeout threshold.

Failure error codes

We can have multiple failure error codes, for example:

  • invalid input
  • timeout
  • blocked concurrent access
  • runtime error
  • unknown error

Definition

We ues the following definition for message types:

enum uftrace_agent_msg {
    UFTRACE_AGENT_MSG_CLOSE, /* close the connection */
    UFTRACE_AGENT_MSG_QUERY, /* query supported features/options */
    UFTRACE_AGENT_MSG_GET_OPT, /* get current option value */
    UFTRACE_AGENT_MSG_SET_OPT, /* set new option value */
    UFTRACE_AGENT_MSG_OK, /* ack previous message */
    UFTRACE_AGENT_MSG_ERR, /* signal error on previous message */
};

Sequence diagram

      client                              agent
-------------------            ----------------------------
QUERY []            --->
                          <--- OK [features (DEPTH | TIME)]

SET_OPT [DEPTH, 3]  --->
                          <--- OK []

SET_OPT [SIZE, 100] --->
                          <--- ERR [NOT SUPPORTED]

CLOSE []            --->
                          <--- OK []

Debug info

Agent output

Level 1

  • Global state changes:
    • tracing is toggled
    • functions are (un)patched

Level 2

  • Client connection status
  • Failures
  • Parameter change (e.g. tracing depth)

Level 3

  • Verbose input received from the client

Client output

Level 1

  • Features supported by the agent
  • Failures

Level 2

  • Connection status

Level 3

  • Verbose data send to the agent
@namhyung
Copy link
Owner

Do we need to handle multiple concurrent connections?

I don't think we need it, at least for now.

@namhyung
Copy link
Owner

Do we need to define a structure that is holding agent data?

Maybe.. but it could be a simple data type depending on the message or option. For example, setting depth filter needs a integer for the depth, while setting function name filter needs a string for the name.

@namhyung
Copy link
Owner

You suggested to return a list of supported features. I am not sure what you mean regarding the list.

It doesn't need to be a list. It could be a bit-shifted flags.. then it'd be limited by the number of bits. It could an array.. then you also need to pass the length.

@namhyung
Copy link
Owner

I'm not sure the handshake is needed for every message exchange. I don't think we need to maintain a connection state.

I think we can have a simple exchange flow like below:

     client                                         agent
------------------                          --------------------
QUERY []              --->
                                    <---    OK [features (DEPTH, TIME)]

SET_OPT [DEPTH, 3]    --->
                                    <---    OK []

SET_OPT [SIZE, 100]   --->
                                    <---    ERR [NOT SUPPORTED]

@clementguidi
Copy link
Contributor Author

Do we need to define a structure that is holding agent data?

Maybe.. but it could be a simple data type depending on the message or option. For example, setting depth filter needs a integer for the depth, while setting function name filter needs a string for the name.

Agree, this keeps things simple. That's the solution I'm currently using and it works well.

@clementguidi
Copy link
Contributor Author

You suggested to return a list of supported features. I am not sure what you mean regarding the list.

It doesn't need to be a list. It could be a bit-shifted flags.. then it'd be limited by the number of bits. It could an array.. then you also need to pass the length.

I like the bit-shifted flags because they are easy to understand and to manipulate. We already use this for enum trigger_flag.

@clementguidi
Copy link
Contributor Author

I'm not sure the handshake is needed for every message exchange. I don't think we need to maintain a connection state.

Sure, my explanation wasn't clear. The handshake only happens once, when initializing the connection. I do agree with your proposed diagram.

@namhyung
Copy link
Owner

Sounds good. I guess you're gonna update the existing PRs according to this, right?

@clementguidi
Copy link
Contributor Author

Of course. I'm updating #1643 first, and when it's done I'll rebase the others.

@clementguidi
Copy link
Contributor Author

Hi @namhyung,

I have a few questions. Is it ok to use struct uftrace_msg to send messages to the agent? It is used for pipe communication in recv and record commands. The definition is:

/* msg format for communicating by pipe */
struct uftrace_msg {
	unsigned short magic; /* UFTRACE_MSG_MAGIC */
	unsigned short type; /* UFTRACE_MSG_* */
	unsigned int len;
	unsigned char data[];
};

Then I would directly add agent messages to enum uftrace_msg_type:

enum uftrace_msg_type {
	UFTRACE_MSG_REC_START = 1,
	UFTRACE_MSG_REC_END,
	[...]

	UFTRACE_MSG_SEND_START = 100,
	UFTRACE_MSG_SEND_DIR_NAME,
	[...]

+	UFTRACE_MSG_AGENT_CLOSE = 200,
+	UFTRACE_MSG_AGENT_QUERY,
+	UFTRACE_MSG_AGENT_GET_OPT,
+ 	UFTRACE_MSG_AGENT_SET_OPT,
+	UFTRACE_MSG_AGENT_OK,
+	UFTRACE_MSG_AGENT_ERR,
};

Second thing, I need a simple way to tell the agent which option to apply. For example in SET_OPT [DEPTH 3], I cannot rely on strings to send the DEPTH info because I would have to have it at the end of the message, so I wouldn't know how to read the 3. Are you OK with having an enum of the options? Like

enum uftrace_agent_option {
	UFTRACE_AGENT_OPT_DEPTH,
	UFTRACE_AGENT_OPT_TIME,
	UFTRACE_AGENT_OPT_FILTER,
	[...]
};

Or maybe I can use enum uftrace_short_options that getopt uses?

@namhyung
Copy link
Owner

Using and extending the existing uftrace_msg format looks good. I'm ok with having a new enum for agent options. I also thought about the short options but I think it'd be better to split since we don't support all options for agent and the value might be changed later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants