
visibility into client-side queuing and flow control #11114

Open
benjaminp opened this issue Apr 19, 2024 · 7 comments

@benjaminp
Contributor

Outgoing messages in a call can be internally queued by gRPC due to HTTP/2 flow control. Client calls can additionally be delayed from even starting due to the underlying HTTP/2 connection hitting its MAX_CONCURRENT_STREAMS limit. There should be a public API that can observe gRPC-internal queuing of outgoing messages. This could be useful when debugging a DEADLINE_EXCEEDED RPC; it's very informative to know if the request never left the client's queues.

Basically, I'd like to revive #8945. To recapitulate the findings on that issue:

  • StreamTracer.outboundMessageSent is called after a message is sent to the transport, not when the message hits the network.
  • onReady callbacks are strongly correlated with messages being released onto the network. However, unary calls do not receive onReady callbacks.

One way to resolve this issue would be to change StreamTracer.outboundMessageSent to be called after the message is actually sent to the network. Alternatively a new StreamTracer callback—outboundMessageTransmitted?—could serve that purpose.
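For concreteness, here is a minimal sketch against the public io.grpc.ClientStreamTracer API of the observation point that exists today and where the suggested callback would slot in; outboundMessageTransmitted is the hypothetical name proposed above, not an existing method.

```java
import io.grpc.ClientStreamTracer;
import io.grpc.Metadata;

// Per-call tracer, attachable via
// CallOptions.DEFAULT.withStreamTracerFactory(new QueueTracer.Factory()).
final class QueueTracer extends ClientStreamTracer {
  private final long startNanos = System.nanoTime();

  @Override
  public void outboundMessageSent(int seqNo, long wireSize, long uncompressedSize) {
    // Today this fires when the message is handed to the transport; it may still
    // sit behind HTTP/2 flow control or MAX_CONCURRENT_STREAMS after this point.
    System.out.printf("message %d handed to transport after %d us%n",
        seqNo, (System.nanoTime() - startNanos) / 1_000);
  }

  // A callback such as outboundMessageTransmitted(int seqNo) (the hypothetical
  // addition proposed above, not part of the current API) would instead fire
  // once the bytes have actually been written to the network.

  static final class Factory extends ClientStreamTracer.Factory {
    @Override
    public ClientStreamTracer newClientStreamTracer(
        ClientStreamTracer.StreamInfo info, Metadata headers) {
      return new QueueTracer();
    }
  }
}
```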

@ejona86
Member

ejona86 commented Apr 19, 2024

This could be useful when debugging a DEADLINE_EXCEEDED RPC

We already package up that information into the DEADLINE_EXCEEDED status message. And we have infrastructure where we could add more, if something is missing.

What are you trying to use this information for? Are you wanting code that reacts dynamically based on the results or are you wanting to gather more details for a human to debug?

@benjaminp
Contributor Author

This could be useful when debugging a DEADLINE_EXCEEDED RPC

We already package up that information into the DEADLINE_EXCEEDED status message.

Where in DEADLINE_EXCEEDED? I see the "delayed" transport can add information to a DEADLINE_EXCEEDED. But the flow control and MAX_CONCURRENT_STREAMS buffering I'm worried about happens within Netty.

Improving the DEADLINE_EXCEEDED message is of course good. I'm also interested in obtaining this information for successful but "slow" calls and observing the buffering of outbound messages in streaming calls. I want to build a trace of all the queues a message goes through.

What are you trying to use this information for? Are you wanting code that reacts dynamically based on the results or are you wanting to gather more details for a human to debug?

At the moment, debugging and understanding performance.

(I suppose in the future, it's possible to explore automatically opening another channel if calls are blocked too long on MAX_CONCURRENT_STREAMS.)
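As a point of reference for the streaming-call case, the onReady handler mentioned above is the closest existing proxy for "the transport is draining my outbound messages". A minimal sketch using the public ClientResponseObserver API (type parameters are generic placeholders):

```java
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;

// Passed as the response observer to an async streaming stub method.
// Unary calls never receive these notifications, as noted earlier in the thread.
final class OnReadyProbe<ReqT, RespT> implements ClientResponseObserver<ReqT, RespT> {
  @Override
  public void beforeStart(ClientCallStreamObserver<ReqT> requestStream) {
    // Runs whenever the transport's outbound buffer drains below its threshold,
    // which correlates with queued messages having been flushed toward the network.
    requestStream.setOnReadyHandler(() ->
        System.out.println("transport ready at " + System.nanoTime()));
  }

  @Override public void onNext(RespT value) {}
  @Override public void onError(Throwable t) {}
  @Override public void onCompleted() {}
}
```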

@ejona86
Member

ejona86 commented Apr 23, 2024

Oh, I missed that you wanted messages. There are some bits you can infer from traces, but in general, talking about when a message is sent is complicated.

MAX_CONCURRENT_STREAMS would fit into our current stream.appendTimeoutInsight() flow for DEADLINE_EXCEEDED, and Netty can supplement it. It might be a bit hard to plumb this into Netty, and even harder to support this in the new HTTP/2 Netty API. But if we figure out the Netty part, the rest of it is easy.

We have support for OpenCensus tracing today, and we're adding OpenTelemetry tracing. There's also Perfmark tracing, which shows more about thread interactions and wouldn't be as good for watching a single RPC you were suspicious about after-the-fact. The current stuff is lacking MAX_CONCURRENT_STREAMS visibility, and flow control is hard to express.

There has been talk (and a private design) of opening new connections when hitting MAX_CONCURRENT_STREAMS.

@sergiitk sergiitk added the Waiting on reporter label Apr 25, 2024
@benjaminp
Contributor Author

I realize the most detailed efforts would require Netty plumbing or forking. But what about my suggestion in the OP to notify the stream tracer in NettyClientHandler.TransportState.onWriteFrameData? That would at least provide better information than we have today.

@benjaminp benjaminp changed the title visiiblity into client-side queuing and flow control visiblity into client-side queuing and flow control Apr 29, 2024
@benjaminp benjaminp changed the title visiblity into client-side queuing and flow control visibility into client-side queuing and flow control Apr 29, 2024
@sergiitk sergiitk removed the Waiting on reporter label May 2, 2024
@ejona86
Member

ejona86 commented May 16, 2024

CC @YifeiZhuang

@YifeiZhuang
Contributor

I think adding a StreamTracer.outboundMessageTransmitted callback after flow control, and adding it to OTel tracing, seems like a good idea and low-hanging fruit. We probably need consensus from the other languages. And when to support it is another question. @ejona86

@YifeiZhuang
Contributor

We talked about this in the cross-language meeting, and we had consensus that having a notification when the last bytes are written to the wire sounds fair. @benjaminp do you want to send a PR for this?

cc @ejona
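For reference, a rough sketch of what the discussed addition might look like; the name, parameter, and exact placement are placeholders to be settled in the PR, not existing API:

```java
// Hypothetical addition to io.grpc.StreamTracer (shape to be decided in the PR):

/**
 * A message has been fully written to the socket, i.e. it has cleared HTTP/2
 * flow control and, for the first message of a call, any MAX_CONCURRENT_STREAMS
 * queuing in the transport.
 *
 * @param seqNo sequence number of the message, matching outboundMessageSent
 */
public void outboundMessageTransmitted(int seqNo) {}
```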

@ejona86 ejona86 added this to the Unscheduled milestone Jun 20, 2024