Provide more context to failing RPCs from the connection pool #1292

glbrntt · 2021-10-06T11:44:41Z

Motivation:

When an RPC is started using the connection pool, it waits for a
connection to come up and a stream to be created. It may also time out
waiting for the connection to come up. This may be because the pool is
genuinely too busy to handle the RPC, or because there is an underlying
problem with the connection.

In the latter case the RPC cannot see the underlying reason that it
timed out. This makes issue diagnosis more difficult than necessary when
the root cause is at the connection level (e.g. due to TLS
configuration), because important but infrequent connection-level errors
are lost in the noise of RPC-level timeouts.

Modifications:

- Give connection pools a notion of a 'last known error': the most
  recent connection-level error to occur on a connection managed by the
  pool. The error is cleared as soon as any connection within the pool
  becomes available. This error is added to existing waiter errors to
  provide more context.
- Add a `_ConnectivityState` mirroring the public `ConnectivityState`
  but which holds an `Error` as associated data in some cases.
- Wire this through the `ConnectionManager` to the `ConnectionPool`.
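The mechanism in the Modifications list can be illustrated with a minimal Swift sketch. It is not the grpc-swift implementation: every type and method name here (`PublicConnectivityState`, `InternalConnectivityState`, `PoolErrorTracker`, `WaiterTimedOut`, `connectionStateChanged(to:)`) is hypothetical, and the choice of which states carry an error (`transientFailure`, and optionally `idle`) is an assumption made for illustration.

```swift
// Hypothetical stand-in for grpc-swift's public `ConnectivityState`.
enum PublicConnectivityState: Equatable {
  case idle, connecting, ready, transientFailure, shutdown
}

// Hypothetical internal mirror of the public state. It has the same cases,
// but (by assumption) failure-related cases carry the error that caused them.
enum InternalConnectivityState {
  case idle(Error?)
  case connecting
  case ready
  case transientFailure(Error)
  case shutdown

  // Drop the associated errors to recover the public view of the state.
  var publicState: PublicConnectivityState {
    switch self {
    case .idle: return .idle
    case .connecting: return .connecting
    case .ready: return .ready
    case .transientFailure: return .transientFailure
    case .shutdown: return .shutdown
    }
  }
}

// Hypothetical error delivered to a waiting RPC when it times out.
struct WaiterTimedOut: Error {
  // The most recent connection-level error seen by the pool, if any.
  var lastConnectionError: Error?
}

// Hypothetical model of a pool tracking its 'last known error'.
struct PoolErrorTracker {
  private(set) var lastKnownError: Error?

  // Called whenever any connection managed by the pool changes state.
  mutating func connectionStateChanged(to state: InternalConnectivityState) {
    switch state {
    case .ready:
      // A connection became available, so the stored error is stale.
      lastKnownError = nil
    case .transientFailure(let error):
      lastKnownError = error
    case .idle(let error):
      // Assumption: an idle connection may carry the error that closed it.
      if let error = error { lastKnownError = error }
    case .connecting, .shutdown:
      break
    }
  }

  // Attach the last known error to the error handed to a timed-out waiter.
  func makeWaiterTimeoutError() -> WaiterTimedOut {
    return WaiterTimedOut(lastConnectionError: lastKnownError)
  }
}
```

Clearing the error on `.ready`, rather than keeping it indefinitely, means a waiter only reports a connection error that is still plausibly relevant. Once any connection recovers, older failures stop being attached to timeouts.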

Result:

Connection-level errors are more obvious.


Lukasa

LGTM, one comment needs updating.

Tests/GRPCTests/ConnectionPool/GRPCChannelPoolTests.swift


glbrntt added the 🔨 semver/patch label (No public API change.) Oct 6, 2021

glbrntt requested a review from Lukasa October 6, 2021 11:44

Lukasa approved these changes Oct 6, 2021

Tests/GRPCTests/ConnectionPool/GRPCChannelPoolTests.swift (outdated, resolved)

glbrntt commented Oct 6, 2021

Tests/GRPCTests/ConnectionPool/GRPCChannelPoolTests.swift (outdated, resolved)

Update Tests/GRPCTests/ConnectionPool/GRPCChannelPoolTests.swift

255149d

glbrntt merged commit 988ee3f into grpc:main Oct 6, 2021

glbrntt deleted the gb-channel-pool-waiter-error branch October 6, 2021 13:54
