Skip to content

Conversation

@glbrntt
Copy link
Collaborator

@glbrntt glbrntt commented Oct 6, 2021

Motivation:

When an RPC is started using the connection pool, it waits for a
connection to come up and a stream to be created. It may also time out
waiting for the connection to come up, this may be because the pool is
genuinely too busy to handle the RPC or there is an underlying problem
with the connection.

In the latter case it's not possible for the RPC to see the underlying
reason that it timed out. This makes issue diagnosis more difficult than
necessary when there root cause is at the connection level (e.g. due to TLS
configuration) as important but infrequent connection-level errors are
lost in the noise of RPC-level timeouts.

Modifications:

  • Give connection pools a notion of a 'last known error', the most
    recent connection-level error to occur on a connection managed by the
    pool. The error is cleared as soon as any connection within the pool
    becomes available. This error is added to existing waiter errors to
    provide more context.
  • Add a '_ConnectivityState' mirroring the public 'ConnectivityState'
    but which holds 'Error' as associated data in some cases.
  • Wire this through the ConnectionManager to the ConnectionPool

Result:

Connection level errors are more obvious.

Motivation:

When an RPC is started using the connection pool, it waits for a
connection to come up and a stream to be created. It may also time out
waiting for the connection to come up, this may be because the pool is
genuinely too busy to handle the RPC or there is an underlying problem
with the connection.

In the latter case it's not possible for the RPC to see the underlying
reason that it timed out. This makes issue diagnosis more difficult than
necessary when there root cause is at the connection level (e.g. due to TLS
configuration) as important but infrequent connection-level errors are
lost in the noise of RPC-level timeouts.

Modifications:

- Give connection pools a notion of a 'last known error', the most
  recent connection-level error to occur on a connection managed by the
  pool. The error is cleared as soon as any connection within the pool
  becomes available. This error is added to existing waiter errors to
  provide more context.
- Add a '_ConnectivityState' mirroring the public 'ConnectivityState'
  but which holds 'Error' as associated data in some cases.
- Wire this through the `ConnectionManager` to the `ConnectionPool`

Result:

Connection level errors are more obvious.
@glbrntt glbrntt added the 🔨 semver/patch No public API change. label Oct 6, 2021
@glbrntt glbrntt requested a review from Lukasa October 6, 2021 11:44
Copy link
Collaborator

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one comment needs updating.

@glbrntt glbrntt merged commit 988ee3f into grpc:main Oct 6, 2021
@glbrntt glbrntt deleted the gb-channel-pool-waiter-error branch October 6, 2021 13:54
bimawa pushed a commit to StreamLayer/grpc-swift that referenced this pull request Nov 10, 2021
)

Motivation:

When an RPC is started using the connection pool, it waits for a
connection to come up and a stream to be created. It may also time out
waiting for the connection to come up, this may be because the pool is
genuinely too busy to handle the RPC or there is an underlying problem
with the connection.

In the latter case it's not possible for the RPC to see the underlying
reason that it timed out. This makes issue diagnosis more difficult than
necessary when there root cause is at the connection level (e.g. due to TLS
configuration) as important but infrequent connection-level errors are
lost in the noise of RPC-level timeouts.

Modifications:

- Give connection pools a notion of a 'last known error', the most
  recent connection-level error to occur on a connection managed by the
  pool. The error is cleared as soon as any connection within the pool
  becomes available. This error is added to existing waiter errors to
  provide more context.
- Add a '_ConnectivityState' mirroring the public 'ConnectivityState'
  but which holds 'Error' as associated data in some cases.
- Wire this through the `ConnectionManager` to the `ConnectionPool`

Result:

Connection level errors are more obvious.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🔨 semver/patch No public API change.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants