Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new requirement of completing protocol handshaking before 'READY' #2406

Closed
dfawley opened this issue Oct 26, 2018 · 1 comment · Fixed by #2904
Closed

Implement new requirement of completing protocol handshaking before 'READY' #2406

dfawley opened this issue Oct 26, 2018 · 1 comment · Fixed by #2904
Assignees
Labels
fixit P1 Type: Behavior Change Behavior changes not categorized as bugs

Comments

@dfawley
Copy link
Member

dfawley commented Oct 26, 2018

Detailed description of upcoming changes:

gRPC-Go clients currently (in the 1.17 release and earlier) do not wait for the HTTP/2 handshake to complete before considering a connection ready for sending RPCs. However, the client does not consider the connection to be "successful" unless the server's handshake is received before a connection error occurs. An "unsuccessful" connection results in a backoff being performed before redialing.

This behavior will be changed during development for the 1.18 release. The new default behavior will be to wait for the server handshake before considering the connection ready for sending RPCs. This behavior can be tested in 1.17 by setting the environment variable GRPC_GO_REQUIRE_HANDSHAKE=on. (This is the current behavior of gRPC-Java.)

Setting the environment variable to "off" will cause gRPC-Go clients to not wait for the handshake before considering the connection ready. However, if a connection error occurs, backoff will not be performed and the client will attempt to connect to the first address returned by the name resolver. This can lead to DOS problems and is not recommended. (This is the current behavior of gRPC-C and was the behavior of gRPC-Go ~1 yr ago before #1648.)

During development for the 1.19 release, support for changing this behavior via the environment variable will be removed entirely. Also, the grpc.WithWaitForHandshake() DialOption (was "experimental"; now "deprecated") will be removed.

Users impacted: as far as we are aware, the only usage that may be impacted by the new behavior is cmux. cmux has a workaround for Java using MatchWithWriters to allow it to continue working in the face of this behavior.

For more information, see: grpc/grpc#17006

When security is disabled, not waiting for the HTTP/2 handshake can lead to DoS-style behavior. For details, see: #954. This requirement will incur an extra half-RTT latency before the first RPC can be sent under plaintext, but this is negligible and unencrypted connections are rarer than secure ones.

Under TLS, the server will effectively send its part of the HTTP/2 handshake along with its final TLS "server finished" message, which the client must wait for before transmitting any data securely. This means virtually no extra latency is incurred by this requirement.

Go had attempted to separate "connection ready" with "connection successful" (Issue: #1444 PR: #1648). However, this is confusing to users and introduces an arbitrary distinction between these two events. It has led to several bugs in our reconnection logic (e.g.s #2380, #2391, #2392), due to the complexity, and it makes custom transports (grpc/proposal#103) more difficult for users to implement.

We are aware of some use cases (in particular, soheilhy/cmux) expecting the behavior of transmitting an RPC before the HTTP/2 handshake is completed. Before making behavior changes to implement this, we will reach out to our users to the best of our abilities.

@dfawley
Copy link
Member Author

dfawley commented Feb 26, 2019

Minor update:

During development for the 1.19 release, support for changing this behavior via the environment variable will be removed entirely. Also, the grpc.WithWaitForHandshake() DialOption (was "experimental"; now "deprecated") will be removed.

This will be removed shortly, but not in the 1.19 release.

@menghanl menghanl added the fixit label Jun 5, 2019
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jul 25, 2019
This PR upgrades gRPC from 1.13.0 to 1.21.2. The primary motivation for this
upgrade is to eliminate the disconnections caused by
grpc/grpc-go#1882. These failures manifest themselves
as the following set of errors:

```
ajwerner-test-0001> I190722 22:15:01.203008 12054 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322  [n1] circuitbreaker: rpc [::]:26257 [n2] tripped: failed to check for ready connection to n2 at ajwerner-test-0002:26257: connection not ready: TRANSIENT_FAILURE
```
Which then lead to tripped breakers and general badness. I suspect that there
are several other good bug fixes in here, including some purported leaks and
correctness fixes on shutdown.

I have verified that with this upgrade I no longer see connections break in
overload scenarios which reliably reproduced the situation in the above log.

This commit removes one condition from grpcutil.IsClosedConnection which should
be subsumed by the status check above. The `transport` subpackage has not been
around for many releases.

This does not upgrade to the current release 1.22.0 because the maintainer
mentions that it contains a bug
(grpc/grpc-go#2663 (comment)).

This change also unfortunately updates the keepalive behavior to be more spec
compliant (grpc/grpc-go#2642). This change mandates
a minimum ping time of 10s to the client. Given grpc/grpc-go#2638
this means that the rpc test for keepalives now takes over 20s.

I would be okay skipping it but leave that discussion for review.

Also updated the acceptance test to look out for an HTTP/2.0 header because
grpc now does not send RPCs until after the HTTP handshake has completed
(see grpc/grpc-go#2406).

Release note (bug fix): Upgrade grpc library to fix connection state management
bug.
@dfawley dfawley reopened this Aug 1, 2019
@dfawley dfawley closed this as completed Aug 1, 2019
csweichel pushed a commit to csweichel/jaeger that referenced this issue Aug 23, 2019
csweichel pushed a commit to csweichel/jaeger that referenced this issue Aug 23, 2019
breaking change: grpc/grpc-go#2406
workaround described in:
- soheilhy/cmux#64
- https://github.com/soheilhy/cmux#limitations

Signed-off-by: Christian Weichel <christian.weichel@typefox.io>
csweichel pushed a commit to csweichel/jaeger that referenced this issue Aug 25, 2019
breaking change: grpc/grpc-go#2406
workaround described in:
- soheilhy/cmux#64
- https://github.com/soheilhy/cmux#limitations

Signed-off-by: Christian Weichel <christian.weichel@typefox.io>
yurishkuro pushed a commit to yurishkuro/jaeger that referenced this issue Aug 25, 2019
breaking change: grpc/grpc-go#2406
workaround described in:
- soheilhy/cmux#64
- https://github.com/soheilhy/cmux#limitations

Signed-off-by: Christian Weichel <christian.weichel@typefox.io>
pavolloffay pushed a commit to jaegertracing/jaeger that referenced this issue Sep 2, 2019
* Add unit test for gRPC over cmux

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Fix tests

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Fix gRPC query service cmux

breaking change: grpc/grpc-go#2406
workaround described in:
- soheilhy/cmux#64
- https://github.com/soheilhy/cmux#limitations

Signed-off-by: Christian Weichel <christian.weichel@typefox.io>

* Fix asertions

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Use DialContext

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Clean-up timeouts

Signed-off-by: Yuri Shkuro <ys@uber.com>
bookmoons pushed a commit to bookmoons/jaeger that referenced this issue Sep 10, 2019
* Add unit test for gRPC over cmux

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Fix tests

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Fix gRPC query service cmux

breaking change: grpc/grpc-go#2406
workaround described in:
- soheilhy/cmux#64
- https://github.com/soheilhy/cmux#limitations

Signed-off-by: Christian Weichel <christian.weichel@typefox.io>

* Fix asertions

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Use DialContext

Signed-off-by: Yuri Shkuro <ys@uber.com>

* Clean-up timeouts

Signed-off-by: Yuri Shkuro <ys@uber.com>
Allda added a commit to Allda/clair that referenced this issue Nov 27, 2019
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Nov 28, 2019
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Nov 28, 2019
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Nov 29, 2019
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Dec 4, 2019
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Dec 9, 2019
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Dec 18, 2019
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Jan 20, 2020
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
Allda added a commit to Allda/clair that referenced this issue Jan 20, 2020
The new plugin requires dependency which requites newer version of grpc.

The cmux source was also updated because old version contained a bug.
grpc/grpc-go#2406
@lock lock bot locked as resolved and limited conversation to collaborators Jan 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
fixit P1 Type: Behavior Change Behavior changes not categorized as bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants