Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure the HTTP2 transport to handle broken connections #189

Merged
merged 13 commits into from
Jun 21, 2021

Conversation

tabboud
Copy link
Contributor

@tabboud tabboud commented Jun 17, 2021

Before this PR

CGR configured all HTTP transports without the PING health checks, which forced the HTTP client to re-use connections that were already closed. For example the underlying TCP connection could be broken (due to connectivity or upstream issues) and individual requests would timeout due to the broken connection, however Go will not close these idle connections and the requests cannot make progress.
We observed this issue on some internal products where HTTP2 was enabled with an unlimited retries and the requests would never succeed due to the fact that the underlying connection was broken and not cleaned up.

After this PR

==COMMIT_MSG==
Configure the HTTP2 transport to handle broken/idle connections

Two timeout values were added, ReadIdleTimeout and PingTimeout, that are both interrelated.
The ReadIdleTimeout enables the ping health check every configured duration if no frame has been received on the HTTP/2 connection. The PingTimeout can be used to configure the total amount of time to wait for a ping response before closing the connection. Both of these timeout values assist in cleaning up broken or idle connections forcing the client to re-connect.
This is the current known workaround for the following Go issue:
golang/go#36026
==COMMIT_MSG==

Notes

  • The only open question I have is how this interacts with the other client timeouts and whether this takes precedence or is sustained over the course of a request.

Possible downsides?

  • This is only enabled for those using go1.16+

This change is Reviewable

@changelog-app
Copy link

changelog-app bot commented Jun 17, 2021

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Configure the HTTP2 transport to handle broken/idle connections

Two timeout values were added, ReadIdleTimeout and PingTimeout, that are both interrelated.
The ReadIdleTimeout enables the ping health check every configured duration if no frame has been received on the HTTP/2 connection. The PingTimeout can be used to configure the total amount of time to wait for a ping response before closing the connection. Both of these timeout values assist in cleaning up broken or idle connections forcing the client to re-connect.
This is the current known workaround for the following Go issue:
golang/go#36026

Check the box to generate changelog(s)

  • Generate changelog entry

@tabboud tabboud requested a review from bmoylan June 17, 2021 15:01
@najork
Copy link
Contributor

najork commented Jun 17, 2021

👍

Copy link
Contributor

@bmoylan bmoylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 pending additional testing/verification we want to do on this. If we trust the stdlib's docs and tests to do what they say, this may be the thing that will unblock h2 in places where we have disabled it

@tabboud tabboud changed the title Configure the HTTP2 transport with a ReadIdleTimeout Configure the HTTP2 transport to handle broken connections Jun 17, 2021
@bmoylan
Copy link
Contributor

bmoylan commented Jun 17, 2021


conjure-go-client/httpclient/config.go, line 85 at r5 (raw file):

	// HTTP2ReadIdleTimeout sets the maximum time to wait before sending periodic health checks (pings) for an HTTP/2 connection.
	// If unset, the client defaults to 30s for HTTP/2 clients.
	HTTP2ReadIdleTimeout *time.Duration `json:"http2-read-idle-timeout" yaml:"http2-read-idle-timeout"`

you'll need to convert these to params in configToParams

@bmoylan
Copy link
Contributor

bmoylan commented Jun 17, 2021


conjure-go-client/httpclient/config.go, line 295 at r6 (raw file):

		params = append(params, WithExpectContinueTimeout(*c.ExpectContinueTimeout))
	}
	if c.HTTP2ReadIdleTimeout != nil && *c.HTTP2ReadIdleTimeout != 0 {

we actually do want to set them when they are 0, right? because we are defaulting them to enabled?

@tabboud tabboud force-pushed the ta/http2-read-idle-timeout branch from 0e62859 to d17a02e Compare June 17, 2021 22:09
Copy link
Contributor Author

@tabboud tabboud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 44 files reviewed, 2 unresolved discussions (waiting on @bmoylan and @tabboud)


conjure-go-client/httpclient/config.go, line 295 at r6 (raw file):

Previously, bmoylan (Brad Moylan) wrote…

we actually do want to set them when they are 0, right? because we are defaulting them to enabled?

In this case we probably want to set them whenever its >= 0 as long as it's explicitly set. Good catch

@tabboud tabboud force-pushed the ta/http2-read-idle-timeout branch from c0e523f to 4bed9e9 Compare June 18, 2021 02:09
Comment on lines +140 to +141
require.True(t, strings.Contains(err.Error(), "use of closed network connection"))
return
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmoylan I believe this missing return was the reason for the flaky test since the first test case expects a failure here due to the "use of closed network" error, but we continue on to try and dial the test server which caused the panic. Now we'll just return after getting an Accept error and bail out.

@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.

// +build go1.16
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to guard this file with the build tag since pre go1.16 the second test case would not succeed, since the fields are not available, and in reality would basically test the same functionality of the first test with wrong assertions.

Copy link
Contributor

@bmoylan bmoylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 36 of 39 files at r1, 4 of 5 files at r3, 1 of 1 files at r4, 2 of 2 files at r5, 2 of 2 files at r7.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @tabboud)

@bulldozer-bot bulldozer-bot bot merged commit dc3fd3e into develop Jun 21, 2021
@bulldozer-bot bulldozer-bot bot deleted the ta/http2-read-idle-timeout branch June 21, 2021 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants