
server: wait to close connection until incoming socket is drained (with timeout) #6977

Merged: 2 commits merged into grpc:master from the drain branch on Feb 12, 2024

Conversation

dfawley (Member) commented Feb 9, 2024

Replaces #6957

Fixes #5358 (A modified version of the test in that issue runs for 5+ minutes without failing with these changes. On a broken branch it fails reliably within 30s, and often much faster.)

See also grpc/grpc-java#9566 and golang/net@cd69bc3 / golang/go#18701

This change also adjusts the ping timer after sending the initial GOAWAY from 1 minute to 5 seconds at @ejona86's recommendation.

RELEASE NOTES:

  • server: wait to close the connection until the incoming socket is drained (with timeout) to prevent data loss on the client side
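
For illustration, a minimal Go sketch of the drain-then-close pattern this change introduces; closeAfterDrain and its parameters are hypothetical names, not the actual grpc-go transport code (only readerDone mirrors the t.readerDone channel that appears in the diff):

package example

import (
	"net"
	"time"
)

// closeAfterDrain waits for the reader goroutine (signaled via readerDone)
// to drain whatever the client already sent, but only up to a timeout, and
// only then closes the underlying connection. Closing while unread data is
// still sitting in the kernel receive buffer can trigger a TCP RST that
// discards data the server already wrote (see golang/go#18701).
func closeAfterDrain(conn net.Conn, readerDone <-chan struct{}, timeout time.Duration) {
	select {
	case <-readerDone:
		// Incoming socket has been drained; safe to close right away.
	case <-time.After(timeout):
		// Peer is slow or unresponsive; don't hold the connection open forever.
	}
	conn.Close()
}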

@dfawley dfawley added this to the 1.62 Release milestone Feb 9, 2024

codecov bot commented Feb 9, 2024

Codecov Report

Merging #6977 (ff4b564) into master (f135e98) will increase coverage by 0.07%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6977      +/-   ##
==========================================
+ Coverage   82.38%   82.45%   +0.07%     
==========================================
  Files         296      296              
  Lines       31432    31450      +18     
==========================================
+ Hits        25896    25933      +37     
+ Misses       4477     4460      -17     
+ Partials     1059     1057       -2     
Files                                  Coverage Δ
internal/transport/controlbuf.go       87.30% <ø>        (-1.72%) ⬇️
internal/transport/http2_client.go     90.64% <100.00%>  (-0.51%) ⬇️
internal/transport/http2_server.go     89.98% <100.00%>  (+0.41%) ⬆️

... and 14 files with indirect coverage changes

arvindbr8 (Member) left a comment

Nice! LGTM

@arvindbr8 arvindbr8 assigned dfawley and unassigned arvindbr8 Feb 10, 2024
@dfawley dfawley merged commit 05db80f into grpc:master Feb 12, 2024
14 checks passed
@dfawley dfawley deleted the drain branch February 12, 2024 16:39
dfawley added a commit to dfawley/grpc-go that referenced this pull request Feb 12, 2024
dfawley added a commit that referenced this pull request Feb 12, 2024
ash2k (Contributor) commented Feb 13, 2024

Can we get a release with this fix, please? Keen to try it and see if it fixes our troubles.

dfawley (Member, Author) commented Feb 13, 2024

It will be in 1.62.0 next week. I'll see if it's easy to backport to 1.61.x and push a 1.61 patch release if so.

dfawley added a commit to dfawley/grpc-go that referenced this pull request Feb 13, 2024
dfawley added a commit that referenced this pull request Feb 13, 2024
dfawley (Member, Author) commented Feb 14, 2024

I pushed a patch release with the fix: https://github.com/grpc/grpc-go/releases/tag/v1.61.1

ash2k (Contributor) commented Feb 14, 2024

Thank you!

// https://github.com/grpc/grpc-go/issues/5358
select {
case <-t.readerDone:
case <-time.After(time.Second):
mtekeli commented Feb 14, 2024

Note that nothing stops this scheduled timer when the reader finishes first (which will probably happen most of the time). I would suggest changing it to a timer object so it can be stopped manually, and therefore released, when the reader is done first.

In a scenario where this happens repeatedly with many clients (or the same client), wouldn't you just schedule many timers that cannot be stopped until they fire, i.e. a temporary memory leak?
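
For illustration, a sketch of the stoppable-timer variant being suggested here; waitForReader and its parameters are hypothetical names, not actual grpc-go code:

package example

import "time"

// waitForReader mirrors the hunk above but uses a time.Timer instead of
// time.After, so the timer is stopped and released as soon as the reader
// finishes (the common case) instead of lingering until it fires.
func waitForReader(readerDone <-chan struct{}, timeout time.Duration) {
	timer := time.NewTimer(timeout)
	defer timer.Stop() // releases the timer when readerDone wins the race
	select {
	case <-readerDone:
	case <-timer.C:
	}
}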

dfawley (Member, Author) replied

True that it isn't stopped, but it is fixed for 1 second, so I don't believe this could be a real problem for anyone.

mtekeli replied

Even with many connections hitting non-I/O errors? Do you think that's unlikely, or not even possible?

dfawley (Member, Author) replied

I think the key here is that it's a non-I/O error, which means it would be initiated by the server itself. If the server is terminating already-handshaked connections at a rate of >1 per second, then that seems like a real problem.

That said, I think it's OK to just forbid the use of time.After in our repo (outside of tests) since it's always a potential concern. I'll add a vet.sh check for this and change it on master.


Successfully merging this pull request may close these issues:

  • gRPC connection closing with unexpected EOF on high-latency connections (#5358)
5 participants