-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement transparent retries for gRFC A6 #1597
Conversation
// Check to make sure the context has expired. This will prevent us from | ||
// looping forever if an error occurs for wait-for-ready RPCs where no data | ||
// is sent on the wire. | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no reason to use a select here, and doing so can result in worse performance than the simpler alternative:
if err := ctx.Err(); err != nil {
return toRPCErr(err)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out this is not true. I made some benchmarks, and found the following results:
BenchmarkCancelContextErrNoErr-12 100000000 52.2 ns/op
BenchmarkCancelContextErrGotErr-12 100000000 51.9 ns/op
BenchmarkCancelContextChannelNoErr-12 200000000 19.9 ns/op
BenchmarkCancelContextChannelGotErr-12 100000000 36.1 ns/op
Using the channel is faster as of today. I believe the problem is the defer
in context.Err()
:
https://github.com/golang/net/blob/cd69bc3fc700721b709c3a59e16e24c67b58f6ff/context/pre_go17.go#L162
These numbers line up well with our benchmarks for mutexes vs. channels and defer vs no-defer:
BenchmarkMutexWithDefer-12 20000000 51.2 ns/op
BenchmarkMutex-12 100000000 14.5 ns/op
BenchmarkSelectClosed-12 50000000 20.3 ns/op
BenchmarkSelectOpen-12 300000000 5.74 ns/op
Note how the select is faster than the mutex with a defer, but slower than the mutex with an inline Unlock()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite surprising, thanks for the details. Your comment points to pre_go17.go but I assume these benchmarks were run on 1.9?
Also interesting that selecting on the open channel is faster than the mutex with inline Unlock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, there's something fishy here - how can BenchmarkCancelContextChannelGotErr
ever be faster than BenchmarkCancelContextErrGotErr
? The former does strictly more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite surprising, thanks for the details. Your comment points to pre_go17.go but I
assume these benchmarks were run on 1.9?
Yes -- presumably the std context package is implemented identically (based on the results that makes sense).
The former does strictly more work.
Good catch. Actually, it does not do strictly more work, because I forgot to call Err() in this case. Adding it results in:
BenchmarkCancelContextErrGotErr-12 100000000 52.4 ns/op
BenchmarkCancelContextChannelGotErr-12 50000000 94.8 ns/op
BenchmarkTimerContextErrGotErr-12 100000000 52.5 ns/op
BenchmarkTimerContextChannelGotErr-12 50000000 86.7 ns/op
I would still lean toward the existing approach, as it optimizes the non-error case. But maybe the Go team should think about removing that defer...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I filed golang/go#22401 for this.
// Check to make sure the context has expired. This will prevent us from | ||
// looping forever if an error occurs for wait-for-ready RPCs where no data | ||
// is sent on the wire. | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto:
if err := ctx.Err(); err != nil {
return toRPCErr(err)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes LGTM.
Some possible goaway cleanup in transport.
transport/transport.go
Outdated
@@ -689,6 +672,12 @@ var ( | |||
// ErrStreamDrain indicates that the stream is rejected by the server because | |||
// the server stops accepting new RPCs. | |||
ErrStreamDrain = streamErrorf(codes.Unavailable, "the server stops accepting new RPCs") | |||
// StatusStreamRefused indicates that the server sent a RST_STREAM with the | |||
// REFUSED_STREAM code, meaning the stream was entirely unprocessed. | |||
StatusStreamRefused = status.New(codes.Unavailable, "the server refused the stream") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted entirely (it was unused).
transport/http2_client.go
Outdated
// The stream was unprocessed by the server. | ||
stream.mu.Lock() | ||
stream.unprocessed = true | ||
stream.finish(StatusGoAway) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this finish()
?
finish
closes s.done
, which may result in race in wait()
.
See following comment on ErrStreamDrain
.
If we do a cleanup on stream goaway, this finish()
may become necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave both the finish
and the closing of s.done
for now; we will clean things up in the future.
Unexported statusGoAway
.
transport/transport.go
Outdated
@@ -689,6 +672,12 @@ var ( | |||
// ErrStreamDrain indicates that the stream is rejected by the server because | |||
// the server stops accepting new RPCs. | |||
ErrStreamDrain = streamErrorf(codes.Unavailable, "the server stops accepting new RPCs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this anymore? Seems it's only returned by transport but never check.
IIRC, the purpose of this error is exactly what Unprocessed()
is for.
And if we no longer need this error, we can also (maybe) remove s.goaway
channel.
But should probably make this a TODO for another cleanup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unexported, added TODO. We have a lot more cleaning up we can do.
0928cf8
to
12d4904
Compare
Ping: rebased & resolved conflicts. |
test/end2end_test.go
Outdated
for ; state == connectivity.Ready && cc.WaitForStateChange(ctx, state); state = cc.GetState() { | ||
} | ||
if state == connectivity.Ready { | ||
t.Fatalf("Want connection state to be Ready, got %v", state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want state to be not Ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
test/end2end_test.go
Outdated
if state == connectivity.Ready { | ||
t.Fatalf("Want connection state to be Ready, got %v", state) | ||
} | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context will be leaked if t.Fafalf
before this.
Move this before the if
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
test/end2end_test.go
Outdated
t.Fatalf("TestService/EmptyCall(_, _) = _, %v, want _, <nil>", err) | ||
} | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not cancel on t.Fatalf
?
defer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, PTAL.
test/end2end_test.go
Outdated
for ; state == connectivity.Ready && cc.WaitForStateChange(ctx, state); state = cc.GetState() { | ||
} | ||
if state == connectivity.Ready { | ||
t.Fatalf("Want connection state to be Ready, got %v", state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
test/end2end_test.go
Outdated
if state == connectivity.Ready { | ||
t.Fatalf("Want connection state to be Ready, got %v", state) | ||
} | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
test/end2end_test.go
Outdated
t.Fatalf("TestService/EmptyCall(_, _) = _, %v, want _, <nil>", err) | ||
} | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Also fixes #1532