grpc: support channel idleness #6263

Merged
merged 21 commits into from May 22, 2023

@easwars easwars commented May 9, 2023

This PR adds support for channel idleness. Summary of changes:

  • Added a new component idlenessManager that
    • keeps track of RPC activity on the channel, and
    • instructs the channel to enter or exit idle mode
  • ClientConn
    • methods to enter and exit idle mode
      • these are invoked by the newly added idlenessManager
      • these take care of shutting down and recreating the name resolver, load balancer, and blocking picker
      • these also set connectivity state appropriately
    • call into the idlenessManager from Invoke() and NewStream()
      • these are the RPC entry points
    • refactor DialContext a little bit for better code flow
    • addTraceEvent helper to emit channelz trace events
  • Add a WithIdleTimeout dial option to set idle_timeout
    • Defaults to 30m if unset (current default used by Java)
    • Disables channel idleness if explicitly set to 0
  • balancer wrapper
    • methods to enter and exit idle mode by shutting down and recreating the balancer respectively
    • not forwarding calls from the balancer to grpc when the channel is in idle mode
  • picker wrapper (or blocking picker)
    • methods to enter and exit idle mode
    • when in idle mode, drops picker updates
  • resolver wrapper
    • methods to enter and exit idle mode by shutting down and recreating the name resolver respectively

RELEASE NOTES:

  • grpc: support channel idleness using WithIdleTimeout dial option

@easwars easwars requested a review from dfawley May 9, 2023 00:58
@easwars easwars added this to the 1.56 Release milestone May 9, 2023
easwars commented May 9, 2023

I will be adding e2e tests soon and will push a commit for the same. But the PR is ready to be looked at.

easwars commented May 9, 2023

Looks like there is a race around closing and entering/exiting idle. I hit it with my first e2e test. Will ping the PR when this is ready to be looked at again. Sorry.

@easwars easwars removed the request for review from dfawley May 10, 2023 16:39
@easwars easwars force-pushed the channel_idleness_support_2 branch from 41500ec to 3649ed2 Compare May 12, 2023 00:42
@easwars easwars marked this pull request as draft May 12, 2023 00:43
@easwars easwars force-pushed the channel_idleness_support_2 branch 10 times, most recently from 8ee9c19 to 7d21f4f Compare May 12, 2023 19:20
@easwars easwars marked this pull request as ready for review May 12, 2023 19:30
@easwars easwars requested a review from dfawley May 12, 2023 19:33
@easwars easwars force-pushed the channel_idleness_support_2 branch from 7d21f4f to a3771e2 Compare May 12, 2023 20:02
Contributor

@dfawley dfawley left a comment

I don't think I'll be able to finish a pass today, but here are my thoughts on the idle manager so far...

idle.go Outdated
)

// For overriding in unit tests.
var newTimer = func(d time.Duration, f func()) *time.Timer {
Contributor

Nit: afterFunc or timeAfterFunc please

Contributor Author

Done.

idle.go Outdated

enforcer idlenessEnforcer // Functionality provided by grpc.ClientConn.
timeout int64 // Idle timeout duration nanos stored as an int64.
isDisabled bool // Disabled if idle_timeout is set to 0.
Contributor

Similar, but how about returning a nil *idlenessManager instead and making the receivers short-circuit via nil? Or don't call from cc if nil. Or make this implement an interface and have a disabledIdlenessManager type that is returned instead. It seems unusual to have a field that indicates the whole struct shouldn't even be used, given that it doesn't change dynamically.

Contributor Author

Done. Used the short-circuit via nil option.

idle.go Outdated
Comment on lines 129 to 130
i.timer = newTimer(time.Duration(i.timeout), i.handleIdleTimeout)
i.isIdle = false
Contributor

These things seem like they'd belong in exitIdleMode, no?

Contributor Author

After the switch to the atomic idleness manager, this is not applicable anymore.

@easwars easwars force-pushed the channel_idleness_support_2 branch from 0e5b271 to 6c1a3ea Compare May 15, 2023 16:32
@dfawley dfawley assigned easwars and unassigned dfawley May 15, 2023
Comment on lines 292 to 294
// Reset the current balancer name so that we act on the next call to
// switchTo by creating a new balancer specified by the new resolver.
ccb.curBalancerName = ""
Contributor

Would it be better to do this when entering idle instead of exiting?

Contributor Author

Done.

@easwars easwars assigned dfawley and unassigned easwars May 18, 2023
clientconn.go Outdated
Comment on lines 269 to 273
if cc.dopts.idleTimeout == 0 {
cc.idlenessMgr = newDisabledIdlenessManager()
} else {
cc.idlenessMgr = newAtomicIdlenessManager(cc, cc.dopts.idleTimeout)
}
Contributor

Can we simplify to cc.idlenessMgr = newIdlenessManager(cc, cc.dopts.idleTimeout) and leave the complexity of the implementation up to the implementation?

Contributor Author

Done.

call.go Outdated
}
return invoke(ctx, method, args, reply, cc, opts...)
cc.idlenessMgr.onCallEnd()
Contributor

Optional: defer instead and leave the same?

Contributor Author

Done.

sCtx, sCancel := context.WithTimeout(ctx, 3*defaultTestShortIdleTimeout)
defer sCancel()
go func() {
for ; sCtx.Err() == nil; <-time.After(defaultTestShortTimeout) {
Contributor

Maybe defaultTestShortIdleTimeout/2 for this instead? Otherwise these things that are otherwise not dependent on each other are actually dependent on each other for this test's correctness.

Contributor Author

Done. Thanks.

// restarted when exiting idle, it will push the same address to grpc again.
r := manual.NewBuilderWithScheme("whatever")
backend := stubserver.StartTestService(t, nil)
t.Cleanup(func() { backend.Stop() })
Contributor

Optional nit: t.Cleanup(backend.Stop)

It seems most of the other cleanups can't be simplified similarly (cc.Close errors and things that take parameters have to be wrapped). 😢

Contributor Author

Nice. Thanks. I've gotten so used to using t.Cleanup(func() { ... }) that I didn't even stop to think about the signature of this method.

Comment on lines 122 to 126
sCtx, sCancel := context.WithTimeout(ctx, 3*defaultTestShortIdleTimeout)
defer sCancel()
if cc.WaitForStateChange(sCtx, connectivity.Ready) {
t.Fatalf("Connectivity state changed to %q when expected to stay in READY", cc.GetState())
}
Contributor

Use these instead!

func awaitState(ctx context.Context, t *testing.T, cc *grpc.ClientConn, stateWant connectivity.State)
func awaitNotState(ctx context.Context, t *testing.T, cc *grpc.ClientConn, stateDoNotWant connectivity.State)

Contributor

Or factor out similarly, next to these (in clientconn_state_transition_test.go), since this particular test doesn't seem to be able to reuse those.

Contributor Author

Done. And also, added an awaitNoStateChange alongside the existing ones. At some point, we can move them to the testutils package.

@dfawley dfawley assigned easwars and unassigned dfawley May 19, 2023
Contributor Author

@easwars easwars left a comment

Thanks for the review!!

easwars commented May 20, 2023

Pulled the final set of changes into g3 and the tests seem healthy. Will merge on Monday.

@easwars easwars merged commit 9b7a947 into grpc:master May 22, 2023
11 checks passed
@easwars easwars deleted the channel_idleness_support_2 branch May 22, 2023 19:42
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 19, 2023