
apiserver: terminate watch with rate limiting during shutdown #114925

Merged
merged 3 commits into kubernetes:master from the watch-termination branch on Feb 27, 2023

Conversation

tkashem
Contributor

@tkashem tkashem commented Jan 9, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #111886 (it addresses termination of active watches during shutdown)

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Jan 9, 2023
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 9, 2023
@tkashem
Contributor Author

tkashem commented Jan 9, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jan 9, 2023
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 9, 2023
@tkashem
Contributor Author

tkashem commented Jan 9, 2023

/cc @wojtek-t @marseel

@wojtek-t
Member

wojtek-t commented Jan 9, 2023

@tkashem - I think this is achieving roughly the same thing as: #113546

If so - I think I would prefer Marcel's one, as it is more consistent with our existing shutdown sequence, no?

@wojtek-t
Member

wojtek-t commented Jan 9, 2023

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2023
@wojtek-t
Member

wojtek-t commented Jan 9, 2023

Or actually wait - you're not storing additional info here, so that is more efficient in terms of memory. Let me take a deeper look.

// too late, the net/http server has shutdown and
// probably has returned an error to the caller
return
case <-s.ServerTerminationWindow.ShuttingDown():
Member

@tkashem I have a feeling that it should be possible to do that more consistently with the current support for short-running requests.

Currently we have WithWaitGroup, which waits for short-running requests, and only that is a signal for further steps of graceful shutdown. I think we should be consistent with that for long-running requests.

So what I would try is to:

  • plumb into the exact same place (just introduce a second SafeWaitGroup for long-running requests)
  • expose an IsShuttingDown() channel from the SafeWaitGroup that will trigger exactly this code path
  • introduce the rate limiter into this SafeWaitGroup and change the code here to something like:
  s.AcquireToken()
  return

[The above suggests that it shouldn't be SafeWaitGroup but a new type being a wrapper around it; a sketch of such a wrapper follows this comment.]

Now - you additionally don't need to pass the ServerTerminationWindow across the codebase - all you need is to add that to the context in the WithWaitGroup filter.

So finally, this probably no longer belongs to WithWaitGroup, it should be a separate filter doing that.

I think that would be cleaner than both this PR and @marseel's PR...

@tkashem @marseel
thoughts?
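
A minimal sketch of the wrapper described above, assuming hypothetical names (WatchWaitGroup, AcquireToken, ShuttingDown) and the max(100, #watches/grace) pacing suggested later in this thread; the actual type and signatures merged in the PR may differ:

package waitgroup

import (
	"context"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// WatchWaitGroup wraps a wait group for long-running (watch) requests and
// exposes a shutdown signal plus a rate limiter, so active watches drain
// gradually instead of all terminating at once.
type WatchWaitGroup struct {
	mu       sync.Mutex
	wg       sync.WaitGroup
	count    int
	shutdown chan struct{}
	limiter  *rate.Limiter
}

func NewWatchWaitGroup() *WatchWaitGroup {
	return &WatchWaitGroup{shutdown: make(chan struct{})}
}

// Add registers an in-flight watch request. (A production version, like
// SafeWaitGroup, would also reject Add once Wait has started.)
func (w *WatchWaitGroup) Add() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.count++
	w.wg.Add(1)
}

// Done marks a watch request as finished.
func (w *WatchWaitGroup) Done() {
	w.mu.Lock()
	w.count--
	w.mu.Unlock()
	w.wg.Done()
}

// ShuttingDown returns a channel that is closed once Wait has been called,
// signalling watch handlers to start terminating.
func (w *WatchWaitGroup) ShuttingDown() <-chan struct{} { return w.shutdown }

// Wait sizes a rate limiter from the number of active watches and the grace
// period, signals shutdown, and blocks until all watches have returned.
func (w *WatchWaitGroup) Wait(grace time.Duration) {
	w.mu.Lock()
	qps := float64(w.count) / grace.Seconds()
	if qps < 100 {
		qps = 100
	}
	w.limiter = rate.NewLimiter(rate.Limit(qps), 1)
	close(w.shutdown)
	w.mu.Unlock()
	w.wg.Wait()
}

// AcquireToken blocks until the terminating handler is allowed to return.
func (w *WatchWaitGroup) AcquireToken(ctx context.Context) error {
	return w.limiter.Wait(ctx)
}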

Member

I think it would make sense to POC that anyway - I would love to try it, though I probably won't get to it for at least next two weeks...

Contributor Author

@tkashem tkashem Jan 10, 2023

all you need is to add that to the context in the WithWaitGroup filter.

This sounds good to me; I wired the context (a sketch of the wiring follows this comment).

plumb into the exact same place (just introduce a second SafeWaitGroup for long-running requests)

Do we need to keep track of the active watches? Currently all active watches return at the same moment when the net/http server closes; I thought the goal was to make sure these active watches return with a rate limiter or a random sleep.

I think it would make sense to POC that anyway - I would love to try it, though I probably won't get to it for at least next two weeks...

I am happy to tackle it, if you and @marseel think this approach is in the right direction
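
For the context wiring mentioned above, here is a sketch of how a dedicated filter could stash the drain signal in the request context; all names here (WatchDrainSignal, WithWatchTermination, WatchTerminationFrom) are placeholders, not the identifiers used in the PR:

package filters

import (
	"context"
	"net/http"
)

type watchTerminationKeyType int

const watchTerminationKey watchTerminationKeyType = iota

// WatchDrainSignal is the minimal surface a watch handler needs during
// shutdown: a signal to start terminating and a token gate that paces it.
type WatchDrainSignal interface {
	ShuttingDown() <-chan struct{}
	AcquireToken(ctx context.Context) error
}

// WithWatchTermination stores the drain signal in the request context so the
// watch handler can pick it up without plumbing a new field across the codebase.
func WithWatchTermination(handler http.Handler, sig WatchDrainSignal) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		ctx := context.WithValue(req.Context(), watchTerminationKey, sig)
		handler.ServeHTTP(w, req.WithContext(ctx))
	})
}

// WatchTerminationFrom retrieves the drain signal; when ok is false the caller
// can fall back to a nil channel, which blocks forever in a select.
func WatchTerminationFrom(ctx context.Context) (WatchDrainSignal, bool) {
	sig, ok := ctx.Value(watchTerminationKey).(WatchDrainSignal)
	return sig, ok
}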

Member

Do we need to keep track of the active watches? Currently all active watches return at the same moment when the net/http server closes; I thought the goal was to make sure these active watches return with a rate limiter or a random sleep.

Yes - the goal is to rate-limit them.
But if there are very few watches and I'm setting a window of, say, 1m (I expect values around this number to be most common), I don't want to wait the full 1m, but rather shut them down quicker, because that would be safe.

So while I don't need to know the exact watches, I need to know their number to know when they are done.

I am happy to tackle it, if you and @marseel think this approach is in the right direction

I'm not 100% convinced, because we may find something unexpected in the code that makes this harder. But we also won't know until we POC it.

Member

But if there are very few watches and I'm setting a window of, say, 1m (I expect values around this number to be most common), I don't want to wait the full 1m, but rather shut them down quicker, because that would be safe.

Yea, I think it makes sense. We could just rate limit to max(some constant like 100, #watches/timeout) QPS.
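
To make that concrete with purely illustrative numbers: 12,000 active watches draining over a 60s window would give max(100, 12000/60) = 200 QPS, so they finish within the window, while 600 watches would hit the 100 QPS floor and drain in roughly 6 seconds instead of being stretched across the full minute.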

I am happy to tackle it, if you and Marcel Zięba think this approach is in the right direction

I personally like your approach more than mine :). Once we have working POC I can run some scale tests to see if everything works as expected on large scale.

Member

I personally like your approach more than mine :). Once we have working POC I can run some scale tests to see if everything works as expected on large scale.

Given that it doesn't maintain a single cache of all connections, I like it more too. And by addressing my comments, I think we can make it even cleaner, I hope.

So yes - if you have some time, Abu, it would be great if you could POC it!

Contributor Author

Thanks @wojtek-t @marseel

Yes, I am going to work on it. I am also going to add active watch requests to the graceful termination unit tests we have and make sure it works as expected. Once the new unit tests pass, you can test it at scale.

@wojtek-t wojtek-t self-assigned this Jan 9, 2023
@tkashem tkashem force-pushed the watch-termination branch 4 times, most recently from 984a956 to dbf863a on January 10, 2023 02:55
"k8s.io/apiserver/pkg/util/wsstream"
)

// nothing will ever be sent down this channel
var neverExitWatch <-chan time.Time = make(chan time.Time)

// nothing will ever be sent down this channel, used when the request
// context does not have a server shutdown signal associated
var neverShuttingDownCh <-chan struct{} = make(chan struct{})
Contributor

out of curiosity, why is this global?

Contributor

Or even just set nil as the shutdown channel. Nil blocks on select.
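
A tiny self-contained illustration of the property being relied on here, namely that a receive from a nil channel never fires in a select:

package main

import (
	"fmt"
	"time"
)

func main() {
	// nil when the request context has no server shutdown signal associated
	var shuttingDownCh <-chan struct{}

	select {
	case <-shuttingDownCh:
		// never selected: receiving from a nil channel blocks forever
		fmt.Println("shutting down")
	case <-time.After(100 * time.Millisecond):
		fmt.Println("timed out normally")
	}
}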

Contributor Author

Yeah, I don't think we need it; I removed it, thanks.

@@ -156,7 +167,8 @@ type WatchServer struct {
// used to correct the object before we send it to the serializer
Fixup func(runtime.Object) runtime.Object

TimeoutFactory TimeoutFactory
TimeoutFactory TimeoutFactory
ServerShuttingdownCh <-chan struct{}
Contributor

nit: ServerShuttingDownCh

Contributor Author

fixed

@@ -260,6 +266,20 @@ type GenericAPIServer struct {
// If enabled, after ShutdownDelayDuration elapses, any incoming request is
// rejected with a 429 status code and a 'Retry-After' response.
ShutdownSendRetryAfter bool

// ShutdownWatchTerminationGracePeriod, if set to a positive value,
// is the maximum grace period the apiserver will allow for
Contributor

what does "maximum" mean here? Is there another grace period that might overrule this?

Contributor Author

I updated the comments; let me know if it's clearer now.

// Wait for all active watches to finish
grace := s.ShutdownWatchTerminationGracePeriod
activeBefore, activeAfter, err := s.WatchRequestWaitGroup.Wait(func(count int) (utilwaitgroup.RateLimiter, context.Context, context.CancelFunc) {
// TODO: this is back of the envelope for now
Contributor

What's written on the envelope? What's the one-liner of intuition here?

Contributor Author

comment fixed
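
For intuition, a hedged sketch of what a callback with the WatchRequestWaitGroup.Wait signature shown in the diff above might compute; the constants and the helper name are illustrative, not taken from the merged code, and *rate.Limiter stands in for the utilwaitgroup.RateLimiter interface:

package shutdown

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

// watchDrainParams derives a rate limiter that paces the termination of
// `count` active watches, plus a context bounding the whole drain window.
func watchDrainParams(count int, grace time.Duration) (*rate.Limiter, context.Context, context.CancelFunc) {
	// back of the envelope: spread terminations across the grace period,
	// but never drain slower than roughly 100 watches per second
	qps := float64(count) / grace.Seconds()
	if qps < 100 {
		qps = 100
	}
	ctx, cancel := context.WithTimeout(context.Background(), grace)
	return rate.NewLimiter(rate.Limit(qps), 1), ctx, cancel
}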

@@ -950,6 +1011,7 @@ func newGenericAPIServer(t *testing.T, fAudit *fakeAudit, keepListening bool) *G
config, _ := setUp(t)
config.ShutdownDelayDuration = 100 * time.Millisecond
config.ShutdownSendRetryAfter = keepListening
config.ShutdownWatchTerminationGracePeriod = 2 * time.Second
Contributor

nit: // much smaller than normal 30s grace period for other non-long running requests

Contributor Author

We just need to enable watch draining here, so any positive value is fine; added a comment.

@sttts
Contributor

sttts commented Feb 27, 2023

A few nits. SGTM overall.

@tkashem
Contributor Author

tkashem commented Feb 27, 2023

@sttts I addressed the feedback, please take a look when you have a moment

@wojtek-t
Member

This looks great.
I was chatting with Stefan in the morning and he confirmed that all of his comments were nits.

So I'm going ahead and tagging it, so we can merge it as soon as possible and get some reasonable soak time.
If needed, we can always adjust things in a follow-up if he considers that the comment updates (the only potentially unresolved thing) require further tweaking.

/lgtm
/approve

/hold cancel

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 27, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 380cd25e56fe248241b3bbb2d9b8e67f732d3f68

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tkashem, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2023
@tkashem
Contributor Author

tkashem commented Feb 27, 2023

I had to fix a failing unit test caused by my rename; this is the diff:

--- a/staging/src/k8s.io/apiserver/pkg/endpoints/watch_test.go
+++ b/staging/src/k8s.io/apiserver/pkg/endpoints/watch_test.go
@@ -647,9 +647,8 @@ func TestWatchHTTPErrors(t *testing.T) {
                Encoder:         newCodec,
                EmbeddedEncoder: newCodec,
 
-               Fixup:                func(obj runtime.Object) runtime.Object { return obj },
-               TimeoutFactory:       &fakeTimeoutFactory{timeoutCh, done},
-               ServerShuttingdownCh: make(chan struct{}),
+               Fixup:          func(obj runtime.Object) runtime.Object { return obj },
+               TimeoutFactory: &fakeTimeoutFactory{timeoutCh, done},
        }

@wojtek-t
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 27, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 7bd7e81c1b315de3226b196b358a6a7842a5212e

@k8s-ci-robot k8s-ci-robot merged commit a16fd54 into kubernetes:master Feb 27, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 27, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
area/apiserver
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/bug Categorizes issue or PR as related to a bug.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note-none Denotes a PR that doesn't merit a release note.
sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

kube-apiserver: some HTTP requests might hang indefinitely
6 participants