Send watch bookmarks every minute #90249
/hold
This requires scale testing.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: wojtek-t
/retest
/assign @jpbetz
We've run scale tests against this change. I went through the metrics, focusing on the SLOs (API call latencies and pod startup time), and the differences are statistically insignificant (some metrics are slightly better, some are slightly worse). So I strongly believe it's worth proceeding with this change, as it will allow us to reduce the number of relists if something bad is happening in the cluster.
/hold cancel
@mm4tt - PTAL
/lgtm
/hold
Just some minor NITs and a question.
func (c *cacheWatcher) nextBookmarkTime(now time.Time) time.Time {
	// We try to send bookmarks:
	// (a) roughly every minutes
NIT: every minute
done
	// We try to send bookmarks:
	// (a) roughly every minutes
	// (b) right before the watcher timeout - for now we simply set it 2s before
	// the deadline
NIT: one more space to adjust indent
done
	// The former gives us periodicity if the watch breaks due to unexpected
	// conditions, the later ensures that on timeout the watcher is as close to
	// now as possible - this covers 99% of cases.
	heartbeatTime := now.Add(bookmarkHeartbeatFrequency)
	if c.deadline.IsZero() {
QQ: Is this actually possible (a watch without a deadline/timeout), or is it just a safety check?
If you're using our framework, like reflector, it will always be set:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go#L387
We also try to default it if it's not provided:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/get.go#L249
But you may configure it so that minRequestTimeout is not provided, and then it may happen (though in a properly configured cluster it shouldn't).
4bc660b to d4b532e
@mm4tt - PTAL
Nice!
/lgtm
/retest
We've rerun the tests. With this run, the "prometheus simple" metrics look pretty much the same as in our regular runs. For "prometheus", most things look pretty similar, modulo PUT leases, which is around the MAX from the last 20 runs.
/hold cancel
Actually, this change broke the metrics, e.g.:
The reason is that we don't re-add the watcher after another bookmark. I'm going to open a fix later today.
Actually, given it will be a slightly bigger change, I will revert this one for now.
/lgtm
Fix #90160