Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send watch bookmarks every minute #90249

Merged
merged 1 commit into from Apr 24, 2020

Conversation

wojtek-t
Copy link
Member

Fix #90160

Try to send watch bookmarks (if requested) periodically in addition to sending them right before timeout

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 17, 2020
@wojtek-t
Copy link
Member Author

/hold

This requires scale testing.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 17, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2020
@wojtek-t
Copy link
Member Author

/retest

@fedebongio
Copy link
Contributor

/assign @jpbetz

@wojtek-t wojtek-t added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 23, 2020
@wojtek-t
Copy link
Member Author

We've run scale tests against this change. I went through the metrics, focusing on the SLOs - api call latencies and pod startup time and the differences are statistically insignificant (some metrics are slightly better, some metrics are slightly worse)

So I strongly believe it's actually worth proceeding with this change, as it will allow us to reduce number of relists if something bad is happening in the cluster.

/hold cancel

@mm4tt - PTAL

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2020
Copy link
Contributor

@mm4tt mm4tt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold
Just some minor NITs and a question.


func (c *cacheWatcher) nextBookmarkTime(now time.Time) time.Time {
// We try to send bookmarks:
// (a) roughly every minutes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: every minute

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// We try to send bookmarks:
// (a) roughly every minutes
// (b) right before the watcher timeout - for now we simply set it 2s before
// the deadline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: one more space to adjust indent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// The former gives us periodicity if the watch breaks due to unexpected
// conditions, the later ensures that on timeout the watcher is as close to
// now as possible - this covers 99% of cases.
heartbeatTime := now.Add(bookmarkHeartbeatFrequency)
if c.deadline.IsZero() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: Is it possible (watch without deadline/timeout) or is it just a safe check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're using our framework like reflector - it will always be set:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go#L387
We also try to default it it is't not provided:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/get.go#L249

But you may configure it in a way to not provide minRequestTimeout, and then it may happen (though in properly configuder custer it shouldn't happen).

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Apr 23, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 23, 2020
@wojtek-t
Copy link
Member Author

@mm4tt - PTAL

@mm4tt
Copy link
Contributor

mm4tt commented Apr 23, 2020

Nice!

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 23, 2020
@wojtek-t
Copy link
Member Author

/retest

1 similar comment
@wojtek-t
Copy link
Member Author

/retest

@wojtek-t
Copy link
Member Author

Heh - it appared that the tests were incorrectly run - @jkaniuk @mm4tt - FYI
Holding until we analyze the correct run.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2020
@wojtek-t
Copy link
Member Author

We've rerun the tests. With this run, the "prometheus simple" metrics look pretty much the same as our regular runs, "prometheus" - most of things look pretty similar, modulo PUT leases, which is around MAX from the last 20 runs.
That said, there is no reason why PUT leases could be longer via this change (other than some contention on system-level resources like cpu or memory allocations). Neither of those seem to be problematic looking into profiles, so I'm going to cancel the hold.
We will be monitoring the next runs.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2020
@k8s-ci-robot k8s-ci-robot merged commit c91455d into kubernetes:master Apr 24, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Apr 24, 2020
@wojtek-t
Copy link
Member Author

Actually, this change broke the metrics, e.g.:
http://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=APIServer&metricname=LoadInitEventsCount&resource=*core.Secret

The reason is that we don't readd watcher after another bookmark. I'm going to open a fix later today.
@mm4tt @jpbetz ^^

@wojtek-t
Copy link
Member Author

Actually, given it will be a slightly bigger change, I will revert this one for now.

@mm4tt
Copy link
Contributor

mm4tt commented Apr 27, 2020

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adjust watch bookmark policy to match watch-cache size policy
5 participants