[WIP] cacher: short circuit conditional progress notifier #124928

MadhavJivrajani · 2024-05-17T10:41:25Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

In an extremely unlikely scenario, the progress requester can
keep requesting progress notifications even though the feature
is not supported.

The feature checker returns true if any node in an etcd
cluster has the feature enabled. However, it may change this
to false if it later detects a node that does not have the
feature enabled. Until we detect false, we work under the
assumption that the feature is supported. As soon as we get
a false, however, we should stop requesting progress
notifications so that we do not load etcd with redundant
requests.
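
To make the scenario concrete, here is a minimal sketch of the behaviour described above. It is not the code in this PR: `featureChecker`, `observeMember`, `requester` and `tick` are hypothetical names, the timer is replaced by explicit `tick` calls, and the sketch assumes the checker never returns to true after reporting false.

```go
package main

import "fmt"

// featureChecker is a hypothetical stand-in for the etcd feature support
// checker. It reports true once a supporting member has been seen and
// false after any member without the feature has been seen.
type featureChecker struct {
	supportedSeen   bool
	unsupportedSeen bool
}

// observeMember records what one etcd member reports about the feature.
func (c *featureChecker) observeMember(hasFeature bool) {
	if hasFeature {
		c.supportedSeen = true
	} else {
		c.unsupportedSeen = true
	}
}

// Supports returns the checker's current view of the whole cluster.
func (c *featureChecker) Supports() bool {
	return c.supportedSeen && !c.unsupportedSeen
}

// requester is a simplified, hypothetical progress requester.
type requester struct {
	checker  *featureChecker
	stopped  bool
	requests int // counts simulated progress requests sent to etcd
}

// tick models one firing of the periodic timer. It short-circuits as soon
// as the checker reports that the feature is not supported.
func (r *requester) tick() bool {
	if r.stopped {
		return false
	}
	if !r.checker.Supports() {
		r.stopped = true // stop for good instead of sending redundant requests
		return false
	}
	r.requests++
	return true
}

func main() {
	c := &featureChecker{}
	c.observeMember(true) // first member supports the feature
	r := &requester{checker: c}
	r.tick()
	r.tick()
	c.observeMember(false) // a later member does not
	r.tick()
	r.tick()
	fmt.Println("requests:", r.requests, "stopped:", r.stopped)
}
```

Without the check inside `tick`, the last two ticks in this sketch would still send requests after the checker has flipped to false.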

Which issue(s) this PR fixes:

xref: #124867 (comment)

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/sig api-machinery scalability
/cc @p0lyn0mial @wojtek-t

k8s-ci-robot · 2024-05-17T10:41:33Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2024-05-17T10:41:53Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MadhavJivrajani
Once this PR has been reviewed and has the lgtm label, please assign liggitt for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

staging/src/k8s.io/apiserver/pkg/storage/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

There can exist an extremely unlikely scenario where the progress requester can repeatedly request progress notifications even when the feature might not be supported.

The feature checker returns `true` if any node in an etcd cluster has the feature enabled. However, it may change this to `false` if it eventually detects a node which does not have this enabled. Ideally, till we detect `false` we work under the assumption that this feature is supported, however we should stop requesting progress notifications as soon as we get a `false` in order to prevent loading etcd with redundant requests.

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>

p0lyn0mial · 2024-05-20T10:34:54Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/watch_progress.go

+			// redundant requests.
+			if !etcdfeature.DefaultFeatureSupportChecker.Supports(storage.RequestWatchProgress) {
+				pr.mux.Lock()
+				pr.stopped = true


I'm not sure if we should stop the requestor here, maybe we could just change the shouldRequest function.

The question is whether we want to check the feature gate in the requestor or simply reserve/delegate the checking of that flag to the higher layers.
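
As an illustration of the alternative mentioned above, the sketch below folds the feature check into a `shouldRequest`-style predicate instead of stopping the requester. The struct, its fields and the `supports` callback are assumptions made for this example and are not taken from `watch_progress.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// progressRequester is a hypothetical, trimmed-down requester. The field
// names are illustrative and do not mirror the real implementation.
type progressRequester struct {
	mux      sync.RWMutex
	waiting  int         // number of callers currently waiting for progress
	stopped  bool        // set only when the requester is shut down
	supports func() bool // assumed hook into the feature support checker
}

// shouldRequest reports whether a progress notification should be requested
// on this tick. An unsupported feature only skips the request; it does not
// flip the stopped flag.
func (pr *progressRequester) shouldRequest() bool {
	pr.mux.RLock()
	defer pr.mux.RUnlock()
	return pr.waiting > 0 && !pr.stopped && pr.supports()
}

func main() {
	supported := true
	pr := &progressRequester{
		waiting:  1,
		supports: func() bool { return supported },
	}
	fmt.Println("request while supported:", pr.shouldRequest())
	supported = false
	fmt.Println("request once unsupported:", pr.shouldRequest())
	fmt.Println("requester stopped:", pr.stopped)
}
```

In this variant the requester keeps running and simply sends nothing while the check fails, so it would resume requesting if the check ever passed again.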

/cc @wojtek-t @serathius

wojtek-t · 2024-05-20

Yup - this is not a place where we should check it.

We're aware of the issue where it may change, but it was a conscious decision, and we generally don't expect setups where the etcd clusters backing a single Kubernetes cluster run different versions.
So we consciously went for simplicity and accepted a corner case that we don't handle and that should never happen.

p0lyn0mial · 2024-05-21

@MadhavJivrajani does the above make sense to you? We could always revisit this PR in the future if our assumptions are wrong.

MadhavJivrajani · 2024-05-27

That makes sense to me @p0lyn0mial @wojtek-t, thanks.

Out of curiosity, if we were to revisit this sometime in the future, what would be the right place to check this, if not the progress checker? I would have assumed that, since it can change at runtime, it's better to check it at the place where it's periodically issuing the additional request.

/close

k8s-ci-robot · 2024-05-26T04:19:47Z

PR needs rebase.


k8s-ci-robot · 2024-05-27T06:28:23Z

@MadhavJivrajani: Closed this PR.


k8s-ci-robot requested review from p0lyn0mial and wojtek-t May 17, 2024 10:41

k8s-ci-robot added the needs-triage and needs-priority labels May 17, 2024

MadhavJivrajani mentioned this pull request May 17, 2024

storage/cacher: waitUntilWatchCacheFreshAndForceAllEvents checks if storage.RequestWatchProgress is supported #124867

Merged

k8s-ci-robot added the area/apiserver label May 17, 2024

MadhavJivrajani force-pushed the feature-progress-notify branch from 32a08f1 to cc80c7c Compare May 17, 2024 10:43

MadhavJivrajani force-pushed the feature-progress-notify branch from cc80c7c to 349a7ea Compare May 17, 2024 10:50

p0lyn0mial reviewed May 20, 2024


k8s-ci-robot requested a review from serathius May 20, 2024 10:34

k8s-ci-robot added the needs-rebase label May 26, 2024

k8s-ci-robot closed this May 27, 2024
