Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage #123935

Merged

Conversation

serathius
Contributor

@serathius serathius commented Mar 14, 2024

/kind bug

With the discovery of etcd watch correctness issues under extreme watch congestion (#123072 (comment)), #123532 is no longer a sufficient mitigation.

Even with the watch cache protected, direct watches to etcd will still suffer loss of events and cause etcd memory bloat. Rolling forward with improvements to etcd watch like etcd-io/etcd#17555 might be riskier, as it would cause watch compaction to affect new code paths.

Serving watches directly from etcd was reintroduced in 1.27 as a simple correctness fix (#115096); however, we didn't anticipate such a big impact on the stability and correctness of watch.

I propose to revert #115096 and leave a WatchFromStorageWithoutResourceVersion feature gate to opt back into serving these watches from storage, at least until we implement ConsistentWatchFromCache as part of KEP-2340.

kube-apiserver: fixes a 1.27+ regression in watch stability by serving watch requests without a resourceVersion from the watch cache by default, as in <1.27 (disabling the change in #115096 by default). This mitigates the impact of an etcd watch bug (https://github.com/etcd-io/etcd/pull/17555). If the 1.27 change in #115096 to serve these requests from underlying storage is still desired despite the impact on watch stability, it can be re-enabled with a `WatchFromStorageWithoutResourceVersion` feature gate.
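The routing described in the release note can be sketched roughly as follows. This is a minimal model, not the apiserver code: `watchRequest` and `serveFromStorage` are hypothetical names, the gate is passed in as a plain boolean, and other watch options (such as `sendInitialEvents`) are ignored.

```go
package main

import "fmt"

// watchRequest is a hypothetical, simplified stand-in for watch options.
type watchRequest struct {
	// ResourceVersion is empty when the client did not set resourceVersion.
	ResourceVersion string
}

// serveFromStorage reports whether a watch would bypass the watch cache.
// gateEnabled models WatchFromStorageWithoutResourceVersion (off by default).
// Assumption: only watches without a resourceVersion are affected by the gate.
func serveFromStorage(req watchRequest, gateEnabled bool) bool {
	return gateEnabled && req.ResourceVersion == ""
}

func main() {
	for _, gate := range []bool{false, true} {
		for _, rv := range []string{"", "100"} {
			req := watchRequest{ResourceVersion: rv}
			fmt.Printf("gate=%v rv=%q fromStorage=%v\n", gate, rv, serveFromStorage(req, gate))
		}
	}
}
```

With the gate at its default (disabled), every combination in this model is served from the watch cache; only an unset resourceVersion with the gate enabled goes to storage.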

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 14, 2024
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 14, 2024
@dims
Member

dims commented Mar 14, 2024

@serathius @liggitt we want this in 1.30 and backports all the way back to 1.27 right?

@dims
Member

dims commented Mar 14, 2024

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 14, 2024
@dims
Member

dims commented Mar 14, 2024

/sig etcd

@k8s-ci-robot k8s-ci-robot added the sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. label Mar 14, 2024
@serathius serathius changed the title Allow escape hatch to disable direct watches to etcd Introduce a ConsistentWatchFromStorage feature gate, enabled by default, that can be used to disable direct watches to etcd Mar 14, 2024
@liggitt
Member

liggitt commented Mar 14, 2024

> @serathius @liggitt we want this in 1.30 and backports all the way back to 1.27 right?

yup :-/

@serathius serathius changed the title Introduce a ConsistentWatchFromStorage feature gate, enabled by default, that can be used to disable direct watches to etcd Introduce a WatchFromCacheWithoutResourceVersion feature gate, that can be used to disable direct watches to storage. Mar 14, 2024
@serathius serathius force-pushed the consistent-watch-from-etcd branch 3 times, most recently from 13f3948 to 09d6b1e Compare March 17, 2024 11:38
@serathius
Contributor Author

/retest

@serathius serathius force-pushed the consistent-watch-from-etcd branch 2 times, most recently from 0e722fc to b692ace Compare March 17, 2024 12:27
@serathius
Contributor Author

Yey! Got through test issues! cc @liggitt

Note: I have updated getWatchCacheResourceVersion and its test to make sure it passes, treating the TestWatchSemantics test as authoritative for Watch behavior.

func (c *Cacher) getWatchCacheResourceVersion(ctx context.Context, parsedWatchResourceVersion uint64, opts storage.ListOptions) (uint64, error) {
	if len(opts.ResourceVersion) != 0 {
		return parsedWatchResourceVersion, nil
	}
	// legacy case
	if !utilfeature.DefaultFeatureGate.Enabled(features.WatchFromStorageWithoutResourceVersion) && opts.SendInitialEvents == nil && opts.ResourceVersion == "" {
Member

Why do we add it here?

If we're already at this point, we will be serving watch from cache. So the problem we're facing [being watches on etcd overloading etcd] is not a problem here. I don't think this is needed here - I would revert changes to this function.

Contributor Author

The refactors for WatchList changed the default behavior of the watch cache. That was not a problem previously because this response was served from etcd; however, to pass the tests (WatchSemantics and the conformance test) I need to update this function.

The reason is that getWatchCacheResourceVersion returns the ResourceVersion to which the watch cache must be synchronized. Without this change, getWatchCacheResourceVersion will call GetCurrentResourceVersionFromStorage, which is meant for consistent reads, and wait for that revision.

With my change it returns zero, which means serving the data available in the cache, consistent with the old behavior.
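A self-contained sketch of the decision described in this comment, assuming the function has only the three outcomes mentioned in the thread. The names `listOptions`, `currentRVFunc`, and `watchCacheResourceVersion` are hypothetical; the real method lives on `Cacher` and reads the feature gate itself, whereas here the gate and the storage lookup are passed in.

```go
package main

import "fmt"

// listOptions is a hypothetical, trimmed-down stand-in for storage.ListOptions.
type listOptions struct {
	ResourceVersion   string
	SendInitialEvents *bool
}

// currentRVFunc is a hypothetical hook standing in for
// GetCurrentResourceVersionFromStorage.
type currentRVFunc func() (uint64, error)

// watchCacheResourceVersion returns the revision the watch cache must reach
// before the watch is served. Zero means "serve whatever the cache has".
func watchCacheResourceVersion(parsedRV uint64, opts listOptions, gateEnabled bool, currentRV currentRVFunc) (uint64, error) {
	if opts.ResourceVersion != "" {
		// The client asked for a specific revision.
		return parsedRV, nil
	}
	if !gateEnabled && opts.SendInitialEvents == nil {
		// Legacy case: no resourceVersion and the gate is disabled,
		// so do not wait for a fresh revision from storage.
		return 0, nil
	}
	// Otherwise wait for the latest revision known to storage.
	return currentRV()
}

func main() {
	fromStorage := func() (uint64, error) { return 100, nil }
	for _, gate := range []bool{false, true} {
		rv, err := watchCacheResourceVersion(0, listOptions{}, gate, fromStorage)
		fmt.Printf("gate=%v rv=%d err=%v\n", gate, rv, err)
	}
}
```

In this model an unset resourceVersion yields 0 with the gate disabled and the storage revision (100 in the stub) with the gate enabled.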

Member

OK - we will need to update the tests when we switch the FG default, but that's fine.

So the tests should effectively be [if FG set, then 100, otherwise 0].
So maybe please add a short comment for those tests that when we switch FG, those should be reverted to 100.

Contributor Author

Added the comment.

Member

I have a hard time following what this bit does, but at least the 1.27 / 1.28 / 1.29 backports won't need this and will be a straight gating of the etcd pass-through, right?

Member

yes - this bit will not be part of backports

@wojtek-t
Member

/hold

…romStorageWithoutResourceVersion feature gate to allow serving watch from storage.
@wojtek-t
Member

/lgtm
/approve

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 1ee9a24abc8f29c9648098950fee946f000bcadb

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu, liggitt, serathius, wojtek-t


@serathius
Contributor Author

/retest

defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.WatchFromStorageWithoutResourceVersion, false)()
store, terminate := testSetupWithEtcdAndCreateWrapper(t)
t.Cleanup(terminate)
storagetesting.RunWatchSemantics(context.TODO(), t, store)
Member

@liggitt liggitt Mar 19, 2024

I think this doubling of the RunWatchSemantics test is responsible for the increased timeout rate in https://testgrid.k8s.io/sig-release-master-blocking#ci-kubernetes-unit&width=20 and #123850

k8s-ci-robot added a commit that referenced this pull request Mar 20, 2024
…1.29

Cherry-pick of #120897 #123935 #123887 #123994: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage.
k8s-ci-robot added a commit that referenced this pull request Mar 20, 2024
…1.28

Cherry-pick of #123935: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage.
k8s-ci-robot added a commit that referenced this pull request Mar 20, 2024
…1.27

Cherry-pick of #123935: Serve watch without resourceVersion from cache and introduce a WatchFromStorageWithoutResourceVersion feature gate to allow serving watch from storage.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/etcd Categorizes an issue or PR as relevant to SIG Etcd. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
7 participants