apiserver/storage: improve RunWatchSemanticInitialEventsExtended test #122676

p0lyn0mial
Contributor

Changes the test to populate the underlying data store with more data to trigger potential ordering issues.

What type of PR is this?

/kind feature

What this PR does / why we need it:

Changes the RunWatchSemanticInitialEventsExtended test to populate the underlying data store with
more data, so that potential ordering issues are more likely to surface.

Today the test runs against both the etcd and the cache implementations, and it exposes an inconsistency between them.

The etcd implementation sorts the results, so the test passes:

❯ go run -mod=mod golang.org/x/tools/cmd/stress ./etcd3.test -test.run TestEtcdWatchSemanticInitialEventsExtended
5s: 11 runs so far, 0 failures
10s: 31 runs so far, 0 failures
15s: 50 runs so far, 0 failures
20s: 70 runs so far, 0 failures
25s: 87 runs so far, 0 failures
30s: 105 runs so far, 0 failures
35s: 124 runs so far, 0 failures
40s: 143 runs so far, 0 failures
45s: 162 runs so far, 0 failures
50s: 180 runs so far, 0 failures
55s: 199 runs so far, 0 failures
1m0s: 217 runs so far, 0 failures
1m5s: 236 runs so far, 0 failures
1m10s: 253 runs so far, 0 failures
1m15s: 271 runs so far, 0 failures
1m20s: 290 runs so far, 0 failures
1m25s: 311 runs so far, 0 failures
1m30s: 325 runs so far, 0 failures
1m35s: 346 runs so far, 0 failures
1m40s: 364 runs so far, 0 failures
1m45s: 382 runs so far, 0 failures
1m50s: 400 runs so far, 0 failures
1m55s: 421 runs so far, 0 failures
2m0s: 439 runs so far, 0 failures
2m5s: 458 runs so far, 0 failures
2m10s: 478 runs so far, 0 failures

The cache implementation fails immediately with ordering issues:

--- FAIL: TestCacherWatchSemanticInitialEventsExtended (4.21s)
    utils.go:155: incorrect event:   watch.Event{
          	Type: "ADDED",
          	Object: &example.Pod{
          		TypeMeta: {},
          		ObjectMeta: v1.ObjectMeta{
        - 			Name:              "pod-1",
        + 			Name:              "pod-2",
          			GenerateName:      "",
          			Namespace:         "ns-foo",
          			SelfLink:          "",
          			UID:               "",
        - 			ResourceVersion:   "2",
        + 			ResourceVersion:   "3",
          			Generation:        0,
          			CreationTimestamp: {},
          			... // 7 identical fields
          		},
          		Spec:   {},
          		Status: {},
          	},
          }
    utils.go:155: incorrect event:   watch.Event{
          	Type: "ADDED",
          	Object: &example.Pod{
          		TypeMeta: {},
          		ObjectMeta: v1.ObjectMeta{
        - 			Name:              "pod-2",
        + 			Name:              "pod-3",
          			GenerateName:      "",
          			Namespace:         "ns-foo",
          			SelfLink:          "",
          			UID:               "",
        - 			ResourceVersion:   "3",
        + 			ResourceVersion:   "4",
          			Generation:        0,
          			CreationTimestamp: {},
          			... // 7 identical fields
          		},
          		Spec:   {},
          		Status: {},

…
5s: 6 runs so far, 6 failures (100.00%)
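
The failure output above shows ADDED events whose names and resource versions do not match what the test expects at each position. As a minimal sketch of the kind of ordering check involved (not the actual test code), the Go program below uses a hypothetical `event` struct in place of `watch.Event`, and hypothetical helpers `firstOutOfOrder` and `sortByResourceVersion`. It also assumes resource versions can be parsed as unsigned integers, which holds for the values shown above but is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// event is a simplified, hypothetical stand-in for watch.Event. It keeps only
// the two fields that differ in the failure output.
type event struct {
	Name            string
	ResourceVersion string
}

// parseRVs converts each event's resource version to an integer.
// Assumption: resource versions are decimal unsigned integers.
func parseRVs(events []event) ([]uint64, error) {
	rvs := make([]uint64, len(events))
	for i, e := range events {
		rv, err := strconv.ParseUint(e.ResourceVersion, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("event %q: %w", e.Name, err)
		}
		rvs[i] = rv
	}
	return rvs, nil
}

// firstOutOfOrder returns the index of the first event whose resource version
// is not greater than its predecessor's, or -1 if the slice is in order.
func firstOutOfOrder(events []event) (int, error) {
	rvs, err := parseRVs(events)
	if err != nil {
		return 0, err
	}
	for i := 1; i < len(rvs); i++ {
		if rvs[i] <= rvs[i-1] {
			return i, nil
		}
	}
	return -1, nil
}

// sortByResourceVersion sorts events in place by numeric resource version.
func sortByResourceVersion(events []event) error {
	rvs, err := parseRVs(events)
	if err != nil {
		return err
	}
	idx := make([]int, len(events))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return rvs[idx[a]] < rvs[idx[b]] })
	sorted := make([]event, len(events))
	for i, j := range idx {
		sorted[i] = events[j]
	}
	copy(events, sorted)
	return nil
}

func main() {
	// Made-up input for illustration: three events delivered out of order.
	events := []event{{"pod-2", "3"}, {"pod-1", "2"}, {"pod-3", "4"}}

	before, _ := firstOutOfOrder(events)
	fmt.Println("first out-of-order index before sort:", before) // prints 1

	if err := sortByResourceVersion(events); err != nil {
		panic(err)
	}
	after, _ := firstOutOfOrder(events)
	fmt.Println("first out-of-order index after sort:", after) // prints -1
}
```

The comparison is numeric rather than lexicographic because, as strings, "10" sorts before "2".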

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 10, 2024
@p0lyn0mial
Contributor Author

/hold

until we have a fix (possibly #120897)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 10, 2024
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jan 10, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 10, 2024
@p0lyn0mial
Contributor Author

/assign @wojtek-t

@cici37
Contributor

cici37 commented Jan 16, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 16, 2024
@wojtek-t
Member

wojtek-t commented Feb 1, 2024

@p0lyn0mial - I'm leaning towards waiting for the cache to be synced and only then sending events.
But yeah - let's get back to it once we fix the actual problem.
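
One way to picture the "wait for the cache to be synced, then send events" idea is a sender gated on a readiness signal. The sketch below is illustrative only: `gatedSender`, `markSynced`, and `send` are hypothetical names invented for this example, not part of the cacher or of any fix discussed in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// gatedSender is a hypothetical type for this sketch. It holds events back
// until markSynced has been called.
type gatedSender struct {
	ready chan struct{} // closed once the cache is considered synced
	once  sync.Once     // makes markSynced safe to call more than once
	out   chan string   // events delivered to the watcher
}

func newGatedSender(buffer int) *gatedSender {
	return &gatedSender{
		ready: make(chan struct{}),
		out:   make(chan string, buffer),
	}
}

// markSynced releases every pending and future send.
func (g *gatedSender) markSynced() {
	g.once.Do(func() { close(g.ready) })
}

// send blocks until markSynced has been called, then forwards the event.
func (g *gatedSender) send(ev string) {
	<-g.ready
	g.out <- ev
}

func main() {
	g := newGatedSender(3)
	done := make(chan struct{})

	go func() {
		defer close(done)
		for _, name := range []string{"pod-1", "pod-2", "pod-3"} {
			g.send(name)
		}
	}()

	// Nothing can have been delivered yet, because send waits on ready.
	select {
	case ev := <-g.out:
		fmt.Println("unexpected early event:", ev)
	default:
		fmt.Println("no events before sync")
	}

	g.markSynced()
	<-done
	close(g.out)
	for ev := range g.out {
		fmt.Println("ADDED", ev)
	}
}
```

Closing a channel is used as the signal because a closed channel unblocks every waiting receiver at once and stays unblocked for later callers.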

@p0lyn0mial p0lyn0mial force-pushed the upstream-watch-cache-init-events-ordering branch from 705dd2e to 20ded27 Compare February 28, 2024 09:57
@p0lyn0mial
Contributor Author

/retest

@p0lyn0mial
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 29, 2024
@wojtek-t
Member

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 29, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 746bab81a6baf63d7d2aa56b9086dc8664d7af37

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 29, 2024
@k8s-ci-robot
Contributor

@p0lyn0mial: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-conformance-kind-ga-only-parallel | 20ded27 | link | unknown | /test pull-kubernetes-conformance-kind-ga-only-parallel |


@p0lyn0mial
Contributor Author

/test pull-kubernetes-conformance-kind-ga-only-parallel

@k8s-ci-robot k8s-ci-robot merged commit 234f0fc into kubernetes:master Feb 29, 2024
13 of 14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 29, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

4 participants