Shorten waits in TestWatchStreamSeparation #123888

serathius · 2024-03-12T09:06:40Z

/kind cleanup

NONE

Before

kubernetes $ go test ./staging/src/k8s.io/apiserver/pkg/storage/cacher --run TestWatchStreamSeparation --count 1
ok      k8s.io/apiserver/pkg/storage/cacher     13.804s

After

go test ./staging/src/k8s.io/apiserver/pkg/storage/cacher --run TestWatchStreamSeparation --count 1
ok      k8s.io/apiserver/pkg/storage/cacher     7.783s

k8s-ci-robot · 2024-03-12T09:06:45Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: serathius
Once this PR has been reviewed and has the lgtm label, please assign wojtek-t for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

staging/src/k8s.io/apiserver/pkg/storage/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

p0lyn0mial · 2024-03-12T09:17:51Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

 			err = cacher.storage.RequestWatchProgress(metadata.NewOutgoingContext(context.Background(), contextMetadata))
 			if err != nil {
 				t.Fatal(err)
 			}
 			// Give time for bookmark to arrive
-			time.Sleep(time.Second)
+			time.Sleep(100 * time.Millisecond)


Would it make sense to stress the test to make sure it is not flaky?

I ran the test a couple of times, but I forgot to use stress. Let me test it again.

(stress with a binary compiled with / without -race is most helpful)

Done:

kubernetes $ rm cacher.test
kubernetes $ go test ./staging/src/k8s.io/apiserver/pkg/storage/cacher/ --run TestWatchStreamSeparation -c -race
kubernetes $ stress ./cacher.test --test.run TestWatchStreamSeparation
5s: 0 runs so far, 0 failures
10s: 12 runs so far, 0 failures
15s: 12 runs so far, 0 failures
20s: 24 runs so far, 0 failures
25s: 29 runs so far, 0 failures
30s: 36 runs so far, 0 failures
35s: 46 runs so far, 0 failures
40s: 48 runs so far, 0 failures
45s: 59 runs so far, 0 failures
50s: 63 runs so far, 0 failures
55s: 72 runs so far, 0 failures
1m0s: 78 runs so far, 0 failures
1m5s: 84 runs so far, 0 failures
1m10s: 94 runs so far, 0 failures
1m15s: 98 runs so far, 0 failures
1m20s: 108 runs so far, 0 failures
1m25s: 116 runs so far, 0 failures
1m30s: 122 runs so far, 0 failures
1m35s: 129 runs so far, 0 failures
1m40s: 135 runs so far, 0 failures
1m45s: 143 runs so far, 0 failures
^C

p0lyn0mial · 2024-03-12T09:19:35Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

@@ -2396,7 +2396,7 @@ func TestWatchStreamSeparation(t *testing.T) {
 				defer cacher.watchCache.RUnlock()
 				return cacher.watchCache.resourceVersion
 			}
-			waitContext, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+			waitContext, cancel := context.WithTimeout(context.Background(), time.Second)


this (ctx) only matters when something goes wrong and the test fails, right?

agreed, this is the upper-bound we will wait for before failing the test by returning a blank RV. I don't think shortening this will actually make the test pass faster, just increase the possibility of failing

It will because there are subtests where we expect to not get bookmark.

ah, I see. are we confident a second is long enough to not flake on the tests where we do expect a bookmark event?

It will because there are subtests where we expect to not get bookmark.

Actually, if we choose our wait time differently for these two types of tests, we can leave the wait as-is (2 seconds) when we do expect to receive a bookmark, and shorten the wait more aggressively (500ms? 250ms?) when we don't expect to receive a bookmark.

Doesn't such asymmetry make the test less accurate? I understand that we can lower the timeout when we don't expect the bookmark, as we don't need to wait. On the other hand, it makes it easy to miss a regression introduced later. In my opinion, when making a comparison test like this, we should keep the setup symmetrical to be sure.

I don't think we'd lower it so much that it would be easy to miss a regression. We should pick a time large enough that we would consistently receive an incorrect bookmark event and fail the test.

p0lyn0mial · 2024-03-12T10:04:12Z

xref: #123685

liggitt · 2024-03-12T16:30:11Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

 			err = cacher.storage.RequestWatchProgress(metadata.NewOutgoingContext(context.Background(), contextMetadata))
 			if err != nil {
 				t.Fatal(err)
 			}
 			// Give time for bookmark to arrive
-			time.Sleep(time.Second)
+			time.Sleep(100 * time.Millisecond)


why do we need to sleep at all if the next thing we do is call waitForEtcdBookmark() which blocks until the event is received?

This is for the bookmark in watch cache.

but... waitForEtcdBookmark() will wait until we receive it, right? why do we need to sleep?

not sure if this is related but 100ms wasn't enough for #123926

Right, increased to 200ms

jiahuif · 2024-03-12T20:10:44Z

/triage accepted

liggitt · 2024-03-18T14:08:41Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

historically, millisecond-level timing has proven flaky in resource-constrained CI environments

I really don't think we should try to time these sleeps that tightly.

For positive tests, where we expect to receive an event, can we eliminate the sleep entirely and switch to event-based or poll-based checks (on a short interval, e.g. 10-50ms)?

For negative tests, where we expect not to receive an event, can we pick a sleep period of ~10x the normal time it takes to receive the event, and wait that long to ensure we didn't get an unexpected event?

k8s-ci-robot · 2024-04-24T23:38:29Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the release-note-none, kind/cleanup, and size/XS labels Mar 12, 2024

k8s-ci-robot requested review from caesarxuchao and liggitt March 12, 2024 09:06

p0lyn0mial reviewed Mar 12, 2024

liggitt reviewed Mar 12, 2024

k8s-ci-robot added the triage/accepted label and removed the needs-triage label Mar 12, 2024

pacoxu mentioned this pull request Mar 14, 2024

[Flaking Test] UT k8s.io/apiserver/pkg/storage cacher #123850

Closed

Shorten waits in TestWatchStreamSeparation

741a58d

serathius force-pushed the shorter-stream-separation branch from a80adcc to 741a58d Compare March 17, 2024 16:40

liggitt reviewed Mar 18, 2024

k8s-ci-robot added the needs-rebase label Apr 24, 2024
