Fix TestStartingResourceVersion flakiness #96662

wojtek-t · 2020-11-18T07:37:20Z

NONE

/kind flake
/priority important-soon
/milestone v1.20

k8s-ci-robot · 2020-11-18T07:38:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~staging/src/k8s.io/apiserver/pkg/storage/OWNERS~~ [wojtek-t]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wojtek-t · 2020-11-18T10:32:57Z

/retest

wojtek-t · 2020-11-18T11:22:47Z

/retest

jpbetz

The code looks right, just a couple nits around how the test is described in the comments.

jpbetz · 2020-11-18T19:38:22Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

@@ -1006,6 +1006,84 @@ func (f *fakeTimeBudget) takeAvailable() time.Duration {

 func (f *fakeTimeBudget) returnUnused(_ time.Duration) {}

+func TestStartingResourceVersion(t *testing.T) {


nit: Moving this function makes for a diff that is difficult to review.

Yes, but in order to fake dispatchBudget (which is a private field), the test has to live here :(

jpbetz · 2020-11-18T19:40:54Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

+	// When using the official `timeBudgetImpl` we were observing flakiness
+	// due under the following conditions:


nit: It took me a while to understand what this meant. Maybe explain what the current test does instead of what the PR changes? I.e. "Use a fake timeBudget to prevent this test from flaking under the following conditions:"?

BTW - the comment was copy-pasted from another test and wasn't accurate for this test; fixed that too.

jpbetz · 2020-11-18T19:43:58Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

+	// 3) if the test was cpu-starved and we weren't able to consume events
+	//    from w2 ResultCh it could have happened that its buffer was also
+	//    filling in and given we no longer had timeBudget (consumed in (1))
+	//    trying to put next item was simply breaking the watch


Can we clarify why this works okay in production but not in test? It's not obvious to me from reading this.

This may happen in production, and the watch can then be resumed by the client. But in that case the test wouldn't exercise what we want to test.
Added a comment.

wojtek-t

@jpbetz - thanks for the comments; PTAL

wojtek-t · 2020-11-19T06:48:15Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

@@ -1006,6 +1006,84 @@ func (f *fakeTimeBudget) takeAvailable() time.Duration {

 func (f *fakeTimeBudget) returnUnused(_ time.Duration) {}

+func TestStartingResourceVersion(t *testing.T) {


Yes, but in order to fake dispatchBudget (which is a private field), the test has to live here :(

wojtek-t · 2020-11-19T06:56:24Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

+	// When using the official `timeBudgetImpl` we were observing flakiness
+	// due under the following conditions:


wojtek-t · 2020-11-19T06:56:49Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

+	// When using the official `timeBudgetImpl` we were observing flakiness
+	// due under the following conditions:


BTW - the comment was copy-pasted from another test and wasn't accurate for this test; fixed that too.

wojtek-t · 2020-11-19T06:57:20Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

+	// 3) if the test was cpu-starved and we weren't able to consume events
+	//    from w2 ResultCh it could have happened that its buffer was also
+	//    filling in and given we no longer had timeBudget (consumed in (1))
+	//    trying to put next item was simply breaking the watch


This may happen in production, and the watch can then be resumed by the client. But in that case the test wouldn't exercise what we want to test.
Added a comment.

wojtek-t · 2020-11-19T09:45:24Z

/retest

jpbetz · 2020-11-19T18:06:35Z

/lgtm

Thanks for the flake fix!

jeremyrickard · 2020-11-19T19:27:35Z

/test pull-kubernetes-e2e-kind-ipv6

jeremyrickard · 2020-11-19T20:45:45Z

/test pull-kubernetes-e2e-kind-ipv6

fedebongio · 2020-11-19T21:07:30Z

/triage accepted

jeremyrickard · 2020-11-19T22:44:15Z

/retest

fejta-bot · 2020-11-20T01:45:22Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

wojtek-t assigned liggitt Nov 18, 2020

k8s-ci-robot added this to the v1.20 milestone Nov 18, 2020

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 18, 2020

k8s-ci-robot requested review from mbohlool and mml November 18, 2020 07:37

wojtek-t mentioned this pull request Nov 18, 2020

[Flaky test] TestStartingResourceVersion #96649

Closed

k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 18, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2020

wojtek-t force-pushed the fix_starting_rv_test branch from 5d2dcc1 to deb9c3b Compare November 18, 2020 08:35

wojtek-t assigned jpbetz Nov 18, 2020

jpbetz suggested changes Nov 18, 2020

Fix TestStartingResourceVersion flakiness

37b0004

wojtek-t force-pushed the fix_starting_rv_test branch from deb9c3b to 37b0004 Compare November 19, 2020 06:57

wojtek-t commented Nov 19, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 19, 2020

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 19, 2020

k8s-ci-robot merged commit 06b0179 into kubernetes:master Nov 20, 2020
