Fix TestStartingResourceVersion flakiness #96662

Merged
k8s-ci-robot merged 1 commit into kubernetes:master on Nov 20, 2020

Conversation

wojtek-t
Member

Fix #96649

Release note: NONE

/kind flake
/priority important-soon
/milestone v1.20

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 18, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Nov 18, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 18, 2020
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 18, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2020
@wojtek-t
Member Author

/retest

Contributor

@jpbetz jpbetz left a comment

The code looks right, just a couple nits around how the test is described in the comments.

@@ -1006,6 +1006,84 @@ func (f *fakeTimeBudget) takeAvailable() time.Duration {

func (f *fakeTimeBudget) returnUnused(_ time.Duration) {}

func TestStartingResourceVersion(t *testing.T) {
Contributor

nit: Moving this function makes for a diff that is difficult to review.

Member Author

Yes - but in order to fake dispatchBudget (which is a private field) it has to be here :(

Comment on lines 1021 to 1022
// When using the official `timeBudgetImpl` we were observing flakiness
// due under the following conditions:
Contributor

nit: Took me a while to understand what this meant. Maybe explain what the current test does instead of what the PR changes? I.e. "Use a fake timeBudget to prevent this test from flaking under the following conditions:"?

Member Author

done

Member Author

BTW - the comment was copy-pasted from another test and wasn't accurate for this test - fixed that too.

// 3) if the test was cpu-starved and we weren't able to consume events
// from w2 ResultCh it could have happened that its buffer was also
// filling in and given we no longer had timeBudget (consumed in (1))
// trying to put next item was simply breaking the watch
Contributor

Can we clarify why this works okay in production but not in test? It's not obvious to me from reading this.

Member Author

In production this may happen, and the watch can then be resumed by the client. In this test, however, a broken watch means we no longer test what we want to test.
Added a comment.

Member Author

@wojtek-t wojtek-t left a comment

@jpbetz - thanks for comments; PTAL

@wojtek-t
Member Author

/retest

@jpbetz
Contributor

jpbetz commented Nov 19, 2020

/lgtm

Thanks for the flake fix!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 19, 2020
@jeremyrickard
Contributor

/test pull-kubernetes-e2e-kind-ipv6

@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 19, 2020
@jeremyrickard
Contributor

/retest

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 06b0179 into kubernetes:master Nov 20, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Successfully merging this pull request may close these issues.

[Flaky test] TestStartingResourceVersion
7 participants