Serialize unit test to prevent flaky failures #2394

Merged
merged 1 commit into from Jun 17, 2022

Conversation

mboersma

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

The parallelized test TestAzureMachinePool_Validate calls SetFeatureGateDuringTest, which doesn't appear to be goroutine-safe. Removing t.Parallel() makes the test reliable and does not affect the time it takes to run.
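A minimal sketch of why a set-and-restore helper can misbehave under parallel subtests. Everything here is a hypothetical stand-in: `gates` models a process-wide registry such as feature.Gates, and `setGateDuringTest` is a simplified model of a helper that records the current value, overrides it, and returns a restore function. It is not the component-base implementation. The interleaving is replayed sequentially so the result is deterministic.

```go
package main

import "fmt"

// gates is a hypothetical stand-in for a process-wide feature gate registry.
var gates = map[string]bool{"MachinePool": false}

// setGateDuringTest is a simplified, hypothetical model of a set-and-restore
// helper: it records the current value, overrides it, and returns a function
// that puts the recorded value back.
func setGateDuringTest(name string, value bool) (restore func()) {
	original := gates[name]
	gates[name] = value
	return func() { gates[name] = original }
}

// serial runs two subtests back to back and reports the gate value each one
// observes while validating.
func serial() []bool {
	var seen []bool
	for i := 0; i < 2; i++ {
		restore := setGateDuringTest("MachinePool", true)
		seen = append(seen, gates["MachinePool"])
		restore()
	}
	return seen
}

// interleaved replays one possible schedule of two parallel subtests, A and B,
// that share the same global gate.
func interleaved() []bool {
	var seen []bool
	restoreA := setGateDuringTest("MachinePool", true) // A records false, sets true
	restoreB := setGateDuringTest("MachinePool", true) // B records true, sets true
	seen = append(seen, gates["MachinePool"])          // A validates with the gate on
	restoreA()                                         // A finishes and switches the gate off
	seen = append(seen, gates["MachinePool"])          // B validates with the gate off
	restoreB()                                         // B "restores" true, so the gate leaks
	return seen
}

func main() {
	fmt.Println("serial:", serial())
	fmt.Println("interleaved:", interleaved())
	fmt.Println("gate after interleaved run:", gates["MachinePool"])
}
```

In the serial run both subtests see the gate enabled and it ends up disabled again. In the replayed parallel schedule the second subtest validates with the gate disabled, and the gate is left enabled afterwards. A real parallel run would additionally involve unsynchronized access to shared state, which this sketch deliberately avoids.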

Which issue(s) this PR fixes:

Fixes #2391

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 16, 2022
@@ -134,7 +134,7 @@ func TestAzureMachinePool_Validate(t *testing.T) {
 	for _, c := range cases {
 		c := c
 		t.Run(c.Name, func(t *testing.T) {
-			t.Parallel()
+			// Don't add t.Parallel() here or the test will fail.
 			// NOTE: AzureMachinePool is behind MachinePool feature gate flag; the web hook
 			// must prevent creating new objects in case the feature flag is disabled.
 			defer utilfeature.SetFeatureGateDuringTest(t, feature.Gates, capifeature.MachinePool, true)()

Could this defer run once before the test cases, rather than inside the loop, so the gate is set for all the following runs and then restored when this set of cases is done?

@mboersma

I tried that (the pattern used in azuremachinepool_webhook_test.go) but it still fails with t.Parallel() in the loop. So this PR represents the smallest change needed.

I think it is telling that none of our other uses of defer utilfeature.SetFeatureGateDuringTest use t.Parallel(), nor do the related unit tests in kubernetes/component-base.
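One plausible explanation for why hoisting the defer out of the loop still fails, sketched under assumptions: Go's testing package pauses a subtest that calls t.Parallel() until its parent test function returns, and the parent's deferred calls run at that return. The model below uses hypothetical names (`gateEnabled`, `parent`, `run`) and queues closures instead of using the real test runner, so it illustrates the ordering only and does not reproduce the failure seen in CI.

```go
package main

import "fmt"

// gateEnabled is a hypothetical stand-in for a global feature gate.
var gateEnabled bool

// parent models a test function that enables the gate once, defers the
// restore, and registers parallel subtests. The subtests are queued rather
// than run, mirroring how parallel subtests wait for the parent to return.
func parent(names []string, results map[string]bool) (queued []func()) {
	original := gateEnabled
	gateEnabled = true
	defer func() { gateEnabled = original }() // runs before any queued subtest

	for _, name := range names {
		name := name
		queued = append(queued, func() {
			// Each subtest records the gate value it observes.
			results[name] = gateEnabled
		})
	}
	return queued
}

// run executes the parent and then the queued subtests, in that order.
func run(names []string) map[string]bool {
	results := make(map[string]bool)
	for _, subtest := range parent(names, results) {
		subtest()
	}
	return results
}

func main() {
	fmt.Println(run([]string{"valid", "invalid"}))
	// prints: map[invalid:false valid:false]
}
```

Every queued subtest observes the gate as disabled, because the deferred restore has already run by the time the subtests start. The standard library's t.Cleanup runs after a test's subtests complete, so it might avoid this particular ordering problem, but that approach is not what this PR does.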

@jackfrancis

jackfrancis commented Jun 17, 2022

This repro'd for me today:

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/2397/pull-cluster-api-provider-azure-test/1537781533702623232

It's not clear (to me) why we didn't see this non-deterministic UT outcome earlier...

@CecileRobertMichon

This was introduced in #2376, which merged only recently (the test didn't fail in that PR's own CI run, so it wasn't caught).


@jackfrancis jackfrancis left a comment


/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 17, 2022
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2022
@k8s-ci-robot k8s-ci-robot merged commit b4783d4 into kubernetes-sigs:main Jun 17, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone Jun 17, 2022
@mboersma mboersma deleted the fix-flaky-test branch June 17, 2022 20:45
Development

Successfully merging this pull request may close these issues.

TestAzureMachinePool_Validate is flaky
5 participants