rearrange parallelism in AKS machine pool e2e tests #4874

Merged
merged 1 commit into from
May 24, 2024

Conversation

nojnhuh
Contributor

@nojnhuh nojnhuh commented May 22, 2024

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

I have recently been seeing flakes in the e2e tests for AKS machine pool scaling. It's hard to tell exactly what's going wrong, I think because of how Ginkgo handles failures when the tests are parallelized this way.

The tests are currently structured like this:

for each machine pool in parallel:
  scale up
wait

for each machine pool in parallel:
  scale down
wait

for each machine pool in parallel:
  scale to zero
wait

for each machine pool in parallel:
  scale to where we started
wait

It seems that when Ginkgo sees a failure in any of these steps, the test continues to the next step anyway. That can produce misleading error messages, and the collected artifacts don't reflect the state of the resources at the time the test failed.
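As a rough illustration of the structure above, here is a minimal stdlib-only Go sketch. It does not use Ginkgo or the real test helpers: `step`, `runPhased`, and `demoSteps` are hypothetical names, and the failure is simulated with a returned error rather than a failed assertion. It only shows that when each step is its own parallel phase, a failure in one phase does not stop the later phases from running.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// step is a hypothetical stand-in for one scaling operation on a machine pool.
type step struct {
	name string
	run  func(pool string) error
}

// runPhased runs each step for all pools in parallel and waits before moving
// on to the next step. A failure is recorded, but later steps still run.
func runPhased(pools []string, steps []step) (executed, failed []string) {
	var mu sync.Mutex
	for _, s := range steps {
		var wg sync.WaitGroup
		for _, p := range pools {
			wg.Add(1)
			go func(s step, p string) {
				defer wg.Done()
				err := s.run(p)
				mu.Lock()
				defer mu.Unlock()
				executed = append(executed, p+"/"+s.name)
				if err != nil {
					failed = append(failed, p+"/"+s.name)
				}
			}(s, p)
		}
		wg.Wait()
	}
	sort.Strings(executed)
	sort.Strings(failed)
	return executed, failed
}

// demoSteps builds the four steps; scale-up fails for the pool named failing.
func demoSteps(failing string) []step {
	ok := func(string) error { return nil }
	up := func(pool string) error {
		if pool == failing {
			return errors.New("simulated scale-up failure")
		}
		return nil
	}
	return []step{{"scale-up", up}, {"scale-down", ok}, {"scale-to-zero", ok}, {"restore", ok}}
}

func main() {
	executed, failed := runPhased([]string{"pool0", "pool1"}, demoSteps("pool1"))
	fmt.Println("failed:", failed)
	fmt.Println("steps executed:", len(executed))
}
```

In this sketch, pool1 still goes through scale-down, scale-to-zero, and restore after its scale-up has failed, so whatever state gets collected afterwards no longer matches the moment of the failure.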

This PR reformulates the tests to work like this:

for each machine pool in parallel:
  scale up
  scale down
  scale to zero
  scale to where we started
wait

This way, Ginkgo short-circuits when any intermediate step fails, because all the steps for an individual MachinePool run in the same goroutine.
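The reformulated layout can be sketched the same way. Again this is a stdlib-only approximation, not the code in this PR: `step`, `runPerPool`, and `demoSteps` are hypothetical names, and an early `return` on error stands in for a failed assertion ending the goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// step is a hypothetical stand-in for one scaling operation on a machine pool.
type step struct {
	name string
	run  func(pool string) error
}

// runPerPool starts one goroutine per pool and runs every step in order
// inside it. The first failing step ends that pool's sequence immediately.
func runPerPool(pools []string, steps []step) (executed, failed []string) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, p := range pools {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			for _, s := range steps {
				err := s.run(p)
				mu.Lock()
				executed = append(executed, p+"/"+s.name)
				if err != nil {
					failed = append(failed, p+"/"+s.name)
				}
				mu.Unlock()
				if err != nil {
					return // short-circuit: skip the remaining steps for this pool
				}
			}
		}(p)
	}
	wg.Wait()
	sort.Strings(executed)
	sort.Strings(failed)
	return executed, failed
}

// demoSteps builds the four steps; scale-up fails for the pool named failing.
func demoSteps(failing string) []step {
	ok := func(string) error { return nil }
	up := func(pool string) error {
		if pool == failing {
			return errors.New("simulated scale-up failure")
		}
		return nil
	}
	return []step{{"scale-up", up}, {"scale-down", ok}, {"scale-to-zero", ok}, {"restore", ok}}
}

func main() {
	executed, failed := runPerPool([]string{"pool0", "pool1"}, demoSteps("pool1"))
	fmt.Println("failed:", failed)
	fmt.Println("steps executed:", len(executed))
}
```

Here pool1 stops right after its failed scale-up while pool0 runs all four steps, so the failing pool is left in the state it was in when the step failed.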

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  • cherry-pick candidate <-- for improved signal on release branches.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 22, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 22, 2024
@@ -47,109 +47,92 @@ func AKSMachinePoolSpec(ctx context.Context, inputGetter func() AKSMachinePoolSp
input := inputGetter()
var wg sync.WaitGroup

originalReplicas := map[types.NamespacedName]int32{}
Contributor Author

Most of the changes in this file are indentation, so I'd suggest ticking the "hide whitespace" checkbox.

codecov bot commented May 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.03%. Comparing base (76240e7) to head (1da8b2c).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4874      +/-   ##
==========================================
+ Coverage   62.01%   62.03%   +0.02%     
==========================================
  Files         201      201              
  Lines       16860    16878      +18     
==========================================
+ Hits        10455    10470      +15     
- Misses       5622     5625       +3     
  Partials      783      783              


@nojnhuh
Contributor Author

nojnhuh commented May 23, 2024

/cherry-pick release-1.15

Skipping 1.14 because I haven't seen the flakes there and this will probably not cherry pick cleanly and I'm lazy.

@k8s-infra-cherrypick-robot

@nojnhuh: once the present PR merges, I will cherry-pick it on top of release-1.15 in a new PR and assign it to you.

@nojnhuh
Contributor Author

nojnhuh commented May 23, 2024

The AKS job failure looks like the same flake I've been seeing recently, and these changes definitely helped narrow it down! I have a fix for that open in #4875.

In the meantime,
/retest

Contributor

@mboersma mboersma left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 23, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 544128ceca1d553fecf79d1dee8299175a28e3b0

@mboersma mboersma added this to the v1.16 milestone May 23, 2024
@jackfrancis
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2024
@nojnhuh
Contributor Author

nojnhuh commented May 24, 2024

/retest

@k8s-ci-robot k8s-ci-robot merged commit 4b436c1 into kubernetes-sigs:main May 24, 2024
19 checks passed
@k8s-infra-cherrypick-robot

@nojnhuh: new pull request created: #4877

@nojnhuh nojnhuh deleted the aks-mp-e2e branch May 24, 2024 16:46