Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not include customData in AzureMachinePool hash calculation #1197

Merged
merged 1 commit into from Mar 3, 2021

Conversation

devigned
Copy link
Contributor

@devigned devigned commented Feb 24, 2021

What type of PR is this?
/kind bug

What this PR does / why we need it:
AzureMachinePool will occasionally enter into a cycle where the hash tag calculated from the last model update will be different than the hash calculated for a newly reconciled AzureMachinePool since the bootstrap data secret has been updated since the previous reconcile causing the customData of the VMSS to change, thus changing the hash for the entire model.

This PR removes the customData from the hash calculation and increases test controller log verbosity.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Related: #1190, #1186, #1182

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Do not include VMSS customData in the hash tag calculation to correct a reconcile cycle caused when the bootstrap token refreshes.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 24, 2021
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 24, 2021
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 24, 2021
@devigned
Copy link
Contributor Author

Pass 1, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@jsturtevant
Copy link
Contributor

jsturtevant commented Feb 24, 2021

Pass1 was successful

Pass 2, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

Pass 2 was successful

Pass 3, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@jsturtevant
Copy link
Contributor

Pass 3 was successful

Pass 4, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

Pass 4 was successful

Pass 5, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

Pass 5 was partially successful with Windows failing with an unrelated error:

make[1]: *** [Makefile:249: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl-v1.19.2] Error 56

Pass 6, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

Pass 6 was partially successful with Windows with VMSS error. Rebased on master to get the latest from v1alpha4 and hopefully cluster data from test clusters that failed to come up.

@devigned
Copy link
Contributor Author

Pass 7 was successful

Pass 8, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

Pass 8 was successful

Pass 9, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@jsturtevant
Copy link
Contributor

Pass 9 failed with

e[1]: *** [Makefile:249: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/hack/tools/bin/kubectl-v1.19.2] Error 56

again.

Looking that looks like it is related to curl having a network issue. created #1202

Pass 10, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

/retest

@devigned
Copy link
Contributor Author

Pass 11, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

Pass 11 success

Pass 12, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

Pass 12 success

Pass 13, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

devigned commented Mar 2, 2021

Pass 18 success.

Pass 19, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

devigned commented Mar 2, 2021

Pass 19 Fail; IPv6

Pass 20, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

devigned commented Mar 2, 2021

Pass 20 success

Pass 21, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

devigned commented Mar 3, 2021

Pass 21 success

Pass 22, retry
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-windows

@devigned
Copy link
Contributor Author

devigned commented Mar 3, 2021

Pass 22 success

@nader-ziada, @CecileRobertMichon I think this is looking pretty solid. wdyt?

@nader-ziada
Copy link
Contributor

looks good to go! passed most of the times

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2021
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2021
@@ -24,6 +24,7 @@ spec:
- --leader-elect
- "--metrics-bind-addr=127.0.0.1:8080"
- "--feature-gates=MachinePool=${EXP_MACHINE_POOL:=false},AKS=${EXP_AKS:=false}"
- "--v=2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this checked in by mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I increased verbosity of the test logging. To make it easier, I add an explicit verbosity to the controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't 1 the default though? if so this increases verbosity for all default clusterctl deployments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the default is 0. I did not intend on increasing the default. I'll change it.

@@ -75,6 +75,9 @@ providers:
targetName: "cluster-template-windows.yaml"
- sourcePath: "${PWD}/templates/test/cluster-template-prow-machine-pool-windows.yaml"
targetName: "cluster-template-machine-pool-windows.yaml"
replacements:
- old: "--v=2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- old: "--v=2"
- old: "--v=0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!! Fixed

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

nothing was changed in the last few commits that should affect the stability of the fix (added a code comment and changed default verbosity) so we're still good to merge this

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants