Always consider spot instance node readiness in cluster validation #8159

Merged (1 commit) on Dec 27, 2019

johngmyers
Member

/kind bug

When performing cluster validation on a cluster with spot instance nodes (nodes labeled with node-role.kubernetes.io/spot-worker: "true"), kops ignores the node's readiness roughly 50% of the time and instead prints the message:

W1219 06:35:47.885028    1605 validate_cluster.go:291] ignoring node with role "spot-worker"

This means that cluster validation can incorrectly succeed at random.

This change has cluster validation determine the node's role from the instancegroup spec rather than from a node label whose selection depends on Go's randomized map iteration order.
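A minimal sketch of why a label-based lookup can flip between roles, assuming a node carries two role labels at once. The function `roleFromLabels` is a hypothetical stand-in for the label-based lookup, not the actual kops helper, and the second label (`node-role.kubernetes.io/node`) is an assumption made for illustration; `roleFromSpec` mirrors the idea of reading a single declared role.

```go
package main

import (
	"fmt"
	"strings"
)

const rolePrefix = "node-role.kubernetes.io/"

// roleFromLabels is a hypothetical label-based lookup: it keeps the last
// matching role label it encounters. Go randomizes map iteration order,
// so with two role labels the result can differ from call to call.
func roleFromLabels(labels map[string]string) string {
	role := ""
	for k := range labels {
		if strings.HasPrefix(k, rolePrefix) {
			role = strings.TrimPrefix(k, rolePrefix)
		}
	}
	return role
}

// roleFromSpec derives the role from a single declared value (standing in
// for the instance group spec), so the result is deterministic.
func roleFromSpec(specRole string) string {
	return strings.ToLower(specRole)
}

func main() {
	// Assumption: the spot node has both a generic and a spot-worker role label.
	labels := map[string]string{
		rolePrefix + "node":        "",
		rolePrefix + "spot-worker": "true",
	}

	// Count which role the label-based lookup reports across many calls.
	seen := map[string]int{}
	for i := 0; i < 1000; i++ {
		seen[roleFromLabels(labels)]++
	}
	fmt.Println("label-based roles seen:", seen)
	fmt.Println("spec-based role:", roleFromSpec("Node"))
}
```

The counts printed for the label-based lookup vary between runs, which is the point: only the spec-based value is stable.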

Fixes #5038

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 19, 2019
@k8s-ci-robot
Contributor

Hi @johngmyers. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 19, 2019
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 19, 2019
@gjtempleton
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 19, 2019
@@ -267,7 +266,7 @@ func (v *ValidationCluster) validateNodes(cloudGroups map[string]*cloudinstances
 			continue
 		}
 
-		role := util.GetNodeRole(node)
+		role := strings.ToLower(string(cloudGroup.InstanceGroup.Spec.Role))
Member

Rather than performing this ToLower could we not change the if n.Role == "..." comparisons lower down to something like if cloudGroup.InstanceGroup.Spec.Role == kops.InstanceGroupRoleMaster instead as we already do with bastion role checks on line 254?

Member Author

The value is returned to the caller in the ValidationNode struct (and the caller makes it visible to the user), so it has to be lowercased to avoid breaking the API.
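The two positions in this thread are compatible, and a rough sketch may make that concrete: internal checks can compare the typed role, while the value handed back to callers stays lowercased. Everything below is a local stand-in written for illustration; the type, constants, reduced struct, and helper names are assumptions and not the kops definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// InstanceGroupRole is a local stand-in for the typed role (assumption).
type InstanceGroupRole string

const (
	RoleMaster InstanceGroupRole = "Master"
	RoleNode   InstanceGroupRole = "Node"
)

// ValidationNode is a reduced, hypothetical version of the struct that is
// returned to callers and ultimately shown to the user.
type ValidationNode struct {
	Name string
	Role string // lowercased so existing consumers keep seeing "master"/"node"
}

// newValidationNode lowercases the typed role only at the output boundary.
func newValidationNode(name string, specRole InstanceGroupRole) ValidationNode {
	return ValidationNode{Name: name, Role: strings.ToLower(string(specRole))}
}

// isMaster compares against the typed constant, as the reviewer suggests
// for internal checks.
func isMaster(specRole InstanceGroupRole) bool {
	return specRole == RoleMaster
}

func main() {
	n := newValidationNode("example-node", RoleMaster)
	fmt.Printf("%s role=%q master=%v\n", n.Name, n.Role, isMaster(RoleMaster))
}
```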

@johngmyers
Member Author

/test pull-kops-verify-staticcheck

@rifelpet
Member

Thanks @johngmyers
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 26, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers, rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 26, 2019
@johngmyers
Member Author

/test pull-kops-e2e-kubernetes-aws

@gjtempleton
Member

/lgtm

/test pull-kops-e2e-kubernetes-aws

@johngmyers
Member Author

/test pull-kops-e2e-kubernetes-aws

@rifelpet
Member

/test pull-kops-e2e-kubernetes-aws

@k8s-ci-robot k8s-ci-robot merged commit 8436a3e into kubernetes:master Dec 27, 2019
@johngmyers johngmyers deleted the validate-spot branch December 27, 2019 00:40
k8s-ci-robot added a commit that referenced this pull request Feb 25, 2020
#8039-#8159-#8600-upstream-release-1.17

Automated cherry pick of:
#7925: Extract the list of instance groups earlier in validation
#8039: Vendor github.com/stretchr/testify/require
#8159: Determine node role from instancegroup spec
#8600: Fail cluster validation if a master missing
Development

Successfully merging this pull request may close these issues.

validate_cluster ignoring node with role "spot-worker"
4 participants