New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1891551: Ensure the node template include up to date and informative labels #178
Bug 1891551: Ensure the node template include up to date and informative labels #178
Conversation
@JoelSpeed: This pull request references Bugzilla bug 1891551, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/bugzilla refresh |
@JoelSpeed: An error was encountered adding this pull request to the external tracker bugs for bug 1891551 on the Bugzilla server at https://bugzilla.redhat.com:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sense to me, i just have a question about how we are building the labels for the node group template.
@@ -140,7 +140,7 @@ func newMachineSetScalableResource(controller *machineController, machineSet *Ma | |||
} | |||
|
|||
func (r machineSetScalableResource) Labels() map[string]string { | |||
return r.machineSet.Spec.Template.Spec.Labels | |||
return r.machineSet.Spec.Template.Spec.ObjectMeta.Labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change just to make it more explicit or is there a programmatic reason to this?
mind you, i don't have an objection i'm just curious =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be more explicit, though it will be removed in your refactor anyways!
if node != nil { | ||
labels = cloudprovider.JoinStringMaps(labels, extractNodeLabels(node)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm slightly concerned about this clause because it seems like it would be possible for a user to add a label to a node which is not on the machineset, and then we would return that as part of the labels for the node group. this could lead to situations where the autoscaler simulator is evaluating labels that do not exist on the machineset. i'm wondering if this might have implications for how things like --balance-similar-node-groups
could work in edge cases.
i do see that the extractNodeLables
does not copy all labels, so this might be more of a theoretical question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extractNodeLabels
deliberately only copies a number of well known labels deliberately so that users can't just roll their own random labels on a node, I don't think we'd be able to reliably copy random labels and guarantee the scaled up node would contain those labels. I think with the current implementation we should be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, thanks!
d75f538
to
ba53b06
Compare
/bugzlila refresh |
ba53b06
to
12c4cc7
Compare
…s are also included
…ode if one exists
12c4cc7
to
89b16bf
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: elmiko The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/bugzilla refresh |
@elmiko: This pull request references Bugzilla bug 1891551, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a minor note to tests.
} | ||
|
||
for resource, quantity := range nodeInfo.Node().Status.Allocatable { | ||
expectedCapacity, ok := config.expectedCapacity[resource] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we shoud test that all expectedCapacity
labels are included on the Node resource as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think nodes got labelled with capacity? I thought that was machinesets no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, those are the annotations on the node group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean whatever the check is, it is not checking anything if the nodeInfo.Node().Status.Allocatable contains no values from the map. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see what you mean, I thought I had a check on here to check the lengths of the two were the same, let me add that, good catch!
} | ||
|
||
for resource, quantity := range nodeInfo.Node().Status.Capacity { | ||
expectedCapacity, ok := config.expectedCapacity[resource] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we shoud test that all expectedCapacity
labels are included on the Node resource as well.
89b16bf
to
22838a3
Compare
/lgtm |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
6 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
@JoelSpeed: All pull requests linked via external trackers have merged: Bugzilla bug 1891551 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This should resolve issues where nodegroups are scaling from zero and need to schedule pods based on some well known labels that are applied to nodes.
We have discovered that if there are no healthy nodes within a node group (eg they are all cordoned) then this also counts as scaling from zero in the eyes of the autoscaler, so there is actually a chance we can copy labels from a node that exists, hence this has also been added.
There are a few options we could consider in this solution:
I will add unit tests to this before we merge