Bug 1891551: Ensure the node template include up to date and informative labels #178

Merged
merged 3 commits into openshift:master on Nov 13, 2020

JoelSpeed

This should resolve issues where node groups that are scaling from zero need to schedule pods based on well-known labels that are applied to nodes.

We have discovered that if there are no healthy nodes within a node group (e.g. they are all cordoned), the autoscaler also treats this as scaling from zero. In that case there may still be an existing node we can copy labels from, so this has been added as well.

There are a few options we could consider in this solution:

  • Copying all labels from the existing node and not using the machineset/generic labels in that case
  • Having the node group store a copy of valid node labels from a node it has seen in the past, to improve the scale-from-zero experience when there genuinely are no nodes available to get labels from

I will add unit tests to this before we merge.
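
The second option listed above (having the node group remember labels from a node it has seen) could look roughly like the following Go sketch. It only illustrates the idea and is not the code in this PR: the `nodeGroup` type, its fields, and the `observe` and `templateLabels` methods are hypothetical names, and the caller is assumed to pass in labels that have already been filtered down to the well-known set.

```go
package main

// nodeGroup is a hypothetical stand-in for the provider's node group type.
type nodeGroup struct {
	machineSetLabels map[string]string // labels declared on the machineset template
	lastSeenLabels   map[string]string // cached copy from the most recent node observed
}

// copyLabels returns an independent copy so that later mutation of src
// cannot change the cached or returned maps.
func copyLabels(src map[string]string) map[string]string {
	dst := make(map[string]string, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

// observe remembers the labels of a node that currently belongs to the group.
func (g *nodeGroup) observe(nodeLabels map[string]string) {
	g.lastSeenLabels = copyLabels(nodeLabels)
}

// templateLabels builds the labels for a scale-from-zero node template.
// The machineset labels are the base layer; labels from a live node are
// laid over them when one exists, otherwise the cached labels are used.
func (g *nodeGroup) templateLabels(liveNodeLabels map[string]string) map[string]string {
	labels := copyLabels(g.machineSetLabels)
	overlay := g.lastSeenLabels
	if liveNodeLabels != nil {
		overlay = liveNodeLabels
	}
	for k, v := range overlay {
		labels[k] = v
	}
	return labels
}
```

With this shape, a group that has never seen a node falls back to the machineset labels alone, which matches today's behaviour when there is genuinely nothing to copy from.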

@JoelSpeed changed the title from "Ensure the node template include up to date and informative labels" to "Bug 1891551: Ensure the node template include up to date and informative labels" Oct 28, 2020
@openshift-ci-robot added the bugzilla/severity-medium and bugzilla/invalid-bug labels Oct 28, 2020
@openshift-ci-robot

@JoelSpeed: This pull request references Bugzilla bug 1891551, which is invalid:

  • expected the bug to target the "4.7.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1891551: Ensure the node template include up to date and informative labels

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JoelSpeed
Author

/bugzilla refresh

@openshift-ci-robot

@JoelSpeed: An error was encountered adding this pull request to the external tracker bugs for bug 1891551 on the Bugzilla server at https://bugzilla.redhat.com:

JSONRPC error 32000: There was an error reported for the RPC call to Jira: There was an error reported for a GitHub REST call. URL: https://api.github.com/repos/openshift/kubernetes-autoscaler/pulls/178 Error: 403 Forbidden at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 111. at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 111. eval {...} called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 98 Bugzilla::Extension::ExternalBugs::Type::GitHub::_do_rest_call('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x559eb2...', 'https://api.github.com/repos/openshift/kubernetes-autoscaler/...', 'GET') called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 62 Bugzilla::Extension::ExternalBugs::Type::GitHub::get_data('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x559eb2...', 'Bugzilla::Extension::ExternalBugs::Bug=HASH(0x559eb30252f0)') called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 eval {...} called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 Bugzilla::Extension::ExternalBugs::Bug::update_ext_info('Bugzilla::Extension::ExternalBugs::Bug=HASH(0x559eb30252f0)', 1) called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Bug.pm line 125 Bugzilla::Extension::ExternalBugs::Bug::create('Bugzilla::Extension::ExternalBugs::Bug', 'HASH(0x559eb2a29930)') called at /var/www/html/bugzilla/extensions/ExternalBugs/Extension.pm line 940 Bugzilla::Extension::ExternalBugs::bug_start_of_update('Bugzilla::Extension::ExternalBugs=HASH(0x559eb328d8f8)', 'HASH(0x559eb3b9bfe8)') called at /var/www/html/bugzilla/Bugzilla/Hook.pm line 21 Bugzilla::Hook::process('bug_start_of_update', 'HASH(0x559eb3b9bfe8)') called at /var/www/html/bugzilla/Bugzilla/Bug.pm line 1173 Bugzilla::Bug::update('Bugzilla::Bug=HASH(0x559eb352b0b0)') called at 
/loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/WebService.pm line 88 Bugzilla::Extension::ExternalBugs::WebService::add_external_bug('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x559ea5991f98)') called at (eval 2683) line 1 eval ' $procedure->{code}->($self, @params) ;' called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 220 JSON::RPC::Legacy::Server::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x559eb29b3528)') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 297 Bugzilla::WebService::Server::JSONRPC::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x559eb29b3528)') called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 126 JSON::RPC::Legacy::Server::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 70 Bugzilla::WebService::Server::JSONRPC::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/jsonrpc.cgi line 31 ModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::var_www_html_bugzilla_jsonrpc_2ecgi::handler('Apache2::RequestRec=SCALAR(0x559eb2a5bf00)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 eval {...} called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 ModPerl::RegistryCooker::run('Bugzilla::ModPerl::ResponseHandler=HASH(0x559eb3ba1b30)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 173 ModPerl::RegistryCooker::default_handler('Bugzilla::ModPerl::ResponseHandler=HASH(0x559eb3ba1b30)') called at /usr/lib64/perl5/vendor_perl/ModPerl/Registry.pm line 32 ModPerl::Registry::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x559eb2a5bf00)') called at /var/www/html/bugzilla/mod_perl.pl line 139 
Bugzilla::ModPerl::ResponseHandler::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x559eb2a5bf00)') called at (eval 2683) line 0 eval {...} called at (eval 2683) line 0 at /var/www/html/bugzilla/Bugzilla/Error.pm line 130. Bugzilla::Error::_throw_error('global/user-error.html.tmpl', 'ext_bz_rest_error', 'HASH(0x559eb3914718)') called at /var/www/html/bugzilla/Bugzilla/Error.pm line 193 Bugzilla::Error::ThrowUserError('ext_bz_rest_error', 'HASH(0x559eb3914718)') called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 120 Bugzilla::Extension::ExternalBugs::Type::GitHub::_do_rest_call('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x559eb2...', 'https://api.github.com/repos/openshift/kubernetes-autoscaler/...', 'GET') called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 62 Bugzilla::Extension::ExternalBugs::Type::GitHub::get_data('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x559eb2...', 'Bugzilla::Extension::ExternalBugs::Bug=HASH(0x559eb30252f0)') called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 eval {...} called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 Bugzilla::Extension::ExternalBugs::Bug::update_ext_info('Bugzilla::Extension::ExternalBugs::Bug=HASH(0x559eb30252f0)', 1) called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/Bug.pm line 125 Bugzilla::Extension::ExternalBugs::Bug::create('Bugzilla::Extension::ExternalBugs::Bug', 'HASH(0x559eb2a29930)') called at /var/www/html/bugzilla/extensions/ExternalBugs/Extension.pm line 940 Bugzilla::Extension::ExternalBugs::bug_start_of_update('Bugzilla::Extension::ExternalBugs=HASH(0x559eb328d8f8)', 'HASH(0x559eb3b9bfe8)') called at /var/www/html/bugzilla/Bugzilla/Hook.pm line 21 Bugzilla::Hook::process('bug_start_of_update', 'HASH(0x559eb3b9bfe8)') called at /var/www/html/bugzilla/Bugzilla/Bug.pm line 1173 
Bugzilla::Bug::update('Bugzilla::Bug=HASH(0x559eb352b0b0)') called at /loader/0x559ea4273e00/Bugzilla/Extension/ExternalBugs/WebService.pm line 88 Bugzilla::Extension::ExternalBugs::WebService::add_external_bug('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x559ea5991f98)') called at (eval 2683) line 1 eval ' $procedure->{code}->($self, @params) ;' called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 220 JSON::RPC::Legacy::Server::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x559eb29b3528)') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 297 Bugzilla::WebService::Server::JSONRPC::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x559eb29b3528)') called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 126 JSON::RPC::Legacy::Server::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 70 Bugzilla::WebService::Server::JSONRPC::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/jsonrpc.cgi line 31 ModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::var_www_html_bugzilla_jsonrpc_2ecgi::handler('Apache2::RequestRec=SCALAR(0x559eb2a5bf00)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 eval {...} called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 ModPerl::RegistryCooker::run('Bugzilla::ModPerl::ResponseHandler=HASH(0x559eb3ba1b30)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 173 ModPerl::RegistryCooker::default_handler('Bugzilla::ModPerl::ResponseHandler=HASH(0x559eb3ba1b30)') called at /usr/lib64/perl5/vendor_perl/ModPerl/Registry.pm line 32 ModPerl::Registry::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x559eb2a5bf00)') called at 
/var/www/html/bugzilla/mod_perl.pl line 139 Bugzilla::ModPerl::ResponseHandler::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x559eb2a5bf00)') called at (eval 2683) line 0 eval {...} called at (eval 2683) line 0
Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

/bugzilla refresh

@elmiko elmiko left a comment

this makes sense to me, i just have a question about how we are building the labels for the node group template.

@@ -140,7 +140,7 @@ func newMachineSetScalableResource(controller *machineController, machineSet *Ma
 }

 func (r machineSetScalableResource) Labels() map[string]string {
-	return r.machineSet.Spec.Template.Spec.Labels
+	return r.machineSet.Spec.Template.Spec.ObjectMeta.Labels

is this change just to make it more explicit or is there a programmatic reason to this?

mind you, i don't have an objection i'm just curious =)

Author

Just to be more explicit, though it will be removed in your refactor anyways!

	if node != nil {
		labels = cloudprovider.JoinStringMaps(labels, extractNodeLabels(node))
	}
}

i'm slightly concerned about this clause because it seems like it would be possible for a user to add a label to a node which is not on the machineset, and then we would return that as part of the labels for the node group. this could lead to situations where the autoscaler simulator is evaluating labels that do not exist on the machineset. i'm wondering if this might have implications for how things like --balance-similar-node-groups could work in edge cases.

i do see that extractNodeLabels does not copy all labels, so this might be more of a theoretical question.

Author

extractNodeLabels deliberately copies only a number of well-known labels, so that users can't just roll their own random labels on a node. I don't think we'd be able to reliably copy random labels and guarantee that the scaled-up node would contain them. I think we should be safe with the current implementation.

makes sense, thanks!
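
The allow-list approach described in this thread can be sketched as follows. This is a minimal Go illustration rather than the PR's `extractNodeLabels`: the function name `copyWellKnownLabels` and the contents of `wellKnownLabels` are assumptions chosen for the example. The keys are standard Kubernetes well-known labels, but the set the PR actually copies is not shown in this conversation.

```go
package main

// wellKnownLabels is an illustrative allow-list (an assumption for this
// sketch, not necessarily the set used by the PR).
var wellKnownLabels = []string{
	"kubernetes.io/arch",
	"kubernetes.io/os",
	"node.kubernetes.io/instance-type",
	"topology.kubernetes.io/region",
	"topology.kubernetes.io/zone",
}

// copyWellKnownLabels returns only the allow-listed labels found on a node.
// Labels that a user added by hand are dropped, because a freshly scaled-up
// node cannot be guaranteed to carry them.
func copyWellKnownLabels(nodeLabels map[string]string) map[string]string {
	out := make(map[string]string)
	for _, key := range wellKnownLabels {
		if value, ok := nodeLabels[key]; ok {
			out[key] = value
		}
	}
	return out
}
```

Iterating over the allow-list rather than over the node's labels keeps the result bounded by the list, so an arbitrary user label can never leak into the node group template.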

@JoelSpeed
Author

/bugzlila refresh

@elmiko

elmiko commented Oct 30, 2020

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label Oct 30, 2020
@elmiko

elmiko commented Oct 30, 2020

/bugzilla refresh

@openshift-ci-robot

@elmiko: This pull request references Bugzilla bug 1891551, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh

@openshift-ci-robot added the bugzilla/valid-bug label and removed the bugzilla/invalid-bug label Oct 30, 2020
@JoelSpeed
Author

/retest

@Danil-Grigorev Danil-Grigorev left a comment

LGTM with a minor note to tests.

}

for resource, quantity := range nodeInfo.Node().Status.Allocatable {
	expectedCapacity, ok := config.expectedCapacity[resource]

It seems that we should test that all expectedCapacity labels are included on the Node resource as well.

Author

I didn't think nodes got labelled with capacity? I thought that was machinesets, no?

Author

In this case, those are the annotations on the node group

I mean whatever the check is, it is not checking anything if the nodeInfo.Node().Status.Allocatable contains no values from the map. Is this expected?

Author

Oh, I see what you mean. I thought I had a check in here that the lengths of the two were the same. Let me add that, good catch!
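
The gap discussed in this thread is that a loop over the actual resources passes vacuously when the actual map is empty. Below is a small Go sketch of the length check, assuming plain `map[string]int64` values in place of the real resource lists; the helper name `checkResources` is hypothetical and not taken from the PR's tests.

```go
package main

import "fmt"

// checkResources compares the resources reported on a node against the
// expected set. The length check comes first: without it, an empty actual
// map would skip the loop entirely and the comparison would pass.
func checkResources(expected, actual map[string]int64) error {
	if len(expected) != len(actual) {
		return fmt.Errorf("expected %d resources, got %d", len(expected), len(actual))
	}
	for name, got := range actual {
		want, ok := expected[name]
		if !ok {
			return fmt.Errorf("unexpected resource %q", name)
		}
		if want != got {
			return fmt.Errorf("resource %q: expected %d, got %d", name, want, got)
		}
	}
	return nil
}
```

Because the lengths match and every actual key must be present in the expected map, the two key sets are guaranteed to be identical when the function returns nil.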

}

for resource, quantity := range nodeInfo.Node().Status.Capacity {
	expectedCapacity, ok := config.expectedCapacity[resource]

It seems that we should test that all expectedCapacity labels are included on the Node resource as well.

@Danil-Grigorev

/lgtm

@openshift-ci-robot added the lgtm label Nov 12, 2020
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 86bccb4 into openshift:master Nov 13, 2020
@openshift-ci-robot

@JoelSpeed: All pull requests linked via external trackers have merged:

Bugzilla bug 1891551 has been moved to the MODIFIED state.

In response to this:

Bug 1891551: Ensure the node template include up to date and informative labels

@JoelSpeed JoelSpeed deleted the nodegroup-labels branch November 13, 2020 11:26
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.