Perform remote service filtering before selecting nodes when scaling in cluster #539
This looks good to me and a nice improvement. The logging is a great addition as well.
```diff
@@ -114,6 +204,12 @@ func (c *ClusterScaleUtils) IdentifyScaleInNodes(cfg map[string]string, num int)
 	// Filter out the Nomad node ID where this autoscaler instance is running.
 	filteredNodes = filterOutNodeID(filteredNodes, c.curNodeID)
 
+	if c.log.IsDebug() {
```
Nice; this is pretty much my favourite feature of the logging lib.
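For context, the `c.log.IsDebug()` guard matches hclog's level-check helpers, which let callers skip building expensive log arguments entirely when debug logging is disabled. A minimal sketch (the logger name and node IDs below are made up for illustration):

```go
package main

import (
	"strings"

	hclog "github.com/hashicorp/go-hclog"
)

func main() {
	log := hclog.New(&hclog.LoggerOptions{
		Name:  "cluster-scale-utils",
		Level: hclog.Info, // debug disabled: the guarded block below is skipped
	})

	nodeIDs := []string{"9c6b1d0e", "1f2a3b4c", "5d6e7f80"} // illustrative IDs

	// Only pay the cost of joining the ID list when debug logging is active.
	if log.IsDebug() {
		log.Debug("filtered nodes", "ids", strings.Join(nodeIDs, ", "))
	}
}
```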
This PR makes a few changes to the way nodes are selected for cluster scale-in actions.
Match instances in target before selecting which nodes to remove
When scaling in a cluster, we must find a set of clients that meet two criteria: they must match the policy's filtering criteria (`node_class` and/or `datacenter`), and they must exist in the remote service target. Selecting which nodes to remove can only be done reliably once both filters are applied, since clients in different remote services can have matching filtering criteria (for example, clients with the same `node_class` but in different AWS ASGs).

Currently, the list of nodes that match the filtering criteria is reduced prematurely to the expected next count, which increases the odds of picking nodes that don't belong to the remote service. These nodes will cause the scaling action to fail since they won't exist in the remote service target.
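As a rough sketch of the ordering this PR argues for (the `Node` type and helpers below are hypothetical, not the Autoscaler's actual API), both filters run before the candidate pool is truncated:

```go
package scaleutils

// Node is a hypothetical node representation for illustration only.
type Node struct {
	ID         string
	NodeClass  string
	Datacenter string
}

// selectScaleInNodes filters by the policy criteria, then by membership in
// the remote service target, and only then truncates to the desired count.
func selectScaleInNodes(nodes []Node, class, dc string, inRemote func(Node) bool, num int) []Node {
	var candidates []Node
	for _, n := range nodes {
		// Criterion 1: the node matches the policy filters.
		if n.NodeClass != class || n.Datacenter != dc {
			continue
		}
		// Criterion 2: the node exists in the remote service target
		// (e.g. an AWS ASG). Truncating before this check risks picking
		// nodes the remote target cannot terminate.
		if !inRemote(n) {
			continue
		}
		candidates = append(candidates, n)
	}
	if len(candidates) > num {
		candidates = candidates[:num]
	}
	return candidates
}
```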
This PR refactors `RunPreScaleInTasks` to apply both filters before reducing the pool of selected nodes to the desired amount. This refactoring is done in a new function (`RunPreScaleInTasksWithRemoteCheck`) so as not to break external plugins and to keep our SDK backwards compatible. In a future release, a breaking change to rename these functions should be made.
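As a hedged illustration of the compatibility approach (the types and signatures below are assumptions, not the SDK's real ones), the old entry point can simply delegate to the new one with the remote check disabled:

```go
package scaleutils

// Hypothetical types standing in for the SDK's real ones.
type Node struct{ ID string }

type ClusterScaleUtils struct{}

// RunPreScaleInTasks keeps its original name and shape so external plugins
// built against the SDK keep working; it delegates to the new function with
// a remote check that accepts every node, preserving the old behavior.
func (c *ClusterScaleUtils) RunPreScaleInTasks(cfg map[string]string, num int) ([]Node, error) {
	return c.RunPreScaleInTasksWithRemoteCheck(cfg, num, func(Node) bool { return true })
}

// RunPreScaleInTasksWithRemoteCheck also filters candidates through the
// remote service target before reducing them to num.
func (c *ClusterScaleUtils) RunPreScaleInTasksWithRemoteCheck(cfg map[string]string, num int, inRemote func(Node) bool) ([]Node, error) {
	candidates := []Node{{ID: "node-a"}, {ID: "node-b"}} // placeholder pool
	var out []Node
	for _, n := range candidates {
		if inRemote(n) {
			out = append(out, n)
		}
	}
	if len(out) > num {
		out = out[:num]
	}
	return out, nil
}
```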
Don't skip ineligible nodes

Originally, the Autoscaler would fail a scaling action unless the cluster nodes were stable: no node could be draining or marked as ineligible. This was done to prevent multiple scaling actions over the same set of clients from interfering with each other, and to keep the Autoscaler from overriding manual actions by operators.

But in practice this turned out to be a very restrictive requirement, especially if the Autoscaler happens to crash or fail during a scaling action, leaving behind ineligible nodes and never being able to perform a scaling action again.

As an automated system that is supposed to fully manage the lifecycle of clients, the Autoscaler is expected to do whatever is necessary to meet policy requirements. If its actions interfere with manual interventions by operators, those policies must be changed or temporarily disabled.

This PR changes the node filtering logic to not skip ineligible nodes. An ineligible node may not receive more workloads, but it is still present and active in the cluster. If a policy evaluation requires that nodes be removed, ineligible nodes should also be considered.
Check for node readiness instead of scheduling eligibility
The original flow only checked for scheduling eligibility, which may be `true` even if the node is `down`. The Autoscaler could therefore pick nodes that had already been removed from the remote service but, because they were not yet pruned, were still registered in Nomad. This would cause scaling actions to fail repeatedly and indefinitely until those nodes were removed.

This PR changes the logic to check for node readiness instead.
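A minimal sketch of the new predicate (the field names mirror Nomad's node stub, but this is illustrative, not the PR's code): ineligible nodes are kept, while anything that isn't ready is dropped:

```go
package scaleutils

// nodeStub is an illustrative stand-in for Nomad's node listing.
type nodeStub struct {
	ID                    string
	Status                string // "initializing", "ready", or "down"
	SchedulingEligibility string // "eligible" or "ineligible"
}

// readyForScaleIn drops nodes that are not ready, such as nodes that were
// terminated remotely but not yet pruned from Nomad. Scheduling eligibility
// is intentionally ignored: an ineligible node is still an active cluster
// member and remains a valid candidate for removal.
func readyForScaleIn(n nodeStub) bool {
	return n.Status == "ready"
}
```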
Log scale in filtering process
There are several steps involved in selecting which of the nodes registered in Nomad should be terminated, and it's hard to follow which nodes are being considered, which ones have been dropped, and why.

This PR adds more log lines at the `DEBUG` level to provide more visibility during this process. Sample output:
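As a rough illustration only (the messages, keys, and counts below are assumptions, not the PR's actual log lines), the added visibility could look like one `DEBUG` line per filtering step:

```go
package main

import hclog "github.com/hashicorp/go-hclog"

func main() {
	log := hclog.New(&hclog.LoggerOptions{
		Name:  "cluster-scale-utils",
		Level: hclog.Debug,
	})

	// Illustrative counts for each stage of the selection pipeline.
	log.Debug("nodes registered in Nomad", "count", 10)
	log.Debug("nodes matching policy filters", "node_class", "hashistack", "count", 6)
	log.Debug("nodes present in remote target", "count", 4)
	log.Debug("nodes selected for termination", "count", 2)
}
```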
Maybe they should be at the `TRACE` level since some clusters may have thousands of nodes? I'm not sure, but they are useful for debugging in general.

Closes #477 #511