Adding ScaleDownNodeProcessor #2233
/assign @MaciekPytel
@@ -0,0 +1,75 @@
/*
Copyright 2016 The Kubernetes Authors.
2019?
Done
limitations under the License.
*/

package core
I think this should live under processors/, not core/.
We made an exception for filterOutSchedulablePodListProcessor, but IIRC it was only because it needed stuff from core/ and it was the easiest way of dealing with cyclic dependencies.
Done
)

// DefaultScaleDownNodeProcessor filters out scale down candidates from nodegroup with
// size <= minimum number of nodes for that nodegroup
That's only half of what it does - it also filters out nodes from non-autoscaled nodegroups.
Done
// DefaultScaleDownNodeProcessor filters out scale down candidates from nodegroup with
// size <= minimum number of nodes for that nodegroup
type DefaultScaleDownNodeProcessor struct {
Can we use a more descriptive name? Maybe something along the lines of PreFilteringScaleDownNodeProcessor?
Other default processors have such names, example: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/status/scale_up_status_processor.go#L79
Done
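The behaviour being discussed - dropping nodes that are not autoscaled or whose node group is already at its minimum size - could be sketched roughly as below. This is a simplified illustration, not the actual cluster-autoscaler code: `Node`, `NodeGroup`, and the `groupFor` lookup are hypothetical stand-ins for the real `*apiv1.Node` and cloudprovider types.

```go
package main

import "fmt"

// Node and NodeGroup are simplified stand-ins for the real
// cluster-autoscaler/cloudprovider types in this sketch.
type Node struct{ Name string }

type NodeGroup struct {
	ID      string
	MinSize int
	Size    int
}

// groupFor maps a node to its node group; nil means the node is not
// autoscaled. It stands in for CloudProvider.NodeGroupForNode.
type groupFor func(*Node) *NodeGroup

// preFilterScaleDownCandidates drops nodes that belong to no autoscaled
// node group or whose group is already at (or below) its minimum size -
// the two filters the processor is described to apply.
func preFilterScaleDownCandidates(nodes []*Node, lookup groupFor) []*Node {
	var candidates []*Node
	for _, n := range nodes {
		ng := lookup(n)
		if ng == nil {
			continue // not autoscaled, never a scale-down candidate
		}
		if ng.Size <= ng.MinSize {
			continue // node group already at minimum size
		}
		candidates = append(candidates, n)
	}
	return candidates
}

func main() {
	big := &NodeGroup{ID: "big", MinSize: 1, Size: 3}
	small := &NodeGroup{ID: "small", MinSize: 1, Size: 1}
	lookup := func(n *Node) *NodeGroup {
		switch n.Name {
		case "b1":
			return big
		case "s1":
			return small
		default:
			return nil // unmanaged node, e.g. a master
		}
	}
	nodes := []*Node{{"b1"}, {"s1"}, {"master"}}
	for _, n := range preFilterScaleDownCandidates(nodes, lookup) {
		fmt.Println(n.Name)
	}
}
```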
apiv1 "k8s.io/api/core/v1"

testprovider "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/test"
. "k8s.io/autoscaler/cluster-autoscaler/utils/test"
Please sort out the import order.
Done
cluster-autoscaler/main.go
@@ -292,6 +292,7 @@ func buildAutoscaler() (core.Autoscaler, error) {
	processors.NodeGroupSetProcessor = &nodegroupset.BalancingNodeGroupSetProcessor{
		Comparator: nodegroupset.IsAzureNodeInfoSimilar}
}
processors.ScaleDownNodeProcessor = core.NewDefaultScaleDownNodeProcessor()
Shouldn't that happen in ca_processors.DefaultProcessors()? It does for all other processors.
It was put there to avoid a cyclic dependency. Now that the processor is not in core, I moved this.
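The wiring the author describes - constructing the default processor inside a central `DefaultProcessors()` function rather than in main.go - might look roughly like this. All names here (`AutoscalingProcessors`, `NewPreFilteringScaleDownNodeProcessor`, the `Name()` method) are simplified stand-ins, not the exact cluster-autoscaler API.

```go
package main

import "fmt"

// ScaleDownNodeProcessor is a trimmed-down stand-in for the real
// processor interface in this sketch.
type ScaleDownNodeProcessor interface{ Name() string }

type preFilteringScaleDownNodeProcessor struct{}

func (p *preFilteringScaleDownNodeProcessor) Name() string {
	return "PreFilteringScaleDownNodeProcessor"
}

// NewPreFilteringScaleDownNodeProcessor is a hypothetical constructor
// mirroring the pattern other default processors use.
func NewPreFilteringScaleDownNodeProcessor() ScaleDownNodeProcessor {
	return &preFilteringScaleDownNodeProcessor{}
}

// AutoscalingProcessors bundles the pluggable processors, mirroring
// ca_processors.AutoscalingProcessors.
type AutoscalingProcessors struct {
	ScaleDownNodeProcessor ScaleDownNodeProcessor
}

// DefaultProcessors wires up default implementations in one place,
// which is where the review says the construction belongs.
func DefaultProcessors() *AutoscalingProcessors {
	return &AutoscalingProcessors{
		ScaleDownNodeProcessor: NewPreFilteringScaleDownNodeProcessor(),
	}
}

func main() {
	fmt.Println(DefaultProcessors().ScaleDownNodeProcessor.Name())
}
```

Keeping construction centralized means callers like main.go only override a processor when they need a non-default one.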
type ScaleDownNodeProcessor interface {
	// GetHarborCandidates returns nodes that potentially could harbor the pods that would become
	// unscheduled after a scale down.
	GetHarborCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
I don't like this name too much - it's not intuitive at all to me what this does. I'd prefer something like GetPodDestinationCandidates().
It may be just me though - let's get another opinion.
@aleksandra-malinowska @krzysztof-jastrzebski - WDYT?
Not great with names. Also, I don't know if there can be a very intuitive name for the concept. Changed to GetPodDestinationCandidates for now
	GetHarborCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
	// GetScaleDownCandidates returns nodes that potentially could be scaled down.
	GetScaleDownCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
	// Reset resets the properties if ScaleDownNodeProcessor
this sentence doesn't parse
Done
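After the renames discussed in this thread, the interface might look roughly like the sketch below. The types are simplified stand-ins (no `AutoscalerError`, no real `apiv1.Node`), and the trivial pass-through implementation is purely hypothetical, included only so the interface is exercised.

```go
package main

import "fmt"

// Simplified stand-ins for the real cluster-autoscaler types.
type AutoscalingContext struct{}
type Node struct{ Name string }

// ScaleDownNodeProcessor sketches the interface after the rename from
// GetHarborCandidates to GetPodDestinationCandidates; error returns
// are omitted for brevity.
type ScaleDownNodeProcessor interface {
	// GetPodDestinationCandidates returns nodes that could receive pods
	// evicted by a scale-down.
	GetPodDestinationCandidates(*AutoscalingContext, []*Node) []*Node
	// GetScaleDownCandidates returns nodes that potentially could be
	// scaled down.
	GetScaleDownCandidates(*AutoscalingContext, []*Node) []*Node
	// CleanUp resets any internal state of the processor.
	CleanUp()
}

// passthroughProcessor is a trivial hypothetical implementation that
// returns all nodes unchanged.
type passthroughProcessor struct{}

func (p *passthroughProcessor) GetPodDestinationCandidates(_ *AutoscalingContext, nodes []*Node) []*Node {
	return nodes
}

func (p *passthroughProcessor) GetScaleDownCandidates(_ *AutoscalingContext, nodes []*Node) []*Node {
	return nodes
}

func (p *passthroughProcessor) CleanUp() {}

func main() {
	var proc ScaleDownNodeProcessor = &passthroughProcessor{}
	nodes := []*Node{{"n1"}, {"n2"}}
	fmt.Println(len(proc.GetScaleDownCandidates(&AutoscalingContext{}, nodes)))
}
```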
potentiallyUnneeded := getPotentiallyUnneededNodes(autoscalingContext, allNodes)

var scaleDownCandidates []*apiv1.Node
var harborNodes []*apiv1.Node
If we no longer call them 'harborCandidates' in method name we shouldn't do it in variable name either (it's very confusing without historical context).
Done
@@ -702,9 +706,18 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
	return scaleDownStatus, errors.ToAutoscalerError(errors.CloudProviderError, errCP)
}

var tempNodes []*apiv1.Node
if scaleDownNodeProcessor != nil {
	tempNodes, err := scaleDownNodeProcessor.GetTemporaryNodes(nodesWithoutMaster)
You already have tempNodes list in static_autoscaler (you pass it to UpdateUnneededNodes). Why not pass this as a parameter to TryToScaleDown, rather than call the processor again?
Done. Somehow missed this
	if err != nil {
		klog.Errorf("Error filtering out temporary nodes: %v", err)
	}
	nodesWithoutMaster = filterOutTemporaryNodes(nodesWithoutMaster, tempNodes)
Just filterOutNodes? It's a generic set operation on nodes. I wonder if there is one already somewhere that we can just reuse? If not I'd still rather put it in utils.
Couldn't find a generic one. Added in utils
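The generic set operation the reviewer asks for could be sketched as a name-based set difference, as below. This is an illustration under simplified assumptions: a local `Node` struct stands in for `*apiv1.Node`, and the function name mirrors the reviewer's `filterOutNodes` suggestion.

```go
package main

import "fmt"

// Node is a stand-in for *apiv1.Node in this sketch.
type Node struct{ Name string }

// filterOutNodes returns the nodes from `nodes` that do not appear in
// `filterOut`, matching by name - a generic utility of the kind the
// review suggests placing under utils/.
func filterOutNodes(nodes []*Node, filterOut []*Node) []*Node {
	excluded := make(map[string]bool, len(filterOut))
	for _, n := range filterOut {
		excluded[n.Name] = true
	}
	var result []*Node
	for _, n := range nodes {
		if !excluded[n.Name] {
			result = append(result, n)
		}
	}
	return result
}

func main() {
	all := []*Node{{"n1"}, {"n2"}, {"n3"}}
	temp := []*Node{{"n2"}}
	for _, n := range filterOutNodes(all, temp) {
		fmt.Println(n.Name)
	}
}
```

Building the exclusion map first keeps the operation O(n+m) instead of the O(n*m) cost of a nested loop.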
@@ -746,7 +759,8 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
	continue
}

if size-sd.nodeDeletionTracker.GetDeletionsInProgress(nodeGroup.Id()) <= nodeGroup.MinSize() {
deletionsInProgress := sd.nodeDeletionTracker.GetDeletionsInProgress(nodeGroup.Id())
if size-deletionsInProgress-len(tempNodes) <= nodeGroup.MinSize() {
You need to subtract tempNodes in this NodeGroup, not all temp nodes.
Done
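The correction the reviewer asks for - subtracting only the temporary nodes that belong to the node group under consideration, not all temporary nodes - could look roughly like this. The `Node` type and its `Group` field are hypothetical simplifications; the real code resolves group membership through the cloudprovider.

```go
package main

import "fmt"

// Node is a simplified stand-in: Group names the node group the node
// belongs to (resolved via cloudprovider lookups in the real code).
type Node struct {
	Name  string
	Group string
}

// countNodesPerGroup builds a nodeGroupID -> count map so the min-size
// check can subtract only the temporary nodes in that group.
func countNodesPerGroup(nodes []*Node) map[string]int {
	counts := make(map[string]int)
	for _, n := range nodes {
		counts[n.Group]++
	}
	return counts
}

func main() {
	tempNodes := []*Node{{"a", "ng1"}, {"b", "ng1"}, {"c", "ng2"}}
	tempPerGroup := countNodesPerGroup(tempNodes)
	size, deletionsInProgress, minSize := 5, 1, 2
	// Subtract only ng1's two temporary nodes, not all three.
	if size-deletionsInProgress-tempPerGroup["ng1"] <= minSize {
		fmt.Println("ng1 at or below min size, skip")
	} else {
		fmt.Println("ng1 has scale-down headroom")
	}
}
```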
@@ -0,0 +1,98 @@
/*
Copyright 2019 The Kubernetes Authors.
I know you won't like this question, but do we even need this mock if you pass temporary nodes list (and not whole processor) to TryToScaleDown?
Done
@@ -0,0 +1,39 @@
/*
Copyright 2019 The Kubernetes Authors.
Convention is to use types.go for API definition. I find it confusing to name this file like that and not have an API in there - can you rename to scale_down_node_processor.go?
types.go is used for the basic API types of a package. ScaleDownNodeProcessor would belong in this.
Does it? I tend to think of it as an external API and not just public types in package.
I found some files that do this in k8s, one of which is kubernetes/pkg/util/ipset/types.go.
Fair enough. I still think it's inconsistent with the other processors, but maybe it's just me.
hack/verify-golint.sh
'/vendor/'
'vertical-pod-autoscaler/pkg/client'
'cluster-autoscaler/cloudprovider/magnum/gophercloud'
'cluster-autoscaler/processors/nodes/mock'
Even if you remove mocks, let's keep other changes to this script. A bit of a drive-by fix I guess, but a welcome one.
Kept this
@@ -399,22 +400,23 @@ func (sd *ScaleDown) CleanUpUnneededNodes() {
// node utilization level. Timestamp is the current timestamp. The computations are made only for the nodes
// managed by CA.
func (sd *ScaleDown) UpdateUnneededNodes(
	nodes []*apiv1.Node,
	nodesToCheck []*apiv1.Node,
	podDestinations []*apiv1.Node,
destinationNodes? This name makes me think of pod->node map (similar to podLocationHints).
Done
utilizationMap := make(map[string]simulator.UtilizationInfo)

sd.updateUnremovableNodes(nodes)
sd.updateUnremovableNodes(podDestinations)
This definitely shouldn't take podDestinations. It updates the list of nodes that were recently checked as candidates and rejected (which is used to avoid re-checking all nodes every loop). This has nothing to do with potential destinations, it's about candidates.
That being said, I wonder if passing allNodes wouldn't be a more correct thing here (just because a node is not a candidate now doesn't mean it wasn't one recently). I guess the chances of a node being a candidate, not being a candidate, and being a candidate again very quickly are slim enough that maybe it is ok to pass scaleDownCandidates?
Done
@@ -496,7 +498,7 @@ func (sd *ScaleDown) UpdateUnneededNodes(
	additionalCandidatesCount = len(currentNonCandidates)
}
// Limit the additional candidates pool size for better performance.
additionalCandidatesPoolSize := int(math.Ceil(float64(len(nodes)) * sd.context.ScaleDownCandidatesPoolRatio))
additionalCandidatesPoolSize := int(math.Ceil(float64(len(podDestinations)) * sd.context.ScaleDownCandidatesPoolRatio))
I don't think this should be based on podDestinations either.
Done
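The pool-size computation under discussion can be isolated into a small helper to show its shape; per the review, the input should be the candidate list rather than podDestinations. This is a sketch only - the helper name and signature are not from the actual code, which inlines the expression.

```go
package main

import (
	"fmt"
	"math"
)

// additionalCandidatesPoolSize limits the pool of extra scale-down
// candidates considered each loop to a fixed ratio of the candidate
// list, rounding up so a non-zero ratio never yields an empty pool.
func additionalCandidatesPoolSize(numCandidates int, poolRatio float64) int {
	return int(math.Ceil(float64(numCandidates) * poolRatio))
}

func main() {
	// e.g. a 0.1 ratio keeps 10% of the candidates in the extra pool.
	fmt.Println(additionalCandidatesPoolSize(100, 0.1))
}
```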
@@ -731,10 +737,6 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
}

nodeGroup, err := sd.context.CloudProvider.NodeGroupForNode(node)
Why remove check for error? Seems incorrect.
Artifact of the previous cache. Added back
. "k8s.io/autoscaler/cluster-autoscaler/utils/test"
)

func TestPreFilteringScaleDownNodeProcessor_GetHarborCandidateNodes(t *testing.T) {
s/GetHarbor/GetPodDestination
Done
This is lgtm once you add back what seems to be an accidentally deleted error check. /approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: MaciekPytel. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/lgtm
No description provided.