
Adding ScaleDownNodeProcessor #2233

Merged
merged 1 commit into from
Aug 19, 2019

Conversation

vivekbagade
Contributor

No description provided.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 2, 2019
@vivekbagade
Contributor Author

/assign @MaciekPytel

@@ -0,0 +1,75 @@
/*
Copyright 2016 The Kubernetes Authors.
Contributor

2019?

Contributor Author

Done

limitations under the License.
*/

package core
Contributor

I think this should live under processors/, not core/.
We made an exception for filterOutSchedulablePodListProcessor, but IIRC it was only because it needed stuff from core/ and it was the easiest way of dealing with cyclic dependencies.

Contributor Author

Done

)

// DefaultScaleDownNodeProcessor filters out scale down candidates from nodegroup with
// size <= minimum number of nodes for that nodegroup
Contributor

That's only half of what it does: it also filters out nodes from non-autoscaled nodegroups.

Contributor Author

Done


type DefaultScaleDownNodeProcessor struct {
Contributor

Can we use a more descriptive name? Maybe something along the lines of PreFilteringScaleDownNodeProcessor?

Other default processors have such names, example: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/status/scale_up_status_processor.go#L79

Contributor Author

Done

apiv1 "k8s.io/api/core/v1"

testprovider "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/test"
. "k8s.io/autoscaler/cluster-autoscaler/utils/test"
Contributor

Please sort out the import order.

Contributor Author

Done

cluster-autoscaler/core/static_autoscaler.go (resolved)
@@ -292,6 +292,7 @@ func buildAutoscaler() (core.Autoscaler, error) {
processors.NodeGroupSetProcessor = &nodegroupset.BalancingNodeGroupSetProcessor{
Comparator: nodegroupset.IsAzureNodeInfoSimilar}
}
processors.ScaleDownNodeProcessor = core.NewDefaultScaleDownNodeProcessor()
Contributor

Shouldn't that happen in ca_processors.DefaultProcessors()? It does for all other processors.

Contributor Author

Was put in to avoid cyclic dependency. Now that the processor is not in core, I moved this.

type ScaleDownNodeProcessor interface {
// GetHarborCandidates returns nodes that potentially could harbor the pods that would become
// unscheduled after a scale down.
GetHarborCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
Contributor

I don't like this name too much - it's not intuitive at all to me what this does. I'd prefer something like GetPodDestinationCandidates().
It may be just me though - let's get another opinion.
@aleksandra-malinowska @krzysztof-jastrzebski - WDYT?

Contributor Author

I'm not great with names. Also, I don't know if there can be a very intuitive name for the concept. Changed to GetPodDestinationCandidates for now.

GetHarborCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
// GetScaleDownCandidates returns nodes that potentially could be scaled down.
GetScaleDownCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
// Reset resets the properties if ScaleDownNodeProcessor
Contributor

this sentence doesn't parse

Contributor Author

Done

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 2, 2019
potentiallyUnneeded := getPotentiallyUnneededNodes(autoscalingContext, allNodes)

var scaleDownCandidates []*apiv1.Node
var harborNodes []*apiv1.Node
Contributor

If we no longer call them 'harborCandidates' in method name we shouldn't do it in variable name either (it's very confusing without historical context).

Contributor Author

Done

@@ -702,9 +706,18 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
return scaleDownStatus, errors.ToAutoscalerError(errors.CloudProviderError, errCP)
}

var tempNodes []*apiv1.Node
if scaleDownNodeProcessor != nil {
tempNodes, err := scaleDownNodeProcessor.GetTemporaryNodes(nodesWithoutMaster)
Contributor

You already have tempNodes list in static_autoscaler (you pass it to UpdateUnneededNodes). Why not pass this as a parameter to TryToScaleDown, rather than call the processor again?

Contributor Author (Aug 8, 2019)

Done. Somehow missed this

if err != nil {
klog.Errorf("Error filtering out temporary nodes: %v", err)
}
nodesWithoutMaster = filterOutTemporaryNodes(nodesWithoutMaster, tempNodes)
Contributor

Just filterOutNodes? It's a generic set operation on nodes. I wonder if there is one already somewhere that we can just reuse? If not I'd still rather put it in utils.

Contributor Author

Couldn't find a generic one. Added in utils

@@ -746,7 +759,8 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
continue
}

if size-sd.nodeDeletionTracker.GetDeletionsInProgress(nodeGroup.Id()) <= nodeGroup.MinSize() {
deletionsInProgress := sd.nodeDeletionTracker.GetDeletionsInProgress(nodeGroup.Id())
if size-deletionsInProgress-len(tempNodes) <= nodeGroup.MinSize() {
Contributor

You need to subtract tempNodes in this NodeGroup, not all temp nodes.

Contributor Author

Done

cluster-autoscaler/core/scale_down_test.go (resolved)
@@ -0,0 +1,98 @@
/*
Copyright 2019 The Kubernetes Authors.
Contributor

I know you won't like this question, but do we even need this mock if you pass temporary nodes list (and not whole processor) to TryToScaleDown?

Contributor Author

Done

@@ -0,0 +1,39 @@
/*
Copyright 2019 The Kubernetes Authors.
Contributor

Convention is to use types.go for API definition. I find it confusing to name this file like that and not have an API in there - can you rename to scale_down_node_processor.go?

Contributor Author

types.go is used for the basic API types of a package. ScaleDownNodeProcessor would belong in this.

Contributor

Does it? I tend to think of it as an external API and not just public types in package.

Contributor Author

I found some files that do this in k8s, one of which: kubernetes/pkg/util/ipset/types.go

Contributor

Fair enough. I still think it's inconsistent with the other processors, but maybe it's just me.

'/vendor/'
'vertical-pod-autoscaler/pkg/client'
'cluster-autoscaler/cloudprovider/magnum/gophercloud'
'cluster-autoscaler/processors/nodes/mock'
Contributor

Even if you remove mocks, let's keep other changes to this script. A bit of a drive-by fix I guess, but a welcome one.

Contributor Author

Kept this

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 9, 2019
@@ -399,22 +400,23 @@ func (sd *ScaleDown) CleanUpUnneededNodes() {
// node utilization level. Timestamp is the current timestamp. The computations are made only for the nodes
// managed by CA.
func (sd *ScaleDown) UpdateUnneededNodes(
nodes []*apiv1.Node,
nodesToCheck []*apiv1.Node,
podDestinations []*apiv1.Node,
Contributor

destinationNodes? This name makes me think of pod->node map (similar to podLocationHints).

Contributor Author

Done

utilizationMap := make(map[string]simulator.UtilizationInfo)

sd.updateUnremovableNodes(nodes)
sd.updateUnremovableNodes(podDestinations)
Contributor

This definitely shouldn't take podDestinations. It updates the list of nodes that were recently checked as candidates and rejected (which is used to avoid re-checking all nodes every loop). This has nothing to do with potential destinations; it's about candidates.
That being said, I wonder if passing allNodes wouldn't be a more correct thing here (just because a node is not a candidate now doesn't mean it wasn't one recently). I guess the chances of a node being a candidate, not being a candidate, and being a candidate again very quickly are slim enough that maybe it is ok to pass scaleDownCandidates?

Contributor Author

Done

@@ -496,7 +498,7 @@ func (sd *ScaleDown) UpdateUnneededNodes(
additionalCandidatesCount = len(currentNonCandidates)
}
// Limit the additional candidates pool size for better performance.
additionalCandidatesPoolSize := int(math.Ceil(float64(len(nodes)) * sd.context.ScaleDownCandidatesPoolRatio))
additionalCandidatesPoolSize := int(math.Ceil(float64(len(podDestinations)) * sd.context.ScaleDownCandidatesPoolRatio))
Contributor

I don't think this should be based on podDestinations either.

Contributor Author

Done


cluster-autoscaler/core/scale_down.go (resolved)
@@ -731,10 +737,6 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
}

nodeGroup, err := sd.context.CloudProvider.NodeGroupForNode(node)
Contributor

Why remove check for error? Seems incorrect.

Contributor Author

Artifact of the previous cache. Added back

. "k8s.io/autoscaler/cluster-autoscaler/utils/test"
)

func TestPreFilteringScaleDownNodeProcessor_GetHarborCandidateNodes(t *testing.T) {
Contributor

s/GetHarbor/GetPodDestination

Contributor Author

Done


@MaciekPytel
Contributor

This is lgtm once you add back what seems to be an accidentally deleted error check.
Left some other comments, but those are super-minor. Feel free to ignore them if you disagree.

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2019
@MaciekPytel
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2019
@k8s-ci-robot k8s-ci-robot merged commit 3f0a5fa into kubernetes:master Aug 19, 2019