
Adding ScaleDownNodeProcessor #2233

Merged
merged 1 commit into from
Aug 19, 2019

Conversation

vivekbagade
Contributor

No description provided.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 2, 2019
@vivekbagade
Contributor Author

/assign @MaciekPytel

@@ -0,0 +1,75 @@
/*
Copyright 2016 The Kubernetes Authors.
Contributor

2019?

Contributor Author

Done

limitations under the License.
*/

package core
Contributor

I think this should live under processors/, not core/.
We made an exception for filterOutSchedulablePodListProcessor, but IIRC it was only because it needed stuff from core/ and it was the easiest way of dealing with cyclic dependencies.

Contributor Author

Done

)

// DefaultScaleDownNodeProcessor filters out scale down candidates from nodegroup with
// size <= minimum number of nodes for that nodegroup
Contributor

That's only half of what it does: it also filters out nodes from non-autoscaled nodegroups.

Contributor Author

Done


type DefaultScaleDownNodeProcessor struct {
Contributor

Can we use a more descriptive name? Maybe something along the lines of PreFilteringScaleDownNodeProcessor?

Other default processors have such names, example: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/status/scale_up_status_processor.go#L79

Contributor Author

Done

apiv1 "k8s.io/api/core/v1"

testprovider "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/test"
. "k8s.io/autoscaler/cluster-autoscaler/utils/test"
Contributor

Please sort out the import order.

Contributor Author

Done

cluster-autoscaler/core/static_autoscaler.go (resolved)
@@ -292,6 +292,7 @@ func buildAutoscaler() (core.Autoscaler, error) {
processors.NodeGroupSetProcessor = &nodegroupset.BalancingNodeGroupSetProcessor{
Comparator: nodegroupset.IsAzureNodeInfoSimilar}
}
processors.ScaleDownNodeProcessor = core.NewDefaultScaleDownNodeProcessor()
Contributor

Shouldn't that happen in ca_processors.DefaultProcessors()? It does for all other processors.

Contributor Author

Was put in to avoid cyclic dependency. Now that the processor is not in core, I moved this.

type ScaleDownNodeProcessor interface {
// GetHarborCandidates returns nodes that potentially could harbor the pods that would become
// unscheduled after a scale down.
GetHarborCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
Contributor

I don't like this name too much - it's not intuitive at all to me what this does. I'd prefer something like GetPodDestinationCandidates().
It may be just me though - let's get another opinion.
@aleksandra-malinowska @krzysztof-jastrzebski - WDYT?

Contributor Author

I'm not great with names. Also, I don't know if there can be a very intuitive name for the concept. Changed to GetPodDestinationCandidates for now.

GetHarborCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
// GetScaleDownCandidates returns nodes that potentially could be scaled down.
GetScaleDownCandidates(*context.AutoscalingContext, []*apiv1.Node) ([]*apiv1.Node, errors.AutoscalerError)
// Reset resets the properties if ScaleDownNodeProcessor
Contributor

this sentence doesn't parse

Contributor Author

Done

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 2, 2019
potentiallyUnneeded := getPotentiallyUnneededNodes(autoscalingContext, allNodes)

var scaleDownCandidates []*apiv1.Node
var harborNodes []*apiv1.Node
Contributor

If we no longer call them 'harborCandidates' in method name we shouldn't do it in variable name either (it's very confusing without historical context).

Contributor Author

Done

@@ -702,9 +706,18 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
return scaleDownStatus, errors.ToAutoscalerError(errors.CloudProviderError, errCP)
}

var tempNodes []*apiv1.Node
if scaleDownNodeProcessor != nil {
tempNodes, err := scaleDownNodeProcessor.GetTemporaryNodes(nodesWithoutMaster)
Contributor

You already have tempNodes list in static_autoscaler (you pass it to UpdateUnneededNodes). Why not pass this as a parameter to TryToScaleDown, rather than call the processor again?

Contributor Author (Aug 8, 2019)

Done. Somehow missed this

if err != nil {
klog.Errorf("Error filtering out temporary nodes: %v", err)
}
nodesWithoutMaster = filterOutTemporaryNodes(nodesWithoutMaster, tempNodes)
Contributor

Just filterOutNodes? It's a generic set operation on nodes. I wonder if there is one already somewhere that we can just reuse? If not I'd still rather put it in utils.

Contributor Author

Couldn't find a generic one. Added in utils

@@ -746,7 +759,8 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
continue
}

if size-sd.nodeDeletionTracker.GetDeletionsInProgress(nodeGroup.Id()) <= nodeGroup.MinSize() {
deletionsInProgress := sd.nodeDeletionTracker.GetDeletionsInProgress(nodeGroup.Id())
if size-deletionsInProgress-len(tempNodes) <= nodeGroup.MinSize() {
Contributor

You need to subtract tempNodes in this NodeGroup, not all temp nodes.

Contributor Author

Done

cluster-autoscaler/core/scale_down_test.go (resolved)
@@ -0,0 +1,98 @@
/*
Copyright 2019 The Kubernetes Authors.
Contributor

I know you won't like this question, but do we even need this mock if you pass temporary nodes list (and not whole processor) to TryToScaleDown?

Contributor Author

Done

@@ -0,0 +1,39 @@
/*
Copyright 2019 The Kubernetes Authors.
Contributor

Convention is to use types.go for API definition. I find it confusing to name this file like that and not have an API in there - can you rename to scale_down_node_processor.go?

Contributor Author

types.go is used for the basic API types of a package. ScaleDownNodeProcessor would belong in this.

Contributor

Does it? I tend to think of it as an external API and not just public types in package.

Contributor Author

I found some files that do this in k8s, one of which: kubernetes/pkg/util/ipset/types.go

Contributor

Fair enough. I still think it's inconsistent with the other processors, but maybe it's just me.

'/vendor/'
'vertical-pod-autoscaler/pkg/client'
'cluster-autoscaler/cloudprovider/magnum/gophercloud'
'cluster-autoscaler/processors/nodes/mock'
Contributor

Even if you remove mocks, let's keep other changes to this script. A bit of a drive-by fix I guess, but a welcome one.

Contributor Author

Kept this

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 9, 2019
@@ -399,22 +400,23 @@ func (sd *ScaleDown) CleanUpUnneededNodes() {
// node utilization level. Timestamp is the current timestamp. The computations are made only for the nodes
// managed by CA.
func (sd *ScaleDown) UpdateUnneededNodes(
nodes []*apiv1.Node,
nodesToCheck []*apiv1.Node,
podDestinations []*apiv1.Node,
Contributor

destinationNodes? This name makes me think of pod->node map (similar to podLocationHints).

Contributor Author

Done

utilizationMap := make(map[string]simulator.UtilizationInfo)

sd.updateUnremovableNodes(nodes)
sd.updateUnremovableNodes(podDestinations)
Contributor

This definitely shouldn't take podDestinations. It updates the list of nodes that were recently checked as candidates and rejected (which is used to avoid re-checking all nodes every loop). This has nothing to do with potential destinations; it's about candidates.
That being said, I wonder if passing allNodes wouldn't be a more correct thing here (just because a node is not a candidate now doesn't mean it wasn't one recently). I guess the chances of a node being a candidate, not being a candidate, and being a candidate again very quickly are slim enough that maybe it is ok to pass scaleDownCandidates?

Contributor Author

Done

@@ -496,7 +498,7 @@ func (sd *ScaleDown) UpdateUnneededNodes(
additionalCandidatesCount = len(currentNonCandidates)
}
// Limit the additional candidates pool size for better performance.
additionalCandidatesPoolSize := int(math.Ceil(float64(len(nodes)) * sd.context.ScaleDownCandidatesPoolRatio))
additionalCandidatesPoolSize := int(math.Ceil(float64(len(podDestinations)) * sd.context.ScaleDownCandidatesPoolRatio))
Contributor

I don't think this should be based on podDestinations either.

Contributor Author

Done


cluster-autoscaler/core/scale_down.go (resolved)
@@ -731,10 +737,6 @@ func (sd *ScaleDown) TryToScaleDown(allNodes []*apiv1.Node, pods []*apiv1.Pod, p
}

nodeGroup, err := sd.context.CloudProvider.NodeGroupForNode(node)
Contributor

Why remove check for error? Seems incorrect.

Contributor Author

Artifact of the previous cache. Added back

. "k8s.io/autoscaler/cluster-autoscaler/utils/test"
)

func TestPreFilteringScaleDownNodeProcessor_GetHarborCandidateNodes(t *testing.T) {
Contributor

s/GetHarbor/GetPodDestination

Contributor Author

Done


@MaciekPytel
Contributor

This is lgtm once you add back what seems to be an accidentally deleted error check.
Left some other comments, but those are super-minor. Feel free to ignore them if you disagree.

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2019
@MaciekPytel
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2019
@k8s-ci-robot k8s-ci-robot merged commit 3f0a5fa into kubernetes:master Aug 19, 2019