Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value #5890

Conversation

Bryce-Soghigian
Copy link
Member

@Bryce-Soghigian Bryce-Soghigian commented Jun 26, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

Cluster Autoscaler is in a migrating state where we make two decisions to filter down deletion candidates. We filter candidates in the Planner using the legacy maxEmptyBulkDelete flag, and we also filter down nodes in the cropNodesForDeletion function in the actuator that uses the maxScaleDownParallelism value.

This leads to the following scenario

  1. Neither MaxEmptyBulkDelete or MaxScaleDownParallelism is respected and the minimum of the two is taken, so if we aim to Delete 1000 nodes with MaxEmptyBulkDelete, but MaxScaleDownParallelism is set to 10(its default) then we only delete 10 Nodes, and the 1000 nodes are removed from the unneeded nodes map, causing them to have to wait the full unneeded time again before being canidates for deletion, and then we will only delete 10 of them. This cycle continues 100 more times before we delete all 1000 nodes.

Which issue(s) this PR fixes:

Fixes #5870, #5618

Special notes for your reviewer:

Does this PR introduce a user-facing change?

setting the value of max-scale-down-parallelism to the value of max-empty-bulk-delete if one of the flags is not set. And Vice Versa. 
The max-empty-bulk-delete flag will be deprecated as of 1.29

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 26, 2023
@k8s-ci-robot k8s-ci-robot requested a review from x13n June 26, 2023 14:50
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 26, 2023
@x13n
Copy link
Member

x13n commented Jul 3, 2023

/assign

I think this will be problematic if someone explicitly sets one of the flags to be less than the default value. Instead of using Max, it seems safer to check which flag was provided (e.g. like this).

@Bryce-Soghigian
Copy link
Member Author

Bryce-Soghigian commented Jul 3, 2023

We have 4 cases here.

  1. Max-empty-bulk-delete is set but max-scale-down-parallelism is not
  2. max-empty-bulk-delete is not set but max-scale-down-parallelism is
  3. both are set.
  4. neither are set and we just use the default value 10, currently thats fine.

In cases 1 and 2, the solution is easy, we just set the value equal to the value of the one we set.

For 3, its an interesting case, because if I set max-empty-bulk-delete to 3, then max-scale-down-parallelism to 11, which should we respect?

// cases we have 3, 1. Mebd was set, but msdp was not, 2 msdp was set, but mebd was not, 3. both were set
	if isFlagPassed("max-empty-bulk-delete") && !isFlagPassed("max-scale-down-parallelism") {
		// Set maxScaleDownParallelismFlag = maxEmptyBulkDeleteFlag to avoid inconsistent deletion thresholds for the legacy planner and the new actuator.
		*maxScaleDownParallelismFlag = *maxEmptyBulkDeleteFlag
	} else if !isFlagPassed("max-empty-bulk-delete") && isFlagPassed("max-scale-down-parallelism") {
		// Set maxEmptyBulkDeleteFlag = maxScaleDownParallelismFlag to avoid inconsistent deletion thresholds for the legacy planner and the new actuator.
		*maxEmptyBulkDeleteFlag = *maxScaleDownParallelismFlag
	} else {
		maxDeletionSize := int(math.Max(float64(*maxEmptyBulkDeleteFlag), float64(*maxScaleDownParallelismFlag)))
		*maxEmptyBulkDeleteFlag = maxDeletionSize
		*maxScaleDownParallelismFlag = maxDeletionSize

	}
Maybe something like this could work, where we only set maxDeletionSize, if we have both flags and have a conflict. Or since we are trying to deprecate the maxEmptyBulkDeleteFlag, we just respect maxScaleDownParallelismFlag. 

@x13n
Copy link
Member

x13n commented Jul 3, 2023

If both are set ideally we'd just exit the process, but that's can break people already using such setup, especially if this is meant to be cherrypicked. I think max is ok in that case, but I'd log the problem to limit confusion.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 4, 2023
@Bryce-Soghigian Bryce-Soghigian force-pushed the bsoghigian/respecting-bulk-delete branch from 6a1eb24 to 0163710 Compare July 4, 2023 16:50
@Bryce-Soghigian
Copy link
Member Author

@x13n I am thinking we should merge this in to 1.25, 1.26 and 1.27, and maybe for 1.28(IE Master), but maybe we should have another followup PR that panics if both flags are set for 1.28 or 1.29. This panic can take two paths.

  1. Panic whenever max-empty-bulk-delete is set. Maybe i should have a followup pr changing the planner to use max-scale-down-parallelism rather than max-empty-bulk-delete for cropping nodes down, having a single source of truth for the deletion canidates number.

  2. Panic whenever we see both flags set, and still support setting max-empty-bulk-delete and keep this logic currently in the autoscaler.

@Bryce-Soghigian
Copy link
Member Author

Bryce-Soghigian commented Jul 4, 2023

https://github.com/kubernetes/autoscaler/pull/5926/files something like this could be useful as well, after this pr we are on is merged.

This way we only set the total number of nodes we plan on removing/deleting based on one flag. With the pr we are on, we will have that value set to the intended value always. I am thinking maybe either in 1.28 or 1.29, we enforce max-empty-bulk-delete should not be set.

What is the best way to announce flag deprecations to the community?

@Bryce-Soghigian Bryce-Soghigian force-pushed the bsoghigian/respecting-bulk-delete branch from 0163710 to af1aab9 Compare July 5, 2023 20:22
cluster-autoscaler/main.go Outdated Show resolved Hide resolved
cluster-autoscaler/main.go Outdated Show resolved Hide resolved
@x13n
Copy link
Member

x13n commented Jul 7, 2023

About the deprecation - I'd start with emitting a warning whenever the old flag is used. And then delete it after a version or two. It should be also a part of the release notes.

@Bryce-Soghigian Bryce-Soghigian force-pushed the bsoghigian/respecting-bulk-delete branch 3 times, most recently from 59bb45d to c9f3f2e Compare July 8, 2023 21:41
…xEmptyBulkDelete, and maxScaleDownParallelism to be the larger of the two flags in the case both are set
@Bryce-Soghigian Bryce-Soghigian force-pushed the bsoghigian/respecting-bulk-delete branch from c9f3f2e to 1e45078 Compare July 8, 2023 21:43
@Bryce-Soghigian Bryce-Soghigian changed the title fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be th… fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value Jul 8, 2023
Copy link
Member

@x13n x13n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one minor comment on wording. Putting on hold, but feel free to just cancel the hold if you disagree.

/lgtm
/approve
/hold

cluster-autoscaler/main.go Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 12, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bryce-Soghigian, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 12, 2023
@Bryce-Soghigian
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2023
@k8s-ci-robot k8s-ci-robot merged commit da96d89 into kubernetes:master Jul 12, 2023
6 checks passed
Bryce-Soghigian pushed a commit to Bryce-Soghigian/autoscaler that referenced this pull request Jul 18, 2023
…specting-bulk-delete

fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value
Bryce-Soghigian pushed a commit to Bryce-Soghigian/autoscaler that referenced this pull request Jul 18, 2023
…specting-bulk-delete

fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value
Bryce-Soghigian pushed a commit to Bryce-Soghigian/autoscaler that referenced this pull request Jul 18, 2023
…specting-bulk-delete

fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value
k8s-ci-robot added a commit that referenced this pull request Jul 19, 2023
…ulk-delete

cherry-pick: Merge pull request #5890 from Bryce-Soghigian/bsoghigian/respecting-bulk-delete
k8s-ci-robot added a commit that referenced this pull request Jul 19, 2023
…e-1.27

Merge pull request #5890 from Bryce-Soghigian/bsoghigian/respecting-bulk-delete
k8s-ci-robot added a commit that referenced this pull request Jul 19, 2023
…ulk-delete-1.26

cherry-pick: Merge pull request #5890 from Bryce-Soghigian/bsoghigian/respecting-bulk-delete
@yaohuatj
Copy link

cause default value was set for max-scale-down-parallelism, maybe the function !isFlagPassed("max-scale-down-parallelism")" won't work ?

@yaohuatj
Copy link

yaohuatj commented Nov 11, 2023

cause default value was set for max-scale-down-parallelism, maybe the function !isFlagPassed("max-scale-down-parallelism")" won't work ?

I have figure out, flag.Parse() is missing, so
isFlagPassed("max-empty-bulk-delete") = false and isFlagPassed("max-scale-down-parallelism") = false

jigish pushed a commit to airbnb/autoscaler that referenced this pull request Feb 6, 2024
…specting-bulk-delete

fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
4 participants