
Fix ReplicaSet deletion rank for multiple colocation nodes #102949

Closed

Conversation


@FillZpp FillZpp commented Jun 17, 2021

Signed-off-by: FillZpp FillZpp.pub@gmail.com

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fix ReplicaSet deletion rank for multiple colocation nodes.

Which issue(s) this PR fixes:

Fixes #102948

Special notes for your reviewer:

Now for a ReplicaSet or Deployment which has five Pods on two Nodes:

node01: pod-a, pod-b, pod-c
node02: pod-d, pod-e

When replicas is changed from 5 to 2, the deletion ranks that the controller computes for the Pods are:

node01:
- pod-a: 3
- pod-b: 3
- pod-c: 3
node02:
- pod-d: 2
- pod-e: 2

So the ReplicaSet controller will choose all three Pods on node01 to delete.

This PR fixes the rank to be:

node01:
- pod-a: 3
- pod-b: 2
- pod-c: 1
node02:
- pod-d: 2
- pod-e: 1

So that the controller will delete Pods more evenly across nodes.
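
For illustration, here is a minimal, self-contained Go sketch of the fixed ranking behavior. It is not the controller code itself: it uses plain node-name strings instead of *v1.Pod and assumes podsToRank and relatedPods are the same set.

package main

import "fmt"

// rankByColocation sketches the fixed logic: each pod on a node gets a
// strictly decreasing rank (count, count-1, ...) instead of every pod on
// that node sharing the same count.
func rankByColocation(podNodes []string) []int {
    podsOnNode := map[string]int{}
    for _, node := range podNodes {
        podsOnNode[node]++
    }
    ranks := make([]int, len(podNodes))
    for i, node := range podNodes {
        if ranks[i] = podsOnNode[node]; ranks[i] > 0 {
            podsOnNode[node]--
        }
    }
    return ranks
}

func main() {
    // pod-a, pod-b, pod-c on node01; pod-d, pod-e on node02
    nodes := []string{"node01", "node01", "node01", "node02", "node02"}
    fmt.Println(rankByColocation(nodes)) // [3 2 1 2 1]
}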

Does this PR introduce a user-facing change?

No user-facing change.

/sig apps

@k8s-ci-robot
Contributor

@FillZpp: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 17, 2021
@k8s-ci-robot
Contributor

@FillZpp: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jun 17, 2021
@k8s-ci-robot
Contributor

Hi @FillZpp. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 17, 2021
@FillZpp
Author

FillZpp commented Jun 17, 2021

/cc @janetkuo @kow3ns @soltysh

@gjkim42
Member

gjkim42 commented Jun 17, 2021

/ok-to-test

lgtm

Hope folks at sig-apps can approve this.

https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/#scaling-a-replicaset

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 17, 2021
@@ -819,7 +819,9 @@ func getPodsRankedByRelatedPodsOnSameNode(podsToRank, relatedPods []*v1.Pod) con
 	}
 	ranks := make([]int, len(podsToRank))
 	for i, pod := range podsToRank {
-		ranks[i] = podsOnNode[pod.Spec.NodeName]
+		if ranks[i] = podsOnNode[pod.Spec.NodeName]; ranks[i] > 0 {
+			podsOnNode[pod.Spec.NodeName]--
Member

This removes the usefulness of other criteria, like the time the pods have been running.
Can you think of an alternative that doesn't lose that?

Member

cc @damemi who has been looking into this.

Contributor

Yeah, it would probably be better to try to tie this into the sort function that is already called on these podsWithRanks, as done in getPodsToDelete(). See

func (s ActivePodsWithRanks) Less(i, j int) bool {
    // 1. Unassigned < assigned
    // If only one of the pods is unassigned, the unassigned one is smaller
    if s.Pods[i].Spec.NodeName != s.Pods[j].Spec.NodeName && (len(s.Pods[i].Spec.NodeName) == 0 || len(s.Pods[j].Spec.NodeName) == 0) {
        return len(s.Pods[i].Spec.NodeName) == 0
    }
    // 2. PodPending < PodUnknown < PodRunning
    if podPhaseToOrdinal[s.Pods[i].Status.Phase] != podPhaseToOrdinal[s.Pods[j].Status.Phase] {
        return podPhaseToOrdinal[s.Pods[i].Status.Phase] < podPhaseToOrdinal[s.Pods[j].Status.Phase]
    }
    // 3. Not ready < ready
    // If only one of the pods is not ready, the not ready one is smaller
    if podutil.IsPodReady(s.Pods[i]) != podutil.IsPodReady(s.Pods[j]) {
        return !podutil.IsPodReady(s.Pods[i])
    }
    // 4. higher pod-deletion-cost < lower pod-deletion cost
    if utilfeature.DefaultFeatureGate.Enabled(features.PodDeletionCost) {
        pi, _ := helper.GetDeletionCostFromPodAnnotations(s.Pods[i].Annotations)
        pj, _ := helper.GetDeletionCostFromPodAnnotations(s.Pods[j].Annotations)
        if pi != pj {
            return pi < pj
        }
    }
    // 5. Doubled up < not doubled up
    // If one of the two pods is on the same node as one or more additional
    // ready pods that belong to the same replicaset, whichever pod has more
    // colocated ready pods is less
    if s.Rank[i] != s.Rank[j] {
        return s.Rank[i] > s.Rank[j]
    }
    // TODO: take availability into account when we push minReadySeconds information from deployment into pods,
    // see https://github.com/kubernetes/kubernetes/issues/22065
    // 6. Been ready for empty time < less time < more time
    // If both pods are ready, the latest ready one is smaller
    if podutil.IsPodReady(s.Pods[i]) && podutil.IsPodReady(s.Pods[j]) {
        readyTime1 := podReadyTime(s.Pods[i])
        readyTime2 := podReadyTime(s.Pods[j])
        if !readyTime1.Equal(readyTime2) {
            if !utilfeature.DefaultFeatureGate.Enabled(features.LogarithmicScaleDown) {
                return afterOrZero(readyTime1, readyTime2)
            } else {
                if s.Now.IsZero() || readyTime1.IsZero() || readyTime2.IsZero() {
                    return afterOrZero(readyTime1, readyTime2)
                }
                rankDiff := logarithmicRankDiff(*readyTime1, *readyTime2, s.Now)
                if rankDiff == 0 {
                    return s.Pods[i].UID < s.Pods[j].UID
                }
                return rankDiff < 0
            }
        }
    }
    // 7. Pods with containers with higher restart counts < lower restart counts
    if maxContainerRestarts(s.Pods[i]) != maxContainerRestarts(s.Pods[j]) {
        return maxContainerRestarts(s.Pods[i]) > maxContainerRestarts(s.Pods[j])
    }
    // 8. Empty creation time pods < newer pods < older pods
    if !s.Pods[i].CreationTimestamp.Equal(&s.Pods[j].CreationTimestamp) {
        if !utilfeature.DefaultFeatureGate.Enabled(features.LogarithmicScaleDown) {
            return afterOrZero(&s.Pods[i].CreationTimestamp, &s.Pods[j].CreationTimestamp)
        } else {
            if s.Now.IsZero() || s.Pods[i].CreationTimestamp.IsZero() || s.Pods[j].CreationTimestamp.IsZero() {
                return afterOrZero(&s.Pods[i].CreationTimestamp, &s.Pods[j].CreationTimestamp)
            }
            rankDiff := logarithmicRankDiff(s.Pods[i].CreationTimestamp, s.Pods[j].CreationTimestamp, s.Now)
            if rankDiff == 0 {
                return s.Pods[i].UID < s.Pods[j].UID
            }
            return rankDiff < 0
        }
    }
    return false
}

Member

One alternative I can think of is to do the sorting without ranks first, and then run the loop to add the ranks and finally do a second stable sort.
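
A rough, self-contained sketch of that idea, with hypothetical types and a single stand-in criterion instead of the full comparison chain (this is not the actual controller code):

package main

import (
    "fmt"
    "sort"
)

type podInfo struct {
    name      string
    node      string
    readySecs int // stand-in for the existing time-based criteria
    rank      int
}

func main() {
    pods := []podInfo{
        {name: "pod-a", node: "node01", readySecs: 300},
        {name: "pod-b", node: "node01", readySecs: 100},
        {name: "pod-c", node: "node01", readySecs: 200},
        {name: "pod-d", node: "node02", readySecs: 400},
        {name: "pod-e", node: "node02", readySecs: 50},
    }

    // 1. First sort using only the pre-existing criteria
    //    (simplified here to "least time ready first").
    sort.SliceStable(pods, func(i, j int) bool { return pods[i].readySecs < pods[j].readySecs })

    // 2. Assign decreasing per-node ranks in that order, so the pods the
    //    first sort already preferred to delete keep the higher rank.
    podsOnNode := map[string]int{}
    for _, p := range pods {
        podsOnNode[p.node]++
    }
    for i := range pods {
        pods[i].rank = podsOnNode[pods[i].node]
        podsOnNode[pods[i].node]--
    }

    // 3. Second, stable sort on rank alone; ties keep the step-1 order.
    sort.SliceStable(pods, func(i, j int) bool { return pods[i].rank > pods[j].rank })

    for _, p := range pods {
        fmt.Printf("%s (%s) rank=%d\n", p.name, p.node, p.rank)
    }
}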

Member

Or... we can simply remove the ranks. Now that we have randomized removals https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/2185-random-pod-select-on-replicaset-downscale, the nodes with a higher number of pods are more likely to get pods removed :)

Author

I think these two pieces of logic don't conflict. With this PR, pods with the same rank can still be sorted randomly.

For example, say there are 5 nodes and each of them has 2 pods. Then the 5 pods (one per node) will get rank 1 and the other 5 pods will get rank 0. The random algorithm will then sort the former 5 pods randomly among themselves and the latter 5 pods randomly among themselves.


@alculquicondor @damemi What do you think?

Member

But there is no comparison between a-0 and a-1. You are directly assigning a preference without checking the timestamp.

Author

You are directly assigning a preference without checking the timestamp.

Yes, that means we might need to sort the pods on the same node by timestamp first and then give them the right ranks?

Member

Correct. I think it's only possible through two consecutive sort operations, as I proposed above. However, I think if we get rid of the rank entirely, we get the same result, probabilistically.

Member

WDYT?

@FillZpp FillZpp force-pushed the fix-replicaset-same-node-rank branch from 309b1ca to 2c0d5a1 Compare July 26, 2021 02:19
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 26, 2021
@FillZpp
Author

FillZpp commented Jul 26, 2021

WDYT?

@alculquicondor Hey, sorry for the late reply, I've been through a busy time.. 😢

I just updated the code so that pods on the same node will get the right ranks. Would you please take another look?
BTW, I still wonder whether randomization by itself can achieve the same result. If we have some test data on this, I'd be very happy to see it.

@FillZpp FillZpp force-pushed the fix-replicaset-same-node-rank branch from 2c0d5a1 to 62b321e Compare July 26, 2021 06:26
@FillZpp
Author

FillZpp commented Jul 26, 2021

/retest

		}
	}
	for _, podsOnNode := range podsOnNodes {
		// sort pods on the same node by active status and timestamp
		sort.Sort(controller.ActivePods(podsOnNode))
Member

This function doesn't sort with logarithmic comparisons.

Author

In my opinion, there is no need to sort pods on the same node with logarithmic comparisons, which are designed to solve the sorting problem across different topologies (such as multiple failure domains).

The pods on different nodes that have the same rank will be sorted with logarithmic comparisons later in ActivePodsWithRanks.
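
To make the intent concrete, here is a simplified, self-contained sketch of the per-node ranking in the updated patch. The types are hypothetical, a simple age-based ordering stands in for controller.ActivePods, and the resulting ranks would still be fed into the ActivePodsWithRanks sort afterwards.

package main

import (
    "fmt"
    "sort"
)

type pod struct {
    name    string
    node    string
    ageSecs int
}

func main() {
    pods := []pod{
        {"pod-a", "node01", 900}, {"pod-b", "node01", 100}, {"pod-c", "node01", 500},
        {"pod-d", "node02", 300}, {"pod-e", "node02", 700},
    }

    // Group pods by node.
    podsOnNodes := map[string][]*pod{}
    for i := range pods {
        p := &pods[i]
        podsOnNodes[p.node] = append(podsOnNodes[p.node], p)
    }

    ranks := map[string]int{}
    for _, podsOnNode := range podsOnNodes {
        // Sort pods on the same node by the existing criteria
        // (here: youngest first, as a stand-in for controller.ActivePods).
        sort.SliceStable(podsOnNode, func(i, j int) bool {
            return podsOnNode[i].ageSecs < podsOnNode[j].ageSecs
        })
        // The pod ranked most deletable on its node gets the highest rank.
        for i, p := range podsOnNode {
            ranks[p.name] = len(podsOnNode) - i
        }
    }
    fmt.Println(ranks) // e.g. map[pod-a:1 pod-b:3 pod-c:2 pod-d:2 pod-e:1]
}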

@alculquicondor
Member

BTW, I still wonder whether randomization by itself can achieve the same result. If we have some test data on this, I'd be very happy to see it.

You might be able to build some sort of unit test that plays out the scenario you want to solve and compares your PR with what I'm proposing (removing the ranks).

@FillZpp FillZpp force-pushed the fix-replicaset-same-node-rank branch from 62b321e to c8b326c Compare July 27, 2021 07:34
@FillZpp
Author

FillZpp commented Oct 27, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2021
@dims
Member

dims commented Jan 4, 2022

/assign @soltysh

/retest

@FillZpp FillZpp force-pushed the fix-replicaset-same-node-rank branch 2 times, most recently from 1d46db6 to b5bc898 Compare January 5, 2022 05:36
@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jan 5, 2022
@FillZpp FillZpp force-pushed the fix-replicaset-same-node-rank branch from b5bc898 to b8064ad Compare January 5, 2022 07:43
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: FillZpp
To complete the pull request process, please ask for approval from soltysh after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@FillZpp FillZpp force-pushed the fix-replicaset-same-node-rank branch from b8064ad to 4b8d762 Compare January 5, 2022 08:27
@FillZpp
Author

FillZpp commented Jan 6, 2022

/retest

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 6, 2022
Signed-off-by: FillZpp <FillZpp.pub@gmail.com>
@FillZpp FillZpp force-pushed the fix-replicaset-same-node-rank branch from 4b8d762 to 7ce7e19 Compare May 13, 2022 07:27
@FillZpp
Author

FillZpp commented May 13, 2022

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
