Automated cherry pick of #117865: Parallel StatefulSet pod create & delete #119224

@aleksandra-malinowska aleksandra-malinowska commented Jul 11, 2023

Cherry pick of #117865 on release-1.25.

#117865: Refactor StatefulSet controller update logic

For details on the cherry pick process, see the cherry pick requests page.

Fixed a performance issue where pods weren't created/deleted in parallel for a StatefulSet with podManagementPolicy: Parallel.
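
For illustration only, here is a minimal Go sketch of the difference the release note describes: issuing pod operations one at a time versus concurrently. It is not the controller code from #117865. `createPod`, `createSequential`, and `createParallel` are hypothetical names, and the 10ms sleep merely stands in for API call latency.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// createPod is a hypothetical stand-in for one pod-creation API call.
// It only simulates latency and counts how many calls were made.
func createPod(ordinal int, created *int64) {
	time.Sleep(10 * time.Millisecond)
	atomic.AddInt64(created, 1)
}

// createSequential issues one call at a time, so the total time grows
// linearly with the number of replicas.
func createSequential(replicas int) (int64, time.Duration) {
	var created int64
	start := time.Now()
	for i := 0; i < replicas; i++ {
		createPod(i, &created)
	}
	return created, time.Since(start)
}

// createParallel issues all calls concurrently and waits for every one
// of them to finish before returning.
func createParallel(replicas int) (int64, time.Duration) {
	var created int64
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < replicas; i++ {
		wg.Add(1)
		go func(ordinal int) {
			defer wg.Done()
			createPod(ordinal, &created)
		}(i)
	}
	wg.Wait()
	return created, time.Since(start)
}
```

With the simulated latency above, the sequential variant takes roughly `replicas` times as long as the parallel one. A real controller would typically also bound the concurrency and handle errors, which this sketch omits.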

@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Jul 11, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/cherry-pick-not-approved, size/XL, and cncf-cla: yes labels Jul 11, 2023
@k8s-ci-robot k8s-ci-robot added the sig/apps, do-not-merge/needs-kind, needs-triage, and needs-priority labels Jul 11, 2023
@aleksandra-malinowska aleksandra-malinowska changed the title from "Automated cherry pick of #117865: Refactor StatefulSet controller update logic" to "Automated cherry pick of #117865: Parallel StatefulSet pod create & delete" Jul 11, 2023
@aleksandra-malinowska

/kind bug
/priority important-soon
/assign @soltysh

@k8s-ci-robot k8s-ci-robot added the kind/bug, priority/important-soon, and release-note labels and removed the do-not-merge/needs-kind and needs-priority labels Jul 11, 2023
@soltysh soltysh left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 11, 2023
@k8s-ci-robot

LGTM label has been added.

Git tree hash: a959fb9294e5c5c7ef4398b0e8f5f1c79c3f8536

@soltysh

soltysh commented Jul 11, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted and approved labels and removed the needs-triage label Jul 11, 2023
@Verolop

Verolop commented Jul 12, 2023

/retest-required

@aleksandra-malinowska

It looks like the unit test fakes were improved in 1.26 so that changing one pod's state no longer requires sorting all pods. The new tests added in this PR use more pods and are simply too slow with the inefficient 1.25 fakes (they exceed the per-package limit of 180s).

I can either reduce the number of pods in test cases, or backport some of the test-only improvements from #112744. Either way, it won't be a clean cherry-pick, but the difference will be confined to unit tests.

@Verolop @soltysh if that's OK, I'd prefer to improve the fakes, so that we have the same test cases everywhere.
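
To make the cost described above concrete, here is a minimal Go sketch contrasting a fake that re-sorts every pod before changing one of them with a fake that looks the pod up directly. This is an assumption-laden illustration, not the code from #112744. `fakePod`, `sortingFake`, `indexedFake`, and `newFakes` are hypothetical names.

```go
package main

import "sort"

// fakePod is a hypothetical, stripped-down pod used only by this sketch.
type fakePod struct {
	Ordinal int
	Ready   bool
}

// sortingFake re-sorts all pods on every state change.
type sortingFake struct {
	pods []*fakePod
}

// setReady sorts every pod by ordinal (O(n log n)), then searches for
// the requested ordinal and marks that single pod ready.
func (f *sortingFake) setReady(ordinal int) bool {
	sort.Slice(f.pods, func(i, j int) bool {
		return f.pods[i].Ordinal < f.pods[j].Ordinal
	})
	i := sort.Search(len(f.pods), func(i int) bool {
		return f.pods[i].Ordinal >= ordinal
	})
	if i == len(f.pods) || f.pods[i].Ordinal != ordinal {
		return false
	}
	f.pods[i].Ready = true
	return true
}

// indexedFake keeps pods keyed by ordinal, so one update is a map lookup.
type indexedFake struct {
	byOrdinal map[int]*fakePod
}

// setReady marks a single pod ready without touching any other pod.
func (f *indexedFake) setReady(ordinal int) bool {
	p, ok := f.byOrdinal[ordinal]
	if !ok {
		return false
	}
	p.Ready = true
	return true
}

// newFakes builds both fakes with n pods, inserted in descending order
// so that the sorting fake has real work to do on its first call.
func newFakes(n int) (*sortingFake, *indexedFake) {
	s := &sortingFake{}
	x := &indexedFake{byOrdinal: make(map[int]*fakePod, n)}
	for i := n - 1; i >= 0; i-- {
		s.pods = append(s.pods, &fakePod{Ordinal: i})
		x.byOrdinal[i] = &fakePod{Ordinal: i}
	}
	return s, x
}
```

When a test case flips the state of every pod in turn, the sorting variant performs a full sort per pod, which is the kind of overhead that adds up as test cases use more pods.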

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 12, 2023
@aleksandra-malinowska

Pushed a minimal subset of unit test changes from #112744 as a new commit. Let me know if you'd prefer me to squash it so every commit compiles.

@Verolop

Verolop commented Jul 12, 2023

@aleksandra-malinowska it's ok if it's not a 'clean' cherry-pick
@soltysh can you please add your lgtm again?
thanks!

@aleksandra-malinowska aleksandra-malinowska force-pushed the automated-cherry-pick-of-#117865-upstream-release-1.25 branch from 4b52b0a to 35df064 July 12, 2023 14:45
@soltysh soltysh left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 12, 2023
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aleksandra-malinowska, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

LGTM label has been added.

Git tree hash: 22d03541b6dc5b71ad983ef8eafddc69850424b2

@aleksandra-malinowska

aleksandra-malinowska commented Jul 12, 2023

Unrelated flake likely due to #107414 (fix wasn't cherry-picked to 1.25):

=== FAIL: pkg/controller/volume/attachdetach/reconciler Test_Run_OneVolumeDetachFailNodeWithReadWriteOnce (1.13s)
W0712 15:04:41.258066   55960 mutation_detector.go:53] Mutation detector is enabled, this will result in memory leakage.
I0712 15:04:41.258485   55960 operation_generator.go:398] AttachVolume.Attach succeeded for volume "volume-name" (UniqueName: "fake-plugin/volume-name") from node "fail-detach-node" 
I0712 15:04:41.258753   55960 reconciler.go:346] "attacherDetacher.AttachVolume started" volume={VolumeToAttach:{MultiAttachErrorReported:false VolumeName:fake-plugin/volume-name VolumeSpec:0xc000013878 NodeName:fail-detach-node ScheduledPods:[&Pod{ObjectMeta:{pod-uid1  pod-uid1  pod-uid1  0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},Spec:PodSpec{Volumes:[]Volume{},Containers:[]Container{},RestartPolicy:,TerminationGracePeriodSeconds:nil,ActiveDeadlineSeconds:nil,DNSPolicy:,NodeSelector:map[string]string{},ServiceAccountName:,DeprecatedServiceAccount:,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:nil,ImagePullSecrets:[]LocalObjectReference{},Hostname:,Subdomain:,Affinity:nil,SchedulerName:,InitContainers:[]Container{},AutomountServiceAccountToken:nil,Tolerations:[]Toleration{},HostAliases:[]HostAlias{},PriorityClassName:,Priority:nil,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[]PodReadinessGate{},RuntimeClassName:nil,EnableServiceLinks:nil,PreemptionPolicy:nil,Overhead:ResourceList{},TopologySpreadConstraints:[]TopologySpreadConstraint{},EphemeralContainers:[]EphemeralContainer{},SetHostnameAsFQDN:nil,OS:nil,HostUsers:nil,},Status:PodStatus{Phase:,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},},}]}}
E0712 15:04:42.285539   55960 reconciler.go:219] failed to get health of node fail-detach-node: node "fail-detach-node" not found
E0712 15:04:42.297667   55960 reconciler.go:219] failed to get health of node fail-detach-node: node "fail-detach-node" not found
E0712 15:04:42.308167   55960 reconciler.go:219] failed to get health of node fail-detach-node: node "fail-detach-node" not found
E0712 15:04:42.321998   55960 reconciler.go:219] failed to get health of node fail-detach-node: node "fail-detach-node" not found
E0712 15:04:42.332364   55960 reconciler.go:219] failed to get health of node fail-detach-node: node "fail-detach-node" not found
    reconciler_test.go:1605: Check volume <fake-plugin/volume-name> is reported as attached to node <fail-detach-node>, got false, expected true

/retest

@aleksandra-malinowska

/retest-required

@Verolop Verolop left a comment

/lgtm

@Verolop Verolop added the cherry-pick-approved label Jul 12, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/cherry-pick-not-approved label Jul 12, 2023
@k8s-ci-robot k8s-ci-robot merged commit eb65549 into kubernetes:release-1.25 Jul 12, 2023
15 checks passed
Labels
approved, cherry-pick-approved, cncf-cla: yes, kind/bug, lgtm, priority/important-soon, release-note, sig/apps, size/XL, triage/accepted