
Update coredns rolling update strategy #10748

Merged

Conversation

tu1h
Member

@tu1h tu1h commented Dec 27, 2023

What type of PR is this?
/kind bug

What this PR does / why we need it:
In a two-node cluster (1 master, 1 worker), upgrading the cluster (e.g. from 1.26.0 to 1.26.7) eventually leaves a coredns pod stuck in the Pending state.

$ kubectl get -n kube-system pod | grep coredns
coredns-5ffbf89ff4-wpsl8                   0/1     Pending   0             35m
coredns-d77484b69-lwnc8                    1/1     Running   0             34m
coredns-f8cb45497-ptwn7                    1/1     Running   1 (35m ago)   36m
$ kubectl get -n kube-system rs | grep coredns
coredns-5ffbf89ff4                   1         1         0       36m
coredns-d77484b69                    1         1         1       53m
coredns-f8cb45497                    1         1         1       37m

This can be solved by increasing maxUnavailable; setting it to 1 is reasonable.
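
For reference, a minimal sketch of the resulting rolling update strategy in the coredns Deployment; the maxSurge value of 10% is taken from the describe output later in this thread, and the exact kubespray template may differ:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # previously 0, which blocked the rollout on small clusters
      maxSurge: 10%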

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Set the `maxUnavailable` of the coredns rolling update strategy to 1

Signed-off-by: tu1h <lihai.tu@daocloud.io>
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 27, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 27, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 28, 2023
@yankay yankay added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Dec 28, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tu1h, yankay

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@VannTen
Contributor

VannTen commented Dec 29, 2023

No objection to aligning ourselves with other coredns distributions, but I do have a question:
Could you clarify what's happening with those pending pods (maybe share the describe output)? Is that related to the dns-autoscaler?

Also, please complete the release note.

@tu1h
Member Author

tu1h commented Jan 2, 2024

I think it's caused by maxUnavailable being equal to 0. There are only two nodes in this cluster, and the upgrade cordons each node in turn, which evicts all the pods on the current node. Since coredns has a pod anti-affinity rule and a maxUnavailable of 0 for rolling updates, the new pod cannot be scheduled during the upgrade.
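
To illustrate, here is a simplified sketch of the two settings that conflict; the label selector and topology key are the usual coredns values and are assumptions about the exact kubespray manifest:

spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0      # old value: no existing replica may be taken down first
      maxSurge: 10%
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: kube-dns
            topologyKey: kubernetes.io/hostname   # at most one coredns pod per node

With two nodes and two running replicas, the anti-affinity leaves no node for the surge pod, and maxUnavailable: 0 forbids removing an old replica first, so the new pod stays Pending.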

Some details observed during the upgrade are described below.

Before upgrade:

$ kubectl logs -n kube-system dns-autoscaler-56dd9d6549-7ksd5
I0102 07:27:34.035819       1 autoscaler.go:49] Scaling Namespace: kube-system, Target: deployment/coredns
I0102 07:27:34.292225       1 autoscaler_server.go:157] ConfigMap not found: configmaps "dns-autoscaler" not found, will create one with default params
I0102 07:27:34.309545       1 k8sclient.go:148] Created ConfigMap dns-autoscaler in namespace kube-system
I0102 07:27:34.309609       1 plugin.go:50] Set control mode to linear
I0102 07:27:34.309641       1 linear_controller.go:60] ConfigMap version change (old:  new: 976) - rebuilding params
I0102 07:27:34.309658       1 linear_controller.go:61] Params from apiserver:
{"coresPerReplica":256,"min":2,"nodesPerReplica":16,"preventSinglePointFailure":true}
I0102 07:27:34.319510       1 k8sclient.go:273] Cluster status: SchedulableNodes[2], SchedulableCores[20]
I0102 07:27:34.319585       1 k8sclient.go:274] Replicas are not as expected : updating replicas from 1 to 2

After:

$ kubectl logs -n kube-system dns-autoscaler-56dd9d6549-ntc4w -f
I0102 07:49:31.121668       1 autoscaler.go:49] Scaling Namespace: kube-system, Target: deployment/coredns
I0102 07:49:31.377089       1 plugin.go:50] Set control mode to linear
I0102 07:49:31.377185       1 linear_controller.go:60] ConfigMap version change (old:  new: 976) - rebuilding params
I0102 07:49:31.377206       1 linear_controller.go:61] Params from apiserver:
{"coresPerReplica":256,"min":2,"nodesPerReplica":16,"preventSinglePointFailure":true}
$ kubectl describe deployments.apps coredns
..................
Replicas:               2 desired | 2 updated | 3 total | 2 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  0 max unavailable, 10% max surge
..................
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  coredns-f8cb45497 (1/1 replicas created)
NewReplicaSet:   coredns-5ffbf89ff4 (2/2 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  32m    deployment-controller  Scaled up replica set coredns-d77484b69 to 1
  Normal  ScalingReplicaSet  32m    deployment-controller  Scaled up replica set coredns-d77484b69 to 2 from 1
  Normal  ScalingReplicaSet  12m    deployment-controller  Scaled up replica set coredns-f8cb45497 to 1
  Normal  ScalingReplicaSet  12m    deployment-controller  Scaled down replica set coredns-d77484b69 to 1 from 2
  Normal  ScalingReplicaSet  12m    deployment-controller  Scaled up replica set coredns-f8cb45497 to 2 from 1
  Normal  ScalingReplicaSet  11m    deployment-controller  Scaled down replica set coredns-f8cb45497 to 1 from 2
  Normal  ScalingReplicaSet  11m    deployment-controller  Scaled up replica set coredns-5ffbf89ff4 to 1 from 0
  Normal  ScalingReplicaSet  5m17s  deployment-controller  Scaled down replica set coredns-d77484b69 to 0 from 1
  Normal  ScalingReplicaSet  5m17s  deployment-controller  Scaled up replica set coredns-5ffbf89ff4 to 2 from 1
$ kubectl get -n kube-system rs | grep coredns
coredns-5ffbf89ff4                   2         2         1       12m
coredns-d77484b69                    0         0         0       33m
coredns-f8cb45497                    1         1         1       13m
$ kubectl -n kube-system describe rs coredns-5ffbf89ff4
.............
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 3
                deployment.kubernetes.io/revision: 3
Controlled By:  Deployment/coredns
Replicas:       2 current / 2 desired
Pods Status:    1 Running / 1 Waiting / 0 Succeeded / 0 Failed
.............
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  13m    replicaset-controller  Created pod: coredns-5ffbf89ff4-jjvg2
  Normal  SuccessfulCreate  6m32s  replicaset-controller  Created pod: coredns-5ffbf89ff4-q94sd
$ kubectl -n kube-system describe rs coredns-d77484b69
.............
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 3
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/coredns
Replicas:       0 current / 0 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
.............
Events:
  Type     Reason            Age                From                   Message
  ----     ------            ----               ----                   -------
  Warning  FailedCreate      34m (x7 over 34m)  replicaset-controller  Error creating: pods "coredns-d77484b69-" is forbidden: error looking up service account kube-system/coredns: serviceaccount "coredns" not found
  Normal   SuccessfulCreate  34m                replicaset-controller  Created pod: coredns-d77484b69-vjxmd
  Normal   SuccessfulCreate  34m                replicaset-controller  Created pod: coredns-d77484b69-qzl2x
  Normal   SuccessfulCreate  20m                replicaset-controller  Created pod: coredns-d77484b69-ngnn6
  Normal   SuccessfulDelete  14m                replicaset-controller  Deleted pod: coredns-d77484b69-ngnn6
  Normal   SuccessfulCreate  12m                replicaset-controller  Created pod: coredns-d77484b69-vfbtp
  Normal   SuccessfulDelete  7m4s               replicaset-controller  Deleted pod: coredns-d77484b69-vfbtp
$ kubectl -n kube-system describe rs coredns-f8cb45497
.............
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 3
                deployment.kubernetes.io/revision: 2
Controlled By:  Deployment/coredns
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
.............
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  14m   replicaset-controller  Created pod: coredns-f8cb45497-rjjbv
  Normal  SuccessfulCreate  14m   replicaset-controller  Created pod: coredns-f8cb45497-tgnsk
  Normal  SuccessfulDelete  13m   replicaset-controller  Deleted pod: coredns-f8cb45497-rjjbv

@VannTen
Contributor

VannTen commented Jan 8, 2024

Thanks for the explanation!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit ddf5c6e into kubernetes-sigs:master Jan 8, 2024
65 checks passed
ledroide pushed a commit to ledroide/kubespray that referenced this pull request Jan 9, 2024
Signed-off-by: tu1h <lihai.tu@daocloud.io>
@yankay yankay mentioned this pull request Jan 12, 2024
pedro-peter pushed a commit to pedro-peter/kubespray that referenced this pull request May 8, 2024
Signed-off-by: tu1h <lihai.tu@daocloud.io>