
In kubernetes V1.27.1, Image not rolling back to older version for pod with ordinal number 0, in case of upgrade failure. #119684

Closed
ankushhifi007 opened this issue Jul 31, 2023 · 29 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.

Comments

@ankushhifi007

I am using a StatefulSet for my application with 2 replicas, and I am updating the pods with a partitioned rolling update via Helm, using the following configuration:

```yaml
updateStrategy:
  rollingUpdate:
    partition: 1
  type: RollingUpdate
```

The upgrade proceeds from pod 1 down to pod 0.
In one scenario the pod-0 upgrade failed, and I tried to roll back using helm rollback one revision at a time, but the image did not update on any pod.
```
REVISION  UPDATED                   STATUS      CHART      APP VERSION  DESCRIPTION
1         Fri Jul 28 12:25:31 2023  superseded  1.7.2+65   app 5.0      Install complete
2         Fri Jul 28 13:14:43 2023  superseded  1.8.0-203  app 5.0      Upgrade complete
3         Fri Jul 28 13:36:35 2023  superseded  1.8.0-203  app 5.0      Upgrade complete
4         Fri Jul 28 14:12:20 2023  superseded  1.7.2+65   app 5.0      Rollback to 2
```

The same procedure works up to v1.26.1.

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 31, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ankushhifi007
Author

/SIG Apps

@ankushhifi007 ankushhifi007 changed the title In kubernetes V1.27.1, Image not rolling back to older version for pod with ordinal number1 In kubernetes V1.27.1, Image not rolling back to older version for pod with ordinal number 0, in case of upgrade failure. Jul 31, 2023
@ankushhifi007
Author

kind/bug

@liangyuanpeng
Contributor

Some steps to reproduce it would be great.

@ankushhifi007
Author

sts.zip (sample YAML attached)

1 - Deploy the STS using the attached YAML, with the replica count set to 2 and updateStrategy.rollingUpdate.partition set to 1.

2 - Edit the STS and update the image to 1.15. At this stage, pod-1 is updated to image 1.15.

3 - Now delete pod-0 and check its image tag: pod-0 comes back up with the new image, which is not the expected behavior (a minimal manifest approximating this setup is sketched below).
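For readers without the attachment, a minimal StatefulSet approximating the setup above; the name, labels, service name, and base image tag are assumptions, not taken from the attached sts.zip:

```yaml
# Approximate repro manifest (not the attached sts.zip); only the replica count,
# the partition, and the use of an nginx image follow the steps above.
# Assumes a headless Service named "web" already exists.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 2
  selector:
    matchLabels:
      app: web
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 1
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.14   # step 2 changes this to nginx:1.15
```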

@aojea
Member

aojea commented Aug 1, 2023

/sig apps
/kind bug

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. kind/bug Categorizes issue or PR as related to a bug. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 1, 2023
@ankushhifi007
Author

Any workaround for this issue?

@aojea
Member

aojea commented Aug 1, 2023

> 3 - Now delete pod-0 and check its image tag: pod-0 comes back up with the new image, which is not the expected behavior.

What do you mean by "pod-0 came up with the new image"? Pod-0 has to have image 1.15; that is the one you have updated to.

@ankushhifi007
Author

Yes, it has the new image 1.15.
But I only updated the image for pod-1, using partition 1 in the StatefulSet.

@liangyuanpeng
Contributor

liangyuanpeng commented Aug 2, 2023

I'm interested in this, let me check it out.

/assign

@ankushhifi007
Author

Hi liangyuanpeng,
Any update? Do you have any workaround for this issue?

@aojea
Member

aojea commented Aug 3, 2023

Oh, I missed that part. It does indeed sound like a bug, and we should have an e2e test verifying that behavior; it seems a simple e2e test to add. @liangyuanpeng, please add an e2e test reproducing the issue if you are going to work on this.

/cc @soltysh

https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#rolling-updates

> Partitioned rolling updates
>
> The RollingUpdate update strategy can be partitioned, by specifying a .spec.updateStrategy.rollingUpdate.partition. If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet's .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated, and, even if they are deleted, they will be recreated at the previous version.
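Applied to the setup in this issue (a sketch, assuming replicas: 2 and partition: 1 as in the report above), the documented behavior works out as follows:

```yaml
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 1   # ordinals >= 1 follow the updated .spec.template
# Expected outcome after updating .spec.template:
#   pod-1 (ordinal 1 >= partition 1): recreated at the new revision
#   pod-0 (ordinal 0 <  partition 1): kept at the old revision, and if deleted,
#                                     recreated at the old revision
```

The report above is that in v1.27.1 a deleted pod-0 instead comes back with the new revision's image.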

@ankushhifi007
Author

Hi @liangyuanpeng,
I need one piece of information about your code changes, with respect to the actual problem in my application upgrade.
I have attached an nginx Helm chart modeled on my application, along with the steps to reproduce the issue.
Can you please check and share why it works up to v1.26.1, and whether it will be solved by your code change?

nginx.tar.gz
fallback issue step.txt

@liangyuanpeng
Contributor

@ankushhifi007
In my test, this problem exists in 1.27.x. I packaged a patched version; it may be worth a try:

ghcr.io/liangyuanpeng/kube-controller-manager-amd64:v1.27-patch

I will try to test again with your files.

@ankushhifi007
Author

@liangyuanpeng
Any further findings with my steps? And is there any possible workaround for me?

@ankushhifi007
Author

@liangyuanpeng
Any further findings with my steps?

@vkatabat

@aleksandra-malinowska Could #119096 be a potential cause for this issue, which is seen only in 1.27.1 but not in 1.26?

@ankushhifi007
Author

ankushhifi007 commented Aug 18, 2023

@liangyuanpeng,
I have tested your patch. The pod image change issue is fixed during pod restart.

But now the upgrade procedure is breaking.
I tested my scenario, where the StatefulSet has 2 replicas.
I started the upgrade from the pod with ordinal number 1; that works fine.
But the image is not updated when upgrading the pod with ordinal number zero.

The same upgrade procedure works with v1.27.0, but after applying the patch it does not.

@ankushhifi007
Author

@liangyuanpeng,
One additional point: this behaviour occurs only with helm upgrade. When upgrading by editing the StatefulSet directly, it works fine.
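For comparison, a minimal sketch of the direct StatefulSet edit that reportedly works; the container name and image tag are illustrative, the relevant fields being the partition and the template image:

```yaml
# Final rollout step when editing the StatefulSet directly (illustrative values):
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0           # allow pod-0 to be updated as well
  template:
    spec:
      containers:
        - name: nginx
          image: nginx:1.15  # new image; pod-0 should now be recreated with it
```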

@lowang-bh
Member

Maybe it is by design: other replicas keep the old version until the first upgraded one finishes upgrading.

@ankushhifi007
Author

> Maybe it is by design: other replicas keep the old version until the first upgraded one finishes upgrading.

Yes, you are right that the other replicas keep the older version until the first one is upgraded. But when I upgrade the other pod after the first pod has finished, it should come up with the new image; with @liangyuanpeng's patch, however, a helm upgrade with partition 0 does not update pod-0 to the new image.

@lowang-bh
Member

> with partition 0 is not updating new image in pod-0

I think you should check the actual partition value in the YAML. Kubernetes updates pods with ordinals from replicas-1 down to the partition when a partition is set (so with replicas=2 and partition=1, only pod-1 is processed; pod-0 is only updated once the partition is lowered to 0).

```go
// we compute the minimum ordinal of the target sequence for a destructive update based on the strategy.
updateMin := 0
if set.Spec.UpdateStrategy.RollingUpdate != nil {
    updateMin = int(*set.Spec.UpdateStrategy.RollingUpdate.Partition)
}
// we terminate the Pod with the largest ordinal that does not match the update revision.
for target := len(replicas) - 1; target >= updateMin; target-- {
    // delete the Pod if it is not already terminating and does not match the update revision.
    if getPodRevision(replicas[target]) != updateRevision.Name && !isTerminating(replicas[target]) {
        logger.V(2).Info("Pod of StatefulSet is terminating for update",
            "statefulSet", klog.KObj(set), "pod", klog.KObj(replicas[target]))
        if err := ssc.podControl.DeleteStatefulPod(set, replicas[target]); err != nil {
            if !errors.IsNotFound(err) {
                return &status, err
            }
        }
        status.CurrentReplicas--
        return &status, err
    }
    // wait for unhealthy Pods on update
    if !isHealthy(replicas[target]) {
        logger.V(4).Info("StatefulSet is waiting for Pod to update",
            "statefulSet", klog.KObj(set), "pod", klog.KObj(replicas[target]))
        return &status, nil
    }
}
return &status, nil
```

@ankushhifi007
Author

> with partition 0 is not updating new image in pod-0
>
> I think you should check the actual partition value in the YAML. Kubernetes updates pods with ordinals from replicas-1 down to the partition when a partition is set.
>
> [controller snippet quoted above]

I am checking and setting partition values as per my upgrade requirements, but my point is that with @liangyuanpeng's patch
(ghcr.io/liangyuanpeng/kube-controller-manager-amd64:v1.27-patch)

a Helm upgrade is no longer working as Helm is designed to; that is the issue, apart from the originally reported one.

@ankushhifi007
Author

@liangyuanpeng
Did you test the scenario with my Helm chart?

@aleksandra-malinowska
Contributor

> @aleksandra-malinowska Could #119096 be a potential cause for this issue, which is seen only in 1.27.1 but not in 1.26?

#119096 was cherry-picked to 1.27.4, it's not in 1.27.1

@liangyuanpeng
Contributor

liangyuanpeng commented Nov 6, 2023

@ankushhifi007 I believe that it's fixed by #120731

/unassign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 4, 2024
@adilGhaffarDev
Contributor

closing this because it is fixed in #120731 and backported to 1.27 and 1.28
/close

@k8s-ci-robot
Contributor

@adilGhaffarDev: Closing this issue.

In response to this:

closing this because it is fixed in #120731 and backported to 1.27 and 1.28
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
