New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing CurrentReplicas and CurrentRevision in completeRollingUpdate #120731
Conversation
Hi @adilGhaffarDev. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
Thank you Adil for picking this up! I believe this is also going to fix #120700. /lgtm |
LGTM label has been added. Git tree hash: 7e345dbea0711a4b27991d7686e63118c743f564
|
status.UpdatedReplicas == status.Replicas && | ||
status.ReadyReplicas == status.Replicas { | ||
status.UpdatedReplicas == *set.Spec.Replicas && | ||
status.ReadyReplicas == *set.Spec.Replicas { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'm wondering if we might also need a check for status.Replicas == *set.Spec.Replicas
and/or status.CurrentReplicas == 0
?
I don't think we can get here now if we have condemned pods that haven't been successfully removed, but given the complexity of this controller's logic and the strong guarantee we need from this check (to avoid unexpected rollout progress), I think it might make it more resilient to check all conditions we care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I will add this check too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added that check kindly check
Have some lint failed of e2e test and i can help to fix it at tomorrow. |
e956c4c
to
77b0900
Compare
551a170
to
00c21ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @soltysh
LGTM label has been added. Git tree hash: e7527d29c11bc9adaaba378a22cec6af16a2752a
|
Excellent, I will add it to the 1.29 milestone. /milestone v1.29 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still issue of the mismatching statuses, as I explained in slack, but I'll also post here for visibility:
While testing this PR I found STS status isn't also fully correct, I started logging out full status right after each wait command and I'm seeing this:
// CurrentRevision is the only revision, and the CurrentReplicas matches Replicas, ie. 3
STEP: 1: Status v1.StatefulSetStatus{ObservedGeneration:1, Replicas:3, ReadyReplicas:3, CurrentReplicas:3, UpdatedReplicas:3, CurrentRevision:"ss2-7b6c9599d5", UpdateRevision:"ss2-7b6c9599d5", AvailableReplicas:3}
// after first update, with Partition=1, we need to update only 2 pods (1 and 2), but CurrentReplicas (ie. old) should be 0, UpdatedReplicas (ie. new) should be 2
STEP: 2: Status v1.StatefulSetStatus{ObservedGeneration:2, Replicas:3, ReadyReplicas:3, CurrentReplicas:2, UpdatedReplicas:0, CurrentRevision:"ss2-7b6c9599d5", UpdateRevision:"ss2-5459d8585b", AvailableReplicas:3}
// after the pod removal, we need to recreate the pod for CurrentRevision, so CurrentReplicas=1, and UpdatedReplicas=2
STEP: 3: Status v1.StatefulSetStatus{ObservedGeneration:2, Replicas:3, ReadyReplicas:3, CurrentReplicas:1, UpdatedReplicas:2, CurrentRevision:"ss2-7b6c9599d5", UpdateRevision:"ss2-5459d8585b", AvailableReplicas:3}
if one checks the descriptions we have in our API we clearly are doing a bad job reflecting the actual changes happening to Current(Replicas|Revision) and Updated(Replicas|Revisions) such that they don't reflect the actual state of world, since the division between Current and Updated should be strictly tied to Partitions (as described in the original proposal).
Still this PR improves the current situation and fixes parts of the problem we're seeing, and adds sufficient tests allowing us to ensure that .spec.partition
is working as expected.
/hold cancel
/lgtm
/approve
/triage accepted |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adilGhaffarDev, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Also, based on the discussion @mimowo @adilGhaffarDev and I had on slack, we're going to backport this to only 1.28 and 1.27 due to limitations. More details can be found in this slack thread. Although, I'd like to see us fixing that status I described above before attempting the picks, since there's still time before the next patch releases. |
…31-upstream-release-1.28 Automated cherry pick of #120731: Fixing CurrentReplicas and CurrentRevision in
…31-upstream-release-1.27 Automated cherry pick of #120731: Fixing CurrentReplicas and CurrentRevision in
What type of PR is this?
/kind bug
What this PR does / why we need it:
In
completeRollingUpdate
function we should not comparestatus.UpdatedReplicas
andstatus.ReadyReplicas
withstatus.Replicas
becausestatus.Replicas
is not showing total desired replicas, it shows only those replicas that are created, but here we want to compare with total desired replicas(set.Spec.Replicas) and only whenset.Spec.Replicas == status.ReadyReplicas
andset.Spec.Replicas == status.UpdatedReplicas
we should changestatus.CurrentReplicas
tostatus.UpdatedReplicas
. This was causing updated image on pods with ordinal number lower than the rolling partition number when they were being deleted.Which issue(s) this PR fixes:
Fixes #119685 #119684
Special notes for your reviewer:
This is taking e2e test from this PR #119759 , lot of discussion regarding the fix happened there. Just for reference.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Thanks @liangyuanpeng for adding e2e tests for this. Integration is added by @mimowo , in addition, @mimowo also helped in fixing e2e test.
Co-authored-by: Lan Liang gcslyp@gmail.com. Michał Woźniak @mimowo