-
Notifications
You must be signed in to change notification settings - Fork 38.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix StatefulSet e2e flake #42367
Fix StatefulSet e2e flake #42367
Conversation
@k8s-bot kops aws e2e test this |
@@ -217,6 +217,7 @@ var _ = framework.KubeDescribe("StatefulSet", func() { | |||
|
|||
By("Before scale up finished setting 2nd pod to be not ready by breaking readiness probe") | |||
sst.BreakProbe(ss, testProbe) | |||
sst.WaitForStatus(ss, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be WaitForStatus(ss, 2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status.Replicas should always be 1 here regardless of the pod being ready or not. It seems that this is not the case actually and we treat this field as ReadyReplicas for ReplicaSets/Deployments (DaemonSets have a different name but there is still a separation between created and ready in the status). So the test seems fine but API-wise I think there is an unnecessary incosistency between the workload apis.
/lgtm Opened #42410 to discuss the api issue. |
Labeling as a test bug and adding the 1.6 milestone |
@@ -355,7 +355,8 @@ func (s *StatefulSetTester) SetHealthy(ss *apps.StatefulSet) { | |||
} | |||
} | |||
|
|||
func (s *StatefulSetTester) waitForStatus(ss *apps.StatefulSet, expectedReplicas int32) { | |||
// WaitForStatus waits for the ss.Status.Replicas to be equal to expectedReplicas | |||
func (s *StatefulSetTester) WaitForStatus(ss *apps.StatefulSet, expectedReplicas int32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this to also check that ss.Status.ObservedGeneration >= ss.Generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that observedGeneration is not working for StatefulSets and sent a PR: #42429
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kargakis Using this code
if ss.Status.ObservedGeneration == nil {
Logf("ss observed generation is nil")
} else {
Logf("ss observed generation %d",ss.Status.ObservedGeneration)
}
Logf("ss generation %d",ss.Generation)
if ssGet.Status.ObservedGeneration == nil {
Logf("ssGet observed generation is nil")
} else {
Logf("ssGet observed generation %d",ssGet.Status.ObservedGeneration)
}
Logf("ssGet generation %d",ssGet.Generation)
The observed generation of both ss and ssGet are always nil. However the generation appears to increment consistently.
what about the following
if ssGet.Generation < ss.Generation {
return false, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just tagged it with a manual approval since the hack changes were minimal and it should be merged when you get back online:) Once you update this helper with the new check feel free to apply lgtm
@kubernetes/test-infra-maintainers the bot shouldn't suggest (ping) others if the PR is approved. |
@enisoc Can we set the milestone to this release (its break fix)? |
@apelisse @grodrigues3 is #42367 (comment) on your list? |
@ixdy @Kargakis Yes, I think it's similar to kubernetes/test-infra#2076, we are working on this. |
Pods prior mutating the StatefulSet object to trigger sclaing. Add ObervedVersion check
/lgtm |
@kow3ns: you can't LGTM your own PR. In response to this comment:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED The following people have approved this PR: foxish, kargakis, kow3ns Needs approval from an approver in each of these OWNERS Files:
We suggest the following people: |
@k8s-bot cvm gke e2e test this |
@k8s-bot gci gke e2e test this |
@k8s-bot cvm gke e2e test this |
@kow3ns: The following test(s) failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310) |
What this PR does / why we need it:
Fixes StatefulSet e2e flake by ensuring that the StatefulSet controller has observed the unreadiness of Pods prior to attempting to exercise scale functionality.
Which issue this PR fixes
fixes #41889