Fix StatefulSet e2e flake #42367

Merged 1 commit on Mar 4, 2017

Conversation

kow3ns
Member

@kow3ns kow3ns commented Mar 1, 2017

What this PR does / why we need it:
Fixes StatefulSet e2e flake by ensuring that the StatefulSet controller has observed the unreadiness of Pods prior to attempting to exercise scale functionality.
Which issue this PR fixes:
fixes #41889

NONE

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 1, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Mar 1, 2017
@spxtr spxtr assigned 0xmichalis and unassigned spxtr Mar 1, 2017
@k8s-reviewable

This change is Reviewable

@kow3ns
Member Author

kow3ns commented Mar 2, 2017

@k8s-bot kops aws e2e test this

@@ -217,6 +217,7 @@ var _ = framework.KubeDescribe("StatefulSet", func() {

By("Before scale up finished setting 2nd pod to be not ready by breaking readiness probe")
sst.BreakProbe(ss, testProbe)
+sst.WaitForStatus(ss, 0)
Contributor

Shouldn't this be WaitForStatus(ss, 2)?

Contributor

status.Replicas should always be 1 here regardless of whether the pod is ready. That does not actually seem to be the case: StatefulSets appear to treat this field the way ReplicaSets/Deployments treat ReadyReplicas (DaemonSets use different names, but their status still separates created from ready). So the test seems fine, but API-wise I think there is an unnecessary inconsistency between the workload APIs.
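To illustrate the created-versus-ready distinction raised here, a minimal sketch follows. The `pod` and `workloadStatus` types and the `summarize` function are hypothetical stand-ins, not Kubernetes API types; they only mirror the split that ReplicaSet status exposes as Replicas and ReadyReplicas.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for a Pod owned by a workload controller.
type pod struct {
	Name  string
	Ready bool
}

// workloadStatus is an illustrative type that separates created pods
// from ready pods, as ReplicaSet status does.
type workloadStatus struct {
	Replicas      int32 // pods that exist, ready or not
	ReadyReplicas int32 // pods that exist and pass their readiness probe
}

// summarize counts every pod as a replica and only ready pods as ready replicas.
func summarize(pods []pod) workloadStatus {
	var st workloadStatus
	for _, p := range pods {
		st.Replicas++
		if p.Ready {
			st.ReadyReplicas++
		}
	}
	return st
}

func main() {
	// Two pods exist; the second has a broken readiness probe.
	pods := []pod{{Name: "ss-0", Ready: true}, {Name: "ss-1", Ready: false}}
	st := summarize(pods)
	fmt.Printf("replicas=%d readyReplicas=%d\n", st.Replicas, st.ReadyReplicas)
}
```

Under this split, a test that waits on the replica count alone would not notice a pod becoming unready; it would have to wait on the ready count instead.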

@0xmichalis
Contributor

/lgtm
/approve

Opened #42410 to discuss the api issue.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2017
@0xmichalis 0xmichalis added this to the v1.6 milestone Mar 2, 2017
@0xmichalis 0xmichalis added kind/flake Categorizes issue or PR as related to a flaky test. kind/bug Categorizes issue or PR as related to a bug. labels Mar 2, 2017
@0xmichalis
Contributor

Labeling as a test bug and adding the 1.6 milestone

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2017
@0xmichalis 0xmichalis removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2017
@@ -355,7 +355,8 @@ func (s *StatefulSetTester) SetHealthy(ss *apps.StatefulSet) {
}
}

-func (s *StatefulSetTester) waitForStatus(ss *apps.StatefulSet, expectedReplicas int32) {
+// WaitForStatus waits for the ss.Status.Replicas to be equal to expectedReplicas
+func (s *StatefulSetTester) WaitForStatus(ss *apps.StatefulSet, expectedReplicas int32) {
Contributor

Update this to also check that ss.Status.ObservedGeneration >= ss.Generation.
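One way the suggested check could look, as a minimal sketch. The `statefulSet` struct, `statusSettled`, and `waitForStatus` below are hypothetical stand-ins rather than the e2e framework's actual helper. The sketch assumes `ObservedGeneration` is a pointer that stays nil until the controller reports it.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// statefulSet is a hypothetical, stripped-down stand-in for apps.StatefulSet.
type statefulSet struct {
	Generation         int64  // bumped when the spec changes
	ObservedGeneration *int64 // assumed nil until the controller reports it
	StatusReplicas     int32
}

// statusSettled reports whether the controller has observed the latest
// generation and the status replica count matches expectedReplicas.
func statusSettled(ss statefulSet, expectedReplicas int32) bool {
	if ss.ObservedGeneration == nil || *ss.ObservedGeneration < ss.Generation {
		return false
	}
	return ss.StatusReplicas == expectedReplicas
}

// waitForStatus polls get until the status has settled or timeout elapses.
func waitForStatus(get func() (statefulSet, error), expectedReplicas int32, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ss, err := get()
		if err != nil {
			return err
		}
		if statusSettled(ss, expectedReplicas) {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for StatefulSet status")
		}
		time.Sleep(interval)
	}
}

func main() {
	observed := int64(2)
	polls := 0
	// Simulated reads: the controller catches up on the third poll.
	get := func() (statefulSet, error) {
		polls++
		ss := statefulSet{Generation: 2, StatusReplicas: 0}
		if polls >= 3 {
			ss.ObservedGeneration = &observed
		}
		return ss, nil
	}
	err := waitForStatus(get, 0, 10*time.Millisecond, time.Second)
	fmt.Println("polls:", polls, "err:", err)
}
```

Checking the generation first means a stale status, written before the controller saw the latest spec, is never mistaken for the expected one.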

Contributor

I just realized that observedGeneration is not working for StatefulSets and sent a PR: #42429

Member Author

@Kargakis Using this code

if ss.Status.ObservedGeneration == nil {
        Logf("ss observed generation is nil")
} else {
        // Dereference the pointer so the value, not the address, is logged.
        Logf("ss observed generation %d", *ss.Status.ObservedGeneration)
}
Logf("ss generation %d", ss.Generation)
if ssGet.Status.ObservedGeneration == nil {
        Logf("ssGet observed generation is nil")
} else {
        // Dereference the pointer so the value, not the address, is logged.
        Logf("ssGet observed generation %d", *ssGet.Status.ObservedGeneration)
}
Logf("ssGet generation %d", ssGet.Generation)

The observed generation of both ss and ssGet is always nil. However, the generation appears to increment consistently.

What about the following?

if ssGet.Generation < ss.Generation {
        return false, nil
}

Member Author

@Kargakis I just saw the above. Do you want to wait until #42429 merges before submitting this?

Contributor

Yes, I just tagged it with a manual approval since the hack changes were minimal, and it should be merged when you get back online :) Once you update this helper with the new check, feel free to apply lgtm.

@0xmichalis
Contributor

@kubernetes/test-infra-maintainers the bot shouldn't suggest (ping) others if the PR is approved.

@kow3ns
Member Author

kow3ns commented Mar 2, 2017

@enisoc Can we set the milestone to this release (it's a break fix)?

@enisoc
Member

enisoc commented Mar 2, 2017

@kow3ns It's already in the milestone, so it should be good once it gets lgtm, which it seems @Kargakis removed due to an additional change requested.

@ixdy
Member

ixdy commented Mar 2, 2017

@apelisse @grodrigues3 is #42367 (comment) on your list?

@apelisse
Member

apelisse commented Mar 2, 2017

@ixdy @Kargakis Yes, I think it's similar to kubernetes/test-infra#2076, we are working on this.

Pods prior to mutating the StatefulSet object to trigger scaling.

Add ObservedVersion check
@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 3, 2017
@kow3ns
Member Author

kow3ns commented Mar 3, 2017

/lgtm

@k8s-ci-robot
Contributor

@kow3ns: you can't LGTM your own PR.

In response to this comment:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@foxish foxish self-assigned this Mar 3, 2017
@foxish
Contributor

foxish commented Mar 3, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

The following people have approved this PR: foxish, kargakis, kow3ns

Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @pwittrock
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@kow3ns
Member Author

kow3ns commented Mar 3, 2017

@k8s-bot cvm gke e2e test this

@kow3ns
Member Author

kow3ns commented Mar 3, 2017

@k8s-bot gci gke e2e test this

@kow3ns
Member Author

kow3ns commented Mar 3, 2017

@k8s-bot cvm gke e2e test this

@k8s-ci-robot
Contributor

@kow3ns: The following test(s) failed:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| Jenkins GCI GKE smoke e2e | 08f95af | link | @k8s-bot gci gke e2e test this |
| Jenkins GKE smoke e2e | 08f95af | link | @k8s-bot cvm gke e2e test this |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)

Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/bug - Categorizes issue or PR as related to a bug.
kind/flake - Categorizes issue or PR as related to a flaky test.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
release-note-none - Denotes a PR that doesn't merit a release note.
size/S - Denotes a PR that changes 10-29 lines, ignoring generated files.
10 participants