add StatefulSet MinReadySeconds e2e test #104078

Merged: 1 commit, Sep 15, 2021

Conversation

atiratree
Member

What type of PR is this?

What this PR does / why we need it:

2nd part of #103073 adding additional e2e test:

MinReadySeconds should update availableReplicas accordingly when enabled

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[KEP]: https://github.com/kubernetes/enhancements/pull/2607
[Other doc]: https://github.com/kubernetes/website/pull/27683

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 2, 2021
@k8s-ci-robot
Contributor

Hi @atiratree. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/apps Categorizes an issue or PR as relevant to SIG Apps. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 2, 2021
Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Thank you for the PR @atiratree. Please look at my comments.

setHTTPProbe(ss)
ss, err := c.AppsV1().StatefulSets(ns).Create(context.TODO(), ss, metav1.CreateOptions{})
framework.ExpectNoError(err)
e2estatefulset.WaitForStatusAvailableReplicas(c, ss, 0)
Contributor

Why this 0? Do you want to show that the replicas won't be available immediately? If yes, this could be racy if reaching this statement would take more than 10 seconds.

Member Author

I did not want to put a longer time so the test can complete quickly.

There are two ways to solve this.

  • make MinReadySeconds a bit higher, e.g. 30s (the test takes longer)
  • remove the 0 check

Member Author

@ravisantoshgudimetla and yes I want to check the replicas are not available immediately

Contributor

Kubernetes as a whole is eventually consistent, so it is better to make the value here longer, like the 30 seconds you mentioned, then wait 5 or 10 seconds and make sure that AvailableReplicas is not updated. Even this may be racy sometimes. I'd prefer having negative tests in integration or unit tests, using a fake clock.
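
The fake-clock idea in the comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the StatefulSet controller's code: `isAvailable`, `epoch`, and `at` are hypothetical names, and the assumed rule is that a replica counts as available once it has been ready for at least `minReadySeconds`. Passing `now` in explicitly lets a unit test assert the negative case without sleeping.

```go
package main

import (
	"fmt"
	"time"
)

// epoch is an arbitrary fixed instant used as the fake "ready since" time.
var epoch = time.Date(2021, time.August, 19, 12, 0, 0, 0, time.UTC)

// at returns the fake clock reading the given number of seconds after epoch.
// Hypothetical helper, used only to keep the example free of real waiting.
func at(seconds int) time.Time {
	return epoch.Add(time.Duration(seconds) * time.Second)
}

// isAvailable is a hypothetical stand-in for the availability rule: the
// replica is available once it has been ready for at least minReadySeconds.
// The current time is a parameter so tests can supply any instant.
func isAvailable(readySince time.Time, minReadySeconds int32, now time.Time) bool {
	minReady := time.Duration(minReadySeconds) * time.Second
	return !now.Before(readySince.Add(minReady))
}

func main() {
	// With minReadySeconds = 30, the replica becomes available at t+30s.
	for _, s := range []int{0, 29, 30} {
		fmt.Printf("t+%ds available=%t\n", s, isAvailable(epoch, 30, at(s)))
	}
}
```

Because no real time passes, a negative check like this cannot race with the controller, which is the property the e2e variant has trouble guaranteeing.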

Member Author

@atiratree atiratree Aug 19, 2021

For now I put 30 there with the additional wait (thanks for the suggestion). I expect it not to be that racy, since we are just waiting for the controller to pick up the resource and set 0 (so the waiting time should be about our 10s poll).

Contributor

We can revisit the test if it starts failing (flaking).

Contributor

I have similar concerns here as Ravi: this might flake when the StatefulSet controller immediately creates the first replica and you never observe the 0 status condition.

Member Author

This is not about the creation of a replica, but about its availability. Since we set MinReadySeconds to 30, we should observe 0 for the first 30 seconds. WaitForStatusAvailableReplicas waits only 0-10s to observe the 0, which prevents the race.

@@ -1154,6 +1154,42 @@ var _ = SIGDescribe("StatefulSet", func() {
framework.ExpectNoError(err)
e2estatefulset.WaitForStatusAvailableReplicas(c, ss, 1)
})

ginkgo.It("MinReadySeconds should update availableReplicas accordingly when enabled [Feature:StatefulSetMinReadySeconds] [alpha]", func() {
Contributor

AvailableReplicas should depend on the MinReadySeconds value specified?

Member Author

updated

test/e2e/apps/statefulset.go

ginkgo.By("check Available Replicas are shown in status")
out, err := framework.RunKubectl(ns, "get", "statefulset", ss.Name, "-o=yaml")
framework.ExpectNoError(err)
Contributor

Please add a comment.

Member Author

isn't the .By enough? What else do you have in mind?

Contributor

Ya, By is good enough, I usually add a comment to separate code for different tests.

Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Just some nits, but looks good to me.

@soltysh can you review as well?

@k8s-ci-robot k8s-ci-robot added the area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework label Aug 19, 2021
@pacoxu
Member

pacoxu commented Sep 3, 2021

/kind cleanup

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 3, 2021
Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

@soltysh - Can I get a review of this PR now that #104045 is getting merged?

@ravisantoshgudimetla
Contributor

/retest

Contributor

@soltysh soltysh left a comment

A few nits, but mostly looks good.
/ok-to-test
/priority backlog
/traige accepted
/approve
/assign @ravisantoshgudimetla
for final lgtm

@@ -1154,6 +1154,50 @@ var _ = SIGDescribe("StatefulSet", func() {
framework.ExpectNoError(err)
e2estatefulset.WaitForStatusAvailableReplicas(c, ss, 1)
})

ginkgo.It("AvailableReplicas should get updated accordingly when MinReadySeconds is enabled [Feature:StatefulSetMinReadySeconds] [alpha]", func() {
Contributor

Nit: this is beta now that we merged #104045

Member Author

thanks, fixed

framework.ExpectNoError(err)
e2estatefulset.WaitForStatusAvailableReplicas(c, ss, 0)
// let's check that the availableReplicas have still not updated
time.Sleep(5 * time.Second)
Contributor

I'd prefer not to use time.Sleep but rather add those extra 5s in the next WaitFor..., if needed.

Contributor

Although, looking at WaitForStatusAvailableReplicas, it waits up to 10 minutes, so you can just drop that bit.

Member Author

As mentioned in #104078 (comment), we might observe 0 AvailableReplicas immediately, and this ensures that we wait at least 5 seconds to check that it still stays 0.

The next WaitForStatusAvailableReplicas checks for availability. This one checks for unavailability.
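
The "observe 0, then confirm it stays 0" check described in this reply can also be written as a polling loop instead of a bare sleep. The following is a minimal sketch under stated assumptions, not code from the PR or the e2e framework: `holdsFor`, `flipAfter`, `holdWindow`, and `pollInterval` are hypothetical names, the getter stands in for reading the StatefulSet's available replica count, and the millisecond values are scaled-down placeholders for the 5s and 10s figures discussed in the thread.

```go
package main

import (
	"fmt"
	"time"
)

// Scaled-down, illustrative timings so the example runs quickly.
const (
	holdWindow   = 100 * time.Millisecond
	pollInterval = 10 * time.Millisecond
)

// holdsFor polls get every interval and reports whether it returned want on
// every poll until the hold window elapsed. It returns false as soon as a
// different value is observed. Hypothetical helper for illustration only.
func holdsFor(get func() int32, want int32, hold, interval time.Duration) bool {
	deadline := time.Now().Add(hold)
	for {
		if get() != want {
			return false
		}
		if !time.Now().Before(deadline) {
			return true
		}
		time.Sleep(interval)
	}
}

// flipAfter returns a fake status getter that reports 0 for the first n
// calls and 1 afterwards, simulating a replica that becomes available early.
func flipAfter(n int) func() int32 {
	calls := 0
	return func() int32 {
		calls++
		if calls > n {
			return 1
		}
		return 0
	}
}

func main() {
	// The value stays at 0 for the whole window.
	fmt.Println(holdsFor(func() int32 { return 0 }, 0, holdWindow, pollInterval))
	// The value flips to 1 on the third poll, inside the window.
	fmt.Println(holdsFor(flipAfter(2), 0, holdWindow, pollInterval))
}
```

Compared with a single sleep followed by one read, polling across the window also catches a value that changes and changes back, at the cost of a few extra reads.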

Contributor

Oh, I see what you mean, I missed the initial MinReadySeconds being set, I assumed it's not.

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Sep 15, 2021
@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 15, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2021
Contributor

@soltysh soltysh left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 15, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: atiratree, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@soltysh
Contributor

soltysh commented Sep 15, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 15, 2021
Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Verification failed; other than that, LGTM.

@ravisantoshgudimetla
Contributor

/retest

@k8s-ci-robot k8s-ci-robot merged commit 07a4ae1 into kubernetes:master Sep 15, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 15, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

5 participants