DaemonSet e2e: Update image and rolling upgrade test timeout #72738

Merged
merged 1 commit into from
Feb 27, 2019


alexbrand
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR removes an unnecessary sleep from the serve-hostname binary that is used in conformance tests.

The reason for this change is that this 60-second sleep, which the binary performs after receiving SIGTERM, has made the DaemonSet rolling upgrade conformance test much slower.

Specifically, it prevents the serve-hostname pods from terminating gracefully, so each pod takes at least 30 seconds to terminate (the default 30-second termination grace period).

In large clusters, the conformance test fails because rolling all pods of the DaemonSet takes longer than the timeout configured in the conformance test (5 minutes).

Which issue(s) this PR fixes:

Fixes #71666

Special notes for your reviewer:
I have updated all the references to this image that I could find using code search. Not sure if I missed any.

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added the release-note-none, kind/bug, size/S, cncf-cla: yes, needs-sig, and needs-priority labels Jan 9, 2019
@k8s-ci-robot k8s-ci-robot added the sig/testing label and removed the needs-sig label Jan 9, 2019
@alexbrand
Contributor Author

/sig testing

@ixdy
Member

ixdy commented Jan 9, 2019

/assign @freehan

```diff
@@ -8,4 +8,4 @@ metadata:
 spec:
   containers:
   - name: kubernetes-serve-hostname
-    image: gcr.io/kubernetes-e2e-test-images/serve-hostname-amd64:1.1
+    image: gcr.io/kubernetes-e2e-test-images/serve-hostname-amd64:1.3
```
Member


Putting the reference updates into a separate commit might make this easier to cherrypick to release branches.

Contributor Author


Thanks @ixdy. Will move the updates to a separate commit.

@freehan
Contributor

freehan commented Jan 10, 2019

Can you add a flag for this instead?

@timothysc timothysc added the priority/important-soon label Jan 10, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-priority label Jan 10, 2019
@alexbrand
Contributor Author

@freehan I think adding a flag is a possibility, but another potential route is just updating the DaemonSet test instead. I tagged you in the original issue as there is some more discussion happening there. Would love your insight and thoughts.

@k8s-ci-robot k8s-ci-robot added the sig/apps and sig/storage labels Jan 22, 2019
@k8s-ci-robot k8s-ci-robot added the size/XS label and removed the size/S label Jan 22, 2019
@alexbrand alexbrand changed the title from "Remove sleep from serve-hostname binary used in conformance tests" to "Set termination grace period to 1 second in DaemonSet update e2e test" Jan 22, 2019
@alexbrand
Contributor Author

Updated the title to reflect the new approach for this fix. This is a more direct fix to the problem being observed in #71666.

Instead of changing the serve-hostname image, which other tests depend on, we fix the DaemonSet test directly.

@BenTheElder, @ixdy, @freehan PTAL

@ixdy
Member

ixdy commented Jan 30, 2019

/approve

I'll let @freehan apply lgtm.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexbrand, ixdy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jan 30, 2019
@janetkuo janetkuo self-assigned this Feb 20, 2019
```go
// Set the termination grace period to 1 second. This is necessary because the
// serve-hostname binary sleeps for 1 minute after receiving SIGTERM.
terminationPeriod := int64(1)
ds.Spec.Template.Spec.TerminationGracePeriodSeconds = &terminationPeriod
```
Member


This fix should work, but DaemonSet test shouldn't be tied to the serve-hostname image. We can simply change the image we use in this test:

`image := framework.ServeHostnameImage`

to something else (we can pick one from this list), such as nginx.

We can also increase the timeout based on number of nodes, as this affects the number of replicas a DaemonSet creates.

Contributor Author


That is a good point. Should we take that route instead? Will take a look at the list of images.

Member


Yes, thanks for looking into this!

Contributor Author


Should we update the image used throughout all DaemonSet tests, or just the one used during the DaemonSet update test?

Member

@janetkuo janetkuo Feb 21, 2019


Updating the image used throughout all DaemonSet tests (i.e. changing `image := framework.ServeHostnameImage`) is fine. Which image we pick doesn't matter as long as it can be pulled and doesn't take too long to start/terminate.

We can scale the update timeout based on # of nodes too, maybe something like 5 mins + 1 min * # of nodes.

@janetkuo janetkuo added the kind/failing-test label and removed the sig/storage label Feb 21, 2019
Use Nginx as the DaemonSet image instead of the ServeHostname image.
This was changed because the ServeHostname image sleeps after receiving
SIGTERM, which makes it incompatible with the DaemonSet Rolling Upgrade
e2e test.

In addition, make the DaemonSet Rolling Upgrade e2e test timeout a
function of the number of nodes that make up the cluster. This is
required because the more nodes there are, the longer a rolling upgrade
takes to complete.

Signed-off-by: Alexander Brand <alexbrand09@gmail.com>
@k8s-ci-robot k8s-ci-robot added the size/S label and removed the size/XS label Feb 22, 2019
@alexbrand
Contributor Author

Updated the image, and made the timeout a function of the number of nodes in the cluster. PTAL.

@janetkuo janetkuo changed the title from "Set termination grace period to 1 second in DaemonSet update e2e test" to "DaemonSet e2e: Update image and rolling upgrade test timeout" Feb 26, 2019
@janetkuo
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 26, 2019
@BenTheElder
Member

thank you!

@janetkuo
Member

We need to cherrypick this to older releases as well.

@janetkuo
Member

janetkuo commented Mar 1, 2019

Sent cherrypick PRs:
#74820
#74822

k8s-ci-robot added a commit that referenced this pull request Mar 6, 2019
…38-upstream-release-1.12

Automated cherry pick of #72738: DaemonSet e2e: Update image and rolling upgrade test timeout
k8s-ci-robot added a commit that referenced this pull request Mar 18, 2019
…38-upstream-release-1.13

Automated cherry pick of #72738: DaemonSet e2e: Update image and rolling upgrade test timeout
Labels
approved, cncf-cla: yes, kind/bug, kind/failing-test, lgtm, priority/important-soon, release-note-none, sig/apps, sig/testing, size/S
Development

Successfully merging this pull request may close these issues.

Conformance test for DaemonSet RollingUpdate to rigid on timeout
7 participants