-
Notifications
You must be signed in to change notification settings - Fork 38.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DaemonSet e2e: Update image and rolling upgrade test timeout #72738
Conversation
/sig testing |
/assign @freehan |
@@ -8,4 +8,4 @@ metadata: | |||
spec: | |||
containers: | |||
- name: kubernetes-serve-hostname | |||
image: gcr.io/kubernetes-e2e-test-images/serve-hostname-amd64:1.1 | |||
image: gcr.io/kubernetes-e2e-test-images/serve-hostname-amd64:1.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
putting the reference updates into a separate commit might make this easier to cherrypick to release branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ixdy. Will move the updates to a separate commit.
bc0c247
to
42dc6e6
Compare
Can you add a flag for this instead? |
@freehan I think adding a flag is a possibility, but another potential route is just updating the DaemonSet test instead. I tagged you in the original issue as there is some more discussion happening there. Would love your insight and thoughts. |
42dc6e6
to
4646917
Compare
4646917
to
24e0008
Compare
Updated the title to reflect the new approach for this fix. This is a more direct fix to the problem being observed in #71666. Instead of changing the @BenTheElder, @ixdy, @freehan PTAL |
/approve I'll let @freehan apply lgtm. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alexbrand, ixdy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
test/e2e/apps/daemon_set.go
Outdated
// Set the termination grace period to 1 second. This is necessary because the | ||
// serve-hostname binary sleeps for 1 minute after receiving SIGTERM. | ||
terminationPeriod := int64(1) | ||
ds.Spec.Template.Spec.TerminationGracePeriodSeconds = &terminationPeriod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix should work, but DaemonSet test shouldn't be tied to the serve-hostname image. We can simply change the image we use in this test:
image := framework.ServeHostnameImage
to something else (can pick from this list), such as nginx.
We can also increase the timeout based on number of nodes, as this affects the number of replicas a DaemonSet creates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. Should we take that route instead? Will take a look at the list of images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks for looking into this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update the image used throughout all DaemonSet tests, or just the one used during the DaemonSet update test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating the image used throughout all DaemonSet tests (i.e. change image := framework.ServeHostnameImage
) is fine. Which image we pick doesn't matter as long as it can be pulled and doesn't take too long to start/terminate.
We can scale update timeout based on # of nodes too, maybe something like 5 mins + 1 min * # of nodes
.
Use Nginx as the DaemonSet image instead of the ServeHostname image. This was changed because the ServeHostname has a sleep after terminating which makes it incompatible with the DaemonSet Rolling Upgrade e2e test. In addition, make the DaemonSet Rolling Upgrade e2e test timeout a function of the number of nodes that make up the cluster. This is required because the more nodes there are, the longer the time it will take to complete a rolling upgrade. Signed-off-by: Alexander Brand <alexbrand09@gmail.com>
24e0008
to
d971597
Compare
Updated the image, and made the timeout a function of the number of nodes in the cluster. PTAL. |
/lgtm |
thank you! |
We need to cherrypick this to older releases as well. |
…38-upstream-release-1.12 Automated cherry pick of #72738: DaemonSet e2e: Update image and rolling upgrade test timeout
…38-upstream-release-1.13 Automated cherry pick of #72738: DaemonSet e2e: Update image and rolling upgrade test timeout
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR removes an unnecessary sleep from the serve-hostname binary that is used in conformance tests.
The reason for this change is that the introduction of this 60 second sleep has negatively impacted the DaemonSet rolling upgrade conformance test.
Specifically, it prevents the
serve-hostname
pods from terminating gracefully, and thus each pod takes at least 30 seconds to terminate (due to the default 30 second termination grace period).In large clusters, the conformance test fails because rolling all pods of the DaemonSet takes longer than timeout configured in the conformance test (5 minutes).
Which issue(s) this PR fixes:
Fixes #71666
Special notes for your reviewer:
I have updated all the references to this image that I could find using code search. Not sure if I missed any.
Does this PR introduce a user-facing change?: