Fix nodeShutdownReason for node shutdown e2e #104540

Merged
merged 1 commit into from
Oct 20, 2021

Conversation

wzshiming
Member

@wzshiming wzshiming commented Aug 24, 2021

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

Which issue(s) this PR fixes:

#102840 changed the Status.Reason that is set on pods during node shutdown, but the e2e test was not updated to match.
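
To illustrate the kind of mismatch described above, here is a minimal, self-contained sketch of a shutdown check that compares a pod's phase and reason against expected constants. It is not the actual e2e code: the `PodStatus` type and the `isPodShutdown` helper are hypothetical stand-ins defined only for this example, `"Terminated"` is taken from the job log quoted later in this thread, and `"Failed"` as the expected phase is an assumption.

```go
package main

import "fmt"

// PodStatus is a hypothetical, trimmed-down stand-in for the pod status
// fields a shutdown check would look at. It is NOT the real k8s.io/api type.
type PodStatus struct {
	Phase  string
	Reason string
}

const (
	// Assumption: the phase a pod is expected to reach after node shutdown.
	expectedShutdownPhase = "Failed"
	// Taken from the job log in this thread; treated here as the reason
	// the test should expect.
	expectedShutdownReason = "Terminated"
)

// isPodShutdown reports whether a status matches both expected values.
// If the reason constant is stale, this returns false even for pods that
// were shut down correctly.
func isPodShutdown(s PodStatus) bool {
	return s.Phase == expectedShutdownPhase && s.Reason == expectedShutdownReason
}

func main() {
	statuses := []PodStatus{
		{Phase: "Failed", Reason: "Terminated"},
		{Phase: "Running", Reason: "Terminated"},
	}
	for _, s := range statuses {
		fmt.Printf("phase=%q reason=%q shutdown=%t\n", s.Phase, s.Reason, isPodShutdown(s))
	}
}
```

With a check shaped like this, a test that still compares against an older reason string would keep failing until the constant is updated, which is the kind of drift this PR addresses.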

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 24, 2021
@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 24, 2021
@wzshiming
Member Author

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Aug 25, 2021

@wzshiming: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-node-kubelet-serial 86acf3b link /test pull-kubernetes-node-kubelet-serial

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@SergeyKanzhelev SergeyKanzhelev moved this from Triage to PRs - Needs Reviewer in SIG Node CI/Test Board Sep 1, 2021
@SergeyKanzhelev
Member

/kind failing-test

@k8s-ci-robot k8s-ci-robot added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 1, 2021
@SergeyKanzhelev
Member

/assign @bobbypage

@pacoxu pacoxu added this to Triage in SIG Node PR Triage Sep 3, 2021
@pacoxu pacoxu removed this from Triage in SIG Node PR Triage Sep 3, 2021
@ehashman
Member

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Sep 27, 2021
@k8s-ci-robot k8s-ci-robot removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 27, 2021
@ehashman
Member

The "GracefulNodeShutdown [Serial] [NodeAlphaFeature:GracefulNodeShutdown]" tests only run on their own CI tab and don't have a corresponding presubmit job. Can we just drop the NodeAlphaFeature selector and have them run as part of the serial job? The feature is beta now, so the feature flag should already be set, no? https://testgrid.k8s.io/sig-node-kubelet#kubelet-serial-gce-e2e-graceful-node-shutdown

I can't find any matching failures for this error message: https://storage.googleapis.com/k8s-triage/index.html?text=Expecting%20non-critcal%20pod%20to%20be%20shutdown%2C%20but%20it%27s%20not%20currently.

The test failures for this job all occur while waiting on /configz, so I think the jobs may not have a valid config to enable graceful node shutdown: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial-graceful-node-shutdown/1442540619628023808

/hold
/cc @bobbypage @rphillips @wgahnagl

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 27, 2021
@ehashman ehashman moved this from PRs - Needs Reviewer to PRs Waiting on Author in SIG Node CI/Test Board Sep 27, 2021
@bobbypage
Member

bobbypage commented Sep 28, 2021

@ehashman I agree we should eventually remove this separate job and move the tests into serial. However, I think we should first get this job fully stable and working, and only then migrate it into the existing serial tests.

I think we still need this change as described in #104540 (comment)

The failure while waiting on /configz occurs because the job relies on DynamicKubeletConfig, which was turned off by default in #102966.

I will send a separate PR to turn it back on for this job for now, so we can get it back to green.

As for the change in this PR:
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 28, 2021
@bobbypage
Member

PR to enable DynamicKubeletConfig and fix the errors while waiting on /configz: kubernetes/test-infra#23791

@bobbypage
Member

kubernetes/test-infra#23791 was merged to fix the /configz errors.

@ehashman can we remove the hold on this so we can start fixing the test? Thanks!

@bobbypage
Member

/assign @ehashman

Member

@ehashman ehashman left a comment

/lgtm
/approve

STEP: Verifying that non-critical pods are shutdown
Oct 19 22:21:05.208: INFO: Expecting non-critcal pod to be shutdown, but it's not currently. Pod: "period-120", Pod Status Phase: "Running", Pod Status Reason: "Terminated"

from

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial-graceful-node-shutdown/1450585087165861888

@ehashman
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ehashman, wzshiming

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2021
@k8s-ci-robot k8s-ci-robot merged commit 7128409 into kubernetes:master Oct 20, 2021
SIG Node CI/Test Board automation moved this from PRs Waiting on Author to Done Oct 20, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 20, 2021
@wzshiming wzshiming deleted the fix/node-shutdown-e2e branch October 20, 2021 02:02
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
5 participants