Non-graceful node shutdown #2268

Open
xing-yang opened this issue Jan 14, 2021 · 50 comments · Fixed by #3320
Assignees
Labels
lead-opted-in Denotes that an issue has been opted in to a release sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. stage/beta Denotes an issue tracking an enhancement targeted for Beta status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team
Milestone

Comments

@xing-yang
Contributor

xing-yang commented Jan 14, 2021

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 14, 2021
@xing-yang
Contributor Author

xing-yang commented Jan 14, 2021

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 14, 2021
@xing-yang
Contributor Author

xing-yang commented Jan 14, 2021

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 14, 2021
@annajung annajung added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Jan 15, 2021
@annajung annajung added this to the v1.21 milestone Jan 15, 2021
@xing-yang xing-yang self-assigned this Jan 17, 2021
@annajung
Member

annajung commented Jan 25, 2021

Hi @xing-yang, 1.21 enhancements lead here.
I see that you've opted this enhancement into 1.21, but I also see that it is tagged with participation from SIG Node. Is that accurate? If so, is there work that SIG Node must deliver in 1.21 as well?

@yastij
Member

yastij commented Jan 25, 2021

Hi @annajung - we're still in the process of seeing which changes are needed for sig-node

@jrsapi

jrsapi commented Feb 5, 2021

Greetings @xing-yang ,
This is Joseph, v1.21 enhancement shadow, following up. For the enhancement to be included in the 1.21 milestone, it must meet the following criteria:

The KEP must be merged in an implementable state
The KEP must have test plans
The KEP must have graduation criteria
The KEP must have a production readiness review

Starting v1.21, all KEPs must include a production readiness review. Please make sure to take a look at the instructions and follow all steps.

Thank you!

@jrsapi

jrsapi commented Feb 8, 2021

Greetings @xing-yang,

Enhancements Freeze is 2 days away, Feb 9th EOD PST

The Enhancements team is aware that a KEP update is currently in progress (PR #1116). Please make sure to work on the PRR questionnaires and requirements and get them merged before the freeze. For PRR-related questions, or to boost the PR for PRR review, please reach out in Slack #prod-readiness.

Any enhancements that do not complete the following requirements by the freeze will require an exception.

[IN PROGRESS] The KEP must be merged in an implementable state
[IN PROGRESS] The KEP must have test plans
[IN PROGRESS] The KEP must have graduation criteria
[IN PROGRESS] The KEP must have a production readiness review

@xing-yang
Contributor Author

xing-yang commented Feb 8, 2021

Hi @jrsapi,
Thanks for the reminder! We still need more discussion to resolve some design issues, so this will not make it into 1.21.

@xing-yang xing-yang removed this from the v1.21 milestone Feb 8, 2021
@jrsapi jrsapi added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Feb 8, 2021
@fejta-bot

fejta-bot commented May 9, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2021
@YuikoTakada

YuikoTakada commented May 24, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 24, 2021
@YuikoTakada

YuikoTakada commented May 24, 2021

Thank you for this issue.
It would be better to update this issue's description to reflect the following:

We are trying to get KEP merged as "Provisional" and continue with prototyping in 1.22. We want to do more testing before targeting Alpha as this is a complicated problem.

In 1.23, we'll target Alpha.

Thanks!

@k8s-triage-robot

k8s-triage-robot commented Aug 22, 2021

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2021
@YuikoTakada

YuikoTakada commented Aug 26, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 26, 2021
@xing-yang
Contributor Author

xing-yang commented Aug 30, 2021

/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 30, 2021
@salaxander salaxander removed the tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team label Aug 31, 2021
@Priyankasaggu11929 Priyankasaggu11929 added the tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team label Jun 24, 2022
@alculquicondor
Member

alculquicondor commented Jul 13, 2022

Could you add to the Alternatives section why we didn't change the behavior of node.kubernetes.io/unreachable to trigger pod GC? Then the behavior would be fully automated and users could still add the taint themselves.

@xing-yang xing-yang added the lead-opted-in Denotes that an issue has been opted in to a release label Sep 7, 2022
@xing-yang
Contributor Author

xing-yang commented Sep 7, 2022

/milestone v1.26

@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Sep 7, 2022
@msau42 msau42 linked a pull request Sep 20, 2022 that will close this issue
@rhockenbury rhockenbury added tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team and removed tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team labels Sep 20, 2022
@marosset
Contributor

marosset commented Sep 20, 2022

Hello @xing-yang 👋, 1.26 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PDT on Thursday 6th October 2022.

This enhancement is targeting stage beta for 1.26 (correct me if otherwise).

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: 1.26
  • KEP readme has an updated, detailed test plan section filled out
  • KEP readme has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would just need to update the following:

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@rhockenbury

rhockenbury commented Oct 3, 2022

/reopen

@k8s-ci-robot
Contributor

k8s-ci-robot commented Oct 3, 2022

@rhockenbury: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Oct 3, 2022
@marosset
Contributor

marosset commented Oct 3, 2022

With #3320 merged, this enhancement is now tracked for the v1.26 release. Thanks!

@YuikoTakada

YuikoTakada commented Oct 14, 2022

Is anyone working on e2e tests now?

yosshy added a commit to yosshy/kube-fencing that referenced this issue Oct 14, 2022
K8s v1.24 has a new well-known taint "node.kubernetes.io/out-of-service"
that enables automatic deletion of PV-attached pods on failed nodes.
This patch makes fencing-controller add it to a fenced node right after
the fencing job finishes successfully.

See the pages below for more detail:
kubernetes/enhancements#2268
kubernetes/kubernetes#108486
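
For readers unfamiliar with the taint named in the commit message above, here is a minimal sketch of the node patch such a controller might send once fencing has succeeded. It is not the kube-fencing implementation. The helper names are hypothetical, and the value "nodeshutdown" and the effect "NoExecute" are assumptions taken from the commonly documented kubectl example (`kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute`).

```python
import json

# Well-known taint key quoted in the commit message above.
OUT_OF_SERVICE_KEY = "node.kubernetes.io/out-of-service"


def with_out_of_service_taint(taints, value="nodeshutdown", effect="NoExecute"):
    """Return a new taint list containing the out-of-service taint exactly once.

    `value` and `effect` are assumed defaults, not something this thread specifies.
    """
    # Drop any earlier copy of the taint so repeated calls stay idempotent.
    kept = [t for t in taints if t.get("key") != OUT_OF_SERVICE_KEY]
    return kept + [{"key": OUT_OF_SERVICE_KEY, "value": value, "effect": effect}]


def taint_patch(existing_taints):
    """Build a JSON merge-patch body for a Node's spec.taints (hypothetical helper).

    A JSON merge patch replaces the whole list, so the node's existing
    taints are carried over alongside the new one.
    """
    return {"spec": {"taints": with_out_of_service_taint(existing_taints)}}


if __name__ == "__main__":
    existing = [{"key": "node.kubernetes.io/unreachable", "effect": "NoExecute"}]
    print(json.dumps(taint_patch(existing), indent=2))
```

The sketch only builds the request body. Sending it to the API server, and confirming that the node is really powered off before tainting it, are left to the caller.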
@mickeyboxell

mickeyboxell commented Oct 20, 2022

@xing-yang Who is the right point of contact to create a PR for the k/website docs?

@sonasingh46

sonasingh46 commented Oct 20, 2022

Hey @mickeyboxell . See if @sftim can provide some info on this.

@YuikoTakada

YuikoTakada commented Oct 24, 2022

I created a PR for the feature gate in k/website. Is that OK, or should I close it?
kubernetes/website#37374

@rhockenbury

rhockenbury commented Oct 29, 2022

Hi @xing-yang 👋,

Checking in as we approach 1.26 code freeze at 17:00 PDT on Tuesday 8th November 2022.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

Please let me know what other PRs in k/k I should be tracking for this KEP.

As always, we are here to help should questions come up. Thanks!

@xing-yang
Contributor Author

xing-yang commented Nov 1, 2022

Is anyone working on e2e tests now?

@YuikoTakada e2e test is merged: kubernetes/kubernetes#111380

@xing-yang
Contributor Author

xing-yang commented Nov 1, 2022

@xing-yang Who is the right point of contact to create a PR for the k/website docs?

@mickeyboxell Sorry, I missed the ping earlier. Here are the doc PRs:
Placeholder doc PR: kubernetes/website#37374 Thanks @YuikoTakada!
Placeholder blog PR: kubernetes/website#37583

@YuikoTakada

YuikoTakada commented Nov 2, 2022

As @rhockenbury said, this issue's description needs to be updated to add the link to kubernetes/kubernetes#111380.

And also, it would be better to update enhancement target:

Enhancement target (which target equals to which milestone):

  • Alpha release target (x.y): 1.24
  • Beta release target (x.y): 1.26 (currently listed as 1.25)
  • Stable release target (x.y): 1.27 (currently listed as 1.26)

@xing-yang
Contributor Author

xing-yang commented Nov 4, 2022

All code PRs for moving the feature to beta are merged:

@rhockenbury

rhockenbury commented Nov 9, 2022

With those PRs merged, I have this marked as tracked for code freeze.

yosshy added a commit to yosshy/kube-fencing that referenced this issue Nov 26, 2022
K8s v1.24 has a new well-known taint "node.kubernetes.io/out-of-service"
that enables automatic deletion of PV-attached pods on failed nodes.
This patch makes fencing-controller add it to a fenced node right after
the fencing job finishes successfully.

See the pages below for more detail:
kubernetes/enhancements#2268
kubernetes/kubernetes#108486
Projects
Status: Graduating