move "kubectl drain" into the server #25625

Closed
davidopp opened this issue May 15, 2016 · 73 comments
Labels
  • area/node-lifecycle: Issues or PRs related to Node lifecycle.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@davidopp
Member

random note: @roberthbailey had suggested it might be useful to have a "dry run" mode where you just ask which is the best node to drain, or best N of some set, but don't drain it. not sure how you express that using a REST API though. also can't remember what the use case was (it might have been for autoscaling scale-down, so you know which node is best to remove?)

as an aside, we need to consolidate all the issues related to this: #7351, #6080, #6079, #3885, ...
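A rough sketch of what the "dry run" idea above could look like: rank the candidate nodes and return the best N without draining anything. The `NodeInfo` type, its fields, and the ranking heuristic (fewest pods to reschedule first) are assumptions made for illustration; nothing in this thread specifies how "best" would be defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical summary of a node; not a Kubernetes API type."""
    name: str
    evictable_pods: int      # pods that would have to be rescheduled
    blocked_pods: int = 0    # pods an assumed policy would refuse to evict


def rank_drain_candidates(nodes, n=1):
    """Return the names of the n cheapest nodes to drain, without draining.

    Assumed heuristic: skip nodes with blocked pods, then prefer nodes
    with the fewest pods to reschedule; ties break on name for stability.
    """
    drainable = [node for node in nodes if node.blocked_pods == 0]
    ordered = sorted(drainable, key=lambda node: (node.evictable_pods, node.name))
    return [node.name for node in ordered[:n]]


if __name__ == "__main__":
    cluster = [
        NodeInfo("node-a", evictable_pods=12),
        NodeInfo("node-b", evictable_pods=3),
        NodeInfo("node-c", evictable_pods=1, blocked_pods=1),
        NodeInfo("node-d", evictable_pods=5),
    ]
    print(rank_drain_candidates(cluster, n=2))  # ['node-b', 'node-d']
```

Because the function only reads its input, a caller such as an autoscaler could ask for candidates first and decide separately whether to drain them.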

@davidopp added the priority/backlog and team/control-plane labels May 15, 2016
@roberthbailey
Contributor

I think that applies to both autoscaling down and node upgrades.

/cc @mwielgus @ihmccreery

@resouer
Contributor

resouer commented Jun 19, 2016

@roberthbailey I'm wondering whether this is really a common use case that is worth doing. Could you explain in more detail, for example, how it applies to autoscaling down?

@roberthbailey
Contributor

When removing a node from a cluster (either because an autoscaler decides to free up space or a user requests the release of resources), we should drain the node before deleting it from the cluster. If we want to build this into automation (e.g. autoscaling), then we don't want to rely on the drain command only existing in the kubectl client that is intended to be used by a human.
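For automation that cannot shell out to kubectl, the drain described above comes down to two kinds of API requests: mark the node unschedulable, then create an Eviction for each pod on it. The sketch below only builds the request bodies and paths; it assumes the `policy/v1` Eviction API of current Kubernetes releases (older clusters used `policy/v1beta1`), and the `plan_drain` helper is a hypothetical name. Listing the pods on the node, sending the requests, and retrying refused evictions are left out.

```python
import json


def cordon_patch():
    """Body for PATCH /api/v1/nodes/{name}: mark the node unschedulable."""
    return {"spec": {"unschedulable": True}}


def eviction_body(namespace, pod_name):
    """Body for POST /api/v1/namespaces/{ns}/pods/{pod}/eviction."""
    return {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod_name, "namespace": namespace},
    }


def plan_drain(node_name, pods):
    """Return the ordered (method, path, body) requests for draining a node.

    `pods` is a list of (namespace, name) pairs already known to run on
    the node; discovering them and sending the requests are left out.
    """
    requests = [("PATCH", f"/api/v1/nodes/{node_name}", cordon_patch())]
    for namespace, name in pods:
        path = f"/api/v1/namespaces/{namespace}/pods/{name}/eviction"
        requests.append(("POST", path, eviction_body(namespace, name)))
    return requests


if __name__ == "__main__":
    for method, path, body in plan_drain("node-1", [("default", "web-0")]):
        print(method, path, json.dumps(body))
```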

@davidopp
Member Author

@roberthbailey I agree with that, but I think the question might have been why the dry-run mode / ability to just ask which is the best node to drain but not actually drain it is useful. TBH I can't remember why you suggested it. Presumably it's because the client (e.g. autoscaler) wants to manage the drain itself?

@roberthbailey
Contributor

I think it was for upgrades. But I don't recall why that would be better than just asking the server to do it.

@hjacobs

hjacobs commented Feb 11, 2017

I would like to see kubectl drain on the server side. kubectl drain is currently unreliable for us and fails in many cases (e.g. when hitting Job resources on some node). We are evaluating the best approach to get proper "safe" autoscaling on AWS (we are already doing it in a non-safe manner with https://github.com/hjacobs/kube-aws-autoscaler).

@davidopp added the sig/scheduling label Feb 11, 2017
@davidopp
Member Author

davidopp commented Mar 4, 2017

One model for this: you ask the system to drain N nodes, give it some parameters controlling the choice of nodes, the number drained simultaneously, etc., and it picks the nodes, does the drains, and gives a callback to an HTTP endpoint you specify when it is done.

BTW, the Mesos machine maintenance model is described here:
https://github.com/apache/mesos/blob/master/docs/maintenance.md
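The "drain N nodes with a concurrency limit, then call back" model above might be sketched as follows. Everything here is hypothetical: `DrainRequest` and its fields are not Kubernetes types, node selection is a placeholder (first `count` candidates), and `on_done` is a plain callable standing in for the HTTP callback.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class DrainRequest:
    """Hypothetical request shape; none of these fields exist in Kubernetes."""
    candidates: list           # node names the server may choose from
    count: int                 # how many nodes to drain
    max_simultaneous: int = 1  # upper bound on concurrent drains


def run_drain_request(request, drain_node, on_done):
    """Pick nodes, drain at most max_simultaneous at once, then call back.

    drain_node(name) -> bool performs one drain; on_done(results) stands in
    for the HTTP callback and receives a {node: succeeded} mapping.
    """
    chosen = request.candidates[: request.count]  # placeholder selection policy
    results = {}
    if chosen:
        workers = max(1, request.max_simultaneous)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves input order, so names and outcomes line up.
            for name, ok in zip(chosen, pool.map(drain_node, chosen)):
                results[name] = ok
    on_done(results)
    return results


if __name__ == "__main__":
    request = DrainRequest(candidates=["n1", "n2", "n3"], count=2, max_simultaneous=2)
    run_drain_request(request, lambda name: True, print)
```

The thread pool's worker count is what enforces the "number drained simultaneously" parameter; a real server-side implementation would more likely use a controller loop than threads.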

@erictune
Member

erictune commented Apr 5, 2017

There would need to be a timeout or a way to cancel a drain that is taking too long.
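One way the timeout and cancel requirement could look, as a minimal sketch: the drain loop checks a deadline and a cancellation flag before every eviction attempt. The function name, the `evict` callable, and the returned status strings are all assumptions for illustration.

```python
import threading
import time


def drain_with_deadline(pods, evict, timeout_s, cancel=None, poll_s=0.01):
    """Evict pods one by one until done, cancelled, or out of time.

    evict(pod) -> bool returns True once the pod is gone and False if it
    should be retried (for example, the eviction was temporarily refused).
    Returns (status, remaining) with status in {"done", "cancelled", "timeout"}.
    """
    cancel = cancel or threading.Event()
    deadline = time.monotonic() + timeout_s
    remaining = list(pods)
    while remaining:
        if cancel.is_set():
            return "cancelled", remaining
        if time.monotonic() >= deadline:
            return "timeout", remaining
        if evict(remaining[0]):
            remaining.pop(0)
        else:
            cancel.wait(poll_s)  # back off, but wake early on cancel
    return "done", remaining


if __name__ == "__main__":
    # A pod that never goes away makes the drain give up after the timeout.
    print(drain_with_deadline(["web-0", "stuck"], lambda p: p != "stuck", 0.05))
```

Returning the pods that are still on the node lets the caller decide whether to retry, force-delete, or uncordon.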

@davidopp
Member Author

@mml
Contributor

mml commented Sep 8, 2017

One new compelling reason to do this is:

  1. We expect the operational definition of drain and cordon to change. E.g., "Make 'kubectl drain' use taint instead of Unschedulable" (#44944), but N.B. that at the moment, there is neither a clear roadmap nor a timeline.
  2. This logic is embedded in kubectl right now (mea culpa).
  3. Authors of other clients (reasonably) want to treat drain and cordon as cluster primitives. See "Add drain and cordon functions" (kubernetes-client/python-base#32).

We now have to choose between taking responsibility for the correctness of these implementations (including version skew and matrix testing), disallowing them entirely, or deferring the correctness problems until later with disclaimers (i.e. creating tech debt). None of these is satisfactory.
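To make the version-skew concern concrete, here is a sketch of the two ways a client could express "cordon": the `spec.unschedulable` field, or a `NoSchedule` taint. The taint key used is the `node.kubernetes.io/unschedulable` key that Kubernetes applies to unschedulable nodes; whether #44944 would settle on exactly this form is an assumption, and the helper names are hypothetical.

```python
UNSCHEDULABLE_TAINT_KEY = "node.kubernetes.io/unschedulable"


def cordon_via_field():
    """Patch body for the spec.unschedulable representation."""
    return {"spec": {"unschedulable": True}}


def cordon_via_taint(existing_taints):
    """Patch body that expresses cordon as a NoSchedule taint instead.

    A JSON merge patch replaces the whole taints list, so the existing
    taints are copied and the new one is appended only if missing.
    """
    taints = [dict(t) for t in existing_taints]
    if not any(t.get("key") == UNSCHEDULABLE_TAINT_KEY for t in taints):
        taints.append({"key": UNSCHEDULABLE_TAINT_KEY, "effect": "NoSchedule"})
    return {"spec": {"taints": taints}}


def is_cordoned(node):
    """Client-side check that accepts either representation."""
    spec = node.get("spec", {})
    tainted = any(
        t.get("key") == UNSCHEDULABLE_TAINT_KEY for t in spec.get("taints") or []
    )
    return bool(spec.get("unschedulable")) or tainted


if __name__ == "__main__":
    print(cordon_via_field())
    print(cordon_via_taint([{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]))
```

Every client that embeds this logic has to carry both branches and keep them in step with the server, which is the maintenance burden the comment above describes.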

@mml
Contributor

mml commented Sep 8, 2017

@davidopp or @timothysc per my last message, is there any chance this could be prioritized for 1.9 or maybe next year?

@timothysc
Member

It's entirely based on resources and folks willing to show up and do the work.

@mml
Contributor

mml commented Sep 12, 2017

@timothysc Can I extrapolate that as of right now, no one has shown up and indicated they wish to do this work?

@timothysc
Member

> @timothysc Can I extrapolate that as of right now, no one has shown up and indicated they wish to do this work?

Yes.

@timothysc added the help wanted label (denotes an issue that needs help from a contributor) Sep 14, 2017
@timothysc added this to the next-candidate milestone Sep 14, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) Jan 5, 2018
@redbaron
Contributor

redbaron commented Jan 7, 2018

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jan 7, 2018
@Danil-Grigorev
Member

@fabiand In sig-cloud-provider - cc @andrewsykim @cheftako

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
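The triage rules quoted above amount to three cumulative thresholds. A minimal sketch, assuming each label is applied on exactly the threshold day and that nobody comments or removes a label in between (the function and constant names are made up for illustration):

```python
STALE_AFTER_DAYS = 90
ROTTEN_AFTER_DAYS = STALE_AFTER_DAYS + 30   # 30d after lifecycle/stale
CLOSE_AFTER_DAYS = ROTTEN_AFTER_DAYS + 30   # 30d after lifecycle/rotten


def lifecycle_state(days_inactive):
    """Map uninterrupted days of inactivity to the bot's lifecycle stage."""
    if days_inactive >= CLOSE_AFTER_DAYS:
        return "closed"
    if days_inactive >= ROTTEN_AFTER_DAYS:
        return "lifecycle/rotten"
    if days_inactive >= STALE_AFTER_DAYS:
        return "lifecycle/stale"
    return "active"


if __name__ == "__main__":
    for days in (30, 90, 120, 150):
        print(days, lifecycle_state(days))
```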

@k8s-ci-robot added the lifecycle/stale label Oct 5, 2021
@florianstoeber

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Oct 10, 2021
@k8s-triage-robot
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 8, 2022
@redbaron
Contributor

redbaron commented Jan 8, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jan 8, 2022
@k8s-triage-robot
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 8, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 8, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@redbaron
Contributor

redbaron commented Jun 7, 2022

/reopen

@k8s-ci-robot
Contributor

@redbaron: You can't reopen an issue/PR unless you authored it or you are a collaborator.

@k8s-ci-robot
Contributor

@yanirq: You can't reopen an issue/PR unless you authored it or you are a collaborator.

@redbaron
Contributor

redbaron commented Jun 8, 2022

OK, @k8s-ci-robot, you won. Of course, a lack of spamming on a well-understood issue that is just waiting to be implemented is a clear sign that the issue is not relevant anymore; I am with you. Good work.

@Abirdcfly
Member

/reopen

@k8s-ci-robot reopened this Jun 8, 2022
@k8s-ci-robot
Contributor

@Abirdcfly: Reopened this issue.

@k8s-ci-robot added the needs-triage label Jun 8, 2022
@k8s-ci-robot
Contributor

@davidopp: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

@Abirdcfly
Member

@redbaron Please go on...😂
PS: I think the collaborator here is confusing🤔️

@k8s-triage-robot
/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.