Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move of unreachable taint key out of alpha #54208

Merged
merged 2 commits into from
Nov 20, 2017

Conversation

resouer
Copy link
Contributor

@resouer resouer commented Oct 19, 2017

What this PR does / why we need it:

Move of unreachable taint key out of alpha, which already happened in community doc.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #54198

Special notes for your reviewer:
Please see #54198 for the context of this inconsistency.

Release note:

Move unreachable taint key out of alpha. 
Please note the existing pods with the alpha toleration should be updated by user himself to tolerate the GA taint.

@resouer resouer added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 19, 2017
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 19, 2017
@bsalamat
Copy link
Member

/lgtm

I wait for Klaus to take a look as well before approving, just in case.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2017
@@ -30,7 +30,7 @@ const (
// TaintNodeUnreachable would be automatically added by node controller
// when node becomes unreachable (corresponding to NodeReady status ConditionUnknown)
// and removed when node becomes reachable (NodeReady status ConditionTrue).
TaintNodeUnreachable = "node.alpha.kubernetes.io/unreachable"
TaintNodeUnreachable = "node.kubernetes.io/unreachable"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do this similar with not-ready? Add an new label, named DeprecateTaintNodeUnreachable.

Copy link
Contributor Author

@resouer resouer Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k82cn Not very sure about this actually, as unreachable seems to be internal used only.

@liggitt Do you think we need to address compatibility issue here, just like what we did for not-ready taint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt , any suggestion :).

Copy link
Member

@liggitt liggitt Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar issue to the other taint name change... the node controller currently adds this taint, so I'd expect it to remove the alpha taint, otherwise you can end up with a node with a permanent taint added by a kubernetes component

also, the DefaultTolerationSeconds admission plugin currently adds tolerations for the alpha taint (and isn't documented to be alpha or subject to change). What's the plan for avoiding disrupting pods admitted with tolerations by that plugin?

Copy link
Contributor Author

@resouer resouer Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach in #51266 should be able to fix the first concern, i.e. auto-replace old taint to new taint by controller.

Can we do the same operation for DefaultTolerationSeconds? let the admission plugin replace old toleration with new one. @liggitt WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let the admission plugin replace old toleration with new one. @liggitt WDYT?

that doesn't update pods already in the system

Copy link
Contributor Author

@resouer resouer Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe add a admit plugin to fix pods tolerations? @liggitt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the best course of action... @davidopp how did you envision updating system-added tolerations on existing pods when the system-added taint changes names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kindly ping @davidopp , we are discussing the strategy to upgrade alpha taint keys without breaking existing user, and your advice would be highly valued here.

@k82cn
Copy link
Member

k82cn commented Oct 20, 2017

/approve cancel

@k82cn
Copy link
Member

k82cn commented Oct 20, 2017

/cc @liggitt

@k82cn
Copy link
Member

k82cn commented Oct 20, 2017

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 20, 2017
@k82cn
Copy link
Member

k82cn commented Oct 20, 2017

Just to stop merge progress before we get agreement on the PR :).

@BenTheElder
Copy link
Member

/retest

@BenTheElder
Copy link
Member

/retest

@davidopp
Copy link
Member

@gmarek

@davidopp
Copy link
Member

I'll take a look at this in the next few days, but if @gmarek has a comment it would be useful too.

@resouer
Copy link
Contributor Author

resouer commented Oct 27, 2017

Thanks David! @davidopp

@gmarek As @liggitt pointed out, the DefaultTolerationSeconds admission plugin currently adds tolerations for the alpha taint. So the pods already in the system are tend be suffered from this update.

One possible approach I am thinking is adding a admit to fix pods toleration. Do you think it's reasonable?

@bsalamat
Copy link
Member

@resouer Why don't you take a similar approach to notReady? Leave the alpha in place and mark it deprecated and add a new taint for non-alpha version.

@resouer
Copy link
Contributor Author

resouer commented Oct 30, 2017

@bsalamat The special problem here is the DefaultTolerationSeconds admission plugin currently adds tolerations for the alpha taint to Pods.

So I just proposed to adding a admit to fix pods tolerations. Does it sound good to you?

@bsalamat
Copy link
Member

@resouer Your description is a bit brief. So, I am not sure if your solution will handle the following scenario correctly:

  1. Assume that an existing replicaset is creating pods from an existing template which has toleration for the alpha taint.
  2. Nodes get the GA taint.
  3. A new pod for the replicaset is created with the old taint.

Will the new pod be scheduled?

@resouer
Copy link
Contributor Author

resouer commented Oct 31, 2017

@bsalamat I think that approach may solve the problem:

  1. User replaces Kubernetes binary to upgrade.
  2. The node controller replaces alpha taint to GA taint on node (like we did before).
  3. Then we write an admission plugin, it will check if the pod has old taint key, and replace it with new one (also, update the pod template of its controllerRef).

So the system should work.

Am I missed something?

@liggitt
Copy link
Member

liggitt commented Oct 31, 2017

we write an admission plugin, when it sees new pod comes in, it will check if the pod has old taint key, and replace it with new one

I don't think this is necessary... I'd only expect us to work to smoothly transition objects tainted/tolerated by in-tree components:

  • switch the existing admission plugin to write the non-alpha toleration
  • handle updating existing pods to swap the alpha toleration with a non-alpha toleration (or add the non-alpha toleration... I don't think existing tolerations can be removed)
  • update nodes to remove the alpha taint and add the non-alpha taint

I think out of tree components are responsible for stopping their own usage of alpha tolerations

@bsalamat
Copy link
Member

What @liggitt said makes sense. Given that this was an alpha taint, we don't need to avoid breaking external components that use the alpha taint in the future.

@resouer
Copy link
Contributor Author

resouer commented Nov 2, 2017

OK, so the scope is much clear now. I've already update the patch and amending unit tests now. Thank you for advice.

@bsalamat
Copy link
Member

bsalamat commented Nov 15, 2017

@resouer Can you please update the release note and mention that existing pods with the alpha toleration should be updated to tolerate the GA taint as well?

@resouer
Copy link
Contributor Author

resouer commented Nov 16, 2017

@bsalamat Updated. Will also mention this in google group if it's fine.

@resouer resouer changed the title [RFC] Move of unreachable taint key out of alpha Move of unreachable taint key out of alpha Nov 16, 2017
@bsalamat
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2017
@bsalamat
Copy link
Member

I think this PR requires "action required" label.

@dims
Copy link
Member

dims commented Nov 16, 2017

/assign @deads2k

@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request Current

@bsalamat @deads2k @k82cn @resouer

Pull Request Labels
  • sig/scheduling: Pull Request will be escalated to these SIGs if needed.
  • priority/important-longterm: Escalate to the pull request owners; move out of the milestone after 1 attempt.
  • kind/cleanup: Adding tests, refactoring, fixing old bugs.
Help

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2017
@resouer
Copy link
Contributor Author

resouer commented Nov 17, 2017

@bsalamat I just updated the order of test cases to fix test failure after rebase, could you please apply label again?

I think this PR requires "action required" label.

@dims Do you know how to do that?

@resouer
Copy link
Contributor Author

resouer commented Nov 17, 2017

/test pull-kubernetes-unit

@bsalamat
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2017
@dims
Copy link
Member

dims commented Nov 18, 2017

/retest

@deads2k
Copy link
Contributor

deads2k commented Nov 20, 2017

You probably want to address the migration concern here: #54208 (comment)

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, deads2k, resouer

Associated issue: 54198

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 20, 2017
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit d0301aa into kubernetes:master Nov 20, 2017
@resouer
Copy link
Contributor Author

resouer commented Nov 21, 2017

@deads2k Yes, the current plan is Release Note + Announcement in kubernetes-dev

@resouer resouer deleted the rm-alpha branch November 21, 2017 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider moving TaintNodeUnreachable out of alpha