Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Remove rescheduler and corresponding tests from master" #64592

Conversation

ravisantoshgudimetla
Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla commented Jun 1, 2018

Reverts #64364

After discussing with @bsalamat on how DS controllers(ref: #63223 (comment)) cannot create pods if the cluster is at capacity and they have to rely on rescheduler for making some space, we thought it is better to

  • Bring rescheduler back.
  • Make rescheduler priority aware.
  • If cluster is full and if only DS controller is not able to create pods, let rescheduler be run and let it evict some pods which have less priority.
  • The DS controller pods will be scheduled now.

So, I am reverting this PR now. Step 2, 3 above are going to be in rescheduler.

/cc @bsalamat @aveshagarwal @k82cn

Please let me know your thoughts on this.

Revert #64364 to resurrect rescheduler. More info https://github.com/kubernetes/kubernetes/issues/64725 :)

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 1, 2018
@@ -0,0 +1,133 @@
/*
Copyright 2016 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a revert PR, let keep it as is :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, make sense!

testutils "k8s.io/kubernetes/test/utils"
imageutils "k8s.io/kubernetes/test/utils/image"

. "github.com/onsi/ginkgo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above k8s.io*

@k82cn
Copy link
Member

k82cn commented Jun 1, 2018

Bobby's concern is a good point :).
I went through the original PR and was curious why the e2e was passed. So I re-check the code, and found that critical pod (ExperimentalCriticalPodAnnotation) is still alpha feature :).

Anyway , let me also check whether it is enabled specially in test-infra :). If not enabled, I think that's safe for us to remove it.

BTW, did you grep yaml files to see whether we need replace critical pod with priority?

@ravisantoshgudimetla
Copy link
Contributor Author

ravisantoshgudimetla commented Jun 1, 2018

I went through the original PR and was curious why the e2e was passed. So I re-check the code, and found that critical pod (ExperimentalCriticalPodAnnotation) is still alpha feature :).

That is correct.

BTW, did you grep yaml files to see whether we need replace critical pod with priority?

I believe you are talking about critical pod annotation, as of now, by definition a critical pod is one which has either critical pod annotation or system-node-critical, system-cluster-critical priorities.

@k82cn
Copy link
Member

k82cn commented Jun 1, 2018

I believe you are talking about critical pod annotation

Yes, that's great if we already defined.

Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2018
@bsalamat
Copy link
Member

bsalamat commented Jun 1, 2018

Yes, that's great if we already defined.

As Revi said, we have already changed our code to recognize system-node-critical and system-cluster-critical priorities as "critical". The annotation still works as well (the code is not removed). We have also changed critical pod manifests and have added appropriate priority class names in addition to the critical annotation.

@ravisantoshgudimetla
Copy link
Contributor Author

/retest

@bsalamat bsalamat added this to the v1.11 milestone Jun 2, 2018
@bsalamat bsalamat added status/approved-for-milestone priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jun 2, 2018
@bsalamat bsalamat added kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed milestone/incomplete-labels labels Jun 2, 2018
@bsalamat
Copy link
Member

bsalamat commented Jun 2, 2018

@ravisantoshgudimetla Please add a release note (NONE?)

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 3, 2018
@k82cn
Copy link
Member

k82cn commented Jun 3, 2018

Re-notes, prefer to "Revert #64364 to resurrect rescheduler."

LGTM, thanks very much :)

@ravisantoshgudimetla
Copy link
Contributor Author

/retest

@ravisantoshgudimetla
Copy link
Contributor Author

ping @ixdy @eparis @Random-Liu for review/approval.

/cc @sjenning @derekwaynecarr

@bsalamat
Copy link
Member

bsalamat commented Jun 4, 2018

ping
We need to get this approved ASAP. Thanks!

@eparis
Copy link
Contributor

eparis commented Jun 4, 2018

/approve

@dims
Copy link
Member

dims commented Jun 4, 2018

/assign @mikedanese @yujuhong

@ixdy
Copy link
Member

ixdy commented Jun 4, 2018

/approve

@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@bsalamat @mikedanese @ravisantoshgudimetla @yujuhong

Pull Request Labels
  • sig/scheduling: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/feature: New functionality.
Help

Copy link
Contributor

@yujuhong yujuhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the rescheduler deprecated/removed and what conditions changed such that the rescheduler is needed again?

@@ -141,7 +141,7 @@ func (sp SyncPodType) String() string {
}

// IsCriticalPod returns true if the pod bears the critical pod annotation key or if pod's priority is greater than
// or equal to SystemCriticalPriority. Both the default scheduler and the kubelet use this function
// or equal to SystemCriticalPriority. Both the rescheduler(deprecated in 1.10) and the kubelet use this function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be fixed since rescheduler is no longer deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think the statement is still valid just that we will delete rescheduler in next release. This is a stop gap solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the long term fix (and if we rely on a deprecated feature, is it really deprecated?)

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please find my comments below(#64592 (comment)). While I am not familiar with deprecation policies and this may not sound like exact reason, last release had rescheduler deprecated too but we relied on it for DS pods.

var totalMillicores int

BeforeEach(func() {
framework.Skipf("Rescheduler is being deprecated soon in Kubernetes 1.10")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is always skipped?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a correct test in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding the test back if we always skip it?

@ravisantoshgudimetla
Copy link
Contributor Author

Thanks @yujuhong for review.

Why was the rescheduler deprecated/removed

Default scheduler is supposed to do preemptions for all the pods there by eliminating need for rescheduler.

what conditions changed such that the rescheduler is needed again?

For pods generated by DS controller, the preemption is not happening as DS controller schedules its own pods, so we have a PR to change the behaviour of DS controller so that its pods are scheduled by default scheduler(#63223) but this feature is still in alpha making DS controller to schedule its own pods instead of default scheduler. We were initially hoping to make this beta but since the code-path was not sufficiently tested scheduling/apps sigs have decided to make it alpha. With this in picture, we had to bring back rescheduler. More explaination in here: (#63223 (comment))

@yujuhong
Copy link
Contributor

yujuhong commented Jun 4, 2018

@ravisantoshgudimetla thanks for the explanation! Could you create an issue with the explanation and plan to remove the rescheduler?

Assuming the tests will be removed/updated, the PR lgtm.
/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, eparis, ixdy, ravisantoshgudimetla, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2018
@yujuhong
Copy link
Contributor

yujuhong commented Jun 4, 2018

Could you create an issue with the explanation and plan to remove the rescheduler?

Forgot to mention, could you add a link to the issue in your release note?

@ravisantoshgudimetla
Copy link
Contributor Author

Thanks @yujuhong for the review and approval.

Forgot to mention, could you add a link to the issue in your release note?

Done - #64725

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 63453, 64592, 64482, 64618, 64661). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 898831a into kubernetes:master Jun 4, 2018
@k8s-ci-robot
Copy link
Contributor

@ravisantoshgudimetla: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized 872addf link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@zparnold
Copy link
Member

Hello there! @ravisantoshgudimetla I'm Zach Arnold working on Docs for the 1.11 release. This PR was identified as one needing some documentation in the https://github.com/kubernetes/website repo around your contributions (thanks by the way!) When you have some time, could you please modify/add/remove the relevant content that needs changing in our documentation repo? Thanks! Please let me or my colleague Misty know (@zparnold/@misty on K8s Slack) if you need any assistance with the documentation.

@ravisantoshgudimetla
Copy link
Contributor Author

@zparnold Apologies for not getting back earlier. I already have updated documents with these changes to rescheduler.

@ravisantoshgudimetla ravisantoshgudimetla deleted the revert-64364-remove-rescheduler branch June 26, 2018 14:20
@zparnold
Copy link
Member

zparnold commented Jun 27, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet