
requiredDuringSchedulingRequiredDuringExecution status #96149

Open
Treyone opened this issue Nov 3, 2020 · 43 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@Treyone

Treyone commented Nov 3, 2020

requiredDuringSchedulingRequiredDuringExecution is mentioned in the documentation as "In the future we plan to offer", but I could not find any plan to do so in the near future. None of the workarounds I have found so far help us fix a scheduling problem where requiredDuringSchedulingIgnoredDuringExecution is insufficient. We have a constraint that some replicas of a deployment must be co-located on the same node. requiredDuringSchedulingIgnoredDuringExecution does the job for the initial deployment, but whenever one of these pods crashes, it will most likely be rescheduled somewhere else, breaking the whole application.

What is also disturbing is that I found some resources mentioning it as working, and when I tried, there was no error when I applied it: the affinity section was just left blank, without any notice, in both the updated deployment and its pods. If the feature is not supported, I doubt this is the intended behavior, is it?
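
For illustration, a manifest fragment along these lines (the label key and values are placeholders) is accepted without any error, even though the *RequiredDuringExecution variant is not part of the Pod API and is silently dropped on apply:

      # Hypothetical fragment: requiredDuringSchedulingRequiredDuringExecution is
      # not part of the Pod API, so the API server silently prunes it on apply.
      affinity:
        podAffinity:
          requiredDuringSchedulingRequiredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: my-app-role          # placeholder label key
                operator: In
                values:
                - master
            topologyKey: kubernetes.io/hostname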

@Treyone Treyone added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 3, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 3, 2020
@k8s-ci-robot
Contributor

@Treyone: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Nov 3, 2020
@neolit123
Member

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 3, 2020
@ahg-g
Member

ahg-g commented Nov 3, 2020

What is also disturbing is that I found some resources mentioning it as working

Are you referring to the k8s docs? If so, please flag them so we can fix the docs.

@ahg-g
Member

ahg-g commented Nov 3, 2020

I believe requiredDuringSchedulingRequiredDuringExecution is simply ignored everywhere in the code, right @Huang-Wei?

@Huang-Wei
Member

I believe requiredDuringSchedulingRequiredDuringExecution is simply ignored everywhere in the code, right @Huang-Wei?

Correct, it's not implemented yet. And due to its complexity, we don't have a concrete timeline to promise when to get it implemented.

@Treyone
Author

Treyone commented Nov 4, 2020

What is also disturbing is that I found some resources mentioning it as working

Are you referring to the k8s docs? If so, please flag them so we can fix the docs.

No, the doc is clear on the "planned" status. Those were Stack Overflow discussions, like this one.
Although I have no solid alternative, I appreciate the clarification on the fact that it's not yet planned. In that case, I think it would be wise to throw an error when it's found in the YAML.

@alculquicondor
Member

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Nov 4, 2020
@alculquicondor
Member

Although I have no solid alternative, I appreciate the clarification on the fact that it's not yet planned. In that case, I think it would be wise to throw an error when it's found in the YAML.

It should fail during Pod creation, as the field doesn't exist in the API. Is this not the case?

As for the implementation, it's more on the kubelet side. This is somewhat similar to taint-based evictions, which, AFAIK, had several implementation issues.
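
For comparison, taints already provide "during execution" semantics today: a NoExecute taint evicts running pods that don't tolerate it, and tolerationSeconds bounds how long a tolerating pod may stay. A minimal sketch, using one of the standard node-condition taint keys:

    # NoExecute affects pods that are already running, not only new placements;
    # this is the closest existing analogue to "RequiredDuringExecution".
    tolerations:
    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300   # the pod may keep running for 5 minutes after the taint appears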

@lingsamuel
Contributor

@alculquicondor Could you explain more about the "implementation issues"? I read some previous discussions, and those conversations indicate that the kube-scheduler is not the right place to handle this feature, but I think we could create a NodeAffinityManager similar to the TaintManager.

@alculquicondor
Member

cc @damemi who was involved in taint based evictions

@damemi
Contributor

damemi commented Jan 5, 2021

I wasn't very involved in the early design for TBE, but part of the legwork to get it to GA was officially handing it off to the node team for ownership.

So I agree with @alculquicondor here (and what you said, @lingsamuel) that this is probably similarly out of the scheduler's scope. This is because the scheduler is generally not concerned with pods after they've been placed on a node.

Some of the discussion around this has suggested implementing it explicitly in the descheduler's NodeAffinity strategy, at least temporarily until a core solution can be agreed upon. I think the eventual solution should be something similar to the TaintManager.
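
For reference, the descheduler's existing RemovePodsViolatingNodeAffinity strategy already handles the node-affinity half of this: it evicts pods whose node no longer satisfies their required node affinity. A policy roughly like the following enables it (v1alpha1 schema; the exact format depends on the descheduler version you run):

    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemovePodsViolatingNodeAffinity":
        enabled: true
        params:
          nodeAffinityType:
          - "requiredDuringSchedulingIgnoredDuringExecution"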

@lingsamuel
Contributor

I think the only thing related to DuringExecution that may change after a pod is scheduled is the node's labels (a pod's NodeAffinity and a node's metadata.name are immutable at runtime). So IMO it's a node lifecycle problem.
I made a quick NodeAffinityManager prototype today; the implementation is simple and the basic case works well. If that's an acceptable solution, I would like to create a PR later.

@damemi
Contributor

damemi commented Jan 5, 2021

@lingsamuel sure, opening a PR doesn't hurt :) This may be big enough to merit its own KEP though, @alculquicondor wdyt?

@alculquicondor
Member

Definitively requires a KEP, but not up to sig-scheduling (us) to approve

@lingsamuel
Contributor

lingsamuel commented Jan 28, 2021

@Treyone I am writing a KEP user story, could you share more information about your use case?

Re-reading your description, I am confused about why the scheduler couldn't reschedule the crashed pod onto the previous node.
If the node selector matches only one node, the crashed pod should also be scheduled onto that node. If the node selector matches multiple nodes, AFAIK there is no guarantee that these pods will be scheduled onto the same node.

@Treyone
Author

Treyone commented Jan 28, 2021

The affinity rule I used is the following:

      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: my-app-role
                operator: In
                values:
                - master
            topologyKey: kubernetes.io/hostname

I need this pod to be co-located on the same host as the pod with this label. There's only one pod with this label, so I think only one node matched the rule, but maybe I'm missing something?

@alculquicondor
Member

There are 2 Pods, say A and B. The rule above sets a dependency B->A

If A gets removed, requiredDuringSchedulingIgnoredDuringExecution won't cause B to be rescheduled. If B gets removed, and a B2 replacement comes in, it should go into the same Node.
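
Concretely, pod "A" here is whichever pod carries the selected label; a sketch of such a pod (name and image are placeholders):

    # Pod "A": carries the label that pod "B"'s podAffinity term selects.
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app-master            # placeholder name
      labels:
        my-app-role: master          # matched by the affinity rule above
    spec:
      containers:
      - name: app
        image: example.com/my-app:1.0   # placeholder image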

@lingsamuel
Contributor

🤔 Well, I misunderstood the context. I thought we were talking about NodeAffinity.
Personally, I don't think the runtime constraint of PodAffinity can be implemented, due to the algorithmic complexity... RequiredDuringExecution is also only mentioned in the node affinity section of the docs.

@Huang-Wei
Member

Well, I misunderstood the context

Indeed... What this issue is about is the "...RequiredDuringExecution" semantics of Pod(Anti)Affinity, which can have a heavy performance impact - for any new pod, and for any update to an existing pod, you have to check whether it breaks existing Pod(Anti)Affinity constraints.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2021
@lingsamuel
Contributor

/remove-lifecycle stale

@lingsamuel
Contributor

@lingsamuel are you working on this?

I am waiting for someone to review the KEP: kubernetes/enhancements#2359

@shay-berman

@lingsamuel @alculquicondor what is the status of this issue?
Indeed, requiredDuringSchedulingRequiredDuringExecution is needed, as mentioned above, when you have a pod that must run together with other pods on the same node.

Is there a workaround until requiredDuringSchedulingRequiredDuringExecution becomes available?

@alculquicondor
Member

You can check the KEP, or take it over if nobody is working on it.

@shay-berman

@alculquicondor the KEP kubernetes/enhancements#2359 is about node affinity and not pod affinity.
Do you know if there is any proposal or KEP about pod requiredDuringSchedulingRequiredDuringExecution?

@alculquicondor
Member

I'm not aware of any. Feel free to work on it. But note that evictions fall under sig/node, with sig/scheduling input.

@ffromani
Contributor

please make sure the KEP is recorded in the sig-node tracker and introduced at the sig-node meeting, because I can't recall this being mentioned in recent meetings (it may very well be my fault though). Anyway, since we have a KEP and a k/e issue, it seems this issue is no longer needed.
/close

@k8s-ci-robot
Contributor

@fromanirh: Closing this issue.

In response to this:

please make sure the KEP is recorded in the sig-node tracker and introduced at the sig-node meeting, because I can't recall this being mentioned in recent meetings (it may very well be my fault though). Anyway, since we have a KEP and a k/e issue, it seems this issue is no longer needed.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ffromani
Contributor

added the KEP linked here to the sig-node feature backlog document

@shay-berman

what is the status of this feature in k8s? @fromanirh

@ffromani
Contributor

what is the status of this feature in k8s? @fromanirh

AFAIK no changes since last update

@AxeZhan
Member

AxeZhan commented Nov 6, 2023

/reopen
I'd like to continue this in the next release :). I've created a new issue (#121758) to track it and drafted a simple KEP.

@k8s-ci-robot
Contributor

@AxeZhan: Reopened this issue.

In response to this:

/reopen
I'd like to continue this in the next release :). I've created a new issue (#121758) to track it and drafted a simple KEP.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Nov 6, 2023
@AxeZhan
Member

AxeZhan commented Nov 8, 2023

For anyone who is still interested in this issue: I've created a new KEP with POC code in k/k. I hope to get some reviewers/mentors from here 😊

@AxeZhan
Member

AxeZhan commented Nov 15, 2023

@ffromani @damemi Are either of you interested in reviewing the KEP for this issue?
Pinging you since you were involved in this issue before; sorry for the interruption.

@ffromani
Contributor

@ffromani @damemi Are either of you interested in reviewing the KEP for this issue? Pinging you since you were involved in this issue before; sorry for the interruption.

I can review, and I'm quite happy to, but I'm not a scheduling expert, so my LGTM will likely be non-binding.

@alculquicondor
Member

Note that this is mostly a sig-node feature, so try to get a review from that SIG first.

@AxeZhan
Member

AxeZhan commented Dec 5, 2023

Well, after the discussion in the KEP, the main concern is: why is it necessary to add a controller to k8s when the descheduler can do the same thing?
And honestly, I myself don't have a strong argument to support the idea of adding a controller.
Personally, I would prefer Kubernetes to support this feature natively, because I don't want to spend more time learning a new component (the descheduler). But I don't know whether that is worth it, because it would also increase the complexity of k/k.

So I would like to ask for your opinions @alculquicondor @Huang-Wei .
Do you think it is still necessary for the scheduler to implement this feature? Or should we just delete the commented-out code in the scheduler and let the descheduler take over this feature?

@alculquicondor
Member

To me, this sounds like a basic feature that should be part of Kubernetes. We already have a similar mechanism for taints.

Personally, I would prefer Kubernetes to support this feature natively, because I don't want to spend more time learning a new component (the descheduler). But I don't know whether that is worth it, because it would also increase the complexity of k/k.

That is very hard to balance.

I think we should implement it, but it's not critical.

@ffromani
Contributor

ffromani commented Dec 5, 2023

Well, after the discussion in the KEP, the main concern is: why is it necessary to add a controller to k8s when the descheduler can do the same thing? And honestly, I myself don't have a strong argument to support the idea of adding a controller.

My 2c (FWIW): I raised this comment on the KEP review, but I actually don't have a strong opinion either way. If we move the functionality into core k/k, however, we need a good and clear rationale for why we do so (which is, again, something I'm fine with, but which deserves a rationale), and we need to make very clear what the overlap with the scheduler is, what the transition plan is (for users of the descheduler), and whether/how the feature and the descheduler will coexist.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2024
@kerthcet
Member

kerthcet commented Mar 5, 2024

/remove-lifecycle stale
I think @AxeZhan is working on this.
/assign @AxeZhan

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2024