Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pods with a non-empty .pod.nodeName by replacing with node affinity #28

Merged

Conversation

iholder101
Copy link
Contributor

@iholder101 iholder101 commented Mar 25, 2024

What this PR does / why we need it:
Creating pods with both scheduling gates and a non-empty .spec.nodeName is forbidden in Kubernetes. This causes an issue since AAQ's webhook adds scheduling gates for certain pods, which might have a non-empty nodeName set.

With this PR's changes, the pod's .spec.nodeName would be dropped and instead a node affinity targeting the same node would be added.

For example, a pod with the following spec:

spec:
  nodeName: node01

would be changed to:

spec:
  # nodeName: node01 --> this field is now dropped
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - node01

This trick allows AAQ to support pods with a target node set.

Note to reviewer:
In the first commit, I've imported kubevirt's patch package to ensure the patching code remains elegant.

Release note:

Support pods with a non-empty .pod.nodeName by replacing with node affinity

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Mar 25, 2024
Signed-off-by: Itamar Holder <iholder@redhat.com>
@Barakmor1
Copy link
Member

Thanks a lot for your PR! 😊 I'm really grateful for your help.

I'd prefer not to add kubevirt/kubevirt as a requirement if possible. It could mean doing some extra steps, like asking for changes in kubevirt's code, making a new version of kubevirt, and then updating our project here. Let's try to keep things simple and make sure we can use the new stuff easily. Thanks again for your work!

Signed-off-by: Itamar Holder <iholder@redhat.com>
Since it's not allowed to create a pod with both
scheduling gates and a nodeName being set, the
nodeName would be replaced with node affinity.

Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
@iholder101 iholder101 force-pushed the bugfix/support-pods-with-nodename branch from e3d7c28 to e142ed4 Compare March 25, 2024 12:53
@iholder101
Copy link
Contributor Author

Thanks a lot for your PR! 😊 I'm really grateful for your help.

I'd prefer not to add kubevirt/kubevirt as a requirement if possible. It could mean doing some extra steps, like asking for changes in kubevirt's code, making a new version of kubevirt, and then updating our project here. Let's try to keep things simple and make sure we can use the new stuff easily. Thanks again for your work!

Thanks for reviewing :)
Done. PTAL

@Barakmor1
Copy link
Member

Thanks for the important improvement :)
I wonder if kubernetes folks could somehow provide support for .spec.nodeName with schedulingGate.
@vladikr WDYT?

/approve

We can track this here

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Barakmor1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2024
@alculquicondor
Copy link

Why were you setting a nodeName in the first place?

Setting a nodeName complete bypasses the scheduler, which could lead to over commitment of nodes.

There is no point in supporting nodeName and scheduling gates, as the whole point of scheduling gates is to prevent scheduling. Whereas setting a nodeName is the equivalent of scheduling.

@iholder101
Copy link
Contributor Author

Why were you setting a nodeName in the first place?

Setting a nodeName complete bypasses the scheduler, which could lead to over commitment of nodes.

There is no point in supporting nodeName and scheduling gates, as the whole point of scheduling gates is to prevent scheduling. Whereas setting a nodeName is the equivalent of scheduling.

Hey @alculquicondor! Thanks for your reply!

Firstly, I agree that setting .spec.nodeName is a bad idea. I'm guessing that this field is very old and being kept around for backward compatibility reasons, since using affinity is always a better idea (not even talking about the fact that it gets populated when the pod schedules, as if it's part of the pod's status..).

Secondly, it wasn't me who set the nodeName. This operator is being installed on various environments, some of them are running pods with a nodeName set. While you're right that it's probably a bad idea to set this field, I still want AAQ to be able to run on such environments. I think you'd agree that if such a pod is being created, then being mutated by our hook, then the creation fails, is not a great behavior from our end.

Lastly, I fail to understand why it is restricted to set a nodeName alongside scheduling gates in the first place. This basically means that dynamic quota management (which is the only use-case specified in the scheduling gates KEP) is not possible for pods that set a nodeName. It seems to me that it could have been solved easily by saying "if there's a scheduling gate - the pod is not ready for scheduling - whether it has or doesn't have a nodeNamde set".

WDYT?

@vladikr
Copy link
Member

vladikr commented Mar 26, 2024

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 26, 2024
@kubevirt-bot kubevirt-bot merged commit a54b7c0 into kubevirt:main Mar 26, 2024
5 checks passed
@Barakmor1
Copy link
Member

/cherry-pick release-v1.1

@kubevirt-bot
Copy link
Contributor

@Barakmor1: new pull request created: #31

In response to this:

/cherry-pick release-v1.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

iholder101 added a commit to iholder101/application-aware-quota that referenced this pull request May 2, 2024
This reverts recently added .spec.nodeName handling.
Specifically, reverts the following commits:

66885ca
9198f4d
1117aff
441f0e4
e142ed4
be02d00
e7cd4d5
7c5d9a0
09c4d1d
9ceac3b
ce6b0ba
77a3e1d
4c1c0b5
5fe072f

Signed-off-by: Itamar Holder <iholder@redhat.com>
kubevirt-bot pushed a commit to kubevirt-bot/applications-aware-quota that referenced this pull request May 2, 2024
This reverts recently added .spec.nodeName handling.
Specifically, reverts the following commits:

66885ca
9198f4d
1117aff
441f0e4
e142ed4
be02d00
e7cd4d5
7c5d9a0
09c4d1d
9ceac3b
ce6b0ba
77a3e1d
4c1c0b5
5fe072f

Signed-off-by: Itamar Holder <iholder@redhat.com>
kubevirt-bot pushed a commit to kubevirt-bot/applications-aware-quota that referenced this pull request May 2, 2024
This reverts recently added .spec.nodeName handling.
Specifically, reverts the following commits:

66885ca
9198f4d
1117aff
441f0e4
e142ed4
be02d00
e7cd4d5
7c5d9a0
09c4d1d
9ceac3b
ce6b0ba
77a3e1d
4c1c0b5
5fe072f

Signed-off-by: Itamar Holder <iholder@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants