Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dra: update for Kubernetes 1.28 #41856

Merged
merged 1 commit into from Aug 7, 2023
Merged

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Jul 3, 2023

@k8s-ci-robot k8s-ci-robot added this to the 1.28 milestone Jul 3, 2023
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 3, 2023
@k8s-ci-robot k8s-ci-robot requested a review from klueska July 3, 2023 14:56
@k8s-ci-robot k8s-ci-robot added the language/en Issues or PRs related to English language label Jul 3, 2023
@k8s-ci-robot k8s-ci-robot added the sig/docs Categorizes an issue or PR as relevant to SIG Docs. label Jul 3, 2023
@sftim
Copy link
Contributor

sftim commented Jul 5, 2023

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jul 5, 2023
@bart0sh bart0sh added this to WIP in SIG Node PR Triage Jul 12, 2023
@katcosgrove
Copy link
Contributor

Hello @pohly 👋 please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before Tuesday 25th July 2023. Thank you!

Several improvements under the hood which don't need to be documented here. API
changes (like storing generated resource claim names in the pod status) are
part of the generated API documentation.

What is worth mentioning because it was listed as "limitation" before is that
pre-scheduled pods are now supported better.
@netlify
Copy link

netlify bot commented Aug 2, 2023

👷 Deploy Preview for kubernetes-io-vnext-staging processing.

Name Link
🔨 Latest commit 6c926cb
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/64ca640d8f8c030008a310bf

@k8s-ci-robot k8s-ci-robot removed the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Aug 2, 2023
@pohly pohly changed the title WIP: dra: update for Kubernetes 1.28 dra: update for Kubernetes 1.28 Aug 2, 2023
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 2, 2023
@pohly
Copy link
Contributor Author

pohly commented Aug 2, 2023

PR updated. Sorry, I was on vacation and had to catch up this week.

However, it is better to avoid this because a Pod which is assigned to a node
blocks normal resources (RAM, CPU) that then cannot be used for other Pods
while the Pod is stuck. To make a Pod run on a specific node while still going
through the normal scheduling flow, create the Pod with a node selector that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
through the normal scheduling flow, create the Pod with a node selector that
through the normal scheduling flow, create the Pod with a `nodeSelector` that

Not quite sure if this is correct but you capitalize other api fields or used code style for those. Thought it looked a bit off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a node selector" refers to the general concept here, not the specific field. That is then shown in the example. I think using plain English is fine here and also used elsewhere.

@bart0sh bart0sh moved this from WIP to Needs Reviewer in SIG Node PR Triage Aug 3, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Aug 3, 2023

/cc

Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ready for tech review. The suggestions here wouldn't block a merge.

blocks normal resources (RAM, CPU) that then cannot be used for other Pods
while the Pod is stuck. To make a Pod run on a specific node while still going
through the normal scheduling flow, create the Pod with a node selector that
matches exactly the desired node:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
matches exactly the desired node:
exactly matches the desired node:

detects this and tries to make the Pod runnable by triggering allocation and/or
reserving the required ResourceClaims.

However, it is better to avoid this because a Pod which is assigned to a node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit)

Suggested change
However, it is better to avoid this because a Pod which is assigned to a node
However, it is better to avoid this because a Pod that is assigned to a node

future.
## Pre-scheduled Pods

When creating a Pod with `nodeName` already set, the scheduler gets bypassed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When creating a Pod with `nodeName` already set, the scheduler gets bypassed.
When you - or another API client - create a Pod with `.spec.nodeName` already set, the
scheduler gets bypassed.

nodeSelector:
kubernetes.io/hostname: name-of-the-intended-node
...
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optionally:

Suggested change
You may also be able to mutate the incoming Pod, at admission time, to unset the `.spec.nodeName`
field and to use a node selector instead.


When creating a Pod with `nodeName` already set, the scheduler gets bypassed.
If some ResourceClaim needed by that Pod does not exist yet, is not allocated
or not reserved for the Pod, then the kubelet will fail to run the Pod and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be great to add instruction on how to unstuck it. I think simply deleting the pod will do. The only thing - mention that if Pod is part of the replicaset, another instance may be created with the same issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are numerous reasons for what might be wrong, so listing all possible remediation here will not be possible. The goal in this section is more about raising awareness of the problem ("don't do this"!) and explain that some automatic mitigation is available now through the kube-controller-manager changes.

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall, thanks

@sftim
Copy link
Contributor

sftim commented Aug 7, 2023

Thanks

/lgtm
/approve

Fixup PRs welcome!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9f280718ce19836c8db17404415c3fe8328bca25

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SergeyKanzhelev, sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2023
@k8s-ci-robot k8s-ci-robot merged commit d9091f3 into kubernetes:dev-1.28 Aug 7, 2023
7 checks passed
SIG Node PR Triage automation moved this from Needs Reviewer to Done Aug 7, 2023
pohly pushed a commit to pohly/website that referenced this pull request Aug 8, 2023
This is a follow-up to kubernetes#41856 with
the suggested enhancements.
@pohly
Copy link
Contributor Author

pohly commented Aug 8, 2023

Fixup PRs welcome!

Darn, too slow! Sorry, I should have update the PR yesterday. I was still catching after my vacation. See #42445 for the follow-up.

Rishit-dagli pushed a commit to Rishit-dagli/website that referenced this pull request Aug 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants