Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRA: interaction with unexpected node shutdown KEP #120421

Closed
pohly opened this issue Sep 5, 2023 · 8 comments · Fixed by #120965
Closed

DRA: interaction with unexpected node shutdown KEP #120421

pohly opened this issue Sep 5, 2023 · 8 comments · Fixed by #120965
Assignees
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@pohly
Copy link
Contributor

pohly commented Sep 5, 2023

We need to explore how DRA interacts with https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown.

One likely outcome is that the DRA helper code watches for nodes with the special node.kubernetes.io/out-of-service=nodeshutdown:NoExecute label and helps DRA drivers clean up.

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 5, 2023
@lowang-bh lowang-bh removed their assignment Sep 9, 2023
@bart0sh
Copy link
Contributor

bart0sh commented Sep 19, 2023

/assign

@bart0sh
Copy link
Contributor

bart0sh commented Sep 19, 2023

/triage important-soon
as this is a DRA Beta criteria and DRA is going to be graduated to Beta in 1.29

@k8s-ci-robot
Copy link
Contributor

@bart0sh: The label(s) triage/important-soon cannot be applied, because the repository doesn't have them.

In response to this:

/triage important-soon
as this is a DRA Beta criteria and DRA is going to be graduated to Beta in 1.29

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bart0sh
Copy link
Contributor

bart0sh commented Sep 19, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 19, 2023
@pohly
Copy link
Contributor Author

pohly commented Sep 26, 2023

@bart0sh
Copy link
Contributor

bart0sh commented Sep 26, 2023 via email

@bart0sh
Copy link
Contributor

bart0sh commented Oct 2, 2023

I was able to simulate non-graceful node shutdown by running docker stop <node name>.
It turned out that after adding node.kubernetes.io/out-of-service=nodeshutdown:NoExecute tent, all node pods are deleted by the GC controller and the resource claim controller successfully deallocated the resource.

So, it looks like a non-graceful shutdown is handled correctly and no code changes are required.

However, the fact that Deallocate can potentially be called without calling NodeUnprepareResource should be explained in the DRA KEP, so that DRA plugin authors will be aware of this.

I'll add an explanation of handling non-graceful node shoutdown to the DRA KEP and add a correspondent e2e test case to the DRA e2e test suite.

@bart0sh
Copy link
Contributor

bart0sh commented Oct 3, 2023

KEP update PR: kubernetes/enhancements#4260
E2E PR: #120965

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging a pull request may close this issue.

4 participants