KEP-1948: Adding KEP for allowing deallocate in device plugin API call #1949
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| kep-number: 1948 | ||
| alpha: | ||
| approver: "@deads2k" |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,215 @@ | ||||||
| # KEP-1948: Add Deallocate and PostStopContainer to device plugin API | ||||||
|
|
||||||
| ## Table of Contents | ||||||
|
|
||||||
| <!-- toc --> | ||||||
| - [Release Signoff Checklist](#release-signoff-checklist) | ||||||
| - [Summary](#summary) | ||||||
| - [Motivation](#motivation) | ||||||
| - [Goals](#goals) | ||||||
| - [Non-Goals](#non-goals) | ||||||
| - [Proposal](#proposal) | ||||||
| - [Risks and Mitigations](#risks-and-mitigations) | ||||||
| - [Design Details](#design-details) | ||||||
| - [Test Plan](#test-plan) | ||||||
| - [Graduation Criteria](#graduation-criteria) | ||||||
| - [Alpha -> Beta Graduation](#alpha---beta-graduation) | ||||||
| - [Beta -> GA Graduation](#beta---ga-graduation) | ||||||
| - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) | ||||||
| - [Version Skew Strategy](#version-skew-strategy) | ||||||
| - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) | ||||||
| - [Feature Enablement and Rollback](#feature-enablement-and-rollback) | ||||||
| - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) | ||||||
| - [Monitoring Requirements](#monitoring-requirements) | ||||||
| - [Dependencies](#dependencies) | ||||||
| - [Scalability](#scalability) | ||||||
| - [Troubleshooting](#troubleshooting) | ||||||
| - [Implementation History](#implementation-history) | ||||||
| - [Drawbacks](#drawbacks) | ||||||
| <!-- /toc --> | ||||||
|
|
||||||
| ## Release Signoff Checklist | ||||||
|
|
||||||
| Items marked with (R) are required *prior to targeting to a milestone / release*. | ||||||
|
|
||||||
| - [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) | ||||||
| - [ ] (R) KEP approvers have approved the KEP status as `implementable` | ||||||
| - [ ] (R) Design details are appropriately documented | ||||||
| - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input | ||||||
| - [ ] (R) Graduation criteria is in place | ||||||
| - [ ] (R) Production readiness review completed | ||||||
| - [ ] Production readiness review approved | ||||||
| - [ ] "Implementation History" section is up-to-date for milestone | ||||||
| - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] | ||||||
| - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes | ||||||
|
|
||||||
| [kubernetes.io]: https://kubernetes.io/ | ||||||
| [kubernetes/enhancements]: https://git.k8s.io/enhancements | ||||||
| [kubernetes/kubernetes]: https://git.k8s.io/kubernetes | ||||||
| [kubernetes/website]: https://git.k8s.io/website | ||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| This KEP proposes adding two extra API calls: | ||||||
| - `Deallocate` (optional): the opposite of `Allocate`. It informs device plugins that some devices are no longer being used. | ||||||
| - `PostStopContainer` (optional): allows device plugins to do device cleanup, driver unloading, and any other actions that may be needed. | ||||||
|
|
||||||
|
|
||||||
| Since both additions are optional, existing device plugins should continue functioning properly with no needed modifications. Only device plugins that wish to utilize the new API calls will need to be modified. | ||||||
|
|
||||||
| ## Motivation | ||||||
|
|
||||||
| The following are some use cases and motivations for the proposed change: | ||||||
| - `PostStopContainer`: | ||||||
| - For use with some devices like FPGAs. Devices like these need to be cleaned up (i.e. de-programmed) after each use. Otherwise, two risks arise: | ||||||
| * If whatever is programmed on the FPGA is not cleaned up, it keeps running and consuming power for no reason. At datacenter scale this is unacceptable. | ||||||
| * If whatever is programmed on the FPGA has network access, it may continue to send and respond to packets, polluting the network. | ||||||
| - For dynamically binding/unbinding drivers for the devices as needed. | ||||||
| - `Deallocate`: | ||||||
| - For use with complex device plugins that need to track the state of their devices and learn when they are no longer in use, for example with multi-modal devices. A multi-modal device can operate in more than one mode, so the device plugin has to advertise it as separate devices and stop advertising a device while it is being used in the other mode. An FPGA is one example, usable in the following two modes: | ||||||
| * Use the entire FPGA as a device | ||||||
| * Split the FPGA between multiple users, essentially advertising one FPGA as multiple smaller FPGAs. | ||||||
| * A device plugin needs to know when a full FPGA stops being used so it can go back to advertising the FPGA partitions, and vice versa. | ||||||
| - To maintain the same logical splitting of `Allocate` and `PreStartContainer` | ||||||
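The multi-modal FPGA case above can be sketched as a small state model. This is only an illustration of why a plugin needs a `Deallocate` signal, not code from the KEP: the class `MultiModalFpga`, its methods, and the device ID naming scheme are all hypothetical, and a real plugin would report devices to the kubelet through `ListAndWatch` rather than a plain list.

```python
from dataclasses import dataclass, field


@dataclass
class MultiModalFpga:
    """Hypothetical model of one FPGA offered either whole or as partitions."""
    name: str
    partitions: int
    in_use: set = field(default_factory=set)  # device IDs currently allocated

    def full_id(self) -> str:
        return f"{self.name}-full"

    def partition_ids(self) -> list:
        return [f"{self.name}-part{i}" for i in range(self.partitions)]

    def advertised(self) -> list:
        """Device IDs the plugin would currently offer to the kubelet."""
        if self.full_id() in self.in_use:
            return []  # the whole FPGA is taken, so no partition can be offered
        if self.in_use:
            # Some partitions are taken, so the full device must be hidden.
            return [p for p in self.partition_ids() if p not in self.in_use]
        return [self.full_id()] + self.partition_ids()

    def allocate(self, device_id: str) -> None:
        if device_id not in self.advertised():
            raise ValueError(f"{device_id} is not available")
        self.in_use.add(device_id)

    def deallocate(self, device_id: str) -> None:
        """Handler for the proposed Deallocate call (assumed shape)."""
        self.in_use.discard(device_id)


fpga = MultiModalFpga("fpga0", partitions=2)
fpga.allocate("fpga0-part0")    # the full device disappears from the offer
fpga.deallocate("fpga0-part0")  # the full device can be advertised again
```

Without the `deallocate` step, the plugin in this sketch would have no way to learn that `fpga0-part0` was released, so it could never safely advertise the full device again.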
|
|
||||||
| ### Goals | ||||||
|
|
||||||
| - Add the `PostStopContainer` and `Deallocate` API calls to the device plugin API. | ||||||
| - Make the newly added API calls optional, as they are not needed for all devices. | ||||||
| - Maintain compatibility with existing device plugins. | ||||||
|
|
||||||
| ### Non-Goals | ||||||
|
|
||||||
| - Make any modifications to the main API calls of the device plugin API. | ||||||
| - Make changes specific to one type of device. | ||||||
|
|
||||||
| ## Proposal | ||||||
|
|
||||||
| The device plugin API includes API calls for: | ||||||
| - `Allocate`: instructs device plugins to allocate device(s) to requesting containers. | ||||||
| - `PreStartContainer` (optional): allows device plugins to do device initialization, driver loading, and any other initialization actions that may be needed. | ||||||
|
|
||||||
|
|
||||||
| This KEP proposes adding two extra API calls, maintaining the same logical reasoning of the previous two. Those are: | ||||||
| - `Deallocate` (optional): the opposite of `Allocate`. It informs device plugins that some devices are no longer being used (previously this happened silently). | ||||||
| - `PostStopContainer` (optional): allows device plugins to do device cleanup, driver unloading, and any other cleanup actions that may be needed. | ||||||
|
|
||||||
|
|
||||||
| Since both additions are optional, existing device plugins should continue functioning properly with no needed modifications. Only device plugins that wish to utilize the new API calls will need to be modified. | ||||||
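One way to picture the opt-in behaviour described above is a kubelet-side dispatcher that issues only the calls a plugin asked for. This is a rough sketch under stated assumptions: `pre_start_required` mirrors the existing `PreStartRequired` plugin option, while `post_stop_required`, `deallocate_required`, `RecordingPlugin`, and `on_container_stopped` are hypothetical names that the KEP does not define. The call order (cleanup first, then deallocation) is also an assumption, chosen to mirror `Allocate` followed by `PreStartContainer`.

```python
from dataclasses import dataclass


@dataclass
class PluginOptions:
    pre_start_required: bool = False   # mirrors the existing PreStartRequired option
    post_stop_required: bool = False   # hypothetical flag, not defined by the KEP
    deallocate_required: bool = False  # hypothetical flag, not defined by the KEP


class RecordingPlugin:
    """Stand-in for a device plugin that records the calls it receives."""

    def __init__(self, options: PluginOptions):
        self.options = options
        self.calls = []

    def post_stop_container(self, device_ids):
        self.calls.append(("PostStopContainer", tuple(device_ids)))

    def deallocate(self, device_ids):
        self.calls.append(("Deallocate", tuple(device_ids)))


def on_container_stopped(plugin: RecordingPlugin, device_ids) -> None:
    """Kubelet-side sketch: issue only the calls the plugin opted into."""
    if plugin.options.post_stop_required:
        plugin.post_stop_container(device_ids)
    if plugin.options.deallocate_required:
        plugin.deallocate(device_ids)


legacy = RecordingPlugin(PluginOptions())
on_container_stopped(legacy, ["fpga0"])  # an existing plugin receives nothing new

updated = RecordingPlugin(
    PluginOptions(post_stop_required=True, deallocate_required=True)
)
on_container_stopped(updated, ["fpga0"])  # receives both new calls
```

In this model an unmodified plugin keeps all flags at their defaults and never sees the new calls, which is the compatibility property the proposal relies on.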
|
|
||||||
| ### Risks and Mitigations | ||||||
|
|
||||||
| The only risk is breaking existing device plugins by introducing non-optional changes. This can be mitigated with sufficient test coverage. | ||||||
|
|
||||||
| ## Design Details | ||||||
|
|
||||||
| - Move `PreStartContainer` in `DeviceManager` to be used as a container lifecycle hook. (This change isn't strictly required, but it is useful for compatibility and organization with the next steps.) | ||||||
| - Add `PostStopContainer` and `Deallocate` calls in the DevicePlugin API. | ||||||
| - Add `PostStopContainer` as a container lifecycle hook. | ||||||
| - Add `Deallocate` calls in container manager, taking care to only do so for devices that are no longer in the reuse list. | ||||||
|
Contributor
This will make it asymmetrical with the […] If we do go this route, I'd say we also need to do this for […] I'm wondering if the right answer here is actually to keep both of these symmetric with their […] I imagine the plugin to then do something different depending on which type of container the call is coming in for.
Author
@bart0sh I agree, will update this as soon as I can.
Contributor
@mewais any updates? I can help if needed.
Contributor
Does anyone know if there is any precedent for this, i.e. specifying the container-type (init vs. app) over a gRPC call? It would be good to keep already-accepted semantics if so.
And maybe we have other approaches to resolve the difference between […] Just for an example, we may have two 'post-stop' interfaces, […]
@klueska friendly ping, could you check the previous comments from @Windrow14 ?
👍 this makes sense to me to avoid confusion and complication. @mewais is not responding, how can we proceed this?
Contributor
You can create your own PR and continue with it I guess.
@FengGaoCSC @Windrow14 could you create PR with […] |
||||||
| - Add and modify test cases for both calls. | ||||||
| - Test with existing device plugins to ensure the changes are non-breaking. | ||||||
| - Test with new device plugins utilizing such changes to ensure the changes are working. | ||||||
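The reuse-list condition in the steps above amounts to a set difference. The following is a minimal sketch, assuming the kubelet tracks a list of device IDs that it still intends to hand to later containers in the same pod. The function name `devices_to_deallocate` and the device IDs are hypothetical and do not come from the kubelet source.

```python
def devices_to_deallocate(released, reuse_list):
    """Return the device IDs that are safe to hand back to the plugin.

    released:   IDs held by the container that just terminated
    reuse_list: IDs the kubelet still intends to give to later containers
                in the same pod (for example, app containers that reuse
                an init container's devices)
    """
    return sorted(set(released) - set(reuse_list))


# Illustrative data: an init container finishes, but gpu0 is still queued for reuse.
after_init = devices_to_deallocate(["gpu0", "gpu1"], reuse_list=["gpu0"])

# The app container finishes and nothing is queued for reuse.
after_app = devices_to_deallocate(["gpu0"], reuse_list=[])
```

Under these assumptions, `Deallocate` would be sent for `gpu1` after the init container and for `gpu0` only once the app container has stopped, so a plugin never sees a device released while another container in the pod is about to use it.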
|
|
||||||
| ### Test Plan | ||||||
|
|
||||||
| - Unit tests will be updated to include the new API calls | ||||||
| - E2E tests should be added with a sample device plugin for verification. | ||||||
|
|
||||||
| ### Graduation Criteria | ||||||
|
|
||||||
| #### Alpha -> Beta Graduation | ||||||
|
|
||||||
| - Gather feedback from developers and surveys, especially about device reuse and possible alternatives. | ||||||
| - Complete implementation for the new API calls. | ||||||
| - Tests are in Testgrid and linked in KEP. | ||||||
|
|
||||||
| #### Beta -> GA Graduation | ||||||
|
|
||||||
| - More rigorous testing as needed/discussed by developers. | ||||||
| - Larger scale use/testing by interested users with no reported major bugs. | ||||||
|
|
||||||
| ### Upgrade / Downgrade Strategy | ||||||
|
|
||||||
| As part of the device plugin API, this will follow the same API versioning system. This means that it is up to an application (a device plugin) to choose the API version it requires. As long as the cluster runs a Kubernetes version recent enough to include the required API version, upgrades and downgrades require no cluster modifications at all and are decided at the application level. | ||||||
|
|
||||||
| ### Version Skew Strategy | ||||||
|
|
||||||
| As part of the device plugin API, this will follow the same API versioning system. This means that it is up to an application (a device plugin) to choose the required API version it wants. No version skew issues will arise. | ||||||
|
|
||||||
| ## Production Readiness Review Questionnaire | ||||||
|
|
||||||
| ### Feature Enablement and Rollback | ||||||
|
|
||||||
| * **How can this feature be enabled / disabled in a live cluster?** | ||||||
| - The API versioning mechanism can be used to enable/disable this feature per application as needed. The feature itself only adds optional API calls, so enabling/disabling it is not really required. | ||||||
| - No downtime required for enabling/disabling this feature. | ||||||
|
|
||||||
| * **Does enabling the feature change any default behavior?** | ||||||
| No. The feature is optional, no default behavior will change. | ||||||
|
|
||||||
| * **Can the feature be disabled once it has been enabled (i.e. can we roll back | ||||||
| the enablement)?** | ||||||
| Yes. The feature is optional, so not using it should suffice. Fully disabling/rolling back | ||||||
| can be done by simply using an older version of the API from the application side. | ||||||
|
|
||||||
| * **What happens if we reenable the feature if it was previously rolled back?** | ||||||
| No side effects expected. | ||||||
|
|
||||||
| * **Are there any tests for feature enablement/disablement?** | ||||||
| No | ||||||
|
|
||||||
| ### Rollout, Upgrade and Rollback Planning | ||||||
|
|
||||||
| _This section must be completed when targeting beta graduation to a release._ | ||||||
|
|
||||||
| * **How can a rollout fail? Can it impact already running workloads?** | ||||||
| No effect on already running workloads. Feature has to be specifically enabled/requested | ||||||
| from the application side. | ||||||
|
|
||||||
| * **Is the rollout accompanied by any deprecations and/or removals of features, APIs, | ||||||
| fields of API types, flags, etc.?** | ||||||
| No | ||||||
|
|
||||||
| ### Monitoring Requirements | ||||||
|
|
||||||
| * **How can an operator determine if the feature is in use by workloads?** | ||||||
| Currently, as far as I know, there's no way to monitor device plugin API calls (and | ||||||
| whether or not devices are in use) except by checking logs from the kubelet itself and/or | ||||||
| user-created device plugins. | ||||||
|
|
||||||
| * **What are the SLIs (Service Level Indicators) an operator can use to determine | ||||||
| the health of the service?** | ||||||
| N/A (needs checking) | ||||||
|
|
||||||
| * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** | ||||||
| N/A | ||||||
|
|
||||||
| * **Are there any missing metrics that would be useful to have to improve observability | ||||||
| of this feature?** | ||||||
| N/A | ||||||
|
|
||||||
| ### Dependencies | ||||||
|
|
||||||
| * **Does this feature depend on any specific services running in the cluster?** | ||||||
| No | ||||||
|
|
||||||
| ### Scalability | ||||||
|
|
||||||
| * **Will enabling / using this feature result in any new API calls?** | ||||||
| The mere enablement of this feature has no effect. However, using this API by user | ||||||
| applications may result in extra calls to the Device Manager API. These extra calls | ||||||
| only happen at the end of the lifecycle of some containers (those that have previously | ||||||
| requested devices). | ||||||
| The extra API calls originate from the kubelet to the device plugin running on the | ||||||
| same node. Since they are only happening on the node level, there's no risk of | ||||||
| congestion, or a need to measure their throughput, etc. | ||||||
|
|
||||||
| ### Troubleshooting | ||||||
|
|
||||||
| Failures can only be detected by using test/mock device plugins and checking the kubelet logs. Extra tests according to the test plan mentioned above should help mitigate issues. | ||||||
|
|
||||||
| ## Implementation History | ||||||
|
|
||||||
| A possible fix has already been submitted as a PR: https://github.com/kubernetes/kubernetes/pull/91190 (outdated, needs rebase) | ||||||
|
|
||||||
| ## Drawbacks | ||||||
|
|
||||||
| None. As this is an optional feature, it can simply be ignored when not needed. | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| title: Add Deallocate and PostStopContainer to device plugin API | ||
| kep-number: 1948 | ||
| authors: | ||
| - "@mewais" | ||
| owning-sig: sig-node | ||
| participating-sigs: [] | ||
| status: implementable | ||
| creation-date: 2020-08-18 | ||
|
|
||
| reviewers: | ||
| - TBD | ||
| approvers: | ||
| - TBD | ||
| prr-approvers: | ||
| - "@deads2k" | ||
| see-also: [] | ||
| replaces: [] | ||
|
|
||
| # The target maturity stage in the current dev cycle for this KEP. | ||
| stage: alpha | ||
|
|
||
| # The milestone at which this feature was, or is targeted to be, at each stage. | ||
| latest-milestone: "v1.23" | ||
| milestone: | ||
| alpha: "v1.23" | ||
| beta: "v1.24" | ||
| stable: "v1.25" |