Add DisableAcceleratorUsageMetrics Feature Gate #91930
/sig-node |
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
93ecdea to fe462d5
0441f3b to b12cd14
/test pull-kubernetes-e2e-kind-ipv6 |
/remove-kind api-change |
/retest |
/lgtm |
/kind api-change |
/lgtm will move back in milestone if sig-release approves. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dchen1107, derekwaynecarr, RenaudWasTaken The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
e2018ca to 34dc785
/lgtm |
/retest |
/retest |
/lgtm again! |
Since the exception request was approved and the PR is already LGTM'ed and Approved by the owning SIG, adding the PR back into the v1.19 milestone. /milestone v1.19 |
/retest |
/retest |
/retest |
/retest Review the full test history for this PR. Silence the bot with an |
/retest Review the full test history for this PR. Silence the bot with an |
/retest |
/retest |
/retest
…On Wed, Jul 22, 2020 at 7:04 PM Kubernetes Prow Robot < ***@***.***> wrote:
@RenaudWasTaken: The following tests *failed*, say /retest to rerun all failed tests:
Test name | Commit | Details | Rerun command
pull-kubernetes-kubemark-e2e-gce-big | 34dc785 | link <https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/91930/pull-kubernetes-kubemark-e2e-gce-big/1285900026907725826/> | /test pull-kubernetes-kubemark-e2e-gce-big
pull-kubernetes-e2e-gce-100-performance | 34dc785 | link <https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/91930/pull-kubernetes-e2e-gce-100-performance/1285900026815451136/> | /test pull-kubernetes-e2e-gce-100-performance
|
/retest |
/retest
…On Wed, Jul 22, 2020 at 8:53 PM Kubernetes Prow Robot < ***@***.***> wrote:
@RenaudWasTaken: The following test *failed*, say /retest to rerun all failed tests:
Test name | Commit | Details | Rerun command
pull-kubernetes-typecheck | 34dc785 | link <https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/91930/pull-kubernetes-typecheck/1285956146762354688> | /test pull-kubernetes-typecheck
|
It merged! 🚀 |
Wooooooooooooooooooooo! Thanks for running the retest command 😛! |
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
What type of PR is this?
/kind feature
What this PR does / why we need it: Adds the "DisableAcceleratorUsageMetrics" Feature Gate.
TLDR: Kubelet collects GPU metrics when it sees that the NVIDIA driver is present on the node.
Kubelet should no longer be collecting metrics from devices. The expected path is to use the pod resources API and have device vendors expose metrics through their own metrics container. The in-kubelet collection path exists only for legacy reasons, and we are deprecating it.
Furthermore, because Kubelet holds an open handle on the NVIDIA driver, infrastructure operations on the driver (e.g., removing or updating it) are blocked. In other words, no action on the NVIDIA driver can be taken without killing the kubelet.
See google/cadvisor#2574 for more details
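For illustration only, here is a minimal sketch of how a gate like this could guard accelerator metrics collection. It is not the code in this PR: the `featureGates` type, the `includedMetrics` helper, and the metric-set names are hypothetical stand-ins, and only the gate name is taken from the description above.

```go
package main

import "fmt"

// Gate name as given in the PR title; everything else here is hypothetical.
const DisableAcceleratorUsageMetrics = "DisableAcceleratorUsageMetrics"

// featureGates is a simplified stand-in for a feature-gate registry.
// A gate that is absent from the map is treated as disabled.
type featureGates map[string]bool

// Enabled reports whether the named gate is switched on.
func (g featureGates) Enabled(name string) bool { return g[name] }

// includedMetrics returns the metric sets a collector would gather.
// Accelerator metrics are dropped when the gate is enabled.
func includedMetrics(g featureGates) []string {
	metrics := []string{"cpu", "memory"}
	if !g.Enabled(DisableAcceleratorUsageMetrics) {
		metrics = append(metrics, "accelerator")
	}
	return metrics
}

func main() {
	// Gate off: accelerator metrics are still collected.
	fmt.Println(includedMetrics(featureGates{}))
	// Gate on: accelerator metrics are skipped.
	fmt.Println(includedMetrics(featureGates{DisableAcceleratorUsageMetrics: true}))
}
```

On a real node the gate would presumably be switched on through the kubelet's standard `--feature-gates` flag, e.g. `--feature-gates=DisableAcceleratorUsageMetrics=true`.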
Special notes for your reviewer:
This change does not remove the handle cadvisor has on the NVIDIA driver. To do so we will also need to wait for cadvisor to cut a new release and re-vendor cadvisor (so that we have the following PR in: google/cadvisor#2574).
/cc @dashpole
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
None, does this require a KEP / Doc / ... ?