proposal to expose deviceplugin info #68650

Open
u2takey opened this Issue Sep 14, 2018 · 1 comment

u2takey commented Sep 14, 2018

As a user, I sometimes want to know which GPU IDs were allocated to a container. This information is available as an environment variable inside the container, but it cannot be obtained by inspecting the pod YAML.
So I propose exposing the allocated device plugin information as Prometheus metrics.
Currently, device_plugin_registration_count and device_plugin_alloc_latency_microseconds are defined at https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/metrics/metrics.go#L52

Maybe we can add a metric called device_plugin_alloc_resources, which exposes the same data that is saved in the checkpoint:

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/devicemanager/manager.go#L88

| Metric name | Metric type | Labels/tags |
| --- | --- | --- |
| device_plugin_alloc_resources | Gauge | pod=<pod-name>, container=<container-name>, resource=<resource-name>, deviceids=<deviceids> |
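
To make the proposed label set concrete, here is a rough sketch of how one sample of such a gauge might look in the Prometheus text exposition format. It uses only the Go standard library rather than the kubelet's actual metrics code. The `allocation` struct, the `formatAllocMetric` helper, the example pod and device names, and the choice of the device count as the gauge value are all assumptions made for illustration; the proposal above does not specify them.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allocation is a hypothetical record of the devices handed to one container.
// It is loosely modeled on the idea of a checkpoint entry, not on the real type.
type allocation struct {
	Pod       string
	Container string
	Resource  string
	DeviceIDs []string
}

// formatAllocMetric renders one sample line for the proposed gauge.
// Device IDs are sorted and joined so the label value is stable across calls.
// The sample value (number of devices) is an assumption, not part of the proposal.
// %q quoting is adequate for plain ASCII label values like the ones used here.
func formatAllocMetric(a allocation) string {
	ids := append([]string(nil), a.DeviceIDs...) // copy so the caller's slice is not reordered
	sort.Strings(ids)
	return fmt.Sprintf(
		"device_plugin_alloc_resources{pod=%q,container=%q,resource=%q,deviceids=%q} %d",
		a.Pod, a.Container, a.Resource, strings.Join(ids, ","), len(ids),
	)
}

func main() {
	a := allocation{
		Pod:       "train-0",
		Container: "cuda",
		Resource:  "nvidia.com/gpu",
		DeviceIDs: []string{"GPU-b", "GPU-a"},
	}
	fmt.Println(formatAllocMetric(a))
}
```

One trade-off worth noting with this shape: putting the device IDs in a label means every distinct allocation produces a new time series, so the series count grows with pod churn.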
u2takey commented Sep 14, 2018

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node and removed needs-sig labels Sep 14, 2018
