Merge pull request kubernetes#2454 from vikaschoudhary16/compute-devices
KEP: Support Device Monitoring
k8s-ci-robot committed Nov 12, 2018
2 parents 44ae1a3 + 3e0e0dc commit 1e792ec
---
kep-number: 18
title: Kubelet endpoint for device assignment observation details
authors:
- "@dashpole"
- "@vikaschoudhary16"
owning-sig: sig-node
reviewers:
- "@thockin"
- "@derekwaynecarr"
- "@dchen1107"
- "@vishh"
approvers:
- "@sig-node-leads"
editors:
- "@dashpole"
- "@vikaschoudhary16"
creation-date: "2018-07-19"
last-updated: "2018-07-19"
status: provisional
---
# Kubelet endpoint for device assignment observation details

Table of Contents
=================
* [Abstract](#abstract)
* [Background](#background)
* [Objectives](#objectives)
* [User Journeys](#user-journeys)
* [Device Monitoring Agents](#device-monitoring-agents)
* [Changes](#changes)
   * [Potential Future Improvements](#potential-future-improvements)
* [Alternatives Considered](#alternatives-considered)

## Abstract
In this document we discuss the motivation for, and the code changes required to introduce, a kubelet endpoint that exposes device-to-container bindings.

## Background
[Device Monitoring](https://docs.google.com/document/d/1NYnqw-HDQ6Y3L_mk85Q3wkxDtGNWTxpsedsgw4NgWpg/edit?usp=sharing) requires external agents to be able to determine the set of devices in-use by containers and attach pod and container metadata for these devices.

## Objectives

* To remove current device-specific knowledge from the kubelet, such as [accelerator metrics](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/stats/v1alpha1/types.go#L229)
* To enable future use-cases requiring device-specific knowledge to be out-of-tree

## User Journeys

### Device Monitoring Agents

* As a _Cluster Administrator_, I provide a set of devices from various vendors in my cluster. Each vendor independently maintains their own agent, so I run monitoring agents only for devices I provide. Each agent adheres to the [node monitoring guidelines](https://docs.google.com/document/d/1_CdNWIjPBqVDMvu82aJICQsSCbh2BR-y9a8uXjQm4TI/edit?usp=sharing), so I can use a compatible monitoring pipeline to collect and analyze metrics from a variety of agents, even though they are maintained by different vendors.
* As a _Device Vendor_, I manufacture devices and I have deep domain expertise in how to run and monitor them. Because I maintain my own Device Plugin implementation, as well as Device Monitoring Agent, I can provide consumers of my devices an easy way to consume and monitor my devices without requiring open-source contributions. The Device Monitoring Agent doesn't have any dependencies on the Device Plugin, so I can decouple monitoring from device lifecycle management. My Device Monitoring Agent works by periodically querying the `/devices/<ResourceName>` endpoint to discover which devices are being used, and to get the container/pod metadata associated with the metrics:

![device monitoring architecture](https://user-images.githubusercontent.com/3262098/43926483-44331496-9bdf-11e8-82a0-14b47583b103.png)


## Changes

Add a v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns information about the kubelet's assignment of devices to containers. It obtains this information from the internal state of the kubelet's Device Manager. The GRPC service returns a single ListPodResourcesResponse, which is shown in proto below:
```protobuf
// PodResources is a service provided by the kubelet that provides information about the
// node resources consumed by pods and containers on the node
service PodResources {
    rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
}

// ListPodResourcesRequest is the request made to the PodResources service
message ListPodResourcesRequest {}

// ListPodResourcesResponse is the response returned by List function
message ListPodResourcesResponse {
    repeated PodResources pod_resources = 1;
}

// PodResources contains information about the node resources assigned to a pod
message PodResources {
    string name = 1;
    string namespace = 2;
    repeated ContainerResources containers = 3;
}

// ContainerResources contains information about the resources assigned to a container
message ContainerResources {
    string name = 1;
    repeated ContainerDevices devices = 2;
}

// ContainerDevices contains information about the devices assigned to a container
message ContainerDevices {
    string resource_name = 1;
    repeated string device_ids = 2;
}
```
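To illustrate how a monitoring agent might consume a `List` response, here is a minimal sketch in Go. The proto messages are mirrored as hand-written plain structs for illustration (they are not the protoc-generated types, and `deviceOwners` is a hypothetical helper, not part of this proposal); the sketch inverts the response into the device-to-pod lookup an agent needs in order to attach pod and container metadata to per-device metrics:

```go
package main

import "fmt"

// ContainerDevices, ContainerResources, and PodResources mirror the proto
// messages above as plain Go structs (illustrative, not generated code).
type ContainerDevices struct {
	ResourceName string
	DeviceIDs    []string
}

type ContainerResources struct {
	Name    string
	Devices []ContainerDevices
}

type PodResources struct {
	Name       string
	Namespace  string
	Containers []ContainerResources
}

// deviceOwners inverts a List response into a map from
// "resourceName/deviceID" to "namespace/pod/container", the lookup a
// monitoring agent performs when labeling per-device metrics.
func deviceOwners(pods []PodResources) map[string]string {
	owners := map[string]string{}
	for _, p := range pods {
		for _, c := range p.Containers {
			for _, d := range c.Devices {
				for _, id := range d.DeviceIDs {
					owners[d.ResourceName+"/"+id] = p.Namespace + "/" + p.Name + "/" + c.Name
				}
			}
		}
	}
	return owners
}

func main() {
	pods := []PodResources{{
		Name:      "trainer-0",
		Namespace: "ml",
		Containers: []ContainerResources{{
			Name:    "main",
			Devices: []ContainerDevices{{ResourceName: "nvidia.com/gpu", DeviceIDs: []string{"GPU-0"}}},
		}},
	}}
	fmt.Println(deviceOwners(pods)["nvidia.com/gpu/GPU-0"]) // → ml/trainer-0/main
}
```

In a real agent the input would come from calling `List` over the unix socket at `/var/lib/kubelet/pod-resources/kubelet.sock` with a gRPC client generated from the proto above.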

### Potential Future Improvements

* Add `ListAndWatch()` function to the GRPC endpoint so monitoring agents don't need to poll.
* Add identifiers for other resources used by pods to the `PodResources` message.
* For example, persistent volume location on disk
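For concreteness, one possible shape for the `ListAndWatch()` improvement is sketched below. This is hypothetical and not part of the proposal; the rpc name and streaming semantics (send the full assignment set on connect, then a new response whenever assignments change) are assumptions:

```protobuf
service PodResources {
    rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
    // ListAndWatch returns the current assignments immediately, then streams
    // an updated ListPodResourcesResponse whenever assignments change,
    // removing the need for agents to poll.
    rpc ListAndWatch(ListPodResourcesRequest) returns (stream ListPodResourcesResponse) {}
}
```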

## Alternatives Considered

### Add v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns a list of [CreateContainerRequest](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto#L734)s used to create containers.
* Pros:
* Reuse an existing API for describing containers rather than inventing a new one
* Cons:
* It ties the endpoint to the CreateContainerRequest, and may prevent us from adding other information we want in the future
* It does not contain any additional information that will be useful to monitoring agents other than devices, and contains lots of irrelevant information for this use-case.
* Notes:
* Does not include any reference to resource names. Monitoring agents must identify devices by the device paths or environment variables passed to the pod or container.

### Add a field to Pod Status.
* Pros:
* Allows for observation of container to device bindings local to the node through the `/pods` endpoint
* Cons:
* Only consumed locally, which doesn't justify an API change
* Device Bindings are immutable after allocation, and are _debatably_ observable (they can be "observed" from the local checkpoint file). Device bindings are generally a poor fit for status.

### Use the Kubelet Device Manager Checkpoint file
* Allows for observability of device to container bindings through what exists in the checkpoint file
* Requires adding additional metadata to the checkpoint file as required by the monitoring agent
* Requires implementing versioning for the checkpoint file, and handling version skew between readers and the kubelet
* Future modifications to the checkpoint file are more difficult.

### Add a field to the Pod Spec:
* A new object `ComputeDevice` will be defined and a new variable `ComputeDevices` will be added in the `Container` (Spec) object which will represent a list of `ComputeDevice` objects.
```go
// ComputeDevice describes the devices assigned to this container for a given ResourceName
type ComputeDevice struct {
    // DeviceIDs is the list of devices assigned to this container
    DeviceIDs []string
    // ResourceName is the name of the compute resource
    ResourceName string
}

// Container represents a single container that is expected to be run on the host.
type Container struct {
    ...
    // ComputeDevices contains the devices assigned to this container
    // This field is alpha-level and is only honored by servers that enable the ComputeDevices feature.
    // +optional
    ComputeDevices []ComputeDevice
    ...
}
```
* During Kubelet pod admission, if `ComputeDevices` is found non-empty, the specified devices will be allocated; otherwise the behaviour will remain the same as it is today.
* Before starting the pod, the kubelet writes the assigned `ComputeDevices` back to the pod spec.
* Note: Writing to the API server and waiting to observe the updated pod spec in the kubelet's pod watch may add significant latency to pod startup.
* Allows devices to potentially be assigned by a custom scheduler.
* Serves as a permanent record of device assignments for the kubelet, and eliminates the need for the kubelet to maintain this state locally.
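The admission-time branch described above can be sketched as follows. This is a hypothetical illustration, not kubelet code: `allocateForContainer` and the `pickDevices` callback (standing in for the existing device-manager allocation path) are invented names:

```go
package main

import "fmt"

// ComputeDevice mirrors the struct proposed above.
type ComputeDevice struct {
	DeviceIDs    []string
	ResourceName string
}

// allocateForContainer sketches pod-admission behavior under this
// alternative: if the spec already names devices in ComputeDevices, honor
// them; otherwise fall back to today's kubelet-driven allocation,
// represented here by the pickDevices callback (hypothetical helper).
func allocateForContainer(requested []ComputeDevice, pickDevices func() []ComputeDevice) []ComputeDevice {
	if len(requested) > 0 {
		return requested // devices were assigned externally, e.g. by a custom scheduler
	}
	return pickDevices() // unchanged behavior: kubelet chooses the devices
}

func main() {
	fromScheduler := []ComputeDevice{{ResourceName: "nvidia.com/gpu", DeviceIDs: []string{"GPU-3"}}}
	got := allocateForContainer(fromScheduler, func() []ComputeDevice { return nil })
	fmt.Println(got[0].DeviceIDs[0]) // → GPU-3
}
```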
