|
| 1 | +Custom Metrics API |
| 2 | +================== |
| 3 | + |
| 4 | +The new [metrics monitoring vision](monitoring_architecture.md) proposes |
| 5 | +an API that the Horizontal Pod Autoscaler can use to access arbitrary |
| 6 | +metrics. |
| 7 | + |
| 8 | +Similarly to the [master metrics API](resource-metrics-api.md), the new |
| 9 | +API should be structured around accessing metrics by referring to |
| 10 | +Kubernetes objects (or groups thereof) and a metric name. For this |
| 11 | +reason, the API could be useful for other consumers (most likely |
| 12 | +controllers) that want to consume custom metrics (similarly to how the |
| 13 | +master metrics API is generally useful to multiple cluster components). |
| 14 | + |
| 15 | +The HPA can refer to metrics describing all pods matching a label |
| 16 | +selector, as well as an arbitrary named object. |
| 17 | + |
| 18 | +API Paths |
| 19 | +--------- |
| 20 | + |
| 21 | +The root API path will look like `/apis/custom-metrics/v1alpha1`. For |
| 22 | +brevity, this will be left off below. |
| 23 | + |
| 24 | +- `/{object-type}/{object-name}/{metric-name...}`: retrieve the given |
| 25 | + metric for the given non-namespaced object (e.g. Node, PersistentVolume) |
| 26 | + |
| 27 | +- `/{object-type}/*/{metric-name...}`: retrieve the given metric for all |
| 28 | + non-namespaced objects of the given type |
| 29 | + |
| 30 | +- `/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the |
| 31 | + given metric for all non-namespaced objects of the given type matching |
| 32 | + the given label selector |
| 33 | + |
| 34 | +- `/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}`: |
| 35 | + retrieve the given metric for the given namespaced object |
| 36 | + |
| 37 | +- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}`: retrieve the given metric for all |
| 38 | + namespaced objects of the given type |
| 39 | + |
| 40 | +- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the given |
| 41 | + metric for all namespaced objects of the given type matching the |
| 42 | + given label selector |
| 43 | + |
| 44 | +- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given |
| 45 | + metric which describes the given namespace. |
| 46 | + |
| 47 | +For example, to retrieve the custom metric "hits-per-second" for all |
| 48 | +ingress objects matching "app=frontend" in the namespace "webapp", the |
| 49 | +request might look like: |
| 50 | + |
| 51 | +``` |
| 52 | +GET /apis/custom-metrics/v1alpha1/namespaces/webapp/ingress.extensions/*/hits-per-second?labelSelector=app%3Dfrontend |
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +Verb: GET |
| 57 | +Namespace: webapp |
| 58 | +APIGroup: custom-metrics |
| 59 | +APIVersion: v1alpha1 |
| 60 | +Resource: ingress.extensions |
| 61 | +Subresource: hits-per-second |
| 62 | +Name: ResourceAll(*) |
| 63 | +``` |
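
The path templates and the example request above can be assembled programmatically. The following is a minimal sketch, not part of the proposal: `metricPath` is a hypothetical helper, and the `nodes` / `cpu-temperature` values in `main` are made-up inputs used only for illustration. It assumes an empty namespace means "non-namespaced object" and that an object name of `*` selects all objects of the type.

```go
package main

import (
	"fmt"
	"net/url"
)

// rootPath is the API root given in this proposal.
const rootPath = "/apis/custom-metrics/v1alpha1"

// metricPath is a hypothetical helper that fills in the path templates
// listed above. An empty namespace selects the non-namespaced form, and an
// empty labelSelector omits the query string.
func metricPath(namespace, objectType, objectName, metricName, labelSelector string) string {
	p := rootPath
	if namespace != "" {
		p += "/namespaces/" + namespace
	}
	p += "/" + objectType + "/" + objectName + "/" + metricName
	if labelSelector != "" {
		// QueryEscape turns "app=frontend" into "app%3Dfrontend".
		p += "?labelSelector=" + url.QueryEscape(labelSelector)
	}
	return p
}

func main() {
	// The namespaced example from the text.
	fmt.Println(metricPath("webapp", "ingress.extensions", "*", "hits-per-second", "app=frontend"))
	// A non-namespaced object (illustrative names only).
	fmt.Println(metricPath("", "nodes", "node-1", "cpu-temperature", ""))
}
```

The first call reproduces the example request path shown above, including the escaped label selector.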
| 64 | + |
| 65 | +Notice that getting metrics which describe a namespace follows a slightly |
| 66 | +different pattern from other resources. Since namespaces cannot feasibly |
| 67 | +have unbounded subresource names (due to collision with resource names, |
| 68 | +etc.), we introduce a pseudo-resource named "metrics", which represents |
| 69 | +metrics describing namespaces, where the resource name is the metric name: |
| 70 | + |
| 71 | +``` |
| 72 | +GET /apis/custom-metrics/v1alpha1/namespaces/webapp/metrics/queue-length |
| 73 | + |
| 74 | +--- |
| 75 | + |
| 76 | +Verb: GET |
| 77 | +Namespace: webapp |
| 78 | +APIGroup: custom-metrics |
| 79 | +APIVersion: v1alpha1 |
| 80 | +Resource: metrics |
| 81 | +Name: queue-length |
| 82 | +``` |
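
To make the two breakdowns above concrete, here is a rough sketch of how a request path could be split into the same attributes. It is only an illustration under stated assumptions: `requestAttrs` and `parseMetricPath` are hypothetical names, the query string is assumed to have been stripped already, and metric names are assumed to occupy a single path segment.

```go
package main

import (
	"fmt"
	"strings"
)

// requestAttrs mirrors the fields shown in the breakdowns above.
type requestAttrs struct {
	Namespace, Resource, Subresource, Name string
}

const rootPath = "/apis/custom-metrics/v1alpha1/"

// parseMetricPath is a hypothetical helper that maps a request path (without
// its query string) onto the resource, name, and subresource it addresses.
func parseMetricPath(path string) (requestAttrs, error) {
	if !strings.HasPrefix(path, rootPath) {
		return requestAttrs{}, fmt.Errorf("path %q is outside %s", path, rootPath)
	}
	parts := strings.Split(strings.TrimPrefix(path, rootPath), "/")
	var a requestAttrs
	if len(parts) >= 2 && parts[0] == "namespaces" {
		a.Namespace = parts[1]
		parts = parts[2:]
	}
	switch {
	case a.Namespace != "" && len(parts) == 2 && parts[0] == "metrics":
		// Namespace metrics: the resource name is the metric name.
		a.Resource, a.Name = "metrics", parts[1]
	case len(parts) == 3:
		// Object metrics: the metric is a subresource of the object.
		a.Resource, a.Name, a.Subresource = parts[0], parts[1], parts[2]
	default:
		return requestAttrs{}, fmt.Errorf("unrecognized metric path %q", path)
	}
	return a, nil
}

func main() {
	for _, p := range []string{
		rootPath + "namespaces/webapp/ingress.extensions/*/hits-per-second",
		rootPath + "namespaces/webapp/metrics/queue-length",
	} {
		a, err := parseMetricPath(p)
		fmt.Printf("%+v %v\n", a, err)
	}
}
```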
| 83 | + |
| 84 | +API Path Design, Discovery, and Authorization |
| 85 | +--------------------------------------------- |
| 86 | + |
| 87 | +The API paths in this proposal are designed to a) resemble normal |
| 88 | +Kubernetes APIs, b) facilitate writing authorization rules, and c) |
| 89 | +allow for discovery. |
| 90 | + |
| 91 | +Since the API structure follows the same structure as other Kubernetes |
| 92 | +APIs, it allows for fine-grained control over access to metrics. Access |
| 93 | +can be controlled on a per-metric basis (each metric is a subresource, so |
| 94 | +metrics may be whitelisted by allowing access to a particular |
| 95 | +resource-subresource pair), or granted in general for a namespace (by |
| 96 | +allowing access to any resource in the `custom-metrics` API group). |
| 97 | + |
| 98 | +Similarly, since metrics are simply subresources, a normal Kubernetes API |
| 99 | +discovery document can be published by the adapter's API server, allowing |
| 100 | +clients to discover the available metrics. |
| 101 | + |
| 102 | +Note that we introduce the syntax of having a name of ` * ` here since |
| 103 | +there is no current syntax for getting the output of a subresource on |
| 104 | +multiple objects. |
| 105 | + |
| 106 | +API Objects |
| 107 | +----------- |
| 108 | + |
| 109 | +The request URLs listed above will return the `MetricValueList` type described |
| 110 | +below (when a name is given that is not ` * `, the API should simply return a |
| 111 | +list with a single element): |
| 112 | + |
| 113 | +```go |
| 114 | + |
| 115 | +// a list of values for a given metric for some set of objects |
| 116 | +type MetricValueList struct { |
| 117 | +    metav1.TypeMeta `json:",inline"` |
| 118 | +    metav1.ListMeta `json:"metadata,omitempty"` |
| 119 | + |
| 120 | +    // the value of the metric across the described objects |
| 121 | +    Items []MetricValue `json:"items"` |
| 122 | +} |
| 123 | + |
| 124 | +// a metric value for some object |
| 125 | +type MetricValue struct { |
| 126 | +    metav1.TypeMeta `json:",inline"` |
| 127 | + |
| 128 | +    // a reference to the described object |
| 129 | +    DescribedObject ObjectReference `json:"describedObject"` |
| 130 | + |
| 131 | +    // the name of the metric |
| 132 | +    MetricName string `json:"metricName"` |
| 133 | + |
| 134 | +    // indicates the time at which the metrics were produced |
| 135 | +    Timestamp metav1.Time `json:"timestamp"` |
| 136 | + |
| 137 | +    // indicates the window ([Timestamp-Window, Timestamp]) from |
| 138 | +    // which these metrics were calculated, when returning rate |
| 139 | +    // metrics calculated from cumulative metrics (or zero for |
| 140 | +    // non-calculated instantaneous metrics). |
| 141 | +    WindowSeconds int64 `json:"window"` |
| 142 | + |
| 143 | +    // the value of the metric for this object |
| 144 | +    Value resource.Quantity `json:"value"` |
| 145 | +} |
| 146 | +``` |
| 147 | + |
| 148 | +For instance, the example request above would yield the following object: |
| 149 | + |
| 150 | +```json |
| 151 | +{ |
| 152 | +  "kind": "MetricValueList", |
| 153 | +  "apiVersion": "custom-metrics/v1alpha1", |
| 154 | +  "items": [ |
| 155 | +    { |
| 156 | +      "metricName": "hits-per-second", |
| 157 | +      "describedObject": { |
| 158 | +        "kind": "Ingress", |
| 159 | +        "apiVersion": "extensions", |
| 160 | +        "name": "server1", |
| 161 | +        "namespace": "webapp" |
| 162 | +      }, |
| 163 | +      "timestamp": SOME_TIMESTAMP_HERE, |
| 164 | +      "window": 10, |
| 165 | +      "value": "10" |
| 166 | +    }, |
| 167 | +    { |
| 168 | +      "metricName": "hits-per-second", |
| 169 | +      "describedObject": { |
| 170 | +        "kind": "Ingress", |
| 171 | +        "apiVersion": "extensions", |
| 172 | +        "name": "server2", |
| 173 | +        "namespace": "webapp" |
| 174 | +      }, |
| 175 | +      "timestamp": ANOTHER_TIMESTAMP_HERE, |
| 176 | +      "window": 10, |
| 177 | +      "value": "15" |
| 178 | +    } |
| 179 | +  ] |
| 180 | +} |
| 181 | +``` |
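
A consumer might decode a response of this shape roughly as follows. This is a simplified sketch, not the real client: the lowercase types are hypothetical local stand-ins that cover only a subset of the fields, the value is kept as a plain string rather than a `resource.Quantity`, and the sample payload omits the timestamp placeholders so that it is valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// objectReference is a local stand-in for the ObjectReference type.
type objectReference struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
	Name       string `json:"name"`
	Namespace  string `json:"namespace"`
}

// metricValue holds a subset of the MetricValue fields described above.
type metricValue struct {
	DescribedObject objectReference `json:"describedObject"`
	MetricName      string          `json:"metricName"`
	Value           string          `json:"value"` // quantity kept as its string form
}

// metricValueList holds a subset of the MetricValueList fields.
type metricValueList struct {
	Kind  string        `json:"kind"`
	Items []metricValue `json:"items"`
}

// decodeList parses a JSON response body into the simplified list type.
func decodeList(data []byte) (metricValueList, error) {
	var l metricValueList
	err := json.Unmarshal(data, &l)
	return l, err
}

// sampleJSON is a trimmed-down version of the example response.
const sampleJSON = `{
  "kind": "MetricValueList",
  "apiVersion": "custom-metrics/v1alpha1",
  "items": [
    {"metricName": "hits-per-second", "value": "10",
     "describedObject": {"kind": "Ingress", "apiVersion": "extensions", "name": "server1", "namespace": "webapp"}},
    {"metricName": "hits-per-second", "value": "15",
     "describedObject": {"kind": "Ingress", "apiVersion": "extensions", "name": "server2", "namespace": "webapp"}}
  ]
}`

func main() {
	list, err := decodeList([]byte(sampleJSON))
	if err != nil {
		panic(err)
	}
	for _, item := range list.Items {
		fmt.Printf("%s/%s %s=%s\n", item.DescribedObject.Namespace, item.DescribedObject.Name, item.MetricName, item.Value)
	}
}
```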
| 182 | + |
| 183 | +Semantics |
| 184 | +--------- |
| 185 | + |
| 186 | +### Object Types ### |
| 187 | + |
| 188 | +In order to properly identify resources, we must use resource names |
| 189 | +qualified with group names (since the group for the requests will always |
| 190 | +be `custom-metrics`). |
| 191 | + |
| 192 | +The `object-type` parameter should be the string form of |
| 193 | +`unversioned.GroupResource`. Note that we do not include version in this; |
| 194 | +we simply wish to uniquely identify all the different types of objects in |
| 195 | +Kubernetes. For example, the pods resource (which exists in the un-named |
| 196 | +legacy API group) would be represented simply as `pods`, while the jobs |
| 197 | +resource (which exists in the `batch` API group) would be represented as |
| 198 | +`jobs.batch`. |
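
The naming rule above can be sketched in a few lines. The `groupResource` type below is a local stand-in written for this illustration (it is not the Kubernetes type itself), and `parseGroupResource` is a hypothetical helper that assumes everything after the first `.` is the group name.

```go
package main

import (
	"fmt"
	"strings"
)

// groupResource is a local stand-in for a group-qualified resource name.
type groupResource struct {
	Group    string
	Resource string
}

// String renders the object-type form used in the API paths: the bare
// resource for the un-named legacy group, "resource.group" otherwise.
func (gr groupResource) String() string {
	if gr.Group == "" {
		return gr.Resource
	}
	return gr.Resource + "." + gr.Group
}

// parseGroupResource is the inverse of String; it splits on the first dot.
func parseGroupResource(s string) groupResource {
	parts := strings.SplitN(s, ".", 2)
	if len(parts) == 1 {
		return groupResource{Resource: parts[0]}
	}
	return groupResource{Resource: parts[0], Group: parts[1]}
}

func main() {
	fmt.Println(groupResource{Resource: "pods"})                  // legacy group
	fmt.Println(groupResource{Group: "batch", Resource: "jobs"})  // named group
	fmt.Printf("%+v\n", parseGroupResource("ingress.extensions")) // round trip
}
```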
| 199 | + |
| 200 | +In the case of cross-group object renames, the adapter should maintain |
| 201 | +a list of "equivalent versions" that the monitoring system uses. This is |
| 202 | +monitoring-system dependent (for instance, the monitoring system might |
| 203 | +record all HorizontalPodAutoscalers as in `autoscaling`, but should be |
| 204 | +aware that HorizontalPodAutoscalers also exist in `extensions`). |
| 205 | + |
| 206 | +Note that for namespace metrics, we use a pseudo-resource called |
| 207 | +`metrics`. Since there is no `metrics` resource in the legacy API group, this will |
| 208 | +not clash with any existing resources. |
| 209 | + |
| 210 | +### Metric Names ### |
| 211 | + |
| 212 | +Metric names must be able to appear as a single subresource. In particular, |
| 213 | +metric names, *as passed to the API*, may not contain the characters '%', '/', |
| 214 | +or '?', and may not be named '.' or '..' (but may contain these sequences). |
| 215 | +Note, specifically, that URL encoding is not acceptable to escape the forbidden |
| 216 | +characters, due to issues in the Go URL handling libraries. Otherwise, metric |
| 217 | +names are open-ended. |
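
An adapter could enforce these restrictions with a small check such as the one below. This is a sketch only: `validateMetricName` is a hypothetical helper, and rejecting the empty string is an added assumption (an empty name cannot form a path segment), not something the rules above state.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateMetricName checks a metric name, as passed to the API, against
// the restrictions described above.
func validateMetricName(name string) error {
	if name == "" {
		// Assumption: an empty name cannot appear as a path segment.
		return errors.New("metric name must not be empty")
	}
	if name == "." || name == ".." {
		return fmt.Errorf("metric name must not be %q", name)
	}
	if strings.ContainsAny(name, "%/?") {
		return fmt.Errorf("metric name %q contains one of the forbidden characters '%%', '/', '?'", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"hits-per-second", "a..b", "..", "queue/length", "50%"} {
		fmt.Printf("%-16q -> %v\n", name, validateMetricName(name))
	}
}
```

Names that merely contain `.` or `..` (such as `a..b`) pass, matching the parenthetical in the rule.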
| 218 | + |
| 219 | +### Metric Values and Timing ### |
| 220 | + |
| 221 | +There should be only one metric value per object requested. The returned |
| 222 | +metrics should be the most recently available metrics, as with the resource |
| 223 | +metrics API. Implementors *should* attempt to return all metrics with roughly |
| 224 | +identical timestamps and windows (when appropriate), but consumers should also |
| 225 | +verify that any differences in timestamps are within tolerances for |
| 226 | +a particular application (e.g. a dashboard might simply display the older |
| 227 | +metric with a note, while the horizontal pod autoscaler controller might choose |
| 228 | +to pretend it did not receive that metric value). |
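
As an illustration of the consumer-side check, the sketch below drops values whose timestamp is older than an application-chosen tolerance, which is roughly what "pretending it did not receive that metric value" could look like. All names here (`sample`, `dropStale`, and the `example...` fixtures) are hypothetical, and the one-minute tolerance is an arbitrary choice for the example.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a simplified stand-in for one returned metric value.
type sample struct {
	Object    string
	Timestamp time.Time
	Value     float64
}

// dropStale keeps only the samples whose timestamp lies within the given
// tolerance of now.
func dropStale(samples []sample, now time.Time, tolerance time.Duration) []sample {
	fresh := make([]sample, 0, len(samples))
	for _, s := range samples {
		if now.Sub(s.Timestamp) <= tolerance {
			fresh = append(fresh, s)
		}
	}
	return fresh
}

// Fixed example data so the sketch is deterministic.
var exampleNow = time.Date(2017, time.January, 1, 12, 0, 0, 0, time.UTC)

const exampleTolerance = time.Minute

func exampleSamples() []sample {
	return []sample{
		{Object: "server1", Timestamp: exampleNow.Add(-10 * time.Second), Value: 10},
		{Object: "server2", Timestamp: exampleNow.Add(-5 * time.Minute), Value: 15},
	}
}

func main() {
	for _, s := range dropStale(exampleSamples(), exampleNow, exampleTolerance) {
		fmt.Printf("%s: %v\n", s.Object, s.Value)
	}
}
```

A dashboard would more likely keep the stale sample and annotate it; the filter is just one of the two behaviours mentioned above.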
| 229 | + |
| 230 | +### Labeled Metrics (or lack thereof) ### |
| 231 | + |
| 232 | +For metrics systems that support differentiating metrics beyond the |
| 233 | +Kubernetes object hierarchy (such as using additional labels), the metrics |
| 234 | +systems should have a metric which represents all such series aggregated |
| 235 | +together. Additionally, implementors may choose to identify the individual |
| 236 | +"sub-metrics" via the metric name, but this is expected to be fairly rare, |
| 237 | +since it most likely requires specific knowledge of individual metrics. |
| 238 | +For instance, suppose we record filesystem usage by filesystem inside the |
| 239 | +container. There should then be a metric `filesystem/usage`, and the |
| 240 | +implementors of the API may choose to expose more detailed metrics like |
| 241 | +`filesystem/usage/my-first-filesystem`. |
| 242 | + |
| 243 | +### Resource Versions ### |
| 244 | + |
| 245 | +API implementors should set the `resourceVersion` field based on the |
| 246 | +scrape time of the metric. The resource version is expected to increment |
| 247 | +when the scrape/collection time of the returned metric changes. While the |
| 248 | +API does not support writes, and does not currently support watches, |
| 249 | +populating resource version preserves the normal expected Kubernetes API |
| 250 | +semantics. |
| 251 | + |
| 252 | +Relationship to HPA v2 |
| 253 | +---------------------- |
| 254 | + |
| 255 | +The URL paths in this API are designed to correspond to different source |
| 256 | +types in the [HPA v2](hpa-v2.md). Specifically, the `pods` source type |
| 257 | +corresponds to a URL of the form |
| 258 | +`/namespaces/$NS/pods/*/$METRIC_NAME?labelSelector=foo`, while the |
| 259 | +`object` source type corresponds to a URL of the form |
| 260 | +`/namespaces/$NS/$RESOURCE.$GROUP/$OBJECT_NAME/$METRIC_NAME`. |
| 261 | + |
| 262 | +The HPA then takes the results, aggregates them together (in the case of |
| 263 | +the former source type), and uses the resulting value to produce a usage |
| 264 | +ratio. |
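
The aggregation step might look like the sketch below. It rests on assumptions that this document does not spell out: that values are aggregated with a simple average, that the usage ratio is that average divided by a target value, and that quantities have already been converted to `float64` for the arithmetic. `usageRatio` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// usageRatio averages the per-object metric values and divides the result
// by the target value (assumed aggregation, see the note above).
func usageRatio(values []float64, target float64) (float64, error) {
	if len(values) == 0 {
		return 0, errors.New("no metric values to aggregate")
	}
	if target <= 0 {
		return 0, errors.New("target must be positive")
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	average := sum / float64(len(values))
	return average / target, nil
}

func main() {
	// Values taken from the example response: 10 and 15 hits per second,
	// compared against an illustrative target of 10.
	ratio, err := usageRatio([]float64{10, 15}, 10)
	fmt.Println(ratio, err)
}
```

A ratio above 1 would suggest scaling up and a ratio below 1 scaling down, under the same assumptions.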
| 265 | + |
| 266 | +The resource source type is taken from the API provided by the |
| 267 | +"metrics" API group (the master/resource metrics API). |
| 268 | + |
| 269 | +The HPA will consume the API as a federated API server. |
| 270 | + |
| 271 | +Relationship to Resource Metrics API |
| 272 | +------------------------------------ |
| 273 | + |
| 274 | +The metrics presented by this API may be a superset of those present in the |
| 275 | +resource metrics API, but this is not guaranteed. Clients that need the |
| 276 | +information in the resource metrics API should retrieve it from that API, |
| 277 | +and use this API only to supplement those metrics. |
| 278 | + |
| 279 | +Mechanical Concerns |
| 280 | +------------------- |
| 281 | + |
| 282 | +This API is intended to be implemented by monitoring pipelines (e.g. |
| 283 | +inside Heapster, or as an adapter on top of a solution like Prometheus). |
| 284 | +It shares many mechanical requirements with normal Kubernetes APIs, such |
| 285 | +as the need to support encoding different versions of objects in both JSON |
| 286 | +and protobuf, as well as acting as a discoverable API server. For these |
| 287 | +reasons, it is expected that implementors will make use of the Kubernetes |
| 288 | +genericapiserver code. If implementors choose not to use this, they must |
| 289 | +still follow all of the Kubernetes API server conventions in order to work |
| 290 | +properly with consumers of the API. |
| 291 | + |
| 292 | +Specifically, they must support the semantics of the GET verb in |
| 293 | +Kubernetes, including outputting in different API versions and formats as |
| 294 | +requested by the client. They must support integrating with API discovery |
| 295 | +(including publishing a discovery document, etc). |
| 296 | + |
| 297 | +Location |
| 298 | +-------- |
| 299 | + |
| 300 | +The types and clients for this API will live in a separate repository |
| 301 | +under the Kubernetes organization (e.g. `kubernetes/metrics`). This |
| 302 | +repository will most likely also house other metrics-related APIs for |
| 303 | +Kubernetes (e.g. historical metrics API definitions, the resource metrics |
| 304 | +API definitions, etc). |
| 305 | + |
| 306 | +Note that there will not be a canonical implementation of the custom |
| 307 | +metrics API under Kubernetes, just the types and clients. Implementations |
| 308 | +will be left up to the monitoring pipelines. |
| 309 | + |
| 310 | +Alternative Considerations |
| 311 | +-------------------------- |
| 312 | + |
| 313 | +### Quantity vs Float ### |
| 314 | + |
| 315 | +In the past, custom metrics were represented as floats. In general, |
| 316 | +however, Kubernetes APIs are not supposed to use floats. The API proposed |
| 317 | +above thus uses `resource.Quantity`. This adds a bit of encoding |
| 318 | +overhead, but makes the API line up nicely with other Kubernetes APIs. |
| 319 | + |
| 320 | +### Labeled Metrics ### |
| 321 | + |
| 322 | +Many metric systems support labeled metrics, allowing for dimensionality |
| 323 | +beyond the Kubernetes object hierarchy. Since the HPA currently doesn't |
| 324 | +support specifying metric labels, this is not supported via this API. We |
| 325 | +may wish to explore this in the future. |