Commit e094b55

Proposal: Introduce Custom Metrics API
This proposal details the custom metrics API as proposed in the new monitoring vision. It is designed for use with the HPA, but should be generally useful for controllers that wish to consume custom metrics.
1 parent 5329551 commit e094b55

1 file changed

+325
-0
lines changed

@@ -0,0 +1,325 @@
Custom Metrics API
==================

The new [metrics monitoring vision](monitoring_architecture.md) proposes
an API that the Horizontal Pod Autoscaler can use to access arbitrary
metrics.

Similarly to the [master metrics API](resource-metrics-api.md), the new
API should be structured around accessing metrics by referring to
Kubernetes objects (or groups thereof) and a metric name. For this
reason, the API could be useful for other consumers (most likely
controllers) that want to consume custom metrics (similarly to how the
master metrics API is generally useful to multiple cluster components).

The HPA can refer to metrics describing all pods matching a label
selector, as well as an arbitrary named object.

API Paths
---------

The root API path will look like `/apis/custom-metrics/v1alpha1`. For
brevity, this will be left off below.

- `/{object-type}/{object-name}/{metric-name...}`: retrieve the given
  metric for the given non-namespaced object (e.g. Node, PersistentVolume)

- `/{object-type}/*/{metric-name...}`: retrieve the given metric for all
  non-namespaced objects of the given type

- `/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the
  given metric for all non-namespaced objects of the given type matching
  the given label selector

- `/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}`:
  retrieve the given metric for the given namespaced object

- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}`: retrieve the given metric for all
  namespaced objects of the given type

- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the given
  metric for all namespaced objects of the given type matching the
  given label selector

- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given
  metric which describes the given namespace.

For example, to retrieve the custom metric "hits-per-second" for all
ingress objects matching `app=frontend` in the namespace "webapp", the
request might look like:

```
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/ingress.extensions/*/hits-per-second?labelSelector=app%3Dfrontend

---

Verb: GET
Namespace: webapp
APIGroup: custom-metrics
APIVersion: v1alpha1
Resource: ingress.extensions
Subresource: hits-per-second
Name: ResourceAll(*)
```

Notice that getting metrics which describe a namespace follows a slightly
different pattern from other resources. Since namespaces cannot feasibly
have unbounded subresource names (due to collision with resource names,
etc.), we introduce a pseudo-resource named "metrics", which represents
metrics describing namespaces, where the resource name is the metric name:

```
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/metrics/queue-length

---

Verb: GET
Namespace: webapp
APIGroup: custom-metrics
APIVersion: v1alpha1
Resource: metrics
Name: queue-length
```

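The mapping from request path to request attributes shown in the two examples can be sketched in Go. This is only an illustration of the path shapes listed in this section, not code from any Kubernetes library: the `RequestAttributes` type and the `ParseMetricPath` function are hypothetical names, and the sketch assumes that a leading `namespaces/{namespace-name}` segment always marks a namespaced request.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// apiRoot is the root API path given in the proposal.
const apiRoot = "/apis/custom-metrics/v1alpha1/"

// RequestAttributes holds the fields shown in the examples above.
// The type itself is hypothetical and not part of the proposal.
type RequestAttributes struct {
	Namespace   string
	APIGroup    string
	APIVersion  string
	Resource    string
	Subresource string
	Name        string
}

// ParseMetricPath splits a custom metrics request path into attributes.
func ParseMetricPath(path string) (RequestAttributes, error) {
	attrs := RequestAttributes{APIGroup: "custom-metrics", APIVersion: "v1alpha1"}

	// Drop the query string (for example ?labelSelector=...).
	if i := strings.IndexByte(path, '?'); i >= 0 {
		path = path[:i]
	}
	if !strings.HasPrefix(path, apiRoot) {
		return attrs, errors.New("path is outside the custom metrics API root")
	}
	parts := strings.Split(strings.TrimPrefix(path, apiRoot), "/")

	// Assumption: namespaced requests start with namespaces/{namespace-name}/.
	if len(parts) >= 2 && parts[0] == "namespaces" {
		attrs.Namespace = parts[1]
		parts = parts[2:]
	}

	switch {
	case attrs.Namespace != "" && len(parts) == 2 && parts[0] == "metrics":
		// Pseudo-resource: the resource name is the metric name.
		attrs.Resource, attrs.Name = "metrics", parts[1]
	case len(parts) >= 3:
		// {object-type}/{object-name}/{metric-name...}
		attrs.Resource, attrs.Name = parts[0], parts[1]
		attrs.Subresource = strings.Join(parts[2:], "/")
	default:
		return attrs, fmt.Errorf("unrecognized path shape: %q", path)
	}
	return attrs, nil
}

func main() {
	a, err := ParseMetricPath(apiRoot + "namespaces/webapp/ingress.extensions/*/hits-per-second?labelSelector=app%3Dfrontend")
	fmt.Printf("%+v %v\n", a, err)
}
```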
API Path Design, Discovery, and Authorization
---------------------------------------------

The API paths in this proposal are designed to a) resemble normal
Kubernetes APIs, b) facilitate writing authorization rules, and c)
allow for discovery.

Since the API follows the same structure as other Kubernetes APIs, it
allows for fine-grained control over access to metrics. Access
can be controlled on a per-metric basis (each metric is a subresource, so
metrics may be whitelisted by allowing access to a particular
resource-subresource pair), or granted in general for a namespace (by
allowing access to any resource in the `custom-metrics` API group).

Similarly, since metrics are simply subresources, a normal Kubernetes API
discovery document can be published by the adapter's API server, allowing
clients to discover the available metrics.

Note that we introduce the syntax of having a name of `*` here since
there is no current syntax for getting the output of a subresource on
multiple objects.

API Objects
-----------

The request URLs listed above will return the `MetricValueList` type described
below (when a name is given that is not `*`, the API should simply return a
list with a single element):

```go
// a list of values for a given metric for some set of objects
type MetricValueList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`

	// the value of the metric across the described objects
	Items []MetricValue `json:"items"`
}

// a metric value for some object
type MetricValue struct {
	metav1.TypeMeta `json:",inline"`

	// a reference to the described object
	DescribedObject ObjectReference `json:"describedObject"`

	// the name of the metric
	MetricName string `json:"metricName"`

	// indicates the time at which the metrics were produced
	Timestamp unversioned.Time `json:"timestamp"`

	// indicates the window ([Timestamp-Window, Timestamp]) from
	// which these metrics were calculated, when returning rate
	// metrics calculated from cumulative metrics (or zero for
	// non-calculated instantaneous metrics).
	WindowSeconds int64 `json:"window"`

	// the value of the metric for this object
	Value resource.Quantity `json:"value"`
}
```

For instance, the example request above would yield the following object:

```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom-metrics/v1alpha1",
  "items": [
    {
      "metricName": "hits-per-second",
      "describedObject": {
        "kind": "Ingress",
        "apiVersion": "extensions",
        "name": "server1",
        "namespace": "webapp"
      },
      "timestamp": "SOME_TIMESTAMP_HERE",
      "window": 10,
      "value": "10"
    },
    {
      "metricName": "hits-per-second",
      "describedObject": {
        "kind": "Ingress",
        "apiVersion": "extensions",
        "name": "server2",
        "namespace": "webapp"
      },
      "timestamp": "ANOTHER_TIMESTAMP_HERE",
      "window": 10,
      "value": "15"
    }
  ]
}
```

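A consumer could decode such a response roughly as follows. This is a minimal, standard-library-only sketch: the types below are simplified local stand-ins for the proposed API types (the value is kept as a plain string instead of `resource.Quantity`), the window uses the `window` key from the struct tag above, and the timestamps in `sampleResponse` are made-up placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Simplified stand-ins for the proposed types (assumptions, not the real API types).
type ObjectReference struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
	Name       string `json:"name"`
	Namespace  string `json:"namespace"`
}

type MetricValue struct {
	DescribedObject ObjectReference `json:"describedObject"`
	MetricName      string          `json:"metricName"`
	Timestamp       time.Time       `json:"timestamp"`
	WindowSeconds   int64           `json:"window"`
	// Kept as a string here; the proposal uses resource.Quantity.
	Value string `json:"value"`
}

type MetricValueList struct {
	Kind       string        `json:"kind"`
	APIVersion string        `json:"apiVersion"`
	Items      []MetricValue `json:"items"`
}

// sampleResponse mirrors the example object, with hypothetical timestamps.
const sampleResponse = `{
  "kind": "MetricValueList",
  "apiVersion": "custom-metrics/v1alpha1",
  "items": [
    {"metricName": "hits-per-second",
     "describedObject": {"kind": "Ingress", "apiVersion": "extensions", "name": "server1", "namespace": "webapp"},
     "timestamp": "2017-01-01T00:00:00Z", "window": 10, "value": "10"},
    {"metricName": "hits-per-second",
     "describedObject": {"kind": "Ingress", "apiVersion": "extensions", "name": "server2", "namespace": "webapp"},
     "timestamp": "2017-01-01T00:00:05Z", "window": 10, "value": "15"}
  ]
}`

// DecodeMetricValueList parses a JSON response body into the local types.
func DecodeMetricValueList(data []byte) (MetricValueList, error) {
	var list MetricValueList
	err := json.Unmarshal(data, &list)
	return list, err
}

func main() {
	list, err := DecodeMetricValueList([]byte(sampleResponse))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, item := range list.Items {
		fmt.Printf("%s/%s %s=%s (window %ds)\n", item.DescribedObject.Namespace,
			item.DescribedObject.Name, item.MetricName, item.Value, item.WindowSeconds)
	}
}
```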
Semantics
---------

### Object Types ###

In order to properly identify resources, we must use resource names
qualified with group names (since the group for the requests will always
be `custom-metrics`).

The `object-type` parameter should be the string form of
`unversioned.GroupResource`. Note that we do not include version in this;
we simply wish to uniquely identify all the different types of objects in
Kubernetes. For example, the pods resource (which exists in the un-named
legacy API group) would be represented simply as `pods`, while the jobs
resource (which exists in the `batch` API group) would be represented as
`jobs.batch`.

In the case of cross-group object renames, the adapter should maintain
a list of "equivalent versions" that the monitoring system uses. This is
monitoring-system dependent (for instance, the monitoring system might
record all HorizontalPodAutoscalers as in `autoscaling`, but should be
aware that HorizontalPodAutoscalers also exist in `extensions`).

Note that for namespace metrics, we use a pseudo-resource called
`metrics`. Since there is no resource with that name in the legacy API
group, this will not clash with any existing resources.

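The `object-type` string form described above can be illustrated with a small local type. This is a sketch, not the Kubernetes implementation: the `GroupResource` type here is a stand-in defined for the example, and splitting at the first dot is an assumption that works because resource names do not contain dots while group names may.

```go
package main

import (
	"fmt"
	"strings"
)

// GroupResource is a local stand-in for the Kubernetes type of the same name.
type GroupResource struct {
	Group    string
	Resource string
}

// String renders the object-type form: "resource" for the un-named legacy
// group, "resource.group" otherwise.
func (gr GroupResource) String() string {
	if gr.Group == "" {
		return gr.Resource
	}
	return gr.Resource + "." + gr.Group
}

// ParseGroupResource splits at the first dot, so everything after it is
// treated as the group (groups may themselves contain dots).
func ParseGroupResource(s string) GroupResource {
	i := strings.Index(s, ".")
	if i < 0 {
		return GroupResource{Resource: s}
	}
	return GroupResource{Group: s[i+1:], Resource: s[:i]}
}

func main() {
	for _, s := range []string{"pods", "jobs.batch", "ingress.extensions"} {
		gr := ParseGroupResource(s)
		fmt.Printf("%-20s group=%q resource=%q\n", gr.String(), gr.Group, gr.Resource)
	}
}
```

Keeping the version out of the string means a single `object-type` identifies the same kind of object regardless of which API version a client used to create it.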
### Metric Names ###

Metric names must be able to appear as a single subresource. In particular,
metric names, *as passed to the API*, may not contain the characters '%', '/',
or '?', and may not be named '.' or '..' (but may contain these sequences).
Note, specifically, that URL encoding is not acceptable to escape the forbidden
characters, due to issues in the Go URL handling libraries. Otherwise, metric
names are open-ended.

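An adapter or client might check these naming rules with something like the following sketch. The function name `ValidateMetricName` is hypothetical, and rejecting the empty string is an extra assumption (an empty name cannot appear as a path segment) rather than a rule stated above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ValidateMetricName applies the rules for metric names as passed to the API.
func ValidateMetricName(name string) error {
	switch {
	case name == "":
		// Assumption: an empty name cannot form a path segment.
		return errors.New("metric name must not be empty")
	case name == "." || name == "..":
		return errors.New("metric name must not be '.' or '..'")
	case strings.ContainsAny(name, "%/?"):
		return errors.New("metric name must not contain '%', '/', or '?'")
	}
	return nil
}

func main() {
	for _, name := range []string{"hits-per-second", "a..b", "..", "disk/usage", "cpu%"} {
		fmt.Printf("%-16q -> %v\n", name, ValidateMetricName(name))
	}
}
```

Because URL encoding is not an acceptable escape, a check like this has to run on the raw name before it is placed into a request path.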
### Metric Values and Timing ###

There should be only one metric value per object requested. The returned
metrics should be the most recently available metrics, as with the resource
metrics API. Implementers *should* attempt to return all metrics with roughly
identical timestamps and windows (when appropriate), but consumers should also
verify that any differences in timestamps are within tolerances for
a particular application (e.g. a dashboard might simply display the older
metric with a note, while the horizontal pod autoscaler controller might choose
to pretend it did not receive that metric value).

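The consumer-side tolerance check could look roughly like this. It is a sketch under stated assumptions: the `Sample` type, the `SplitByAge` function, the 30-second tolerance, and the example data are all invented for illustration, and the right tolerance depends on the application.

```go
package main

import (
	"fmt"
	"time"
)

// Sample is a hypothetical, simplified view of one returned metric value.
type Sample struct {
	Object    string
	Timestamp time.Time
	Value     int64
}

// exampleTolerance is an arbitrary tolerance chosen for this example.
const exampleTolerance = 30 * time.Second

// SplitByAge separates samples whose timestamps are within maxAge of now
// from those that are older.
func SplitByAge(samples []Sample, now time.Time, maxAge time.Duration) (fresh, stale []Sample) {
	for _, s := range samples {
		if now.Sub(s.Timestamp) <= maxAge {
			fresh = append(fresh, s)
		} else {
			stale = append(stale, s)
		}
	}
	return fresh, stale
}

// exampleSamples returns made-up data and a fixed "now" for the example.
func exampleSamples() ([]Sample, time.Time) {
	now := time.Date(2017, 1, 1, 0, 1, 0, 0, time.UTC)
	return []Sample{
		{Object: "server1", Timestamp: now.Add(-10 * time.Second), Value: 10},
		{Object: "server2", Timestamp: now.Add(-2 * time.Minute), Value: 15},
	}, now
}

func main() {
	samples, now := exampleSamples()
	fresh, stale := SplitByAge(samples, now, exampleTolerance)
	// A dashboard might show stale values with a note; an autoscaler might
	// act as if they were never received.
	fmt.Println("fresh:", fresh)
	fmt.Println("stale:", stale)
}
```

Returning both groups leaves the policy to the caller, which matches the point above that different consumers treat old values differently.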
### Labeled Metrics (or lack thereof) ###

For metrics systems that support differentiating metrics beyond the
Kubernetes object hierarchy (such as using additional labels), the metrics
systems should have a metric which represents all such series aggregated
together. Additionally, implementors may choose to identify the individual
"sub-metrics" via the metric name, but this is expected to be fairly rare,
since it most likely requires specific knowledge of individual metrics.
For instance, suppose we record filesystem usage by filesystem inside the
container. There should then be a metric `filesystem/usage`, and the
implementors of the API may choose to expose more detailed metrics like
`filesystem/usage/my-first-filesystem`.

### Resource Versions ###

API implementors should set the `resourceVersion` field based on the
scrape time of the metric. The resource version is expected to increment
when the scrape/collection time of the returned metric changes. While the
API does not support writes, and does not currently support watches,
populating resource version preserves the normal expected Kubernetes API
semantics.

252+
Relationship to HPA v2
253+
----------------------
254+
255+
The URL paths in this API are designed to correspond to different source
256+
types in the [HPA v2](hpa-v2.md). Specifially, the `pods` source type
257+
corresponds to a URL of the form
258+
`/namespaces/$NS/pods/*/$METRIC_NAME?labelSelector=foo`, while the
259+
`object` source type corresponds to a URL of the form
260+
`/namespaces/$NS/$RESOURCE.$GROUP/$OBJECT_NAME/$METRIC_NAME`.
261+
262+
The HPA then takes the results, aggregates them together (in the case of
263+
the former source type), and uses the resulting value to produce a usage
264+
ratio.
265+
266+
The resource source type is taken from the API provided by the
267+
"metrics" API group (the master/resource metrics API).
268+
269+
The HPA will consume the API as a federated API server.
270+
271+
Relationship to Resource Metrics API
272+
------------------------------------
273+
274+
The metrics presented by this API may be a superset of those present in the
275+
resource metrics API, but this is not guaranteed. Clients that need the
276+
information in the resource metrics API should use that to retrieve those
277+
metrics, and supplement those metrics with this API.
278+
279+
Mechanical Concerns
280+
-------------------
281+
282+
This API is intended to be implemented by monitoring pipelines (e.g.
283+
inside Heapster, or as an adapter on top of a solution like Prometheus).
284+
It shares many mechanical requirements with normal Kubernetes APIs, such
285+
as the need to support encoding different versions of objects in both JSON
286+
and protobuf, as well as acting as a discoverable API server. For these
287+
reasons, it is expected that implemenators will make use of the Kubernetes
288+
genericapiserver code. If implementors choose not to use this, they must
289+
still follow all of the Kubernetes API server conventions in order to work
290+
properly with consumers of the API.
291+
292+
Specifically, they must support the semantics of the GET verb in
293+
Kubernetes, including outputting in different API versions and formats as
294+
requested by the client. They must support integrating with API discovery
295+
(including publishing a discovery document, etc).
296+
297+
Location
298+
--------
299+
300+
The types and clients for this API will live in a separate repository
301+
under the Kubernetes organization (e.g. `kubernetes/metrics`). This
302+
repository will most likely also house other metrics-related APIs for
303+
Kubernetes (e.g. historical metrics API definitions, the resource metrics
304+
API definitions, etc).
305+
306+
Note that there will not be a canonical implemenation of the custom
307+
metrics API under Kubernetes, just the types and clients. Implementations
308+
will be left up to the monitoring pipelines.
309+
310+
Alternative Considerations
311+
--------------------------
312+
313+
### Quantity vs Float ###
314+
315+
In the past, custom metrics were represented as floats. In general,
316+
however, Kubernetes APIs are not supposed to use floats. The API proposed
317+
above thus uses `resource.Quantity`. This adds a bit of encoding
318+
overhead, but makes the API line up nicely with other Kubernetes APIs.
319+
320+
### Labeled Metrics ###
321+
322+
Many metric systems support labeled metrics, allowing for dimenisionality
323+
beyond the Kubernetes object hierarchy. Since the HPA currently doesn't
324+
support specifying metric labels, this is not supported via this API. We
325+
may wish to explore this in the future.
