Commit eb443a2

Proposal: Introduce Custom Metrics API
This proposal details the custom metrics API as proposed in the new monitoring vision. It is designed for use with the HPA, but should be generally useful for controllers that wish to consume custom metrics.
1 parent 5329551 commit eb443a2

Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
Custom Metrics API
==================

The new [metrics monitoring vision](monitoring_architecture.md) proposes
an API that the Horizontal Pod Autoscaler can use to access arbitrary
metrics.

Similarly to the [master metrics API](resource-metrics-api.md), the new
API should be structured around accessing metrics by referring to
Kubernetes objects (or groups thereof) and a metric name. For this
reason, the API could be useful for other consumers (most likely
controllers) that want to consume custom metrics (similarly to how the
master metrics API is generally useful to multiple cluster components).

The HPA can refer to metrics describing all pods matching a label
selector, as well as an arbitrary named object.

API Paths
---------
The root API path will look like `/apis/custom-metrics/v1alpha1`. For
brevity, this will be left off below.

- `/{object-type}/{object-name}/{metric-name...}`: retrieve the given
  metric for the given non-namespaced object (e.g. Node, PersistentVolume)

- `/{object-type}/*/{metric-name...}`: retrieve the given metric for all
  non-namespaced objects of the given type

- `/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the
  given metric for all non-namespaced objects of the given type matching
  the given label selector

- `/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}`:
  retrieve the given metric for the given namespaced object

- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}`: retrieve the given metric for all
  namespaced objects of the given type

- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the given
  metric for all namespaced objects of the given type matching the
  given label selector

- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given
  metric which describes the given namespace.

For example, to retrieve the custom metric "hits-per-second" for all
ingress objects matching `app=frontend` in the namespace "webapp", the
request might look like:
```
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/ingress.extensions/*/hits-per-second?labelSelector=app%3Dfrontend

---

Verb: GET
Namespace: webapp
APIGroup: custom-metrics
APIVersion: v1alpha1
Resource: ingress.extensions
Subresource: hits-per-second
Name: ResourceAll(*)
```
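As a rough illustration of how a client might assemble the paths listed above, here is a minimal Go sketch. The `metricPath` helper and its parameters are hypothetical and not part of this proposal. It assumes that the metric name is percent-encoded into a single path segment and that the object name is either `*` or a plain Kubernetes object name that needs no escaping.

```go
package main

import (
	"fmt"
	"net/url"
)

// rootPath is the API root given in this proposal.
const rootPath = "/apis/custom-metrics/v1alpha1"

// metricPath is a hypothetical helper that builds the request path for a
// metric. An empty namespace yields the non-namespaced form, and an empty
// selector omits the labelSelector query parameter. objectName is written
// as-is so that "*" can address all objects of the given type.
func metricPath(namespace, objectType, objectName, metricName, selector string) string {
	p := rootPath
	if namespace != "" {
		p += "/namespaces/" + url.PathEscape(namespace)
	}
	p += "/" + url.PathEscape(objectType) + "/" + objectName + "/" + url.PathEscape(metricName)
	if selector != "" {
		p += "?labelSelector=" + url.QueryEscape(selector)
	}
	return p
}

func main() {
	// Reproduces the example request shown above.
	fmt.Println(metricPath("webapp", "ingress.extensions", "*", "hits-per-second", "app=frontend"))
}
```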
Notice that getting metrics which describe a namespace follows a slightly
different pattern from other resources; since namespaces cannot feasibly
have unbounded subresource names (due to collision with resource names,
etc), we introduce a pseudo-resource named "metrics", which represents
metrics describing namespaces, where the resource name is the metric name:
```
GET /apis/custom-metrics/v1alpha1/namespaces/webapp/metrics/queue-length

---

Verb: GET
Namespace: webapp
APIGroup: custom-metrics
APIVersion: v1alpha1
Resource: metrics
Name: queue-length
```
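The mapping from paths to request attributes shown in the two examples above can be sketched roughly as follows. This is only an illustration: `attrs` and `parsePath` are hypothetical names, the path is taken relative to the API root with any query string already removed, and each metric name is assumed to occupy exactly one path segment.

```go
package main

import (
	"fmt"
	"strings"
)

// attrs mirrors the request attributes listed in the examples above.
type attrs struct {
	Namespace, Resource, Name, Subresource string
}

// parsePath is a hypothetical helper that splits a path (relative to the
// API root) into request attributes.
func parsePath(path string) (attrs, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	var a attrs
	if len(parts) >= 2 && parts[0] == "namespaces" {
		a.Namespace = parts[1]
		parts = parts[2:]
		// Namespace metrics use the "metrics" pseudo-resource: the
		// resource name is the metric name and there is no subresource.
		if len(parts) == 2 && parts[0] == "metrics" {
			a.Resource, a.Name = parts[0], parts[1]
			return a, nil
		}
	}
	if len(parts) != 3 {
		return attrs{}, fmt.Errorf("unrecognized metrics path %q", path)
	}
	// All other metrics are subresources of the described object.
	a.Resource, a.Name, a.Subresource = parts[0], parts[1], parts[2]
	return a, nil
}

func main() {
	fmt.Println(parsePath("/namespaces/webapp/ingress.extensions/*/hits-per-second"))
	fmt.Println(parsePath("/namespaces/webapp/metrics/queue-length"))
}
```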
API Path Design, Discovery, and Authorization
---------------------------------------------

The API paths in this proposal are designed to a) resemble normal
Kubernetes APIs, b) facilitate writing authorization rules, and c)
allow for discovery.

Since the API structure follows the same structure as other Kubernetes
APIs, it allows for fine-grained control over access to metrics. Access
can be controlled on a per-metric basis (each metric is a subresource, so
metrics may be whitelisted by allowing access to a particular
resource-subresource pair), or granted in general for a namespace (by
allowing access to any resource in the `custom-metrics` API group).

Similarly, since metrics are simply subresources, a normal Kubernetes API
discovery document can be published by the adapter's API server, allowing
clients to discover the available metrics.

Note that we introduce the syntax of having a name of `*` here since
there is no current syntax for getting the output of a subresource on
multiple objects.

API Objects
-----------
The request URLs listed above will return either the `MetricValueList` or
`MetricValue` types (depending on whether they describe multiple objects,
or a single one), described below:
```go
// a list of values for a given metric for some set of objects
type MetricValueList struct {
	unversioned.TypeMeta `json:",inline"`
	unversioned.ListMeta `json:"metadata,omitempty"`

	// the value of the metric across the described objects
	Items []MetricValue `json:"items"`
}

// a metric value for some object
type MetricValue struct {
	// a reference to the described object
	DescribedObject ObjectReference `json:"describedObject"`

	// the name of the metric
	MetricName string `json:"metricName"`

	// indicates the time at which the metrics were produced
	Timestamp unversioned.Time `json:"timestamp"`

	// indicates the window ([Timestamp-Window, Timestamp]) from
	// which these metrics were calculated, when returning rate
	// metrics calculated from cumulative metrics (or zero for
	// non-calculated instantaneous metrics).
	Window unversioned.Duration `json:"window"`

	// the value of the metric for this object
	Value resource.Quantity `json:"value"`
}
```
For instance, the example request above would yield the following object:
```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom-metrics/v1alpha1",
  "items": [
    {
      "metricName": "hits-per-second",
      "describedObject": {
        "kind": "Ingress",
        "apiVersion": "extensions",
        "name": "server1",
        "namespace": "webapp"
      },
      "timestamp": SOME_TIMESTAMP_HERE,
      "window": "10s",
      "value": "10"
    },
    {
      "metricName": "hits-per-second",
      "describedObject": {
        "kind": "Ingress",
        "apiVersion": "extensions",
        "name": "server2",
        "namespace": "webapp"
      },
      "timestamp": ANOTHER_TIMESTAMP_HERE,
      "window": "10s",
      "value": "15"
    }
  ]
}
```
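To illustrate how a consumer might read such a response, the following Go sketch decodes a document of the same shape using only the standard library. The lowercase types are simplified stand-ins for the proposed API types, not the types themselves: the value and window are kept as strings rather than `resource.Quantity` and `unversioned.Duration`, and the timestamp in the sample is an arbitrary RFC 3339 value chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// objectRef is a simplified stand-in for ObjectReference.
type objectRef struct {
	Kind       string `json:"kind"`
	APIVersion string `json:"apiVersion"`
	Name       string `json:"name"`
	Namespace  string `json:"namespace"`
}

// metricValue is a simplified stand-in for MetricValue.
type metricValue struct {
	DescribedObject objectRef `json:"describedObject"`
	MetricName      string    `json:"metricName"`
	Timestamp       time.Time `json:"timestamp"`
	Window          string    `json:"window"`
	Value           string    `json:"value"`
}

// metricValueList is a simplified stand-in for MetricValueList.
type metricValueList struct {
	Kind       string        `json:"kind"`
	APIVersion string        `json:"apiVersion"`
	Items      []metricValue `json:"items"`
}

// decodeList parses a JSON response body into the simplified list type.
func decodeList(data []byte) (metricValueList, error) {
	var l metricValueList
	err := json.Unmarshal(data, &l)
	return l, err
}

// windowOf parses the window string (e.g. "10s") into a duration.
func windowOf(v metricValue) (time.Duration, error) {
	return time.ParseDuration(v.Window)
}

// sampleJSON follows the example above, with an assumed timestamp.
const sampleJSON = `{
  "kind": "MetricValueList",
  "apiVersion": "custom-metrics/v1alpha1",
  "items": [{
    "metricName": "hits-per-second",
    "describedObject": {"kind": "Ingress", "apiVersion": "extensions",
                        "name": "server1", "namespace": "webapp"},
    "timestamp": "2016-11-01T12:00:00Z",
    "window": "10s",
    "value": "10"
  }]
}`

func main() {
	l, err := decodeList([]byte(sampleJSON))
	if err != nil {
		panic(err)
	}
	for _, v := range l.Items {
		fmt.Println(v.DescribedObject.Name, v.MetricName, v.Value)
	}
}
```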
Semantics
---------

### Object Types ###

In order to properly identify resources, we must use resource names
qualified with group names (since the group for the requests will always
be `custom-metrics`).

The `object-type` parameter should be the string form of
`unversioned.GroupResource`. Note that we do not include version in this;
we simply wish to uniquely identify all the different types of objects in
Kubernetes.

In the case of cross-group object renames, the adapter should maintain
a list of "equivalent versions" that the monitoring system uses. This is
monitoring-system dependent (for instance, the monitoring system might
record all HorizontalPodAutoscalers as in `autoscaling`, but should be
aware that HorizontalPodAutoscalers also exist in `extensions`).

Note that for namespace metrics, we use a pseudo-resource called
`metrics`. Since there is no resource named `metrics` in the legacy API
group, this will not clash with any existing resources.
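As a small illustration of the group-qualified `object-type` form, the sketch below uses a local `groupResource` type as a stand-in for `unversioned.GroupResource`. It assumes the string form is `resource.group`, with the bare resource name used for the legacy group (whose name is empty), and that resource names themselves contain no dots.

```go
package main

import (
	"fmt"
	"strings"
)

// groupResource is a local stand-in for unversioned.GroupResource.
type groupResource struct{ Group, Resource string }

// String renders "resource.group", or just "resource" when the group is
// empty (the legacy API group).
func (gr groupResource) String() string {
	if gr.Group == "" {
		return gr.Resource
	}
	return gr.Resource + "." + gr.Group
}

// parseGroupResource splits at the first dot, since group names may
// contain dots while resource names are assumed not to.
func parseGroupResource(s string) groupResource {
	if i := strings.Index(s, "."); i >= 0 {
		return groupResource{Group: s[i+1:], Resource: s[:i]}
	}
	return groupResource{Resource: s}
}

func main() {
	fmt.Println(groupResource{Group: "extensions", Resource: "ingress"}) // ingress.extensions
	fmt.Println(groupResource{Resource: "pods"})                         // pods
	fmt.Printf("%+v\n", parseGroupResource("ingress.extensions"))
}
```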
### Metric Names ###

Metric names must be escaped so that any given metric appears as a single
subresource (this is especially true for metrics systems which support slashes
in metric names). Otherwise, metric names are open-ended.
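The proposal does not prescribe a particular escaping scheme. As one possible approach, and purely as an assumption, the sketch below percent-encodes a metric name with Go's `url.PathEscape` so that a name containing slashes still occupies a single path segment. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeMetricName percent-encodes a metric name so that it fits in a
// single path segment (slashes become %2F).
func escapeMetricName(name string) string {
	return url.PathEscape(name)
}

// unescapeMetricName reverses escapeMetricName on the server side.
func unescapeMetricName(segment string) (string, error) {
	return url.PathUnescape(segment)
}

func main() {
	escaped := escapeMetricName("filesystem/usage")
	fmt.Println(escaped) // filesystem%2Fusage
	original, _ := unescapeMetricName(escaped)
	fmt.Println(original) // filesystem/usage
}
```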
### Metric Values and Timing ###

There should be only one metric value per object requested. The returned
metrics should be the most recently available metrics, as with the resource
metrics API. Implementors *should* attempt to return all metrics with roughly
identical timestamps and windows (when appropriate), but consumers should also
verify that any differences in timestamps are within tolerances for
a particular application (e.g. a dashboard might simply display the older
metric with a note, while the horizontal pod autoscaler controller might choose
to pretend it did not receive that metric value).
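A consumer-side tolerance check along these lines might look like the following sketch. The `sample` type, the `freshOnly` helper, and the 30-second `tolerance` are all assumptions made for illustration; the proposal leaves the tolerance and the handling of stale values to each consumer. Here, values older than the newest timestamp by more than the tolerance are simply dropped, as the autoscaler case above suggests.

```go
package main

import (
	"fmt"
	"time"
)

// tolerance is an assumed, application-specific staleness limit.
const tolerance = 30 * time.Second

// sample is a simplified view of one returned metric value.
type sample struct {
	Object    string
	Timestamp time.Time
	Value     string
}

// at builds a timestamp from Unix seconds, for readability in examples.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// freshOnly keeps the samples whose timestamps are within tol of the
// newest timestamp in the set, preserving their order.
func freshOnly(in []sample, tol time.Duration) []sample {
	var newest time.Time
	for _, s := range in {
		if s.Timestamp.After(newest) {
			newest = s.Timestamp
		}
	}
	var out []sample
	for _, s := range in {
		if newest.Sub(s.Timestamp) <= tol {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	in := []sample{
		{"server1", at(100), "10"},
		{"server2", at(95), "15"},
		{"server3", at(10), "7"}, // 90s older than the newest value
	}
	for _, s := range freshOnly(in, tolerance) {
		fmt.Println(s.Object, s.Value)
	}
}
```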
### Labeled Metrics (or lack thereof) ###

For metrics systems that support differentiating metrics beyond the Kubernetes
object hierarchy (such as using additional labels), the metrics systems should
have a metric which represents all such series aggregated together.
Additionally, implementors may choose to expose the individual "sub-metrics" via the
metric name, but this is expected to be fairly rare, since it most likely
requires specific knowledge of individual metrics. For instance, suppose we
record filesystem usage by filesystem inside the container. There should then
be a metric `filesystem/usage`, and the implementors of the API may choose to
expose more detailed metrics like `filesystem/usage/my-first-filesystem`.

Relationship to HPA v2
----------------------
The URL paths in this API are designed to correspond to different source
types in the [HPA v2](hpa-v2.md). Specifically, the `pods` source type
corresponds to a URL of the form
`/namespaces/$NS/pods/*/$METRIC_NAME?labelSelector=foo`, while the
`object` source type corresponds to a URL of the form
`/namespaces/$NS/$KIND.$GROUP/$OBJECT_NAME/$METRIC_NAME`.

The HPA then takes the results, aggregates them together (in the case of
the former source type), and uses the resulting value to produce a usage
ratio.

Metrics for the `resource` source type are taken from the API provided by
the "metrics" API group (the master/resource metrics API).

The HPA will consume the API as a federated API server.

Relationship to Resource Metrics API
------------------------------------
The metrics presented by this API may be a superset of those present in the
resource metrics API, but this is not guaranteed. Clients that need the
information in the resource metrics API should use that to retrieve those
metrics, and supplement those metrics with this API.

Mechanical Concerns
-------------------
This API is intended to be implemented by monitoring pipelines (e.g.
inside Heapster, or as an adapter on top of a solution like Prometheus).
It shares many mechanical requirements with normal Kubernetes APIs, such
as needing to support encoding different versions of objects in both JSON
and protobuf, as well as acting as a discoverable API server. For these
reasons, it is expected that implementors will make use of the Kubernetes
genericapiserver code. If implementors choose not to use this, they must
still follow all of the Kubernetes API server conventions in order to work
properly with consumers of the API.

Location
--------
The types and clients for this API will live in a separate repository
under the Kubernetes organization (e.g. `kubernetes/metrics`). This
repository will most likely also house other metrics-related APIs for
Kubernetes (e.g. historical metrics API definitions, the resource metrics
API definitions, etc).

Note that there will not be a canonical implementation of the custom
metrics API under Kubernetes, just the types and clients. Implementations
will be left up to the monitoring pipelines.

Alternative Considerations
--------------------------
### Quantity vs Float ###

In the past, custom metrics were represented as floats. In general,
however, Kubernetes APIs are not supposed to use floats. The API proposed
above thus uses `resource.Quantity`. This adds a bit of encoding
overhead, but makes the API line up nicely with other Kubernetes APIs.
### Labeled Metrics ###

Many metric systems support labeled metrics, allowing for dimensionality
beyond the Kubernetes object hierarchy. Since the HPA currently doesn't
support specifying metric labels, this is not supported via this API. We
may wish to explore this in the future.
