-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Proposal: Introduce Custom Metrics API #152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Introduce Custom Metrics API #152
Conversation
|
cc @kubernetes/autoscaling @kubernetes/sig-instrumentation @piosz @fgrzadkowski @smarterclayton @derekwaynecarr |
|
+1 for this Any ETA on the first implementation? Also, I heard that InfraStore might be a thing during 2017, are there any proposals/code/docs for that yet? |
71b09fc to
3ed5f62
Compare
|
alright, this should address @jszczepkowski comments. I've tweaked the APIs a bit to make sure they work with the normal mechanics for the API server (basically, we'd register a namespaced resource/subresource of "objects/metrics", where the "name" of the object is the string-form GroupResource, and the metrics subresource is a GetterWithOptions, so that it gets the path after |
|
@luxas first, we need to get the repo created and API types merged. Then I'll probably do a PoC implementation. |
| retrieve the given metric for all non-namespaced objects (e.g. Node, | ||
| PersistentVolume) of the given type matching the given label selector. | ||
|
|
||
| - `/objects/{object-type}/metrics/{metric-name...}?names=foo,bar`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/objects/root-objects?
| Timestamp unversioned.Time `json:"timestamp"` | ||
|
|
||
| // indicates the duration of the time window containing these metrics | ||
| Window unversioned.Duration `json:"window"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please remove extra spaces between "Window" and "unversionde".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment was not applied
| "name": "server2", | ||
| "namespace": "webapp" | ||
| }, | ||
| "timestamp": SOME_TIMESTAMP_HERE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the timestamps in lines 108 and 118 must be the same? I guess no. So, maybe rename it some_timestampa_1 and some_timestamp_2?
|
In general LGTM, just two nits. |
|
I've actually had a conversation with @liggitt as part of the API review, and he had some suggestions for some tweaks to the API paths to make them easier to integrate with our auth patterns, discovery schema, etc. I should have those up shortly. |
3ed5f62 to
5fd3aef
Compare
|
alright, new revision is up, bringing the API more in line with the structure of normal Kubernetes APIs. PTAL |
| Since the HPA itself is only interested in groups of pods (by name or | ||
| label selector) or in individual objects, one could potentially argue that | ||
| it would be better to have separate endpoints for pods vs other objects. | ||
| The complicates the API structure a bit, but does make it possible to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/The complicates/This complicates/
| metric for all non-namespaced objects of the given type matching the | ||
| given label selector | ||
|
|
||
| - `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, object-type cannot equal "metrics"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but there's no type metrics in the legacy API group (which is the only group that has no explicit group name).
| - `/{object-type}/{object-name}/{metric-name...}`: retrieve the given | ||
| metric for the given non-namespaced object (e.g. Node, PersistentVolume) | ||
|
|
||
| - `/{object-type}/*/{metric-name...}`: retrieve the given metric for all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add an explanation what is the motivation behind the star?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
| - `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}`: retrieve the given metric for all | ||
| non-namespaced objects of the given type | ||
|
|
||
| - `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline from HPA point of view when querying for metrics from a set of pods (defined by label selector) we expect to have them from the same point in time to be able to perform some aggregations on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a section under semantics indicating that implementors should attempt to return results from roughly the same timestamp, but that clients should verify.
| // these metrics come from some time in [Timestamp-Window, Timestamp]) | ||
| Timestamp unversioned.Time `json:"timestamp"` | ||
|
|
||
| // indicates the duration of the time window containing these metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To support both types of metrics: instantaneous and from some time window we may want to introduce the rule that when the window is equal to 0 this means that the metric is instantaneous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
|
cc @kubernetes/sig-instrumentation-misc @kubernetes/sig-autoscaling-misc |
363b555 to
0ffdbe6
Compare
|
@kubernetes/sig-autoscaling-misc updated as discussed in the SIG meeting, PTAL |
|
|
||
| ### Metric Names ### | ||
|
|
||
| Metric names must be escaped so that any given metric appears as a single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
escaped how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is important to get defined as there's been issues with URL escaping previously (such as %2F which Golang http library incorrectly handles). And also if there are restrictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt do we have a recommended way of escaping then? I would have said normal URL escaping, but I was not aware that Go's HTTP library was broken like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than try to allow a way to include an escaped "/" in an object name, kubernetes objects are simply not allowed to have names containing / or % characters, and are not allowed to be . or ..
that allows you to use normal URL path segment encoding to safely plumb values through
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I supposed we could just disallow the same things here.
| The `object-type` parameter should be the string form of | ||
| `unversioned.GroupResource`. Note that we do not include version in this; | ||
| we simply wish to uniquely identify all the different types of objects in | ||
| Kubernetes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple examples here would be helpful (maybe for pods in the legacy group, and one for jobs in the batch group)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
| For metrics systems that support differentiating metrics beyond the Kubernetes | ||
| object hierarchy (such as using additional labels), the metrics systems should | ||
| have a metric which represents all such series aggregated together. | ||
| Additionally, implementors may choose to the individual "sub-metrics" via the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may choose to identify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
|
|
||
| The types and clients for this API will live in a separate repository | ||
| under the Kubernetes organization (e.g. `kubernetes/metrics`). This | ||
| respository will most likely also house other metrics-related APIs for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repository
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
| - `/{object-type}/*/{metric-name...}`: retrieve the given metric for all | ||
| non-namespaced objects of the given type | ||
|
|
||
| - `/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is expected to return a MetricValueList?
| The root API path will look like `/apis/custom-metrics/v1alpha1`. For | ||
| brevity, this will be left off below. | ||
|
|
||
| - `/{object-type}/{object-name}/{metric-name...}`: retrieve the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is expected to return a MetricValue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this same url pattern (with an object-name of *) is expected to return a MetricValueList, this should as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, all of these URLs return a MetricValueList. I'll clarify that (I missed that the API objects section I had written still said that some URLs returned the singular form).
| unversioned.ListMeta `json:"metadata,omitempty"` | ||
|
|
||
| // the value of the metric across the described objects | ||
| Items []MetricValue `json:"metricValues"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json:"items"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton can correct me if I'm wrong, but I think API lists have to contain runtime.Objects, which means TypeMeta on MetricValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
| given metric for all non-namespaced objects of the given type matching | ||
| the given label selector | ||
|
|
||
| - `/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would object-name be expected to match namespace-name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this was supposed to be attached to the "metrics for a namespace" example:
- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given
metric which describes the given namespace.
|
|
||
| Note that we introduce the syntax of having a name of ` * ` here since | ||
| there is no current syntax for getting the output of a subresource on | ||
| multiple objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also what might make it necessary to return a MetricsValueList from the subresource, even when dealing with a single named object.
| ### Metric Values and Timing ### | ||
|
|
||
| There should be only one metric value per object requested. The returned | ||
| metrics should be the most recenly available metrics, as with the resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
| corresponds to a URL of the form | ||
| `/namespaces/$NS/pods/*/$METRIC_NAME?labelSelector=foo`, while the | ||
| `object` source type corresponds to a URL of the form | ||
| `/namespaces/$NS/$KIND.$GROUP/$OBJECT_NAME/$METRIC_NAME`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$RESOURCE.$GROUP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, typo
| This API is intended to be implemented by monitoring pipelines (e.g. | ||
| inside Heapster, or as an adapter on top of a solution like Prometheus). | ||
| It shares many mechanical requirements with normal Kubernetes APIs, such | ||
| as needed to support encoding different versions of objects in both JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"such as the need"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
| reasons, it is expected that implemenators will make use of the Kubernetes | ||
| genericapiserver code. If implementors choose not to use this, they must | ||
| still follow all of the Kubernetes API server conventions in order to work | ||
| properly with consumers of the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this more specific, especially around which API verbs are supported (appears only to be get?) and what semantics resourceVersion carries for MetricValue objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
| // which these metrics were calculated, when returning rate | ||
| // metrics calculated from cumulative metrics (or zero for | ||
| // non-calculated instantaneous metrics). | ||
| Window unversioned.Duration `json:"window"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can think of three different types of metrics:
- Guage (memory usage for example)
- Cumulative (API hits since start, uptime, etc)
- Rate (API hits per second)
With Window are we intending to state that the metric is of type 3? What about 2 and 1, how are they represented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From past discussion my take away was that the exact semantic meaning is left to the provider of the custom metrics. Integrating systems behave differently and there's little point in specifying something here we can't enforce anyway.
I'd consider window as a return attribute a gimmick. It's much more likely that one wants to specify it in the request as the aggregation window has significant effect on the value.
Let's say we implemented a proxy for Prometheus. We'd follow the recording rules (materialized queries) naming pattern like:
/...//namespaces/webapp/ingress.extensions/*/requests:rate5m?labelSelector=app%3Dfrontend
The window is specified in the requested metric and the proxy would merely copy that into the window field in the response.
How is window used in the HPA? It seems tedious to extract from the metric name in this example – especially if users violate naming conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vishh for cases where window doesn't apply, it is simply set to zero. Window only applies in cases where you have a derived metric (e.g. cpu usage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider window as a return attribute a gimmick. It's much more likely that one wants to specify it in the request as the aggregation window has significant effect on the value.
Recapping the discussion in SIG instrumentation:
This value is set to inform the consumers what range of data was used to produce a particular value, so they can avoid making decisions on largely overlapping data (e.g. in the HPA, we might want to wait until we get mostly non-overlapping data to make a scale-up decision after an initial scale up, so that we don't over scale).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't timestamp enough for HPA? I assume HPA will only deal with instantaneous metrics?
eb443a2 to
cc963b3
Compare
|
@liggitt I think I've address most of your comments. re the namespace path, it follows a slightly different pattern (we use a "resource" |
do you see it being important to select namespace metrics across all namespaces (or by labelSelector)? |
not particularly, no. Selecting things inside a namespace is useful, but since namespaces are the "boundary", selecting across all namespaces seems less of a big deal. The one case where it might be useful is dashboards, but even there, I'm not sure how useful it would be. |
| // which these metrics were calculated, when returning rate | ||
| // metrics calculated from cumulative metrics (or zero for | ||
| // non-calculated instantaneous metrics). | ||
| Window unversioned.Duration `json:"window"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usage of duration has been a problem, do we need subsecond here.
see:
https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#naming-conventions
https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#units
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see:
kubernetes/kubernetes#39914 (comment)
if we dont need sub-second, may be best to move the window to WindowSeconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
cc963b3 to
e094b55
Compare
| // which these metrics were calculated, when returning rate | ||
| // metrics calculated from cumulative metrics (or zero for | ||
| // non-calculated instantaneous metrics). | ||
| WindowSeconds int64 `json:"window"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hate to nit here, but why not make this a pointer so we can ignore the zero case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just omitempty so we don't have to deal with a pointer
e094b55 to
a53ae2c
Compare
|
both ActiveDeadlineSeconds and TerminationGracePeriodSeconds are pointers.
so basically, we are always pointers on these things right now.
…On Fri, Jan 20, 2017 at 3:46 PM, Solly Ross ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In contributors/design-proposals/custom-metrics-api.md
<#152>:
> + metav1.TypeMeta`json:",inline"`
+
+ // a reference to the described object
+ DescribedObject ObjectReference `json:"describedObject"`
+
+ // the name of the metric
+ MetricName string `json:"metricName"`
+
+ // indicates the time at which the metrics were produced
+ Timestamp unversioned.Time `json:"timestamp"`
+
+ // indicates the window ([Timestamp-Window, Timestamp]) from
+ // which these metrics were calculated, when returning rate
+ // metrics calculated from cumulative metrics (or zero for
+ // non-calculated instantaneous metrics).
+ WindowSeconds int64 `json:"window"`
or just omitempty so we don't have to deal with a pointer
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#152>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AF8dbJuihToWV2Hp6iWCJzEZA7aWQnQpks5rUR0egaJpZM4LHCWX>
.
|
a53ae2c to
5236fd2
Compare
ok, done. @liggitt @vishh @piosz @derekwaynecarr any more thoughts, or is this good to merge? |
|
LGTM from me. |
| The resource source type is taken from the API provided by the | ||
| "metrics" API group (the master/resource metrics API). | ||
|
|
||
| The HPA will consume the API as a federated API server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: assume we mean aggregated API server now?
|
LGTM from me, we should augment in the future how we choose to promote this to a non v1alpha1 form. For example, require community demonstrated adapters that conform to the model. |
This proposal details the custom metrics API as proposed in the new monitoring vision. It is designed for use with the HPA, but should be generally useful for controllers that wish to consumer custom metrics.
5236fd2 to
7cd59be
Compare
|
latest update just clarifies that the LIST operations on branch nodes (i.e. those operations not defined in the list of API paths) are not currently supported. |
|
LGTM |
Remove extraneous info from draft
…etrics-api Proposal: Introduce Custom Metrics API
This proposal details the custom metrics API as proposed in the new
monitoring vision. It is designed for use with the HPA, but should be
generally useful for controllers that wish to consumer custom metrics.
(transferred from kubernetes/kubernetes#34586)