Skip to content

Conversation

@DirectXMan12
Copy link
Contributor

This proposal details the custom metrics API as proposed in the new
monitoring vision. It is designed for use with the HPA, but should be
generally useful for controllers that wish to consumer custom metrics.

(transferred from kubernetes/kubernetes#34586)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 7, 2016
@DirectXMan12
Copy link
Contributor Author

cc @kubernetes/autoscaling @kubernetes/sig-instrumentation @piosz @fgrzadkowski @smarterclayton @derekwaynecarr

@luxas
Copy link
Member

luxas commented Dec 18, 2016

+1 for this

Any ETA on the first implementation?

Also, I heard that InfraStore might be a thing during 2017, are there any proposals/code/docs for that yet?

@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch from 71b09fc to 3ed5f62 Compare December 19, 2016 21:27
@DirectXMan12
Copy link
Contributor Author

DirectXMan12 commented Dec 20, 2016

alright, this should address @jszczepkowski comments.

I've tweaked the APIs a bit to make sure they work with the normal mechanics for the API server (basically, we'd register a namespaced resource/subresource of "objects/metrics", where the "name" of the object is the string-form GroupResource, and the metrics subresource is a GetterWithOptions, so that it gets the path after /metrics/ and the URL parameters. @kubernetes/api-review-team unless there's a better way to do this? The alternative seems to be using a custom chain (like the discovery server) and using go-restful manually, but it seems like that would have a higher maintenance burden. WDYT?

@DirectXMan12
Copy link
Contributor Author

@luxas first, we need to get the repo created and API types merged. Then I'll probably do a PoC implementation.

retrieve the given metric for all non-namespaced objects (e.g. Node,
PersistentVolume) of the given type matching the given label selector.

- `/objects/{object-type}/metrics/{metric-name...}?names=foo,bar`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/objects/root-objects?

Timestamp unversioned.Time `json:"timestamp"`

// indicates the duration of the time window containing these metrics
Window unversioned.Duration `json:"window"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please remove extra spaces between "Window" and "unversionde".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment was not applied

"name": "server2",
"namespace": "webapp"
},
"timestamp": SOME_TIMESTAMP_HERE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the timestamps in lines 108 and 118 must be the same? I guess no. So, maybe rename it some_timestampa_1 and some_timestamp_2?

@jszczepkowski
Copy link
Contributor

In general LGTM, just two nits.

@DirectXMan12
Copy link
Contributor Author

DirectXMan12 commented Dec 22, 2016

I've actually had a conversation with @liggitt as part of the API review, and he had some suggestions for some tweaks to the API paths to make them easier to integrate with our auth patterns, discovery schema, etc. I should have those up shortly.

@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch from 3ed5f62 to 5fd3aef Compare January 4, 2017 19:21
@DirectXMan12
Copy link
Contributor Author

alright, new revision is up, bringing the API more in line with the structure of normal Kubernetes APIs. PTAL

Since the HPA itself is only interested in groups of pods (by name or
label selector) or in individual objects, one could potentially argue that
it would be better to have separate endpoints for pods vs other objects.
The complicates the API structure a bit, but does make it possible to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/The complicates/This complicates/

metric for all non-namespaced objects of the given type matching the
given label selector

- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, object-type cannot equal "metrics"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but there's no type metrics in the legacy API group (which is the only group that has no explicit group name).

@piosz piosz assigned piosz and unassigned mwielgus Jan 5, 2017
- `/{object-type}/{object-name}/{metric-name...}`: retrieve the given
metric for the given non-namespaced object (e.g. Node, PersistentVolume)

- `/{object-type}/*/{metric-name...}`: retrieve the given metric for all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add an explanation what is the motivation behind the star?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}`: retrieve the given metric for all
non-namespaced objects of the given type

- `/namespaces/{namespace-name}/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline from HPA point of view when querying for metrics from a set of pods (defined by label selector) we expect to have them from the same point in time to be able to perform some aggregations on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a section under semantics indicating that implementors should attempt to return results from roughly the same timestamp, but that clients should verify.

// these metrics come from some time in [Timestamp-Window, Timestamp])
Timestamp unversioned.Time `json:"timestamp"`

// indicates the duration of the time window containing these metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support both types of metrics: instantaneous and from some time window we may want to introduce the rule that when the window is equal to 0 this means that the metric is instantaneous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

@piosz
Copy link
Member

piosz commented Jan 5, 2017

cc @kubernetes/sig-instrumentation-misc @kubernetes/sig-autoscaling-misc

@piosz
Copy link
Member

piosz commented Jan 5, 2017

cc @fabxc @irabinovitch

@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch 2 times, most recently from 363b555 to 0ffdbe6 Compare January 5, 2017 21:57
@DirectXMan12
Copy link
Contributor Author

@kubernetes/sig-autoscaling-misc updated as discussed in the SIG meeting, PTAL


### Metric Names ###

Metric names must be escaped so that any given metric appears as a single
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escaped how?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important to get defined as there's been issues with URL escaping previously (such as %2F which Golang http library incorrectly handles). And also if there are restrictions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt do we have a recommended way of escaping then? I would have said normal URL escaping, but I was not aware that Go's HTTP library was broken like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than try to allow a way to include an escaped "/" in an object name, kubernetes objects are simply not allowed to have names containing / or % characters, and are not allowed to be . or ..

that allows you to use normal URL path segment encoding to safely plumb values through

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supposed we could just disallow the same things here.

The `object-type` parameter should be the string form of
`unversioned.GroupResource`. Note that we do not include version in this;
we simply wish to uniquely identify all the different types of objects in
Kubernetes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple examples here would be helpful (maybe for pods in the legacy group, and one for jobs in the batch group)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

For metrics systems that support differentiating metrics beyond the Kubernetes
object hierarchy (such as using additional labels), the metrics systems should
have a metric which represents all such series aggregated together.
Additionally, implementors may choose to the individual "sub-metrics" via the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may choose to identify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️


The types and clients for this API will live in a separate repository
under the Kubernetes organization (e.g. `kubernetes/metrics`). This
respository will most likely also house other metrics-related APIs for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repository

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

- `/{object-type}/*/{metric-name...}`: retrieve the given metric for all
non-namespaced objects of the given type

- `/{object-type}/*/{metric-name...}?labelSelector=foo`: retrieve the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is expected to return a MetricValueList?

The root API path will look like `/apis/custom-metrics/v1alpha1`. For
brevity, this will be left off below.

- `/{object-type}/{object-name}/{metric-name...}`: retrieve the given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this is expected to return a MetricValue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this same url pattern (with an object-name of *) is expected to return a MetricValueList, this should as well

Copy link
Contributor Author

@DirectXMan12 DirectXMan12 Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, all of these URLs return a MetricValueList. I'll clarify that (I missed that the API objects section I had written still said that some URLs returned the singular form).

unversioned.ListMeta `json:"metadata,omitempty"`

// the value of the metric across the described objects
Items []MetricValue `json:"metricValues"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json:"items"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton can correct me if I'm wrong, but I think API lists have to contain runtime.Objects, which means TypeMeta on MetricValue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

given metric for all non-namespaced objects of the given type matching
the given label selector

- `/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would object-name be expected to match namespace-name?

Copy link
Member

@derekwaynecarr derekwaynecarr Jan 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt - i don't follow.

i assumed this was saying:

/namespaces/ns1/services/myservice/my-metric

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this was supposed to be attached to the "metrics for a namespace" example:

- `/namespaces/{namespace-name}/metrics/{metric-name}`: retrieve the given
  metric which describes the given namespace.


Note that we introduce the syntax of having a name of ` * ` here since
there is no current syntax for getting the output of a subresource on
multiple objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also what might make it necessary to return a MetricsValueList from the subresource, even when dealing with a single named object.

### Metric Values and Timing ###

There should be only one metric value per object requested. The returned
metrics should be the most recenly available metrics, as with the resource
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

corresponds to a URL of the form
`/namespaces/$NS/pods/*/$METRIC_NAME?labelSelector=foo`, while the
`object` source type corresponds to a URL of the form
`/namespaces/$NS/$KIND.$GROUP/$OBJECT_NAME/$METRIC_NAME`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$RESOURCE.$GROUP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, typo

This API is intended to be implemented by monitoring pipelines (e.g.
inside Heapster, or as an adapter on top of a solution like Prometheus).
It shares many mechanical requirements with normal Kubernetes APIs, such
as needed to support encoding different versions of objects in both JSON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"such as the need"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

reasons, it is expected that implemenators will make use of the Kubernetes
genericapiserver code. If implementors choose not to use this, they must
still follow all of the Kubernetes API server conventions in order to work
properly with consumers of the API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this more specific, especially around which API verbs are supported (appears only to be get?) and what semantics resourceVersion carries for MetricValue objects

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

// which these metrics were calculated, when returning rate
// metrics calculated from cumulative metrics (or zero for
// non-calculated instantaneous metrics).
Window unversioned.Duration `json:"window"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of three different types of metrics:

  1. Guage (memory usage for example)
  2. Cumulative (API hits since start, uptime, etc)
  3. Rate (API hits per second)

With Window are we intending to state that the metric is of type 3? What about 2 and 1, how are they represented?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From past discussion my take away was that the exact semantic meaning is left to the provider of the custom metrics. Integrating systems behave differently and there's little point in specifying something here we can't enforce anyway.

I'd consider window as a return attribute a gimmick. It's much more likely that one wants to specify it in the request as the aggregation window has significant effect on the value.
Let's say we implemented a proxy for Prometheus. We'd follow the recording rules (materialized queries) naming pattern like:

/...//namespaces/webapp/ingress.extensions/*/requests:rate5m?labelSelector=app%3Dfrontend

The window is specified in the requested metric and the proxy would merely copy that into the window field in the response.

How is window used in the HPA? It seems tedious to extract from the metric name in this example – especially if users violate naming conventions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vishh for cases where window doesn't apply, it is simply set to zero. Window only applies in cases where you have a derived metric (e.g. cpu usage).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider window as a return attribute a gimmick. It's much more likely that one wants to specify it in the request as the aggregation window has significant effect on the value.

Recapping the discussion in SIG instrumentation:

This value is set to inform the consumers what range of data was used to produce a particular value, so they can avoid making decisions on largely overlapping data (e.g. in the HPA, we might want to wait until we get mostly non-overlapping data to make a scale-up decision after an initial scale up, so that we don't over scale).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't timestamp enough for HPA? I assume HPA will only deal with instantaneous metrics?

@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch from eb443a2 to cc963b3 Compare January 19, 2017 19:41
@DirectXMan12
Copy link
Contributor Author

@liggitt I think I've address most of your comments.

re the namespace path, it follows a slightly different pattern (we use a "resource" metrics, and the object names are metric name), since otherwise we'd have to have redundant object name -- with the former, we have /namespaces/foo/metrics/mymetric whereas with the latter we'd need /namespaces/foo/metrics/foo/mymetric.

@liggitt
Copy link
Member

liggitt commented Jan 19, 2017

re the namespace path, it follows a slightly different pattern (we use a "resource" metrics, and the object names are metric name), since otherwise we'd have to have redundant object name -- with the former, we have /namespaces/foo/metrics/mymetric whereas with the latter we'd need /namespaces/foo/metrics/foo/mymetric.

do you see it being important to select namespace metrics across all namespaces (or by labelSelector)?

@DirectXMan12
Copy link
Contributor Author

DirectXMan12 commented Jan 19, 2017

do you see it being important to select namespace metrics across all namespaces (or by labelSelector)?

not particularly, no. Selecting things inside a namespace is useful, but since namespaces are the "boundary", selecting across all namespaces seems less of a big deal. The one case where it might be useful is dashboards, but even there, I'm not sure how useful it would be.

// which these metrics were calculated, when returning rate
// metrics calculated from cumulative metrics (or zero for
// non-calculated instantaneous metrics).
Window unversioned.Duration `json:"window"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see:
kubernetes/kubernetes#39914 (comment)

if we dont need sub-second, may be best to move the window to WindowSeconds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch from cc963b3 to e094b55 Compare January 20, 2017 20:22
// which these metrics were calculated, when returning rate
// metrics calculated from cumulative metrics (or zero for
// non-calculated instantaneous metrics).
WindowSeconds int64 `json:"window"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hate to nit here, but why not make this a pointer so we can ignore the zero case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just omitempty so we don't have to deal with a pointer

@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch from e094b55 to a53ae2c Compare January 20, 2017 20:47
@derekwaynecarr
Copy link
Member

derekwaynecarr commented Jan 20, 2017 via email

@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch from a53ae2c to 5236fd2 Compare January 26, 2017 05:14
@DirectXMan12
Copy link
Contributor Author

both ActiveDeadlineSeconds and TerminationGracePeriodSeconds are pointers. so basically, we are always pointers on these things right now.

ok, done.

@liggitt @vishh @piosz @derekwaynecarr any more thoughts, or is this good to merge?

@piosz
Copy link
Member

piosz commented Jan 31, 2017

LGTM from me.

The resource source type is taken from the API provided by the
"metrics" API group (the master/resource metrics API).

The HPA will consume the API as a federated API server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assume we mean aggregated API server now?

@derekwaynecarr
Copy link
Member

LGTM from me, we should augment in the future how we choose to promote this to a non v1alpha1 form. For example, require community demonstrated adapters that conform to the model.

This proposal details the custom metrics API as proposed in the new
monitoring vision. It is designed for use with the HPA, but should be
generally useful for controllers that wish to consumer custom metrics.
@DirectXMan12 DirectXMan12 force-pushed the proposal/custom-metrics-api branch from 5236fd2 to 7cd59be Compare February 2, 2017 16:03
@DirectXMan12
Copy link
Contributor Author

latest update just clarifies that the LIST operations on branch nodes (i.e. those operations not defined in the list of API paths) are not currently supported.

@liggitt
Copy link
Member

liggitt commented Feb 2, 2017

LGTM

@piosz piosz merged commit 657df96 into kubernetes:master Feb 2, 2017
@DirectXMan12 DirectXMan12 deleted the proposal/custom-metrics-api branch February 2, 2017 16:58
shyamjvs pushed a commit to shyamjvs/community that referenced this pull request Sep 22, 2017
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
…etrics-api

Proposal: Introduce Custom Metrics API
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.

Projects

None yet

Development

Successfully merging this pull request may close these issues.