Kubelet raw metrics API proposal #15862

timstclair · 2015-10-19T18:17:31Z

Forked from discussion on #15691
Addresses #12483

cc/ @vishh @jimmidyson @dchen1107 @bgrant0607

timstclair · 2015-10-19T18:18:39Z

docs/proposals/kubelet-raw-metrics-api.md

+- `start` - start time to return metrics from; type json encoded `time.Time`
+- `end` - end time to return metrics to; type json encoded `time.Time`
+- `count` - maximum number of stats to return in each ContainerMetrics instance;
+  type int


As @lavalamp says, this is "poor-man's paging". Is there a better way to structure this parameter to support proper paging in the future (e.g. should this be max total stats, as opposed to max per-container stats)?

I wonder if we can get rid of this parameter. What we ideally need is a way to downsample the data returned, to reduce the load on the higher layers and the network.

I'd prefer to have this as a step between results. We could calculate a reasonable step depending on size of window between start & end.

@jimmidyson what do you mean by a step? Time step? In that case, we would need to validate that the raw metrics step evenly divides it.

@vishh is step & downsample the same thing? In either case, would we aggregate samples, or just drop samples not requested?

Yes a time step. IMO metrics should be reported at regular intervals for ease of use & understanding.

I would drop samples not requested.

Yes.
A typical query is something like "give me stats every 30 seconds for the
last five minutes".
I can express the duration with start and end. Adding step or rate
will help us express the downsampling rate.

OK so can we change count? I prefer step or interval to rate as that normally refers to rate of change when talking about metrics.

IMO this should be optional & a reasonable step value should be calculated from the requested window size.

Ok, I went with step. However, I disagree that the default should vary with the window size. I think it is better to have a fixed (predictable) default step. For instance, if I make a request and get back stats every 10 seconds, and then increase my window size by 30 seconds, I'd expect to get 3 more stats back, not the same number spaced 11 seconds apart.
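
The arithmetic behind that example can be checked with a small sketch. The five-minute starting window is an assumption chosen so the numbers match the comment; `fixedStepCount`, `derivedStep`, and `compare` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// fixedStepCount returns how many samples fall in a window when the step is
// fixed, counting one sample per full step.
func fixedStepCount(window, step time.Duration) int {
	return int(window / step)
}

// derivedStep returns the spacing that results when a fixed number of samples
// is stretched over the window instead.
func derivedStep(window time.Duration, samples int) time.Duration {
	return window / time.Duration(samples)
}

// compare grows an assumed 5m window by 30s and reports the extra samples
// under a fixed 10s step, and the spacing under a fixed sample count.
func compare() (extra int, spacing time.Duration) {
	const step = 10 * time.Second
	before, after := 5*time.Minute, 5*time.Minute+30*time.Second
	extra = fixedStepCount(after, step) - fixedStepCount(before, step)
	spacing = derivedStep(after, fixedStepCount(before, step))
	return extra, spacing
}

func main() {
	extra, spacing := compare()
	fmt.Println(extra, spacing) // prints "3 11s"
}
```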

Added count back as there is a heapster use case for it.

cc/ @mwielgus

What is the use case? What is the behaviour if available metrics are greater than requested count? Trim from beginning or end of window?

k8s-github-robot · 2015-10-19T18:35:26Z

Labelling this PR as size/L

vishh · 2015-10-19T18:38:27Z

Thanks for the proposal @timstclair! Much appreciated!

k8s-bot · 2015-10-19T19:02:56Z

GCE e2e test build/test passed for commit 05794bcf947ab20f2de6f12d75f61c43667d9b79.

timstclair · 2015-10-19T21:58:04Z

FYI, added a few notes about implementation.

k8s-bot · 2015-10-19T22:23:27Z

GCE e2e test build/test passed for commit 6c8d1c3ff400b7cb6ad9b169de5bba42acea9c50.

jimmidyson · 2015-10-19T22:25:21Z

docs/proposals/kubelet-raw-metrics-api.md

+// ContainerMetrics is a k8s wrapper around cAdvisor metrics
+type ContainerMetrics struct {
+  ObjectMeta, TypeMeta
+  Info cadvisorv2.ContainerInfo


I'd prefer all type information to be in Kubernetes & provide a converter from cadvisor API to Kubernetes API, even if that is a 1-1 field mapping. More independent of implementation & can be versioned separately.

The cadvisor APIs are also versioned, right? What do we get by adding one more layer of versioning here?

It's worth noting that the API is currently completely self contained. I don't have a strong opinion here, but I bet someone else does :)

I think the important piece is that everything is documented together in the same place. If the cadvisor API has separate documentation, I think that's a pretty strong argument for cloning the API.

IMO cadvisor is an implementation detail. Great that the cadvisor API is versioned, but I think the metrics API needs to be fully self-contained & versioned independently. Also using the cadvisor API directly would leak cadvisor details through to the unversioned representations as per the Kubernetes API conventions.

I'm sold. We'll mirror the cadvisor v2 APIs in the versioned metrics API, and also add ObjectMeta & TypeMeta to each object.
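
A minimal sketch of what such a mirrored type and a 1-1 converter might look like follows. Every type and field name here is a hypothetical placeholder: `cadvisorStats` is a heavily trimmed stand-in rather than the real cadvisor v2 type, and the `Kind` and `APIVersion` values are invented for illustration.

```go
package main

import "fmt"

// cadvisorStats is a hypothetical, trimmed stand-in for a cadvisor v2 type.
type cadvisorStats struct {
	CPUUsageNanos    uint64
	MemoryUsageBytes uint64
}

// TypeMeta and ObjectMeta are simplified placeholders for the Kubernetes types.
type TypeMeta struct {
	Kind       string
	APIVersion string
}

type ObjectMeta struct {
	Name string
}

// ContainerMetrics mirrors the cadvisor fields inside the metrics API so it
// can be versioned independently of cadvisor.
type ContainerMetrics struct {
	TypeMeta
	ObjectMeta
	CPUUsageNanos    uint64
	MemoryUsageBytes uint64
}

// FromCadvisor is a 1-1 field mapping from the cadvisor stand-in to the
// mirrored metrics type.
func FromCadvisor(name string, in cadvisorStats) ContainerMetrics {
	return ContainerMetrics{
		TypeMeta:         TypeMeta{Kind: "ContainerMetrics", APIVersion: "metrics/v1alpha1"},
		ObjectMeta:       ObjectMeta{Name: name},
		CPUUsageNanos:    in.CPUUsageNanos,
		MemoryUsageBytes: in.MemoryUsageBytes,
	}
}

func main() {
	m := FromCadvisor("nginx", cadvisorStats{CPUUsageNanos: 1500, MemoryUsageBytes: 4096})
	fmt.Println(m.Kind, m.Name, m.CPUUsageNanos, m.MemoryUsageBytes)
}
```

Keeping the conversion in one function means a change in the cadvisor type only touches the converter, not the versioned API surface.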

k8s-bot · 2015-10-20T18:00:08Z

GCE e2e build/test failed for commit d716bf7d0c167e4db1a72088d9061215e9d8e25b.

k8s-bot · 2015-10-20T18:29:36Z

GCE e2e test build/test passed for commit a8edd1f22285c08fae7af0c1308507296546f336.

timstclair · 2015-10-21T00:43:50Z

Ok, merged this proposal into the existing DerivedMetrics proposal. I updated that proposal to be consistent with everything proposed here. Comment away!

k8s-bot · 2015-10-21T01:05:11Z

GCE e2e build/test failed for commit 064f35657c6cf02f32831bbe67193a88653518a7.

jimmidyson · 2015-10-21T09:46:45Z

👍 Nice work @timstclair!

timstclair · 2015-10-22T20:10:38Z

@mwielgus - What were the heapster requirements we discussed yesterday? I remember:

pod labels
pod status

Was there anything else?

mwielgus · 2015-10-22T21:42:13Z

@timstclair
Pod status and container CPU/mem request/limit.

cc: @piosz

dchen1107 · 2015-10-22T23:34:02Z

docs/proposals/compute-resource-metrics-api.md

+  - `/nodes/localhost` - When served by kubelet, the only node provided is
+    `localhost`; type metrics.Node
+  - `/nodes/{node}` - metrics for a specific node
+- `/derivedNodes` - host metrics; type `[]metrics.DerivedNode`


Can we document here the difference between /nodes and /derivedNodes? Same for /pods?

From the document, I cannot see any difference.

vishh · 2015-10-23T00:24:59Z

docs/proposals/compute-resource-metrics-api.md

+- `end` - end time to return metrics to; type json encoded `time.Time`
+- `step` - the time step between each stats sample; type int (seconds), default
+  10s, must be a multiple of 10s
+- `count` - maximum number of stats to return in each ContainerMetrics instance;


Why is this necessary? @piosz @mwielgus
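
For reference, the constraints on `step` in the hunk above (integer seconds, default 10s, multiple of 10s) could be validated along these lines. This is only a sketch: `parseStep` and `defaultStepSeconds` are hypothetical names, and rejecting non-positive values is an assumption the proposal text does not spell out.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// defaultStepSeconds is the default named in the proposal text (10s).
const defaultStepSeconds = 10

// parseStep validates the raw `step` query value: empty means the default,
// otherwise it must be a positive integer multiple of 10 seconds.
func parseStep(raw string) (int, error) {
	if raw == "" {
		return defaultStepSeconds, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("step must be an integer number of seconds: %v", err)
	}
	if n <= 0 || n%10 != 0 {
		return 0, fmt.Errorf("step must be a positive multiple of 10 seconds, got %d", n)
	}
	return n, nil
}

func main() {
	q, err := url.ParseQuery("step=30")
	if err != nil {
		panic(err)
	}
	step, err := parseStep(q.Get("step"))
	fmt.Println(step, err)
}
```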

k8s-bot · 2015-10-23T00:37:31Z

GCE e2e test build/test passed for commit 3f367bb7fda13da6b93256b712494b3991a8fde0.

k8s-bot · 2015-10-23T00:59:44Z

GCE e2e build/test failed for commit e085448688919be57369e7b6c49681da3682c359.

k8s-bot · 2015-10-23T01:22:15Z

GCE e2e test build/test passed for commit fa3feac0ff04c2705ee1803a555b1e0d35852618.

jimmidyson · 2015-10-24T19:06:10Z

docs/proposals/compute-resource-metrics-api.md

+  - `/rawNodes/localhost` - The only node provided is `localhost`; type
+    metrics.Node
+- `/derivedNodes` - host metrics; type `[]metrics.DerivedNode`
+  - `/nodes/{node}` - derived metrics for a specific node


derivedNodes

dchen1107 · 2015-10-27T18:36:31Z

LGTM

k8s-bot · 2015-10-27T18:58:01Z

GCE e2e build/test failed for commit fa3feac0ff04c2705ee1803a555b1e0d35852618.

timstclair · 2015-10-27T19:09:29Z

@k8s-bot test this

k8s-bot · 2015-10-27T19:40:11Z

GCE e2e test build/test passed for commit fa3feac0ff04c2705ee1803a555b1e0d35852618.

timstclair · 2015-10-27T22:05:29Z

squashed

k8s-github-robot · 2015-10-27T22:09:59Z

PR changed after LGTM, removing LGTM.

k8s-bot · 2015-10-27T22:35:55Z

GCE e2e test build/test passed for commit accb08c.

wojtek-t · 2015-10-29T08:01:48Z

@k8s-bot unit test this please

k8s-github-robot · 2015-10-29T08:35:09Z

Automatic merge from submit-queue

Auto commit by PR queue bot

vishh · 2015-10-29T17:29:48Z

docs/proposals/compute-resource-metrics-api.md

+}
+type RawPod struct {
+  TypeMeta
+  ObjectMeta              // Should include pod name


@timstclair: I just remembered that network usage is at the pod level, and volume disk usage is also at the pod level. So in addition to container metrics, we will have pod-level metrics too.

Would we move those stats out of ContainerStats and into RawPod, add additional NetworkStats into the RawPod, or add a new Network stats type?

Yes. It's either that, or we can pre-define an infrastructure container and
associate all pod-level stats with that container. Even in that case, volume
fs usage cannot belong to any container.
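
One possible shape for the first option (pod-level stats living on `RawPod` next to the per-container metrics) is sketched below. It is not the merged schema: `NetworkStats`, `VolumeStats`, and their fields are hypothetical, and `Name` stands in for `TypeMeta`/`ObjectMeta`.

```go
package main

import "fmt"

// NetworkStats and VolumeStats are hypothetical pod-level stats types.
type NetworkStats struct {
	RxBytes uint64
	TxBytes uint64
}

type VolumeStats struct {
	Name      string
	UsedBytes uint64
}

// ContainerMetrics is trimmed to a single field for the sketch.
type ContainerMetrics struct {
	Name          string
	CPUUsageNanos uint64
}

// RawPod carries per-container metrics plus stats that only make sense for
// the pod as a whole (network, volume filesystem usage).
type RawPod struct {
	Name       string // stands in for TypeMeta/ObjectMeta
	Containers []ContainerMetrics
	Network    *NetworkStats // nil when no pod-level network stats are available
	Volumes    []VolumeStats
}

// volumeUsage sums filesystem usage across all of the pod's volumes.
func volumeUsage(p RawPod) uint64 {
	var total uint64
	for _, v := range p.Volumes {
		total += v.UsedBytes
	}
	return total
}

func main() {
	pod := RawPod{
		Name:       "web-1",
		Containers: []ContainerMetrics{{Name: "nginx", CPUUsageNanos: 1500}},
		Network:    &NetworkStats{RxBytes: 100, TxBytes: 200},
		Volumes:    []VolumeStats{{Name: "data", UsedBytes: 1024}, {Name: "logs", UsedBytes: 512}},
	}
	fmt.Println(pod.Name, pod.Network.RxBytes, volumeUsage(pod))
}
```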

googlebot added the cla: yes label Oct 19, 2015

timstclair reviewed Oct 19, 2015

timstclair mentioned this pull request Oct 19, 2015

WIP: kubelet aggregate metrics versioned API endpoint #15691

Closed

k8s-github-robot assigned brendandburns Oct 19, 2015

k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Oct 19, 2015

k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 19, 2015

timstclair force-pushed the metrics-proposal branch from 05794bc to 6c8d1c3 Compare October 19, 2015 21:57

timstclair added the e2e-not-required label Oct 19, 2015

jimmidyson reviewed Oct 19, 2015

timstclair force-pushed the metrics-proposal branch 2 times, most recently from d716bf7 to a8edd1f Compare October 20, 2015 17:42

timstclair force-pushed the metrics-proposal branch from a8edd1f to 064f356 Compare October 21, 2015 00:40

dchen1107 assigned dchen1107 and unassigned brendandburns Oct 22, 2015

dchen1107 reviewed Oct 22, 2015

timstclair force-pushed the metrics-proposal branch from 064f356 to 3f367bb Compare October 23, 2015 00:11

vishh reviewed Oct 23, 2015

timstclair force-pushed the metrics-proposal branch from 3f367bb to e085448 Compare October 23, 2015 00:25

jimmidyson reviewed Oct 24, 2015

dchen1107 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2015

dchen1107 closed this Oct 27, 2015

dchen1107 reopened this Oct 27, 2015

Add kubelet raw metrics API proposal

accb08c

timstclair force-pushed the metrics-proposal branch from fa3feac to accb08c Compare October 27, 2015 22:05

k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2015

timstclair added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2015

timstclair mentioned this pull request Oct 28, 2015

Fill in kubelet metrics API #16458

Closed

k8s-github-robot pushed a commit that referenced this pull request Oct 29, 2015

Merge pull request #15862 from timstclair/metrics-proposal

139e158

Auto commit by PR queue bot

k8s-github-robot merged commit 139e158 into kubernetes:master Oct 29, 2015

vishh reviewed Oct 29, 2015

timstclair deleted the metrics-proposal branch October 29, 2015 21:26

vishh mentioned this pull request Nov 2, 2015

heapster polling stats for long-dead pods #16168

Closed

dchen1107 mentioned this pull request Dec 17, 2015

Standalone cAdvisor for monitoring #18770

Closed
