New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kubelet raw metrics API proposal #15862
Kubelet raw metrics API proposal #15862
Conversation
- `start` - start time to return metrics from; type json encoded `time.Time` | ||
- `end` - end time to return metrics to; type json encoded `time.Time` | ||
- `count` - maximum number of stats to return in each ContainerMetrics instance; | ||
type int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @lavalamp says, this is "poor-man's paging" -- Is there a better way to structure this parameter to support proper paging in the future (e.g. should this be max total stats, as opposed to max per-container stats?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can get rid of this parameter. What we ideally need is a way to downsample the data returned, to reduce the load on the higher layers and the network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have this as a step
between results. We could calculate a reasonable step depending on size of window between start
& end
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimmidyson what do you mean by a step? Time step? In that case, we would need to validate that the raw metrics step evenly divides it.
@vishh is step & downsample the same thing? In either case, would we aggregate samples, or just drop samples not requested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes a time step. IMO metrics should be reported at regular intervals for ease of use & understanding.
I would drop samples not requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
An usual query is something like "give me stats every 30 seconds for the
last five minutes".
I can express the duration with start
and end
. Adding step
or rate
will help us express the downsampling rate.
On Mon, Oct 19, 2015 at 3:30 PM, Tim St. Clair notifications@github.com
wrote:
In docs/proposals/kubelet-raw-metrics-api.md
#15862 (comment)
:+-
/pods
- All pod metrics across all namespaces; type[]metrics.Pod
+-/namespaces/{namespace}/pods
- All pod metrics within namespace; type
[]metrics.Pod
+-/namespaces/{namespace}/pods/{pod}
- metrics for specific pod; typemetrics.Pod
+- Unsupported paths return status not found (404)
/namespaces/
/namespaces/{namespace}
+Additionally, all endpoints (except root discovery endpoint) support the
+following optional query parameters:
+
+-start
- start time to return metrics from; type json encodedtime.Time
+-end
- end time to return metrics to; type json encodedtime.Time
+-count
- maximum number of stats to return in each ContainerMetrics instance;
- type int
@jimmidyson https://github.com/jimmidyson what do you mean by a step?
Time step? In that case, we would need to validate that the raw metrics
step evenly divides it.@vishh https://github.com/vishh is step & downsample the same thing? In
either case, would we aggregate samples, or just drop samples not requested?—
Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/15862/files#r42434514.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so can we change count
? I prefer step
or interval
to rate
as that normally refers to rate of change when talking about metrics.
IMO this should be optional & a reasonable step value should be calculated from the requested window size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I went with step. However, I disagree that the default should be variable based on the window size. I think it will be better to have a default (predictable) default size. For instance, if I make a request and get back stats ever 10 seconds, and then increase my window size by 30 seconds, I'd expect to get 3 more stats back, not get the same number spaced 11 seconds apart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added count
back as there is a heapster use case for it.
cc/ @mwielgus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case? What is the behaviour if available metrics are greater than requested count
? Trim from beginning or end of window?
Labelling this PR as size/L |
Thanks for the proposal @timstclair! Much appreciated! |
GCE e2e test build/test passed for commit 05794bcf947ab20f2de6f12d75f61c43667d9b79. |
05794bc
to
6c8d1c3
Compare
FYI, added a few notes about implementation. |
GCE e2e test build/test passed for commit 6c8d1c3ff400b7cb6ad9b169de5bba42acea9c50. |
// ContainerMetrics is a k8s wrapper around cAdvisor metrics | ||
type ContainerMetrics struct { | ||
ObjectMeta, TypeMeta | ||
Info cadvisorv2.ContainerInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer all type information to be in Kubernetes & provide a converter from cadvisor API to Kubernetes API, even if that is a 1-1 field mapping. More independent of implementation & can be versioned separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cadvisor APIs are also versioned right. What do we get by adding one more layer of versioning here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth noting that the API is currently completely self contained. I don't have a strong opinion here, but I bet someone else does :)
I think the important pieces are that everything is documented together in the same place. If the cadvisor API has separate documentation, I think that's a pretty strong argument for cloning the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO cadvisor is an implementation detail. Great that the cadvisor API is versioned, but I think the metrics API needs to be fully self-contained & versioned independently. Also using the cadvisor API directly would leak cadvisor details through to the unversioned representations as per the Kubernetes API conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sold. We'll mirror the cadvisor v2 APIs in the versioned metrics API, and also add ObjectMeta & TypeMeta to each object.
d716bf7
to
a8edd1f
Compare
GCE e2e build/test failed for commit d716bf7d0c167e4db1a72088d9061215e9d8e25b. |
GCE e2e test build/test passed for commit a8edd1f22285c08fae7af0c1308507296546f336. |
a8edd1f
to
064f356
Compare
Ok, merged this proposal into the existing DerivedMetrics proposal. I updated that proposal to be consistent with everything proposed here. Comment away! |
GCE e2e build/test failed for commit 064f35657c6cf02f32831bbe67193a88653518a7. |
👍 Nice work @timstclair! |
@mwielgus - What were the heapster requirements we discussed yesterday? I remember:
Was there anything else? |
@timstclair cc: @piosz |
- `/nodes/localhost` - When served by kubelet, the only node provided is | ||
`localhost`; type metrics.Node | ||
- `/nodes/{node}` - metrics for a specific node | ||
- `/derivedNodes` - host metrics; type `[]metrics.DerivedNode` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document here the difference between /nodes and /derivedNodes? Same for /pods?
From the document, i cannot see any difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
064f356
to
3f367bb
Compare
- `end` - end time to return metrics to; type json encoded `time.Time` | ||
- `step` - the time step between each stats sample; type int (seconds), default | ||
10s, must be a multiple of 10s | ||
- `count` - maximum number of stats to return in each ContainerMetrics instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3f367bb
to
e085448
Compare
GCE e2e test build/test passed for commit 3f367bb7fda13da6b93256b712494b3991a8fde0. |
GCE e2e build/test failed for commit e085448688919be57369e7b6c49681da3682c359. |
GCE e2e test build/test passed for commit fa3feac0ff04c2705ee1803a555b1e0d35852618. |
- `/rawNodes/localhost` - The only node provided is `localhost`; type | ||
metrics.Node | ||
- `/derivedNodes` - host metrics; type `[]metrics.DerivedNode` | ||
- `/nodes/{node}` - derived metrics for a specific node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
derivedNodes
LGTM |
GCE e2e build/test failed for commit fa3feac0ff04c2705ee1803a555b1e0d35852618. |
@k8s-bot test this |
GCE e2e test build/test passed for commit fa3feac0ff04c2705ee1803a555b1e0d35852618. |
fa3feac
to
accb08c
Compare
squashed |
PR changed after LGTM, removing LGTM. |
GCE e2e test build/test passed for commit accb08c. |
@k8s-bot unit test this please |
Automatic merge from submit-queue |
Auto commit by PR queue bot
} | ||
type RawPod struct { | ||
TypeMeta | ||
ObjectMeta // Should include pod name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timstclair: I just remembered that network usage is at the pod level and volumes disk usage is also at the pod level. So in addition to container metrics, we will have pod level metrics too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we move those stats out of ContainerStats and into RawPod, add additional NetworkStats into the RawPod, or add a new Network stats type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Its either that or we can pre-define an infrastructure container and
associate all pod level stats to that container. Even in that case, volume
fs usage cannot belong to any containers.
On Thu, Oct 29, 2015 at 2:29 PM, Tim St. Clair notifications@github.com
wrote:
In docs/proposals/compute-resource-metrics-api.md
#15862 (comment)
:+## Schema
+
+Types are colocated with other API groups in/pkg/apis/metrics
, and follow api
+groups conventions there.
+
+```go
+// Raw metrics are only available through the kubelet API.
+type RawNode struct {
- TypeMeta
- ObjectMeta // Should include node name
- Machine ContainerMetrics
- SystemContainers []ContainerMetrics
+}
+type RawPod struct {- TypeMeta
- ObjectMeta // Should include pod name
Would we move those stats out of ContainerStats
https://github.com/timstclair/cadvisor/blob/master/info/v2/container.go#L105
and into RawPod, add additional NetworkStats into the RawPod, or add a new
Network stats type?—
Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/15862/files#r43446885.
Forked from discussion on #15691
Addresses #12483
cc/ @vishh @jimmidyson @dchen1107 @bgrant0607