-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[metricbeat]kube-controller-manager metricset #12409
[metricbeat]kube-controller-manager metricset #12409
Conversation
metricbeat/module/kubernetes/kubecontroller/_meta/testdata/docs.plain-expected.json
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, but review types of metrics, at least the one that is making tests fail.
metricbeat/module/kubernetes/controllermanager/_meta/fields.yml
Outdated
Show resolved
Hide resolved
metricbeat/module/kubernetes/controllermanager/_meta/fields.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some suggestions to the documentation and reference config. Regarding documentation of fields take into account that we have the descriptions in the prometheus responses that can help.
Before merging we should check that all the fields are documented, I found some of them that seem undocumented, or documented for a previous version with paths for histograms/quantiles. The testdata framework can help as it checks for documented fields.
metricbeat/metricbeat.reference.yml
Outdated
enabled: true | ||
metricsets: | ||
- controllermanager | ||
hosts: ["http://controller-manager:10252"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave it with localhost as we use to do, if deployed in master nodes in the host network it should work.
hosts: ["http://controller-manager:10252"] | |
hosts: ["http://localhost:10252"] |
enabled: true | ||
metricsets: | ||
- controllermanager | ||
hosts: ["http://controller-manager:10252"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
type: long | ||
- name: retries.count | ||
type: long | ||
description: Unfinished processors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: Unfinished processors | |
description: Number of retries handled by workqueue |
- name: adds.count | ||
type: long | ||
- name: depth.count | ||
type: long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: Current depth of workqueue
fields: | ||
- name: longestrunning.sec | ||
type: double | ||
description: Longest running processors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: Longest running processors | |
description: Time the longest running processor has been running. |
description: Longest running processors | ||
- name: unfinished.sec | ||
type: double | ||
description: Unfinished processors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: Unfinished processors | |
description: Time in unfinished processors. Large values can indicate stuck threads. |
type: double | ||
description: Unfinished processors | ||
- name: adds.count | ||
type: long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: Number of adds handled by workqueue
- name: request.size.bytes.percentile.* | ||
type: object | ||
object_type: long | ||
description: Request size percentiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These percentiles are not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right, this is a new module, not being used yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should remove these unused fields, as they will appear in documentation and people can expect them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http.request.size.bytes.precentile
is part of the metricset
you mean you want to remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, keep them if they are part of the metricset, this is what I meant with being used 🙂
I guess they are generated by:
"http_request_size_bytes": prometheus.Metric("http.request.size.bytes"),
type: long | ||
format: bytes | ||
description: Request size cumulative sum | ||
- name: request.size.bytes.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metric is really called request.size.bytes
according to the mapping in the code. I wonder why this is not being checked by the fields validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is http.request.size.bytes.count
.
Can you elaborate the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, In the code I saw it as http.request.size.bytes
, without the count
:
"http_response_size_bytes": prometheus.Metric("http.response.size.bytes"),
But I guess this is an histogram 🤦♂️
ExpectedFile: "./_meta/test/metrics.controllermanager.1.14.expected", | ||
}, | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not testing if fields are documented, we may consider changing it to the testdata framework. It would be to replace this with mbtest.TestDataFiles(t, "kubernetes", "controllermanager")
, create a _meta/testdata/config.yml
file and move the test and golden files to their location in _meta/testdata
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, open an issue for any global improvement.
This is using those helpers and test libraries as is today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I am fine with leaving this with prometheus test helpers, I was thinking that there were some undocumented fields but I was wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems I was wrong about the undocumented fields, I got confused by histograms.
There are still some pending suggestions about the descriptions of the fields for the documentation, and I wouldn't add documentation for fields we don't collect, but I won't block this PR because of that. This can be fixed later if you prefer.
- name: request.size.bytes.percentile.* | ||
type: object | ||
object_type: long | ||
description: Request size percentiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should remove these unused fields, as they will appear in documentation and people can expect them.
type: long | ||
format: bytes | ||
description: Request size cumulative sum | ||
- name: request.size.bytes.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, In the code I saw it as http.request.size.bytes
, without the count
:
"http_response_size_bytes": prometheus.Metric("http.response.size.bytes"),
But I guess this is an histogram 🤦♂️
ExpectedFile: "./_meta/test/metrics.controllermanager.1.14.expected", | ||
}, | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I am fine with leaving this with prometheus test helpers, I was thinking that there were some undocumented fields but I was wrong.
leader_election_master_status | ||
|
||
|
||
## Setup environment for manual tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this! 👍
Create kube-controller-manager metricset
Import kube-controller-manager prometheus metrics to the
kubernetes
module at metricbeat