MON-2903: add nodeExporter.collectors.systemd settings. #1892
@raptorsun: This pull request references MON-2903 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
6e8e2f9
to
315a4ba
Compare
/hold |
315a4ba
to
332d5ed
Compare
/retest |
332d5ed
to
a161f24
Compare
/retest-required |
a161f24
to
8639c4a
Compare
When the systemd collector is activated, 8 new metrics are collected:
The metric |
My take is that enabling the systemd collector should only expose metrics which have a fixed cardinality: By default CMO should set
|
@simonpasquier, I guess Haoyu is not back yet, is this PR ready for testing? |
test PR with cluster-bot, new features implemented |
In case you missed it, I left a comment about metrics cardinality and how to limit it.
pkg/manifests/types.go
Outdated
// Among them, the `node_systemd_unit_state` metric is the most useful show the state of each systemd unit. So its cardinality cound be high. | ||
// If you enable this collector, watch the prometheus-k8s deployment closely for excessive memory usage. | ||
type NodeExporterCollectorSystemdConfig struct { | ||
// A Boolean flag that enables or disables the `systemd` colletor. |
// A Boolean flag that enables or disables the `systemd` colletor. | |
// A Boolean flag that enables or disables the `systemd` collector. |
Thanks for the reminder about limiting cardinality. I will push a new version with these filters later.
pkg/manifests/types.go
Outdated
@@ -297,6 +297,9 @@ type NodeExporterCollectorConfig struct { | |||
// Defines the configuration of the `buddyinfo` collector, which collects statistics about memory fragmentation from the `node_buddyinfo_blocks` metric. This metric collects data from `/proc/buddyinfo`. | |||
// Disabled by default. | |||
BuddyInfo NodeExporterCollectorBuddyInfoConfig `json:"buddyinfo,omitempty"` | |||
// Defines the configuration of the `systemd` collector, which collects statistics on systemd daemon and its managed services. |
// Defines the configuration of the `systemd` collector, which collects statistics on systemd daemon and its managed services. | |
// Defines the configuration of the `systemd` collector, which collects statistics on the `systemd` daemon and its managed services. |
pkg/manifests/types.go
Outdated
// `node_systemd_unit_state`, | ||
// `node_systemd_units`, | ||
// `node_systemd_version`. | ||
// Among them, the `node_systemd_unit_state` metric is the most useful show the state of each systemd unit. So its cardinality cound be high. |
// Among them, the `node_systemd_unit_state` metric is the most useful show the state of each systemd unit. So its cardinality cound be high. | |
// Of these metrics, the `node_systemd_unit_state` metric is the most useful because it shows the state of each `systemd` unit. However, note that the cardinality for this metric might be high. |
pkg/manifests/types.go
Outdated
// `node_systemd_units`, | ||
// `node_systemd_version`. | ||
// Among them, the `node_systemd_unit_state` metric is the most useful show the state of each systemd unit. So its cardinality cound be high. | ||
// If you enable this collector, watch the prometheus-k8s deployment closely for excessive memory usage. |
// If you enable this collector, watch the prometheus-k8s deployment closely for excessive memory usage. | |
// If you enable this collector, closely monitor the `prometheus-k8s` deployment for excessive memory usage. |
8639c4a
to
c8b5911
Compare
/hold |
@bburt-rh @simonpasquier Thank you for the timely review :D All points have been addressed. The PR is ready for another review now.
d0fe5b2
to
c06a027
Compare
Ready for review again.
pkg/manifests/manifests.go
Outdated
units := f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Units | ||
for idx, unit := range units { |
I wouldn't modify the config slice.
units := f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Units | |
for idx, unit := range units { | |
patternUnits := make([]string, len(f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Units)) | |
for i, unit := range f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Units { |
After removing the addition of brackets, this will not be needed: the unit list will be read-only in this function.
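The concern raised in this thread can be illustrated with a minimal sketch. It is not code from the PR: `wrapUnits` and `cfgUnits` are hypothetical names. It shows the general idea behind the suggestion, which is to build a new slice instead of writing back by index into the slice owned by the config.

```go
package main

import "fmt"

// wrapUnits is a hypothetical helper, not part of the PR. It returns a new
// slice with each pattern wrapped in parentheses and leaves the input slice
// untouched, so the caller's config is never modified.
func wrapUnits(units []string) []string {
	wrapped := make([]string, len(units))
	for i, unit := range units {
		wrapped[i] = fmt.Sprintf("(%s)", unit)
	}
	return wrapped
}

func main() {
	// cfgUnits stands in for the Units field read from the config.
	cfgUnits := []string{"sshd.service", "kubelet.service"}
	wrapped := wrapUnits(cfgUnits)

	fmt.Println(cfgUnits) // the original slice is unchanged
	fmt.Println(wrapped)
}
```

Writing `units[idx] = ...` inside a `range` loop would instead change the backing array shared with the config, which is the side effect the review comment wants to avoid.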
pkg/manifests/manifests.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("invalid regexp for systemd unit: %s", unit) | ||
} | ||
units[idx] = fmt.Sprintf("(%s)", unit) |
do we need the enclosing brackets?
On second thought, any pattern that would need enclosing brackets cannot pass the regex check just before this line, so no brackets will be added.
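A read-only validation pass of the kind discussed here might look like the sketch below. It assumes the check is done with Go's standard `regexp.Compile`, which the diff's `err` handling suggests but does not show. The function name `validateUnitPatterns` is hypothetical and is not taken from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// validateUnitPatterns is a hypothetical helper, not part of the PR. It checks
// that every systemd unit pattern compiles as a Go regular expression and
// returns an error for the first one that does not. The input is only read.
func validateUnitPatterns(units []string) error {
	for _, unit := range units {
		if _, err := regexp.Compile(unit); err != nil {
			return fmt.Errorf("invalid regexp for systemd unit %q: %w", unit, err)
		}
	}
	return nil
}

func main() {
	// Both patterns compile, so no error is reported.
	fmt.Println(validateUnitPatterns([]string{`sshd\.service`, `kubelet.*`}))

	// An unbalanced parenthesis fails to compile and yields an error.
	fmt.Println(validateUnitPatterns([]string{`(unclosed`}))
}
```

Because the patterns are only compiled and never rewritten, the slice read from the config can be passed in directly without copying.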
bc71bc7
to
882073c
Compare
882073c
to
2470f9c
Compare
/label docs-approved |
2470f9c
to
cbdcbd7
Compare
/retest-required |
This lgtm but will let @simonpasquier apply the label. |
cbdcbd7
to
86515bf
Compare
86515bf
to
5cc05fa
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bburt-rh, jan--f, raptorsun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@raptorsun: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This PR is on hold for the moment. The systemd collector requires more settings, which will be added later in this PR.
This PR is based on top of #1876, which should be merged before this one.