MON-2903: add nodeExporter.collectors.systemd settings. #1892

Merged
merged 1 commit into from Jul 10, 2023
18 changes: 18 additions & 0 deletions Documentation/api.md
@@ -34,6 +34,7 @@ Configuring Cluster Monitoring is optional. If the config does not exist or is e
* [NodeExporterCollectorNetClassConfig](#nodeexportercollectornetclassconfig)
* [NodeExporterCollectorNetDevConfig](#nodeexportercollectornetdevconfig)
* [NodeExporterCollectorProcessesConfig](#nodeexportercollectorprocessesconfig)
* [NodeExporterCollectorSystemdConfig](#nodeexportercollectorsystemdconfig)
* [NodeExporterCollectorTcpStatConfig](#nodeexportercollectortcpstatconfig)
* [NodeExporterConfig](#nodeexporterconfig)
* [OpenShiftStateMetricsConfig](#openshiftstatemetricsconfig)
@@ -238,6 +239,7 @@ The `NodeExporterCollectorConfig` resource defines settings for individual colle
| mountstats | [NodeExporterCollectorMountStatsConfig](#nodeexportercollectormountstatsconfig) | Defines the configuration of the `mountstats` collector, which collects statistics about NFS volume I/O activities. Disabled by default. |
| ksmd | [NodeExporterCollectorKSMDConfig](#nodeexportercollectorksmdconfig) | Defines the configuration of the `ksmd` collector, which collects statistics from the kernel same-page merger daemon. Disabled by default. |
| processes | [NodeExporterCollectorProcessesConfig](#nodeexportercollectorprocessesconfig) | Defines the configuration of the `processes` collector, which collects statistics from processes and threads running in the system. Disabled by default. |
| systemd | [NodeExporterCollectorSystemdConfig](#nodeexportercollectorsystemdconfig) | Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services. Disabled by default. |

[Back to TOC](#table-of-contents)

@@ -332,6 +334,22 @@ The `NodeExporterCollectorProcessesConfig` resource works as an on/off switch fo

[Back to TOC](#table-of-contents)

## NodeExporterCollectorSystemdConfig

#### Description

The `NodeExporterCollectorSystemdConfig` resource works as an on/off switch for the `systemd` collector of the `node-exporter` agent. By default, the `systemd` collector is disabled. If enabled, the following metrics become available: `node_systemd_system_running`, `node_systemd_timer_last_trigger_seconds`, `node_systemd_units`, `node_systemd_version`. If the unit uses a socket, it also generates these 3 metrics: `node_systemd_socket_accepted_connections_total`, `node_systemd_socket_current_connections`, `node_systemd_socket_refused_connections_total`. You can use the `units` parameter to select the systemd units to be included by the `systemd` collector. The selected units are used to generate the `node_systemd_unit_state` metric, which shows the state of each systemd unit. However, this metric's cardinality might be high (at least 5 series per unit per node). If you enable this collector with a long list of selected units, closely monitor the `prometheus-k8s` deployment for excessive memory usage.


<em>appears in: [NodeExporterCollectorConfig](#nodeexportercollectorconfig)</em>

| Property | Type | Description |
| -------- | ---- | ----------- |
| enabled | bool | A Boolean flag that enables or disables the `systemd` collector. |
| units | []string | A list of regular expression (regex) patterns that match systemd units to be included by the `systemd` collector. By default, the list is empty, so the collector exposes no metrics for systemd units. |

[Back to TOC](#table-of-contents)
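
The cardinality warning in the description can be turned into a rough lower bound. The following is a minimal sketch, assuming exactly 5 `node_systemd_unit_state` series per matched unit per node (the description only says "at least 5"); the function name and the cluster numbers are hypothetical and chosen for illustration.

```go
package main

import "fmt"

// statesPerUnit is an assumption taken from the "at least 5 series per unit
// per node" wording above, so every result is a lower bound.
const statesPerUnit = 5

// minUnitStateSeries returns a lower bound on the number of
// node_systemd_unit_state series for a cluster in which each node has
// matchedUnitsPerNode units selected by the `units` patterns.
func minUnitStateSeries(matchedUnitsPerNode, nodes int) int {
	return matchedUnitsPerNode * nodes * statesPerUnit
}

func main() {
	// Hypothetical cluster: 40 matched units on each of 100 nodes.
	fmt.Println(minUnitStateSeries(40, 100)) // prints 20000
}
```

Because the bound grows with both the number of matched units and the number of nodes, narrowing the `units` patterns is the lever that this setting actually offers.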

## NodeExporterCollectorTcpStatConfig

#### Description
1 change: 1 addition & 0 deletions Documentation/openshiftdocs/index.adoc
@@ -54,6 +54,7 @@ The configuration file itself is always defined under the `config.yaml` key in t
* link:modules/nodeexportercollectornetclassconfig.adoc[NodeExporterCollectorNetClassConfig]
* link:modules/nodeexportercollectornetdevconfig.adoc[NodeExporterCollectorNetDevConfig]
* link:modules/nodeexportercollectorprocessesconfig.adoc[NodeExporterCollectorProcessesConfig]
* link:modules/nodeexportercollectorsystemdconfig.adoc[NodeExporterCollectorSystemdConfig]
* link:modules/nodeexportercollectortcpstatconfig.adoc[NodeExporterCollectorTcpStatConfig]
* link:modules/nodeexporterconfig.adoc[NodeExporterConfig]
* link:modules/openshiftstatemetricsconfig.adoc[OpenShiftStateMetricsConfig]
@@ -34,6 +34,8 @@ Appears in: link:nodeexporterconfig.adoc[NodeExporterConfig]

|processes|link:nodeexportercollectorprocessesconfig.adoc[NodeExporterCollectorProcessesConfig]|Defines the configuration of the `processes` collector, which collects statistics from processes and threads running in the system. Disabled by default.

|systemd|link:nodeexportercollectorsystemdconfig.adoc[NodeExporterCollectorSystemdConfig]|Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services. Disabled by default.

|===

link:../index.adoc[Back to TOC]
@@ -0,0 +1,27 @@
// DO NOT EDIT THE CONTENT IN THIS FILE. It is automatically generated from the
// source code for the Cluster Monitoring Operator. Any changes made to this
// file will be overwritten when the content is re-generated. If you wish to
// make edits, read the docgen utility instructions in the source code for the
// CMO.
:_content-type: ASSEMBLY

== NodeExporterCollectorSystemdConfig

=== Description

The `NodeExporterCollectorSystemdConfig` resource works as an on/off switch for the `systemd` collector of the `node-exporter` agent. By default, the `systemd` collector is disabled. If enabled, the following metrics become available: `node_systemd_system_running`, `node_systemd_timer_last_trigger_seconds`, `node_systemd_units`, `node_systemd_version`. If the unit uses a socket, it also generates these 3 metrics: `node_systemd_socket_accepted_connections_total`, `node_systemd_socket_current_connections`, `node_systemd_socket_refused_connections_total`. You can use the `units` parameter to select the systemd units to be included by the `systemd` collector. The selected units are used to generate the `node_systemd_unit_state` metric, which shows the state of each systemd unit. However, this metric's cardinality might be high (at least 5 series per unit per node). If you enable this collector with a long list of selected units, closely monitor the `prometheus-k8s` deployment for excessive memory usage.



Appears in: link:nodeexportercollectorconfig.adoc[NodeExporterCollectorConfig]

[options="header"]
|===
| Property | Type | Description
|enabled|bool|A Boolean flag that enables or disables the `systemd` collector.

|units|[]string|A list of regular expression (regex) patterns that match systemd units to be included by the `systemd` collector. By default, the list is empty, so the collector exposes no metrics for systemd units.

|===

link:../index.adoc[Back to TOC]
3 changes: 3 additions & 0 deletions assets/node-exporter/daemonset.yaml
@@ -54,6 +54,9 @@ spec:
fi
echo "ts=$(date --iso-8601=seconds) num_cpus=$NUM_CPUS gomaxprocs=$GOMAXPROCS"
exec /bin/node_exporter "$0" "$@"
env:
- name: DBUS_SYSTEM_BUS_ADDRESS
value: unix:path=/host/root/var/run/dbus/system_bus_socket
image: quay.io/prometheus/node-exporter:v1.6.0
name: node-exporter
resources:
7 changes: 7 additions & 0 deletions jsonnet/components/node-exporter.libsonnet
@@ -281,6 +281,13 @@ function(params)
// node-exporter has issues rolling out with the security context
// changes in kube-prometheus, hence overriding the changes
securityContext: {},
env: [
{
// This is required for the systemd collector to connect to the host's dbus socket.
name: 'DBUS_SYSTEM_BUS_ADDRESS',
value: 'unix:path=/host/root/var/run/dbus/system_bus_socket',
},
],
},
super.containers,
),
3 changes: 3 additions & 0 deletions pkg/manifests/config.go
@@ -237,6 +237,9 @@ func defaultClusterMonitoringConfiguration() ClusterMonitoringConfiguration {
Enabled: true,
UseNetlink: true,
},
Systemd: NodeExporterCollectorSystemdConfig{
Enabled: false,
},
},
},
}
19 changes: 19 additions & 0 deletions pkg/manifests/manifests.go
@@ -907,10 +907,29 @@ func (f *Factory) updateNodeExporterArgs(args []string) ([]string, error) {
args = setArg(args, "--no-collector.processes", "")
}

if f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Enabled {
args = setArg(args, "--collector.systemd", "")

pattern, err := regexListToArg(f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Units)
if err != nil {
return nil, fmt.Errorf("systemd unit pattern valiation error: %s", err)
}
args = setArg(args, "--collector.systemd.unit-include=", pattern)
} else {
args = setArg(args, "--no-collector.systemd", "")
}

return args, nil
}

// regexListToArg validates each pattern, then concatenates all patterns
// into a single anchored regexp using OR.
func regexListToArg(list []string) (string, error) {
for _, pattern := range list {
_, err := regexp.Compile(pattern)
if err != nil {
return "", fmt.Errorf("invalid regexp pattern: %s", pattern)
}
}
r := "^(" + strings.Join(list, "|") + ")$"
_, err := regexp.Compile(r)
return r, err
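
The joining step in this hunk can be tried in isolation. The following is a minimal standalone sketch of the same approach (validate each pattern, then OR the patterns inside an anchored group); `joinUnitPatterns` is a hypothetical name, and wrapping the underlying error with `%w` is a choice made for this sketch, not something the diff does.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// joinUnitPatterns validates each pattern on its own, then joins all of them
// into one anchored alternation, mirroring the approach shown in the diff.
func joinUnitPatterns(patterns []string) (string, error) {
	for _, p := range patterns {
		if _, err := regexp.Compile(p); err != nil {
			return "", fmt.Errorf("invalid regexp pattern %q: %w", p, err)
		}
	}
	joined := "^(" + strings.Join(patterns, "|") + ")$"
	if _, err := regexp.Compile(joined); err != nil {
		return "", err
	}
	return joined, nil
}

func main() {
	joined, err := joinUnitPatterns([]string{"network.+", "nss.+"})
	if err != nil {
		panic(err)
	}
	fmt.Println(joined) // prints ^(network.+|nss.+)$

	re := regexp.MustCompile(joined)
	fmt.Println(re.MatchString("network.target")) // prints true
	fmt.Println(re.MatchString("sshd.service"))   // prints false

	// An empty list yields a pattern that matches only the empty string,
	// so no real unit name is selected.
	empty, _ := joinUnitPatterns(nil)
	fmt.Println(empty) // prints ^()$

	// A trailing backslash is rejected by the per-pattern validation.
	_, err = joinUnitPatterns([]string{`/\`})
	fmt.Println(err != nil) // prints true
}
```

The two joined patterns printed here match the expectations in the unit tests of this PR (`^(network.+|nss.+)$` and `^()$`).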
59 changes: 59 additions & 0 deletions pkg/manifests/manifests_test.go
@@ -2924,6 +2924,7 @@ func TestNodeExporterCollectorSettings(t *testing.T) {
"--no-collector.processes",
"--collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$",
"--collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$",
"--no-collector.systemd",
},
argsAbsent: []string{"--collector.cpufreq",
"--collector.tcpstat",
@@ -2932,6 +2933,7 @@ func TestNodeExporterCollectorSettings(t *testing.T) {
"--collector.buddyinfo",
"--collector.ksmd",
"--collector.processes",
"--collector.systemd",
},
},
{
@@ -3063,6 +3065,33 @@ nodeExporter:
},
argsAbsent: []string{"--no-collector.netclass", "--no-collector.netdev"},
},
{
name: "enable systemd collector without units",
config: `
nodeExporter:
collectors:
systemd:
enabled: true
`,
argsPresent: []string{"--collector.systemd",
"--collector.systemd.unit-include=^()$"},
argsAbsent: []string{"--no-collector.systemd"},
},
{
name: "enable systemd collector with units",
config: `
nodeExporter:
collectors:
systemd:
enabled: true
units:
- network.+
- nss.+
`,
argsPresent: []string{"--collector.systemd",
"--collector.systemd.unit-include=^(network.+|nss.+)$"},
argsAbsent: []string{"--no-collector.systemd"},
},
}

for _, test := range tests {
@@ -3105,6 +3134,36 @@ nodeExporter:

}

func TestNodeExporterSystemdUnits(t *testing.T) {

testName := "enable systemd collector with invalid units pattern"
config := `
nodeExporter:
collectors:
systemd:
enabled: true
units:
- network.+
- /\
`
t.Run(testName, func(st *testing.T) {
c, err := NewConfigFromString(config, false)
if err != nil {
t.Fatal(err)
}
c.SetImages(map[string]string{
"node-exporter": "docker.io/openshift/origin-prometheus-node-exporter:latest",
"kube-rbac-proxy": "docker.io/openshift/origin-kube-rbac-proxy:latest",
})

f := NewFactory("openshift-monitoring", "openshift-user-workload-monitoring", c, defaultInfrastructureReader(), &fakeProxyReader{}, NewAssets(assetsPath), &APIServerConfig{}, &configv1.Console{})
_, err = f.NodeExporterDaemonSet()
if err == nil || !strings.Contains(err.Error(), "systemd unit pattern valiation error:") {
t.Fatalf(`expected error "systemd unit pattern valiation error:.*", got %v`, err)
}
})
}

func TestNodeExporterGeneralSettings(t *testing.T) {

tests := []struct {
27 changes: 27 additions & 0 deletions pkg/manifests/types.go
@@ -317,6 +317,9 @@ type NodeExporterCollectorConfig struct {
// Defines the configuration of the `processes` collector, which collects statistics from processes and threads running in the system.
// Disabled by default.
Processes NodeExporterCollectorProcessesConfig `json:"processes,omitempty"`
// Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services.
// Disabled by default.
Systemd NodeExporterCollectorSystemdConfig `json:"systemd,omitempty"`
}

// The `NodeExporterCollectorCpufreqConfig` resource works as an on/off switch for
@@ -450,6 +453,30 @@ type NodeExporterCollectorProcessesConfig struct {
Enabled bool `json:"enabled,omitempty"`
}

// The `NodeExporterCollectorSystemdConfig` resource works as an on/off switch for
// the `systemd` collector of the `node-exporter` agent.
// By default, the `systemd` collector is disabled.
// If enabled, the following metrics become available:
// `node_systemd_system_running`,
// `node_systemd_timer_last_trigger_seconds`,
// `node_systemd_units`,
// `node_systemd_version`.
// If the unit uses a socket, it also generates these 3 metrics:
// `node_systemd_socket_accepted_connections_total`,
// `node_systemd_socket_current_connections`,
// `node_systemd_socket_refused_connections_total`.
// You can use the `units` parameter to select the systemd units to be included by the `systemd` collector.
// The selected units are used to generate the `node_systemd_unit_state` metric, which shows the state of each systemd unit.
// However, this metric's cardinality might be high (at least 5 series per unit per node).
// If you enable this collector with a long list of selected units, closely monitor the `prometheus-k8s` deployment for excessive memory usage.
type NodeExporterCollectorSystemdConfig struct {
// A Boolean flag that enables or disables the `systemd` collector.
Enabled bool `json:"enabled,omitempty"`
// A list of regular expression (regex) patterns that match systemd units to be included by the `systemd` collector.
// By default, the list is empty, so the collector exposes no metrics for systemd units.
Units []string `json:"units,omitempty"`
}
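
The field tags on this struct determine how a `systemd` block in the configuration maps onto it. The following is a minimal sketch, assuming a local copy of the struct (`systemdConfig`, a hypothetical name) and using `encoding/json` only because the fields carry `json` tags; the operator's own configuration loading is not part of this diff and is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// systemdConfig is a local copy of the struct above, so that the sketch
// compiles on its own.
type systemdConfig struct {
	Enabled bool     `json:"enabled,omitempty"`
	Units   []string `json:"units,omitempty"`
}

// decodeSystemdConfig parses a JSON document into systemdConfig.
func decodeSystemdConfig(raw string) (systemdConfig, error) {
	var cfg systemdConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	cfg, err := decodeSystemdConfig(`{"enabled": true, "units": ["network.+", "nss.+"]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Enabled, cfg.Units) // prints true [network.+ nss.+]

	// Omitted fields keep their zero values: disabled, with no units.
	empty, err := decodeSystemdConfig(`{}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(empty.Enabled, len(empty.Units)) // prints false 0
}
```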

// The `UserWorkloadConfiguration` resource defines the settings
// responsible for user-defined projects in the
// `user-workload-monitoring-config` config map in the
85 changes: 85 additions & 0 deletions test/e2e/node_exporter_test.go
@@ -75,6 +75,14 @@ nodeExporter:
nodeExporter:
collectors:
processes:
enabled: true`,
},
{
nameCollector: "systemd",
config: `
nodeExporter:
collectors:
systemd:
enabled: true`,
},
}
@@ -322,3 +330,80 @@ nodeExporter:
})
}
}

func TestNodeExporterSystemdUnits(t *testing.T) {
t.Cleanup(func() {
f.MustDeleteConfigMap(t, f.BuildCMOConfigMap(t, ""))
})
configNoUnits := `
nodeExporter:
collectors:
systemd:
enabled: true
`

t.Run("default without units", func(st *testing.T) {
f.MustCreateOrUpdateConfigMap(t, f.BuildCMOConfigMap(t, configNoUnits))

// Systemd collector should be enabled.
f.PrometheusK8sClient.WaitForQueryReturn(
t, 5*time.Minute, `min(node_scrape_collector_success{collector="systemd"})`,
func(v float64) error {
if v != 1 {
return fmt.Errorf(`expecting min(node_scrape_collector_success{collector="systemd"}) = 1 but got %v.`, v)
}
return nil
},
)

// Systemd collector should not collect unit state.
f.PrometheusK8sClient.WaitForQueryReturn(
t, 5*time.Minute, `absent(node_systemd_unit_state)`,
func(v float64) error {
if v != 1 {
return fmt.Errorf(`expecting absent(node_systemd_unit_state) = 1 but got %v.`, v)
}
return nil
},
)

})
configWithUnits := `
nodeExporter:
collectors:
systemd:
enabled: true
units:
- network.+
- nss.+
`

t.Run("enabled with units", func(st *testing.T) {
f.MustCreateOrUpdateConfigMap(t, f.BuildCMOConfigMap(t, configWithUnits))

// Systemd collector should be enabled.
f.PrometheusK8sClient.WaitForQueryReturn(
t, 5*time.Minute, `min(node_scrape_collector_success{collector="systemd"})`,
func(v float64) error {
if v != 1 {
return fmt.Errorf(`expecting min(node_scrape_collector_success{collector="systemd"}) = 1 but got %v.`, v)
}
return nil
},
)

// Systemd collector should collect unit state.
// For each unit, one node_systemd_unit_state series should be 1 while the rest should be 0.
f.PrometheusK8sClient.WaitForQueryReturn(
t, 5*time.Minute, `max(node_systemd_unit_state)`,
func(v float64) error {
if v != 1 {
return fmt.Errorf(`expecting max(node_systemd_unit_state) = 1 but got %v.`, v)
}
return nil
},
)

})

}