Commit

add nodeExporter.collectors.systemd settings.
raptorsun committed Jun 23, 2023
1 parent b499886 commit d0fe5b2
Showing 11 changed files with 263 additions and 3 deletions.
18 changes: 18 additions & 0 deletions Documentation/api.md
@@ -33,6 +33,7 @@ Configuring Cluster Monitoring is optional. If the config does not exist or is e
* [NodeExporterCollectorMountStatsConfig](#nodeexportercollectormountstatsconfig)
* [NodeExporterCollectorNetClassConfig](#nodeexportercollectornetclassconfig)
* [NodeExporterCollectorNetDevConfig](#nodeexportercollectornetdevconfig)
* [NodeExporterCollectorSystemdConfig](#nodeexportercollectorsystemdconfig)
* [NodeExporterCollectorTcpStatConfig](#nodeexportercollectortcpstatconfig)
* [NodeExporterConfig](#nodeexporterconfig)
* [OpenShiftStateMetricsConfig](#openshiftstatemetricsconfig)
@@ -236,6 +237,7 @@ The `NodeExporterCollectorConfig` resource defines settings for individual colle
| buddyinfo | [NodeExporterCollectorBuddyInfoConfig](#nodeexportercollectorbuddyinfoconfig) | Defines the configuration of the `buddyinfo` collector, which collects statistics about memory fragmentation from the `node_buddyinfo_blocks` metric. This metric collects data from `/proc/buddyinfo`. Disabled by default. |
| mountstats | [NodeExporterCollectorMountStatsConfig](#nodeexportercollectormountstatsconfig) | Defines the configuration of the `mountstats` collector, which collects statistics about NFS volume I/O activities. Disabled by default. |
| ksmd | [NodeExporterCollectorKSMDConfig](#nodeexportercollectorksmdconfig) | Defines the configuration of the `ksmd` collector, which collects statistics from the kernel same-page merger daemon. Disabled by default. |
| systemd | [NodeExporterCollectorSystemdConfig](#nodeexportercollectorsystemdconfig) | Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services. Disabled by default. |

[Back to TOC](#table-of-contents)

@@ -315,6 +317,22 @@ The `NodeExporterCollectorNetDevConfig` resource works as an on/off switch for t

[Back to TOC](#table-of-contents)

## NodeExporterCollectorSystemdConfig

#### Description

The `NodeExporterCollectorSystemdConfig` resource works as an on/off switch for the `systemd` collector of the `node-exporter` agent. By default, the `systemd` collector is disabled. If enabled, the following metrics become available: `node_systemd_system_running`, `node_systemd_timer_last_trigger_seconds`, `node_systemd_units`, `node_systemd_version`. You can also use the `units` parameter to select the systemd units to be included by the `systemd` collector. The selected units are used to generate the `node_systemd_unit_state` metric, which shows the state of each systemd unit. However, this metric's cardinality might be high (at least 5 series per unit per node). If a selected unit uses a socket, the collector also generates the following three metrics: `node_systemd_socket_accepted_connections_total`, `node_systemd_socket_current_connections`, `node_systemd_socket_refused_connections_total`. If you enable this collector with a long list of selected units, closely monitor the `prometheus-k8s` pods for excessive memory usage.


<em>appears in: [NodeExporterCollectorConfig](#nodeexportercollectorconfig)</em>

| Property | Type | Description |
| -------- | ---- | ----------- |
| enabled | bool | A Boolean flag that enables or disables the `systemd` collector. |
| units | []string | A list of regular expression (regex) patterns that match systemd units to be included by the `systemd` collector. By default, the collector exposes no metrics for systemd units. |

[Back to TOC](#table-of-contents)

## NodeExporterCollectorTcpStatConfig

#### Description
1 change: 1 addition & 0 deletions Documentation/openshiftdocs/index.adoc
@@ -53,6 +53,7 @@ The configuration file itself is always defined under the `config.yaml` key in t
* link:modules/nodeexportercollectormountstatsconfig.adoc[NodeExporterCollectorMountStatsConfig]
* link:modules/nodeexportercollectornetclassconfig.adoc[NodeExporterCollectorNetClassConfig]
* link:modules/nodeexportercollectornetdevconfig.adoc[NodeExporterCollectorNetDevConfig]
* link:modules/nodeexportercollectorsystemdconfig.adoc[NodeExporterCollectorSystemdConfig]
* link:modules/nodeexportercollectortcpstatconfig.adoc[NodeExporterCollectorTcpStatConfig]
* link:modules/nodeexporterconfig.adoc[NodeExporterConfig]
* link:modules/openshiftstatemetricsconfig.adoc[OpenShiftStateMetricsConfig]
@@ -32,6 +32,8 @@ Appears in: link:nodeexporterconfig.adoc[NodeExporterConfig]

|ksmd|link:nodeexportercollectorksmdconfig.adoc[NodeExporterCollectorKSMDConfig]|Defines the configuration of the `ksmd` collector, which collects statistics from the kernel same-page merger daemon. Disabled by default.

|systemd|link:nodeexportercollectorsystemdconfig.adoc[NodeExporterCollectorSystemdConfig]|Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services. Disabled by default.

|===

link:../index.adoc[Back to TOC]
@@ -0,0 +1,27 @@
// DO NOT EDIT THE CONTENT IN THIS FILE. It is automatically generated from the
// source code for the Cluster Monitoring Operator. Any changes made to this
// file will be overwritten when the content is re-generated. If you wish to
// make edits, read the docgen utility instructions in the source code for the
// CMO.
:_content-type: ASSEMBLY

== NodeExporterCollectorSystemdConfig

=== Description

The `NodeExporterCollectorSystemdConfig` resource works as an on/off switch for the `systemd` collector of the `node-exporter` agent. By default, the `systemd` collector is disabled. If enabled, the following metrics become available: `node_systemd_system_running`, `node_systemd_timer_last_trigger_seconds`, `node_systemd_units`, `node_systemd_version`. You can also use the `units` parameter to select the systemd units to be included by the `systemd` collector. The selected units are used to generate the `node_systemd_unit_state` metric, which shows the state of each systemd unit. However, this metric's cardinality might be high (at least 5 series per unit per node). If a selected unit uses a socket, the collector also generates the following three metrics: `node_systemd_socket_accepted_connections_total`, `node_systemd_socket_current_connections`, `node_systemd_socket_refused_connections_total`. If you enable this collector with a long list of selected units, closely monitor the `prometheus-k8s` pods for excessive memory usage.



Appears in: link:nodeexportercollectorconfig.adoc[NodeExporterCollectorConfig]

[options="header"]
|===
| Property | Type | Description
|enabled|bool|A Boolean flag that enables or disables the `systemd` collector.

|units|[]string|A list of regular expression (regex) patterns that match systemd units to be included by the `systemd` collector. By default, the collector exposes no metrics for systemd units.

|===

link:../index.adoc[Back to TOC]
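To make the cardinality warning concrete, the sketch below combines a list of `units` patterns the way the operator does (`(p1)|(p2)|...`), counts how many unit names on a node match, and multiplies by the documented lower bound of 5 `node_systemd_unit_state` series per unit. Two assumptions are labeled here and in the code: that `node-exporter` anchors the include pattern as `^(?:...)$`, and that every node runs the same units. The unit names and the node count are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// statesPerUnit is the documented lower bound of node_systemd_unit_state
// series per selected unit on one node.
const statesPerUnit = 5

// includeRegexp joins the patterns as "(p1)|(p2)|...". Wrapping the result in
// ^(?:...)$ is an ASSUMPTION about how node-exporter applies the include flag.
func includeRegexp(patterns []string) (*regexp.Regexp, error) {
	grouped := make([]string, 0, len(patterns))
	for _, p := range patterns {
		grouped = append(grouped, "("+p+")")
	}
	return regexp.Compile("^(?:" + strings.Join(grouped, "|") + ")$")
}

// minSeries returns a lower bound on node_systemd_unit_state series,
// ASSUMING every node runs exactly the given units.
func minSeries(re *regexp.Regexp, units []string, nodes int) int {
	matched := 0
	for _, u := range units {
		if re.MatchString(u) {
			matched++
		}
	}
	return matched * statesPerUnit * nodes
}

func main() {
	// Hypothetical unit names; real nodes run many more units.
	units := []string{"NetworkManager.service", "network-online.target", "nss-lookup.target", "crio.service", "kubelet.service"}
	re, err := includeRegexp([]string{"network.+", "nss.+"})
	if err != nil {
		panic(err)
	}
	// Matching is case-sensitive, so NetworkManager.service is not selected.
	fmt.Println(minSeries(re, units, 6)) // 2 units * 5 * 6 nodes = 60
}
```

With the default pattern `^$`, no unit name matches, which is consistent with the documented default of exposing no per-unit metrics.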
3 changes: 3 additions & 0 deletions assets/node-exporter/daemonset.yaml
@@ -41,6 +41,9 @@ spec:
- --collector.cpu.info
- --collector.textfile.directory=/var/node_exporter/textfile
- --no-collector.btrfs
env:
- name: DBUS_SYSTEM_BUS_ADDRESS
value: unix:path=/host/root/var/run/dbus/system_bus_socket
image: quay.io/prometheus/node-exporter:v1.6.0
name: node-exporter
resources:
7 changes: 7 additions & 0 deletions jsonnet/components/node-exporter.libsonnet
@@ -266,6 +266,13 @@ function(params)
// node-exporter has issues rolling out with the security context
// changes in kube-prometheus, hence overriding the changes
securityContext: {},
env: [
{
// This is required for the systemd collector to connect to the host's dbus socket.
name: 'DBUS_SYSTEM_BUS_ADDRESS',
value: 'unix:path=/host/root/var/run/dbus/system_bus_socket',
},
],
},
super.containers,
),
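The `DBUS_SYSTEM_BUS_ADDRESS` value uses the D-Bus address syntax `transport:key=value,...`, so it redirects the collector from the container's own system bus to the host socket exposed under `/host/root`. The sketch below is a deliberately simplified parser (only the `unix` transport with a `path` key, no value unescaping) that shows which socket path the address resolves to. The function name `parseUnixPath` is hypothetical, and treating `/host/root` as the mount point of the host filesystem is an assumption inferred from the path.

```go
package main

import (
	"fmt"
	"strings"
)

// parseUnixPath extracts the socket path from a D-Bus address such as
// "unix:path=/some/socket". Simplified sketch: only the "unix" transport and
// the "path" key are handled, and D-Bus value escaping is ignored.
func parseUnixPath(addr string) (string, error) {
	// Several addresses may be listed, separated by ';'; use the first one.
	first, _, _ := strings.Cut(addr, ";")
	transport, params, ok := strings.Cut(first, ":")
	if !ok || transport != "unix" {
		return "", fmt.Errorf("unsupported D-Bus address %q", addr)
	}
	for _, kv := range strings.Split(params, ",") {
		key, value, found := strings.Cut(kv, "=")
		if found && key == "path" {
			return value, nil
		}
	}
	return "", fmt.Errorf("no path key in D-Bus address %q", addr)
}

func main() {
	path, err := parseUnixPath("unix:path=/host/root/var/run/dbus/system_bus_socket")
	if err != nil {
		panic(err)
	}
	// ASSUMPTION: /host/root is where the host filesystem is mounted, so this
	// is the host's system bus socket seen from inside the container.
	fmt.Println(path)
	fmt.Println(strings.TrimPrefix(path, "/host/root")) // path on the host
}
```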
4 changes: 4 additions & 0 deletions pkg/manifests/config.go
@@ -237,6 +237,10 @@ func defaultClusterMonitoringConfiguration() ClusterMonitoringConfiguration {
Enabled: true,
UseNetlink: false,
},
Systemd: NodeExporterCollectorSystemdConfig{
Enabled: false,
Units: []string{"^$"},
},
},
},
}
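The default `Units: []string{"^$"}` only takes effect when the user omits `units`. The sketch below illustrates that, assuming (consistent with the `enable systemd collector without units` test case) that the user configuration is decoded on top of the defaults with `encoding/json` semantics, where a JSON array replaces the existing slice contents. The struct is a local copy of the fields and tags from `types.go`; `decodeSystemd` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copy of the fields and JSON tags from pkg/manifests/types.go.
type NodeExporterCollectorSystemdConfig struct {
	Enabled bool     `json:"enabled,omitempty"`
	Units   []string `json:"units,omitempty"`
}

// decodeSystemd (hypothetical) starts from the default set in config.go and
// overlays the user-provided document on top of it.
func decodeSystemd(doc string) (NodeExporterCollectorSystemdConfig, error) {
	cfg := NodeExporterCollectorSystemdConfig{Enabled: false, Units: []string{"^$"}}
	err := json.Unmarshal([]byte(doc), &cfg)
	return cfg, err
}

func main() {
	withoutUnits, err := decodeSystemd(`{"enabled": true}`)
	if err != nil {
		panic(err)
	}
	withUnits, err := decodeSystemd(`{"enabled": true, "units": ["network.+", "nss.+"]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(withoutUnits.Units) // [^$]  (default kept)
	fmt.Println(withUnits.Units)    // [network.+ nss.+]  (default replaced)
}
```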
26 changes: 23 additions & 3 deletions pkg/manifests/manifests.go
@@ -26,6 +26,7 @@ import (
"net"
"net/url"
"path/filepath"
"regexp"
"strconv"
"strings"
"time"
@@ -836,7 +837,7 @@ func (f *Factory) NodeExporterServiceMonitor() (*monv1.ServiceMonitor, error) {
return f.NewServiceMonitor(f.assets.MustNewAssetReader(NodeExporterServiceMonitor))
}

-func (f *Factory) updateNodeExporterArgs(args []string) []string {
+func (f *Factory) updateNodeExporterArgs(args []string) ([]string, error) {
args = setArg(args, fmt.Sprintf("--runtime.gomaxprocs=%d", f.config.ClusterMonitoringConfiguration.NodeExporterConfig.MaxProcs), "")
if f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.CpuFreq.Enabled {
args = setArg(args, "--collector.cpufreq", "")
@@ -883,7 +884,23 @@ func (f *Factory) updateNodeExporterArgs(args []string) []string {
args = setArg(args, "--no-collector.ksmd", "")
}

-return args
if f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Enabled {
args = setArg(args, "--collector.systemd", "")
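units := f.config.ClusterMonitoringConfiguration.NodeExporterConfig.Collectors.Systemd.Units
// Build the grouped patterns in a new slice so the configured units are not
// modified in place (otherwise a second call would wrap them twice).
grouped := make([]string, 0, len(units))
for _, unit := range units {
if _, err := regexp.Compile(unit); err != nil {
return nil, fmt.Errorf("invalid regexp for systemd unit: %s: %w", unit, err)
}
grouped = append(grouped, fmt.Sprintf("(%s)", unit))
}
patternUnits := strings.Join(grouped, "|")
args = setArg(args, "--collector.systemd.unit-include=", patternUnits)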
} else {
args = setArg(args, "--no-collector.systemd", "")
}

return args, nil
}

func (f *Factory) NodeExporterMinimalServiceMonitor() (*monv1.ServiceMonitor, error) {
@@ -900,7 +917,10 @@ func (f *Factory) NodeExporterDaemonSet() (*appsv1.DaemonSet, error) {
switch container.Name {
case "node-exporter":
ds.Spec.Template.Spec.Containers[i].Image = f.config.Images.NodeExporter
-ds.Spec.Template.Spec.Containers[i].Args = f.updateNodeExporterArgs(ds.Spec.Template.Spec.Containers[i].Args)
+ds.Spec.Template.Spec.Containers[i].Args, err = f.updateNodeExporterArgs(ds.Spec.Template.Spec.Containers[i].Args)
if err != nil {
return nil, err
}
case "kube-rbac-proxy":
ds.Spec.Template.Spec.Containers[i].Image = f.config.Images.KubeRbacProxy
ds.Spec.Template.Spec.Containers[i].Args = f.setTLSSecurityConfiguration(container.Args, KubeRbacProxyTLSCipherSuitesFlag, KubeRbacProxyMinTLSVersionFlag)
59 changes: 59 additions & 0 deletions pkg/manifests/manifests_test.go
@@ -2920,6 +2920,7 @@ func TestNodeExporterCollectorSettings(t *testing.T) {
"--collector.netclass",
"--no-collector.buddyinfo",
"--no-collector.ksmd",
"--no-collector.systemd",
},
argsAbsent: []string{"--collector.cpufreq",
"--collector.tcpstat",
@@ -2928,6 +2929,7 @@ func TestNodeExporterCollectorSettings(t *testing.T) {
"--collector.netclass.netlink",
"--collector.buddyinfo",
"--collector.ksmd",
"--collector.systemd",
},
},
{
@@ -3020,6 +3022,33 @@ nodeExporter:
argsPresent: []string{"--collector.ksmd"},
argsAbsent: []string{"--no-collector.ksmd"},
},
{
name: "enable systemd collector without units",
config: `
nodeExporter:
collectors:
systemd:
enabled: true
`,
argsPresent: []string{"--collector.systemd",
"--collector.systemd.unit-include=(^$)"},
argsAbsent: []string{"--no-collector.systemd"},
},
{
name: "enable systemd collector with units",
config: `
nodeExporter:
collectors:
systemd:
enabled: true
units:
- network.+
- nss.+
`,
argsPresent: []string{"--collector.systemd",
"--collector.systemd.unit-include=(network.+)|(nss.+)"},
argsAbsent: []string{"--no-collector.systemd"},
},
}

for _, test := range tests {
@@ -3062,6 +3091,36 @@ nodeExporter:

}

func TestNodeExporterSystemdUnits(t *testing.T) {

testName := "enable systemd collector with invalid units pattern"
config := `
nodeExporter:
collectors:
systemd:
enabled: true
units:
- network.+
- /\
`
t.Run(testName, func(st *testing.T) {
c, err := NewConfigFromString(config, false)
if err != nil {
st.Fatal(err)
}
c.SetImages(map[string]string{
"node-exporter": "docker.io/openshift/origin-prometheus-node-exporter:latest",
"kube-rbac-proxy": "docker.io/openshift/origin-kube-rbac-proxy:latest",
})

f := NewFactory("openshift-monitoring", "openshift-user-workload-monitoring", c, defaultInfrastructureReader(), &fakeProxyReader{}, NewAssets(assetsPath), &APIServerConfig{}, &configv1.Console{})
_, err = f.NodeExporterDaemonSet()
if err == nil || !strings.Contains(err.Error(), "invalid regexp for systemd unit") {
st.Fatalf(`expected error "invalid regexp for systemd unit:.*", got %v`, err)
}
})
}

func TestNodeExporterGeneralSettings(t *testing.T) {

tests := []struct {
27 changes: 27 additions & 0 deletions pkg/manifests/types.go
@@ -309,6 +309,9 @@ type NodeExporterCollectorConfig struct {
// Defines the configuration of the `ksmd` collector, which collects statistics from the kernel same-page merger daemon.
// Disabled by default.
Ksmd NodeExporterCollectorKSMDConfig `json:"ksmd,omitempty"`
// Defines the configuration of the `systemd` collector, which collects statistics on the systemd daemon and its managed services.
// Disabled by default.
Systemd NodeExporterCollectorSystemdConfig `json:"systemd,omitempty"`
}

// The `NodeExporterCollectorCpufreqConfig` resource works as an on/off switch for
@@ -424,6 +427,30 @@ type NodeExporterCollectorKSMDConfig struct {
Enabled bool `json:"enabled,omitempty"`
}

// The `NodeExporterCollectorSystemdConfig` resource works as an on/off switch for
// the `systemd` collector of the `node-exporter` agent.
// By default, the `systemd` collector is disabled.
// If enabled, the following metrics become available:
// `node_systemd_system_running`,
// `node_systemd_timer_last_trigger_seconds`,
// `node_systemd_units`,
// `node_systemd_version`.
// You can also use the `units` parameter to select the systemd units to be included by the `systemd` collector.
// The selected units are used to generate the `node_systemd_unit_state` metric, which shows the state of each systemd unit.
// However, this metric's cardinality might be high (at least 5 series per unit per node).
// If a selected unit uses a socket, the collector also generates the following three metrics:
// `node_systemd_socket_accepted_connections_total`,
// `node_systemd_socket_current_connections`,
// `node_systemd_socket_refused_connections_total`.
// If you enable this collector with a long list of selected units, closely monitor the `prometheus-k8s` pods for excessive memory usage.
type NodeExporterCollectorSystemdConfig struct {
// A Boolean flag that enables or disables the `systemd` collector.
Enabled bool `json:"enabled,omitempty"`
// A list of regular expression (regex) patterns that match systemd units to be included by the `systemd` collector.
// By default, the collector exposes no metrics for systemd units.
Units []string `json:"units,omitempty"`
}

// The `UserWorkloadConfiguration` resource defines the settings
// responsible for user-defined projects in the
// `user-workload-monitoring-config` config map in the
