cpv is a command-line tool for working with collection profiles. cpv expects the following set of flags.
Usage of ./cpv:
-address string
Address of the Prometheus instance. (default "http://localhost:9090")
-allow-list-file string
Path to a file containing a list of allow-listed metrics that will always be included within the extracted metrics set. Requires -profile flag to be set.
-bearer-token string
Bearer token for authentication.
-kubeconfig string
Path to kubeconfig file. Defaults to $KUBECONFIG.
-noisy
Enable noisy assumptions: interpret the absence of the collection profiles label as the default 'full' profile (when using the -status flag).
-output-cardinality
Output cardinality of all extracted metrics to a file.
-profile string
Collection profile that the command is being run for.
-quiet
Suppress all output, and use $EDITOR for generated manifests.
-rule-file string
Path to a valid rule file to extract metrics from, for eg., https://github.com/prometheus/prometheus/blob/v0.45.0/model/rulefmt/testdata/test.yaml. Requires -profile flag to be set.
-status
Report collection profiles' implementation status. -profile may be empty to report status for all profiles.
-target-selectors string
Target selectors used to extract metrics, for eg., https://github.com/prometheus/client_golang/blob/644c80d1360fb1409a3fe8dfc5bad4228f282f3b/api/prometheus/v1/api_test.go#L1007. Requires -profile flag to be set.
-validate
Validate the collection profile implementation. Requires -profile flag to be set.
-version
Print version information.
While the utility can be used with various combinations of the aforementioned flags to fulfill a desired use case, the following workflows are the most prominent and are documented here to get developers up and running quickly.
The utility can be used to extract metrics based on a given set of parameters, which include:
-allow-list-file: Path to a file containing a list of metrics that will always be included within the extracted metrics set, even if they are not present in the Prometheus instance forwarded at -address.
-rule-file: Path to a file containing a set of RuleGroups. All metrics used to define expressions within the rules will be extracted. For example, model/rulefmt/testdata/test.yaml will result in the extraction of two metrics: errors_total and requests_total. (A minimal illustrative rule file is sketched after this list.)
-target-selectors: A set of constraints (resembling VectorSelectors) satisfying the matchTarget parameter in TargetsMetadata. For example, "{job=\"prometheus\", severity=\"critical\"}" will result in the extraction of all metrics present in the Prometheus instance forwarded at -address that have the job label set to prometheus and the severity label set to critical.
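For instance, a minimal rule file along the following lines (the group name, rule names, and expressions are made up for illustration) would cause errors_total and requests_total to be extracted, since those are the only metrics referenced in the expressions:
groups:
- name: example
  rules:
  - record: job:requests:rate5m
    expr: sum(rate(requests_total[5m])) by (job)
  - alert: HighErrorRate
    expr: sum(rate(errors_total[5m])) / sum(rate(requests_total[5m])) > 0.05
    for: 10m
    labels:
      severity: critical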
These flags may be used in any combination, and each requires the -profile flag to be set. Once extracted, the metrics are used to generate a RelabelConfig that can be dropped into the ServiceMonitor or PodMonitor resource.
$ ./cpv -profile="$PROFILE" -rule-file="$RULE_FILE" -target-selectors="$TARGET_SELECTORS" -allow-list-file="$ALLOW_LIST_FILE"
sourcelabels:
- __name__
separator: ""
targetlabel: ""
regex: (foo|bar|...)
modulus: 0
replacement: ""
action: keep
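For reference, such a keep rule could be embedded under an endpoint's metricRelabelings in a ServiceMonitor. The following is only an illustrative sketch; the resource name, namespace, port, and selector are hypothetical placeholders, and the regex would be the one generated above.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                # hypothetical
  namespace: example-namespace     # hypothetical
spec:
  selector:
    matchLabels:
      app: example-app             # hypothetical
  endpoints:
  - port: metrics                  # hypothetical
    metricRelabelings:
    - sourceLabels: [__name__]
      regex: (foo|bar|...)
      action: keep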
Additionally, -output-cardinality may be specified to write the cardinality of all extracted metrics to a file, which helps in deciding which metrics to keep or drop within the ServiceMonitor or PodMonitor resource(s) for a particular profile.
METRIC CARDINALITY
foo 40
bar 10
...
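For example, an invocation along these lines (flag values are placeholders) could be used to produce such a report alongside the extraction:
$ ./cpv -profile="$PROFILE" -rule-file="$RULE_FILE" -output-cardinality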
The utility can be used to evaluate the extent to which a collection profile has been implemented for every default ServiceMonitor or PodMonitor resource that has opted in to the Collection Profiles feature. For example, for the default Kube State Metrics ServiceMonitor (note the explicit opt-in label), the utility, seeing that the resource has opted in to the Collection Profiles feature, will check for the presence of all corresponding SupportedNonDefaultCollectionProfiles for that ServiceMonitor and report the status for each of them (whether they exist or not).
For all profiles to be reported as "fully implemented" (i.e., when -status is used without specifying a particular -profile=$PROFILE), every default opted-in ServiceMonitor or PodMonitor resource (i.e., one with the monitoring.openshift.io/collection-profile label set to full) must have a corresponding resource for every such profile. Here, "corresponding resources" means ServiceMonitor or PodMonitor resources whose metadata.name is that of their default opted-in counterpart with the profile they fulfill appended, and whose monitoring.openshift.io/collection-profile label is set to the profile being checked for.
So, for example, for an opted-in default ServiceMonitor resource with metadata.name set to kube-state-metrics and monitoring.openshift.io/collection-profile: full in its label set, the corresponding ServiceMonitor resource for, say, the minimal profile would be kube-state-metrics-minimal. The utility checks for the presence of the corresponding resources for every profile, using the default resource's metadata.name as the base, and reports the status for each of them.
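As an illustration, the following pair of heavily truncated manifests sketches this naming and labeling convention; the namespace is a placeholder and the spec is omitted:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  namespace: example-namespace
  labels:
    monitoring.openshift.io/collection-profile: full
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics-minimal
  namespace: example-namespace
  labels:
    monitoring.openshift.io/collection-profile: minimal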
$ ./cpv -profile="$PROFILE" -status
PROFILE SERVICE MONITOR POD MONITOR ERROR
$PROFILE foo-monitor not implemented
$PROFILE bar-monitor not implemented
...
Additionally, the -noisy flag may be specified to interpret the absence of monitoring.openshift.io/collection-profile: full within the default ServiceMonitor or PodMonitor resources as the default full profile. This is useful when the ServiceMonitor or PodMonitor resources have not yet been updated to opt in to the Collection Profiles feature.
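For instance, a status report for all profiles under this relaxed interpretation could be requested with an invocation along the lines of:
$ ./cpv -status -noisy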
The utility can be used to validate against any discrepancies that impact the specified ServiceMonitor or PodMonitor resources. For this purpose, the utility expects the -profile flag, i.e., the profile that the validation should run against, and the -validate flag to be set. The validation works by reporting the hierarchy of any missing metrics that the specified -profile depends on, the absence of which may in turn impact the resources that depend on those metrics.
$ ./cpv -profile="$PROFILE" -validate
$PROFILE MONITOR GROUP LOCATION RULE QUERY METRIC ERROR
etcd-minimal etcd .../openshift-etcd-operator-etcd-prometheus-rules-....yaml etcdMemberCommunicationSlow histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket{job=~".*etcd.*"}[5m])) > 0.15 etcd_network_peer_round_trip_time_seconds_bucket not loaded
...