Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 23 additions & 17 deletions modules/monitoring-creating-scrape-sample-alerts.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -30,38 +30,38 @@ metadata:
labels:
prometheus: k8s
role: alert-rules
name: monitoring-stack-alerts <1>
namespace: ns1 <2>
name: monitoring-stack-alerts #<1>
namespace: ns1 #<2>
spec:
groups:
- name: general.rules
rules:
- alert: TargetDown <3>
- alert: TargetDown #<3>
annotations:
message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service
}} targets in {{ $labels.namespace }} namespace are down.' <4>
}} targets in {{ $labels.namespace }} namespace are down.' #<4>
expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job,
namespace, service)) > 10
for: 10m <5>
for: 10m #<5>
labels:
severity: warning <6>
- alert: ApproachingEnforcedSamplesLimit <7>
severity: warning #<6>
- alert: ApproachingEnforcedSamplesLimit #<7>
annotations:
message: '{{ $labels.container }} container of the {{ $labels.pod }} pod in the {{ $labels.namespace }} namespace consumes {{ $value | humanizePercentage }} of the samples limit budget.' <8>
expr: scrape_samples_scraped/50000 > 0.8 <9>
for: 10m <10>
message: '{{ $labels.container }} container of the {{ $labels.pod }} pod in the {{ $labels.namespace }} namespace consumes {{ $value | humanizePercentage }} of the samples limit budget.' #<8>
expr: (scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9 #<9>
for: 10m #<10>
labels:
severity: warning <11>
severity: warning #<11>
----
<1> Defines the name of the alerting rule.
<2> Specifies the user-defined project where the alerting rule will be deployed.
<3> The `TargetDown` alert will fire if the target cannot be scraped or is not available for the `for` duration.
<4> The message that will be output when the `TargetDown` alert fires.
<2> Specifies the user-defined project where the alerting rule is deployed.
<3> The `TargetDown` alert fires if the target cannot be scraped and is not available for the `for` duration.
<4> The message that is displayed when the `TargetDown` alert fires.
<5> The conditions for the `TargetDown` alert must be true for this duration before the alert is fired.
<6> Defines the severity for the `TargetDown` alert.
<7> The `ApproachingEnforcedSamplesLimit` alert will fire when the defined scrape sample threshold is reached or exceeded for the specified `for` duration.
<8> The message that will be output when the `ApproachingEnforcedSamplesLimit` alert fires.
<9> The threshold for the `ApproachingEnforcedSamplesLimit` alert. In this example the alert will fire when the number of samples per target scrape has exceeded 80% of the enforced sample limit of `50000`. The `for` duration must also have passed before the alert will fire. The `<number>` in the expression `scrape_samples_scraped/<number> > <threshold>` must match the `enforcedSampleLimit` value defined in the `user-workload-monitoring-config` `ConfigMap` object.
<7> The `ApproachingEnforcedSamplesLimit` alert fires when the defined scrape sample threshold is exceeded and lasts for the specified `for` duration.
<8> The message that is displayed when the `ApproachingEnforcedSamplesLimit` alert fires.
<9> The threshold for the `ApproachingEnforcedSamplesLimit` alert. In this example, the alert fires when the number of ingested samples exceeds 90% of the configured limit.
<10> The conditions for the `ApproachingEnforcedSamplesLimit` alert must be true for this duration before the alert is fired.
<11> Defines the severity for the `ApproachingEnforcedSamplesLimit` alert.

Expand All @@ -71,3 +71,9 @@ spec:
----
$ oc apply -f monitoring-stack-alerts.yaml
----

. Additionally, you can check if a target has hit the configured limit:

.. In the *Administrator* perspective of the web console, go to *Observe* -> *Targets* and select an endpoint with a `Down` status that you want to check.
+
The *Scrape failed: sample limit exceeded* message is displayed if the endpoint failed because of an exceeded sample limit.