Merged
files/prometheus_alerts.yml: 10 additions, 0 deletions
@@ -134,3 +134,13 @@
  for: 10m
  labels:
    severity: warning

- "alert": "ElasticsearchOperatorCSVNotSuccessful"
  "annotations":
    "message": "Elasticsearch Operator CSV has not reconciled successfully."
    "summary": "Elasticsearch Operator CSV Not Successful"
  "expr": |
    csv_succeeded{name =~ "elasticsearch-operator.*"} == 0

Contributor: Does this expression simply check the status of the CSV to determine whether it was successful?

Contributor Author: IIRC, csv_succeeded == 0 is the failed phase.

  "for": "10m"
  "labels":
    "severity": "warning"

test/files/prometheus-unit-tests/test.yml: 14 additions, 0 deletions
@@ -15,6 +15,9 @@ tests:
- series: 'es_process_cpu_percent{cluster="elasticsearch", instance="localhost:9090", node="elasticsearch-cdm-1"}'
  values: '10+10x8 95+0x100' # 10 20 30 40 50 60 70 80 90 -- 95 (101x)

- series: 'csv_succeeded{name="elasticsearch-operator.currentversion-builddate"}'
  values: '0+0x10 1+0x90' # 0 (not succeeded) for the first 11 samples, then 1 (succeeded) for the remaining 91

# Rejected indexing requests simulation (note: this simulation also verifies all recording rules)
# Number of rejected write requests grows at a constant pace for 10 minutes
# and then the pattern repeats. This gives us two 10m segments of the series to test on.
@@ -142,3 +145,14 @@ tests:
  alertname: ElasticsearchProcessCPUHigh
  exp_alerts:

# --------- ElasticsearchOperatorCSVNotSuccessful ---------
- eval_time: 10m
  alertname: ElasticsearchOperatorCSVNotSuccessful

Contributor: Would "ElasticSearchNotReconciled" be more descriptive from a user perspective?

Contributor: I think it's descriptive enough; we would also be pointing to documentation from the alert's message (in the future).

Contributor Author: For what it's worth, I don't have any attachment to ElasticsearchOperatorCSVNotSuccessful as an alert name. If we want to change it, I'm happy to do so.

  exp_alerts:
  - exp_labels:
      name: elasticsearch-operator.currentversion-builddate
      severity: warning
    exp_annotations:
      summary: "Elasticsearch Operator CSV Not Successful"
      message: "Elasticsearch Operator CSV has not reconciled successfully."
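To sanity-check the test expectation, here is a minimal Python sketch of how the new rule should behave against the `csv_succeeded` test series. It is not promtool's or Prometheus's implementation: `expand_series`, `matches`, and `firing_minutes` are hypothetical helpers, and the sketch assumes a 1m evaluation interval where each evaluation sees the sample at exactly that timestamp (no staleness or lookback handling). It relies on promtool's `a+bxn` notation expanding to n+1 samples, so `0+0x10` yields 11 zeros (0m through 10m). At eval_time 10m the expression has therefore matched continuously for 10m, which satisfies `for: 10m` and is why the alert is expected to fire there.

```python
import re

# Hypothetical helper: expands only the 'a+bxn' / 'a-bxn' terms used in this
# test file, not the full promtool series syntax (e.g. '_' or 'stale').
TERM = re.compile(r"(-?\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)x(\d+)")


def expand_series(notation):
    """Expand each 'a+bxn' term into the n+1 samples a, a+b, ..., a+n*b."""
    samples = []
    for term in notation.split():
        m = TERM.fullmatch(term)
        if m is None:
            raise ValueError(f"unsupported term: {term!r}")
        start, step, n = float(m.group(1)), float(m.group(2)), int(m.group(3))
        samples.extend(start + i * step for i in range(n + 1))
    return samples


def matches(name):
    """PromQL regex matchers are fully anchored, so model =~ with fullmatch."""
    return re.fullmatch(r"elasticsearch-operator.*", name) is not None


def firing_minutes(samples, for_minutes, interval_minutes=1):
    """Return the eval times (in minutes) at which the alert is firing.

    Assumes one evaluation per sample. The alert becomes pending on the first
    evaluation where `csv_succeeded == 0` returns the series, fires once it has
    kept matching for at least `for_minutes`, and resets on any non-match.
    """
    firing = []
    active_since = None
    for i, value in enumerate(samples):
        t = i * interval_minutes
        if value == 0:
            if active_since is None:
                active_since = t
            if t - active_since >= for_minutes:
                firing.append(t)
        else:
            active_since = None
    return firing


samples = expand_series("0+0x10 1+0x90")
print(len(samples))                               # 102
print(firing_minutes(samples, for_minutes=10))    # [10]
print(matches("elasticsearch-operator.currentversion-builddate"))  # True
```

Because `1+0x90` starts at 11m, the expression stops matching there, so under these assumptions the alert fires only at the 10m evaluation.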