Test alert management v2 sradco by OhadRevah · Pull Request #957 · openshift/monitoring-plugin

OhadRevah · 2026-05-19T05:31:08Z

No description provided.

Signed-off-by: Gabriel Bernal <gbernal@redhat.com>

OU-356: add translations from memsource

OU-1076: monitoring-plugin E2E testing instructions to AGENTS.md

OU-1076: unit testing instructions on AGENTS.md

Signed-off-by: Gabriel Bernal <gbernal@redhat.com>

OU-949: add namespace dropdown to dashboards page

OU-356: fix specific translations and items without translations

…bustness Add CYPRESS_COO_NAMESPACE environment variable to enable flexible namespace configuration for Cluster Observability Operator installations. This allows testing with different namespace configurations (e.g., 'coo' instead of the default 'openshift-cluster-observability-operator'). Changes: - Add CYPRESS_COO_NAMESPACE env var with default value 'openshift-cluster-observability-operator' - Update cypress.config.ts to read the new environment variable - Update configure-env.sh to prompt for and export namespace configuration - Update coo_stage.sh installation script to use configurable namespace - Update all test files to use Cypress.env('COO_NAMESPACE') instead of hardcoded values - Update mock generators to use the env var for namespace - Add --ignore-not-found flag to all oc delete commands in cleanup functions - Update dashboard management to use sed for on-the-fly namespace substitution - Add documentation in README.md for the new configuration option Benefits: - Enables testing against different namespace configurations - Makes cleanup operations idempotent (won't fail if resources don't exist) - Improves flexibility for release pipeline testing

…cript - Fix oc label commands to use singular 'namespace' instead of 'namespaces' - Remove --ignore-not-found flag from cluster-admin role removal - Add test-cypress-incidents-regression npm script - Update dependencies (package-lock files)

OCPBUGS-63458: add missing conversion units

…le-namespace-and-idempotent-cleanups OBSINTA-858: configurable Cypress COO namespace and idempotent cleanup

@smoke

Add tagging structure using @cypress/grep plugin: Tag Structure (4 levels): - Basic tags: @smoke, @demo, @flaky, @xfail, @slow - High-level component tags: @monitoring, @incidents, @coo, @alerts, @metrics, @dashboards - Specific feature tags: @{component}-{label} (e.g., @incidents-redux) - JIRA tags: @JIRA-{ID} for issue tracking Changes: - Add TypeScript definitions (support/test-tags.d.ts) for type-safe tags - Migrate npm scripts from --spec (file paths) to tag-based filtering - Update README.md with tag categories and usage examples - Add PR_MIGRATION_GUIDE.txt for team reference Benefits: - No hardcoded file paths in npm scripts - Run tests by component, feature, or JIRA issue - Exclude flaky/slow/demo tests as needed

Tests that verifies that alerts are not marked as resolved based on not refreshing values or requests with stale timestamp. Updates the incidents commands for creation of an alert with fixed name per test.

…nslation OU-1093: fix: add missing translation for incidents filters

…regressions-firing-alerts OBSINTA-777: [Incidents] Regression tests for firing alerts in Incidents

…ace based on namespace parameter

OBSINTA-857: implement grep tags for selective test execution

Add Prometheus, Thanos, and Alertmanager query support with GET /api/v1/alerting/ alerts endpoint including alert component matching and alerting health status. Signed-off-by: Shirly Radco <sradco@redhat.com> Signed-off-by: João Vilaça <jvilaca@redhat.com> Signed-off-by: Aviv Litman <alitman@redhat.com> Co-authored-by: AI Assistant <noreply@cursor.com>

Add GET /api/v1/alerting/rules endpoint with Prometheus rule group retrieval, list filtering, and label matching. Signed-off-by: Shirly Radco <sradco@redhat.com> Signed-off-by: João Vilaça <jvilaca@redhat.com> Signed-off-by: Aviv Litman <alitman@redhat.com> Co-authored-by: AI Assistant <noreply@cursor.com>

Detect and remove orphan AlertRelabelConfig resources that no longer have a matching PrometheusRule, preventing stale relabel configs from accumulating. Signed-off-by: Shirly Radco <sradco@redhat.com> Signed-off-by: João Vilaça <jvilaca@redhat.com> Signed-off-by: Aviv Litman <alitman@redhat.com> Co-authored-by: AI Assistant <noreply@cursor.com>

Signed-off-by: Shirly Radco <sradco@redhat.com> Signed-off-by: João Vilaça <jvilaca@redhat.com> Signed-off-by: Aviv Litman <alitman@redhat.com> Co-authored-by: AI Assistant <noreply@cursor.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement 58 test cases across 13 phases for Alert Management API v2. Uses dedicated namespace via f.CreateNamespace instead of default. Adds bearer token support for API requests (BEARER_TOKEN env var). Switch server.go to local dev kubeconfig mode. Known issues: namespace type (platform vs user) and filter keys need fixing based on cluster test results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add bearer token support (BEARER_TOKEN env var + kubeconfig fallback) - Add UWM enablement helpers for future user-namespace tests - Fix namespace filter tests to verify rule-level labels (alert labels from vector(1) don't carry namespace in cluster-monitoring namespaces) - Use unique expressions in platform rule creation to avoid 409 conflicts - Add AlertingRule CR cleanup in Phase 13 - Skip 12 tests requiring UWM user-namespace (not available when plugin runs locally against platform Prometheus only) - Skip 6 metrics tests (metric not yet implemented) Test results: 40 pass, 0 fail, 18 skip (12 UWM + 6 metrics) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use bulk endpoints for single-rule operations (patchSingleRuleViaBulk, deleteSingleRuleViaBulk) since single-rule routes (PATCH/DELETE /rules/{ruleId}) are not yet implemented in the restructured code - Remove metric skip guards — alerts_effective_active_at_timestamp_seconds metric now exists in sradco's code (pkg/management/metrics/) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The relabeled rules cache takes 75+ seconds to stamp openshift_io_alert_rule_id on new rules. The poll now tracks both counts separately and waits for all rules to have IDs. Previously the poll only counted rules with IDs, masking the fact that rules were appearing in the API but without IDs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The relabeled rules cache takes too long to stamp openshift_io_alert_rule_id on new rules in new namespaces. Accept rules as 'found' once they appear in the API response, even without IDs. Tests needing IDs will look them up at runtime. Results on sradco's code: 37 pass, 11 fail, 10 skip. All 6 metrics tests now pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add skipIfNoRuleID helper that skips tests when rule IDs are empty due to the relabeled cache re-sync bug (CNV-85482). Tests that only use ids.Watchdog (always has ID from startup) are unaffected. Affected tests: TC-032, TC-041, TC-042, TC-043, TC-048. Tests already skipped for UWM reasons are not double-skipped. TODO(CNV-85482): Remove skips and require IDs in poll once the relabeled rules cache re-sync bug is fixed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- UpdateAlertRuleResponse → UpdateAlertRuleResult - BulkDeleteUserDefinedAlertRulesRequest/Response → BulkDeleteAlertRulesRequest/Response - monitoringv1.Rule → managementrouter.AlertRuleSpec (pointer fields) - management.PrometheusRuleOptions → managementrouter.PrometheusRuleTarget - StatusCode fields now int32 - Remove duplicate strPtr (now in helpers_test.go) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Create separate user namespace (cluster-monitoring=false) for user-defined rule tests when UWM is accessible via Thanos port-forward - Dynamic skipIfNoUWM() replaces hardcoded t.Skip — tests run when UWM is reachable, skip when it's not - Apply CNV-85482 fix: pass server ctx to k8s.NewClient instead of initCtx - Require all seed rule IDs in poll (relabeled cache syncs with fix) - Fix TC-030c missing bearer token on raw HTTP request - Remove skipIfNoRuleID (no longer needed with CNV-85482 fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

TC-033/034/035/036/039 still used patchRule (single-rule endpoint) which returns 404 since the route doesn't exist in sradco's code. Convert to patchSingleRuleViaBulk for all user-defined rule operations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add refreshRuleID() helper to re-discover rule IDs by alert name before operations, since IDs change when relabeled cache stamps labels - Refresh UserRule and GitOpsRule IDs at start of Phase 6 and 7 - Convert TC-052 lifecycle patchRule/deleteRule to bulk wrappers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rule IDs change when the relabeled cache stamps new labels. Add refreshRuleID calls before every test that uses a non-Watchdog rule ID. Add sleep+refresh for newly created temp rules in TC-049/050/052. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- refreshRuleID now polls up to 90s and validates the ID works by probing the bulk endpoint before returning - Skip TC-036 and TC-052 which require single-rule PATCH with alertingRule body (not supported by bulk endpoint) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The bulk PATCH probe in refreshRuleID caused side effects. Revert to simple GET-based refresh. Stale ID timing issues need developer input on how rule IDs are supposed to stabilize after cache re-sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rule IDs change when the relabeled cache stamps new labels (~75s cycle). Add 90s waits after TC-041 (label changes) and before TC-049/050/052 (temp rule creation) to let the cache stabilize before refreshing IDs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-19T05:31:15Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 55b0a86c-1375-4159-9991-6c3ac5c19e00

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-05-19T05:31:25Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: OhadRevah
Once this PR has been reviewed and has the lgtm label, please assign peteryurkovich for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Poll GET /rules until the rule ID changes from the old value, meaning Prometheus has re-evaluated the rule with updated labels. Times out after 2 minutes. Replaces fixed 90s sleeps with adaptive waiting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- waitForIDChange only runs when UWM is accessible (IDs change in user NS) - Newly created rules already have correct IDs stamped — use simple refresh - Fixes timeout when label updates don't persist in platform namespace Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The in-cluster plugin reaches UWM via Thanos fallback, not direct Prometheus route. Check both prometheus.status and fallbackReachable in the health response to properly detect UWM accessibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- isIDInCache: probes the relabeled cache via no-op bulk PATCH - waitForIDChange: now verifies both Prometheus AND cache agree on new ID - waitForIDReady: polls until ID exists in both Prometheus and cache - Replace all refreshRuleID with waitForIDReady for cache consistency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AlertingRuleEnabled=true silently succeeds for all IDs. Use false (drop) which returns 404 for unknown IDs, then immediately re-enable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Single-rule PATCH with alertingRule body is not available in the restructured API. Rewrite TC-036 (update user rule) and TC-052 (lifecycle PATCH step) to update labels via bulk PATCH instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PeterYurkovich and others added 30 commits November 10, 2025 10:42

feature: add namespace dropdown to dashboards page

50e7e77

feature: add dev-monitoring redirects to admin console pages

2bd5c98

fix: prevent loop within effect

a0ab60d

fix: separate path and query params

b1d227b

fix: update namespace from query params to console

daab768

feature: check user permissions to determine alerting path

3d11da5

feature: fix all remaining locations where tenancy is used within alerts

e9d77d5

chore(i18n): update translations

1a580cb

Signed-off-by: Gabriel Bernal <gbernal@redhat.com>

fix: make the query parameter more expressive of its purpose

f77b3a6

fix: add missing conversion units

6967968

Merge pull request openshift#646 from jgbernalp/add-translations

24e74e8

OU-356: add translations from memsource

monitoring-plugin testing instructions - still manual

83aa06f

Merge pull request openshift#649 from etmurasaki/ou-1076

29ca2f9

OU-1076: monitoring-plugin E2E testing instructions to AGENTS.md

unit testing instructions on AGENTS.md

9ae7871

Merge pull request openshift#650 from etmurasaki/ou-1076-1

8298186

OU-1076: unit testing instructions on AGENTS.md

fix specific translations and items without translations

58709b7

Signed-off-by: Gabriel Bernal <gbernal@redhat.com>

Merge pull request openshift#567 from PeterYurkovich/ou-949

3c3fe3b

OU-949: add namespace dropdown to dashboards page

Merge pull request openshift#651 from jgbernalp/fix-translations-2

e37e1cd

OU-356: fix specific translations and items without translations

fix: add missing translation for incidents filters

894c53b

Merge pull request openshift#648 from PeterYurkovich/OCPBUGS-63458

8b54c68

OCPBUGS-63458: add missing conversion units

Merge pull request openshift#633 from DavidRajnoha/cypress-configurab…

5cdc38c

…le-namespace-and-idempotent-cleanups OBSINTA-858: configurable Cypress COO namespace and idempotent cleanup

test(cypress): Regression for Incidents - firing alerts

6b0b78b

Tests that verifies that alerts are not marked as resolved based on not refreshing values or requests with stale timestamp. Updates the incidents commands for creation of an alert with fixed name per test.

Merge pull request openshift#653 from jgbernalp/fix-clear-filters-tra…

d8a518a

…nslation OU-1093: fix: add missing translation for incidents filters

Merge pull request openshift#622 from DavidRajnoha/cypress-incidents-…

516050c

…regressions-firing-alerts OBSINTA-777: [Incidents] Regression tests for firing alerts in Incidents

fix: always use non-tenancy path for admin users and set activeNamesp…

65e0b3b

…ace based on namespace parameter

Merge pull request openshift#601 from DavidRajnoha/cypress-tags

c89fe68

OBSINTA-857: implement grep tags for selective test execution

fix: remove namespace bar when page isn't namespaced

69eb8c4

sradco and others added 20 commits May 6, 2026 12:33

add alerts_effective_active_at metric

8cbd67c

Signed-off-by: Shirly Radco <sradco@redhat.com> Signed-off-by: João Vilaça <jvilaca@redhat.com> Signed-off-by: Aviv Litman <alitman@redhat.com> Co-authored-by: AI Assistant <noreply@cursor.com>

Switch server.go to local dev kubeconfig mode

2f4eb6d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

openshift-ci Bot requested review from PeterYurkovich and jgbernalp May 19, 2026 05:31

OhadRevah and others added 6 commits May 19, 2026 08:50

Fix cache probe: use AlertingRuleEnabled=false to detect rule existence

90e279b

AlertingRuleEnabled=true silently succeeds for all IDs. Use false (drop) which returns 404 for unknown IDs, then immediately re-enable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test alert management v2 sradco#957

Test alert management v2 sradco#957
OhadRevah wants to merge 234 commits into
openshift:alerts-management-apifrom
OhadRevah:test-alert-management-v2-sradco

OhadRevah commented May 19, 2026

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Review skipped

Uh oh!

openshift-ci Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

15 participants

Conversation

OhadRevah commented May 19, 2026

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

openshift-ci Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

15 participants

coderabbitai Bot commented May 19, 2026 •

edited

Loading