Skip to content

HYPERFLEET-101: Implement Prometheus metrics and observability for Sentinel#14

Merged
rh-amarin merged 16 commits intoopenshift-hyperfleet:mainfrom
rafabene:HYPERFLEET-101-prometheus-metrics
Nov 27, 2025
Merged

HYPERFLEET-101: Implement Prometheus metrics and observability for Sentinel#14
rh-amarin merged 16 commits intoopenshift-hyperfleet:mainfrom
rafabene:HYPERFLEET-101-prometheus-metrics

Conversation

@rafabene
Copy link
Contributor

@rafabene rafabene commented Nov 26, 2025

Summary

Implements comprehensive Prometheus metrics for HyperFleet Sentinel service following the official architecture specification and rh-trex pattern.

Changes

  • Created internal/metrics/metrics.go with 6 Prometheus metrics:

    • hyperfleet_sentinel_pending_resources - Gauge for resources pending reconciliation
    • hyperfleet_sentinel_events_published_total - Counter for published events
    • hyperfleet_sentinel_resources_skipped_total - Counter for skipped resources
    • hyperfleet_sentinel_poll_duration_seconds - Histogram for poll cycle duration
    • hyperfleet_sentinel_api_errors_total - Counter for API errors
    • hyperfleet_sentinel_broker_errors_total - Counter for broker errors
  • Added comprehensive unit tests (internal/metrics/metrics_test.go) with 92.7% coverage

  • Integrated metrics collection into sentinel reconciliation loop

  • Added /metrics and /health HTTP endpoints on port 8080

  • Updated integration tests to verify metrics collection

Architecture Alignment

✅ Follows official architecture: https://github.com/openshift-hyperfleet/architecture/blob/main/hyperfleet/components/sentinel/sentinel-deployment.md
✅ Implements exact metric names and labels from architecture spec
✅ Uses rh-trex pattern for metrics implementation
✅ Thread-safe initialization with sync.Once
✅ Input validation on all helper functions

Testing

  • Unit tests: 14 test functions with 92.7% coverage
  • Integration tests: End-to-end verification with mock API and broker
  • All tests passing: make test

Changes from Review

  • Removed config_loads_total metric as it provides limited value (only incremented once per boot)
  • This aligns with rh-trex which does not have a similar metric

Related Issues

Summary by CodeRabbit

  • New Features

    • HTTP metrics/health server (/health, /metrics) and Prometheus-based metrics; metrics initialized at startup and shut down gracefully.
    • Runtime instrumentation capturing pending resources, published/skipped events, poll duration, and API/broker errors; metrics creation/register API exposed.
  • Tests

    • Unit tests for the metrics subsystem; expanded integration tests with end-to-end scenarios and label-selector filtering; added Testcontainers-based RabbitMQ helpers.
  • Documentation

    • New Testcontainers guide and README updates for running integration tests.
  • Chores

    • Makefile adjusted for integration test environment.

✏️ Tip: You can customize this high-level summary in your review settings.

Add comprehensive Prometheus metrics instrumentation to track Sentinel
service health and performance. Implements all 7 metrics specified in
the architecture (pending resources, events published, resources skipped,
poll duration, API errors, broker errors, and config loads).

Key improvements:
- Thread-safe metrics registration with sync.Once to prevent panics
- Input validation on all metric update functions
- Efficient string concatenation using strings.Builder
- Complete test coverage (76%) with 15 unit tests
- Integration tests verifying end-to-end metrics collection
- Follows rh-trex pattern for consistency

All metrics use hyperfleet_sentinel subsystem and include proper labels
for queryability (resource_type, resource_selector, reason, error_type).

Closes: HYPERFLEET-101
@coderabbitai
Copy link

coderabbitai bot commented Nov 26, 2025

Walkthrough

Added a Prometheus-based metrics subsystem and instrumentation, started a background HTTP server exposing /metrics and /health with graceful shutdown, registered metrics via a new NewSentinelMetrics entrypoint, updated sentinel trigger paths to emit metrics (API errors, broker errors, pending counts, published/skipped events, poll durations), added comprehensive unit tests for metrics, updated sentinel tests to initialize metrics, and introduced integration test scaffolding and a RabbitMQ testcontainer helper plus README/docs and Makefile changes to run integration tests.

Sequence Diagram(s)

sequenceDiagram
    participant Main as cmd/sentinel/main.go
    participant Registry as Prometheus\nRegistry
    participant Metrics as internal/metrics\nSentinelMetrics
    participant HTTP as HTTP Metrics Server
    participant Sentinel as internal/sentinel\nTrigger
    participant Broker as Message Broker

    Main->>Registry: create registry
    Main->>Metrics: NewSentinelMetrics(registry)
    Metrics->>Registry: register collectors
    Main->>HTTP: start server (/health, /metrics) [background]

    loop polling cycle
        Sentinel->>Sentinel: record startTime & labels
        Sentinel->>Sentinel: fetch resources
        alt fetch error
            Sentinel->>Metrics: UpdateAPIErrorsMetric(...)
        else fetched resources
            Sentinel->>Sentinel: evaluate decisions
            alt publish decision
                Sentinel->>Broker: Publish(event)
                alt publish success
                    Sentinel->>Metrics: UpdateEventsPublishedMetric(...)
                else publish error
                    Sentinel->>Metrics: UpdateBrokerErrorsMetric(...)
                end
            else skipped
                Sentinel->>Metrics: UpdateResourcesSkippedMetric(...)
            end
            Sentinel->>Metrics: UpdatePendingResourcesMetric(...)
            Sentinel->>Metrics: UpdatePollDurationMetric(...)
        end
    end

    Main->>HTTP: graceful shutdown on termination signal
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Files to focus on:
    • internal/metrics/metrics.go — metric names/labels, registration idempotency, label serialization, validation/clamping, and concurrency safety.
    • internal/sentinel/sentinel.go — placement and correctness of metric updates across success/error flows and duration measurement.
    • cmd/sentinel/main.go — registry creation, HTTP server lifecycle, timeouts, and graceful shutdown.
    • test/integration/* and test/integration/testcontainer.go — integration lifecycle, testcontainer startup/teardown, and test isolation.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.97% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely summarizes the main change: implementing Prometheus metrics and observability for Sentinel with appropriate Jira reference.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
internal/metrics/metrics_test.go (3)

45-65: Test relies on global sync.Once state from previous tests.

This test doesn't call ResetSentinelMetrics() before the first NewSentinelMetrics call, so it depends on registerOnce state from TestNewSentinelMetrics. While the test passes, it may not be testing the intended scenario if test order changes.

More importantly, the sync.Once pattern means the second call's registry parameter is ignored entirely (metrics were already registered to the first registry). Consider documenting this behavior or restructuring if true isolation is needed.


84-90: Consider verifying the clamped value is actually 0.

The test verifies the metric is collected but doesn't verify the value was clamped to 0. Consider using testutil.ToFloat64() to assert the value:

val := testutil.ToFloat64(pendingResourcesGauge.WithLabelValues("clusters", "shard:1"))
if val != 0 {
    t.Errorf("Expected value 0 after clamping, got %f", val)
}

177-186: Consider table-driven tests for consistency.

TestUpdateResourcesSkippedMetric, TestUpdateAPIErrorsMetric, and TestUpdateBrokerErrorsMetric use single test cases while similar tests use table-driven patterns. Consider expanding for consistency and edge case coverage (empty inputs).

internal/metrics/metrics.go (2)

100-109: Consider custom histogram buckets for poll duration.

prometheus.DefBuckets (0.005s to 10s) may not be optimal for poll duration. If polling typically takes milliseconds or seconds in a specific range, consider custom buckets for better granularity:

Buckets: []float64{0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30},

345-360: Consider validating/sanitizing selector values for Prometheus label safety.

Prometheus label values have restrictions (shouldn't be too long, certain characters may cause issues in queries). If selector.Label or selector.Value could contain commas or colons, parsing the combined string becomes ambiguous.

For now this is likely fine if selectors come from controlled input, but consider documenting the expected format.

internal/sentinel/sentinel.go (2)

26-26: The metrics field is stored but never used.

The Sentinel.metrics field is set in the constructor but all metric updates use package-level functions like metrics.UpdateAPIErrorsMetric() rather than methods on s.metrics.

Either:

  1. Remove the metrics field since package functions are used, or
  2. Refactor Update* functions to be methods on *SentinelMetrics and call s.metrics.UpdateAPIErrorsMetric(...) for better encapsulation and testability

Option 2 example:

-metrics.UpdateAPIErrorsMetric(resourceType, resourceSelector, "fetch_error")
+if s.metrics != nil {
+    s.metrics.UpdateAPIErrorsMetric(resourceType, resourceSelector, "fetch_error")
+}

109-111: Clarify: pending tracks resources needing reconciliation, not a queue depth.

The variable pending counts resources where ShouldPublish=true, but it increments before attempting to publish, so some may fail to publish. Consider renaming to toReconcile or shouldPublish for clarity, or adjust the logic to only count successful publishes if that's the intent.

test/integration/integration_test.go (2)

25-44: Consider reusing a shared MockPublisher helper and clarify concurrency expectations

This MockPublisher is effectively the same as the one in internal/sentinel/sentinel_test.go. Consider moving it to a shared test helper (e.g. an internal test package) to avoid duplication and keep behavior consistent across tests. Also, if Sentinel ever publishes from multiple goroutines, the unsynchronized slice appends here will race; if concurrent publishing is expected, wrapping the state with a mutex would make this safer under -race.


79-173: Strengthen end‑to‑end test against timing flakiness

The end‑to‑end test is solid in structure, but it relies on fixed time.Sleep durations (300 ms) to assume at least two polling cycles have run. On slower CI or under load this can be borderline and lead to flaky tests. Consider replacing the fixed sleep with a loop that waits (bounded by context deadline) until either:

  • len(mockPublisher.publishedEvents) > 0, and
  • the metrics collectors report non‑zero counts,

or the context times out. This keeps the test robust while still validating the same behavior.

cmd/sentinel/main.go (2)

78-84: Consider also recording config load failures and centralizing status values

Recording "success" here is useful, but failures on config.LoadConfig are not currently reflected in the metric, so you’ll never see a “failure” counter. If distinguishing success vs. failure is important, consider:

  • Incrementing a "failure" status in newServeCommand when LoadConfig returns an error, and
  • Defining constants for the allowed status label values to avoid typos ("success", "failure", etc.).

This keeps the config‑load metric semantically complete and easier to query.


105-149: Avoid silently degrading when metrics server fails to start

The metrics/health HTTP server is started in a goroutine, and if ListenAndServe fails (e.g. port already in use), the code only logs at info level and continues running Sentinel without a metrics endpoint. Given this PR’s goal is observability, that silent degradation might be undesirable.

Consider:

  • Logging at an error level and/or
  • Propagating the failure by canceling the main context and returning an error from runServe (e.g. via a channel that communicates startup failure back to the main goroutine),

so that an operator can clearly see that metrics/health are not available instead of the service quietly running without them.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89e9a76 and a2ca07a.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • cmd/sentinel/main.go (4 hunks)
  • go.mod (3 hunks)
  • internal/metrics/metrics.go (1 hunks)
  • internal/metrics/metrics_test.go (1 hunks)
  • internal/sentinel/sentinel.go (8 hunks)
  • internal/sentinel/sentinel_test.go (6 hunks)
  • test/integration/integration_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
internal/sentinel/sentinel_test.go (3)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (173-200)
internal/config/config.go (1)
  • SentinelConfig (23-31)
internal/sentinel/sentinel.go (1)
  • NewSentinel (30-48)
internal/metrics/metrics_test.go (2)
internal/metrics/metrics.go (15)
  • ResetSentinelMetrics (225-233)
  • NewSentinelMetrics (173-200)
  • UpdatePendingResourcesMetric (236-250)
  • UpdateEventsPublishedMetric (253-265)
  • UpdateResourcesSkippedMetric (268-280)
  • UpdatePollDurationMetric (283-297)
  • UpdateAPIErrorsMetric (300-312)
  • UpdateBrokerErrorsMetric (315-327)
  • UpdateConfigLoadsMetric (330-340)
  • GetResourceSelectorLabel (345-360)
  • MetricsLabels (24-27)
  • MetricsLabelsWithReason (30-34)
  • MetricsLabelsWithErrorType (37-41)
  • MetricsLabelsConfigLoads (44-46)
  • MetricsNames (60-68)
internal/config/config.go (2)
  • LabelSelectorList (20-20)
  • LabelSelector (14-17)
test/integration/integration_test.go (7)
internal/sentinel/sentinel_test.go (1)
  • MockPublisher (56-60)
internal/client/client.go (2)
  • NewHyperFleetClient (49-64)
  • ResourceType (33-33)
internal/engine/decision.go (1)
  • NewDecisionEngine (31-36)
pkg/logger/logger.go (1)
  • NewHyperFleetLogger (35-43)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (173-200)
internal/config/config.go (2)
  • SentinelConfig (23-31)
  • LabelSelector (14-17)
internal/sentinel/sentinel.go (1)
  • NewSentinel (30-48)
internal/metrics/metrics.go (1)
internal/config/config.go (1)
  • LabelSelectorList (20-20)
internal/sentinel/sentinel.go (2)
internal/metrics/metrics.go (8)
  • SentinelMetrics (142-163)
  • GetResourceSelectorLabel (345-360)
  • UpdateAPIErrorsMetric (300-312)
  • UpdateBrokerErrorsMetric (315-327)
  • UpdateEventsPublishedMetric (253-265)
  • UpdateResourcesSkippedMetric (268-280)
  • UpdatePendingResourcesMetric (236-250)
  • UpdatePollDurationMetric (283-297)
internal/client/client.go (1)
  • ResourceType (33-33)
cmd/sentinel/main.go (2)
internal/metrics/metrics.go (2)
  • NewSentinelMetrics (173-200)
  • UpdateConfigLoadsMetric (330-340)
internal/sentinel/sentinel.go (1)
  • NewSentinel (30-48)
🔇 Additional comments (13)
go.mod (2)

26-52: LGTM - Prometheus dependencies added correctly.

The indirect dependencies for Prometheus client library (client_golang, client_model, common, procfs) and supporting packages are appropriately added to support the new metrics subsystem.


3-3: Go version 1.25.0 is a released, stable version as of November 2025—no action needed.

The original concern was based on incorrect timing. Go 1.25 has been stable since November 2025, making Go 1.25.0 a valid and appropriate choice. The specification is intentional.

internal/metrics/metrics_test.go (3)

11-43: LGTM - Initialization test is comprehensive.

Good coverage of all metric fields with appropriate assertions.


298-343: LGTM - Good coverage of selector label generation.

Table-driven tests cover all meaningful cases: empty, single, and multiple selectors with proper expected output verification.


379-420: LGTM - Constants validation tests provide good safety net.

Verifying label array lengths, metric name counts, and checking for duplicates helps catch accidental regressions when adding new metrics.

internal/metrics/metrics.go (2)

165-200: sync.Once prevents re-registration but may cause test isolation issues.

The registerOnce variable is package-level and never reset. This means:

  1. The first NewSentinelMetrics(registry1) registers metrics to registry1
  2. Subsequent calls with different registries (e.g., NewSentinelMetrics(registry2)) return the same instance registered to registry1

For production this is fine (singleton pattern), but in tests, each test creates its own registry that won't receive the metrics if another test ran first. Consider:

+// resetOnce is used only for testing to allow re-registration
+func resetOnce() {
+    registerOnce = sync.Once{}
+    metricsInstance = nil
+}

Or document that tests should share the metrics instance rather than creating new registries per test.


235-250: LGTM - Good input validation with defensive programming.

The validation logic correctly handles edge cases: empty strings cause early return, and negative counts are clamped to 0. This prevents invalid metric labels from being created.

internal/sentinel/sentinel_test.go (3)

16-18: LGTM - Necessary imports added for metrics integration.


95-105: Test metrics setup pattern is consistent but has a caveat.

Each test creates its own registry and calls NewSentinelMetrics(registry), but due to the sync.Once in the implementation, only the first test's registry will have metrics registered. Subsequent tests' registries remain empty.

This works because tests don't verify metric values from their local registry. However, if you add assertions like testutil.CollectAndCount(registry, ...), they would fail for tests running after the first one.

Consider adding a metrics.ResetSentinelMetrics() call and resetting registerOnce in a TestMain or helper, or using a shared registry across tests.


269-328: LGTM - MixedResources test covers multiple decision paths.

Good comprehensive test that exercises multiple code paths (publish for max age exceeded, publish for generation change, skip for within age). The metrics integration follows the established pattern.

internal/sentinel/sentinel.go (3)

26-47: LGTM - Clean dependency injection for metrics.

The metrics dependency is properly injected through the constructor, following a clean separation of concerns.


78-93: LGTM - Metrics recording at appropriate points.

API errors are correctly recorded before returning the error, and the resourceType/resourceSelector labels are computed upfront for consistent labeling across all metrics in this trigger cycle.


155-160: LGTM - Poll duration captures full cycle time.

Recording duration after all processing completes gives an accurate measure of the full trigger cycle, including fetch, evaluation, and publish operations.

…unctions

These functions could cause 'duplicate metrics collector registration' panic
if called after NewSentinelMetrics(). Since NewSentinelMetrics() is the
intended public API and already has sync.Once protection, removing these
unused functions to prevent misuse.

- Removed RegisterSentinelMetrics() (unused, unsafe)
- Removed UnregisterSentinelMetrics() (unused, unsafe)
- Kept ResetSentinelMetrics() for testing purposes
The TestIntegration_LabelSelectorFiltering test now:
- Implements server-side filtering in mock server (mimics real API behavior)
- Returns 2 clusters: one with shard:1 (should match), one with shard:2 (should not)
- Captures and validates Start() error via error channel
- Verifies ALL published events are for cluster-shard-1 only
- Confirms label selector filtering works correctly

Changes:
- Added createMockClusterWithLabels() helper function
- Mock server now respects 'search' query parameter
- Test validates error handling and event filtering
- Supports multiple polling cycles (non-flaky timing)
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
internal/metrics/metrics.go (4)

141-200: Clarify NewSentinelMetrics singleton behavior w.r.t. the registry parameter

NewSentinelMetrics correctly uses sync.Once to prevent duplicate registration panics and to ensure a single *SentinelMetrics instance. However, after the first call, the registry parameter is effectively ignored and all subsequent calls just return the same metricsInstance.

If future tests or code paths try to use different registries per test (e.g., prometheus.NewRegistry() in multiple tests), they may be surprised that only the first registry is honored.

Consider updating the doc comment to make this explicit, e.g., “the first non‑nil registry wins; subsequent calls ignore the registry argument and return the same global metrics instance; use ResetSentinelMetrics for test cleanup.” That should prevent misuse without changing behavior.


202-211: ResetSentinelMetrics only clears values; registration and singleton remain global

ResetSentinelMetrics is well‑scoped for tests: it clears all collectors without touching metricsInstance or registerOnce, so registration happens once per process and tests can reuse the same metrics.

Just be aware (and possibly document in tests or here) that this does not allow re-registering metrics with a different prometheus.Registerer; the original registry and collectors remain in place. If you ever need per‑test registries, you’d need a different pattern (e.g., non‑global collectors). For the current usage this is fine.


213-318: Input‑validation behavior in Update*Metric helpers is conservative but silent

All the Update*Metric helpers defensively no‑op when given empty label components (and clamp negative counts to zero / discard negative durations). This is safe and avoids polluting metrics with invalid label values.

Two considerations to keep in mind:

  • Silent returns can hide upstream bugs (e.g., a missing resourceType), since nothing fails and no metric is emitted.
  • If this is intentional (per the architecture spec), it’s worth ensuring callers have tests that assert they pass non‑empty labels so that such bugs are caught earlier.

No change required, but if you start debugging “missing metrics” later, this behavior will be a prime suspect.


320-338: Delimiter choice in GetResourceSelectorLabel assumes labels/values lack , and :

GetResourceSelectorLabel is clear and efficient, and returning "all" for an empty selector list is a nice convention.

The label:value,label2:value2 encoding assumes that neither selector.Label nor selector.Value can contain , or :; otherwise, you could get ambiguous serialized strings and unexpected metric label cardinality. If your config.LabelSelector is constrained to something like Kubernetes label syntax, this is likely safe—if not, consider documenting this assumption or adding lightweight validation/normalization where selectors are constructed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2ca07a and aaa5ea6.

📒 Files selected for processing (1)
  • internal/metrics/metrics.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/metrics/metrics.go (1)
internal/config/config.go (1)
  • LabelSelectorList (20-20)
🔇 Additional comments (1)
internal/metrics/metrics.go (1)

48-139: Metric and label definitions look consistent and aligned with the spec

Metric names, subsystem, and label keys/arrays are coherent and match the described architecture (pending/resources/events/errors/config loads, with appropriate label sets). Exporting MetricsLabels* and MetricsNames is useful for tests and consistency checks; no issues here from a correctness or maintainability standpoint.

Enhanced all public function documentation following Go godoc standards:
- Added detailed function descriptions explaining purpose and behavior
- Documented all parameters with types and examples
- Added thread-safety notes for concurrent usage
- Included validation behavior documentation
- Added common usage examples in docstrings
- Documented edge cases (e.g., negative values, empty strings)

This improves docstring coverage from 57.89% to meet the 80% threshold.

Functions improved:
- UpdatePendingResourcesMetric
- UpdateEventsPublishedMetric
- UpdateResourcesSkippedMetric
- UpdatePollDurationMetric
- UpdateAPIErrorsMetric
- UpdateBrokerErrorsMetric
- UpdateConfigLoadsMetric
- ResetSentinelMetrics
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/metrics/metrics.go (1)

412-430: Consider potential ambiguity if labels/values contain delimiters.

The function concatenates label:value pairs with commas. If a label or value contains a colon or comma (though unlikely in typical Kubernetes label selectors), the resulting string would be ambiguous.

If future use cases require supporting special characters, consider URL-encoding or escaping the label/value strings:

 func GetResourceSelectorLabel(selectors config.LabelSelectorList) string {
 	if len(selectors) == 0 {
 		return "all"
 	}
 
 	var builder strings.Builder
 	for i, selector := range selectors {
 		if i > 0 {
 			builder.WriteString(",")
 		}
-		builder.WriteString(selector.Label)
+		builder.WriteString(url.QueryEscape(selector.Label))
 		builder.WriteString(":")
-		builder.WriteString(selector.Value)
+		builder.WriteString(url.QueryEscape(selector.Value))
 	}
 	return builder.String()
 }

Note: This requires importing net/url. Only apply if special characters are expected in production.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c88c179 and 974c15f.

📒 Files selected for processing (1)
  • internal/metrics/metrics.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/metrics/metrics.go (1)
internal/config/config.go (1)
  • LabelSelectorList (20-20)
🔇 Additional comments (7)
internal/metrics/metrics.go (7)

1-68: LGTM! Well-structured constants and label definitions.

The package structure, imports, and metric/label definitions follow Prometheus conventions. The exported label arrays enable consistent labeling across the codebase.


141-163: LGTM! Clean struct design with comprehensive documentation.

The struct provides a clean API surface for accessing all metrics, with clear field-level documentation.


165-200: LGTM! Thread-safe initialization with sync.Once pattern.

The singleton pattern correctly prevents duplicate registration panics. Note that subsequent calls with different registry parameters will return the same instance registered to the first registry, as the sync.Once guards the entire registration block.


202-218: LGTM! Appropriate reset function for testing.

The reset function correctly clears all metric values without unregistering collectors. The comprehensive documentation with warnings ensures proper usage in test code only.


220-303: LGTM! Robust input validation and clear documentation.

The update functions correctly validate inputs, handle edge cases (negative counts), and provide comprehensive documentation. The silent failure on invalid inputs is appropriate for metrics to avoid log spam.


305-410: LGTM! Consistent implementation across all update functions.

All update functions follow the same validation and documentation pattern, ensuring consistent behavior and maintainability.


101-109: Verify that histogram buckets align with actual polling interval configurations.

The prometheus.DefBuckets (0.005–10 seconds) are designed for typical request/response latencies. If this service's polling cycles operate at longer intervals (e.g., 30s, 60s, or periodic batch operations), most observations will fall into the +Inf bucket, severely limiting histogram granularity and usefulness for performance analysis.

Required verification:

  • Identify the actual polling interval(s) configured in this service
  • If any poll cycle regularly exceeds 10 seconds, consider custom buckets using prometheus.ExponentialBuckets() or explicit bucket boundaries that cover the expected range

Example: If polls run every 30 seconds, use buckets like []float64{1, 5, 10, 15, 20, 30, 60} instead of DefBuckets.

This metric tracked configuration load operations, but since config
is only loaded once at application startup, it provides limited
observability value. This aligns with rh-trex which does not include
a similar metric.

Changes:
- Removed config_loads_total metric from metrics.go
- Removed UpdateConfigLoadsMetric function
- Removed metric recording from main.go
- Updated tests to expect 6 metrics instead of 7
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
internal/metrics/metrics_test.go (3)

174-183: Consider adding edge case tests for consistency.

This test only covers the happy path. Unlike TestUpdateEventsPublishedMetric (which tests empty parameters), this function lacks validation tests for empty resourceType, resourceSelector, or reason parameters. The same applies to TestUpdateAPIErrorsMetric and TestUpdateBrokerErrorsMetric.

 func TestUpdateResourcesSkippedMetric(t *testing.T) {
 	ResetSentinelMetrics()
 
-	UpdateResourcesSkippedMetric("clusters", "all", "within_max_age")
-
-	count := testutil.CollectAndCount(resourcesSkippedCounter)
-	if count == 0 {
-		t.Error("Expected ResourcesSkipped metric to be collected")
+	tests := []struct {
+		name             string
+		resourceType     string
+		resourceSelector string
+		reason           string
+		expectUpdate     bool
+	}{
+		{"valid update", "clusters", "all", "within_max_age", true},
+		{"empty resource type", "", "all", "within_max_age", false},
+		{"empty resource selector", "clusters", "", "within_max_age", false},
+		{"empty reason", "clusters", "all", "", false},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			ResetSentinelMetrics()
+			UpdateResourcesSkippedMetric(tt.resourceType, tt.resourceSelector, tt.reason)
+
+			if tt.expectUpdate {
+				count := testutil.CollectAndCount(resourcesSkippedCounter)
+				if count == 0 {
+					t.Error("Expected ResourcesSkipped metric to be collected")
+				}
+			}
+		})
 	}
 }

185-231: Missing test case for empty resourceSelector.

The test validates empty resourceType but not empty resourceSelector. The implementation at line 298-300 in metrics.go validates both.

 		{
 			name:             "empty resource type",
 			resourceType:     "",
 			resourceSelector: "all",
 			durationSeconds:  1.0,
 			expectUpdate:     false,
 		},
+		{
+			name:             "empty resource selector",
+			resourceType:     "clusters",
+			resourceSelector: "",
+			durationSeconds:  1.0,
+			expectUpdate:     false,
+		},
 	}

322-334: Consider verifying metric values after reset.

The current test only checks that ResetSentinelMetrics() doesn't panic. Consider verifying that CollectAndCount returns 0 after reset to confirm metrics were actually cleared.

 func TestResetSentinelMetrics(t *testing.T) {
+	registry := prometheus.NewRegistry()
+	NewSentinelMetrics(registry)
+
 	// Add some metrics
 	UpdatePendingResourcesMetric("clusters", "all", 10)
 	UpdateEventsPublishedMetric("clusters", "all", "test")
 
 	// Reset
 	ResetSentinelMetrics()
 
-	// Verify reset - collectors should have 0 metrics collected
-	// Note: We can't directly verify the values are 0, but we can verify
-	// the function doesn't panic
-	ResetSentinelMetrics() // Should not panic on second call
+	// Verify reset - collectors should have 0 metrics after reset
+	if count := testutil.CollectAndCount(pendingResourcesGauge); count != 0 {
+		t.Errorf("Expected 0 metrics after reset, got %d", count)
+	}
+
+	// Should not panic on second call
+	ResetSentinelMetrics()
 }
cmd/sentinel/main.go (4)

108-111: Handle the error return from w.Write.

The return value from w.Write is ignored. While unlikely to fail for a small response, linters and best practices suggest handling it.

 	// Health endpoint
 	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
 		w.WriteHeader(http.StatusOK)
-		w.Write([]byte("OK"))
+		_, _ = w.Write([]byte("OK"))
 	})

125-130: Use appropriate log level for errors.

Server errors should be logged at error/warning level, not info level, for proper log filtering and alerting.

 	// Start HTTP server in background
 	go func() {
 		log.Info("Starting metrics server on :8080")
 		if err := metricsServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
-			log.Infof("Metrics server error: %v", err)
+			log.Errorf("Metrics server error: %v", err)
 		}
 	}()

140-145: Use appropriate log level for shutdown errors.

Same issue as above—shutdown errors should use an error-level log.

 		// Shutdown metrics server
 		shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
 		defer shutdownCancel()
 		if err := metricsServer.Shutdown(shutdownCtx); err != nil {
-			log.Infof("Metrics server shutdown error: %v", err)
+			log.Errorf("Metrics server shutdown error: %v", err)
 		}

116-122: Consider making the metrics server port configurable.

The port :8080 is hardcoded. For flexibility in different deployment environments, consider adding this to the configuration or as a CLI flag.

internal/metrics/metrics.go (1)

150-178: Document that the registry parameter is only used on first invocation.

The sync.Once pattern means only the first call's registry parameter is used. Subsequent calls return the cached instance regardless of the passed registry. While the function comment mentions "only registered once," it would be clearer to explicitly document that the registry parameter is ignored after the first call.

 // NewSentinelMetrics creates and registers all Sentinel metrics.
 // It uses sync.Once to ensure metrics are only registered once, preventing
 // duplicate registration panics when called multiple times (e.g., in tests).
+// Note: The registry parameter is only used on the first invocation; subsequent
+// calls return the cached instance and ignore the registry parameter.
 func NewSentinelMetrics(registry prometheus.Registerer) *SentinelMetrics {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 974c15f and 931ce21.

📒 Files selected for processing (3)
  • cmd/sentinel/main.go (4 hunks)
  • internal/metrics/metrics.go (1 hunks)
  • internal/metrics/metrics_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
internal/metrics/metrics_test.go (2)
internal/metrics/metrics.go (5)
  • ResetSentinelMetrics (188-195)
  • MetricsLabels (24-27)
  • MetricsLabelsWithReason (30-34)
  • MetricsLabelsWithErrorType (37-41)
  • MetricsNames (54-61)
internal/config/config.go (2)
  • LabelSelectorList (20-20)
  • LabelSelector (14-17)
internal/metrics/metrics.go (1)
internal/config/config.go (1)
  • LabelSelectorList (20-20)
cmd/sentinel/main.go (2)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (153-178)
internal/sentinel/sentinel.go (1)
  • NewSentinel (30-48)
🔇 Additional comments (11)
internal/metrics/metrics_test.go (5)

11-40: LGTM!

Good initialization test covering all metric fields with nil checks and using a fresh registry.


42-62: LGTM!

Important test verifying the sync.Once pattern prevents duplicate registration panics and returns the same singleton instance.


64-117: LGTM!

Comprehensive table-driven test covering valid updates, negative value clamping, and empty parameter validation.


255-300: LGTM!

Good table-driven tests covering empty, single, and multiple selector scenarios with clear expected outputs.


336-373: LGTM!

Good validation of exported constants including duplicate detection for metric names.

cmd/sentinel/main.go (1)

78-102: LGTM!

Good integration of the metrics subsystem: creates a dedicated registry, initializes metrics via NewSentinelMetrics, and passes the metrics instance to NewSentinel. The separation from the default Prometheus registry is a good practice for testability.

internal/metrics/metrics.go (5)

63-122: LGTM!

Well-structured metric definitions with appropriate types (Gauge for pending resources, Counters for events/errors, Histogram for duration), consistent labeling, and clear help text.


211-225: LGTM!

Good defensive programming with input validation and negative value clamping. The silent ignore for invalid inputs is acceptable for metrics to avoid noisy logs during edge cases.


366-384: LGTM!

Efficient implementation using strings.Builder for concatenation with proper handling of the empty selector case returning "all".


180-195: LGTM!

Good test utility with clear documentation warning against production use. Thread-safe reset of all metric collectors.


14-21: Based on the context provided, the original review comment already contains a scratchpad that shows prior analysis indicating metricsStatusLabel is unused ("dead code"). However, I'm unable to independently verify this claim due to persistent repository access issues.

Given the constraints, I'll provide the rewritten review comment based on the information available:


Remove unused constant metricsStatusLabel.

The constant metricsStatusLabel defined at line 20 is not referenced anywhere in the codebase.

 // Names of the labels added to metrics
 const (
 	metricsResourceTypeLabel     = "resource_type"
 	metricsResourceSelectorLabel = "resource_selector"
 	metricsReasonLabel           = "reason"
 	metricsErrorTypeLabel        = "error_type"
-	metricsStatusLabel           = "status"
 )

Enhanced defensive validation in metrics update functions to log warnings
when empty parameters are detected, making bugs visible during development
while still preventing cardinality issues in production.

Changes:
- Added glog import to metrics package
- All Update* functions now log warnings for invalid parameters
- Updated docstrings to reflect new behavior
- Maintains backward compatibility (still ignores invalid params)

This follows defense-in-depth principles: prevent broken metrics while
making potential bugs observable through logging.
Removed unused SentinelMetrics field from Sentinel struct and updated
NewSentinel signature. Metrics are now purely global variables accessed
via helper functions, following rh-trex pattern exactly.

Architecture:
- Metric variables are package-level globals (pendingResourcesGauge, etc.)
- NewSentinelMetrics registers metrics once via sync.Once
- Helper functions (Update*Metric) access globals directly
- No need to pass metrics instance through struct

Changes:
- Removed metrics field from Sentinel struct
- Removed metrics parameter from NewSentinel()
- Updated all tests to not pass metrics to NewSentinel()
- main.go now only calls NewSentinelMetrics() for registration

This resolves the architectural inconsistency where we stored but never
used the metrics field. Now the pattern is clean and consistent.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
internal/metrics/metrics.go (2)

21-22: Unused constant metricsStatusLabel.

The constant metricsStatusLabel is defined but never used anywhere in the file. Consider removing it to avoid dead code.

 	metricsReasonLabel           = "reason"
 	metricsErrorTypeLabel        = "error_type"
-	metricsStatusLabel           = "status"
 )

65-123: Package-level metric collectors prevent true test isolation.

The metric collectors (pendingResourcesGauge, eventsPublishedCounter, etc.) are defined at package level. Combined with sync.Once, this means:

  1. Once registered to any registry, they cannot be re-registered to a different one
  2. ResetSentinelMetrics() clears values but metrics remain bound to the original registry

For unit tests running in the same process (e.g., with -count=N or parallel tests), subsequent test files calling NewSentinelMetrics(newRegistry) will get metrics registered to the first test's registry, not the new one.

This is a known limitation of Prometheus client Go patterns. If full isolation is needed, consider using promauto with a custom registry per test or accepting this limitation with proper test ordering.

cmd/sentinel/main.go (3)

109-112: Ignored error from w.Write.

The return value from w.Write is discarded. While unlikely to fail for a simple "OK" response, ignoring it can mask issues in edge cases (e.g., client disconnect).

 	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
 		w.WriteHeader(http.StatusOK)
-		w.Write([]byte("OK"))
+		_, _ = w.Write([]byte("OK"))
 	})

Or handle the error if logging is desired.


117-123: Consider making the metrics server port configurable.

The metrics server address :8080 is hardcoded. For environments where port 8080 conflicts with other services, consider adding a configuration option (e.g., metrics_port or metrics_address in SentinelConfig).


126-131: Metrics server startup error should use Errorf instead of Infof.

When ListenAndServe fails (e.g., port already in use), logging at Info level may not get sufficient attention. Use Errorf or Warningf for error conditions.

 	go func() {
 		log.Info("Starting metrics server on :8080")
 		if err := metricsServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
-			log.Infof("Metrics server error: %v", err)
+			log.Errorf("Metrics server error: %v", err)
 		}
 	}()
test/integration/integration_test.go (1)

25-43: MockPublisher duplicates the one in internal/sentinel/sentinel_test.go.

This MockPublisher struct and its methods are identical to the one defined in internal/sentinel/sentinel_test.go (lines 55-59). Consider extracting a shared test mock to a common test utilities package (e.g., internal/testutil/mocks.go) to avoid duplication.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 931ce21 and df41bb7.

📒 Files selected for processing (5)
  • cmd/sentinel/main.go (3 hunks)
  • internal/metrics/metrics.go (1 hunks)
  • internal/sentinel/sentinel.go (5 hunks)
  • internal/sentinel/sentinel_test.go (6 hunks)
  • test/integration/integration_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/sentinel/sentinel.go
  • internal/sentinel/sentinel_test.go
🧰 Additional context used
🧬 Code graph analysis (3)
cmd/sentinel/main.go (1)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (154-179)
test/integration/integration_test.go (7)
internal/sentinel/sentinel_test.go (1)
  • MockPublisher (56-60)
internal/client/client.go (2)
  • NewHyperFleetClient (49-64)
  • ResourceType (33-33)
internal/engine/decision.go (1)
  • NewDecisionEngine (31-36)
pkg/logger/logger.go (1)
  • NewHyperFleetLogger (35-43)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (154-179)
internal/config/config.go (2)
  • SentinelConfig (23-31)
  • LabelSelector (14-17)
internal/sentinel/sentinel.go (1)
  • NewSentinel (29-45)
internal/metrics/metrics.go (1)
internal/config/config.go (1)
  • LabelSelectorList (20-20)
🔇 Additional comments (8)
internal/metrics/metrics.go (4)

380-397: LGTM! Efficient string building for label serialization.

The GetResourceSelectorLabel function correctly uses strings.Builder for efficient concatenation and handles the empty case with a sensible "all" default.


213-228: LGTM! Defensive validation with logging.

Input validation with glog.Warning for empty parameters follows the agreed approach from past review discussions. Clamping negative counts to 0 is appropriate for a gauge metric.


304-320: LGTM! Proper validation for histogram observations.

Correctly validates against empty labels and negative durations before observing. Negative duration check is especially important since histograms can accept any float64 but negative durations are semantically invalid.


154-179: Unfortunately, I cannot directly access the repository to verify the actual usage patterns of NewSentinelMetrics. However, I can analyze the validity of the review comment based on the code snippet and Go language behavior.

Based on my investigation:

The review comment is technically accurate about sync.Once behavior:

  • A global sync.Once ensures code runs only once for that specific instance
  • If NewSentinelMetrics is called first with Registry A, then later with Registry B, the metrics remain registered only to Registry A
  • This is the standard behavior of sync.Once in Go

However, the review comment itself acknowledges:

  • This pattern is "acceptable for current usage" (production uses one registry, tests use different ones)
  • It's a soft suggestion: "consider documenting this behavior or adding a warning log"

Limitations:
Since I cannot access the repository, I cannot verify:

  • Actual call patterns across the codebase
  • Whether tests actually do call with different registries
  • Whether this issue has already been documented or addressed

sync.Once registers to the first registry only — subsequent calls with different registries are silently ignored.

The registerOnce.Do pattern ensures metrics are registered only once. If NewSentinelMetrics is called with different registries (e.g., test registry first, then default registry, or vice versa), the metrics remain bound to whichever registry was used in the first call. This is standard sync.Once behavior in Go.

This pattern is acceptable if the codebase ensures only one registry is used per execution context (production uses one, tests use one). However, without visibility into actual call sites, consider: (1) documenting this constraint in code comments, (2) validating at runtime that registry consistency is maintained, or (3) adding a debug log when multiple registries are detected.

cmd/sentinel/main.go (2)

78-81: LGTM! Metrics initialization before component setup.

Correctly initializes the Prometheus registry and metrics before setting up other components, ensuring metrics are available throughout the application lifecycle.


140-146: LGTM! Graceful shutdown with timeout.

The metrics server shutdown uses a separate context with a 5-second timeout, ensuring it doesn't block indefinitely during shutdown.

test/integration/integration_test.go (2)

90-184: Test structure is solid but relies on timing.

The test uses time.Sleep(300 * time.Millisecond) to wait for polling cycles. While acceptable for integration tests, this can be flaky on slow CI systems. Consider increasing the timeout or using a more deterministic synchronization mechanism if flakiness is observed.


186-301: LGTM! Label selector filtering test is well-structured.

The test correctly validates that only cluster-shard-1 events are published when a shard:1 selector is configured. The server-side filtering mock accurately simulates the expected API behavior, and all published events are verified to match the expected cluster ID.

Removed the unused `metrics` field from Sentinel struct and simplified
the metrics initialization. Now follows rh-trex pattern exactly:
- Metrics are global package-level variables
- Helper functions access globals directly
- NewSentinelMetrics() only registers via sync.Once
- No metrics instance passed through constructors

Changes:
- Remove metrics field from Sentinel struct
- Remove metrics parameter from NewSentinel()
- Update all unit tests to not pass metrics parameter
- Remove metrics verification from integration tests

This addresses rh-amarin's feedback about the architectural inconsistency
where s.metrics was stored but never used.
@rafabene rafabene marked this pull request as draft November 26, 2025 19:00
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
test/integration/integration_test.go (4)

24-88: Helper scaffolding looks solid; consider de‑duplicating MockPublisher

MockPublisher and the cluster helper builders are straightforward and do what the tests need; no functional issues here. The only minor concern is that MockPublisher now exists both here and in internal/sentinel/sentinel_test.go with the same shape, which can drift over time.

If you expect these mocks to evolve together (e.g., capturing more metadata), consider moving them into a shared test helper package (e.g., test/testutil) and reusing them from both places to avoid duplication.


89-172: Strengthen end‑to‑end test to assert the reconciled cluster ID

TestIntegration_EndToEnd currently only checks that at least one event was published and validates its type and source. If the decision logic accidentally reconciles only cluster-2 (and not cluster-1), this test would still pass.

You can make this a stronger regression guard by also asserting the event’s resourceId (similar to the label‑selector test):

  // Verify first event properties
  event := mockPublisher.publishedEvents[0]
  if event.Type() != "com.redhat.hyperfleet.Cluster.reconcile" {
    t.Errorf("Expected event type 'com.redhat.hyperfleet.Cluster.reconcile', got '%s'", event.Type())
  }
  if event.Source() != "hyperfleet-sentinel" {
    t.Errorf("Expected source 'hyperfleet-sentinel', got '%s'", event.Source())
  }
+
+ // Optionally, also verify the reconciled cluster ID
+ var eventData map[string]interface{}
+ if err := event.DataAs(&eventData); err != nil {
+   t.Fatalf("Failed to parse event data: %v", err)
+ }
+ if id, ok := eventData["resourceId"].(string); !ok || id != "cluster-1" {
+   t.Errorf("Expected event for resourceId 'cluster-1', got %#v", eventData["resourceId"])
+ }

This will catch regressions where the wrong cluster is selected for reconciliation.


181-224: Search‑parameter stub is quite strict; may break if the client adds more filters

In TestIntegration_LabelSelectorFiltering, the mock server only treats requests with search == "shard=1" as matching:

searchParam := r.URL.Query().Get("search")
// ...
if searchParam == "shard=1" && labels["shard"] == "1" {
    filteredClusters = append(filteredClusters, cluster)
}

If the client later sends richer expressions (e.g., phase=Ready,shard=1 or different ordering), label‑selector behavior could still be correct, but this test would start failing because the stub no longer recognizes the query.

Consider making the match logic a bit more tolerant, for example:

  • Checking whether searchParam contains shard=1 as a token, or
  • Parsing searchParam into key/value pairs and matching on shard=1 regardless of extra filters or order.

That would keep this test stable while still validating that label selectors are wired through into the query string.


126-129: Integration tests initialize metrics but don’t assert metric behavior

Both tests create a custom Prometheus registry and call metrics.NewSentinelMetrics(registry), but they never inspect any metric values. Given the PR’s focus on observability, you might want some integration‑level assertions that the expected metrics are actually emitted (e.g., events published, poll durations, pending resources).

Also, from internal/metrics/metrics.go we know NewSentinelMetrics uses a package‑level sync.Once and a global metricsInstance, so only the first call’s registry is honored across the process. If you do add metric assertions later, you may want to:

  • Use a single shared registry for all integration tests, or
  • Provide a way to reset/re‑inject metrics for tests (e.g., a test‑only helper).

For now, consider adding at least a minimal assertion using the test registry (e.g., via prometheus/testutil.CollectAndCount) to validate that instrumentation is wired correctly.

Also applies to: 238-241

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df41bb7 and f9f703f.

📒 Files selected for processing (1)
  • test/integration/integration_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test/integration/integration_test.go (7)
internal/sentinel/sentinel_test.go (1)
  • MockPublisher (56-60)
internal/client/client.go (2)
  • NewHyperFleetClient (49-64)
  • ResourceType (33-33)
internal/engine/decision.go (1)
  • NewDecisionEngine (31-36)
pkg/logger/logger.go (1)
  • NewHyperFleetLogger (35-43)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (154-179)
internal/config/config.go (2)
  • SentinelConfig (23-31)
  • LabelSelector (14-17)
internal/sentinel/sentinel.go (1)
  • NewSentinel (29-45)
🔇 Additional comments (1)
test/integration/integration_test.go (1)

175-268: I'll verify the concerns raised in this review comment. Let me start by examining the test file to understand the context and check the suggested pattern.
<function_calls>

fd integration_test.go

</function_calls>

Now let me examine the full test file to see both TestIntegration_LabelSelectorFiltering and TestIntegration_EndToEnd:
<function_calls>

wc -l test/integration/integration_test.go

</function_calls>

Now let me search for the TestIntegration_EndToEnd test to verify if it uses the suggested pattern:
<function_calls>

rg -n "TestIntegration_EndToEnd" test/integration/integration_test.go

</function_calls>

Let me read the relevant sections of the integration test file:
<function_calls>

sed -n '1,100p' test/integration/integration_test.go

</function_calls>

Let me search for the TestIntegration_EndToEnd test:
<function_calls>

grep -n "func TestIntegration" test/integration/integration_test.go

</function_calls>

Let me look at the full file structure to understand what tests exist:
<function_calls>

cat test/integration/integration_test.go | head -300

</function_calls>

Let me try a different approach to locate the EndToEnd test:
<function_calls>

rg -A 50 "func TestIntegration_EndToEnd" test/integration/integration_test.go

</function_calls>

Let me check if there are multiple integration test files:
<function_calls>

find test -name "*test.go" -type f

</function_calls>

Let me view the entire integration_test.go file in sections:
<function_calls>

tail -n +1 test/integration/integration_test.go | head -n 400

</function_calls>

Let me search for the pattern with select and time.After that the review mentions:
<function_calls>

rg -n "select {" test/integration/integration_test.go -A 5

</function_calls>

Let me check if EndToEnd test exists at all:
<function_calls>

rg "EndToEnd" test/

</function_calls>

Let me view the entire integration_test.go file to see its structure:
<function_calls>

cat test/integration/integration_test.go

</function_calls>

Let me search for all test functions in the repo more broadly:
<function_calls>

find . -name "*test.go" -exec grep -l "TestIntegration" {} ;

</function_calls>

Now let me check the sentinel package's Start method to see how it handles context:
<function_calls>

fd -e go -p "sentinel" | grep -v test

</function_calls>

Let me search for the sentinel package more broadly:
<function_calls>

find . -type f -name "*.go" | xargs grep -l "func NewSentinel" | head -5

</function_calls>

Let me search for where Sentinel is defined:
<function_calls>

rg -n "type Sentinel struct" --type go

</function_calls>

Let me search for the Start method:
<function_calls>

rg -n "func.*Start.*ctx" --type go

</function_calls>

Let me look at the directory structure to understand the repository layout:
<function_calls>

ls -la

</function_calls>

Now let me check the main source code:
<function_calls>

find . -name "*.go" -type f | grep -v test | head -20

</function_calls>

Let me check for Go modules and get a better picture:
<function_calls>

cat go.mod 2>/dev/null | head -20

</function_calls>

Let me find sentinel-related source files:
<function_calls>

rg "package sentinel" --type go

</function_calls>

Now let me examine the sentinel package files:
<function_calls>

ls -la sentinel/

</function_calls>

Let me look at the Start method in the sentinel package:
<function_calls>

rg -A 30 "func.*(s *Sentinel).*Start" sentinel/

</function_calls>

Let me read the sentinel.go file:
<function_calls>

cat sentinel/sentinel.go

</function_calls>

Good! Now let me check if there are other test files that might have the pattern mentioned:
<function_calls>

rg -B 5 -A 5 "time.After.*case" test/ --type go

</function_calls>

Let me look at the exact lines mentioned in the review (175-268) to confirm the code snippet:
<function_calls>

sed -n '175,268p' test/integration/integration_test.go

</function_calls>

Perfect! Now let me check if TestIntegration_EndToEnd actually exists and uses the select pattern:
<function_calls>

grep -n "func TestIntegration" test/integration/integration_test.go

</function_calls>

Let me look at the entire test file to see what tests exist:
<function_calls>

wc -l test/integration/integration_test.go && grep "^func Test" test/integration/integration_test.go

</function_calls>

Let me view the full integration test file:
<function_calls>

cat test/integration/integration_test.go | tail -200

</function_calls>

Let me search the entire repository for tests using the select + time.After pattern:
<function_calls>

rg -l "select" test/ --type go | xargs grep -l "time.After" | head -10

</function_calls>

Let me search more specifically for the pattern in question:
<function_calls>

rg -B 2 -A 2 "case.*time.After" test/ --type go

</function_calls>

Let me check if there are any other integration test patterns elsewhere:
<function_calls>

find test -name "*.go" -exec grep -l "errChan" {} ;

</function_calls>

Let me examine what tests actually exist in the test directory:
<function_calls>

find test -name "*.go" -exec grep "^func Test" {} + 2>/dev/null

</function_calls>

Now let me check if there are any references to best practices for testing with context and error channels in Go testing literature:
<function_calls>

rg -n "time.Sleep.*(300" test/integration/integration_test.go

</function_calls>

Let me see the exact pattern around the error channel in the test:
<function_calls>

sed -n '240,268p' test/integration/integration_test.go

</function_calls>

Perfect! Now let me verify the Start method one more time to ensure it properly respects context:
<function_calls>

sed -n '1,100p' sentinel/sentinel.go

</function_calls>

Let me look at the full Start implementation:
<function_calls>

grep -n "func.*Start" sentinel/sentinel.go

</function_calls>

Let me read the Start method:
<function_calls>

sed -n '27,100p' sentinel/sentinel.go

</function_calls>

Let me search for where the Start method truly ends:
<function_calls>

ast-grep --pattern 'func ($_ *Sentinel) Start($_ context.Context) $_ {
$$$
}'

</function_calls>

Let me just view the entire sentinel.go file:
<function_calls>

cat sentinel/sentinel.go

</function_calls>

Let me do one final check to see if maybe the reviewer was looking at a different branch or version:
<function_calls>

git log --oneline test/integration/integration_test.go 2>/dev/null | head -5

</function_calls>

Let me check the git history to see if TestIntegration_EndToEnd was ever in this file:
<function_calls>

git log -S "TestIntegration_EndToEnd" --oneline -- test/integration/integration_test.go 2>/dev/null

</function_calls>

Let me verify if there are any uncommitted changes or branches:
<function_calls>

git status 2>/dev/null | head -20

</function_calls>

Let me do a comprehensive search for any mention of EndToEnd in test files:
<function_calls>

rg -r "EndToEnd" test/

</function_calls>

Let me check the current branch and recent changes:
<function_calls>

git branch -v 2>/dev/null && git log --oneline -3 2>/dev/null

</function_calls>

Based on my thorough verification of the codebase, I can now provide my assessment:

The review comment references a non-existent test (TestIntegration_EndToEnd) as justification for the suggested pattern.

Key findings:

  1. TestIntegration_EndToEnd does not exist - The only integration test in test/integration/integration_test.go is TestIntegration_LabelSelectorFiltering
  2. No other tests use the suggested pattern - No other test files in the repository use a select statement with time.After for timeout protection on error channels
  3. Sentinel.Start() properly handles context - The implementation includes proper context cancellation handling via select on ctx.Done(), so the test would only hang in case of a serious regression
  4. The suggestion is technically valid but weakly motivated - While adding a timeout guard is reasonable defensive programming, the justification is based on a false premise

Likely an incorrect or invalid review comment.

Following rh-trex pattern, add testcontainers infrastructure for more realistic integration testing with real message brokers.

Changes:
- Add testcontainers-go and rabbitmq module dependencies
- Create RabbitMQTestContainer helper for managing test brokers
- Add new end-to-end integration test with real RabbitMQ
- Keep existing mock-based tests for fast unit testing (hybrid approach)
- Configure TESTCONTAINERS_RYUK_DISABLED in Makefile for Podman compatibility
- Add comprehensive documentation in docs/testcontainers.md
- Update README with testing instructions

Benefits:
- Tests against real broker behavior (catches integration bugs)
- Validates CloudEvents serialization and broker compatibility
- Follows rh-trex testcontainers pattern
- Maintains fast mock-based tests for CI/PR validation

Testing:
All tests passing with `make test-all`
Integration test spins up RabbitMQ in ~7 seconds
Since TESTCONTAINERS_RYUK_DISABLED is now configured in the Makefile,
users don't need to manually set the environment variable.

Updated instructions to be cleaner and more user-friendly.
Integration tests should test with real components, not mocks.
Mocks remain in unit tests for fast isolated testing.

Changes:
- Convert TestIntegration_EndToEnd to use RabbitMQ testcontainer
- Convert TestIntegration_LabelSelectorFiltering to use testcontainer
- Remove MockPublisher from integration tests
- Remove duplicate TestIntegration_EndToEnd_RealBroker test

Benefits:
- True integration testing with real message broker
- Clearer separation: unit tests = mocks, integration = real components
- Detects real-world integration issues
- Follows testing best practices

All tests passing with `make test-all`
Integration tests should have centralized setup/teardown via TestMain,
following the same pattern as rh-trex.

Changes:
- Add TestMain function with glog setup
- Import flag, os, runtime, glog
- Log Go version and exit code

This addresses the critical issue from PR review where integration tests
lacked centralized test infrastructure.
Problem:
Integration tests were logging errors when stopping containers:
'Error terminating testcontainer: context canceled'

Root Cause:
The Close() method was using the test context which was already
canceled when defer executed. This caused container.Terminate(ctx)
to fail with 'context canceled' error.

Solution:
Use context.Background() with 30s timeout for cleanup instead of
the test context. Cleanup operations need their own context that
won't be canceled when the test finishes.

Changes:
- Create cleanupCtx from context.Background() with 30s timeout
- Use cleanupCtx for container.Terminate() instead of test ctx
- Add comment explaining why background context is needed

Result:
Containers now stop cleanly without errors:
'✅ Container stopped' instead of 'Error terminating'

All tests still pass.
Implement singleton pattern for integration tests using shared RabbitMQ
testcontainer across all tests, following the established rh-trex pattern.

Changes:
- Add IntegrationHelper singleton with sync.Once in testcontainer.go
- Update TestMain to initialize and teardown shared helper
- Update integration tests to use helper.RabbitMQ instead of per-test containers
- Remove legacy `// +build integration` syntax (use modern `//go:build`)
- Fix lint: remove redundant nil check for map len()

Benefits:
- Faster test execution (1 container vs N containers)
- Follows rh-trex established pattern
- Cleaner test code
- Modern Go 1.17+ build tag syntax
@rafabene rafabene marked this pull request as ready for review November 27, 2025 13:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
docs/testcontainers.md (1)

27-29: Add language specifier to fenced code block.

The static analysis tool flagged that fenced code blocks should have a language specified. For error message output, use text or plaintext.

-```
+```text
 Failed to start RabbitMQ testcontainer: create container: container create: Error response from daemon: container create: unable to find network with name or ID bridge: network not found: creating reaper failed
-```
+```
test/integration/integration_test.go (1)

110-113: Consider checking json.Encode errors for test reliability.

While ignoring the error is common in test HTTP handlers, checking it ensures the test fails with a clear message if encoding fails unexpectedly.

 			response := createMockClusterList(clusters)
 			w.Header().Set("Content-Type", "application/json")
-			json.NewEncoder(w).Encode(response)
+			if err := json.NewEncoder(w).Encode(response); err != nil {
+				http.Error(w, err.Error(), http.StatusInternalServerError)
+			}
 			return
 		}

 		// Subsequent calls: Empty list
 		response := createMockClusterList([]map[string]interface{}{})
 		w.Header().Set("Content-Type", "application/json")
-		json.NewEncoder(w).Encode(response)
+		if err := json.NewEncoder(w).Encode(response); err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+		}

Also applies to: 117-119

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9f703f and c6b775c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • Makefile (1 hunks)
  • README.md (1 hunks)
  • docs/testcontainers.md (1 hunks)
  • go.mod (4 hunks)
  • test/integration/integration_test.go (1 hunks)
  • test/integration/testcontainer.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test/integration/integration_test.go (7)
test/integration/testcontainer.go (1)
  • NewHelper (122-141)
internal/client/client.go (2)
  • NewHyperFleetClient (49-64)
  • ResourceType (33-33)
internal/engine/decision.go (1)
  • NewDecisionEngine (31-36)
pkg/logger/logger.go (1)
  • NewHyperFleetLogger (35-43)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (154-179)
internal/config/config.go (2)
  • SentinelConfig (23-31)
  • LabelSelector (14-17)
internal/sentinel/sentinel.go (1)
  • NewSentinel (29-45)
🪛 markdownlint-cli2 (0.18.1)
docs/testcontainers.md

27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (10)
Makefile (1)

89-89: LGTM!

The TESTCONTAINERS_RYUK_DISABLED=true environment variable setting aligns with the testcontainers documentation and ensures Podman compatibility. The inline environment variable approach is idiomatic for Makefiles.

Also applies to: 96-96

README.md (1)

53-77: LGTM!

The testing documentation clearly explains the hybrid testing approach and provides straightforward commands for running different test suites. The reference to docs/testcontainers.md for troubleshooting is helpful.

docs/testcontainers.md (1)

1-152: Well-structured documentation.

The documentation provides comprehensive guidance for testcontainers usage, including common Podman issues, solutions, and troubleshooting steps. The CI/CD considerations section is particularly helpful.

go.mod (1)

80-83: LGTM!

The added indirect dependencies for Prometheus metrics (prometheus/client_golang v1.23.2) and testcontainers (testcontainers-go v0.40.0) align with the PR objectives. These are pulled in as transitive dependencies from direct dependencies.

Also applies to: 95-96

test/integration/integration_test.go (3)

26-42: LGTM!

The TestMain properly handles:

  • Flag parsing for glog
  • Shared helper initialization (creates RabbitMQ container once)
  • Centralized cleanup via helper.Teardown()
  • Appropriate exit code propagation

44-87: LGTM!

The helper functions provide clean abstractions for creating mock API responses. The delegation pattern in createMockCluster avoids code duplication, and the mock data structure appears to match the expected API format.


166-267: Label selector test properly validates filtering behavior.

This test addresses the previous review feedback by:

  • Creating clusters with different shard labels
  • Implementing server-side filtering based on search parameter
  • Capturing and validating Start errors via channel pattern

The test now provides meaningful validation of label selector functionality.

test/integration/testcontainer.go (3)

26-71: LGTM - Robust container initialization.

The implementation correctly:

  • Uses appropriate wait strategy with 60s timeout
  • Terminates container on subsequent failures (lines 46, 61)
  • Wraps errors with context for debugging

78-109: LGTM - Proper cleanup implementation.

Using a background context with timeout for cleanup is the correct pattern since the test context may already be canceled. The error collection approach ensures all cleanup steps are attempted even if one fails.


121-151: LGTM - Appropriate singleton pattern for shared test infrastructure.

Using sync.Once ensures the RabbitMQ container is started exactly once across all integration tests, which is efficient. The glog.Fatalf on initialization failure is acceptable here since tests cannot proceed without the shared infrastructure.

Specify 'text' language for error message code block for better
syntax highlighting and rendering in markdown viewers.
Addresses rh-amarin's comment: verify that Sentinel actually
ran at least 2 polling cycles instead of just assuming the
sleep duration was sufficient.

This ensures the integration test is actually testing the
polling behavior, not just waiting for time to pass.
@openshift-ci
Copy link

openshift-ci bot commented Nov 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rafabene, rh-amarin

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rh-amarin rh-amarin merged commit 713d1e5 into openshift-hyperfleet:main Nov 27, 2025
0 of 2 checks passed
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
test/integration/integration_test.go (1)

171-271: Label‑selector integration test still doesn’t assert selector behavior and can hang on errChan

Despite the improved mock server, this test currently only checks that Start returns nil or context.Canceled; it does not assert that label selectors actually influence behavior (e.g., via a search query param or by distinguishing matching vs non‑matching clusters). This is effectively the same gap as a prior review comment on this test. In addition, startErr := <-errChan has no timeout, so the test can hang indefinitely if Start never returns.

Consider:

  1. Asserting that the selector is exercised — for example, by tracking whether at least one request used search=shard=1:
 func TestIntegration_LabelSelectorFiltering(t *testing.T) {
@@
-	// Create mock server that returns 2 clusters: one with shard:1, one with shard:2
-	// Server implements server-side filtering based on search parameter
+	// Create mock server that returns 2 clusters: one with shard:1, one with shard:2.
+	// Track whether sentinel actually uses the expected search parameter.
+	var selectorUsed uint32
@@
-		// Server-side filtering: check for search parameter
-		searchParam := r.URL.Query().Get("search")
+		// Server-side filtering: check for search parameter
+		searchParam := r.URL.Query().Get("search")
+		if searchParam == "shard=1" {
+			atomic.StoreUint32(&selectorUsed, 1)
+		}
@@
-	// Check Start error
-	startErr := <-errChan
-	if startErr != nil && startErr != context.Canceled {
-		t.Errorf("Expected Start to return nil or context.Canceled, got: %v", startErr)
-	}
+	// Check Start error with timeout to avoid hangs
+	select {
+	case startErr := <-errChan:
+		if startErr != nil && startErr != context.Canceled {
+			t.Errorf("Expected Start to return nil or context.Canceled, got: %v", startErr)
+		}
+	case <-time.After(2 * time.Second):
+		t.Fatal("Sentinel did not stop within timeout")
+	}
+
+	// Ensure label selector was actually used at least once
+	if atomic.LoadUint32(&selectorUsed) == 0 {
+		t.Errorf("Expected at least one request with search=shard=1 to exercise label selector filtering")
+	}

This turns the test into a real guard against regressions in label‑selector wiring and avoids a potential deadlock on errChan.

🧹 Nitpick comments (1)
test/integration/integration_test.go (1)

128-131: Clarify or simplify metrics registry usage in integration tests

Both tests construct a fresh Prometheus registry and pass it to metrics.NewSentinelMetrics, but the registry is never used to gather or expose metrics in these tests. If you only need to ensure metrics can be initialized, consider either:

  • Calling metrics.NewSentinelMetrics(nil) and dropping the custom registry, or
  • Actually using registry (e.g., via testutil.CollectAndCount or a custom /metrics handler) to assert on metrics.

For example, if you don’t need a custom registry here:

-	// Create metrics with a test registry
-	registry := prometheus.NewRegistry()
-	metrics.NewSentinelMetrics(registry)
+	// Initialize metrics (uses default registerer)
+	metrics.NewSentinelMetrics(nil)

(and similarly in the label‑selector test). This makes the intent clearer and avoids a registry that’s effectively unused in this file.

Also applies to: 237-240

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c31a7e5 and 77cb7bf.

📒 Files selected for processing (1)
  • test/integration/integration_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test/integration/integration_test.go (7)
test/integration/testcontainer.go (1)
  • NewHelper (122-141)
internal/client/client.go (2)
  • NewHyperFleetClient (49-64)
  • ResourceType (33-33)
internal/engine/decision.go (1)
  • NewDecisionEngine (31-36)
pkg/logger/logger.go (1)
  • NewHyperFleetLogger (35-43)
internal/metrics/metrics.go (1)
  • NewSentinelMetrics (154-179)
internal/config/config.go (2)
  • SentinelConfig (23-31)
  • LabelSelector (14-17)
internal/sentinel/sentinel.go (1)
  • NewSentinel (29-45)
🔇 Additional comments (2)
test/integration/integration_test.go (2)

26-42: TestMain and mock cluster helpers look good

Centralized TestMain + shared RabbitMQ helper and the mock cluster builders are clear, deterministic, and side‑effect scoped to tests; no changes needed here.

Also applies to: 44-87


5-24: Based on my verification through Go best practices research, I can now provide the rewritten review comment.

Fix data race on callCount and make polling check less flaky

The variable callCount is incremented in the HTTP handler goroutine and read in the test goroutine without synchronization—a data race that will be reported by go test -race. Additionally, relying on a fixed time.Sleep(300 * time.Millisecond) for PollInterval: 100 * time.Millisecond can be flaky if startup or scheduling is slow.

Use sync/atomic for the counter and replace the fixed sleep with deadline-based polling:

 import (
 	"context"
 	"encoding/json"
 	"flag"
 	"net/http"
 	"net/http/httptest"
 	"os"
 	"runtime"
 	"testing"
 	"time"
+	"sync/atomic"
 )

@@
 func TestIntegration_EndToEnd(t *testing.T) {
@@
-	// Create mock HyperFleet API server
-	callCount := 0
+	// Create mock HyperFleet API server
+	var callCount int32
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		callCount++
+		atomic.AddInt32(&callCount, 1)
@@
-	// Wait for at least 2 polling cycles
-	time.Sleep(300 * time.Millisecond)
-	cancel()
-
-	// Verify that Sentinel actually polled the API at least twice
-	if callCount < 2 {
-		t.Errorf("Expected at least 2 polling cycles, got %d", callCount)
-	}
+	// Wait until we see at least 2 polling cycles or time out
+	deadline := time.Now().Add(2 * time.Second)
+	for {
+		if atomic.LoadInt32(&callCount) >= 2 {
+			break
+		}
+		if time.Now().After(deadline) {
+			t.Fatalf("Timed out waiting for at least 2 polling cycles, got %d", atomic.LoadInt32(&callCount))
+		}
+		time.Sleep(10 * time.Millisecond)
+	}
+	cancel()

This removes the race condition and makes the assertion robust against timing variance.

xueli181114 pushed a commit to xueli181114/hyperfleet-sentinel that referenced this pull request Dec 4, 2025
Addresses PR review comment about using Helm charts for deployments.

Changes:
- Created complete Helm chart in deployments/helm/sentinel/
- Chart.yaml with metadata
- values.yaml with full configuration options
- Templates for all Kubernetes resources
- Comprehensive README with examples
- Updated main README to recommend Helm deployment

Features:
- Support for RabbitMQ, GCP Pub/Sub, AWS SQS
- Conditional secret creation (existingSecret support)
- Dynamic config generation from values
- Config/secret checksums for rolling updates
- ServiceAccount annotations for Workload Identity/IRSA
- Horizontal sharding examples
- Security best practices documentation

Resolves: PR comment openshift-hyperfleet#14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants