
DNM: Test PR for ClusterBot, controller: cap BareMetalHost reconcile backoff at 30 seconds #478

Open
MahnoorAsghar wants to merge 1 commit into openshift:main from MahnoorAsghar:rate-limit-openshift

@MahnoorAsghar MahnoorAsghar commented Apr 16, 2026

The default controller-runtime exponential rate limiter has a maximum delay of 1000 seconds (~16 minutes). During BMO startup there is a brief race between the reconcile loop becoming active and BMO's own validating-webhook Service endpoint being propagated: any BareMetalHost reconciliation that triggers a webhook call in that window gets a "no endpoints available" error. A burst of such errors is enough to drive the per-item exponential backoff to its ceiling, after which BMO silently waits up to 16 minutes before attempting the next reconcile even though the webhook has long since become reachable.

Replace the default rate limiter with one that is otherwise identical but caps the per-item exponential delay at 30 seconds. This bounds the recovery window to at most one retry interval after the endpoint is propagated, matching the behaviour of other Metal3 controllers.

Generated-by: Cursor, claude-4.6-sonnet-medium model

Summary by CodeRabbit

  • Chores

    • Updated project dependencies to include rate limiting packages.
  • New Features

    • Added a custom rate limiter to the bare metal host controller that caps per-item reconcile backoff at 30 seconds.

@MahnoorAsghar
Author

/test e2e-metal-ipi-bm

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Warning

Rate limit exceeded

@MahnoorAsghar has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 35 minutes and 37 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 35 minutes and 37 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0c31f338-12ae-46c2-99f0-7d93d939bf37

📥 Commits

Reviewing files that changed from the base of the PR and between 141503a and 1ea1fb3.

📒 Files selected for processing (2)
  • go.mod
  • internal/controller/metal3.io/baremetalhost_controller.go

Walkthrough

Added golang.org/x/time v0.9.0 and golang.org/x/tools v0.41.0 as dependencies in go.mod. Implemented a custom rate limiter in BareMetalHostReconciler combining exponential backoff and token-bucket algorithms with configurable parameters.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Dependency Management | go.mod | Added golang.org/x/time v0.9.0 (direct) and golang.org/x/tools v0.41.0 (indirect) as module requirements. |
| Rate Limiter Implementation | internal/controller/metal3.io/baremetalhost_controller.go | Introduced custom RateLimiter in controller setup combining exponential failure backoff and token-bucket algorithms. Added rate limiter configuration constants, required package imports (golang.org/x/time/rate, workqueue utilities), and integrated RateLimiter into controller options alongside MaxConcurrentReconciles. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main technical change: capping BareMetalHost reconcile backoff at 30 seconds via a custom rate limiter. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Stable And Deterministic Test Names | ✅ Passed | The custom check for stable test names is not applicable as the PR modifies only dependencies and controller implementation code with no test code present. |
| Test Structure And Quality | ✅ Passed | The custom check reviews Ginkgo test code quality in modified test files. No test files were modified in this PR, only go.mod and baremetalhost_controller.go. |
| Microshift Test Compatibility | ✅ Passed | PR modifies only go.mod dependencies and controller rate limiter configuration. No new Ginkgo e2e tests are added. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The PR modifies only go.mod and baremetalhost_controller.go with production code implementing rate limiting, containing no Ginkgo e2e test patterns. Pre-existing e2e test files are unmodified. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies BareMetalHost controller rate limiter without introducing pod scheduling constraints, affinity rules, node selectors, or cluster topology assumptions. |
| Ote Binary Stdout Contract | ✅ Passed | PR adds rate limiter config constants and modifies SetupWithManager method. No process-level stdout emissions found; all logging uses structured logr logger. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This pull request does not add any new Ginkgo e2e tests. The changes are limited to updating Go module dependencies and adding rate limiter configuration to the BareMetalHost controller. |


@openshift-ci

openshift-ci Bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MahnoorAsghar
Once this PR has been reviewed and has the lgtm label, please assign zaneb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller.go (1)

64-72: LGTM! The constants are well-documented and the values are appropriate.

Consider whether these rate limiter parameters should be exposed as command-line flags or configuration options for operational flexibility, similar to how maxConcurrentReconcile is passed in. This would allow operators to tune the behavior without code changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/metal3.io/baremetalhost_controller.go` around lines 64 -
72, Replace the hard-coded rate limiter constants (rateLimiterBaseDelay,
rateLimiterMaxDelay, rateLimiterBursts, rateLimiterRequestsPerSecond) with
configurable options exposed as command-line flags or config fields similar to
how maxConcurrentReconcile is passed in: add flags for base delay, max delay,
bursts, and requests-per-second in the controller setup, parse and validate
them, surface them on the reconciler/controller struct (or options struct) and
use those fields when constructing the rate limiter instead of the constants
(look for where the rate limiter is created to wire the new fields in). Ensure
sensible default values equal to the current constants and validate ranges
(e.g., non-negative durations, positive burst and RPS).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 18: The go.mod contains a duplicate entry for the module
golang.org/x/time (present as both a direct dependency and an indirect entry);
remove the redundant indirect listing by running `go mod tidy` to reconcile
dependencies and ensure only the direct golang.org/x/time v0.9.0 entry remains,
or manually delete the duplicate golang.org/x/time line and then run `go mod
tidy` to update go.sum and module metadata.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: fe838af9-b1d6-429e-ba1c-8100bc5d4be4

📥 Commits

Reviewing files that changed from the base of the PR and between c5e5048 and 141503a.

📒 Files selected for processing (2)
  • go.mod
  • internal/controller/metal3.io/baremetalhost_controller.go

The default controller-runtime exponential rate limiter has a maximum
delay of 1000 seconds (~16 minutes).  During BMO startup there is a
brief race between the reconcile loop becoming active and BMO's own
validating-webhook Service endpoint being propagated: any BareMetalHost
reconciliation that triggers a webhook call in that window gets a
"no endpoints available" error.  A burst of such errors is enough to
drive the per-item exponential backoff to its ceiling, after which BMO
silently waits up to 16 minutes before attempting the next reconcile
even though the webhook has long since become reachable.

Replace the default rate limiter with one that is otherwise identical
but caps the per-item exponential delay at 30 seconds.  This bounds
the recovery window to at most one retry interval after the endpoint
is propagated, matching the behaviour of other Metal3 controllers.

Generated-by: Cursor, claude-4.6-sonnet-medium model
Signed-off-by: MahnoorAsghar <masghar@redhat.com>
@MahnoorAsghar
Author

/test e2e-metal-ipi-bm

@openshift-ci

openshift-ci Bot commented Apr 16, 2026

@MahnoorAsghar: all tests passed!
