ibmcloud: add preflight quota validation for VPC IPI installations #10589

Open
asadawar wants to merge 1 commit into openshift:main from asadawar:ibmcloud-preflight-quota
@asadawar asadawar commented Jun 2, 2026

Summary

  • Implement PlatformQuotaCheck for IBM Cloud VPC following the existing AWS/GCP pattern
  • Check floating IP, security group, load balancer, and instance counts against known default limits before creating any infrastructure
  • Add paginated list functions (ListFloatingIPs, ListSecurityGroups, ListLoadBalancers, ListInstances) to the IBM Cloud client
  • Security group count is scoped to the target VPC when deploying into an existing VPC (the limit is per-VPC, not per-region)

Why this approach

IBM Cloud VPC IPI installations currently have no preflight quota validation. The installer creates infrastructure (instances, load balancers, security groups, COS bucket, RHCOS image) over 15-20 minutes before discovering resource limits. When a limit is exceeded, the install fails late with orphaned resources that require openshift-install destroy cluster to clean up.

The PlatformQuotaCheck asset already has implementations for AWS (pkg/quota/aws), GCP (pkg/quota/gcp), and OpenStack. IBM Cloud was grouped under "no special provisioning requirements" at line 156 of pkg/asset/quota/quota.go. This change follows the same Constraints/Load/Check pattern.
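The Constraints/Load/Check split can be sketched roughly as follows. This is a simplified illustration, not the installer's code: the `Constraint` and `Quota` types, the `Check` function, and the resource names are hypothetical stand-ins, and the numbers are illustrative (the 40/40 floating IP case mirrors the failure reported in this PR).

```go
package main

import "fmt"

// Constraint is a hypothetical, simplified record of how many units of a
// resource the install would need.
type Constraint struct {
	Name  string
	Count int64
}

// Quota is a hypothetical, simplified record of current usage and limit.
type Quota struct {
	Name  string
	InUse int64
	Limit int64
}

// Check returns one message per constraint that would exceed its limit.
// Constraints with no matching quota entry are skipped rather than failed.
func Check(quotas []Quota, constraints []Constraint) []string {
	byName := make(map[string]Quota, len(quotas))
	for _, q := range quotas {
		byName[q.Name] = q
	}
	var failures []string
	for _, c := range constraints {
		q, ok := byName[c.Name]
		if !ok {
			continue
		}
		if q.InUse+c.Count > q.Limit {
			failures = append(failures, fmt.Sprintf(
				"%s: need %d, in use %d, limit %d", c.Name, c.Count, q.InUse, q.Limit))
		}
	}
	return failures
}

func main() {
	// Illustrative numbers only; the instance limit here is a placeholder.
	quotas := []Quota{
		{Name: "floating-ips", InUse: 40, Limit: 40},
		{Name: "instances", InUse: 10, Limit: 100},
	}
	constraints := []Constraint{
		{Name: "floating-ips", Count: 1},
		{Name: "instances", Count: 6},
	}
	for _, f := range Check(quotas, constraints) {
		fmt.Println("quota exceeded:", f)
	}
}
```

Keeping the comparison separate from how usage is gathered means the same check works whether the numbers come from a quota service or from counting resources.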

IBM Cloud does not have a Service Quotas API like AWS. Usage is determined by counting existing resources via the VPC API and comparing against known default limits. The limits are hardcoded with a comment noting they may vary by account. API failures during quota loading log a warning and skip the check rather than blocking the install.

Floating IP count accounts for publish mode: External (default) needs an additional floating IP for the public API load balancer.
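The publish-mode adjustment could look something like the sketch below. The `PublishingStrategy` type is a simplified stand-in for the install-config field, `requiredFloatingIPs` is a hypothetical helper, and `base` is an assumed count of floating IPs needed regardless of publish mode; the PR description does not state that number.

```go
package main

import "fmt"

// PublishingStrategy is a simplified stand-in for the install-config
// publish field.
type PublishingStrategy string

const (
	External PublishingStrategy = "External"
	Internal PublishingStrategy = "Internal"
)

// requiredFloatingIPs returns how many floating IPs an install would need.
// base is a hypothetical count needed regardless of publish mode.
func requiredFloatingIPs(base int, publish PublishingStrategy) int {
	// An empty value is treated as External, the default.
	if publish == External || publish == "" {
		return base + 1 // one more for the public API load balancer
	}
	return base
}

func main() {
	for _, p := range []PublishingStrategy{External, Internal} {
		fmt.Printf("%s: %d floating IPs\n", p, requiredFloatingIPs(2, p))
	}
}
```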

Cluster verification

During OCP 4.22 rc5 IPI testing on IBM Cloud VPC:

Attempt 1 (without this change): Install ran for ~20 minutes creating 4 instances, 2 load balancers, 6 security groups, and uploading the RHCOS image before failing at floating IP assignment (40/40 quota). All resources orphaned.

Attempt 3 (without this change): GPU worker (gx3d-160x1792x8h200) failed with cannot_start_capacity. Cluster installed but timed out because no workers joined for ingress.

Both would be caught in <30 seconds with this preflight check.

Test plan

  • go build ./pkg/quota/ibmcloud/... ./pkg/asset/quota/ibmcloud/... ./pkg/asset/quota/... compiles
  • go vet passes on all modified packages
  • gofmt reports no formatting issues
  • go test ./pkg/quota/ibmcloud/... passes (table-driven tests for constraint aggregation)
  • Pagination verified: all list functions use Pager.GetAllWithContext() matching existing codebase pattern
  • Security group count scoped to target VPC when VPCName is set in install-config
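The last two items, pagination and VPC scoping, can be illustrated with a standard-library-only sketch. It does not use the IBM Cloud SDK pager named above; `page`, `fetchFunc`, `listAll`, and `countInVPC` are hypothetical names, and counting every group when no VPC name is given is an assumption of this sketch rather than confirmed behavior of the PR.

```go
package main

import "fmt"

// SecurityGroup is a hypothetical, trimmed-down view of a security group.
type SecurityGroup struct {
	Name    string
	VPCName string
}

// page is one page of results plus a token for the next page ("" when done).
type page struct {
	Items []SecurityGroup
	Next  string
}

// fetchFunc is a hypothetical page fetcher keyed by a start token.
type fetchFunc func(start string) (page, error)

// listAll follows next tokens until the listing is exhausted.
func listAll(fetch fetchFunc) ([]SecurityGroup, error) {
	var all []SecurityGroup
	start := ""
	for {
		p, err := fetch(start)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if p.Next == "" {
			return all, nil
		}
		start = p.Next
	}
}

// countInVPC counts the groups that belong to vpcName.
// Assumption of this sketch: an empty vpcName counts every group.
func countInVPC(groups []SecurityGroup, vpcName string) int {
	if vpcName == "" {
		return len(groups)
	}
	n := 0
	for _, g := range groups {
		if g.VPCName == vpcName {
			n++
		}
	}
	return n
}

func main() {
	// Two fake pages standing in for API responses.
	pages := map[string]page{
		"": {Items: []SecurityGroup{
			{Name: "sg-a", VPCName: "vpc-1"},
			{Name: "sg-b", VPCName: "vpc-2"},
		}, Next: "p2"},
		"p2": {Items: []SecurityGroup{{Name: "sg-c", VPCName: "vpc-1"}}},
	}
	fetch := func(start string) (page, error) { return pages[start], nil }

	groups, err := listAll(fetch)
	if err != nil {
		fmt.Println("list failed:", err)
		return
	}
	fmt.Println("security groups in vpc-1:", countInVPC(groups, "vpc-1"))
}
```

Counting only the first page would understate usage on busy accounts, which is why draining every page matters for a quota check.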

RFE: https://issues.redhat.com/browse/RFE-9374

Implement PlatformQuotaCheck for IBM Cloud VPC following the existing
AWS/GCP pattern. Checks floating IP, security group, load balancer,
and instance counts against known default limits before creating any
infrastructure.

During 4.22 rc5 testing, an IBM Cloud IPI install failed 20 minutes
in when the floating IP quota (40/40) was exhausted. The installer
had already created instances, load balancers, security groups, and
uploaded the RHCOS image before discovering the limit. All resources
were orphaned and required manual cleanup.

With this change, the installer validates resource availability
before provisioning and fails fast with a clear quota error.

Assisted-by: Claude Code
RFE-9374

coderabbitai Bot commented Jun 2, 2026

Warning

Review limit reached

@asadawar, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes and 8 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f1868876-3625-4c6a-8945-f6c56f3d32ef

📥 Commits

Reviewing files that changed from the base of the PR and between e871a5d and c065c5b.

📒 Files selected for processing (7)
  • pkg/asset/installconfig/ibmcloud/client.go
  • pkg/asset/quota/ibmcloud/OWNERS
  • pkg/asset/quota/ibmcloud/ibmcloud.go
  • pkg/asset/quota/quota.go
  • pkg/quota/ibmcloud/OWNERS
  • pkg/quota/ibmcloud/ibmcloud.go
  • pkg/quota/ibmcloud/ibmcloud_test.go
@openshift-ci openshift-ci Bot requested review from rvanderp3 and rwsu June 2, 2026 09:57

openshift-ci Bot commented Jun 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tthvo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 2, 2026

openshift-ci Bot commented Jun 2, 2026

Hi @asadawar. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
