
OCPBUGS-77944: ipi-conf-gcp: filter zones by machine type availability #75839

Open
stbenjam wants to merge 1 commit into openshift:main from stbenjam:fix-gcp-arm64-zone-selection

@stbenjam
Member

stbenjam commented Mar 6, 2026

Summary

  • Fix GCP zone selection to filter by machine type availability, not just AI zone exclusion
  • Prevent ARM64 jobs from selecting zones where t2a-standard-4 is unavailable (e.g. us-central1-c)
  • Select zones independently for control plane and compute nodes to support heterogeneous clusters

Problem

The get_zones_from_region function in ipi-conf-gcp-commands.sh only filtered out AI zones but did not check whether the requested machine type was actually available in each zone. This caused ARM64 CI jobs to fail with:

controlPlane.platform.gcp.type: Invalid value: "t2a-standard-4":
  instance type not available in zones: [us-central1-c]

Affected job: periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn
Pass rate dropped from 100% to 40%.

Fix

Replace get_zones_from_region with get_zones_for_machine_type, which queries gcloud compute machine-types list to find the zones where the requested machine type is available. If the query returns no results, it falls back to the previous behavior.
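
For readers without the diff at hand, the lookup-with-fallback described above could be sketched roughly as follows. This is not the code from the PR: the function body, the filter expressions, and the shape of the fallback are assumptions made for illustration, and the fallback here lists every zone in the region without reproducing the AI zone exclusion of the previous behavior.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real function in ipi-conf-gcp-commands.sh may differ.

# Prints the zones in region $1 that offer machine type $2, one per line.
# If the machine-type query returns nothing, prints all zones in the region.
get_zones_for_machine_type() {
  local region="$1"
  local machine_type="$2"
  local zones

  # machine-types list yields one row per (machine type, zone) pair.
  # The filter expression is an assumption, not copied from the PR.
  zones="$(gcloud compute machine-types list \
    --filter="name=${machine_type} AND zone~^${region}-" \
    --format='value(zone)' 2>/dev/null | sort -u)"

  if [ -z "${zones}" ]; then
    # Fallback (assumed stand-in for the previous behavior): every zone in the region.
    zones="$(gcloud compute zones list \
      --filter="region:${region}" \
      --format='value(name)' 2>/dev/null | sort -u)"
  fi

  printf '%s\n' "${zones}"
}

# Example call (requires an authenticated gcloud):
#   get_zones_for_machine_type us-central1 t2a-standard-4
```

Sorting with `sort -u` keeps the output stable and removes duplicate zones, which makes any later "pick the first N zones" step deterministic.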

The separate ipi-conf-gcp-zones-commands.sh already had a similar get_zones_by_machine_type function; this change brings the same logic to the main ipi-conf-gcp chain.

Bug

https://issues.redhat.com/browse/OCPBUGS-77944

🤖 Generated with Claude Code

The zone selection for GCP control plane and compute nodes in
us-central1 and us-south1 regions only filtered out AI zones but
did not check whether the requested machine type was actually
available in each zone. This caused ARM64 jobs using t2a-standard-4
to fail when us-central1-c was selected, since that zone does not
offer t2a instances.

Replace get_zones_from_region with get_zones_for_machine_type which
queries gcloud compute machine-types list to find zones where the
specific machine type is available, with a fallback to the previous
behavior if the query returns no results.

Also select zones independently for control plane and compute nodes,
since heterogeneous clusters may use different machine types that
are available in different zones.
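
As a rough illustration of independent selection, the sketch below renders separate zone lists for the control plane and compute pools into an install-config style fragment. The helper names (join_csv, render_zone_patch) and the way the step writes its patch are hypothetical; only the type and zones fields under platform.gcp are taken from the installer's machine pool settings.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; helper names are hypothetical and not from the PR.

# Joins a whitespace-separated list into a comma-separated string.
join_csv() {
  # $1 is intentionally unquoted so the shell splits it into words.
  printf '%s\n' $1 | paste -s -d ',' -
}

# Emits a YAML fragment with independent zone lists per pool.
# Args: control-plane type, control-plane zones, compute type, compute zones.
render_zone_patch() {
  cp_type="$1"
  cp_zones="$2"
  compute_type="$3"
  compute_zones="$4"
  cat <<EOF
controlPlane:
  platform:
    gcp:
      type: ${cp_type}
      zones: [$(join_csv "${cp_zones}")]
compute:
- name: worker
  platform:
    gcp:
      type: ${compute_type}
      zones: [$(join_csv "${compute_zones}")]
EOF
}

# Example call, with each zone list looked up for its own machine type:
#   render_zone_patch "${CP_TYPE}" "${CP_ZONES}" "${COMPUTE_TYPE}" "${COMPUTE_ZONES}"
```

Keeping the two lists separate means a zone that lacks the control plane machine type can still be used for compute nodes, and the reverse.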

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci-robot added the following labels on Mar 6, 2026:
  • jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.)
  • jira/valid-bug (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.)
@openshift-ci-robot
Contributor

@stbenjam: This pull request references Jira Issue OCPBUGS-77944, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @droslean

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci bot requested a review from droslean on March 6, 2026 19:51
@stbenjam
Member Author

stbenjam commented Mar 6, 2026

/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn

@openshift-ci-robot
Contributor

@stbenjam: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci bot requested review from smg247 and xueqzhan on March 6, 2026 19:52
@openshift-ci
Contributor

openshift-ci bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Mar 6, 2026
@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@stbenjam: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-opendatahub-io-modelmesh-serving-release-v0.11.0-alpha-fvt | opendatahub-io/modelmesh-serving | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.7-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.6-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.5-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.4-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.3-e2e-gcp | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.7-e2e-upgrade | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.6-e2e-upgrade | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.5-e2e-upgrade | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.4-e2e-upgrade | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-operator-framework-operator-lifecycle-manager-release-4.3-e2e-upgrade | operator-framework/operator-lifecycle-manager | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.7-hco-e2e-kv-smoke-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.6-hco-e2e-image-index-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.6-hco-e2e-kv-smoke-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.9-hco-e2e-image-index-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.9-hco-e2e-kv-smoke-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.8-hco-e2e-image-index-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.8-hco-e2e-kv-smoke-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-main-hco-e2e-operator-sdk-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-main-hco-e2e-kv-smoke-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.16-hco-e2e-operator-sdk-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.16-hco-e2e-kv-smoke-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.10-hco-e2e-operator-sdk-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.10-hco-e2e-kv-smoke-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |
| pull-ci-kubevirt-hyperconverged-cluster-operator-release-1.15-hco-e2e-operator-sdk-gcp | kubevirt/hyperconverged-cluster-operator | presubmit | Registry content changed |

A total of 5630 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here
Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci-robot
Contributor

@stbenjam: job(s): periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn either don't exist or were not found to be affected, and cannot be rehearsed

@stbenjam
Member Author

stbenjam commented Mar 6, 2026

/pj-rehearse periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn

@openshift-ci-robot
Contributor

@stbenjam: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@stbenjam
Member Author

stbenjam commented Mar 6, 2026

/pj-rehearse periodic-ci-openshift-release-main-ci-4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-rt-upgrade

@openshift-ci-robot
Contributor

@stbenjam: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@stbenjam: job(s): periodic-ci-openshift-multiarch-main-nightly-4.22-upgrade-from-stable-4.21-ocp-e2e-upgrade-gcp-ovn either don't exist or were not found to be affected, and cannot be rehearsed

@openshift-ci
Contributor

openshift-ci bot commented Mar 7, 2026

@stbenjam: all tests passed!



Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


2 participants