
OCPBUGS-86418: sync CPMS template from master machine providerSpecs to prevent unintended rolling update #10569

Open
chdeshpa-hue wants to merge 1 commit into openshift:release-4.22 from chdeshpa-hue:fix/OCPBUGS-86418-cpms-template-sync

Conversation

@chdeshpa-hue

Summary

When users customize master machine manifests after openshift-install create manifests
(e.g. changing instanceType from m6i.xlarge to m6i.4xlarge) but do not also edit the
ControlPlaneMachineSet (CPMS) manifest, the CPMS template retains defaults. Post-install,
the CPMS operator detects drift between its template and the actual Machine providerSpecs,
triggering an unintended rolling update that silently reverts the user's customization.

This PR adds a MasterCPMSSync asset that runs after Master asset generation. It compares
provisioning-relevant AWS fields between the first MAPI master machine and the CPMS template,
copies any drifted values into the template, and emits a warning.

Fields synced: instanceType, ami.id, rootVolume (size, type, iops, encrypted, kmsKey),
iamInstanceProfile, metadataServiceOptions.authentication, publicIP

Fields intentionally excluded: subnet, placement.availabilityZone — these are managed
by CPMS FailureDomains and are expected to differ per-machine.
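A minimal Go sketch of the compare-and-sync step described above, for illustration only (requires Go 1.18+ for generics). The structs `AWSProviderSpec` and `RootVolume` are hypothetical flattened stand-ins, not the real MAPI providerSpec types, and `syncField` / `syncTemplateFromMachine` are hypothetical names, not the PR's functions. Each listed field is compared; on drift the machine value is copied into the template and a warning line mirroring the WARNING output from the manual validation is returned. Subnet and availability zone are never compared.

```go
package main

import "fmt"

// RootVolume and AWSProviderSpec are hypothetical, flattened stand-ins for the
// fields listed above; they are NOT the openshift/api machine types.
type RootVolume struct {
	Size      int64
	Type      string
	Iops      int64
	Encrypted bool
	KMSKey    string
}

type AWSProviderSpec struct {
	InstanceType       string
	AMIID              string
	RootVolume         RootVolume
	IAMInstanceProfile string
	MetadataAuth       string
	PublicIP           bool
	// Zone-specific fields: owned by CPMS FailureDomains, never synced.
	Subnet           string
	AvailabilityZone string
}

// syncField copies machineVal into *tmplVal when they differ and records a warning.
func syncField[T comparable](warnings *[]string, name string, machineVal T, tmplVal *T) {
	if machineVal == *tmplVal {
		return
	}
	*warnings = append(*warnings, fmt.Sprintf("%s: machine has %q, CPMS template has %q → syncing",
		name, fmt.Sprint(machineVal), fmt.Sprint(*tmplVal)))
	*tmplVal = machineVal
}

// syncTemplateFromMachine mutates tmpl so its provisioning-relevant fields match
// machine and returns one warning per drifted field (nil when nothing drifted).
func syncTemplateFromMachine(machine AWSProviderSpec, tmpl *AWSProviderSpec) []string {
	var w []string
	syncField(&w, "instanceType", machine.InstanceType, &tmpl.InstanceType)
	syncField(&w, "ami.id", machine.AMIID, &tmpl.AMIID)
	syncField(&w, "rootVolume.size", machine.RootVolume.Size, &tmpl.RootVolume.Size)
	syncField(&w, "rootVolume.type", machine.RootVolume.Type, &tmpl.RootVolume.Type)
	syncField(&w, "rootVolume.iops", machine.RootVolume.Iops, &tmpl.RootVolume.Iops)
	syncField(&w, "rootVolume.encrypted", machine.RootVolume.Encrypted, &tmpl.RootVolume.Encrypted)
	syncField(&w, "rootVolume.kmsKey", machine.RootVolume.KMSKey, &tmpl.RootVolume.KMSKey)
	syncField(&w, "iamInstanceProfile", machine.IAMInstanceProfile, &tmpl.IAMInstanceProfile)
	syncField(&w, "metadataServiceOptions.authentication", machine.MetadataAuth, &tmpl.MetadataAuth)
	syncField(&w, "publicIP", machine.PublicIP, &tmpl.PublicIP)
	// Subnet and AvailabilityZone are intentionally not compared.
	return w
}

func main() {
	machine := AWSProviderSpec{InstanceType: "m6i.4xlarge", AvailabilityZone: "us-east-1a"}
	tmpl := AWSProviderSpec{InstanceType: "m6i.xlarge", AvailabilityZone: "us-east-1b"}
	for _, line := range syncTemplateFromMachine(machine, &tmpl) {
		fmt.Println("WARNING", line)
	}
	fmt.Println(tmpl.InstanceType, tmpl.AvailabilityZone) // m6i.4xlarge us-east-1b
}
```

With these inputs the sketch prints a single `instanceType` warning, and the template keeps its own availability zone because zone fields are left to FailureDomains.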

Bug

https://issues.redhat.com/browse/OCPBUGS-86418

Root Cause

The installer generates the CPMS manifest from install-config defaults, independently of the per-machine manifests.
When users edit per-machine manifests after create manifests, the CPMS template is never
updated to reflect those edits. Post-install, the CPMS operator reconciles machines against its
stale template, causing an unwanted rolling replacement.

install-config.yaml
       ├──→ machines.Master (per-machine manifests)        ← user edits here
       └──→ CPMS template (never synced from above)        ← retains defaults

Changes

| File | Change |
| --- | --- |
| pkg/asset/machines/mastercpmssync.go | New MasterCPMSSync asset: decodes CPMS and first MAPI master providerSpecs, compares fields, syncs drift, emits warnings |
| pkg/asset/machines/mastercpmssync_test.go | 13 unit tests covering: no-drift, single-field drift (instanceType, AMI, rootVolume, IAM, metadata auth, publicIP, KMS key), multi-field drift, zone-exclusion (subnet, placement AZ not synced), decoder tests |
| pkg/asset/cluster/cluster.go | Wire MasterCPMSSync into Cluster.Dependencies() |
| pkg/asset/ignition/bootstrap/common.go | Wire MasterCPMSSync into Common.Dependencies() |
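Why MasterCPMSSync has to be wired into `Cluster.Dependencies()` and `Common.Dependencies()` can be sketched with a toy dependency graph, assuming the usual behavior of a dependency-driven asset graph: an asset is generated only if some target transitively depends on it, and always after its own dependencies. The `node` type and `generationOrder` function are hypothetical simplifications, not the installer's asset interface or asset store, and the edges are reduced to the ones this PR touches.

```go
package main

import "fmt"

// node is a hypothetical stand-in for an installer asset: a name plus the
// assets it depends on. It is NOT the installer's asset.Asset interface.
type node struct {
	name string
	deps []*node
}

// generationOrder walks the graph depth-first from target and returns names in
// post-order, so every dependency appears before its dependents. Nodes that the
// target does not (transitively) depend on are never visited.
func generationOrder(target *node) []string {
	var order []string
	seen := map[*node]bool{}
	var visit func(n *node)
	visit = func(n *node) {
		if seen[n] {
			return
		}
		seen[n] = true
		for _, d := range n.deps {
			visit(d)
		}
		order = append(order, n.name)
	}
	visit(target)
	return order
}

func main() {
	installConfig := &node{name: "InstallConfig"}
	master := &node{name: "Master", deps: []*node{installConfig}}
	cpmsSync := &node{name: "MasterCPMSSync", deps: []*node{master}}
	common := &node{name: "Common", deps: []*node{master, cpmsSync}}
	cluster := &node{name: "Cluster", deps: []*node{common, cpmsSync}}
	fmt.Println(generationOrder(cluster))
	// [InstallConfig Master MasterCPMSSync Common Cluster]
}
```

Dropping `cpmsSync` from both dependency lists would leave it unreachable from `Cluster`, so in this model the sync step would silently never run.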

Test Plan

Unit tests

  • TestSyncCPMSAWSFields_NoDrift — no drift when specs match
  • TestSyncCPMSAWSFields_InstanceTypeDrift — instanceType synced
  • TestSyncCPMSAWSFields_RootVolumeDrift — volumeSize, volumeType, iops synced
  • TestSyncCPMSAWSFields_AMIDrift — AMI ID synced
  • TestSyncCPMSAWSFields_IAMProfileDrift — IAM instance profile synced
  • TestSyncCPMSAWSFields_MetadataAuthDrift — metadata auth synced
  • TestSyncCPMSAWSFields_MultiFieldDrift — multiple simultaneous drifts
  • TestSyncCPMSAWSFields_KMSKeyDrift — KMS key synced
  • TestSyncCPMSAWSFields_PublicIPDrift — publicIP synced
  • TestSyncCPMSAWSFields_SubnetNotSynced — subnet excluded (FailureDomains)
  • TestSyncCPMSAWSFields_PlacementNotSynced — AZ excluded (FailureDomains)
  • TestDecodeCPMSProviderSpec_NilInput — nil input handling
  • TestDecodeCPMSProviderSpec_FromRawBytes — raw JSON byte decoding
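The two decoder tests cover nil input and raw JSON bytes. A minimal sketch of such a decoder, assuming the providerSpec arrives as raw JSON bytes (as in the `Raw` field of a Kubernetes `runtime.RawExtension`) and using only the `instanceType` and `ami.id` keys named in this PR. `awsSpecSubset`, `decodeProviderSpec`, and `errNilSpec` are hypothetical names, and returning an error for nil input is an assumption; the PR's tests may expect different nil behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// awsSpecSubset is a hypothetical subset of an AWS providerSpec, used only to
// illustrate decoding raw JSON bytes.
type awsSpecSubset struct {
	InstanceType string `json:"instanceType"`
	AMI          struct {
		ID string `json:"id"`
	} `json:"ami"`
}

var errNilSpec = errors.New("providerSpec is empty")

// decodeProviderSpec rejects nil/empty input instead of returning a zero value,
// so a missing spec cannot be mistaken for "no drift".
func decodeProviderSpec(raw []byte) (*awsSpecSubset, error) {
	if len(raw) == 0 {
		return nil, errNilSpec
	}
	var spec awsSpecSubset
	if err := json.Unmarshal(raw, &spec); err != nil {
		return nil, fmt.Errorf("decoding providerSpec: %w", err)
	}
	return &spec, nil
}

func main() {
	spec, err := decodeProviderSpec([]byte(`{"instanceType":"m6i.xlarge","ami":{"id":"ami-123"}}`))
	fmt.Println(spec.InstanceType, spec.AMI.ID, err) // m6i.xlarge ami-123 <nil>
	_, err = decodeProviderSpec(nil)
	fmt.Println(err) // providerSpec is empty
}
```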

Manual validation (AWS IPI, OCP 4.22-rc.4)

  • Edited master machines to instanceType: m6i.4xlarge after create manifests
  • Did NOT edit CPMS manifest
  • Ran create cluster with patched binary
  • Installer correctly detected and synced the drift:
    WARNING Detected drift between MAPI master machine providerSpec (openshift/) and CPMS template.
    WARNING     instanceType: machine has "m6i.4xlarge", CPMS template has "m6i.xlarge" → syncing
    WARNING   Syncing master machine values to CPMS template to prevent unintended rolling update.
    
  • Install completed successfully

Related

  • OCPBUGS-86417 — MAPI master machine edits silently ignored by CAPI at provisioning time (companion bug, different lifecycle stage)
  • OCPSTRAT-2661 — long-term MAPI→CAPI migration; this fix provides a minimal safety guardrail for the current installer until the Boxcutter-based installer replaces the manifest-edit workflow entirely

Made with Cursor

When users edit master machine manifests after `create manifests`
(e.g. changing instanceType) but do not also edit the CPMS manifest,
the ControlPlaneMachineSet template retains defaults. Post-install the
CPMS operator detects drift between its template and the actual Machine
providerSpecs, triggering an unintended rolling update that silently
reverts the user's customization.

Add a MasterCPMSSync asset that runs after Master asset generation.
It compares provisioning-relevant AWS fields (instanceType, AMI,
rootVolume, IAM profile, metadata auth, publicIP) between the first
MAPI master machine and the CPMS template, syncing any drift and
emitting a warning.  Zone-specific fields (Subnet, Placement AZ) are
intentionally excluded since they are managed by CPMS FailureDomains.

Bug: https://issues.redhat.com/browse/OCPBUGS-86418
Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 23, 2026
@openshift-ci-robot
Contributor

@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86418, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-86418 to depend on a bug targeting a version in 5.0.0 and in one of the following states: MODIFIED, ON_QA, VERIFIED, but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@coderabbitai

coderabbitai Bot commented May 23, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@openshift-ci openshift-ci Bot requested review from sadasu and tthvo May 23, 2026 14:27
@openshift-ci
Contributor

openshift-ci Bot commented May 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jhixson74 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 23, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 23, 2026

Hi @chdeshpa-hue. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.
