OCPBUGS-17079: Disable shielded VMs when upgrading on GCP marketplace #80

Open: wants to merge 1 commit into base: main

@nrb nrb commented Feb 2, 2024

The GCP marketplace images are not updated frequently, and the 4.12 release used a 4.8 image. This image was created without support for UEFI, which is required for shielded VM support. When upgrading to OpenShift 4.13, shielded VM support is enabled by default. However, this older image combined with the new default causes an error, so new Machines fail to boot.

This is fixed by detecting whether or not a disk is UEFI-compatible and setting the MachineSet's ProviderConfig.ShieldedInstanceConfig to disable shielded VMs when UEFI is not supported.
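
A minimal sketch of this check in Go, assuming the controller already has the boot image's guest OS feature types (the GCP Compute API lists them in an image's `guestOsFeatures`, and `UEFI_COMPATIBLE` is one of the documented types). The types below are hypothetical local stand-ins for the machine API's shielded-instance fields, and `isUEFICompatible` and `disableShieldedVMIfNeeded` are illustrative names, not the code in this PR.

```go
package main

import "fmt"

// Hypothetical local stand-ins for the machine API policy types; the real
// ones live in openshift/api and are not imported here.
type SecureBootPolicy string
type VirtualizedTrustedPlatformModulePolicy string
type IntegrityMonitoringPolicy string

const (
	SecureBootPolicyDisabled                       SecureBootPolicy                       = "Disabled"
	VirtualizedTrustedPlatformModulePolicyDisabled VirtualizedTrustedPlatformModulePolicy = "Disabled"
	IntegrityMonitoringPolicyDisabled              IntegrityMonitoringPolicy              = "Disabled"
)

// ShieldedInstanceConfig mirrors the three shielded VM knobs in the provider config.
type ShieldedInstanceConfig struct {
	SecureBoot                       SecureBootPolicy
	VirtualizedTrustedPlatformModule VirtualizedTrustedPlatformModulePolicy
	IntegrityMonitoring              IntegrityMonitoringPolicy
}

// ProviderConfig is a trimmed-down, hypothetical provider config.
type ProviderConfig struct {
	ShieldedInstanceConfig ShieldedInstanceConfig
}

// isUEFICompatible reports whether the image's guest OS feature types
// include UEFI_COMPATIBLE.
func isUEFICompatible(guestOSFeatures []string) bool {
	for _, f := range guestOSFeatures {
		if f == "UEFI_COMPATIBLE" {
			return true
		}
	}
	return false
}

// disableShieldedVMIfNeeded disables every shielded VM option when the boot
// image lacks UEFI support. It reports whether the override was applied.
func disableShieldedVMIfNeeded(pc *ProviderConfig, guestOSFeatures []string) bool {
	if isUEFICompatible(guestOSFeatures) {
		return false // UEFI image: leave the template's settings alone.
	}
	pc.ShieldedInstanceConfig.SecureBoot = SecureBootPolicyDisabled
	pc.ShieldedInstanceConfig.VirtualizedTrustedPlatformModule = VirtualizedTrustedPlatformModulePolicyDisabled
	pc.ShieldedInstanceConfig.IntegrityMonitoring = IntegrityMonitoringPolicyDisabled
	return true
}

func main() {
	// Example feature list for an older, non-UEFI image.
	pc := &ProviderConfig{}
	applied := disableShieldedVMIfNeeded(pc, []string{"VIRTIO_SCSI_MULTIQUEUE"})
	fmt.Println(applied, pc.ShieldedInstanceConfig.SecureBoot)
}
```

UEFI-capable images keep whatever the template already specifies; only images without `UEFI_COMPATIBLE` get all three options forced off, since UEFI is required for shielded VM support.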

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Feb 2, 2024
@openshift-ci-robot

@nrb: This pull request references Jira Issue OCPBUGS-17079, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huali9

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

The GCP marketplace images are not updated frequently, and the 4.12 release used a 4.8 image. This image was created without support for UEFI, which is required for shielded VM support. When upgrading to OpenShift 4.13, shielded VM support is enabled by default. However, this older image and the defaults cause an error, meaning new Machines will not boot.

This is fixed by detecting a GCP marketplace image defined in a MachineSet's Template, and setting the ShieldedInstanceConfig settings to disable shielded VMs for any new machines created.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Feb 2, 2024
@openshift-ci openshift-ci bot requested a review from huali9 February 2, 2024 19:36

openshift-ci bot commented Feb 2, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

providerConfig.ShieldedInstanceConfig.SecureBoot = machinev1.SecureBootPolicyDisabled
providerConfig.ShieldedInstanceConfig.VirtualizedTrustedPlatformModule = machinev1.VirtualizedTrustedPlatformModulePolicyDisabled
providerConfig.ShieldedInstanceConfig.IntegrityMonitoring = machinev1.IntegrityMonitoringPolicyDisabled
}
Contributor Author

I expect these settings to be consumed here on machine creation.
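
On machine creation, the actuator has to turn these string policies into the booleans that the GCP Compute API expects (`EnableSecureBoot`, `EnableVtpm`, `EnableIntegrityMonitoring` on `compute.ShieldedInstanceConfig`). A sketch of that mapping, using a local mirror of the compute struct; treating an empty policy as secure boot off and vTPM and integrity monitoring on is an assumption based on GCP's shielded VM defaults, and the real actuator's defaulting may differ.

```go
package main

import "fmt"

// Local mirror of compute.ShieldedInstanceConfig from
// google.golang.org/api/compute/v1 (not imported, to stay self-contained).
type computeShieldedInstanceConfig struct {
	EnableSecureBoot          bool
	EnableVtpm                bool
	EnableIntegrityMonitoring bool
}

// Policy values as stored in the MachineSet template (hypothetical stand-ins).
const (
	policyEnabled  = "Enabled"
	policyDisabled = "Disabled"
)

type shieldedPolicies struct {
	SecureBoot, VirtualizedTrustedPlatformModule, IntegrityMonitoring string
}

// toComputeConfig converts template policies into the booleans sent on
// instance creation. An empty policy falls back to an assumed default:
// secure boot off, vTPM and integrity monitoring on.
func toComputeConfig(p shieldedPolicies) computeShieldedInstanceConfig {
	return computeShieldedInstanceConfig{
		EnableSecureBoot:          p.SecureBoot == policyEnabled,
		EnableVtpm:                p.VirtualizedTrustedPlatformModule != policyDisabled,
		EnableIntegrityMonitoring: p.IntegrityMonitoring != policyDisabled,
	}
}

func main() {
	// Every policy Disabled, as the override sets them: all flags false.
	off := toComputeConfig(shieldedPolicies{policyDisabled, policyDisabled, policyDisabled})
	fmt.Printf("%+v\n", off)
	// Nothing set: vTPM and integrity monitoring fall back to on.
	fmt.Printf("%+v\n", toComputeConfig(shieldedPolicies{}))
}
```

Under this mapping, an unset template still requests vTPM and integrity monitoring, which is why the override has to write `Disabled` explicitly rather than clearing the fields.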

@openshift-ci-robot

@nrb: This pull request references Jira Issue OCPBUGS-17079, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huali9


Comment on lines +180 to +182
// When upgrading from 4.12 on GCP marketplace, the MachineSets refer to images that do not support UEFI & shielded VMs.
// However, GCP defaulted to shielded VMs sometime between 4.12 and 4.13.
// Therefore, we should disable the shielded instance config in the MachineSet's template, so that new Machines created from it will boot.
Contributor

Do we know for sure that only marketplace images are affected? Have we checked that a 4.1 cluster image is also UEFI compatible?

Contributor Author

It's not actually a function of the marketplace images, but rather the availability of the UEFI image.

I'll update the comment, but I'll also create some clusters with older versions to find the earliest one that supports UEFI. I know 4.12 does, and 4.8 (which marketplace happens to be using) does not.

Contributor Author

The earliest cluster version I'm able to spin up is 4.6; I'll double-check what it has.

My statement about 4.8 was not accurate. The non-marketplace image has UEFI compatibility.

Contributor

Did we get anywhere here? We ideally need to go back and check a 4.1-4.5 image as well. I know QE have the ability to spin up those older clusters so perhaps they can validate the images for you?

Contributor Author

GCP support starts at 4.2. The earliest image that uses UEFI is 4.5. We're waiting until after the Chinese holiday to try a 4.5-to-4.13 upgrade chain; there is no automation for this at the moment.

Contributor

As mentioned at standup, you can run openshift-install create manifests with a 4.5 installer to generate the machine manifests and pull the image out of them. Then create a MachineSet on a modern cluster that uses the older image, without going through the whole upgrade process.

Contributor Author

In 4.5, I couldn't find any Machine or MachineSet YAML written by openshift-install create manifests.

I did create a cluster and inspected a live machineset. The image listed appears to be specific to this cluster.

```yaml
spec:
  template:
    spec:
      metadata: {}
      providerSpec:
        value:
          disks:
          - autoDelete: true
            boot: true
            image: nrb-test-r9t5n-rhcos-image
            labels: null
            sizeGb: 128
            type: pd-ssd
          machineType: n1-standard-4
```

Did this change at some point in OpenShift's lifecycle?
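
Before the UEFI check can query an image, a controller needs to resolve the disk's `image` field into a project and image name. A sketch, assuming GCP's usual reference forms (a bare name, `global/images/NAME`, or `projects/PROJECT/global/images/NAME`, optionally behind an API URL) and assuming a bare name like the one above lives in the cluster's own project; `parseImageRef`, `imageRef`, and the project names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef identifies a GCP image by project and name.
type imageRef struct {
	Project string
	Name    string
}

// parseImageRef splits a boot disk image string into project and name.
// A bare name such as "nrb-test-r9t5n-rhcos-image" is assumed to live in
// defaultProject (the cluster's own project). Image-family references are
// not handled in this sketch.
func parseImageRef(image, defaultProject string) (imageRef, error) {
	// Drop any URL prefix before "projects/".
	if i := strings.Index(image, "projects/"); i >= 0 {
		image = image[i:]
	}
	parts := strings.Split(image, "/")
	switch {
	case len(parts) == 1 && parts[0] != "":
		return imageRef{Project: defaultProject, Name: parts[0]}, nil
	case len(parts) == 3 && parts[0] == "global" && parts[1] == "images":
		return imageRef{Project: defaultProject, Name: parts[2]}, nil
	case len(parts) == 5 && parts[0] == "projects" && parts[2] == "global" && parts[3] == "images":
		return imageRef{Project: parts[1], Name: parts[4]}, nil
	}
	return imageRef{}, fmt.Errorf("unrecognized image reference %q", image)
}

func main() {
	for _, img := range []string{
		"nrb-test-r9t5n-rhcos-image",
		"projects/example-project/global/images/example-image",
	} {
		ref, err := parseImageRef(img, "my-cluster-project")
		fmt.Println(ref, err)
	}
}
```

Image-family references are rejected rather than resolved, since turning a family into a concrete image requires an API call.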

Contributor

openshift/installer@d0a789a

Something about licenses?

pkg/cloud/gcp/actuators/machineset/controller.go: 5 outdated, resolved review threads
The GCP marketplace images are not updated frequently, and the 4.12
release used a 4.8 image. This image was created without support for
UEFI, which is required for shielded VM support. When upgrading to
OpenShift 4.13, shielded VM support is enabled by default. However,
this older image and the defaults cause an error, meaning new Machines
will not boot.

This is fixed by detecting whether or not a disk is UEFI-compatible and
setting the MachineSet's ProviderConfig.ShieldedInstanceConfig to
disable shielded VMs when UEFI is not supported.

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>

openshift-ci bot commented Feb 6, 2024

@nrb: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/e2e-gcp-ovn | 8eaf6aa | link | true | /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2024
Labels
jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

4 participants