@patrickdillon
Contributor

Hive vendors the installer and uses the asset package to generate machinesets for scaleup. Because Hive is using the latest code version but installing multiple previous versions, the machinesets--particularly the values--need to be backward compatible.

In this particular case, the installer switched from using Azure managed images to image galleries in 4.12. In 4.12+ Azure machinesets expect an image referencing an image gallery, while prior to this change the machinesets looked for a managed image.

This commit adds a toggle to the machineset code that lets Hive generate Azure machinesets that use managed images, which is what 4.11 and earlier clusters require.

This change also future-proofs 4.12+ machinesets by having them reference the latest image version in the gallery, rather than tying them to a particular RHCOS version.
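
For readers skimming the description, here is a minimal sketch of the toggle, assuming the two resource ID formats that appear in the diff in this PR. The helper name `imageResourceID` and its parameters are hypothetical and are not the installer's actual function signature.

```go
package main

import "fmt"

// imageResourceID is a hypothetical helper for illustration only.
// With useImageGallery set (4.12+), it points at the "latest" version of an
// image in an image gallery; otherwise it points at a managed image named
// after the cluster ID (4.11 and earlier).
func imageResourceID(resourceGroup, galleryName, imageName, clusterID string, useImageGallery bool) string {
	if useImageGallery {
		return fmt.Sprintf(
			"/resourceGroups/%s/providers/Microsoft.Compute/galleries/gallery_%s/images/%s/versions/latest",
			resourceGroup, galleryName, imageName)
	}
	return fmt.Sprintf(
		"/resourceGroups/%s/providers/Microsoft.Compute/images/%s",
		resourceGroup, clusterID)
}

func main() {
	// Example values are made up.
	fmt.Println(imageResourceID("demo-rg", "demo_abc12", "demo-abc12", "demo-abc12", true))
	fmt.Println(imageResourceID("demo-rg", "demo_abc12", "demo-abc12", "demo-abc12", false))
}
```

A caller such as Hive would pass `false` for clusters older than 4.12 and `true` otherwise.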

@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-4809, which is invalid:

  • expected the bug to target the "4.13.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci-robot added the jira/invalid-bug label Dec 13, 2022
@openshift-ci bot requested review from jhixson74 and mtulio December 13, 2022 17:48
@patrickdillon
Contributor Author

cc @2uasimojo @abutcher @dlom

@patrickdillon
Contributor Author

/jira refresh

@openshift-ci-robot added the jira/valid-bug and bugzilla/valid-bug labels and removed the jira/invalid-bug label Dec 13, 2022
@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-4809, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

@r4f4
Contributor

r4f4 commented Dec 13, 2022

/cc @Prashanth684

@openshift-ci bot requested a review from Prashanth684 December 13, 2022 18:06
imageID := fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/galleries/gallery_%s/images/%s/versions/latest", rg, galleryName, id)
image.ResourceID = imageID
} else {
image.ResourceID = fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/images/%s", rg, clusterID)
Contributor

You need to append "-gen2" here to the image name if the VM supports hyperVGen == "V2"

Contributor Author

should be fixed in e26a478
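
A minimal sketch of the naming rule raised in this thread, assuming the managed image name is the cluster ID with "-gen2" appended for Hyper-V generation 2 VMs. The helper name `managedImageName` is hypothetical; the actual fix is in e26a478.

```go
package main

import "fmt"

// managedImageName is a hypothetical helper for illustration only.
// It appends "-gen2" to the image name when the VM's Hyper-V generation is "V2".
func managedImageName(clusterID, hyperVGen string) string {
	if hyperVGen == "V2" {
		return clusterID + "-gen2"
	}
	return clusterID
}

func main() {
	// Example values are made up.
	fmt.Println(managedImageName("demo-abc12", "V1"))
	fmt.Println(managedImageName("demo-abc12", "V2"))
}
```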

id += "-gen2"
}
-imageID := fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/galleries/gallery_%s/images/%s/versions/%s", rg, galleryName, id, rhcosVersion)
+imageID := fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/galleries/gallery_%s/images/%s/versions/latest", rg, galleryName, id)
Contributor

Does using latest here work?

Contributor

@Prashanth684 I remember you had issues when using anything other than an actual version here

Contributor

yes it did - looking back at this comment: #6304 (comment), it did not like latest when used as a version

Contributor

See name here https://learn.microsoft.com/en-us/azure/templates/microsoft.compute/2021-10-01/galleries/images/versions?pivots=deployment-language-arm-template. This is the ARM template but iirc the same restrictions apply

Character limit: 32-bit integer

Valid characters:
Numbers and periods.
(Each segment is converted to an int32. So each segment has a max value of 2,147,483,647.)

Contributor Author

> yes it did - looking back at this comment: #6304 (comment), it did not like latest when used as a version

It doesn't like latest when creating an image gallery (version), but it is ok when specifying the image to be used.
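
To make the distinction concrete, here is a rough sketch of the two cases, assuming only the rule quoted above (digits and periods, each segment fitting in an int32) for names used when creating a gallery image version, and additionally allowing `latest` when referencing an image. Both helper names are hypothetical, and Azure may enforce further constraints that are not modeled here.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// validGalleryVersionName is a hypothetical helper. It checks the quoted rule
// for names used when creating a gallery image version: period-separated
// segments made of digits, each no larger than the int32 maximum.
func validGalleryVersionName(name string) bool {
	for _, seg := range strings.Split(name, ".") {
		// ParseUint rejects empty segments, signs, and non-digit characters.
		n, err := strconv.ParseUint(seg, 10, 64)
		if err != nil || n > math.MaxInt32 {
			return false
		}
	}
	return true
}

// validImageReferenceVersion is a hypothetical helper. When referencing an
// image (as a machineset does), "latest" is accepted in addition to a
// concrete version name.
func validImageReferenceVersion(name string) bool {
	return name == "latest" || validGalleryVersionName(name)
}

func main() {
	for _, v := range []string{"1.0.0", "latest", "1.2147483648.0"} {
		fmt.Println(v, validGalleryVersionName(v), validImageReferenceVersion(v))
	}
}
```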

Contributor

that makes sense!

replicas++
}
provider, err := provider(platform, mpool, osImage, userDataSecret, clusterID, role, &idx, capabilities, rhcosVersion)
useImageGallery := platform.CloudName != azure.StackCloud
Contributor

this "toggle" here seems to be hardcoded to say that if the platform is ASH use managed images, if not use image gallery? does this mean hive uses ASH? i don't understand how this toggle helps with backwards compatibility in this case.

Contributor Author

@patrickdillon Dec 13, 2022

I am just learning about this myself... Within the installer, there is no backward compatibility and Azure Stack using the bool is more of a coincidence. Hive vendors the asset package and calls the function to create machinesets:

https://github.com/openshift/hive/blob/33e68f241d3c784c5362b48febea14020107dd39/pkg/controller/machinepool/azureactuator.go#L139-L146

So basically they will set the bool based on whether the version is 4.12+.
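
As a rough illustration of how a caller might derive that bool, the sketch below parses a cluster version string and returns true for 4.12 and later. The function name `useImageGalleryFor` and the version format are assumptions; this is not Hive's actual code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// useImageGalleryFor is a hypothetical helper for illustration only.
// It expects a version such as "4.11.3" or "4.13.0-rc.1" and reports
// whether the cluster is 4.12 or later, i.e. whether machinesets should
// reference an image gallery instead of a managed image.
func useImageGalleryFor(version string) (bool, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("version %q has no minor component", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return major > 4 || (major == 4 && minor >= 12), nil
}

func main() {
	for _, v := range []string{"4.11.3", "4.12.0", "4.13.0-rc.1"} {
		use, err := useImageGalleryFor(v)
		fmt.Println(v, use, err)
	}
}
```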

Contributor

ah..ok..i see..good to know..thanks

@2uasimojo
Member

This looks like it does what we talked about yesterday ✓
Thanks @patrickdillon!

@r4f4
Contributor

r4f4 commented Dec 14, 2022

/test e2e-azurestack

@r4f4
Contributor

r4f4 commented Dec 14, 2022

/lgtm

@openshift-ci bot added the lgtm label Dec 14, 2022
@patrickdillon
Contributor Author

/approve

@patrickdillon
Contributor Author

/skip
/retest

@openshift-ci
Contributor

openshift-ci bot commented Dec 14, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon


@openshift-ci bot added the approved label Dec 14, 2022
@patrickdillon
Contributor Author

/override ci/prow/golint

golint job is timing out. Fix is in progress. Overriding here after running the test locally.

@openshift-ci
Contributor

openshift-ci bot commented Dec 14, 2022

@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/golint


@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 5a8e993 and 2 for PR HEAD e26a478 in total

@patrickdillon
Contributor Author

/skip

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD f2195aa and 1 for PR HEAD e26a478 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 9377cb3 and 0 for PR HEAD e26a478 in total

@openshift-ci-robot
Contributor

/hold

Revision e26a478 was retested 3 times: holding

@openshift-ci bot added the do-not-merge/hold label Dec 15, 2022
@patrickdillon
Contributor Author

/hold cancel
/override ci/prow/e2e-azure-ovn
/override ci/prow/e2e-gcp-ovn

These actually didn't fail... hitting timeouts. Probably due to long image build times.

@openshift-ci bot removed the do-not-merge/hold label Dec 16, 2022
@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2022

@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-azure-ovn, ci/prow/e2e-gcp-ovn


@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 4a0d158 and 2 for PR HEAD e26a478 in total

@patrickdillon
Contributor Author

/label ?

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2022

@patrickdillon: The label(s) /label ? cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, downstream-change-needed, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?


@patrickdillon
Contributor Author

/label tide/merge-method-squash

@openshift-ci bot added the tide/merge-method-squash label Dec 16, 2022
@r4f4
Contributor

r4f4 commented Dec 16, 2022

Existing linter issues are being fixed in #6712 and running the linter locally for this PR looks good:

$: ~/go/bin/golangci-lint run --timeout=10m --new-from-rev=HEAD~2
$: 

@patrickdillon
Contributor Author

/override ci/prow/e2e-azure-ovn ci/prow/e2e-gcp-ovn ci/prow/e2e-vsphere-ovn ci/prow/golint

Timeouts and a known vsphere issue... overriding

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2022

@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-azure-ovn, ci/prow/e2e-gcp-ovn, ci/prow/e2e-vsphere-ovn, ci/prow/golint


@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2022

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-azure-ovn-shared-vpc e26a478 link false /test e2e-azure-ovn-shared-vpc
ci/prow/e2e-aws-ovn-disruptive e26a478 link false /test e2e-aws-ovn-disruptive
ci/prow/e2e-aws-ovn-upgrade e26a478 link false /test e2e-aws-ovn-upgrade
ci/prow/e2e-azurestack e26a478 link false /test e2e-azurestack
ci/prow/e2e-metal-assisted e26a478 link false /test e2e-metal-assisted
ci/prow/e2e-aws-ovn-workers-rhel8 e26a478 link false /test e2e-aws-ovn-workers-rhel8
ci/prow/e2e-azure-ovn-resourcegroup e26a478 link false /test e2e-azure-ovn-resourcegroup
ci/prow/okd-e2e-aws-ovn-upgrade e26a478 link false /test okd-e2e-aws-ovn-upgrade
ci/prow/e2e-ibmcloud-ovn e26a478 link false /test e2e-ibmcloud-ovn
ci/prow/e2e-libvirt e26a478 link false /test e2e-libvirt
ci/prow/e2e-openstack e26a478 link false /test e2e-openstack
ci/prow/okd-scos-e2e-aws-upgrade e26a478 link false /test okd-scos-e2e-aws-upgrade
ci/prow/golint e26a478 link true /test golint


@openshift-merge-robot merged commit 4442bbd into openshift:master Dec 16, 2022
@openshift-ci-robot
Contributor

@patrickdillon: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-4809 has been moved to the MODIFIED state.


@dlom
Contributor

dlom commented Dec 16, 2022

@patrickdillon any chance we can get this cherry-picked to 4.12?

@dlom
Contributor

dlom commented Dec 16, 2022

let's try it...

/cherry-pick release-4.12

@openshift-cherrypick-robot

@dlom: new pull request created: #6715


dlom added a commit to dlom/installer that referenced this pull request Jan 5, 2023
This allows end-users of the MachineSets() function to choose whether
they want to use the image gallery or not (<=4.11 vs >=4.12).
Additionally, the unused parameter rhcosVersion has been removed, and
a variable has been shadowed to appease the golint CI check.

See openshift#6694 for more details concerning why this change is necessary.
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/installer that referenced this pull request Jan 9, 2023

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
