-
Notifications
You must be signed in to change notification settings - Fork 1.5k
OCPBUGS-4809: Azure: toggle image in machinesets #6694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-4809: Azure: toggle image in machinesets #6694
Conversation
Hive vendors the installer and uses the asset package to generate machinesets for scaleup. Because Hive is using the latest code version but installing multiple previous versions, the machinesets--particularly the values--need to be backward compatible. In this particular case, the installer switched from using Azure managed images to image galleries in 4.12. In 4.12+ Azure machinesets expect an image referencing an image gallery, while prior to this change the machinesets looked for a managed image. This commit updates the machineset code to allow a toggle which will allow Hive to generate Azure machinesets utilizing managed images, which should be done with 4.11 and earlier clusters. This change also future proofs the 4.12+ by switching the machinesets to use the latest version, rather than tying them to a particular RHCOS version.
|
@patrickdillon: This pull request references Jira Issue OCPBUGS-4809, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@patrickdillon: This pull request references Jira Issue OCPBUGS-4809, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cc @Prashanth684 |
pkg/asset/machines/azure/machines.go
Outdated
| imageID := fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/galleries/gallery_%s/images/%s/versions/latest", rg, galleryName, id) | ||
| image.ResourceID = imageID | ||
| } else { | ||
| image.ResourceID = fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/images/%s", rg, clusterID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to append "-gen2" here to the image name if the VM supports hyperVGen == "V2"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be fixed in e26a478
| id += "-gen2" | ||
| } | ||
| imageID := fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/galleries/gallery_%s/images/%s/versions/%s", rg, galleryName, id, rhcosVersion) | ||
| imageID := fmt.Sprintf("/resourceGroups/%s/providers/Microsoft.Compute/galleries/gallery_%s/images/%s/versions/latest", rg, galleryName, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does using latest here work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Prashanth684 I remember you had issues when using anything other than an actual version here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it did - looking back at this comment: #6304 (comment), it did not like latest when used as a version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See name here https://learn.microsoft.com/en-us/azure/templates/microsoft.compute/2021-10-01/galleries/images/versions?pivots=deployment-language-arm-template. This is the ARM template but iirc the same restrictions apply
Character limit: 32-bit integer
Valid characters:
Numbers and periods.
(Each segment is converted to an int32. So each segment has a max value of 2,147,483,647.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it did - looking back at this comment: #6304 (comment), it did not like latest when used as a version
It doesn't like latest when creating an image gallery (version), but it is ok when specifying the image to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense!
| replicas++ | ||
| } | ||
| provider, err := provider(platform, mpool, osImage, userDataSecret, clusterID, role, &idx, capabilities, rhcosVersion) | ||
| useImageGallery := platform.CloudName != azure.StackCloud |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this "toggle" here seems to be hardcoded to say that if the platform is ASH use managed images, if not use image gallery? does this mean hive uses ASH? i don't understand how this toggle helps with backwards compatibility in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just learning about this myself... Within the installer, there is no backward compatibility and Azure Stack using the bool is more of a coincidence. Hive vendors the asset package and calls the function to create machinesets:
So basically they will set the bool based on whether the version is 4.12+.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah..ok..i see..good to know..thanks
|
This looks like it does what we talked about yesterday ✓ |
|
/test e2e-azurestack |
|
/lgtm |
|
/approve |
|
/skip |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: patrickdillon The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/override ci/prow/golint golint job is timing out. Fix is in progress. Overriding here after running the test locally. |
|
@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/golint DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/skip |
|
/hold Revision e26a478 was retested 3 times: holding |
|
/hold cancel These actually didn't fail... hitting timeouts. Probably due to long image build times. |
|
@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-azure-ovn, ci/prow/e2e-gcp-ovn DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/label ? |
|
@patrickdillon: The label(s) DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/label tide/merge-method-squash |
|
Existing linter issues are being fixed in #6712 and running the linter locally for this PR looks good: $: ~/go/bin/golangci-lint run --timeout=10m --new-from-rev=HEAD~2
$: |
|
/override ci/prow/e2e-azure-ovn ci/prow/e2e-gcp-ovn ci/prow/e2e-vsphere-ovn ci/prow/golint Timeouts and a known vsphere issue... overriding |
|
@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-azure-ovn, ci/prow/e2e-gcp-ovn, ci/prow/e2e-vsphere-ovn, ci/prow/golint DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@patrickdillon: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
@patrickdillon: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-4809 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@patrickdillon any chance we can get this cherry-picked to 4.12? |
|
let's try it... /cherry-pick release-4.12 |
|
@dlom: new pull request created: #6715 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This allows end-users of the MachineSets() function to choose whether they want to use the image gallery or not (<=4.11 vs >=4.12). Additionally, the unused parameter rhcosVerison has been removed, and a variable has been shadowed to appease the golint CI check. See openshift#6694 for more details concerning why this change is necessary.
This allows end-users of the MachineSets() function to choose whether they want to use the image gallery or not (<=4.11 vs >=4.12). Additionally, the unused parameter rhcosVerison has been removed, and a variable has been shadowed to appease the golint CI check. See openshift#6694 for more details concerning why this change is necessary.
Hive vendors the installer and uses the asset package to generate machinesets for scaleup. Because Hive is using the latest code version but installing multiple previous versions, the machinesets--particularly the values--need to be backward compatible.
In this particular case, the installer switched from using Azure managed images to image galleries in 4.12. In 4.12+ Azure machinesets expect an image referencing an image gallery, while prior to this change the machinesets looked for a managed image.
This commit updates the machineset code to allow a toggle which will allow Hive to generate Azure machinesets utilizing managed images, which should be done with 4.11 and earlier clusters.
This change also future proofs the 4.12+ by switching the machinesets to use the latest version, rather than tying them to a particular RHCOS version.