OCPBUGS-25725: ManagedBootImages: failed to fetch architecture type of machineset no linked machine found #4088
Skipping CI for Draft Pull Request.
@djoshy: This pull request references Jira Issue OCPBUGS-25725, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @rioliu-rh
Branch updated from c08d343 to 98434a2.
/unhold
Unholding as #4083 has merged.
/test ci/prow/e2e-hypershift
@djoshy: The specified target(s) for /test were not found.
/test e2e-hypershift
/hold for QE verification
Set up a GCP cluster with version 4.14.8:
$ oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-9vz55 | grep 'CoreOS aleph version'
I0109 03:47:17.496869 1511 coreos.go:54] CoreOS aleph version: mtime=2023-10-21 04:41:51.929 +0000 UTC build=414.92.202310210434-0 imgid=rhcos-414.92.202310210434-0-qemu.x86_64.qcow2

Take a pre-upgrade snapshot of configmap coreos-bootimages:
$ oc get cm coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_pre_upgrade.yaml

Upgrade the cluster to the CI image:
$ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-pfzgxlb/release:latest --force --allow-explicit-upgrade
$ oc get clusterversion version -o yaml | yq '.status.history[]|.version,.state'
"4.15.0-0.ci.test-2024-01-09-032426-ci-ln-pfzgxlb-latest"
"Completed"
"4.14.8"
"Completed"

Take a post-upgrade snapshot of configmap coreos-bootimages:
$ oc get cm coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_post_upgrade.yaml

Diff the pre- and post-upgrade snapshots:
$ diff /tmp/coreos-bootimages_pre_upgrade.yaml /tmp/coreos-bootimages_post_upgrade.yaml | egrep 'MCO|gcp'
> MCOReleaseImageVersion: 4.15.0-0.ci.test-2024-01-09-032426-ci-ln-pfzgxlb-latest
> MCOVersionHash: 53d9c7eecacc24e70d449e823e500f7cec356d7c
< "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.14-9.2/builds/414.92.202310210434-0/aarch64/rhcos-414.92.202310210434-0-gcp.aarch64.tar.gz",
> "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.15-9.2/builds/415.92.202311241643-0/aarch64/rhcos-415.92.202311241643-0-gcp.aarch64.tar.gz",
< "name": "rhcos-414-92-202310210434-0-gcp-aarch64"
> "name": "rhcos-415-92-202311241643-0-gcp-aarch64"
< "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.14-9.2/builds/414.92.202310210434-0/x86_64/rhcos-414.92.202310210434-0-gcp.x86_64.tar.gz",
> "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.15-9.2/builds/415.92.202311241643-0/x86_64/rhcos-415.92.202311241643-0-gcp.x86_64.tar.gz",
< "name": "rhcos-414-92-202310210434-0-gcp-x86-64"
> "name": "rhcos-415-92-202311241643-0-gcp-x86-64"

Check the feature gate state. The query lists the disabled gates, so ManagedBootImages is still disabled:
$ oc get featuregate/cluster -o yaml | yq -y '.status.featureGates[]|select(.version=="4.15.0-0.ci.test-2024-01-09-032426-ci-ln-pfzgxlb-latest")|.disabled' | grep ManagedBootImages
- name: ManagedBootImages

Enable the ManagedBootImages feature gate:
$ oc apply -f ~/mco_test/mc/featuregate_techpreview.yaml
Warning: resource featuregates/cluster is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by oc apply. oc apply should only be used on resources created declaratively by either oc create --save-config or oc apply. The missing annotation will be patched automatically.
featuregate.config.openshift.io/cluster configured

Check the MCC log for machineset patching:
I0109 06:27:52.038767 1 machine_set_boot_image_controller.go:139] "FeatureGates changed" enabled=["AdminNetworkPolicy","AlibabaPlatform","AutomatedEtcdBackup","AzureWorkloadIdentity","BuildCSIVolumes","CSIDriverSharedResource","CloudDualStackNodeIPs","DNSNameResolver","DynamicResourceAllocation","ExternalCloudProvider","ExternalCloudProviderAzure","ExternalCloudProviderExternal","ExternalCloudProviderGCP","GCPClusterHostedDNS","GCPLabelsTags","GatewayAPI","InsightsConfigAPI","InstallAlternateInfrastructureAWS","MachineAPIProviderOpenStack","MachineConfigNodes","ManagedBootImages","MaxUnavailableStatefulSet","MetricsServer","MixedCPUsAllocation","NetworkLiveMigration","NodeSwap","OnClusterBuild","OpenShiftPodSecurityAdmission","PrivateHostedZoneAWS","RouteExternalCertificate","SignatureStores","SigstoreImageVerification","VSphereControlPlaneMachineSet","VSphereStaticIPs","ValidatingAdmissionPolicy"] disabled=["ClusterAPIInstall","DisableKubeletCloudCredentialProviders","EventedPLEG","MachineAPIOperatorDisableMachineHealthCheckController"]
I0109 06:27:52.038820 1 machine_set_boot_image_controller.go:152] Trigger a sync as this feature was turned on
I0109 06:27:52.038888 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.043208 1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.043223 1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.043289 1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.048853 1 event.go:298] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-machine-config-operator", Name:"openshift-machine-config-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesModified' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AdminNetworkPolicy", "AlibabaPlatform", "AutomatedEtcdBackup", "AzureWorkloadIdentity", "BuildCSIVolumes", "CSIDriverSharedResource", "CloudDualStackNodeIPs", "DNSNameResolver", "DynamicResourceAllocation", "ExternalCloudProvider", "ExternalCloudProviderAzure", "ExternalCloudProviderExternal", "ExternalCloudProviderGCP", "GCPClusterHostedDNS", "GCPLabelsTags", "GatewayAPI", "InsightsConfigAPI", "InstallAlternateInfrastructureAWS", "MachineAPIProviderOpenStack", "MachineConfigNodes", "ManagedBootImages", "MaxUnavailableStatefulSet", "MetricsServer", "MixedCPUsAllocation", "NetworkLiveMigration", "NodeSwap", "OnClusterBuild", "OpenShiftPodSecurityAdmission", "PrivateHostedZoneAWS", "RouteExternalCertificate", "SignatureStores", "SigstoreImageVerification", "VSphereControlPlaneMachineSet", "VSphereStaticIPs", "ValidatingAdmissionPolicy"}, Disabled:[]v1.FeatureGateName{"ClusterAPIInstall", "DisableKubeletCloudCredentialProviders", "EventedPLEG", "MachineAPIOperatorDisableMachineHealthCheckController"}}
I0109 06:27:52.048968 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.050380 1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.050450 1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.050543 1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.136183 1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-b updated, reconciling all machinesets
I0109 06:27:52.142853 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.144149 1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.144240 1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.144352 1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.155202 1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-a updated, reconciling all machinesets
I0109 06:27:52.155890 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-f on GCP, with arch x86_64
I0109 06:27:52.157134 1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.157806 1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.157938 1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-f
I0109 06:27:52.157766 1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-b updated, reconciling all machinesets
I0109 06:27:52.186478 1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-c updated, reconciling all machinesets
I0109 06:27:52.202363 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.219834 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.219885 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.221123 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.221173 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.225658 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.225697 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.226866 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.250917 1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-c updated, reconciling all machinesets
I0109 06:27:52.250970 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.253730 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.253829 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.254959 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.255047 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.257826 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-f on GCP, with arch x86_64
I0109 06:27:52.259034 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-f
I0109 06:27:52.259623 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.271580 1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-f updated, reconciling all machinesets
I0109 06:27:52.272578 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.272684 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.274476 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.274514 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.275255 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.275340 1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-f on GCP, with arch x86_64
I0109 06:27:52.276323 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.277068 1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-f

Find the machineset with 0 replicas:
$ machineset
NAME DESIRED CURRENT READY AVAILABLE AGE
rioliu-0109a-4dmvd-worker-a 1 1 1 1 3h3m
rioliu-0109a-4dmvd-worker-b 1 1 1 1 3h3m
rioliu-0109a-4dmvd-worker-c 1 1 1 1 3h3m
rioliu-0109a-4dmvd-worker-f 0 0 3h3m

Check whether the 0-replica machineset was patched with the new boot image:
$ machineset rioliu-0109a-4dmvd-worker-f -o yaml | yq '.spec.template.spec.providerSpec.value.disks'
[
{
"autoDelete": true,
"boot": true,
"image": "projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64",
"labels": {},
"sizeGb": 128,
"type": "pd-ssd"
}
]

The boot image is patched with the new image, and the arch is correct. Scale up the machineset:
$ oc scale --replicas=1 machineset.machine.openshift.io/rioliu-0109a-4dmvd-worker-f -n openshift-machine-api
machineset.machine.openshift.io/rioliu-0109a-4dmvd-worker-f scaled
$ machineset rioliu-0109a-4dmvd-worker-f
NAME DESIRED CURRENT READY AVAILABLE AGE
rioliu-0109a-4dmvd-worker-f 1 1 1 1 3h19m

Check the boot image on the new node:
$ oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-6s8tj | grep -A6 'CoreOS'
I0109 06:51:49.577306 1499 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
{
"build": "415.92.202311241643-0",
"imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
"ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
"ref": ""
}
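The last two steps compare the boot image the machineset was patched with (rhcos-415-92-202311241643-0-gcp-x86-64) against the build the new node reports in its aleph version (415.92.202311241643-0). Below is a minimal bash sketch of that comparison. The helper names are hypothetical. The mapping from GCP image name to build ID is inferred from the names in this log, not taken from any RHCOS naming specification.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing a machineset's GCP boot image with the
# RHCOS build a node reports in "CoreOS aleph version ... build=...".
# Assumption: image names follow the pattern seen in this log, e.g.
#   rhcos-415-92-202311241643-0-gcp-x86-64  <->  build 415.92.202311241643-0

# Print the RHCOS build ID encoded in a GCP image name.
gcp_image_to_build() {
  local name="${1##*/}"   # drop a "projects/rhcos-cloud/global/images/" prefix, if any
  name="${name#rhcos-}"   # 415-92-202311241643-0-gcp-x86-64
  name="${name%%-gcp-*}"  # 415-92-202311241643-0
  name="${name/-/.}"      # 415.92-202311241643-0
  echo "${name/-/.}"      # 415.92.202311241643-0
}

# Print the architecture encoded in a GCP image name (x86-64 -> x86_64).
gcp_image_to_arch() {
  local arch="${1##*-gcp-}"
  echo "${arch//-/_}"
}

# Succeed only if the machineset image corresponds to the node's aleph build.
boot_image_matches() {
  local machineset_image="$1" aleph_build="$2"
  [[ "$(gcp_image_to_build "$machineset_image")" == "$aleph_build" ]]
}
```

For example, `boot_image_matches projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64 415.92.202311241643-0` succeeds. The same call with the 4.14 image name fails.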
/unhold
@djoshy: This pull request references Jira Issue OCPBUGS-25725, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/test e2e-hypershift
/lgtm
/approve
/override ci/prow/e2e-gcp-op-single-node
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cdoern, djoshy
@cdoern: Overrode contexts on behalf of cdoern: ci/prow/e2e-gcp-op-single-node
/test e2e-hypershift
@djoshy: The following test failed, say
Merged commit 8c06841 into openshift:master.
@djoshy: Jira Issue OCPBUGS-25725: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-25725 has been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER] This PR has been included in build openshift-proxy-pull-test-container-v4.16.0-202401160407.p0.g8c06841.assembly.stream for distgit openshift-proxy-pull-test.
Fixes scale-up issue found here: #4083 (comment)
This should only merge after #4083 merges.
This PR changes how the MCO determines the architecture of a machineset, switching to this method. Originally, I mapped the machineset to a node to determine its architecture. However, for machinesets with no nodes scaled up yet, this caused an error, and the very first scale-up took place with the older boot image. With this fix, the MCO only needs a label on the machineset to determine the architecture. If the label is not present, the MCO defaults to the control plane architecture.
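The lookup described above can be sketched in bash as follows. This is not the MCO's Go implementation. It only illustrates the decision: use the architecture carried on the machineset if present, otherwise fall back to the control plane architecture.

The sketch rests on several assumptions:
- The architecture appears as a `kubernetes.io/arch=<value>` entry in a comma-separated labels string. On OpenShift this may be the `capacity.cluster-autoscaler.kubernetes.io/labels` annotation, but that key is not confirmed here.
- Kubernetes arch names map to the RHCOS names seen in the controller logs above (amd64 to x86_64, arm64 to aarch64).
- The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the decision described in this PR, not the MCO's Go code.
# Assumption: the machineset exposes node labels as a comma-separated
# "key=value" string, e.g. "kubernetes.io/arch=arm64,foo=bar".

# Map a Kubernetes/Go architecture name to the RHCOS name used for boot images.
k8s_to_rhcos_arch() {
  case "$1" in
    amd64) echo x86_64 ;;
    arm64) echo aarch64 ;;
    ppc64le | s390x) echo "$1" ;;
    *) return 1 ;;  # unknown architecture: report failure instead of guessing
  esac
}

# machineset_arch LABELS CONTROL_PLANE_ARCH
# Prints the RHCOS architecture to use for a machineset's boot image.
machineset_arch() {
  local labels="$1" control_plane_arch="$2" kv
  local -a pairs
  IFS=',' read -r -a pairs <<< "$labels"
  for kv in "${pairs[@]}"; do
    if [[ "$kv" == kubernetes.io/arch=* ]]; then
      k8s_to_rhcos_arch "${kv#kubernetes.io/arch=}"
      return  # propagate the mapping's exit status
    fi
  done
  # No arch label on the machineset: default to the control plane architecture.
  k8s_to_rhcos_arch "$control_plane_arch"
}
```

For example, `machineset_arch 'kubernetes.io/arch=arm64' amd64` prints `aarch64`, and `machineset_arch '' amd64` falls back to `x86_64`. The sketch fails on an unrecognized architecture rather than silently falling back, so a machineset is never matched to a boot image for the wrong architecture. This PR does not say whether the MCO handles that case the same way.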