OCPBUGS-25725: ManagedBootImages: failed to fetch architecture type of machineset no linked machine found #4088

Merged

Contributor

@djoshy djoshy commented Jan 2, 2024

Fixes scale-up issue found here: #4083 (comment)
This should only merge after #4083 merges.

This PR changes the way the MCO finds the architecture of a machineset to a label-based lookup. Originally, I mapped the machineset to a node to determine it. However, for machinesets that have no nodes scaled up yet, this caused an error, and the very first scale-up took place with the older boot image. This fix only requires a label on the machineset to determine the architecture; if the label is not present, the MCO defaults to the control plane architecture.
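
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the label key `kubernetes.io/arch` and the helper name `archFromLabels` are assumptions made for the example, and the actual key the MCO reads may differ.

```go
package main

import "fmt"

// archLabelKey is a hypothetical label key, assumed only for this sketch.
const archLabelKey = "kubernetes.io/arch"

// archFromLabels returns the architecture recorded on a machineset's labels.
// If the label is missing or empty, it falls back to the control plane
// architecture, so a machineset with zero nodes can still be resolved.
func archFromLabels(labels map[string]string, controlPlaneArch string) string {
	if arch, ok := labels[archLabelKey]; ok && arch != "" {
		return arch
	}
	return controlPlaneArch
}

func main() {
	// Labeled machineset: the label wins.
	fmt.Println(archFromLabels(map[string]string{archLabelKey: "arm64"}, "amd64"))
	// Unlabeled machineset: default to the control plane architecture.
	fmt.Println(archFromLabels(nil, "amd64"))
}
```

Because the lookup never touches a node, it behaves the same for a machineset with zero replicas as for one that is already scaled up.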

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 2, 2024
Contributor

openshift-ci bot commented Jan 2, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jan 2, 2024
@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-25725, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Fixes scale-up issue found here: #4083 (comment)
This should only merge after #4083 merges

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from sergiordlr January 2, 2024 16:46
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 2, 2024
@rioliu-rh

/cc @rioliu-rh

@openshift-ci openshift-ci bot requested a review from rioliu-rh January 3, 2024 09:18
@djoshy djoshy force-pushed the manage-boot-images-scale-up-bug branch from c08d343 to 98434a2 Compare January 8, 2024 17:23
@djoshy
Contributor Author

djoshy commented Jan 8, 2024

/unhold

Unholding as #4083 has merged

@djoshy djoshy marked this pull request as ready for review January 8, 2024 17:24
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2024
@djoshy
Contributor Author

djoshy commented Jan 8, 2024

/test ci/prow/e2e-hypershift

Contributor

openshift-ci bot commented Jan 8, 2024

@djoshy: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test 4.12-upgrade-from-stable-4.11-images
  • /test cluster-bootimages
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-upgrade
  • /test e2e-gcp-op
  • /test e2e-gcp-op-single-node
  • /test e2e-hypershift
  • /test images
  • /test okd-scos-images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
  • /test bootstrap-unit
  • /test e2e-alibabacloud-ovn
  • /test e2e-aws-disruptive
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-fips-op
  • /test e2e-aws-ovn-upgrade-out-of-change
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-proxy
  • /test e2e-aws-serial
  • /test e2e-aws-single-node
  • /test e2e-aws-upgrade-single-node
  • /test e2e-aws-workers-rhel8
  • /test e2e-azure
  • /test e2e-azure-ovn-upgrade
  • /test e2e-azure-ovn-upgrade-out-of-change
  • /test e2e-azure-upgrade
  • /test e2e-gcp-op-layering
  • /test e2e-gcp-ovn-rt-upgrade
  • /test e2e-gcp-rt
  • /test e2e-gcp-rt-op
  • /test e2e-gcp-single-node
  • /test e2e-gcp-upgrade
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-externallb
  • /test e2e-openstack-parallel
  • /test e2e-ovirt
  • /test e2e-ovirt-upgrade
  • /test e2e-ovn-step-registry
  • /test e2e-vsphere
  • /test e2e-vsphere-upgrade
  • /test e2e-vsphere-upi
  • /test e2e-vsphere-upi-zones
  • /test e2e-vsphere-zones
  • /test okd-e2e-aws
  • /test okd-e2e-gcp-op
  • /test okd-e2e-upgrade
  • /test okd-e2e-vsphere
  • /test okd-images
  • /test okd-scos-e2e-aws-ovn
  • /test okd-scos-e2e-gcp-op
  • /test okd-scos-e2e-gcp-ovn-upgrade
  • /test okd-scos-e2e-vsphere

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-machine-config-operator-master-bootstrap-unit
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade-out-of-change
  • pull-ci-openshift-machine-config-operator-master-e2e-azure-ovn-upgrade-out-of-change
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-okd-scos-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify

@djoshy
Contributor Author

djoshy commented Jan 8, 2024

/test e2e-hypershift

@rioliu-rh

/hold for QE verification

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2024
@rioliu-rh

rioliu-rh commented Jan 9, 2024

set up a GCP cluster with version 4.14.8
check the boot image in the MCD log

$ oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-9vz55 | grep 'CoreOS aleph version'
I0109 03:47:17.496869    1511 coreos.go:54] CoreOS aleph version: mtime=2023-10-21 04:41:51.929 +0000 UTC build=414.92.202310210434-0 imgid=rhcos-414.92.202310210434-0-qemu.x86_64.qcow2

take a pre-upgrade snapshot of the coreos-bootimages configmap

$ oc get cm coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_pre_upgrade.yaml

upgrade the cluster to the CI image

$ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-pfzgxlb/release:latest --force --allow-explicit-upgrade

$ oc get clusterversion version -o yaml | yq '.status.history[]|.version,.state'
"4.15.0-0.ci.test-2024-01-09-032426-ci-ln-pfzgxlb-latest"
"Completed"
"4.14.8"
"Completed"

take a post-upgrade snapshot of the coreos-bootimages configmap

$ oc get cm coreos-bootimages -n openshift-machine-config-operator -o yaml > /tmp/coreos-bootimages_post_upgrade.yaml

diff the pre- and post-upgrade snapshots

$ diff /tmp/coreos-bootimages_pre_upgrade.yaml /tmp/coreos-bootimages_post_upgrade.yaml | egrep 'MCO|gcp'
>   MCOReleaseImageVersion: 4.15.0-0.ci.test-2024-01-09-032426-ci-ln-pfzgxlb-latest
>   MCOVersionHash: 53d9c7eecacc24e70d449e823e500f7cec356d7c
<                     "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.14-9.2/builds/414.92.202310210434-0/aarch64/rhcos-414.92.202310210434-0-gcp.aarch64.tar.gz",
>                     "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.15-9.2/builds/415.92.202311241643-0/aarch64/rhcos-415.92.202311241643-0-gcp.aarch64.tar.gz",
<               "name": "rhcos-414-92-202310210434-0-gcp-aarch64"
>               "name": "rhcos-415-92-202311241643-0-gcp-aarch64"
<                     "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.14-9.2/builds/414.92.202310210434-0/x86_64/rhcos-414.92.202310210434-0-gcp.x86_64.tar.gz",
>                     "location": "https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.15-9.2/builds/415.92.202311241643-0/x86_64/rhcos-415.92.202311241643-0-gcp.x86_64.tar.gz",
<               "name": "rhcos-414-92-202310210434-0-gcp-x86-64"
>               "name": "rhcos-415-92-202311241643-0-gcp-x86-64"
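
The diff output suggests a naming pattern: the GCP image name looks like the RHCOS build ID and architecture with dots and underscores replaced by hyphens. The sketch below reproduces that observed pattern only; `gcpImageName` is a hypothetical helper, and the pattern is inferred from this output rather than taken from any documented rule.

```go
package main

import (
	"fmt"
	"strings"
)

// gcpImageName rebuilds the image name seen in the configmap diff from a
// build ID and an architecture. The mapping is inferred from the output
// above and is an assumption of this sketch.
func gcpImageName(build, arch string) string {
	// Dots and underscores both appear as hyphens in the image names.
	r := strings.NewReplacer(".", "-", "_", "-")
	return "rhcos-" + r.Replace(build) + "-gcp-" + r.Replace(arch)
}

func main() {
	fmt.Println(gcpImageName("415.92.202311241643-0", "x86_64"))
	fmt.Println(gcpImageName("415.92.202311241643-0", "aarch64"))
}
```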

check the featuregate state (ManagedBootImages is still listed under disabled)

$ oc get featuregate/cluster -o yaml | yq -y '.status.featureGates[]|select(.version=="4.15.0-0.ci.test-2024-01-09-032426-ci-ln-pfzgxlb-latest")|.disabled' | grep ManagedBootImages
- name: ManagedBootImages

enable the ManagedBootImages featuregate

$ oc apply -f ~/mco_test/mc/featuregate_techpreview.yaml
Warning: resource featuregates/cluster is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by oc apply. oc apply should only be used on resources created declaratively by either oc create --save-config or oc apply. The missing annotation will be patched automatically.
featuregate.config.openshift.io/cluster configured

check the MCC log for machineset patching

I0109 06:27:52.038767       1 machine_set_boot_image_controller.go:139] "FeatureGates changed" enabled=["AdminNetworkPolicy","AlibabaPlatform","AutomatedEtcdBackup","AzureWorkloadIdentity","BuildCSIVolumes","CSIDriverSharedResource","CloudDualStackNodeIPs","DNSNameResolver","DynamicResourceAllocation","ExternalCloudProvider","ExternalCloudProviderAzure","ExternalCloudProviderExternal","ExternalCloudProviderGCP","GCPClusterHostedDNS","GCPLabelsTags","GatewayAPI","InsightsConfigAPI","InstallAlternateInfrastructureAWS","MachineAPIProviderOpenStack","MachineConfigNodes","ManagedBootImages","MaxUnavailableStatefulSet","MetricsServer","MixedCPUsAllocation","NetworkLiveMigration","NodeSwap","OnClusterBuild","OpenShiftPodSecurityAdmission","PrivateHostedZoneAWS","RouteExternalCertificate","SignatureStores","SigstoreImageVerification","VSphereControlPlaneMachineSet","VSphereStaticIPs","ValidatingAdmissionPolicy"] disabled=["ClusterAPIInstall","DisableKubeletCloudCredentialProviders","EventedPLEG","MachineAPIOperatorDisableMachineHealthCheckController"]
I0109 06:27:52.038820       1 machine_set_boot_image_controller.go:152] Trigger a sync as this feature was turned on
I0109 06:27:52.038888       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.043208       1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.043223       1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.043289       1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.048853       1 event.go:298] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-machine-config-operator", Name:"openshift-machine-config-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesModified' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AdminNetworkPolicy", "AlibabaPlatform", "AutomatedEtcdBackup", "AzureWorkloadIdentity", "BuildCSIVolumes", "CSIDriverSharedResource", "CloudDualStackNodeIPs", "DNSNameResolver", "DynamicResourceAllocation", "ExternalCloudProvider", "ExternalCloudProviderAzure", "ExternalCloudProviderExternal", "ExternalCloudProviderGCP", "GCPClusterHostedDNS", "GCPLabelsTags", "GatewayAPI", "InsightsConfigAPI", "InstallAlternateInfrastructureAWS", "MachineAPIProviderOpenStack", "MachineConfigNodes", "ManagedBootImages", "MaxUnavailableStatefulSet", "MetricsServer", "MixedCPUsAllocation", "NetworkLiveMigration", "NodeSwap", "OnClusterBuild", "OpenShiftPodSecurityAdmission", "PrivateHostedZoneAWS", "RouteExternalCertificate", "SignatureStores", "SigstoreImageVerification", "VSphereControlPlaneMachineSet", "VSphereStaticIPs", "ValidatingAdmissionPolicy"}, Disabled:[]v1.FeatureGateName{"ClusterAPIInstall", "DisableKubeletCloudCredentialProviders", "EventedPLEG", "MachineAPIOperatorDisableMachineHealthCheckController"}}
I0109 06:27:52.048968       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.050380       1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.050450       1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.050543       1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.136183       1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-b updated, reconciling all machinesets
I0109 06:27:52.142853       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.144149       1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.144240       1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.144352       1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.155202       1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-a updated, reconciling all machinesets
I0109 06:27:52.155890       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-f on GCP, with arch x86_64
I0109 06:27:52.157134       1 machine_set_boot_image_controller.go:554] New target boot image: projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64
I0109 06:27:52.157806       1 machine_set_boot_image_controller.go:555] Current image: projects/rhcos-cloud/global/images/rhcos-414-92-202310210434-0-gcp-x86-64
I0109 06:27:52.157938       1 machine_set_boot_image_controller.go:395] Patching machineset rioliu-0109a-4dmvd-worker-f
I0109 06:27:52.157766       1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-b updated, reconciling all machinesets
I0109 06:27:52.186478       1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-c updated, reconciling all machinesets
I0109 06:27:52.202363       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.219834       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.219885       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.221123       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.221173       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.225658       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.225697       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.226866       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.250917       1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-c updated, reconciling all machinesets
I0109 06:27:52.250970       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.253730       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.253829       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.254959       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.255047       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.257826       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-f on GCP, with arch x86_64
I0109 06:27:52.259034       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-f
I0109 06:27:52.259623       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.271580       1 machine_set_boot_image_controller.go:244] MachineSet rioliu-0109a-4dmvd-worker-f updated, reconciling all machinesets
I0109 06:27:52.272578       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-a on GCP, with arch x86_64
I0109 06:27:52.272684       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-b on GCP, with arch x86_64
I0109 06:27:52.274476       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-a
I0109 06:27:52.274514       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-c on GCP, with arch x86_64
I0109 06:27:52.275255       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-b
I0109 06:27:52.275340       1 machine_set_boot_image_controller.go:529] Reconciling machineset rioliu-0109a-4dmvd-worker-f on GCP, with arch x86_64
I0109 06:27:52.276323       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-c
I0109 06:27:52.277068       1 machine_set_boot_image_controller.go:398] No patching required for machineset rioliu-0109a-4dmvd-worker-f
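
The log shows a simple decision per machineset: compare the current image with the new target and patch only when they differ, which is why the later passes report "No patching required". A rough sketch of that comparison follows; `reconcileBootImage` is a hypothetical name and the real controller logic is more involved.

```go
package main

import "fmt"

// gcpImagePrefix matches the image paths printed in the log above.
const gcpImagePrefix = "projects/rhcos-cloud/global/images/"

// reconcileBootImage returns the target image path and whether the
// machineset needs a patch. Hypothetical helper for illustration only.
func reconcileBootImage(current, targetName string) (string, bool) {
	target := gcpImagePrefix + targetName
	return target, current != target
}

func main() {
	current := gcpImagePrefix + "rhcos-414-92-202310210434-0-gcp-x86-64"
	targetName := "rhcos-415-92-202311241643-0-gcp-x86-64"

	// First pass: images differ, so a patch is required.
	target, patch := reconcileBootImage(current, targetName)
	fmt.Println(target, patch)

	// Second pass, after patching: nothing left to do.
	_, patch = reconcileBootImage(target, targetName)
	fmt.Println(patch)
}
```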

find the 0-replica machineset

$ machineset
NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
rioliu-0109a-4dmvd-worker-a   1         1         1       1           3h3m
rioliu-0109a-4dmvd-worker-b   1         1         1       1           3h3m
rioliu-0109a-4dmvd-worker-c   1         1         1       1           3h3m
rioliu-0109a-4dmvd-worker-f   0         0                             3h3m

check whether the 0-replica machineset is patched with the new boot image

$ machineset rioliu-0109a-4dmvd-worker-f -o yaml | yq '.spec.template.spec.providerSpec.value.disks'
[
  {
    "autoDelete": true,
    "boot": true,
    "image": "projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64",
    "labels": {},
    "sizeGb": 128,
    "type": "pd-ssd"
  }
]
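
The same check can be automated by decoding the `disks` array and reading the image of the boot disk. The sketch below assumes only the fields visible in the output above; the `disk` type and `bootImage` helper are hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// disk holds only the fields this check needs; other fields are ignored.
type disk struct {
	Boot  bool   `json:"boot"`
	Image string `json:"image"`
}

// bootImage returns the image of the first disk marked as the boot disk.
func bootImage(disksJSON []byte) (string, error) {
	var disks []disk
	if err := json.Unmarshal(disksJSON, &disks); err != nil {
		return "", err
	}
	for _, d := range disks {
		if d.Boot {
			return d.Image, nil
		}
	}
	return "", errors.New("no boot disk found")
}

func main() {
	raw := []byte(`[{"autoDelete": true, "boot": true,
		"image": "projects/rhcos-cloud/global/images/rhcos-415-92-202311241643-0-gcp-x86-64",
		"labels": {}, "sizeGb": 128, "type": "pd-ssd"}]`)
	img, err := bootImage(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(img)
}
```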

the boot image is patched with the new image and the arch is correct.
scale up this machineset to see whether the new machine can join the cluster

$ oc scale --replicas=1 machineset.machine.openshift.io/rioliu-0109a-4dmvd-worker-f -n openshift-machine-api
machineset.machine.openshift.io/rioliu-0109a-4dmvd-worker-f scaled

$ machineset rioliu-0109a-4dmvd-worker-f
NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
rioliu-0109a-4dmvd-worker-f   1         1         1       1           3h19m

check the boot image on the new node

$ oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-6s8tj | grep -A6 'CoreOS'
I0109 06:51:49.577306    1499 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
{
   "build": "415.92.202311241643-0",
   "imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
   "ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
   "ref": ""
}
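
To confirm the new node booted from the expected build without reading the log by eye, the JSON object printed after the aleph line can be decoded and compared. This is a sketch under the assumption that the object has the shape shown above; `aleph` and `bootedBuild` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aleph mirrors the fields of the JSON object in the MCD log that this
// check uses; the remaining fields are ignored.
type aleph struct {
	Build string `json:"build"`
	ImgID string `json:"imgid"`
}

// bootedBuild extracts the build ID the node was originally booted from.
func bootedBuild(raw []byte) (string, error) {
	var a aleph
	if err := json.Unmarshal(raw, &a); err != nil {
		return "", err
	}
	return a.Build, nil
}

func main() {
	raw := []byte(`{
		"build": "415.92.202311241643-0",
		"imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
		"ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
		"ref": ""
	}`)
	build, err := bootedBuild(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The expected value is the 4.15 build from the configmap diff.
	fmt.Println(build == "415.92.202311241643-0")
}
```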

@rioliu-rh

/unhold
/label qe-approved

@openshift-ci openshift-ci bot added qe-approved Signifies that QE has signed off on this PR and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jan 9, 2024
@openshift-ci-robot
Contributor

@djoshy: This pull request references Jira Issue OCPBUGS-25725, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

In response to this:

Fixes scale-up issue found here: #4083 (comment)
This should only merge after #4083 merges

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@djoshy
Contributor Author

djoshy commented Jan 9, 2024

/test e2e-hypershift

Contributor

@cdoern cdoern left a comment

/lgtm
/approve

@cdoern
Contributor

cdoern commented Jan 15, 2024

/override ci/prow/e2e-gcp-op-single-node

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2024
Contributor

openshift-ci bot commented Jan 15, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, djoshy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented Jan 15, 2024

@cdoern: Overrode contexts on behalf of cdoern: ci/prow/e2e-gcp-op-single-node

@djoshy
Contributor Author

djoshy commented Jan 15, 2024

/test e2e-hypershift

Contributor

openshift-ci bot commented Jan 15, 2024

@djoshy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit    Details   Required   Rerun command
ci/prow/e2e-gcp-op-layering   98434a2   link      false      /test e2e-gcp-op-layering

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 1fe4220 and 2 for PR HEAD 98434a2 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 8c06841 into openshift:master Jan 16, 2024
13 of 14 checks passed
@openshift-ci-robot
Contributor

@djoshy: Jira Issue OCPBUGS-25725: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-25725 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-proxy-pull-test-container-v4.16.0-202401160407.p0.g8c06841.assembly.stream for distgit openshift-proxy-pull-test.
All builds following this will include this PR.

@djoshy djoshy deleted the manage-boot-images-scale-up-bug branch February 12, 2024 20:13
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
5 participants