
OCPBUGS-24416: Sync MCN with node creation and deletion #4062

Merged
merged 1 commit into openshift:master on Jan 10, 2024

Conversation

cdoern
Contributor

@cdoern cdoern commented Dec 7, 2023

The MCN object was not reacting to node deletions. The MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending statuses.
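The lifecycle handling described above can be sketched as follows. This is a minimal sketch with simplified stand-in types (`Node`, `shouldCreateMCN`, `onNodeDelete` are illustrative names, not the MCO's actual API, which works through corev1.Node informers and the MachineConfigNode CRD):

```go
package main

import "fmt"

// Simplified stand-ins for the real corev1.Node and MachineConfigNode
// types; names and fields here are illustrative, not the MCO's API.
type Node struct {
	Name  string
	Phase string // e.g. "Pending", "Provisioning", "Running"
	Ready bool
}

// shouldCreateMCN gates MachineConfigNode creation: per this PR, an
// MCN is only created once the node has left the Provisioning and
// Pending states and reports Ready.
func shouldCreateMCN(n Node) bool {
	if n.Phase == "Pending" || n.Phase == "Provisioning" {
		return false
	}
	return n.Ready
}

// onNodeDelete mirrors the other half of the fix: when a node is
// deleted, the MCN that shares its name is deleted as well.
func onNodeDelete(mcns map[string]struct{}, n Node) {
	delete(mcns, n.Name)
}

func main() {
	n := Node{Name: "ip-10-0-0-241.ec2.internal", Phase: "Running", Ready: true}
	fmt.Println(shouldCreateMCN(n)) // true: past Provisioning/Pending and Ready

	mcns := map[string]struct{}{n.Name: {}}
	onNodeDelete(mcns, n)
	fmt.Println(len(mcns)) // 0: MCN removed with the node
}
```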

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 7, 2023
@openshift-ci-robot
Contributor

@cdoern: This pull request references Jira Issue OCPBUGS-24416, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending Status.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rioliu-rh

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2023
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2023
@cdoern
Contributor Author

cdoern commented Dec 7, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 7, 2023
@openshift-ci-robot
Contributor

@cdoern: This pull request references Jira Issue OCPBUGS-24416, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

In response to this:

/jira refresh


@inesqyx
Contributor

inesqyx commented Dec 7, 2023

/test unit

@rioliu-rh

rioliu-rh commented Dec 8, 2023

Built a cluster with this PR.
Enabled feature gate: MachineConfigNodes

$ oc get featuregate/cluster -o yaml | yq -y '.spec'
featureSet: TechPreviewNoUpgrade

$ oc get featuregate/cluster -o yaml | yq -y '.status.featureGates[].enabled' | grep MachineConfigNodes
- name: MachineConfigNodes

Scaled down the MachineSet to check whether the corresponding MCN is removed as well.

$ oc scale --replicas=1 machineset/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a -n openshift-machine-api
machineset.machine.openshift.io/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a scaled

$ oc get machine -n openshift-machine-api | grep Deleting
ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a-9rwbf   Deleting   m6a.xlarge   us-east-1   us-east-1a   62m

$ oc get node | grep worker
NAME                          STATUS                     ROLES    AGE   VERSION
ip-10-0-19-203.ec2.internal   Ready,SchedulingDisabled   worker   60m   v1.28.4+9f68be2
ip-10-0-49-186.ec2.internal   Ready                      worker   55m   v1.28.4+9f68be2
ip-10-0-73-125.ec2.internal   Ready                      worker   55m   v1.28.4+9f68be2

# node ip-10-0-19-203.ec2.internal is deleted
$ oc get node/ip-10-0-19-203.ec2.internal
Error from server (NotFound): nodes "ip-10-0-19-203.ec2.internal" not found

# corresponding mcn is deleted as well
$ oc get machineconfignode ip-10-0-19-203.ec2.internal
Error from server (NotFound): machineconfignodes.machineconfiguration.openshift.io "ip-10-0-19-203.ec2.internal" not found

Scaled up the MachineSet to check whether a new MCN is created when the new node is ready.

$ oc scale --replicas=2 machineset/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a -n openshift-machine-api
machineset.machine.openshift.io/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a scaled

$ oc get machine -n openshift-machine-api | grep Provisioning
ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a-sjcrc   Provisioning   m6a.xlarge   us-east-1   us-east-1a   29s

# when the node is provisioning/provisioned, it's not ready, mcn does not exist
...
# when node is ready, mcn is created
$ oc get machine/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a-sjcrc -n openshift-machine-api  -o yaml | yq -y '.status.nodeRef.name'
ip-10-0-0-241.ec2.internal

$ oc get node/ip-10-0-0-241.ec2.internal
NAME                         STATUS   ROLES    AGE     VERSION
ip-10-0-0-241.ec2.internal   Ready    worker   2m11s   v1.28.4+9f68be2

$ oc get machineconfignode ip-10-0-0-241.ec2.internal
NAME                         UPDATED   UPDATEPREPARED   UPDATEEXECUTED   UPDATEPOSTACTIONCOMPLETE   UPDATECOMPLETE   RESUMED
ip-10-0-0-241.ec2.internal   True      False            False            False                      False            False

Issue: the value of .spec.configVersion.desired in the new MCN is incorrect; it is NotYetSet:

$ oc get machineconfignode ip-10-0-0-241.ec2.internal -o yaml | yq -y '.spec'
configVersion:
  desired: NotYetSet
node:
  name: ip-10-0-0-241.ec2.internal
pool:
  name: worker

$ oc get machineconfignode ip-10-0-0-241.ec2.internal -o yaml | yq -y '.status.configVersion'
current: rendered-worker-4577d3da1cbbf95a97e61bdfcf052939
desired: rendered-worker-4577d3da1cbbf95a97e61bdfcf052939

@cdoern

@cdoern
Contributor Author

cdoern commented Dec 8, 2023

@rioliu-rh are there any errors in the daemon logs about updating the MCN objects?

@rioliu-rh

@rioliu-rh are there any errors in the daemon logs about updating the MCN objects?

I can see the error below in the daemon log of the new node:

E1211 07:14:58.343469    2267 upgrade_monitor.go:192] Error applying MCN status: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure MachineConfigNode.machineconfiguration.openshift.io "ip-10-0-11-120.us-west-2.compute.internal" is invalid: configVersion.current: Invalid value: "": configVersion.current in body should match '^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$' Invalid 0xc0011927e0 422}})
E1211 07:14:58.343551    2267 daemon.go:749] Error making MCN for Resumed true: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure MachineConfigNode.machineconfiguration.openshift.io "ip-10-0-11-120.us-west-2.compute.internal" is invalid: configVersion.current: Invalid value: "": configVersion.current in body should match '^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$' Invalid 0xc0011927e0 422}})
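The 422 above comes from the CRD's name-pattern validation: an empty `configVersion.current` can never match a DNS-1123-subdomain-style name. A quick standalone check, using the pattern copied from the error message (the surrounding code is illustrative, only the regex is from the log):

```go
package main

import (
	"fmt"
	"regexp"
)

// Name pattern copied from the validation error above (a DNS-1123
// subdomain-style name). An empty configVersion.current cannot match
// it, which is why the server rejected the apply with a 422.
var configNameRE = regexp.MustCompile(`^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$`)

func main() {
	fmt.Println(configNameRE.MatchString("")) // false: empty value fails validation
	fmt.Println(configNameRE.MatchString("rendered-worker-81008544a40254ec4bace9b1b61a1724")) // true
}
```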

Eventually, the value of spec.configVersion.desired does get updated:

$ mcn ip-10-0-11-120.us-west-2.compute.internal -o yaml | yq -y '.spec.configVersion'
desired: rendered-worker-81008544a40254ec4bace9b1b61a1724

$ mcn ip-10-0-11-120.us-west-2.compute.internal -o yaml | yq -y '.status.configVersion'
current: rendered-worker-81008544a40254ec4bace9b1b61a1724
desired: rendered-worker-81008544a40254ec4bace9b1b61a1724

xref: mcn is an alias for oc get machineconfignodes

@rioliu-rh

@cdoern WDYT about the error message above? It seems unrelated to spec.configVersion.desired.

@cdoern
Contributor Author

cdoern commented Dec 11, 2023

@rioliu-rh I think I know the cause here. When applying the patch, I am accidentally specifying currentConfig at all times. I should only do this if it exists in the node annotations.
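The fix described here can be sketched like this: only include `configVersion.current` when the node actually carries the currentConfig annotation. The `buildConfigVersion` helper and its map-based result are hypothetical, not the actual MCO patch code; only the annotation key is real:

```go
package main

import "fmt"

// Hypothetical patch-builder illustrating the fix: only set
// configVersion.current on the MCN when the node actually carries the
// currentConfig annotation (freshly provisioned nodes do not yet);
// an empty value would fail the CRD's name-pattern validation.
const currentConfigAnnotation = "machineconfiguration.openshift.io/currentConfig"

func buildConfigVersion(nodeAnnotations map[string]string, desired string) map[string]string {
	cv := map[string]string{"desired": desired}
	if cur, ok := nodeAnnotations[currentConfigAnnotation]; ok && cur != "" {
		cv["current"] = cur // only when the annotation exists and is non-empty
	}
	return cv
}

func main() {
	// New node: annotation not yet written by the daemon.
	fmt.Println(buildConfigVersion(map[string]string{}, "rendered-worker-abc"))
	// Steady state: annotation present.
	fmt.Println(buildConfigVersion(
		map[string]string{currentConfigAnnotation: "rendered-worker-abc"},
		"rendered-worker-abc"))
}
```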

@rioliu-rh

Built a new image to verify.
Scaled up the MachineSet to provision a new node. When the node was ready, checked the daemon log: no MCN-related errors found.

2023-12-12T03:15:43.187171822+00:00 stderr F I1212 03:15:43.187110    2269 start.go:62] Version: machine-config-daemon-4.6.0-202006240615.p0-2473-gcdbc171e-dirty (cdbc171ea6cfccf4741582364be8324982d70ec4)
2023-12-12T03:15:43.187311395+00:00 stderr F I1212 03:15:43.187269    2269 update.go:2316] Running: mount --rbind /run/secrets /rootfs/run/secrets
2023-12-12T03:15:43.217279958+00:00 stderr F I1212 03:15:43.217253    2269 update.go:2316] Running: mount --rbind /usr/bin /rootfs/run/machine-config-daemon-bin
2023-12-12T03:15:43.221994703+00:00 stderr F I1212 03:15:43.221974    2269 daemon.go:472] container is rhel8, target is rhel9
2023-12-12T03:15:43.507893069+00:00 stderr F I1212 03:15:43.507848    2269 daemon.go:540] Invoking re-exec /run/bin/machine-config-daemon
2023-12-12T03:15:43.533152916+00:00 stderr F I1212 03:15:43.533112    2269 start.go:62] Version: machine-config-daemon-4.6.0-202006240615.p0-2473-gcdbc171e-dirty (cdbc171ea6cfccf4741582364be8324982d70ec4)
2023-12-12T03:15:43.533476664+00:00 stderr F I1212 03:15:43.533459    2269 update.go:2316] Running: systemctl daemon-reload
2023-12-12T03:15:44.025935958+00:00 stderr F I1212 03:15:44.025604    2269 rpm-ostree.go:88] Enabled workaround for bug 2111817
2023-12-12T03:15:44.027028554+00:00 stderr F I1212 03:15:44.027011    2269 rpm-ostree.go:263] Linking ostree authfile to /etc/mco/internal-registry-pull-secret.json
2023-12-12T03:15:44.190325845+00:00 stderr F I1212 03:15:44.190114    2269 daemon.go:280] Booted osImageURL: registry.build05.ci.openshift.org/ci-ln-5l3213b/stable@sha256:eec9959e717b7cf4a2e1b3e2ffee3928231723ff5ab471fae105ccfe49a9d5ef (415.92.202312082200-0) 1c03bcccf9dbd8257342a15dea7f285f9af85e0502bcf4be5526f97fd2fd1702
2023-12-12T03:15:44.191068003+00:00 stderr F I1212 03:15:44.191054    2269 start.go:125] overriding kubernetes api to https://api-int.ci-ln-5l3213b-76ef8.origin-ci-int-aws.dev.rhcloud.com:6443
2023-12-12T03:15:44.191664168+00:00 stderr F I1212 03:15:44.191645    2269 metrics.go:92] Registering Prometheus metrics
2023-12-12T03:15:44.191806581+00:00 stderr F I1212 03:15:44.191779    2269 metrics.go:99] Starting metrics listener on 127.0.0.1:8797
2023-12-12T03:15:44.216702160+00:00 stderr F W1212 03:15:44.216650    2269 controller_context.go:111] unable to get owner reference (falling back to namespace): replicasets.apps "machine-config-controller-757d56f64b" is forbidden: User "system:serviceaccount:openshift-machine-config-operator:machine-config-daemon" cannot get resource "replicasets" in API group "apps" in the namespace "openshift-machine-config-operator"
2023-12-12T03:15:44.218339569+00:00 stderr F I1212 03:15:44.218316    2269 simple_featuregate_reader.go:171] Starting feature-gate-detector
2023-12-12T03:15:44.219217641+00:00 stderr F I1212 03:15:44.219197    2269 writer.go:88] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
2023-12-12T03:15:44.223478975+00:00 stderr F I1212 03:15:44.223440    2269 start.go:199] FeatureGates initialized: knownFeatureGates=[AdminNetworkPolicy AlibabaPlatform AutomatedEtcdBackup AzureWorkloadIdentity BuildCSIVolumes CSIDriverSharedResource CloudDualStackNodeIPs ClusterAPIInstall DNSNameResolver DisableKubeletCloudCredentialProviders DynamicResourceAllocation EventedPLEG ExternalCloudProvider ExternalCloudProviderAzure ExternalCloudProviderExternal ExternalCloudProviderGCP GCPClusterHostedDNS GCPLabelsTags GatewayAPI InsightsConfigAPI InstallAlternateInfrastructureAWS MachineAPIOperatorDisableMachineHealthCheckController MachineAPIProviderOpenStack MachineConfigNodes ManagedBootImages MaxUnavailableStatefulSet MetricsServer MixedCPUsAllocation NetworkLiveMigration NodeSwap OnClusterBuild OpenShiftPodSecurityAdmission PrivateHostedZoneAWS RouteExternalCertificate SignatureStores SigstoreImageVerification VSphereControlPlaneMachineSet VSphereStaticIPs ValidatingAdmissionPolicy]
2023-12-12T03:15:44.223511026+00:00 stderr F I1212 03:15:44.223489    2269 event.go:298] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-machine-config-operator", Name:"openshift-machine-config-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AdminNetworkPolicy", "AlibabaPlatform", "AutomatedEtcdBackup", "AzureWorkloadIdentity", "BuildCSIVolumes", "CSIDriverSharedResource", "CloudDualStackNodeIPs", "DNSNameResolver", "DynamicResourceAllocation", "ExternalCloudProvider", "ExternalCloudProviderAzure", "ExternalCloudProviderExternal", "ExternalCloudProviderGCP", "GCPClusterHostedDNS", "GCPLabelsTags", "GatewayAPI", "InsightsConfigAPI", "InstallAlternateInfrastructureAWS", "MachineAPIProviderOpenStack", "MachineConfigNodes", "ManagedBootImages", "MaxUnavailableStatefulSet", "MetricsServer", "MixedCPUsAllocation", "NetworkLiveMigration", "NodeSwap", "OnClusterBuild", "OpenShiftPodSecurityAdmission", "PrivateHostedZoneAWS", "RouteExternalCertificate", "SignatureStores", "SigstoreImageVerification", "VSphereControlPlaneMachineSet", "VSphereStaticIPs", "ValidatingAdmissionPolicy"}, Disabled:[]v1.FeatureGateName{"ClusterAPIInstall", "DisableKubeletCloudCredentialProviders", "EventedPLEG", "MachineAPIOperatorDisableMachineHealthCheckController"}}
2023-12-12T03:15:44.223561277+00:00 stderr F I1212 03:15:44.223509    2269 update.go:2331] Starting to manage node: ip-10-0-48-192.us-west-1.compute.internal
2023-12-12T03:15:44.228723313+00:00 stderr F I1212 03:15:44.228701    2269 rpm-ostree.go:308] Running captured: rpm-ostree status
2023-12-12T03:15:44.271416477+00:00 stderr F I1212 03:15:44.271369    2269 daemon.go:1598] State: idle
2023-12-12T03:15:44.271416477+00:00 stderr F Deployments:
2023-12-12T03:15:44.271416477+00:00 stderr F * ostree-unverified-registry:registry.build05.ci.openshift.org/ci-ln-5l3213b/stable@sha256:eec9959e717b7cf4a2e1b3e2ffee3928231723ff5ab471fae105ccfe49a9d5ef
2023-12-12T03:15:44.271416477+00:00 stderr F                    Digest: sha256:eec9959e717b7cf4a2e1b3e2ffee3928231723ff5ab471fae105ccfe49a9d5ef
2023-12-12T03:15:44.271416477+00:00 stderr F                   Version: 415.92.202312082200-0 (2023-12-12T03:14:45Z)
2023-12-12T03:15:44.271416477+00:00 stderr F
2023-12-12T03:15:44.271416477+00:00 stderr F   3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086
2023-12-12T03:15:44.271416477+00:00 stderr F                   Version: 415.92.202311241643-0 (2023-11-24T16:47:00Z)
2023-12-12T03:15:44.271965110+00:00 stderr F I1212 03:15:44.271933    2269 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
2023-12-12T03:15:44.271965110+00:00 stderr F {
2023-12-12T03:15:44.271965110+00:00 stderr F    "build": "415.92.202311241643-0",
2023-12-12T03:15:44.271965110+00:00 stderr F    "imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
2023-12-12T03:15:44.271965110+00:00 stderr F    "ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
2023-12-12T03:15:44.271965110+00:00 stderr F    "ref": ""
2023-12-12T03:15:44.271965110+00:00 stderr F }
2023-12-12T03:15:44.271994191+00:00 stderr F I1212 03:15:44.271986    2269 coreos.go:70] Ignition provisioning: time=2023-12-12T03:13:37Z
2023-12-12T03:15:44.271999481+00:00 stderr F I1212 03:15:44.271992    2269 rpm-ostree.go:308] Running captured: journalctl --list-boots
2023-12-12T03:15:44.277501965+00:00 stderr F I1212 03:15:44.277479    2269 daemon.go:1607] journalctl --list-boots:
2023-12-12T03:15:44.277501965+00:00 stderr F IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
2023-12-12T03:15:44.277501965+00:00 stderr F  -1 a2f9077437e941f7adf5646cb3787542 Tue 2023-12-12 03:13:27 UTC Tue 2023-12-12 03:14:55 UTC
2023-12-12T03:15:44.277501965+00:00 stderr F   0 dd3f4c054ea843c3b1b044df1ec2f20b Tue 2023-12-12 03:15:04 UTC Tue 2023-12-12 03:15:44 UTC
2023-12-12T03:15:44.277561097+00:00 stderr F I1212 03:15:44.277528    2269 rpm-ostree.go:308] Running captured: systemctl list-units --state=failed --no-legend
2023-12-12T03:15:44.282997509+00:00 stderr F I1212 03:15:44.282980    2269 daemon.go:1622] systemd service state: OK
2023-12-12T03:15:44.283032840+00:00 stderr F I1212 03:15:44.283026    2269 daemon.go:1243] Starting MachineConfigDaemon
2023-12-12T03:15:44.283075951+00:00 stderr F I1212 03:15:44.283069    2269 daemon.go:1250] Enabling Kubelet Healthz Monitor
2023-12-12T03:15:45.229467288+00:00 stderr F I1212 03:15:45.229418    2269 daemon.go:627] Node ip-10-0-48-192.us-west-1.compute.internal is not labeled node-role.kubernetes.io/master
2023-12-12T03:15:45.230142314+00:00 stderr F I1212 03:15:45.230121    2269 daemon.go:1770] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs1642409453 --cleanup
2023-12-12T03:15:45.254395447+00:00 stderr F [2023-12-12T03:15:45Z INFO  nmstatectl::persist_nic] /etc/systemd/network does not exist, no need to clean up
2023-12-12T03:15:45.254431118+00:00 stderr F [2023-12-12T03:15:45Z INFO  nmstatectl::persist_nic] /etc/systemd/network/.nmstate-persist.stamp does not exist, no need to clean up
2023-12-12T03:15:45.254472899+00:00 stderr P std::io::Error: No such file or directory (os error 2)
2023-12-12T03:15:45.254485449+00:00 stderr F
2023-12-12T03:15:45.254741085+00:00 stderr F I1212 03:15:45.254711    2269 daemon.go:1776] Cleanup error ignored: %!w(*exec.ExitError=&{0xc0004865d0 []})
2023-12-12T03:15:45.254814247+00:00 stderr F I1212 03:15:45.254782    2269 node.go:23] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-48-192.us-west-1.compute.internal: map[cloud.network.openshift.io/egress-ipconfig:[{"interface":"eni-069d2b1732d89ac68","ifaddr":{"ipv4":"10.0.0.0/18"},"capacity":{"ipv4":14,"ipv6":15}}] k8s.ovn.org/network-ids:{"default":"0"} k8s.ovn.org/node-gateway-router-lrp-ifaddr:{"ipv4":"100.64.0.5/16"} k8s.ovn.org/node-id:5 k8s.ovn.org/node-subnets:{"default":["10.131.0.0/23"]} k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.5/16"} machine.openshift.io/machine:openshift-machine-api/ci-ln-5l3213b-76ef8-bk2xn-worker-us-west-1a-zw6jj machineconfiguration.openshift.io/controlPlaneTopology:HighlyAvailable volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
2023-12-12T03:15:45.255397381+00:00 stderr F I1212 03:15:45.255385    2269 node.go:52] Setting initial node config: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:15:45.266515183+00:00 stderr F I1212 03:15:45.266496    2269 daemon.go:1495] In bootstrap mode
2023-12-12T03:15:45.273265018+00:00 stderr F I1212 03:15:45.273249    2269 daemon.go:1433] Previous boot ostree-finalize-staged.service appears successful
2023-12-12T03:15:45.273314049+00:00 stderr F I1212 03:15:45.273306    2269 daemon.go:1550] Current+desired config: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:15:45.273332649+00:00 stderr F I1212 03:15:45.273326    2269 daemon.go:1566] state: Done
2023-12-12T03:15:45.273379651+00:00 stderr F I1212 03:15:45.273359    2269 update.go:2316] Running: rpm-ostree cleanup -r
2023-12-12T03:16:03.262473862+00:00 stdout F Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
2023-12-12T03:16:22.702733769+00:00 stdout F Freed: 1.0 GB (pkgcache branches: 0)
2023-12-12T03:16:23.430963655+00:00 stderr F I1212 03:16:23.430915    2269 update.go:2331] No bootstrap pivot required; unlinking bootstrap node annotations
2023-12-12T03:16:23.435814396+00:00 stderr F I1212 03:16:23.435790    2269 daemon.go:1966] Validating against current config rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:16:23.436057532+00:00 stderr F I1212 03:16:23.436044    2269 daemon.go:1856] SSH key location update required. Moving SSH keys from "/home/core/.ssh/authorized_keys" to "/home/core/.ssh/authorized_keys.d/ignition".
2023-12-12T03:16:23.450432091+00:00 stderr F I1212 03:16:23.450406    2269 update.go:2037] updating SSH keys
2023-12-12T03:16:23.450669866+00:00 stderr F I1212 03:16:23.450654    2269 update.go:1938] Writing SSH keys to "/home/core/.ssh/authorized_keys.d/ignition"
2023-12-12T03:16:23.450713417+00:00 stderr F I1212 03:16:23.450705    2269 update.go:1903] Creating missing SSH key dir at /home/core/.ssh/authorized_keys.d
2023-12-12T03:16:24.644408181+00:00 stderr F I1212 03:16:24.644175    2269 rpm-ostree.go:308] Running captured: rpm-ostree kargs
2023-12-12T03:16:24.707736159+00:00 stderr F I1212 03:16:24.707702    2269 update.go:2331] Validated on-disk state
2023-12-12T03:16:24.762077812+00:00 stderr F I1212 03:16:24.762033    2269 daemon.go:2063] Completing update to target MachineConfig: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:16:34.813076702+00:00 stderr F I1212 03:16:34.813038    2269 update.go:2331] Update completed for config rendered-worker-a99da9545839a203c18a42bf6702b982 and node has been successfully uncordoned
2023-12-12T03:16:34.838661131+00:00 stderr F I1212 03:16:34.838341    2269 daemon.go:2088] In desired state MachineConfig: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:16:34.969651717+00:00 stderr F I1212 03:16:34.969615    2269 config_drift_monitor.go:246] Config Drift Monitor started
2023-12-12T03:16:44.402381613+00:00 stderr F I1212 03:16:44.402336    2269 certificate_writer.go:183] Certificate was synced from controllerconfig resourceVersion 27716

Now status.configVersion.current is missing on all the nodes:

$ mcn ip-10-0-48-192.us-west-1.compute.internal -o yaml | yq -y '.status.configVersion'
desired: rendered-worker-a99da9545839a203c18a42bf6702b982

@rioliu-rh

Now status.configVersion.current is missing on all the nodes:

$ mcn ip-10-0-48-192.us-west-1.compute.internal -o yaml | yq -y '.status.configVersion'
desired: rendered-worker-a99da9545839a203c18a42bf6702b982

@cdoern Can you take a look at this issue? Thanks

@cdoern
Contributor Author

cdoern commented Dec 15, 2023

@rioliu-rh it was a typo... sorry. Pushing changes now

@rioliu-rh

{  error occurred handling build machine-config-operator-amd64: the build machine-config-operator-amd64 failed after 2m20s with reason DockerBuildFailed: Dockerfile build strategy has failed.}

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4062/pull-ci-openshift-machine-config-operator-master-images/1735694006349729792

@rioliu-rh

/retest-required

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2023
@openshift-ci-robot
Contributor

@cdoern: This pull request references Jira Issue OCPBUGS-24416, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

In response to this:

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending Status.


@cdoern
Contributor Author

cdoern commented Dec 19, 2023

/retest-required

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only
create an MCN  once the node is out of the Provisioning and Pending Status

Signed-off-by: Charlie Doern <cdoern@redhat.com>
@cdoern
Contributor Author

cdoern commented Jan 5, 2024

/retest-required

Contributor

@yuqi-zhang yuqi-zhang left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 8, 2024
Contributor

openshift-ci bot commented Jan 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuqi-zhang
Contributor

/override e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 8, 2024

@yuqi-zhang: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • e2e-gcp-op-single-node

Only the following failed contexts/checkruns were expected:

  • ci/prow/bootstrap-unit
  • ci/prow/e2e-aws-ovn
  • ci/prow/e2e-aws-ovn-upgrade
  • ci/prow/e2e-gcp-op
  • ci/prow/e2e-gcp-op-layering
  • ci/prow/e2e-gcp-op-single-node
  • ci/prow/e2e-hypershift
  • ci/prow/images
  • ci/prow/okd-images
  • ci/prow/okd-scos-e2e-aws-ovn
  • ci/prow/okd-scos-images
  • ci/prow/unit
  • ci/prow/verify
  • pull-ci-openshift-machine-config-operator-master-bootstrap-unit
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-okd-scos-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override e2e-gcp-op-single-node


@yuqi-zhang
Contributor

/override ci/prow/e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 9, 2024

@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node


@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 7dae64a and 2 for PR HEAD c73baa5 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 84da3e5 and 1 for PR HEAD c73baa5 in total

@cdoern
Contributor Author

cdoern commented Jan 9, 2024

/override ci/prow/e2e-gcp-single-node

@cdoern
Contributor Author

cdoern commented Jan 9, 2024

/retest-required

Contributor

openshift-ci bot commented Jan 9, 2024

@cdoern: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • ci/prow/e2e-gcp-single-node

Only the following failed contexts/checkruns were expected:

  • ci/prow/bootstrap-unit
  • ci/prow/e2e-aws-ovn
  • ci/prow/e2e-aws-ovn-upgrade
  • ci/prow/e2e-gcp-op
  • ci/prow/e2e-gcp-op-layering
  • ci/prow/e2e-gcp-op-single-node
  • ci/prow/e2e-hypershift
  • ci/prow/images
  • ci/prow/okd-images
  • ci/prow/okd-scos-e2e-aws-ovn
  • ci/prow/okd-scos-images
  • ci/prow/unit
  • ci/prow/verify
  • pull-ci-openshift-machine-config-operator-master-bootstrap-unit
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-okd-scos-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override ci/prow/e2e-gcp-single-node


@cdoern
Contributor Author

cdoern commented Jan 9, 2024

/override ci/prow/e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 9, 2024

@cdoern: Overrode contexts on behalf of cdoern: ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node


@cdoern
Contributor Author

cdoern commented Jan 10, 2024

/retest-required

Contributor

openshift-ci bot commented Jan 10, 2024

@cdoern: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-layering c73baa5 link false /test e2e-gcp-op-layering

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cdoern
Contributor Author

cdoern commented Jan 10, 2024

/override ci/prow/e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 10, 2024

@cdoern: Overrode contexts on behalf of cdoern: ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node


@openshift-merge-bot openshift-merge-bot bot merged commit 8c4ca51 into openshift:master Jan 10, 2024
13 of 14 checks passed
@openshift-ci-robot
Contributor

@cdoern: Jira Issue OCPBUGS-24416: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-24416 has been moved to the MODIFIED state.

In response to this:

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending Status.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-proxy-pull-test-container-v4.16.0-202401110907.p0.g8c4ca51.assembly.stream for distgit openshift-proxy-pull-test.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR