
OCPBUGS-24416: Sync MCN with node creation and deletion #4062

Merged
merged 1 commit into openshift:master on Jan 10, 2024

Conversation

cdoern
Contributor

@cdoern cdoern commented Dec 7, 2023

The MCN object was not reacting to node deletions. The MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending statuses.
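The lifecycle handling described above can be sketched as follows. This is a minimal sketch with simplified stand-in types (`Node`, `shouldCreateMCN`, `onNodeDelete` are illustrative names, not the MCO's actual API, which works through corev1.Node informers and the MachineConfigNode CRD):

```go
package main

import "fmt"

// Simplified stand-ins for the real corev1.Node and MachineConfigNode
// types; names and fields here are illustrative, not the MCO's API.
type Node struct {
	Name  string
	Phase string // e.g. "Pending", "Provisioning", "Running"
	Ready bool
}

// shouldCreateMCN gates MachineConfigNode creation: per this PR, an
// MCN is only created once the node has left the Provisioning and
// Pending states and reports Ready.
func shouldCreateMCN(n Node) bool {
	if n.Phase == "Pending" || n.Phase == "Provisioning" {
		return false
	}
	return n.Ready
}

// onNodeDelete mirrors the other half of the fix: when a node is
// deleted, the MCN that shares its name is deleted as well.
func onNodeDelete(mcns map[string]struct{}, n Node) {
	delete(mcns, n.Name)
}

func main() {
	n := Node{Name: "ip-10-0-0-241.ec2.internal", Phase: "Running", Ready: true}
	fmt.Println(shouldCreateMCN(n)) // true: past Provisioning/Pending and Ready

	mcns := map[string]struct{}{n.Name: {}}
	onNodeDelete(mcns, n)
	fmt.Println(len(mcns)) // 0: MCN removed with the node
}
```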

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 7, 2023
@openshift-ci-robot
Contributor

@cdoern: This pull request references Jira Issue OCPBUGS-24416, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending Status.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rioliu-rh

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2023
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2023
@cdoern
Contributor Author

cdoern commented Dec 7, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 7, 2023
@openshift-ci-robot
Contributor

@cdoern: This pull request references Jira Issue OCPBUGS-24416, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

In response to this:

/jira refresh


@inesqyx
Contributor

inesqyx commented Dec 7, 2023

/test unit

@rioliu-rh

rioliu-rh commented Dec 8, 2023

Built a cluster with this PR.
Enabled feature gate: MachineConfigNodes

$ oc get featuregate/cluster -o yaml | yq -y '.spec'
featureSet: TechPreviewNoUpgrade

$ oc get featuregate/cluster -o yaml | yq -y '.status.featureGates[].enabled' | grep MachineConfigNodes
- name: MachineConfigNodes

Scaled down the MachineSet to check whether the corresponding MCN is removed as well.

$ oc scale --replicas=1 machineset/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a -n openshift-machine-api
machineset.machine.openshift.io/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a scaled

$ oc get machine -n openshift-machine-api | grep Deleting
ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a-9rwbf   Deleting   m6a.xlarge   us-east-1   us-east-1a   62m

$ oc get node | grep worker
NAME                          STATUS                     ROLES    AGE   VERSION
ip-10-0-19-203.ec2.internal   Ready,SchedulingDisabled   worker   60m   v1.28.4+9f68be2
ip-10-0-49-186.ec2.internal   Ready                      worker   55m   v1.28.4+9f68be2
ip-10-0-73-125.ec2.internal   Ready                      worker   55m   v1.28.4+9f68be2

# node ip-10-0-19-203.ec2.internal is deleted
$ oc get node/ip-10-0-19-203.ec2.internal
Error from server (NotFound): nodes "ip-10-0-19-203.ec2.internal" not found

# corresponding mcn is deleted as well
$ oc get machineconfignode ip-10-0-19-203.ec2.internal
Error from server (NotFound): machineconfignodes.machineconfiguration.openshift.io "ip-10-0-19-203.ec2.internal" not found

Scaled up the MachineSet to check whether a new MCN is created when the new node is ready.

$ oc scale --replicas=2 machineset/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a -n openshift-machine-api
machineset.machine.openshift.io/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a scaled

$ oc get machine -n openshift-machine-api | grep Provisioning
ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a-sjcrc   Provisioning   m6a.xlarge   us-east-1   us-east-1a   29s

# when the node is provisioning/provisioned, it's not ready, mcn does not exist
...
# when node is ready, mcn is created
$ oc get machine/ci-ln-mtgxtvb-76ef8-h9mv4-worker-us-east-1a-sjcrc -n openshift-machine-api  -o yaml | yq -y '.status.nodeRef.name'
ip-10-0-0-241.ec2.internal

$ oc get node/ip-10-0-0-241.ec2.internal
NAME                         STATUS   ROLES    AGE     VERSION
ip-10-0-0-241.ec2.internal   Ready    worker   2m11s   v1.28.4+9f68be2

$ oc get machineconfignode ip-10-0-0-241.ec2.internal
NAME                         UPDATED   UPDATEPREPARED   UPDATEEXECUTED   UPDATEPOSTACTIONCOMPLETE   UPDATECOMPLETE   RESUMED
ip-10-0-0-241.ec2.internal   True      False            False            False                      False            False

Issue: the value of .spec.configVersion.desired in the new MCN is incorrect; it is NotYetSet:

$ oc get machineconfignode ip-10-0-0-241.ec2.internal -o yaml | yq -y '.spec'
configVersion:
  desired: NotYetSet
node:
  name: ip-10-0-0-241.ec2.internal
pool:
  name: worker

$ oc get machineconfignode ip-10-0-0-241.ec2.internal -o yaml | yq -y '.status.configVersion'
current: rendered-worker-4577d3da1cbbf95a97e61bdfcf052939
desired: rendered-worker-4577d3da1cbbf95a97e61bdfcf052939

@cdoern

@cdoern
Contributor Author

cdoern commented Dec 8, 2023

@rioliu-rh are there any errors in the daemon logs about updating the MCN objects?

@rioliu-rh

@rioliu-rh are there any errors in the daemon logs about updating the MCN objects?

I can see the error below in the daemon log of the new node:

E1211 07:14:58.343469    2267 upgrade_monitor.go:192] Error applying MCN status: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure MachineConfigNode.machineconfiguration.openshift.io "ip-10-0-11-120.us-west-2.compute.internal" is invalid: configVersion.current: Invalid value: "": configVersion.current in body should match '^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$' Invalid 0xc0011927e0 422}})
E1211 07:14:58.343551    2267 daemon.go:749] Error making MCN for Resumed true: %!w(*errors.StatusError=&{{{ } {   <nil>} Failure MachineConfigNode.machineconfiguration.openshift.io "ip-10-0-11-120.us-west-2.compute.internal" is invalid: configVersion.current: Invalid value: "": configVersion.current in body should match '^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$' Invalid 0xc0011927e0 422}})
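The 422 above comes from the CRD's name-pattern validation: an empty `configVersion.current` can never match a DNS-1123-subdomain-style name. A quick standalone check, using the pattern copied from the error message (the surrounding code is illustrative, only the regex is from the log):

```go
package main

import (
	"fmt"
	"regexp"
)

// Name pattern copied from the validation error above (a DNS-1123
// subdomain-style name). An empty configVersion.current cannot match
// it, which is why the server rejected the apply with a 422.
var configNameRE = regexp.MustCompile(`^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$`)

func main() {
	fmt.Println(configNameRE.MatchString("")) // false: empty value fails validation
	fmt.Println(configNameRE.MatchString("rendered-worker-81008544a40254ec4bace9b1b61a1724")) // true
}
```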

Eventually, the value of spec.configVersion.desired does get updated:

$ mcn ip-10-0-11-120.us-west-2.compute.internal -o yaml | yq -y '.spec.configVersion'
desired: rendered-worker-81008544a40254ec4bace9b1b61a1724

$ mcn ip-10-0-11-120.us-west-2.compute.internal -o yaml | yq -y '.status.configVersion'
current: rendered-worker-81008544a40254ec4bace9b1b61a1724
desired: rendered-worker-81008544a40254ec4bace9b1b61a1724

xref: mcn is an alias for oc get machineconfignodes

@rioliu-rh

@cdoern WDYT about the error message above? It seems unrelated to spec.configVersion.desired.

@cdoern
Contributor Author

cdoern commented Dec 11, 2023

@rioliu-rh I think I know the cause here. When applying the patch, I am accidentally specifying currentConfig at all times. I should only do this if it exists in the node annotations.
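The fix described here can be sketched like this: only include `configVersion.current` when the node actually carries the currentConfig annotation. The `buildConfigVersion` helper and its map-based result are hypothetical, not the actual MCO patch code; only the annotation key is real:

```go
package main

import "fmt"

// Hypothetical patch-builder illustrating the fix: only set
// configVersion.current on the MCN when the node actually carries the
// currentConfig annotation (freshly provisioned nodes do not yet);
// an empty value would fail the CRD's name-pattern validation.
const currentConfigAnnotation = "machineconfiguration.openshift.io/currentConfig"

func buildConfigVersion(nodeAnnotations map[string]string, desired string) map[string]string {
	cv := map[string]string{"desired": desired}
	if cur, ok := nodeAnnotations[currentConfigAnnotation]; ok && cur != "" {
		cv["current"] = cur // only when the annotation exists and is non-empty
	}
	return cv
}

func main() {
	// New node: annotation not yet written by the daemon.
	fmt.Println(buildConfigVersion(map[string]string{}, "rendered-worker-abc"))
	// Steady state: annotation present.
	fmt.Println(buildConfigVersion(
		map[string]string{currentConfigAnnotation: "rendered-worker-abc"},
		"rendered-worker-abc"))
}
```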

@rioliu-rh

Built a new image to verify.
Scaled up the MachineSet to provision a new node. When the node was ready, checked the daemon log: no MCN-related errors found.

2023-12-12T03:15:43.187171822+00:00 stderr F I1212 03:15:43.187110    2269 start.go:62] Version: machine-config-daemon-4.6.0-202006240615.p0-2473-gcdbc171e-dirty (cdbc171ea6cfccf4741582364be8324982d70ec4)
2023-12-12T03:15:43.187311395+00:00 stderr F I1212 03:15:43.187269    2269 update.go:2316] Running: mount --rbind /run/secrets /rootfs/run/secrets
2023-12-12T03:15:43.217279958+00:00 stderr F I1212 03:15:43.217253    2269 update.go:2316] Running: mount --rbind /usr/bin /rootfs/run/machine-config-daemon-bin
2023-12-12T03:15:43.221994703+00:00 stderr F I1212 03:15:43.221974    2269 daemon.go:472] container is rhel8, target is rhel9
2023-12-12T03:15:43.507893069+00:00 stderr F I1212 03:15:43.507848    2269 daemon.go:540] Invoking re-exec /run/bin/machine-config-daemon
2023-12-12T03:15:43.533152916+00:00 stderr F I1212 03:15:43.533112    2269 start.go:62] Version: machine-config-daemon-4.6.0-202006240615.p0-2473-gcdbc171e-dirty (cdbc171ea6cfccf4741582364be8324982d70ec4)
2023-12-12T03:15:43.533476664+00:00 stderr F I1212 03:15:43.533459    2269 update.go:2316] Running: systemctl daemon-reload
2023-12-12T03:15:44.025935958+00:00 stderr F I1212 03:15:44.025604    2269 rpm-ostree.go:88] Enabled workaround for bug 2111817
2023-12-12T03:15:44.027028554+00:00 stderr F I1212 03:15:44.027011    2269 rpm-ostree.go:263] Linking ostree authfile to /etc/mco/internal-registry-pull-secret.json
2023-12-12T03:15:44.190325845+00:00 stderr F I1212 03:15:44.190114    2269 daemon.go:280] Booted osImageURL: registry.build05.ci.openshift.org/ci-ln-5l3213b/stable@sha256:eec9959e717b7cf4a2e1b3e2ffee3928231723ff5ab471fae105ccfe49a9d5ef (415.92.202312082200-0) 1c03bcccf9dbd8257342a15dea7f285f9af85e0502bcf4be5526f97fd2fd1702
2023-12-12T03:15:44.191068003+00:00 stderr F I1212 03:15:44.191054    2269 start.go:125] overriding kubernetes api to https://api-int.ci-ln-5l3213b-76ef8.origin-ci-int-aws.dev.rhcloud.com:6443
2023-12-12T03:15:44.191664168+00:00 stderr F I1212 03:15:44.191645    2269 metrics.go:92] Registering Prometheus metrics
2023-12-12T03:15:44.191806581+00:00 stderr F I1212 03:15:44.191779    2269 metrics.go:99] Starting metrics listener on 127.0.0.1:8797
2023-12-12T03:15:44.216702160+00:00 stderr F W1212 03:15:44.216650    2269 controller_context.go:111] unable to get owner reference (falling back to namespace): replicasets.apps "machine-config-controller-757d56f64b" is forbidden: User "system:serviceaccount:openshift-machine-config-operator:machine-config-daemon" cannot get resource "replicasets" in API group "apps" in the namespace "openshift-machine-config-operator"
2023-12-12T03:15:44.218339569+00:00 stderr F I1212 03:15:44.218316    2269 simple_featuregate_reader.go:171] Starting feature-gate-detector
2023-12-12T03:15:44.219217641+00:00 stderr F I1212 03:15:44.219197    2269 writer.go:88] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
2023-12-12T03:15:44.223478975+00:00 stderr F I1212 03:15:44.223440    2269 start.go:199] FeatureGates initialized: knownFeatureGates=[AdminNetworkPolicy AlibabaPlatform AutomatedEtcdBackup AzureWorkloadIdentity BuildCSIVolumes CSIDriverSharedResource CloudDualStackNodeIPs ClusterAPIInstall DNSNameResolver DisableKubeletCloudCredentialProviders DynamicResourceAllocation EventedPLEG ExternalCloudProvider ExternalCloudProviderAzure ExternalCloudProviderExternal ExternalCloudProviderGCP GCPClusterHostedDNS GCPLabelsTags GatewayAPI InsightsConfigAPI InstallAlternateInfrastructureAWS MachineAPIOperatorDisableMachineHealthCheckController MachineAPIProviderOpenStack MachineConfigNodes ManagedBootImages MaxUnavailableStatefulSet MetricsServer MixedCPUsAllocation NetworkLiveMigration NodeSwap OnClusterBuild OpenShiftPodSecurityAdmission PrivateHostedZoneAWS RouteExternalCertificate SignatureStores SigstoreImageVerification VSphereControlPlaneMachineSet VSphereStaticIPs ValidatingAdmissionPolicy]
2023-12-12T03:15:44.223511026+00:00 stderr F I1212 03:15:44.223489    2269 event.go:298] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"openshift-machine-config-operator", Name:"openshift-machine-config-operator", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'FeatureGatesInitialized' FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{"AdminNetworkPolicy", "AlibabaPlatform", "AutomatedEtcdBackup", "AzureWorkloadIdentity", "BuildCSIVolumes", "CSIDriverSharedResource", "CloudDualStackNodeIPs", "DNSNameResolver", "DynamicResourceAllocation", "ExternalCloudProvider", "ExternalCloudProviderAzure", "ExternalCloudProviderExternal", "ExternalCloudProviderGCP", "GCPClusterHostedDNS", "GCPLabelsTags", "GatewayAPI", "InsightsConfigAPI", "InstallAlternateInfrastructureAWS", "MachineAPIProviderOpenStack", "MachineConfigNodes", "ManagedBootImages", "MaxUnavailableStatefulSet", "MetricsServer", "MixedCPUsAllocation", "NetworkLiveMigration", "NodeSwap", "OnClusterBuild", "OpenShiftPodSecurityAdmission", "PrivateHostedZoneAWS", "RouteExternalCertificate", "SignatureStores", "SigstoreImageVerification", "VSphereControlPlaneMachineSet", "VSphereStaticIPs", "ValidatingAdmissionPolicy"}, Disabled:[]v1.FeatureGateName{"ClusterAPIInstall", "DisableKubeletCloudCredentialProviders", "EventedPLEG", "MachineAPIOperatorDisableMachineHealthCheckController"}}
2023-12-12T03:15:44.223561277+00:00 stderr F I1212 03:15:44.223509    2269 update.go:2331] Starting to manage node: ip-10-0-48-192.us-west-1.compute.internal
2023-12-12T03:15:44.228723313+00:00 stderr F I1212 03:15:44.228701    2269 rpm-ostree.go:308] Running captured: rpm-ostree status
2023-12-12T03:15:44.271416477+00:00 stderr F I1212 03:15:44.271369    2269 daemon.go:1598] State: idle
2023-12-12T03:15:44.271416477+00:00 stderr F Deployments:
2023-12-12T03:15:44.271416477+00:00 stderr F * ostree-unverified-registry:registry.build05.ci.openshift.org/ci-ln-5l3213b/stable@sha256:eec9959e717b7cf4a2e1b3e2ffee3928231723ff5ab471fae105ccfe49a9d5ef
2023-12-12T03:15:44.271416477+00:00 stderr F                    Digest: sha256:eec9959e717b7cf4a2e1b3e2ffee3928231723ff5ab471fae105ccfe49a9d5ef
2023-12-12T03:15:44.271416477+00:00 stderr F                   Version: 415.92.202312082200-0 (2023-12-12T03:14:45Z)
2023-12-12T03:15:44.271416477+00:00 stderr F
2023-12-12T03:15:44.271416477+00:00 stderr F   3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086
2023-12-12T03:15:44.271416477+00:00 stderr F                   Version: 415.92.202311241643-0 (2023-11-24T16:47:00Z)
2023-12-12T03:15:44.271965110+00:00 stderr F I1212 03:15:44.271933    2269 coreos.go:53] CoreOS aleph version: mtime=2023-11-24 16:50:34.214 +0000 UTC
2023-12-12T03:15:44.271965110+00:00 stderr F {
2023-12-12T03:15:44.271965110+00:00 stderr F    "build": "415.92.202311241643-0",
2023-12-12T03:15:44.271965110+00:00 stderr F    "imgid": "rhcos-415.92.202311241643-0-qemu.x86_64.qcow2",
2023-12-12T03:15:44.271965110+00:00 stderr F    "ostree-commit": "3aff20eacec06af854303111319e74d9dc84c241af5c57dc8ae3330a8ae5b086",
2023-12-12T03:15:44.271965110+00:00 stderr F    "ref": ""
2023-12-12T03:15:44.271965110+00:00 stderr F }
2023-12-12T03:15:44.271994191+00:00 stderr F I1212 03:15:44.271986    2269 coreos.go:70] Ignition provisioning: time=2023-12-12T03:13:37Z
2023-12-12T03:15:44.271999481+00:00 stderr F I1212 03:15:44.271992    2269 rpm-ostree.go:308] Running captured: journalctl --list-boots
2023-12-12T03:15:44.277501965+00:00 stderr F I1212 03:15:44.277479    2269 daemon.go:1607] journalctl --list-boots:
2023-12-12T03:15:44.277501965+00:00 stderr F IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
2023-12-12T03:15:44.277501965+00:00 stderr F  -1 a2f9077437e941f7adf5646cb3787542 Tue 2023-12-12 03:13:27 UTC Tue 2023-12-12 03:14:55 UTC
2023-12-12T03:15:44.277501965+00:00 stderr F   0 dd3f4c054ea843c3b1b044df1ec2f20b Tue 2023-12-12 03:15:04 UTC Tue 2023-12-12 03:15:44 UTC
2023-12-12T03:15:44.277561097+00:00 stderr F I1212 03:15:44.277528    2269 rpm-ostree.go:308] Running captured: systemctl list-units --state=failed --no-legend
2023-12-12T03:15:44.282997509+00:00 stderr F I1212 03:15:44.282980    2269 daemon.go:1622] systemd service state: OK
2023-12-12T03:15:44.283032840+00:00 stderr F I1212 03:15:44.283026    2269 daemon.go:1243] Starting MachineConfigDaemon
2023-12-12T03:15:44.283075951+00:00 stderr F I1212 03:15:44.283069    2269 daemon.go:1250] Enabling Kubelet Healthz Monitor
2023-12-12T03:15:45.229467288+00:00 stderr F I1212 03:15:45.229418    2269 daemon.go:627] Node ip-10-0-48-192.us-west-1.compute.internal is not labeled node-role.kubernetes.io/master
2023-12-12T03:15:45.230142314+00:00 stderr F I1212 03:15:45.230121    2269 daemon.go:1770] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs1642409453 --cleanup
2023-12-12T03:15:45.254395447+00:00 stderr F [2023-12-12T03:15:45Z INFO  nmstatectl::persist_nic] /etc/systemd/network does not exist, no need to clean up
2023-12-12T03:15:45.254431118+00:00 stderr F [2023-12-12T03:15:45Z INFO  nmstatectl::persist_nic] /etc/systemd/network/.nmstate-persist.stamp does not exist, no need to clean up
2023-12-12T03:15:45.254472899+00:00 stderr P std::io::Error: No such file or directory (os error 2)
2023-12-12T03:15:45.254485449+00:00 stderr F
2023-12-12T03:15:45.254741085+00:00 stderr F I1212 03:15:45.254711    2269 daemon.go:1776] Cleanup error ignored: %!w(*exec.ExitError=&{0xc0004865d0 []})
2023-12-12T03:15:45.254814247+00:00 stderr F I1212 03:15:45.254782    2269 node.go:23] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-48-192.us-west-1.compute.internal: map[cloud.network.openshift.io/egress-ipconfig:[{"interface":"eni-069d2b1732d89ac68","ifaddr":{"ipv4":"10.0.0.0/18"},"capacity":{"ipv4":14,"ipv6":15}}] k8s.ovn.org/network-ids:{"default":"0"} k8s.ovn.org/node-gateway-router-lrp-ifaddr:{"ipv4":"100.64.0.5/16"} k8s.ovn.org/node-id:5 k8s.ovn.org/node-subnets:{"default":["10.131.0.0/23"]} k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.5/16"} machine.openshift.io/machine:openshift-machine-api/ci-ln-5l3213b-76ef8-bk2xn-worker-us-west-1a-zw6jj machineconfiguration.openshift.io/controlPlaneTopology:HighlyAvailable volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
2023-12-12T03:15:45.255397381+00:00 stderr F I1212 03:15:45.255385    2269 node.go:52] Setting initial node config: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:15:45.266515183+00:00 stderr F I1212 03:15:45.266496    2269 daemon.go:1495] In bootstrap mode
2023-12-12T03:15:45.273265018+00:00 stderr F I1212 03:15:45.273249    2269 daemon.go:1433] Previous boot ostree-finalize-staged.service appears successful
2023-12-12T03:15:45.273314049+00:00 stderr F I1212 03:15:45.273306    2269 daemon.go:1550] Current+desired config: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:15:45.273332649+00:00 stderr F I1212 03:15:45.273326    2269 daemon.go:1566] state: Done
2023-12-12T03:15:45.273379651+00:00 stderr F I1212 03:15:45.273359    2269 update.go:2316] Running: rpm-ostree cleanup -r
2023-12-12T03:16:03.262473862+00:00 stdout F Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
2023-12-12T03:16:22.702733769+00:00 stdout F Freed: 1.0 GB (pkgcache branches: 0)
2023-12-12T03:16:23.430963655+00:00 stderr F I1212 03:16:23.430915    2269 update.go:2331] No bootstrap pivot required; unlinking bootstrap node annotations
2023-12-12T03:16:23.435814396+00:00 stderr F I1212 03:16:23.435790    2269 daemon.go:1966] Validating against current config rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:16:23.436057532+00:00 stderr F I1212 03:16:23.436044    2269 daemon.go:1856] SSH key location update required. Moving SSH keys from "/home/core/.ssh/authorized_keys" to "/home/core/.ssh/authorized_keys.d/ignition".
2023-12-12T03:16:23.450432091+00:00 stderr F I1212 03:16:23.450406    2269 update.go:2037] updating SSH keys
2023-12-12T03:16:23.450669866+00:00 stderr F I1212 03:16:23.450654    2269 update.go:1938] Writing SSH keys to "/home/core/.ssh/authorized_keys.d/ignition"
2023-12-12T03:16:23.450713417+00:00 stderr F I1212 03:16:23.450705    2269 update.go:1903] Creating missing SSH key dir at /home/core/.ssh/authorized_keys.d
2023-12-12T03:16:24.644408181+00:00 stderr F I1212 03:16:24.644175    2269 rpm-ostree.go:308] Running captured: rpm-ostree kargs
2023-12-12T03:16:24.707736159+00:00 stderr F I1212 03:16:24.707702    2269 update.go:2331] Validated on-disk state
2023-12-12T03:16:24.762077812+00:00 stderr F I1212 03:16:24.762033    2269 daemon.go:2063] Completing update to target MachineConfig: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:16:34.813076702+00:00 stderr F I1212 03:16:34.813038    2269 update.go:2331] Update completed for config rendered-worker-a99da9545839a203c18a42bf6702b982 and node has been successfully uncordoned
2023-12-12T03:16:34.838661131+00:00 stderr F I1212 03:16:34.838341    2269 daemon.go:2088] In desired state MachineConfig: rendered-worker-a99da9545839a203c18a42bf6702b982
2023-12-12T03:16:34.969651717+00:00 stderr F I1212 03:16:34.969615    2269 config_drift_monitor.go:246] Config Drift Monitor started
2023-12-12T03:16:44.402381613+00:00 stderr F I1212 03:16:44.402336    2269 certificate_writer.go:183] Certificate was synced from controllerconfig resourceVersion 27716

Now status.configVersion.current is missing on all the nodes:

$ mcn ip-10-0-48-192.us-west-1.compute.internal -o yaml | yq -y '.status.configVersion'
desired: rendered-worker-a99da9545839a203c18a42bf6702b982

@rioliu-rh

Now status.configVersion.current is missing on all the nodes:

$ mcn ip-10-0-48-192.us-west-1.compute.internal -o yaml | yq -y '.status.configVersion'
desired: rendered-worker-a99da9545839a203c18a42bf6702b982

@cdoern Can you take a look at this issue? Thanks

@cdoern
Contributor Author

cdoern commented Dec 15, 2023

@rioliu-rh it was a typo... sorry. Pushing changes now

@rioliu-rh

{  error occurred handling build machine-config-operator-amd64: the build machine-config-operator-amd64 failed after 2m20s with reason DockerBuildFailed: Dockerfile build strategy has failed.}

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4062/pull-ci-openshift-machine-config-operator-master-images/1735694006349729792

@rioliu-rh

/retest-required

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2023
@openshift-ci-robot
Contributor

@cdoern: This pull request references Jira Issue OCPBUGS-24416, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

In response to this:

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending Status.


@cdoern
Contributor Author

cdoern commented Dec 19, 2023

/retest-required

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only
create an MCN  once the node is out of the Provisioning and Pending Status

Signed-off-by: Charlie Doern <cdoern@redhat.com>
@cdoern
Contributor Author

cdoern commented Jan 5, 2024

/retest-required

Contributor

@yuqi-zhang yuqi-zhang left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 8, 2024
Contributor

openshift-ci bot commented Jan 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuqi-zhang
Contributor

/override e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 8, 2024

@yuqi-zhang: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • e2e-gcp-op-single-node

Only the following failed contexts/checkruns were expected:

  • ci/prow/bootstrap-unit
  • ci/prow/e2e-aws-ovn
  • ci/prow/e2e-aws-ovn-upgrade
  • ci/prow/e2e-gcp-op
  • ci/prow/e2e-gcp-op-layering
  • ci/prow/e2e-gcp-op-single-node
  • ci/prow/e2e-hypershift
  • ci/prow/images
  • ci/prow/okd-images
  • ci/prow/okd-scos-e2e-aws-ovn
  • ci/prow/okd-scos-images
  • ci/prow/unit
  • ci/prow/verify
  • pull-ci-openshift-machine-config-operator-master-bootstrap-unit
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-okd-scos-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override e2e-gcp-op-single-node


@yuqi-zhang
Contributor

/override ci/prow/e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 9, 2024

@yuqi-zhang: Overrode contexts on behalf of yuqi-zhang: ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node


@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 7dae64a and 2 for PR HEAD c73baa5 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 84da3e5 and 1 for PR HEAD c73baa5 in total

@cdoern
Contributor Author

cdoern commented Jan 9, 2024

/override ci/prow/e2e-gcp-single-node

@cdoern
Contributor Author

cdoern commented Jan 9, 2024

/retest-required

Contributor

openshift-ci bot commented Jan 9, 2024

@cdoern: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • ci/prow/e2e-gcp-single-node

Only the following failed contexts/checkruns were expected:

  • ci/prow/bootstrap-unit
  • ci/prow/e2e-aws-ovn
  • ci/prow/e2e-aws-ovn-upgrade
  • ci/prow/e2e-gcp-op
  • ci/prow/e2e-gcp-op-layering
  • ci/prow/e2e-gcp-op-single-node
  • ci/prow/e2e-hypershift
  • ci/prow/images
  • ci/prow/okd-images
  • ci/prow/okd-scos-e2e-aws-ovn
  • ci/prow/okd-scos-images
  • ci/prow/unit
  • ci/prow/verify
  • pull-ci-openshift-machine-config-operator-master-bootstrap-unit
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-okd-scos-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override ci/prow/e2e-gcp-single-node


@cdoern
Contributor Author

cdoern commented Jan 9, 2024

/override ci/prow/e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 9, 2024

@cdoern: Overrode contexts on behalf of cdoern: ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node


@cdoern
Contributor Author

cdoern commented Jan 10, 2024

/retest-required

Contributor

openshift-ci bot commented Jan 10, 2024

@cdoern: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-layering c73baa5 link false /test e2e-gcp-op-layering

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cdoern
Contributor Author

cdoern commented Jan 10, 2024

/override ci/prow/e2e-gcp-op-single-node

Contributor

openshift-ci bot commented Jan 10, 2024

@cdoern: Overrode contexts on behalf of cdoern: ci/prow/e2e-gcp-op-single-node

In response to this:

/override ci/prow/e2e-gcp-op-single-node


@openshift-merge-bot openshift-merge-bot bot merged commit 8c4ca51 into openshift:master Jan 10, 2024
13 of 14 checks passed
@openshift-ci-robot
Contributor

@cdoern: Jira Issue OCPBUGS-24416: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-24416 has been moved to the MODIFIED state.

In response to this:

the MCN object was not reacting to node deletions. the MCO was also creating these objects too early in their lifecycle. We should only create an MCN once the node is out of the Provisioning and Pending Status.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-proxy-pull-test-container-v4.16.0-202401110907.p0.g8c4ca51.assembly.stream for distgit openshift-proxy-pull-test.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR