OCPBUGS-20152,OCPBUGS-22364: fix timing issues upon cluster installation or upgrade #3987

Closed
wants to merge 1 commit into from

@cdoern cdoern commented Oct 17, 2023

Fix degrading when /etc/docker/certs.d does not exist, and when the ControllerConfig (cconfig) does not exist.

Also fix controllerconfig validation issues upon upgrade.

We seem to have some places where we assume the operator has beaten everyone else to the punch in creating some directories and even some resources. We should never assume these things exist, and we should not degrade on this type of race.

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 17, 2023
@openshift-ci-robot

@cdoern: This pull request references Jira Issue OCPBUGS-20152, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci

openshift-ci bot commented Oct 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 17, 2023
@cdoern cdoern force-pushed the timing branch 2 times, most recently from 9795d0c to 6f6375e Compare October 25, 2023 15:01
@cdoern cdoern changed the title OCPBUGS-20152: fix timing issues upon cluster installation OCPBUGS-20152: fix timing issues upon cluster installation or upgrade Oct 25, 2023
@cdoern cdoern changed the title OCPBUGS-20152: fix timing issues upon cluster installation or upgrade OCPBUGS-20152: OCPBUGS-22364: fix timing issues upon cluster installation or upgrade Oct 25, 2023
@cdoern

cdoern commented Oct 25, 2023

/payload-job periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade

@openshift-ci

openshift-ci bot commented Oct 25, 2023

@cdoern: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a5a99250-7347-11ee-8c99-a9eeae8ad1fc-0

@rioliu-rh

/hold for QE verification

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 26, 2023
@rioliu-rh

/retitle OCPBUGS-20152,OCPBUGS-22364: fix timing issues upon cluster installation or upgrade

@openshift-ci openshift-ci bot changed the title OCPBUGS-20152: OCPBUGS-22364: fix timing issues upon cluster installation or upgrade OCPBUGS-20152,OCPBUGS-22364: fix timing issues upon cluster installation or upgrade Oct 26, 2023
@openshift-ci-robot

@cdoern: This pull request references Jira Issue OCPBUGS-20152, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-22364, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.

@rioliu-rh

rioliu-rh commented Oct 30, 2023

Set up a cluster without the ImageRegistry capability on 4.14.1:

$ cv version -o yaml | yq -y '.status.capabilities'
enabledCapabilities:
  - OperatorLifecycleManager
knownCapabilities:
  - Build
  - CSISnapshot
  - Console
  - DeploymentConfig
  - ImageRegistry
  - Insights
  - MachineAPI
  - NodeTuning
  - OperatorLifecycleManager
  - Storage
  - baremetal
  - marketplace
  - openshift-samples
$ debug node/ip-10-0-69-28.us-east-2.compute.internal -- chroot /host stat /etc/docker/certs.d
stat: cannot statx '/etc/docker/certs.d': No such file or directory
error: non-zero exit code from debug container

Upgrade the cluster from 4.14.1 to the CI image built from this PR and #4003, 4.15.0-0.test-2023-10-30-071930-ci-ln-hb38gnb-latest.
The upgrade succeeded.

$ cv version -o yaml | yq -y '.status.history'
- acceptedRisks: 'Target release version="" image="registry.build05.ci.openshift.org/ci-ln-hb38gnb/release:latest"
    cannot be verified, but continuing anyway because the update was forced: release
    images that are not accessed via digest cannot be verified

    Forced through blocking failures: Multiple precondition checks failed:

    * Precondition "EtcdRecentBackup" failed because of "ControllerStarted": RecentBackup:
    The etcd backup controller is starting, and will decide if recent backups are
    available or if a backup is required

    * Precondition "ClusterVersionRecommendedUpdate" failed because of "UnknownUpdate":
    RetrievedUpdates=False (VersionNotFound), so the recommended status of updating
    from 4.14.1 to 4.15.0-0.test-2023-10-30-071930-ci-ln-hb38gnb-latest is unknown.'
  completionTime: '2023-10-30T09:55:42Z'
  image: registry.build05.ci.openshift.org/ci-ln-hb38gnb/release:latest
  startedTime: '2023-10-30T08:55:05Z'
  state: Completed
  verified: false
  version: 4.15.0-0.test-2023-10-30-071930-ci-ln-hb38gnb-latest
- completionTime: '2023-10-30T08:00:07Z'
  image: quay.io/openshift-release-dev/ocp-release@sha256:05ba8e63f8a76e568afe87f182334504a01d47342b6ad5b4c3ff83a2463018bd
  startedTime: '2023-10-30T07:31:34Z'
  state: Completed
  verified: false
  version: 4.14.1

Check for the keywords 'no such file or directory' in the MCC pod log:

$ logs openshift-machine-config-operator -c machine-config-controller machine-config-controller-68444f8547-99qck | grep 'no such file or directory'
>> empty

Check the notAfter and notBefore fields in the controller certificates object:

$ oc get controllerconfig/machine-config-controller -o yaml | yq -y '.status.controllerCertificates'
- bundleFile: KubeAPIServerServingCAData
  notAfter: '2033-10-27T07:18:30Z'
  notBefore: '2023-10-30T07:18:30Z'
  signer: CN=admin-kubeconfig-signer,OU=openshift
  subject: CN=admin-kubeconfig-signer,OU=openshift
- bundleFile: KubeAPIServerServingCAData
  notAfter: '2023-10-31T07:18:32Z'
  notBefore: '2023-10-30T07:34:56Z'
  signer: CN=kubelet-signer,OU=openshift
  subject: CN=kube-csr-signer_@1698651297
- bundleFile: KubeAPIServerServingCAData
  notAfter: '2023-10-31T07:18:32Z'
  notBefore: '2023-10-30T07:18:32Z'
  signer: CN=kubelet-signer,OU=openshift
  subject: CN=kubelet-signer,OU=openshift
- bundleFile: KubeAPIServerServingCAData
  notAfter: '2024-10-29T07:18:33Z'
  notBefore: '2023-10-30T07:18:33Z'
  signer: CN=kube-apiserver-to-kubelet-signer,OU=openshift
  subject: CN=kube-apiserver-to-kubelet-signer,OU=openshift
- bundleFile: KubeAPIServerServingCAData
  notAfter: '2024-10-29T07:18:33Z'
  notBefore: '2023-10-30T07:18:33Z'
  signer: CN=kube-control-plane-signer,OU=openshift
  subject: CN=kube-control-plane-signer,OU=openshift
- bundleFile: KubeAPIServerServingCAData
  notAfter: '2033-10-27T07:18:30Z'
  notBefore: '2023-10-30T07:18:30Z'
  signer: CN=kubelet-bootstrap-kubeconfig-signer,OU=openshift
  subject: CN=kubelet-bootstrap-kubeconfig-signer,OU=openshift
- bundleFile: KubeAPIServerServingCAData
  notAfter: '2024-10-29T07:34:54Z'
  notBefore: '2023-10-30T07:34:53Z'
  signer: CN=openshift-kube-apiserver-operator_node-system-admin-signer@1698651293
  subject: CN=openshift-kube-apiserver-operator_node-system-admin-signer@1698651293
- bundleFile: RootCAData
  notAfter: '2033-10-27T07:18:26Z'
  notBefore: '2023-10-30T07:18:26Z'
  signer: CN=root-ca,OU=openshift
  subject: CN=root-ca,OU=openshift

/unhold
/label qe-approved
/jira refresh

@openshift-ci-robot

@rioliu-rh: This pull request references Jira Issue OCPBUGS-20152, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

This pull request references Jira Issue OCPBUGS-22364, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@openshift-ci openshift-ci bot added qe-approved Signifies that QE has signed off on this PR and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 30, 2023
@rioliu-rh

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 30, 2023
@openshift-ci-robot

@rioliu-rh: This pull request references Jira Issue OCPBUGS-20152, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

This pull request references Jira Issue OCPBUGS-22364, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 30, 2023
@cdoern cdoern force-pushed the timing branch 3 times, most recently from 5f0d86d to 740a867 Compare January 10, 2024 19:05
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 10, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 17, 2024
@cdoern cdoern force-pushed the timing branch 3 times, most recently from 0580591 to aee0b8a Compare January 25, 2024 20:28
@cheesesashimi cheesesashimi left a comment

Overall, this looks good. I do have a few suggestions for you to have a look at.

pkg/operator/sync.go (outdated, resolved)
@@ -637,11 +645,105 @@ func (optr *Operator) syncCustomResourceDefinitions() error {
return err
}
}

if strings.Contains(crd, "controllerconfig") {
currentCR, err := optr.apiExtClient.ApiextensionsV1().CustomResourceDefinitions().Get(context.TODO(), "controllerconfigs.machineconfiguration.openshift.io", metav1.GetOptions{})

suggestion: Hoist everything in the if strings.Contains(crd, "controllerconfig") block into a separate function or method so it's a bit easier to follow.

For the section where you continue, you can early return nil there to achieve the same effect.

pkg/operator/sync.go (outdated, resolved)
pkg/controller/kubelet-config/kubelet_config_controller.go (outdated, resolved)
@@ -475,6 +476,18 @@ func (ctrl *Controller) syncKubeletConfig(key string) error {
klog.V(4).Infof("Finished syncing kubeletconfig %q (%v)", key, time.Since(startTime))
}()

if err := wait.PollUntilContextTimeout(context.TODO(), ctrlcommon.ControllerConfigRolloutInterval, ctrlcommon.ControllerConfigTimeout, false, func(_ context.Context) (bool, error) {

suggestion: Hoist this wait code into a separate function for easier reuse since it looks like it is being used in multiple places.

func waitForControllerConfigCreation() error {
	return wait.PollUntilContextTimeout(context.TODO(), ctrlcommon.ControllerConfigRolloutInterval, ctrlcommon.ControllerConfigTimeout, false, func(_ context.Context) (bool, error) {
		// ...
	})
}

@@ -473,6 +474,18 @@ func (ctrl *Controller) syncKubeletConfig(key string) error {
klog.V(4).Infof("Finished syncing kubeletconfig %q (%v)", key, time.Since(startTime))
}()

if err := wait.PollUntilContextTimeout(context.TODO(), ctrlcommon.ControllerConfigRolloutInterval, ctrlcommon.ControllerConfigTimeout, false, func(_ context.Context) (bool, error) {
_, err := ctrl.client.MachineconfigurationV1().ControllerConfigs().Get(context.TODO(), ctrlcommon.ControllerConfigName, metav1.GetOptions{})

It may be worth a CI cycle just to see if we can get away with using the lister instead. Although I don't know how difficult this particular issue is to reproduce there.

pkg/operator/sync.go (outdated, resolved)
@sinnykumari sinnykumari left a comment

Failing unit test is:
=== RUN TestUpdatesGeneratedMachineConfig

I0125 20:37:34.248250   20898 render_controller.go:436] Controller Config has not been created. Continuing context deadline exceeded
W0125 20:37:34.248323   20898 render_controller.go:580] No BaseOSContainerImage set
    render_controller_test.go:135: Expected
        	testing.GetActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"get", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigs"}, Subresource:""}, Name:"rendered-test-cluster-master-a39f193e46d2048532b91e0ddc0faceb"}
        got
        	testing.GetActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"get", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"controllerconfigs"}, Subresource:""}, Name:"machine-config-controller"}
    render_controller_test.go:135: Expected
        	testing.UpdateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"update", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigs"}, Subresource:""}, Object:(*v1.MachineConfig)(0xc000007520)}
        got
        	testing.GetActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"get", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"controllerconfigs"}, Subresource:""}, Name:"machine-config-controller"}
    render_controller_test.go:135: Expected
        	testing.UpdateActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"update", Resource:schema.GroupVersionResource{Group:"", Version:"", Resource:"machineconfigpools"}, Subresource:""}, Object:(*v1.MachineConfigPool)(0xc000248840)}
        got
        	testing.GetActionImpl{ActionImpl:testing.ActionImpl{Namespace:"", Verb:"get", Resource:schema.GroupVersionResource{Group:"machineconfiguration.openshift.io", Version:"v1", Resource:"controllerconfigs"}, Subresource:""}, Name:"machine-config-controller"}

Some of the fields now have new values. I'm not sure, but this could be related to an API change; do we need to update our test?
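
One possible reading of the output above is that the extra `get` calls on `controllerconfigs` are recorded ahead of the actions the test expects, so every comparison is shifted. A minimal sketch, assuming that reading is correct, of dropping such reads before comparing. The `action` struct and `dropReads` are hypothetical stand-ins, not the fake-client types used by the real test, and the resource names in `main` are shortened for illustration.

```go
package main

import "fmt"

// action is a hypothetical stand-in for a recorded fake-client action.
type action struct {
	Verb     string
	Resource string
	Name     string
}

// dropReads returns the recorded actions without any "get" on the given
// resource, so polling reads do not shift the expected action sequence.
func dropReads(actions []action, resource string) []action {
	kept := make([]action, 0, len(actions))
	for _, a := range actions {
		if a.Verb == "get" && a.Resource == resource {
			continue
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	recorded := []action{
		{"get", "controllerconfigs", "machine-config-controller"},
		{"get", "machineconfigs", "rendered-test-cluster-master"},
		{"get", "controllerconfigs", "machine-config-controller"},
		{"update", "machineconfigs", "rendered-test-cluster-master"},
	}
	for _, a := range dropReads(recorded, "controllerconfigs") {
		fmt.Println(a.Verb, a.Resource, a.Name)
	}
}
```

Whether filtering in the test or switching the controller to a lister is the better fix is a separate question; the sketch only shows the filtering option.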

@cdoern cdoern force-pushed the timing branch 2 times, most recently from 494d0e2 to 04b336e Compare February 21, 2024 18:49
fix degrading on /etc/docker/certs.d not existing as well as cconfig not existing

Signed-off-by: Charlie Doern <cdoern@redhat.com>

openshift-ci bot commented Feb 21, 2024

@cdoern: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-op-layering | aee0b8a | link | false | /test e2e-gcp-op-layering |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | d5c0a90 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/e2e-aws-ovn | d5c0a90 | link | true | /test e2e-aws-ovn |
| ci/prow/e2e-gcp-op | d5c0a90 | link | true | /test e2e-gcp-op |
| ci/prow/okd-scos-e2e-aws-ovn | d5c0a90 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-upgrade-out-of-change | d5c0a90 | link | false | /test e2e-aws-ovn-upgrade-out-of-change |
| ci/prow/unit | d5c0a90 | link | true | /test unit |
| ci/prow/e2e-gcp-op-techpreview | d5c0a90 | link | false | /test e2e-gcp-op-techpreview |

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 22, 2024
@openshift-merge-robot

PR needs rebase.

@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 21, 2024
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jul 22, 2024
@openshift-ci-robot

@cdoern: This pull request references Jira Issue OCPBUGS-22364. The bug has been updated to no longer refer to the pull request using the external bug tracker.

openshift-ci bot commented Jul 22, 2024

@openshift-bot: Closed this PR.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/severity-low: Referenced Jira bug's severity is low for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.

8 participants