Installplan failed because of the error updating CRD #2057

Closed
Daniel-Fan opened this issue Mar 23, 2021 · 4 comments
Daniel-Fan commented Mar 23, 2021

Bug Report

What did you do?
Install a new operator from OperatorHub or upgrade the existing installed operator.

What did you expect to see?
The operator is installed or upgraded successfully.

What did you see instead? Under which circumstances?
This is an intermittent issue; it does not show up every time.
The operator shows a pending status.
The InstallPlan fails and reports the following status:

  conditions:
    - lastTransitionTime: '2021-03-10T16:12:09Z'
      lastUpdateTime: '2021-03-10T16:12:09Z'
      message: >-
        error updating CRD: commonservices.operator.ibm.com: Operation cannot be
        fulfilled on customresourcedefinitions.apiextensions.k8s.io
        "commonservices.operator.ibm.com": the object has been modified; please
        apply your changes to the latest version and try again
      reason: InstallComponentFailed
      status: 'False'
      type: Installed
  phase: Failed

Environment

  • operator-lifecycle-manager version: OCP 4.6.16
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2a7d2cd3ae6e1d2e0bab3682f86f916ca93d93aaedb04cff478a72063557f090
  • Kubernetes version information:
  • Kubernetes cluster kind:

Possible Solution
I think that when the olm-operator updates the operator's CRD, the update fails because the resourceVersion it sends no longer matches the current one.

The manual workaround for users is to uninstall the pending operator and reinstall it from OperatorHub.
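
The suspected failure mode can be modeled with a toy in-memory store. This is only a sketch of optimistic concurrency in general, not OLM's code: `FakeStore`, `Conflict`, and `update_with_retry` are hypothetical names, and the integer `resourceVersion` is a simplification of the opaque string the API server uses.

```python
import copy

NAME = "commonservices.operator.ibm.com"


class Conflict(Exception):
    """Stands in for an HTTP 409 from the API server."""


class FakeStore:
    """Toy object store with optimistic concurrency (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def create(self, name, spec):
        self._objects[name] = {"resourceVersion": 1, "spec": copy.deepcopy(spec)}

    def get(self, name):
        return copy.deepcopy(self._objects[name])

    def update(self, name, obj):
        current = self._objects[name]
        # Reject writes that are based on a stale read.
        if obj["resourceVersion"] != current["resourceVersion"]:
            raise Conflict("the object has been modified")
        self._objects[name] = {
            "resourceVersion": current["resourceVersion"] + 1,
            "spec": copy.deepcopy(obj["spec"]),
        }


def update_with_retry(store, name, mutate, attempts=3):
    """Re-read the object and re-apply `mutate` when a write conflicts."""
    for _ in range(attempts):
        obj = store.get(name)
        mutate(obj["spec"])
        try:
            store.update(name, obj)
            return True
        except Conflict:
            continue
    return False


store = FakeStore()
store.create(NAME, {"versions": ["v1"]})

# First writer reads the object at resourceVersion 1.
stale = store.get(NAME)

# A second writer updates it in the meantime, bumping resourceVersion to 2.
other = store.get(NAME)
other["spec"]["labels"] = {"owner": "someone-else"}
store.update(NAME, other)

# Writing the stale copy now fails, much like the InstallPlan step did.
stale["spec"]["versions"].append("v3")
try:
    store.update(NAME, stale)
    conflicted = False
except Conflict:
    conflicted = True

# Re-reading before the write lets the same change go through.
retried_ok = update_with_retry(store, NAME, lambda spec: spec["versions"].append("v3"))
final = store.get(NAME)
```

In this model the stale write is rejected, while the re-read keeps the other writer's change and still applies the new one.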

@Daniel-Fan Daniel-Fan added the kind/bug Categorizes issue or PR as related to a bug. label Mar 23, 2021
@horis233
Contributor

Since the InstallPlan can't recover by itself, users have to delete it and reinstall the operator.

Could OLM add some retry logic for the InstallPlan instead of just returning the error and blocking the installation?

Or

For this specific problem, could we use the patch command instead of the update when updating existing resources?

https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/catalog/step.go#L159
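
To illustrate why a patch might sidestep the conflict: a JSON merge patch (RFC 7386) carries only the fields being changed and, unless the caller adds one, no `resourceVersion`, so the server applies it on top of whatever the latest object is. The sketch below implements the merge semantics locally; `json_merge_patch` and the sample object are illustrative assumptions, not OLM or Kubernetes code.

```python
def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch and return the merged result."""
    if not isinstance(patch, dict):
        # Non-object patches (including lists) replace the target wholesale.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null value in the patch deletes the key.
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Latest server-side object, already modified by another writer (made-up data).
live = {
    "metadata": {"resourceVersion": "2", "labels": {"owner": "someone-else"}},
    "spec": {"group": "operator.ibm.com", "scope": "Cluster"},
}

# The patch names only the field to change and carries no resourceVersion.
patch = {"spec": {"scope": "Namespaced"}}

merged = json_merge_patch(live, patch)
```

One caveat with this approach is that merge patches replace lists as a whole, so list-valued fields such as a CRD's versions would still need to be sent in full.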

@benluddy
Contributor

Thanks for taking the time to file this issue. Retries for InstallPlan step errors are coming soon (tracked downstream by https://issues.redhat.com/browse/OLM-2114), which should cover 409s as well as transient I/O errors and API server unavailability. In the meantime, you should also be able to patch .status.phase to "Installing" in order to make the catalog operator continue the existing plan.
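
As a rough sketch of that interim workaround, the patch body is small. The helper below only builds the request path and merge-patch body; it does not talk to a cluster. `installplan_phase_patch`, the namespace, and the InstallPlan name are hypothetical, and sending the patch to the `/status` subresource (rather than the main resource) is an assumption that should be checked against the InstallPlan CRD on the cluster.

```python
import json


def installplan_phase_patch(namespace, name, phase="Installing"):
    """Return (path, body) for a merge patch that resets an InstallPlan's phase."""
    # Assumption: the status subresource endpoint is the right target.
    path = (
        "/apis/operators.coreos.com/v1alpha1"
        f"/namespaces/{namespace}/installplans/{name}/status"
    )
    body = json.dumps({"status": {"phase": phase}})
    return path, body


# Placeholder namespace and InstallPlan name, not taken from the report.
path, body = installplan_phase_patch("example-namespace", "install-example")
```

The body would be sent with the `application/merge-patch+json` content type by whatever client is in use.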

@horis233
Contributor

horis233 commented Mar 23, 2021

@benluddy

Thanks for your quick response. Patching .status.phase back to "Installing" sounds like a good solution.

@joelanford joelanford added this to the 0.18.0 milestone Apr 1, 2021
@exdx
Member

exdx commented Apr 13, 2021

Addressed in #2090, which added retries to InstallPlan execution. That should help the install succeed in spite of transient errors.

@exdx exdx closed this as completed Apr 13, 2021