
Update / Patch to deployment during "replace" rejected by kubernetes, fails in a loop #952

Closed
tmckayus opened this issue Jul 12, 2019 · 3 comments
Labels
triage/unresolved Indicates an issue that can not or will not be resolved.

Comments

tmckayus commented Jul 12, 2019

Background: changes to selectors/labels on deployments.apps (and other objects) are not allowed by kubernetes; for the discussion behind that decision, see kubernetes/kubernetes#50808.

During a "replace" operation on a CSV, OLM will attempt to update/patch an existing deployment if the new CSV uses the same deployment name as the existing CSV. If the new CSV changes a deployment field that kubernetes treats as immutable (a selector/label field, for example), the replace operation will fail with "field is immutable" from kube. At this point, OLM will periodically retry the replace, failing each time -- the old CSV will never successfully be replaced (though it will stay functional, afaik).
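
For illustration, here is a minimal sketch of the kind of deployment a CSV might ship; all names are hypothetical. Changing spec.selector.matchLabels in a later CSV that reuses the same deployment name is exactly the update that kubernetes rejects:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myoperator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: myoperator      # immutable after creation; changing this in a new CSV triggers "field is immutable"
  template:
    metadata:
      labels:
        name: myoperator    # must stay consistent with the selector above
    spec:
      containers:
      - name: myoperator
        image: quay.io/example/myoperator:v0.1.2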

If possible, this should be handled in a way that is clearer to an end user. Currently, someone who isn't an advanced kubernetes/OLM user may not understand exactly what is happening or how to correct it.

From a user perspective, there are a couple of simple workarounds to this scenario:

  1. Respin the CSV with a different deployment name (the long-term solution; covers everybody going forward). This is more of a developer fix, made while crafting a new pull request to update the CSV.

  2. Delete the deployment(s) from the old CSV by hand, which allows the current replace operation to continue (see the command sketch after this list). This is an immediate fix for the affected instance, more of an end-user hot-fix.
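
For workaround 2, something like the following should work; the deployment name and namespace here are hypothetical, so substitute the ones from your installed CSV:

$ kubectl -n my-operator-namespace get deployments
$ kubectl -n my-operator-namespace delete deployment myoperator

Once the deployment is gone, OLM's next retry of the replace can create the deployment fresh instead of patching it.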

High-level steps to reproduce

  1. Create/use a CSV for any old operator
  2. Create a "new" CSV to update the operator with a "replaces" value
  3. In the new CSV, change the spec for one of the deployments: leave the name the same, but change the spec.selector.matchLabels and spec.template.metadata.labels fields (see the YAML sketch after these steps)
  4. Upload the bundle for your new CSV to quay
  5. Install the "old" CSV on a kubernetes/openshift cluster through the OLM
  6. Add an operatorsource (and whatever other associated objects you might need, depending on whether you're using kube/openshift) that references your new CSV
  7. Use "oc get clusterserviceversion" or the analogous kubectl command to watch the update process. You should see it enter a state like this: PHASE for 0.1.2 will remain "Replacing" forever, while PHASE for 0.1.3 cycles through "Pending", "Failed", and "InstallReady"
$ oc get clusterserviceversion
NAME                    DISPLAY          VERSION   REPLACES            PHASE
packageserver.v0.9.0    Package Server   0.9.0                         Succeeded
myoperator.v0.1.2       My Operator      0.1.2                         Replacing
myoperator.v0.1.3       My Operator      0.1.3     myoperator.v0.1.2   Pending
  8. Look in the olm-operator pod in the olm namespace for errors; search for "field is immutable"
  9. If you want to free it up, delete the deployment(s) from the existing install and the replace operation will succeed
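
As a concrete illustration of step 3, here is a hypothetical fragment of the new CSV's install strategy; every name in it is made up:

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: myoperator.v0.1.3
spec:
  replaces: myoperator.v0.1.2
  install:
    strategy: deployment
    spec:
      deployments:
      - name: myoperator                # same deployment name as in v0.1.2
        spec:
          selector:
            matchLabels:
              name: myoperator-new      # changed from v0.1.2 -> kube rejects the update
          template:
            metadata:
              labels:
                name: myoperator-new    # changed to match the new selector
            spec:
              containers:
              - name: myoperator
                image: quay.io/example/myoperator:v0.1.3

For step 8, a command along the lines of "kubectl -n olm logs deploy/olm-operator | grep 'field is immutable'" should surface the rejection, assuming the upstream install layout where the olm-operator deployment runs in the olm namespace.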

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Feb 26, 2020
openshift-ci-robot added the triage/unresolved label and removed the wontfix label Feb 27, 2020

stale bot commented Apr 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ecordell (Member) commented Jun 5, 2020

A couple of things to mention here:

  • In newer versions of OLM, deployments are handled similarly to how pods are managed via replicasets: we stamp a hash of the spec in the CSV onto the deployment and compare that. If the deployment needs updating, we issue an Update, not a Patch.
  • During upgrades, if you need to change an otherwise immutable field of a deployment (like the label selector), you can signal this to OLM by changing the "name" of the deployment in the CSV spec. This causes OLM to create an entirely new deployment and delete the old one (see the sketch below).
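
A minimal sketch of the rename approach, reusing the hypothetical names from the reproduction steps above (a respin of the failing v0.1.3 CSV); only the deployment name, and the labels you wanted to change, differ:

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: myoperator.v0.1.3
spec:
  replaces: myoperator.v0.1.2
  install:
    strategy: deployment
    spec:
      deployments:
      - name: myoperator-v2             # new name -> OLM creates this deployment and deletes the old one
        spec:
          selector:
            matchLabels:
              name: myoperator-new      # free to change, since this is a brand-new deployment
          template:
            metadata:
              labels:
                name: myoperator-new
            spec:
              containers:
              - name: myoperator
                image: quay.io/example/myoperator:v0.1.3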

Please re-open or open a new issue if we need further discussion / clarification.

ecordell closed this as completed Jun 5, 2020
copybaranaut pushed a commit to pixie-io/pixie that referenced this issue Aug 31, 2021
Summary:
Noticed that operators were struggling to update to the latest operator version.
After some digging, it turns out the update was failing because the latest operator version has new labels, and deployment labels are immutable.
Luckily, we can still move forward and update from this case: operator-framework/operator-lifecycle-manager#952
When the deployment name stays the same, OLM tries to do a rolling update; changing the deployment name instead signals OLM to completely replace the deployment.

Suggestions for a better new deployment name are appreciated.
Also fixed a bug with the release build where the rc's prev versions were incorrect.

Test Plan:
created test plan in checklist for verifying operator updates, and ran through the test plan:
https://www.notion.so/pixielabs/Operator-Release-Checklist-a705283f190c4c0aa127f9439bb34180

Reviewers: vihang, zasgar

Reviewed By: vihang

Differential Revision: https://phab.corp.pixielabs.ai/D9533

GitOrigin-RevId: 561964e
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/quay-bridge-operator that referenced this issue Feb 16, 2022
In v3.6.3 the selectors have changed. Selectors are, according to OLM, an immutable field; therefore we need to rename the deployment.

By renaming the deployment OLM will delete the previous deployment and
create a new one:

operator-framework/operator-lifecycle-manager#952 (comment)
andreasgerstmayr added a commit to andreasgerstmayr/tempo-operator that referenced this issue May 31, 2023
Commit 7fa448a added additional labels
to the operator deployment and its selector field.

Unfortunately the selector field is immutable, and when OLM tries to patch
the deployment while upgrading, Kubernetes will reject this update.

A workaround is to rename the deployment, as suggested here:
operator-framework/operator-lifecycle-manager#952

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
andreasgerstmayr added a commit to grafana/tempo-operator that referenced this issue Jun 1, 2023
* Rename operator deployment

Commit 7fa448a added additional labels
to the operator deployment and its selector field.

Unfortunately the selector field is immutable, and when OLM tries to patch
the deployment while upgrading, Kubernetes will reject this update.

A workaround is to rename the deployment, as suggested here:
operator-framework/operator-lifecycle-manager#952

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>

* Update makefile

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>

* Rename to tempo-operator-controller, update changelog

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>

---------

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>