
osd: Ensure rook version label is not set on pod #11674

Merged: travisn merged 2 commits into rook:master from osd-version-label on Feb 15, 2023

Conversation

@travisn (Member) commented on Feb 14, 2023

Description of your changes:
The rook-version label must not be set in the pod spec labels, since it causes the Ceph daemons to be restarted on every Rook version update, even when the Ceph version and pod spec are otherwise unchanged.

The rook-version label was being added to the OSD pod spec because of a shared pointer to the labels map. The code intended to add the version label only to the deployment labels, but since the pod labels shared the same map variable, the label was unintentionally added to the pod as well.

Now the OSD pods will only be updated and restarted when there is a Ceph version change or some other change to the OSD pod spec.
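Below is a minimal Go sketch of the problem and of the fix; the variable names and version string are hypothetical and this is not the actual Rook code, only an illustration of how copying the map breaks the aliasing between the deployment labels and the pod labels.

    package main

    import "fmt"

    const rookVersionLabel = "rook-version"

    // buggy illustrates the aliasing problem: deploymentLabels and podLabels
    // refer to the same underlying map, so tagging the deployment also tags
    // the pod template and forces a restart on every operator upgrade.
    func buggy() {
        labels := map[string]string{"app": "rook-ceph-osd"}
        deploymentLabels := labels // same underlying map
        podLabels := labels        // same underlying map

        deploymentLabels[rookVersionLabel] = "v1.11.0"
        fmt.Println(podLabels[rookVersionLabel]) // "v1.11.0" -- unintended
    }

    // fixed copies the map first, so only the deployment gets the version label.
    func fixed() {
        labels := map[string]string{"app": "rook-ceph-osd"}
        podLabels := labels

        deploymentLabels := make(map[string]string, len(labels)+1)
        for k, v := range labels {
            deploymentLabels[k] = v
        }
        deploymentLabels[rookVersionLabel] = "v1.11.0"

        fmt.Println(podLabels[rookVersionLabel]) // "" -- pod template untouched
    }

    func main() {
        buggy()
        fixed()
    }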

Surprisingly, this does not appear to be a regression, at least not one introduced recently.

Which issue is resolved by this Pull Request:
Resolves #11657

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

The RookVersionLabelMatchesCurrent() method is no longer
used, so remove it.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
The rook-version tag must not be set on the pod spec labels
since it will result in the ceph daemons being restarted every
time there is a rook version update, even if the ceph version
or pod spec was not otherwise updated.

The rook-version tag was being added to the OSD pod spec
due to a shared pointer to the labels. The code intended
to only add the version label to the deployment labels,
but since the pod labels shared the map variable, the
pod unintentionally also had the version label added.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
@zhucan (Member) commented on Feb 15, 2023

@travisn If the cluster has already been deployed and the Rook operator image version is upgraded, will the osd, mon, mgr, and rgw still restart? Or does this fix only apply to new clusters?

@zhucan (Member) commented on Feb 15, 2023

Another question: shouldn't the rook-version label also be kept off the rgw, mon, and mgr pods, not just the osd?

@travisn (Member, Author) commented on Feb 15, 2023

@travisn If the cluster has already been deployed and the Rook operator image version is upgraded, will the osd, mon, mgr, and rgw still restart? Or does this fix only apply to new clusters?

If the Rook operator is upgraded, the Ceph daemons may or may not restart; it depends on whether their pod specs change. Sometimes the pod specs change during an upgrade and sometimes they don't. For example, if a release only contains a fix to the mgr pod spec, only the mgr pod would be restarted during the upgrade, while the other daemons should not restart.
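For context, here is a minimal, self-contained Go sketch of that idea (with hypothetical, trimmed-down types and values, not Rook's actual reconcile logic): a daemon's pods are only rolled when the desired pod template differs from the one already applied, so an operator-only change that leaves the template untouched restarts nothing.

    package main

    import (
        "fmt"
        "reflect"
    )

    // podTemplate is a trimmed-down stand-in for the pod template a deployment
    // stamps onto its pods (labels plus the ceph image), for illustration only.
    type podTemplate struct {
        Labels map[string]string
        Image  string
    }

    // needsRestart reports whether applying the desired template would change
    // the pods and therefore roll the daemon.
    func needsRestart(current, desired podTemplate) bool {
        return !reflect.DeepEqual(current, desired)
    }

    func main() {
        current := podTemplate{
            Labels: map[string]string{"app": "rook-ceph-mgr", "ceph-version": "17.2.5-0"},
            Image:  "quay.io/ceph/ceph:v17.2.5",
        }

        // Operator-only upgrade: deployment metadata may change, but the pod
        // template does not, so nothing restarts.
        unchanged := current
        fmt.Println(needsRestart(current, unchanged)) // false

        // Ceph upgrade: the pod template changes, so the pods are restarted.
        upgraded := current
        upgraded.Image = "quay.io/ceph/ceph:v17.2.6"
        fmt.Println(needsRestart(current, upgraded)) // true
    }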

Another question: shouldn't the rook-version label also be kept off the rgw, mon, and mgr pods, not just the osd?

I did not repro the issue with any daemon other than the osd. The rgw, mon, and mgr pods did not have the rook-version label; they only have it on the deployment. Do you see otherwise?

@zhucan (Member) commented on Feb 15, 2023

@travisn Thanks, but what I mean is: before upgrading the operator image, the rook-version label on the pod is "v1.10.11-1.g61c9b998e.dirty"; after upgrading (to an image built from your branch), the rook-version label on the pod is gone, so the pod spec changes and the osd will restart.

@zhucan (Member) commented on Feb 15, 2023

Before upgrade:

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rook-ceph-osd
        app.kubernetes.io/component: cephclusters.ceph.rook.io
        app.kubernetes.io/created-by: rook-ceph-operator
        app.kubernetes.io/instance: "0"
        app.kubernetes.io/managed-by: rook-ceph-operator
        app.kubernetes.io/name: ceph-osd
        app.kubernetes.io/part-of: rook-ceph
        ceph-osd-id: "0"
        ceph-version: 17.2.5-0
        ceph_daemon_id: "0"
        ceph_daemon_type: osd
        device-class: hdd
        failure-domain: rook-node01
        osd: "0"
        portable: "false"
        rook-version: v1.10.11-1.g61c9b998e.dirty
        rook.io/operator-namespace: rook-ceph
        rook_cluster: rook-ceph
        topology-location-host: rook-node01
        topology-location-root: default
      name: rook-ceph-osd

After upgrade:

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rook-ceph-osd
        app.kubernetes.io/component: cephclusters.ceph.rook.io
        app.kubernetes.io/created-by: rook-ceph-operator
        app.kubernetes.io/instance: "0"
        app.kubernetes.io/managed-by: rook-ceph-operator
        app.kubernetes.io/name: ceph-osd
        app.kubernetes.io/part-of: rook-ceph
        ceph-osd-id: "0"
        ceph_daemon_id: "0"
        ceph_daemon_type: osd
        device-class: hdd
        failure-domain: rook-node01
        osd: "0"
        portable: "false"
        rook.io/operator-namespace: rook-ceph
        rook_cluster: rook-ceph
        topology-location-host: rook-node01
        topology-location-root: default
      name: rook-ceph-osd

@travisn (Member, Author) commented on Feb 15, 2023

@travisn Thanks, but what I mean is: before upgrading the operator image, the rook-version label on the pod is "v1.10.11-1.g61c9b998e.dirty"; after upgrading (to an image built from your branch), the rook-version label on the pod is gone, so the pod spec changes and the osd will restart.

Correct, the first upgrade with this fix will cause the osd pods to restart, since the label is removed. The benefit is that future upgrades won't hit this issue anymore.

travisn merged commit 06eb431 into rook:master on Feb 15, 2023
travisn deleted the osd-version-label branch on February 15, 2023
mergify bot added a commit that referenced this pull request Feb 15, 2023
osd: Ensure rook version label is not set on pod (backport #11674)
Merging this pull request may close issue #11657: "Update the image version of operator, all daemons of the ceph will be restarted?"