
osd: Ensure rook version label is not set on pod #11674

Merged: travisn merged 2 commits into rook:master from osd-version-label on Feb 15, 2023

Conversation

@travisn (Member) commented on Feb 14, 2023

Description of your changes:
The rook-version label must not be set in the pod spec labels, since it causes the Ceph daemons to be restarted on every Rook version update, even when the Ceph version and pod spec are otherwise unchanged.

The rook-version label was being added to the OSD pod spec because of a shared pointer to the labels map. The code intended to add the version label only to the deployment labels, but since the pod labels shared the same map variable, the label was unintentionally added to the pod as well.

Now the OSD pods will only be updated and restarted when there is a Ceph version change or some other change to the OSD pod spec.
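Below is a minimal Go sketch of the problem and of the fix; the variable names and version string are hypothetical and this is not the actual Rook code, only an illustration of how copying the map breaks the aliasing between the deployment labels and the pod labels.

    package main

    import "fmt"

    const rookVersionLabel = "rook-version"

    // buggy illustrates the aliasing problem: deploymentLabels and podLabels
    // refer to the same underlying map, so tagging the deployment also tags
    // the pod template and forces a restart on every operator upgrade.
    func buggy() {
        labels := map[string]string{"app": "rook-ceph-osd"}
        deploymentLabels := labels // same underlying map
        podLabels := labels        // same underlying map

        deploymentLabels[rookVersionLabel] = "v1.11.0"
        fmt.Println(podLabels[rookVersionLabel]) // "v1.11.0" -- unintended
    }

    // fixed copies the map first, so only the deployment gets the version label.
    func fixed() {
        labels := map[string]string{"app": "rook-ceph-osd"}
        podLabels := labels

        deploymentLabels := make(map[string]string, len(labels)+1)
        for k, v := range labels {
            deploymentLabels[k] = v
        }
        deploymentLabels[rookVersionLabel] = "v1.11.0"

        fmt.Println(podLabels[rookVersionLabel]) // "" -- pod template untouched
    }

    func main() {
        buggy()
        fixed()
    }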

Surprisingly, this does not appear to be a regression, at least not one introduced recently.

Which issue is resolved by this Pull Request:
Resolves #11657

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

The RookVersionLabelMatchesCurrent() method is no longer
used, so remove it.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
The rook-version tag must not be set on the pod spec labels
since it will result in the ceph daemons being restarted every
time there is a rook version update, even if the ceph version
or pod spec was not otherwise updated.

The rook-version tag was being added to the OSD pod spec
due to a shared pointer to the labels. The code intended
to only add the version label to the deployment labels,
but since the pod labels shared the map variable, the
pod unintentionally also had the version label added.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
@zhucan (Member) commented on Feb 15, 2023

@travisn If the cluster has already been deployed and the Rook operator image version is upgraded, will the osd, mon, mgr, and rgw still restart? Or does this fix only apply to new clusters?

@zhucan (Member) commented on Feb 15, 2023

Another question: shouldn't the rook-version label also be kept off the rgw, mon, and mgr pods, not just the osd?

@travisn (Member, Author) commented on Feb 15, 2023

@travisn If the cluster has already been deployed and the Rook operator image version is upgraded, will the osd, mon, mgr, and rgw still restart? Or does this fix only apply to new clusters?

If the Rook operator is upgraded, the Ceph daemons may or may not restart; it depends on whether their pod specs change. Sometimes the pod specs change during an upgrade and sometimes they don't. For example, if a release only contains a fix to the mgr pod spec, only the mgr pod would be restarted during the upgrade, while the other daemons should not restart.
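For context, here is a minimal, self-contained Go sketch of that idea (with hypothetical, trimmed-down types and values, not Rook's actual reconcile logic): a daemon's pods are only rolled when the desired pod template differs from the one already applied, so an operator-only change that leaves the template untouched restarts nothing.

    package main

    import (
        "fmt"
        "reflect"
    )

    // podTemplate is a trimmed-down stand-in for the pod template a deployment
    // stamps onto its pods (labels plus the ceph image), for illustration only.
    type podTemplate struct {
        Labels map[string]string
        Image  string
    }

    // needsRestart reports whether applying the desired template would change
    // the pods and therefore roll the daemon.
    func needsRestart(current, desired podTemplate) bool {
        return !reflect.DeepEqual(current, desired)
    }

    func main() {
        current := podTemplate{
            Labels: map[string]string{"app": "rook-ceph-mgr", "ceph-version": "17.2.5-0"},
            Image:  "quay.io/ceph/ceph:v17.2.5",
        }

        // Operator-only upgrade: deployment metadata may change, but the pod
        // template does not, so nothing restarts.
        unchanged := current
        fmt.Println(needsRestart(current, unchanged)) // false

        // Ceph upgrade: the pod template changes, so the pods are restarted.
        upgraded := current
        upgraded.Image = "quay.io/ceph/ceph:v17.2.6"
        fmt.Println(needsRestart(current, upgraded)) // true
    }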

Another question: shouldn't the rook-version label also be kept off the rgw, mon, and mgr pods, not just the osd?

I did not repro the issue with any daemon other than the osd. The rgw, mon, and mgr pods did not have the rook-version label; they only have it on the deployment. Do you see otherwise?

@zhucan (Member) commented on Feb 15, 2023

@travisn Thanks, but what I mean is: before upgrading the operator image, the rook-version label on the pod is "v1.10.11-1.g61c9b998e.dirty"; after upgrading (to an image built from your branch), the rook-version label on the pod is gone, so the pod spec changes and the osd will restart.

@zhucan (Member) commented on Feb 15, 2023

Before upgrade:

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rook-ceph-osd
        app.kubernetes.io/component: cephclusters.ceph.rook.io
        app.kubernetes.io/created-by: rook-ceph-operator
        app.kubernetes.io/instance: "0"
        app.kubernetes.io/managed-by: rook-ceph-operator
        app.kubernetes.io/name: ceph-osd
        app.kubernetes.io/part-of: rook-ceph
        ceph-osd-id: "0"
        ceph-version: 17.2.5-0
        ceph_daemon_id: "0"
        ceph_daemon_type: osd
        device-class: hdd
        failure-domain: rook-node01
        osd: "0"
        portable: "false"
        rook-version: v1.10.11-1.g61c9b998e.dirty
        rook.io/operator-namespace: rook-ceph
        rook_cluster: rook-ceph
        topology-location-host: rook-node01
        topology-location-root: default
      name: rook-ceph-osd

After upgrade:

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rook-ceph-osd
        app.kubernetes.io/component: cephclusters.ceph.rook.io
        app.kubernetes.io/created-by: rook-ceph-operator
        app.kubernetes.io/instance: "0"
        app.kubernetes.io/managed-by: rook-ceph-operator
        app.kubernetes.io/name: ceph-osd
        app.kubernetes.io/part-of: rook-ceph
        ceph-osd-id: "0"
        ceph_daemon_id: "0"
        ceph_daemon_type: osd
        device-class: hdd
        failure-domain: rook-node01
        osd: "0"
        portable: "false"
        rook.io/operator-namespace: rook-ceph
        rook_cluster: rook-ceph
        topology-location-host: rook-node01
        topology-location-root: default
      name: rook-ceph-osd

@travisn (Member, Author) commented on Feb 15, 2023

@travisn Thanks, but what I mean is: before upgrading the operator image, the rook-version label on the pod is "v1.10.11-1.g61c9b998e.dirty"; after upgrading (to an image built from your branch), the rook-version label on the pod is gone, so the pod spec changes and the osd will restart.

Correct, the first upgrade with this fix will cause the osd pods to restart, since the label is removed. The benefit is that future upgrades won't hit this issue anymore.

travisn merged commit 06eb431 into rook:master on Feb 15, 2023
travisn deleted the osd-version-label branch on February 15, 2023
mergify bot added a commit that referenced this pull request Feb 15, 2023
osd: Ensure rook version label is not set on pod (backport #11674)
Merging this pull request may close issue #11657: "Update the image version of operator, all daemons of the ceph will be restarted?"