Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: fix removing key file timing #13830

Merged
merged 1 commit into from Mar 26, 2024
Merged

Conversation

cupnes
Copy link
Contributor

@cupnes cupnes commented Feb 28, 2024

The key file deletion process is in the shell script commonly used by all of encryption-open, encryption-open-metadata, and encryption-open-wal init containers. The key file is deleted at the encryption-open init container and encryption-open-metadata and encryption-open-wal init containers are failed to open the key file.

The key file is in the /etc/ceph folder. Unless that folder is shared, the key file anyway won't be available in the other init containers even if it is not deleted by these init containers. And it will naturally anyway be deleted after the init containers are completed. So The key file deletion process in shell scripts is unnecessary.

Fixes: #13737

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@cupnes cupnes marked this pull request as ready for review March 1, 2024 00:40
Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cupnes Are you still testing this? I don't think this will fix the issue. The key file is in the /etc/ceph folder. If this folder needs to be shared with the other init containers, it will need to be mounted as a volume inside all the init containers. Unless that folder is shared, the key file anyway won't be available in the other init containers even if it is not deleted by the first init container.

And if you mount this as an EmptyDir to share between the containers, it will naturally anyway be deleted after the init containers are completed, and no need for a separate init container to delete the file.

@cupnes
Copy link
Contributor Author

cupnes commented Mar 7, 2024

@travisn

Are you still testing this?

I thought the test was finished. The CI of this PR has failed in two. The first is "TestHelmUpgradeSuite (v1.24.17)". I do not think that this PR is the cause, as the same failures have occurred outside of this PR[1]. The second is "govulncheck". About it, I am aware that you have put up an issue[2].

I don't think this will fix the issue. The key file is in the /etc/ceph folder. If this folder needs to be shared with the other init containers, it will need to be mounted as a volume inside all the init containers. Unless that folder is shared, the key file anyway won't be available in the other init containers even if it is not deleted by the first init container.

I think you're right, just I'm concerned about is that when I deleted the key file deletion in the encryption-open container, the encryption-open-metadata no longer fails with "Failed to open key file.".

And if you mount this as an EmptyDir to share between the containers, it will naturally anyway be deleted after the init containers are completed, and no need for a separate init container to delete the file.

However, I think that it would be better than my solution and will change this PR to your suggestion.

[1] https://github.com/rook/rook/actions/runs/8161015462/job/22308883169?pr=13878
[2] #13888

@travisn
Copy link
Member

travisn commented Mar 7, 2024

Ok interesting, you had tested successfully that your change was working? In code inspection I just didn't see the /etc/ceph was mounted, so I didn't expect it to work. In your tests, you also don't see the /etc/ceph volume mounted in the init containers, right?

@cupnes
Copy link
Contributor Author

cupnes commented Mar 8, 2024

I see, that may be so. I will answer next week as I cannot take the time to check today.

@cupnes
Copy link
Contributor Author

cupnes commented Mar 15, 2024

I have been busy with other matters and have not made much progress.

I'm currently able to reproduce the situation in issue #13737 with the Canary integration tests in PR #13773. I will apply Travis's suggestion here, and if I confirm that it resolves the issue, I will amend this PR to that effect.

@cupnes
Copy link
Contributor Author

cupnes commented Mar 18, 2024

@travisn
I'm sorry. I did not respond to your comment.

Ok interesting, you had tested successfully that your change was working?

Yes, however, I understand that it is not a good modification and I intend to modify it.

In code inspection I just didn't see the /etc/ceph was mounted, so I didn't expect it to work. In your tests, you also don't see the /etc/ceph volume mounted in the init containers, right?

The /etc/ceph volume mounted on the init container was shown as follows.

runner@fv-az739-110:~/work/rook/rook$ kubectl get pod -n rook-ceph rook-ceph-osd-0-88548ff98-gl94h -oyaml | less
apiVersion: v1
kind: Pod
...
spec:
  ...
  initContainers:
  ...
  - command:
    ...
    name: encryption-open
    ...
    volumeMounts:
    ...
    - mountPath: /etc/ceph
      name: osd-encryption-key
    ...
  - command:
    ...
    name: encryption-open-metadata
    ...
    volumeMounts:
    ...
    - mountPath: /etc/ceph
      name: osd-encryption-key
    ...
  volumes:
  ...
  - emptyDir:
      medium: Memory
    name: osd-encryption-key
  ...

Therefore, following your comment, I understand that the init container for deletion is not necessary and that it is sufficient to simply delete this line. Is this understanding correct?

Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, following your comment, I understand that the init container for deletion is not necessary and that it is sufficient to simply delete this line. Is this understanding correct?

Yes, this sounds correct. Thanks for confirming that the /etc/ceph volume was indeed mounted. To confirm that the deletion is not necessary, you can connect to the main container and confirm the file does not exist.

@cupnes
Copy link
Contributor Author

cupnes commented Mar 19, 2024

@travisn

Yes, this sounds correct. Thanks for confirming that the /etc/ceph volume was indeed mounted. To confirm that the deletion is not necessary, you can connect to the main container and confirm the file does not exist.

Thank you. I made that fix and looked at the main container and found no key file.

runner@fv-az776-592:~/work/rook/rook$ kubectl -n rook-ceph exec rook-ceph-osd-0-7986dbc8f8-8vjrb -- ls -l /etc/ceph
Defaulted container "osd" out of: osd, blkdevmapper (init), blkdevmapper-metadata (init), encryption-kms-get-kek (init), encryption-open (init), encryption-open-metadata (init), blkdevmapper-encryption (init), blkdevmapper-metadata-encryption (init), encrypted-block-status (init), expand-encrypted-bluefs (init), activate (init), expand-bluefs (init), chown-container-data-dir (init)
total 0
lrwxrwxrwx 1 root root 16 Mar 19 00:34 ceph.conf -> ..data/ceph.conf
runner@fv-az776-592:~/work/rook/rook$

I will amend this PR with this plan.

The key file deletion process is in the shell script commonly used by
all of encryption-open, encryption-open-metadata, and
encryption-open-wal init containers. The key file is deleted at the
encryption-open init container and encryption-open-metadata and
encryption-open-wal init containers are failed to open the key file.

The key file is in the /etc/ceph folder. Unless that folder is shared,
the key file anyway won't be available in the other init containers
even if it is not deleted by these init containers. And it will
naturally anyway be deleted after the init containers are
completed. So The key file deletion process in shell scripts is
unnecessary.

Fixes: rook#13737

Signed-off-by: Yuma Ogami <yuma-ogami@cybozu.co.jp>
@cupnes
Copy link
Contributor Author

cupnes commented Mar 19, 2024

I fixed this PR. Please review.

@travisn travisn merged commit 6d9ecce into rook:master Mar 26, 2024
51 checks passed
mergify bot added a commit that referenced this pull request Mar 26, 2024
osd: fix removing key file timing (backport #13830)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OSD failing to start if encrypted and also have metadata device
2 participants