
ceph: fix monitor pvc storage on ebs #3594

Merged
merged 2 commits into rook:master from mon-pvc-root-dir-fix
Aug 9, 2019
Conversation

dotnwat (Contributor) commented Aug 8, 2019

Description of your changes:

There is a build here for testing: noahdesu/rook-ceph-master-no-lifecycle-e

  1. Initialize ownership on files using an init container rather than a lifecycle hook. See the commit message for an explanation.

  2. Make sure that the PVC filesystem mount uses an empty directory for daemon data.

ext4 adds some extra directories (e.g. lost+found) in the filesystem root, so this PR mounts a subdirectory instead, as the listing and sketch below show.

[vm 0 ~]$ oc -n rook-ceph log rook-ceph-mon-a-5cf4748c6-nvmzd -c list-container-data-dir
total 20
drwxrwsr-x. 3 root 1000630000  4096 Aug  8 20:06 .
drwxr-x---. 1 ceph ceph          20 Aug  8 20:06 ..
drwxrwS---. 2 root 1000630000 16384 Aug  8 20:06 lost+found
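
For illustration, a minimal Go sketch of the subdirectory mount, assuming the standard Kubernetes SubPath mechanism; the volume name and paths are hypothetical, not the PR's exact code:

package sketch

import v1 "k8s.io/api/core/v1"

// daemonDataMount mounts a subdirectory of the PVC rather than the filesystem
// root, so the mon's data directory starts out empty even though ext4 places
// lost+found at the root of the volume. (Hypothetical names and paths.)
func daemonDataMount() v1.VolumeMount {
	return v1.VolumeMount{
		Name:      "ceph-daemon-data",  // hypothetical volume name
		MountPath: "/var/lib/ceph/mon", // where the daemon expects its data
		SubPath:   "data",              // subdirectory inside the PVC filesystem
	}
}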

Which issue is resolved by this Pull Request:
Resolves #3591

Checklist:

  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md
  • Add the flag for skipping the CI if this PR does not require a build. See here for more details.

@dotnwat added the ceph (main ceph tag) and bug labels on Aug 8, 2019
@dotnwat requested a review from travisn on August 8, 2019 at 22:19
@@ -283,7 +304,6 @@ func (c *Cluster) makeMonDaemonContainer(monConfig *monConfig) v1.Container {
 			k8sutil.PodIPEnvVar(podIPEnvVar),
 		),
 		Resources: cephv1.GetMonResources(c.spec.Resources),
-		Lifecycle: opspec.PodLifeCycle(""),
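
For readers without the source open, a minimal Go sketch of the kind of postStart hook this removed line attaches, based on the commit message below (a chown of the daemon log directory). This is an assumed shape, not opspec.PodLifeCycle's actual implementation, and it uses the 2019-era core/v1 Handler type (later renamed LifecycleHandler):

package sketch

import v1 "k8s.io/api/core/v1"

// podLifeCycle approximates the postStart hook being removed: it chowns the
// daemon log directory after the container starts, which is not guaranteed
// to happen before the entrypoint runs. (Assumption, not Rook's actual code.)
func podLifeCycle(dataDir string) *v1.Lifecycle {
	return &v1.Lifecycle{
		PostStart: &v1.Handler{
			Exec: &v1.ExecAction{
				Command: []string{"chown", "--recursive", "ceph:ceph", "/var/log/ceph", dataDir},
			},
		},
	}
}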
Member
Do we need to consider this change for other daemons? For now, maybe we should leave this line (but commented out) and add a comment about why it's not needed. Otherwise, someone might think it is missing since all the other daemons have it and put it back.

Member

Why would we only be seeing this now for mons on PVs? That would affect the data dir, but remind me why the log dir is affected as well?

dotnwat (Contributor, Author) commented Aug 8, 2019

There are two issues here. According to the docs, using a postStart lifecycle hook for anything the entrypoint depends on is a race. Assuming that's a correct reading and the chown is in fact a dependency, that's the primary reason I left this patch in. In that context, it seems like all usages are incorrect if the intention is to set permissions for the main container.

The second issue is the problem we saw with a /proc/<pid>/ipc file not being found when the chown was run in the lifecycle hook. I don't have a root cause for that, but it doesn't seem to show up when the chown is performed in an init container.

A similar error is reported at https://bugzilla.redhat.com/show_bug.cgi?id=1701326#c6, and someone mentions a race between CRI-O and the kubelet in cri-o/cri-o#1927 (comment).

Presumably postStart hooks are intended to observe containers that are fully initialized, but if anything would exacerbate an existing startup race, I think PVC attachments qualify (even if the mount related to the race is a hostpath).
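
For context, a minimal sketch of the init-container alternative described above, with hypothetical names and paths (the real implementation is in this PR's first commit):

package sketch

import v1 "k8s.io/api/core/v1"

// chownInitContainer fixes ownership of the daemon's log and data directories
// before the main container starts. Unlike a postStart hook, init containers
// are guaranteed to run to completion before any app container starts, which
// removes the race. (Hypothetical names and paths; not the PR's exact code.)
func chownInitContainer(cephImage string) v1.Container {
	return v1.Container{
		Name:    "chown-container-data-dir", // assumed name
		Image:   cephImage,
		Command: []string{"chown", "--recursive", "ceph:ceph"},
		Args:    []string{"/var/log/ceph", "/var/lib/ceph/mon"},
	}
}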

dotnwat (Contributor, Author)

Added a note on the lifecycle hook linking back here for the discussion.

pkg/operator/ceph/cluster/mon/spec.go (outdated; resolved)
the lifecycle postStart hook is used to chown the ceph daemon log
directory. however, hooks are not guaranteed to run before a container's
entrypoint.

"Kubernetes sends the postStart event immediately after the Container is
created. There is no guarantee, however, that the postStart handler is
called before the Container’s entrypoint is called"

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
when the root of a filesystem-mode pvc is mounted as the base data directory
for the monitor, it will not be empty: `lost+found` from ext4 will be present.
this causes the mon mkfs to fail because it expects an empty directory.

the solution is to mount under a directory in the source volume.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@travisn travisn merged commit 207dad2 into rook:master Aug 9, 2019
@dotnwat dotnwat deleted the mon-pvc-root-dir-fix branch August 13, 2019 17:29
BlaineEXE added a commit to SUSE/rook that referenced this pull request Oct 10, 2019
Following the Ceph OSD rework, my test environment regularly failed to
start OSDs, as they would start before the postStart lifecycle hook which
was chown'ing the log and data directories could finish. Pods failed to
start for an arbitrarily long time before starting successfully. This
was a manifestation of a race condition identified in Ceph monitors in
rook#3594 (comment)

Create a Ceph operator-level method to create an init container that
will chown the necessary directories for all Ceph daemons (any daemon
which runs as the ceph:ceph user-and-group). Also add this container to
expectations for Ceph pod templates in unit tests.

Signed-off-by: Blaine Gardner <blaine.gardner@suse.com>
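
A sketch of the operator-level pattern this commit describes, again with assumed names: the chown init container is prepended to each Ceph daemon's pod spec so ownership is corrected before any daemon container runs:

package sketch

import v1 "k8s.io/api/core/v1"

// withChownInit prepends an ownership-fixing init container to a daemon's pod
// spec. (Assumed shape; the actual method name and wiring in Rook differ.)
func withChownInit(spec v1.PodSpec, chown v1.Container) v1.PodSpec {
	spec.InitContainers = append([]v1.Container{chown}, spec.InitContainers...)
	return spec
}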
The same commit was subsequently referenced, with an identical commit message, by BlaineEXE in SUSE/rook (Oct 10 and Oct 15, 2019), by sabbot in sabbot/rook (Oct 17, 2019), and by binoue in binoue/rook (Apr 10, 2020).
Labels: bug, ceph (main ceph tag)
Projects: none yet
Development: successfully merging this pull request may close issue #3591, "Ceph Mons on PVs fail to start in OpenShift"
2 participants