
ceph: fix monitor pvc storage on ebs #3594

Merged
merged 2 commits into rook:master from mon-pvc-root-dir-fix
Aug 9, 2019
Conversation

dotnwat (Contributor) commented Aug 8, 2019

Description of your changes:

There is a build here for testing: noahdesu/rook-ceph-master-no-lifecycle-e

  1. Initialize ownership on files using an init container rather than a lifecycle hook. See the commit message for an explanation.

  2. Make sure that the PVC filesystem mount uses an empty directory for daemon data.

ext4 adds some extra directories (e.g. lost+found) in the filesystem root, so this PR mounts a subdirectory instead, as the listing and sketch below show.

[vm 0 ~]$ oc -n rook-ceph log rook-ceph-mon-a-5cf4748c6-nvmzd -c list-container-data-dir
total 20
drwxrwsr-x. 3 root 1000630000  4096 Aug  8 20:06 .
drwxr-x---. 1 ceph ceph          20 Aug  8 20:06 ..
drwxrwS---. 2 root 1000630000 16384 Aug  8 20:06 lost+found
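
For illustration, a minimal Go sketch of the subdirectory mount, assuming the standard Kubernetes SubPath mechanism; the volume name and paths are hypothetical, not the PR's exact code:

package sketch

import v1 "k8s.io/api/core/v1"

// daemonDataMount mounts a subdirectory of the PVC rather than the filesystem
// root, so the mon's data directory starts out empty even though ext4 places
// lost+found at the root of the volume. (Hypothetical names and paths.)
func daemonDataMount() v1.VolumeMount {
	return v1.VolumeMount{
		Name:      "ceph-daemon-data",  // hypothetical volume name
		MountPath: "/var/lib/ceph/mon", // where the daemon expects its data
		SubPath:   "data",              // subdirectory inside the PVC filesystem
	}
}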

Which issue is resolved by this Pull Request:
Resolves #3591

Checklist:

  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md
  • Add the flag for skipping the CI if this PR does not require a build. See here for more details.

@dotnwat added the ceph (main ceph tag) and bug labels on Aug 8, 2019
@dotnwat requested a review from travisn on August 8, 2019 at 22:19
@@ -283,7 +304,6 @@ func (c *Cluster) makeMonDaemonContainer(monConfig *monConfig) v1.Container {
 			k8sutil.PodIPEnvVar(podIPEnvVar),
 		),
 		Resources: cephv1.GetMonResources(c.spec.Resources),
-		Lifecycle: opspec.PodLifeCycle(""),
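
For readers without the source open, a minimal Go sketch of the kind of postStart hook this removed line attaches, based on the commit message below (a chown of the daemon log directory). This is an assumed shape, not opspec.PodLifeCycle's actual implementation, and it uses the 2019-era core/v1 Handler type (later renamed LifecycleHandler):

package sketch

import v1 "k8s.io/api/core/v1"

// podLifeCycle approximates the postStart hook being removed: it chowns the
// daemon log directory after the container starts, which is not guaranteed
// to happen before the entrypoint runs. (Assumption, not Rook's actual code.)
func podLifeCycle(dataDir string) *v1.Lifecycle {
	return &v1.Lifecycle{
		PostStart: &v1.Handler{
			Exec: &v1.ExecAction{
				Command: []string{"chown", "--recursive", "ceph:ceph", "/var/log/ceph", dataDir},
			},
		},
	}
}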
Member
Do we need to consider this change for other daemons? For now, maybe we should leave this line (but commented out) and add a comment about why it's not needed. Otherwise, someone might think it is missing since all the other daemons have it and put it back.

Member

Why would we only be seeing this now for mons on PVs? That would affect the data dir, but remind me why the log dir is affected as well?

dotnwat (Contributor, Author) commented Aug 8, 2019

There are two issues here. According to the docs, using a postStart lifecycle hook for anything the entrypoint depends on is a race. Assuming that's a correct reading and the chown is in fact a dependency, that's the primary reason I left this patch in. In that context, it seems like all usages are incorrect if the intention is to set permissions for the main container.

The second issue is the problem we saw with a /proc/<pid>/ipc file not being found when the chown was run in the lifecycle hook. I don't have a root cause for that, but it doesn't seem to show up when the chown is performed in an init container.

A similar error is reported at https://bugzilla.redhat.com/show_bug.cgi?id=1701326#c6, and someone mentions a race between CRI-O and the kubelet in cri-o/cri-o#1927 (comment).

Presumably postStart hooks are intended to observe containers that are fully initialized, but if anything would exacerbate an existing startup race, I think PVC attachments qualify (even if the mount related to the race is a hostpath).
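
For context, a minimal sketch of the init-container alternative described above, with hypothetical names and paths (the real implementation is in this PR's first commit):

package sketch

import v1 "k8s.io/api/core/v1"

// chownInitContainer fixes ownership of the daemon's log and data directories
// before the main container starts. Unlike a postStart hook, init containers
// are guaranteed to run to completion before any app container starts, which
// removes the race. (Hypothetical names and paths; not the PR's exact code.)
func chownInitContainer(cephImage string) v1.Container {
	return v1.Container{
		Name:    "chown-container-data-dir", // assumed name
		Image:   cephImage,
		Command: []string{"chown", "--recursive", "ceph:ceph"},
		Args:    []string{"/var/log/ceph", "/var/lib/ceph/mon"},
	}
}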

dotnwat (Contributor, Author)

Added a note on the lifecycle hook linking back here for the discussion.

pkg/operator/ceph/cluster/mon/spec.go (outdated; resolved)
the lifecycle postStart hook is used to chown the ceph daemon log
directory. however, hooks are not guaranteed to run before a container's
entrypoint.

"Kubernetes sends the postStart event immediately after the Container is
created. There is no guarantee, however, that the postStart handler is
called before the Container’s entrypoint is called"

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
when the root of a filesystem-mode pvc is mounted as the base data directory
for the monitor, it will not be empty: `lost+found` from ext4 will be present.
this causes the mon mkfs to fail because it expects an empty directory.

the solution is to mount under a directory in the source volume.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@travisn travisn merged commit 207dad2 into rook:master Aug 9, 2019
@dotnwat dotnwat deleted the mon-pvc-root-dir-fix branch August 13, 2019 17:29
BlaineEXE added a commit to SUSE/rook that referenced this pull request Oct 10, 2019
Following the Ceph OSD rework, my test environment regularly failed to
start OSDs, as they would start before the postStart lifecycle hook which
was chown'ing the log and data directories could finish. Pods failed to
start for an arbitrarily long time before starting successfully. This
was a manifestation of a race condition identified in Ceph monitors in
rook#3594 (comment)

Create a Ceph operator-level method to create an init container that
will chown the necessary directories for all Ceph daemons (any daemon
which runs as the ceph:ceph user-and-group). Also add this container to
expectations for Ceph pod templates in unit tests.

Signed-off-by: Blaine Gardner <blaine.gardner@suse.com>
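
A sketch of the operator-level pattern this commit describes, again with assumed names: the chown init container is prepended to each Ceph daemon's pod spec so ownership is corrected before any daemon container runs:

package sketch

import v1 "k8s.io/api/core/v1"

// withChownInit prepends an ownership-fixing init container to a daemon's pod
// spec. (Assumed shape; the actual method name and wiring in Rook differ.)
func withChownInit(spec v1.PodSpec, chown v1.Container) v1.PodSpec {
	spec.InitContainers = append([]v1.Container{chown}, spec.InitContainers...)
	return spec
}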
The same commit was subsequently referenced, with an identical commit message, by BlaineEXE in SUSE/rook (Oct 10 and Oct 15, 2019), by sabbot in sabbot/rook (Oct 17, 2019), and by binoue in binoue/rook (Apr 10, 2020).
Labels: bug, ceph (main ceph tag)
Projects: none yet
Development: successfully merging this pull request may close issue #3591, "Ceph Mons on PVs fail to start in OpenShift"
2 participants