Conversation
Are the bind mounts done in the host mount namespace or another mount namespace? |
They are done in host namespace. |
// TODO: verify if rkt should do this, or it should be some outer responsibility | ||
// to ensure if such patch exists | ||
if err := ensureDestinationExists(source, destination); err != nil { | ||
// TODO: check if symlinked source tries to escape root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could reuse code from https://github.com/coreos/rkt/pull/2290/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
evaluateAppMountPath is already invocated few lines above, but it's called for destination
.
I have a limited understanding of https://github.com/coreos/rkt/blob/master/tests/rkt_mount_test.go#L68 test - how anything can escape in this situation root? (this test is failing on kvm even after this patch).
|
||
if (kvm->cfg.using_rootfs) { | ||
- strcat(real_cmdline, " rw rootflags=trans=virtio,version=9p2000.L,cache=loose rootfstype=9p"); | ||
+ strcat(real_cmdline, " rw rootflags=trans=virtio,version=9p2000.L,cache=none rootfstype=9p"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, was sure that commented this. We had an issue connected with these caches. Without this change some data could be lost if they are saved by application just before pod stopping. For example if I started docker://busybox in interactive mode, added some volume, echoed some value into new file in this volume and then quickly closed pod - file was created, but without expected content inside.
Changing this option helped.
Change was done basing on information from https://www.kernel.org/doc/Documentation/filesystems/9p.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add such a comment to the code or commit log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll do this. I'll extract this single change into separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Needs rebase. |
ACK |
Done, also fix for mounting symlinks is updated and fully working. Still kvm flavor will not pass functional tests, because of that we cannot pass exit status value from pod as lkvm exit value (i can workaround this spawning lkvm as child process, but this breaks idea of execing onto lkvm at the end of run process - so imo it's ugly). |
Sorry we didn't get to review this in time for v1.3.0. We will get it for the next milestone. |
ACK. |
Review request. |
err := AppToSystemdMountUnits(common.Stage1RootfsPath(p.Root), appName, p.Manifest.Volumes, ra, UnitsDir) | ||
if err != nil { | ||
// bind mount all shared volumes (we don't use mechanism for bind-mounting given by nspawn) | ||
if err := MountSharedVolumes(common.Stage1RootfsPath(p.Root), appName, p.Manifest.Volumes, ra); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since MountSharedVolumes
is not related to systemd-nspawn parameters nor systemd units, I think the function call should not be in stage1/init/common/pod.go:appToSystemd
but directly in stage1/init/init.go:stage1()
. It would require to write another loop on p.Manifest.Apps though. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Give me some time to implement this, so this PR should be marked as needs rework
for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using --volume=myvol,kind=host,source=/tmp/myvol-missing
with a non-existent directory, the error message is:
stage1: failed to configure systemd: lstat /tmp/myvol-missing: no such file or directory
It should not say "failed to configure systemd" since it is not using systemd or systemd-nspawn anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be fixed when I'll address previous comment.
stage1/init/init.go:stage1():
With the current PR, the bind mounts are done before the Unshare & the setup of the mnt namespace as slave. So the bind mounts are visible to the host. If the bind mounts were done just after (line 613), they would not be visible to the host. It looks cleaner to me. |
Ok, I'll look on this. |
adf9225
to
3eed76c
Compare
// KvmPodToSystemd generates systemd unit files for a pod according to the manifest and network configuration | ||
func KvmPodToSystemd(p *stage1commontypes.Pod, n *networking.Networking) error { | ||
const ( | ||
sharedVolPerm = os.FileMode(0755) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it reuse the constant defined in the common
pkg?
stage1/init/common/pod.go: sharedVolPerm = os.FileMode(0755)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should. Give me a second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... This would require to export (uppercase) this constant in common/pod.go
I'm doing this now.
d86952a
to
7f3631b
Compare
Without this change some data could be lost if they are saved by application just before pod stopping. For example if I started docker://busybox in interactive mode, added some volume, echoed some value into new file in this volume and then quickly closed pod - file was created, but without expected content inside. Changing this option helped. Change was done basing on information from https://www.kernel.org/doc/Documentation/filesystems/9p.txt
Co-authored-by: Michal Stachowski <michal.stachowski@outlook.com>
} | ||
if !strings.HasPrefix(absDestination, absAppRootfs) { | ||
return fmt.Errorf("path escapes app's root: %v", absDestination) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this reuse pod.go:evaluateAppMountPath
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were discussing about this earlier in #2328 (comment)
LGTM |
This PR provides new volumes support, based on bind mounting volumes in time of rootfs preparation, removing requirement to pass each volume separately through 9p. It also simplifies volumes support in kvm flavor.
Also - new approach closes #2263
Also - this PR adds possibility to choose as source symlink.