Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

kvm: new volumes support #2328

Merged
merged 5 commits into from Apr 11, 2016
Merged

kvm: new volumes support #2328

merged 5 commits into from Apr 11, 2016

Conversation

jellonek
Copy link
Contributor

This PR provides new volumes support, based on bind mounting volumes in time of rootfs preparation, removing requirement to pass each volume separately through 9p. It also simplifies volumes support in kvm flavor.

Also - new approach closes #2263
Also - this PR adds possibility to choose as source symlink.

@alban
Copy link
Member

alban commented Mar 29, 2016

Are the bind mounts done in the host mount namespace or another mount namespace?

@jellonek
Copy link
Contributor Author

They are done in host namespace.

// TODO: verify if rkt should do this, or it should be some outer responsibility
// to ensure if such patch exists
if err := ensureDestinationExists(source, destination); err != nil {
// TODO: check if symlinked source tries to escape root
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluateAppMountPath is already invocated few lines above, but it's called for destination.
I have a limited understanding of https://github.com/coreos/rkt/blob/master/tests/rkt_mount_test.go#L68 test - how anything can escape in this situation root? (this test is failing on kvm even after this patch).


if (kvm->cfg.using_rootfs) {
- strcat(real_cmdline, " rw rootflags=trans=virtio,version=9p2000.L,cache=loose rootfstype=9p");
+ strcat(real_cmdline, " rw rootflags=trans=virtio,version=9p2000.L,cache=none rootfstype=9p");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, was sure that commented this. We had an issue connected with these caches. Without this change some data could be lost if they are saved by application just before pod stopping. For example if I started docker://busybox in interactive mode, added some volume, echoed some value into new file in this volume and then quickly closed pod - file was created, but without expected content inside.
Changing this option helped.
Change was done basing on information from https://www.kernel.org/doc/Documentation/filesystems/9p.txt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add such a comment to the code or commit log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll do this. I'll extract this single change into separate commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@krnowak
Copy link
Collaborator

krnowak commented Mar 31, 2016

Needs rebase.

@jellonek
Copy link
Contributor Author

ACK

@jellonek
Copy link
Contributor Author

Done, also fix for mounting symlinks is updated and fully working.

Still kvm flavor will not pass functional tests, because of that we cannot pass exit status value from pod as lkvm exit value (i can workaround this spawning lkvm as child process, but this breaks idea of execing onto lkvm at the end of run process - so imo it's ugly).

@alban
Copy link
Member

alban commented Mar 31, 2016

Sorry we didn't get to review this in time for v1.3.0. We will get it for the next milestone.

@alban alban modified the milestones: v1.4.0, v1.3.0 Mar 31, 2016
@jellonek
Copy link
Contributor Author

jellonek commented Apr 1, 2016

ACK.

@jellonek
Copy link
Contributor Author

jellonek commented Apr 5, 2016

Review request.

err := AppToSystemdMountUnits(common.Stage1RootfsPath(p.Root), appName, p.Manifest.Volumes, ra, UnitsDir)
if err != nil {
// bind mount all shared volumes (we don't use mechanism for bind-mounting given by nspawn)
if err := MountSharedVolumes(common.Stage1RootfsPath(p.Root), appName, p.Manifest.Volumes, ra); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since MountSharedVolumes is not related to systemd-nspawn parameters nor systemd units, I think the function call should not be in stage1/init/common/pod.go:appToSystemd but directly in stage1/init/init.go:stage1(). It would require to write another loop on p.Manifest.Apps though. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Give me some time to implement this, so this PR should be marked as needs rework for this purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using --volume=myvol,kind=host,source=/tmp/myvol-missing with a non-existent directory, the error message is:

stage1: failed to configure systemd: lstat /tmp/myvol-missing: no such file or directory

It should not say "failed to configure systemd" since it is not using systemd or systemd-nspawn anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be fixed when I'll address previous comment.

@alban
Copy link
Member

alban commented Apr 5, 2016

stage1/init/init.go:stage1():

    if err = stage1initcommon.PodToSystemd(p, interactive, flavor, privateUsers); err != nil {
        log.PrintE("failed to configure systemd", err)
        return 2
    }

    args, env, err := getArgsEnv(p, flavor, debug, n)
    if err != nil {
        log.Error(err)
        return 3
    }

    // create a separate mount namespace so the cgroup filesystems
    // are unmounted when exiting the pod
    if err := syscall.Unshare(syscall.CLONE_NEWNS); err != nil {
        log.FatalE("error unsharing", err)
    }

With the current PR, the bind mounts are done before the Unshare & the setup of the mnt namespace as slave. So the bind mounts are visible to the host. If the bind mounts were done just after (line 613), they would not be visible to the host. It looks cleaner to me.

@jellonek
Copy link
Contributor Author

jellonek commented Apr 5, 2016

Ok, I'll look on this.

@jellonek jellonek force-pushed the jell/volumes branch 2 times, most recently from adf9225 to 3eed76c Compare April 7, 2016 15:30
@jellonek
Copy link
Contributor Author

jellonek commented Apr 7, 2016

@albanc: logic refactored, mounts are done after unshare and refactor of GenerateMounts leaved for #2380

// KvmPodToSystemd generates systemd unit files for a pod according to the manifest and network configuration
func KvmPodToSystemd(p *stage1commontypes.Pod, n *networking.Networking) error {
const (
sharedVolPerm = os.FileMode(0755)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it reuse the constant defined in the common pkg?

stage1/init/common/pod.go:      sharedVolPerm = os.FileMode(0755)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should. Give me a second.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... This would require to export (uppercase) this constant in common/pod.go
I'm doing this now.

@jellonek jellonek force-pushed the jell/volumes branch 2 times, most recently from d86952a to 7f3631b Compare April 8, 2016 10:59
jellonek and others added 5 commits April 8, 2016 13:22
Without this change some data could be lost if they are saved by
application just before pod stopping. For example if I started
docker://busybox in interactive mode, added some volume, echoed some
value into new file in this volume and then quickly closed pod - file
was created, but without expected content inside.

Changing this option helped.

Change was done basing on information from
  https://www.kernel.org/doc/Documentation/filesystems/9p.txt
Co-authored-by: Michal Stachowski <michal.stachowski@outlook.com>
}
if !strings.HasPrefix(absDestination, absAppRootfs) {
return fmt.Errorf("path escapes app's root: %v", absDestination)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this reuse pod.go:evaluateAppMountPath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were discussing about this earlier in #2328 (comment)

@alban
Copy link
Member

alban commented Apr 11, 2016

LGTM

@alban alban merged commit a2544f1 into rkt:master Apr 11, 2016
@jellonek jellonek deleted the jell/volumes branch April 12, 2016 10:51
@alban alban mentioned this pull request Apr 15, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvm: problem with mounting single file as a volume
5 participants