
nspawn: overlayfs cannot be the root file system right now #3847

Closed
Andrei-Pozolotin opened this issue Aug 1, 2016 · 11 comments · Fixed by #14269
Labels
nspawn RFE 🎁 Request for Enhancement, i.e. a feature request

Comments

@Andrei-Pozolotin

Andrei-Pozolotin commented Aug 1, 2016

  1. The --overlay= option (overlayfs support) is a great feature:
    https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html

  2. However, it does not seem to allow using the generated overlay stack as the container's root file system mount
    (we tried many different things; nothing works for us).

  3. Additionally, overlayfs support is missing from machine.nspawn files:
    https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html

So the request is two-fold:
a) support overlayfs as the root file system
b) allow controlling overlay settings from *.nspawn files

@poettering poettering changed the title BUG: systemd-nspawn: overlayfs support as root mount and in *.nspawn nspawn: overlayfs cannot be the root file system right now Aug 19, 2016
@ghost

ghost commented Nov 9, 2016

Hi all,
I came across this bug when trying to use one image as the base for several containers, in particular to share base OS updates between them.

The workaround I found is to use several overlays. My setup is a base Fedora image mounted read-only plus an additional file hierarchy for running the Caddy web server: /usr/bin/caddy and /etc/Caddyfile.

systemd-nspawn -D /var/lib/machines/fedora --read-only --overlay /var/lib/machines/fedora/etc:/var/lib/machines/caddy/base/etc:/var/lib/machines/caddy/overlay/etc:/etc --overlay-ro /var/lib/machines/fedora/usr/bin:/var/lib/machines/caddy/base/bin:/usr/bin --tmpfs /var -M caddy -b

It works pretty well, even though I would really appreciate more help from systemd for sharing data across images.

I see Lennart renamed the bug, removing the .nspawn part, which is the part that interests me most in this case. I have opened #4634 for this purpose.

Cheers,
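The --overlay= and --overlay-ro= values in the command above follow the three-or-more-paths form described in the systemd-nspawn man page: the last path is the mount point inside the container, the left-most host path is the lowest layer, and for --overlay= the second-to-last path is the highest, writable layer that receives all changes. Here is a minimal Python sketch of that split, plus the equivalent raw overlayfs options, whose lowerdir= list names the topmost layer first (so the order is reversed). The helper names and the workdir path are hypothetical; escaped colons, the two-path form and the way nspawn picks its own work directory are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OverlaySpec:
    lower: List[str]       # host trees, lowest layer first (command-line order)
    upper: Optional[str]   # writable top layer; None for --overlay-ro=
    destination: str       # mount point inside the container


def parse_overlay_spec(spec: str, read_only: bool = False) -> OverlaySpec:
    """Split an --overlay=/--overlay-ro= value with three or more paths (hypothetical helper)."""
    paths = spec.split(":")  # escaped colons are not handled here
    if len(paths) < 3:
        raise ValueError("this sketch only handles the three-or-more-paths form")
    *host, destination = paths
    if read_only:
        # --overlay-ro=: every host tree is a read-only lower layer.
        return OverlaySpec(lower=host, upper=None, destination=destination)
    # --overlay=: the second-to-last path is the writable top layer.
    return OverlaySpec(lower=host[:-1], upper=host[-1], destination=destination)


def overlayfs_options(spec: OverlaySpec, workdir: Optional[str] = None) -> str:
    """Equivalent overlayfs mount options; lowerdir= lists the TOP layer first."""
    opts = ["lowerdir=" + ":".join(reversed(spec.lower))]
    if spec.upper is not None:
        if workdir is None:
            raise ValueError("a writable overlay also needs a workdir")
        opts += ["upperdir=" + spec.upper, "workdir=" + workdir]
    return ",".join(opts)


# The /etc overlay from the command above; the workdir path is made up.
etc = parse_overlay_spec(
    "/var/lib/machines/fedora/etc:/var/lib/machines/caddy/base/etc:"
    "/var/lib/machines/caddy/overlay/etc:/etc"
)
print(etc.destination)  # /etc
print(overlayfs_options(etc, workdir="/var/lib/machines/caddy/work/etc"))
```

For the /etc overlay this gives lowerdir=/var/lib/machines/caddy/base/etc:/var/lib/machines/fedora/etc, i.e. the Caddy layer sits above the Fedora base, and all writes land in /var/lib/machines/caddy/overlay/etc.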

@poettering poettering added the RFE 🎁 Request for Enhancement, i.e. a feature request label Nov 11, 2016
@LionNatsu
Contributor

The docs say I can do it this way:

systemd-nspawn -b -D "/home/lion/aroot" --overlay="/home/lion/aroot:/home/lion/overlay:/"

However, I always get this:

Spawning container aroot on /home/lion/aroot.
Press ^] three times within 1s to kill container.
Failed to move root directory: Invalid argument

On the other hand, if I use --overlay="...:...:/root", --overlay="...:...:/usr" or any other destination, it runs fine.

I do think this is a bug...

@kangsterizer

Note that if you mount the overlay manually (and unmount it afterwards) and start the container on it as its root file system, it works fine (of course, that's only a workaround until this works in systemd-nspawn itself).

@dteleguin

Just my 2¢, I'm using systemd-230 and overlays to build multiple containers on top of base OS image. Base image and containers live in /srv/machines:

srv/
└── machines/
    ├── base/
    │   └── (OS image here)
    ├── machine1/
    │   └── (container-specific data)
    ├── machine2/
    │   └── (container-specific data)
    └── work/
        └── (workdir for overlayfs)

At the same time, the /var/lib/machines/machine* entries are just empty directories, meant to be the targets for overlay mounts. First I tried systemd-nspawn --overlay=...:

[root@host ~]# systemd-nspawn -D /var/lib/machines/machine1 --overlay=/srv/machines/base:/srv/machines/machine1:/var/lib/machines/machine1
Directory /var/lib/machines/machine1 doesn't look like it has an OS tree. Refusing.

The only remaining option was to mount overlays manually. Since my /srv lives on a separate partition, a specific mount order should be respected. This can be done either via /etc/fstab (systemd≥220):

overlay /var/lib/machines/machine1 overlay x-systemd.requires=/srv,lowerdir=/srv/machines/base,upperdir=/srv/machines/machine1,workdir=/srv/machines/work/machine1

or via /etc/systemd/system/var-lib-machines-machine1.mount unit:

[Unit]
After=srv.mount
Requires=srv.mount
Description=Mount overlay container for machine1

[Mount]
What=overlay
Where=/var/lib/machines/machine1
Type=overlay
Options=lowerdir=/srv/machines/base,upperdir=/srv/machines/machine1,workdir=/srv/machines/work/machine1

[Install]
WantedBy=multi-user.target

Of course this should be repeated for each remaining machine. I hope one day systemd-nspawn will support this layout out of the box, another step towards Docker Done Right™ ;)
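Since the unit has to be repeated for every machine, a small generator can help. This Python sketch (helper names hypothetical) reproduces the unit above for any machine name, reusing the /srv/machines layout from this comment, and derives the unit file name from Where=, because systemd requires a mount unit to be named after its mount point path. The escaping is a simplified take on systemd-escape --path: "/" becomes "-", ASCII letters, digits, ":", "_" and "." (except a leading ".") are kept, and every other byte becomes a \xNN escape; path validation such as rejecting ".." components is left out.

```python
import string
from typing import Tuple

# Characters kept as-is by the (simplified) path escaping.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_path(path: str) -> str:
    """Simplified systemd-escape --path; no validation of "." or ".." components."""
    parts = [p for p in path.split("/") if p]  # drop leading/trailing/duplicate slashes
    if not parts:
        return "-"  # the root directory escapes to "-"
    out = []
    for i, ch in enumerate("/".join(parts)):
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Escape every byte of the UTF-8 encoding as \xNN.
            out.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out)


def overlay_mount_unit(machine: str, base: str = "/srv/machines") -> Tuple[str, str]:
    """Return (unit file name, unit file text) for one container overlay."""
    where = f"/var/lib/machines/{machine}"
    options = ",".join([
        f"lowerdir={base}/base",
        f"upperdir={base}/{machine}",
        f"workdir={base}/work/{machine}",
    ])
    text = "\n".join([
        "[Unit]",
        "After=srv.mount",
        "Requires=srv.mount",
        f"Description=Mount overlay container for {machine}",
        "",
        "[Mount]",
        "What=overlay",
        f"Where={where}",
        "Type=overlay",
        f"Options={options}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
    return escape_path(where) + ".mount", text


for machine in ("machine1", "machine2"):
    name, text = overlay_mount_unit(machine)
    print(name)  # var-lib-machines-machine1.mount, then var-lib-machines-machine2.mount
```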

@oxwivi

oxwivi commented Aug 23, 2018

I want to use my host operating system as the base image for containers (as described in referenced #9044), can I use the overlay method @ghost described in the second comment? I don't quite understand the syntax and purpose of those overlays, can anyone get me started?

poettering added a commit to poettering/systemd that referenced this issue Dec 21, 2018
poettering added a commit to poettering/systemd that referenced this issue Jan 21, 2019
poettering added a commit to poettering/systemd that referenced this issue Feb 14, 2019
poettering added a commit to poettering/systemd that referenced this issue Feb 15, 2019
poettering added a commit to poettering/systemd that referenced this issue Feb 21, 2019
@poettering
Member

Fix in #11243

poettering added a commit to poettering/systemd that referenced this issue Mar 1, 2019
@shuhaowu

shuhaowu commented Mar 31, 2019

Sorry if I misinterpreted this, but is this 100% fixed? From the documentation changes, --volatile=overlay seems to mount an overlayfs whose top layer is a tmpfs rather than something persistent, whereas I assumed a persistent upper layer was the original intent of this issue.

@ar-qun

ar-qun commented Aug 10, 2019

@shuhaowu have you found an answer to this?

I still can't do systemd-nspawn --xbD / --volatile=overlay --overlay "/:/home/me/root:/".

@DaanDeMeyer
Contributor

DaanDeMeyer commented Dec 5, 2019

@poettering

Moving

systemd/src/nspawn/nspawn.c, lines 3469 to 3479 in a0b7f19:

r = mount_custom(
                directory,
                arg_custom_mounts,
                arg_n_custom_mounts,
                arg_userns_mode != USER_NAMESPACE_NO,
                arg_uid_shift,
                arg_uid_range,
                arg_selinux_apifs_context,
                false);
if (r < 0)
        return r;

Before

systemd/src/nspawn/nspawn.c, lines 3399 to 3407 in a0b7f19:

/* Mark everything as shared so our mounts get propagated down. This is
 * required to make new bind mounts available in systemd services
 * inside the container that create a new mount namespace.
 * See https://github.com/systemd/systemd/issues/3860
 * Further submounts (such as /dev) done after this will inherit the
 * shared propagation mode. */
r = mount_verbose(LOG_ERR, NULL, directory, NULL, MS_SHARED|MS_REC, NULL);
if (r < 0)
        return r;

enables successfully mounting a root overlay (currently, mount_move_root returns EINVAL after removing the empty_or_root check in overlay_mount_parse). I'm still not sure exactly why, but I think the EINVAL is the following case from the mount man page:

EINVAL A move operation (MS_MOVE) was attempted, but the parent mount of source mount has propagation type MS_SHARED.

Can you give me any pointers on a good way to fix this? I'm assuming moving the mount_custom call that far up will cause other issues but you probably know better as the original author.

We could simply special case custom mounts to root and have them happen before marking everything as MS_SHARED. I haven't thought of anything simpler. I tried moving the MS_SHARED|MS_REC call down but that broke all kinds of other stuff.

@DaanDeMeyer
Contributor

I finally figured this out completely:

When working with directories, the root directory is always bind-mounted first, before any of the other mount handling. The current difference between --volatile and --overlay is that --volatile is set up before the bind mount is recursively marked MS_SHARED, while --overlay is set up after it.

After overmounting, all mount operations on that path apply to the overmount. Because --volatile overmounts before the bind mount is marked MS_SHARED, it is the volatile overmount that gets marked MS_SHARED, and moving it to "/" works without any issues. When --overlay is used, however, the bind mount is marked MS_SHARED before the overlay is mounted on top of it. Since the underlying mount is the parent of the overmount, moving the overmount to "/" means moving a mount whose parent is marked MS_SHARED, which mount explicitly disallows, giving us the EINVAL.

Simply marking the underlying mount MS_PRIVATE before creating the overmount breaks other logic further on that likely depends on the mount being marked MS_SHARED. The clean fix seems to be to handle all custom root mounts before marking the bind-mount as shared. Maybe we can even go a step further and not bother creating the bind mount when we're mounting something else on top of root (--volatile, --overlay on root and I think --pivot-root as well) since it is only created so we have a parent mount that we can inherit propagation from and eventually move to "/". In the cases I mentioned, we already have a mount we can use and creating a bind mount doesn't seem necessary.
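The ordering problem can be reproduced with a toy model. This Python sketch makes no system calls and is not nspawn code: for a single path it only tracks which mount is on top, what its parent is and whether it is shared, and applies the two rules described in this thread, namely that MS_SHARED|MS_REC on a path affects the topmost mount there and its submounts (not the mount it covers), and that MS_MOVE fails with EINVAL when the source mount's parent is shared. Class, function and step names are hypothetical, and the starting root is simply assumed not to be shared.

```python
import errno
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Mount:
    name: str
    parent: Optional["Mount"]
    shared: bool = False
    children: List["Mount"] = field(default_factory=list)


class MountTable:
    """Toy model of mount stacking and propagation; no real mounts are made."""

    def __init__(self) -> None:
        self.root = Mount("root", None)  # assumption: not shared
        self.top: Dict[str, Mount] = {}

    def mount(self, name: str, path: str) -> None:
        # A new mount stacks on whatever is visible at `path`; that becomes its parent.
        parent = self.top.get(path, self.root)
        m = Mount(name, parent)
        parent.children.append(m)
        self.top[path] = m

    def make_shared_rec(self, path: str) -> None:
        # Like MS_SHARED|MS_REC: hits the top mount at `path` and its submounts,
        # but not the mount it is stacked on.
        stack = [self.top[path]]
        while stack:
            m = stack.pop()
            m.shared = True
            stack.extend(m.children)

    def move(self, path: str) -> None:
        # Only the EINVAL precondition of MS_MOVE is modelled.
        src = self.top[path]
        if src.parent is not None and src.parent.shared:
            raise OSError(errno.EINVAL, "parent of source mount is MS_SHARED")


def run(steps: List[str]) -> int:
    """Apply steps to one directory; return 0 or the errno of the failed move."""
    t = MountTable()
    actions = {
        "bind": lambda: t.mount("bind mount of the image", "/dir"),
        "overmount": lambda: t.mount("--volatile/--overlay mount", "/dir"),
        "shared": lambda: t.make_shared_rec("/dir"),
        "move": lambda: t.move("/dir"),
    }
    try:
        for step in steps:
            actions[step]()
    except OSError as e:
        return e.errno
    return 0


# --volatile order: overmount first, then mark shared, then move to "/".
print(run(["bind", "overmount", "shared", "move"]))  # 0
# --overlay=...:/ order: mark shared first, then overmount, then move.
print(run(["bind", "shared", "overmount", "move"]) == errno.EINVAL)  # True
```

In this model, doing the root overmount before the MS_SHARED|MS_REC step, or skipping the extra bind mount (run(["overmount", "shared", "move"]) also returns 0), is enough to make the move succeed. The real kernel propagation rules have more cases than these two.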

DaanDeMeyer added a commit to DaanDeMeyer/systemd that referenced this issue Dec 6, 2019
DaanDeMeyer added a commit to DaanDeMeyer/systemd that referenced this issue Dec 6, 2019
DaanDeMeyer added a commit to DaanDeMeyer/systemd that referenced this issue Dec 6, 2019
DaanDeMeyer added a commit to DaanDeMeyer/systemd that referenced this issue Dec 7, 2019
DaanDeMeyer added a commit to DaanDeMeyer/systemd that referenced this issue Dec 7, 2019
DaanDeMeyer added a commit to DaanDeMeyer/systemd that referenced this issue Dec 12, 2019
@vindicatorr

Is this actually fixed, though, @poettering? I know this is old, and I'm not sure whether you or someone involved will see this, or whether it would warrant opening a new issue.

Over the past week I've done some testing with overlay build-outs (I only started getting into containers this past week), and while it worked fine with the systemd-nspawn command-line parameters, it didn't work with the <systemdContainer>.nspawn file...

Overlay Structure:

-overlayGlibcGccLibs
--overlayBash
---overlaySystemd
-...<various levels of overlays building off of the predecessors as needed>

With that, I could:

$ sudo systemd-nspawn --boot \
   --directory=./overlayGlibcGccLibs \
   --overlay=./overlayGlibcGccLibs:./overlayBash:./overlaySystemd:/

and that would work fine (as far as I can tell so far).

But when using .nspawn files, it fails:
overlaySystemd.nspawn:

[Exec]
Boot=true

[Files]
Overlay=./overlayGlibcGccLibs:./overlayBash:./overlaySystemd:/

Directory /<pathTo>/overlaySystemd doesn't look like an OS root directory (os-release file is missing). Refusing.
And that just seems to be due to the expectation that the .nspawn name equates to the directory name, while I was using the lowest-level overlay as the starting point for --directory since it contains the expected os-release file (from the filesystem package).

Using Boot=false, it works fine: Spawning container overlaySystemd on...

As mentioned earlier in the thread, I could just mount the overlays manually, use a .nspawn file for the merged directory and leave out the Overlay= setting, but it seems this could at least potentially work if nspawn looked for os-release in the merged overlay instead of only in the upper overlay directory.
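To illustrate the suggestion, here is a minimal Python sketch contrasting the two checks. It assumes, based on the error message, that nspawn with Boot=true looks for an os-release file under etc/ or usr/lib/ of the container directory. has_os_release() does that for a single directory, while the hypothetical stack_has_os_release() asks whether the file would be visible anywhere in the overlay stack. Whiteouts and opaque directories, which could hide a lower layer's os-release, are ignored, and the directory names are simply the ones from this comment.

```python
import os
import tempfile
from typing import List

# Locations the "os-release file is missing" check is assumed to look at.
OS_RELEASE_PATHS = ("etc/os-release", "usr/lib/os-release")


def has_os_release(tree: str) -> bool:
    """Check one directory only, like the check that refuses to boot overlaySystemd."""
    return any(os.path.exists(os.path.join(tree, p)) for p in OS_RELEASE_PATHS)


def stack_has_os_release(layers: List[str]) -> bool:
    """Hypothetical merged-view check over overlay layers (lowest first).

    A file present in any layer is visible in the merged tree unless a
    higher layer hides it; whiteouts are not modelled here.
    """
    return any(has_os_release(layer) for layer in layers)


with tempfile.TemporaryDirectory() as root:
    lower = os.path.join(root, "overlayGlibcGccLibs")
    middle = os.path.join(root, "overlayBash")
    upper = os.path.join(root, "overlaySystemd")
    os.makedirs(os.path.join(lower, "usr", "lib"))
    os.makedirs(middle)
    os.makedirs(upper)
    # Only the lowest layer ships os-release (from the filesystem package).
    with open(os.path.join(lower, "usr", "lib", "os-release"), "w") as f:
        f.write("NAME=example\n")
    print(has_os_release(upper))                         # False
    print(stack_has_os_release([lower, middle, upper]))  # True
```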

10 participants