Adjust buildroot creation to work inside a user namespace. #234

Merged
merged 5 commits into from
Aug 4, 2019

owtaylor
Contributor

@owtaylor owtaylor commented Feb 6, 2019

This set of patches is sufficient for me to get mock running in a rootless container. I was testing with https://github.com/debarshiray/fedora-toolbox - but something like podman run --privileged fedora:29 should work as well.

(I developed and tested this against 1.4.13, then rebased to devel - I don't think anything should have broken with the rebase, but it's possible.)

The three changes are:

  • Use bind mounts for device files
  • Use a bind mount for /sys
  • Use the %_netsharedpath RPM macro to avoid RPM trying and failing to chown /proc and /sys

@pep8speaks

pep8speaks commented Feb 6, 2019

Hello @owtaylor! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-02 16:25:20 UTC

@owtaylor
Contributor Author

owtaylor commented Feb 6, 2019

I fixed up the one pep8 issue in my code, the other issues seem unrelated. I can add a cleanup patch to fix them to the patchset if desired.

# to install the buildroot, it doesn't make sense for RPM to try and
# set the permissions on them - and that might fail with permission errors.
with open(os.path.join(rpm_config_home, ".rpmmacros"), "w") as f:
    f.write("%_netsharedpath /proc:/sys\n")
Member

Hmm, why this way? Isn't it better to set config_opts['macros']['%_netsharedpath'] as default in util.py? And mention it in sitedefault.cfg.

Contributor Author

I'm probably missing something, but my impression was that config_opts['macros'] affects rpmbuild, but would have no effect on the package management commands used to set up the chroot. [Except for microdnf? Perhaps something is needed for that case as well?]

Member

AFAICS this is pretty much the only way to achieve the desired result at the moment if yum/dnf/whatever is run from outside the chroot. Rpm reads its configuration during Python module initialization, and since neither yum nor dnf has a way to define arbitrary rpm macros via configuration or command-line switches, the macro has to be placed in a file before yum/dnf is launched. If yum/dnf were run from within the target chroot, you could place the configuration there, but from the outside you can't go changing system-wide configuration, even temporarily. That basically leaves what we have here: override HOME to a temporary directory and create ~/.rpmmacros there.

On the positive side, this could be used to house other similar configurations if needed, for example %_install_langs could be supported this way.

One possibility would be taking this all one step further and generate the temporary ~/.rpmmacros from config_opts['macros'] contents so the same config mechanism could be used for arbitrary macros on the target.
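
A minimal sketch of that last idea, assuming a hypothetical helper named write_rpmmacros (it is not part of mock) and a plain dict shaped like config_opts['macros']. It only shows how a throwaway HOME with a generated .rpmmacros could be prepared before yum/dnf is launched:

```python
import os
import tempfile


def write_rpmmacros(home_dir, macros):
    """Write a .rpmmacros file into home_dir and return its path.

    `macros` maps macro names (including the leading '%') to values,
    mirroring the shape of config_opts['macros']. Hypothetical helper.
    """
    path = os.path.join(home_dir, ".rpmmacros")
    with open(path, "w") as f:
        for name, value in sorted(macros.items()):
            f.write("{} {}\n".format(name, value))
    return path


def package_manager_env(home_dir, base_env=None):
    """Return a copy of the environment with HOME pointing at home_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["HOME"] = home_dir
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        macros_path = write_rpmmacros(home, {"%_netsharedpath": "/proc:/sys"})
        with open(macros_path) as f:
            print(f.read(), end="")
```

The environment returned by package_manager_env would be passed to whatever launches the package manager; system-wide RPM configuration is left untouched.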

Member

I filed a DNF RFE and discussed it with the DNF team in person:
https://bugzilla.redhat.com/show_bug.cgi?id=1673333
But that is a long-term effort. It will not help with this PR; it can only make things easier later.

Member

Hmm. Just realized/remembered that while module import still always reads in the rpm config, for many years now it's possible to reset + reload the configuration later. So there is an alternative: doing rpm.reloadConfig() inside the chroot. Just FWIW.

Member

I am still not sure about that .rpmmacros. It seems very hackish to me. It is not needed on bare-metal hosts, only in containers. Therefore, I would rather see this done in Toolbox itself or in the container kickstarts: https://pagure.io/fedora-kickstarts/tree/master

Contributor Author

Yes, this could be configured in /etc/rpm in the toolbox-support package, which is supposed to be installed in all containers for toolbox, and I think it would get picked up correctly. Reasons I would rather see it here:

  • Solve it once. Why make someone who runs mock under podman without the toolbox wrapper hit this error, and then hope they search and find this PR and its workaround? (Yes, maybe we could get it changed in the Fedora base containers, but that might take a while.)
  • This is really something that is needed for the container we create rather than the enclosing container; e.g., if mock was enhanced at some point to use systemd-nspawn --private-users to avoid needing root privileges, then you'd need the same thing.

That being said, if you really don't want this, I'd rather have the rest of this PR landed :-)

    cmd.append('--rbind')
else:
    cmd.append('--bind')
bind = '--rbind' if self.recursive else '--bind'
Member

This variable is not used at all. Probably some leftover.

Contributor Author

Yep, removed.

@xsuchy
Member

xsuchy commented Feb 6, 2019

Generally, it seems good. Though I am not 100% sure about the second commit. When I have more time, I will have to check again that it causes no regression for ordinary use cases.

@owtaylor
Contributor Author

owtaylor commented Feb 6, 2019

Generally, it seems good. Though I am not 100% sure about the second commit. When I have more time, I will have to check again that it causes no regression for ordinary use cases.

An alternative would be to have a special SysfsMountPoint that first tries mounting a fresh sysfs and falls back to a bind mount. We'd definitely want to go that way if problems do show up. This way seemed simpler to me, especially since it mostly affects --old-chroot.
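
For illustration only, the try-then-fall-back alternative could look roughly like the sketch below. The function name mount_sysfs and the injectable run parameter are assumptions made for this example; they are not mock's MountPoint API.

```python
import subprocess


def mount_sysfs(chroot_sys, run=subprocess.check_call):
    """Try a fresh sysfs mount; fall back to a recursive bind mount of /sys.

    Returns the strategy that succeeded: "fresh" or "bind".
    `run` is injectable so the logic can be exercised without root.
    """
    try:
        run(["mount", "-n", "-t", "sysfs", "sysfs", chroot_sys])
        return "fresh"
    except (subprocess.CalledProcessError, OSError):
        # For example, the kernel refuses a new sysfs inside a user namespace.
        run(["mount", "-n", "--rbind", "/sys", chroot_sys])
        return "bind"
```

The cost of this variant is the extra code path: which strategy ends up in use depends on the environment, which is the reason given above for preferring an unconditional bind mount.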

@owtaylor
Contributor Author

owtaylor commented Apr 3, 2019

@xsuchy - did you have a chance to think any further about what you'd prefer here?

@matthiasclasen

would be nice to get this landed

@xsuchy
Member

xsuchy commented Apr 17, 2019

I am trying to get toolbox running, but I am getting:

$ toolbox -v create -r 29 
toolbox: Fedora generational core is f29
toolbox: base image is fedora-toolbox:29
toolbox: customized user-specific image is fedora-toolbox-msuchy:29
toolbox: container is fedora-toolbox-msuchy-29
toolbox: checking value /var/run/.heim_org.h5l.kcm-socket (Stream) of property Listen in sssd-kcm.socket
toolbox: parsing value /var/run/.heim_org.h5l.kcm-socket (Stream) of property Listen in sssd-kcm.socket
toolbox: checking if image fedora-toolbox-msuchy:29 already exists
ERRO[0000] User-selected graph driver "overlay" overwritten by graph driver "vfs" from database - delete libpod local files to resolve 
Error: error getting image "fedora-toolbox-msuchy:29": unable to find a name and tag match for fedora-toolbox-msuchy in repotags: no such image
toolbox: looking for image localhost/fedora-toolbox:29
Pulling docker://localhost/fedora-toolbox:29
ERRO[0000] exit status 1                                
toolbox: looking for image registry.fedoraproject.org/f29/fedora-toolbox:29
Pulling docker://registry.fedoraproject.org/f29/fedora-toolbox:29
ERRO[0000] exit status 1                                
toolbox: failed to pull base image fedora-toolbox:29

@@ -480,15 +496,39 @@ def _setup_devices(self):
kver = os.uname()[2]
self.root_log.debug("kernel version == %s", kver)
for i in devFiles:
    src_path = "/" + i[2]
    chroot_path = make_chroot_path(i[2])
Member

this should be self.make_chroot_path

@debarshiray

I am trying to get toolbox running, but I am getting:

If you were trying that on a Fedora 30 host, then it's containers/buildah#1504. It's specific to the combination of Fedora 30, buildah pull and registry.fedoraproject.org.

Easiest alternative is to try a Fedora 29 host.

@xsuchy
Member

xsuchy commented Apr 17, 2019

When I try the mock package with this PR (tito build --test --rpm -i) and I run mock /tmp/tito/mock-1.4.14-1.git.34.3750bc8.fc30.src.rpm - on normal host, not in container - then I get:

ERROR: Command failed: 
 # /usr/bin/systemd-nspawn -q -M db9b088beaf84e3ea6ed72e33291fc81 -D /var/lib/mock/fedora-29-x86_64/root -a --setenv=TERM=vt100 --setenv=SHELL=/bin/bash --setenv=HOME=/builddir --setenv=HOSTNAME=mock --setenv=PATH=/usr/bin:/bin:/usr/sbin:/sbin --setenv=PROMPT_COMMAND=printf "\033]0;<mock-chroot>\007" --setenv=PS1=[\u@\h<mock-chroot>/\w]\[\033[01;31m\]${?/#0/}\[\033[00m\]\$ --setenv=LANG=cs_CZ.UTF-8 /usr/sbin/groupadd -g 135 mock

@xsuchy
Member

xsuchy commented Apr 17, 2019

And the actual error is DEBUG: BUILDSTDERR: Failed to determine whether the unified cgroups hierarchy is used: No medium found

@xsuchy
Member

xsuchy commented Apr 17, 2019

Hmm, the second commit is problematic. When I run it (mock shell is enough), something goes wrong during mounting: I lost network and had to reboot.

@owtaylor
Contributor Author

Hmm, the second commit is problematic. When I run it (mock shell is enough), something goes wrong during mounting: I lost network and had to reboot.

This was hard to track down - it comes down to the behavior of shared mounts - the basic idea is that when a mount is marked shared - then "copies" of that mount - by creating a new mount namespace, or a bind mount - are in the same sharing group, are also marked shared, and changes propagate bi-directionally.

  • /sys is typically marked as a shared mount on Fedora. (In fact, / seems to be typically marked shared)
  • When mock calls unshare(CLONE_NEWNS) to create a new mount namespace, the copy of /sys in the new namespace is marked shared and in the same sharing group as the original /sys
  • When my patch did mount --rbind /sys <chroot>/sys, the chroot sys is again in the same sharing group as the original /sys
  • When my patch did umount -l <chroot>/sys, it turns out that a lazy umount is inherently a recursive unmount (see description of MNT_DETACH in umount2(2)), and the unmounts of the sub-filesystems of /sys propagate back to the host /sys.
  • At that point, all the subfilesystems of the host /sys are gone, and the system is borked.

The basic solution here is that when mock creates its mount namespace, it should do mount --make-rprivate / - this is what the command line tool unshare does - otherwise, the new mount namespace doesn't actually protect the parent mount namespace from changes. (And since /sys in the new mount namespace is then marked private, the bound copy in <chroot>/sys is also private, and umount -l is harmless.)

This didn't come up in my testing inside containers, since /sys inside a container is going to be a private mount.
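
As a rough sketch of that fix (not the code in this PR), the sequence is: unshare the mount namespace, then mark the whole tree private before mounting anything. The helper names below are made up for illustration, and calling unshare through ctypes is an assumption about how one might do it from Python. The function needs CAP_SYS_ADMIN, so it is defined here but not called.

```python
import ctypes
import os
import subprocess

CLONE_NEWNS = 0x00020000  # value from <sched.h>


def private_mount_commands():
    """Commands to run right after unshare(CLONE_NEWNS)."""
    return [["mount", "--make-rprivate", "/"]]


def enter_private_mount_namespace():
    """Create a new mount namespace and stop propagation to the parent.

    Without the --make-rprivate step, mounts copied from a 'shared'
    parent stay in the same sharing group, and changes still propagate.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    for cmd in private_mount_commands():
        subprocess.check_call(cmd)
```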

Will post a new version of my patchset.

@owtaylor
Contributor Author

I've pushed a new version that I think works correctly inside and outside of a container. The main changes are a) enforcing a non-shared mount namespace b) using a bind mount for /proc as well. The second change is necessary because between the initial version and now toolbox was switched to use pid=host mode, rather than a separate pid namespace. My preference is to always use the bind mounts for /proc and /sys rather than make them conditional on failure or to try and detect the exact circumstances in which they are necessary, to reduce the number of code paths.

One thing that this patch doesn't do is to try and automatically detect when --old-chroot / use_nspawn=False is needed - that needs to be set manually by the user. If desired, that behavior could be implemented as a separate PR.

@owtaylor
Contributor Author

owtaylor commented Jul 8, 2019

@xsuchy - any thoughts on the latest version? (I know the description of what went wrong with bind mounts is complex - but I would consider the final 'mount --make-rprivate /' commit to be a straight-up bug fix to what mock is doing already.)

@xsuchy
Member

xsuchy commented Jul 8, 2019

My problem is that I cannot really work with toolbox. And therefore I cannot test this. Or work on this.

Previously, I was not even able to enter toolbox. This seems to be fixed now. But:

🎩[msuchy@dri/~/projects/mock/mock{devel}]1$ toolbox enter
[msuchy@toolbox/~/projects/mock/mock]$ rpm -q tito
package tito is not installed
[msuchy@toolbox/~/projects/mock/mock]$ dnf install tito
Error: This command has to be run under the root user.
[msuchy@toolbox/~/projects/mock/mock]$ sudo dnf install tito
Fedora Modular 30 - x86_64                                                                                        0.0  B/s |   0  B     00:00    
Failed to synchronize cache for repo 'fedora-modular'
Error: Failed to synchronize cache for repo 'fedora-modular'
[msuchy@toolbox/~/projects/mock/mock]$ tito
bash: tito: command not found...
Install package 'tito' to provide command 'tito'? [N/y] y


 * Waiting in queue... Failed to install packages: tito-0.6.11-8.fc30.noarch is already installed

@xsuchy
Member

xsuchy commented Jul 8, 2019

It is not just about tito. It behaves similarly for, e.g., python3-jinja2, which I need for running out of the git repo. It is installed on the host, but missing in toolbox, and I cannot get it in.

@owtaylor
Contributor Author

owtaylor commented Jul 8, 2019

My problem is that I cannot really work with toolbox. And therefore I cannot test this. Or work on this.

Previously, I was not even able to enter toolbox. This seems to be fixed now. But:

Sorry you are having problems with toolbox! :-( What does 'rpm -q podman toolbox' say? @debarshiray - any ideas of what is going on?

@xsuchy
Member

xsuchy commented Jul 8, 2019

$ rpm -q podman toolbox
podman-1.4.4-1.fc30.x86_64
toolbox-0.0.11-1.fc30.noarch

@debarshiray

@xsuchy when you are inside the toolbox, what does ls -l /etc/resolv.conf say? It should be a symbolic link pointing at /run/host/etc/resolv.conf.

Other than that there seems to be a DNF bug which triggers the Failed to synchronize cache for repo 'fedora-modular' error that people have mentioned on IRC. Trying it again gets you past it.

@xsuchy
Member

xsuchy commented Jul 8, 2019

🎩[msuchy@dri/~/Dropbox]$ toolbox enter
[msuchy@toolbox/~/Dropbox]$ ls -l /etc/resolv.conf
lrwxrwxrwx. 1 root root 25 Jul  8 19:51 /etc/resolv.conf -> /run/host/etc/resolv.conf
[msuchy@toolbox/~/Dropbox]$ cat /run/host/etc/resolv.conf
cat: /run/host/etc/resolv.conf: No such file or directory

The problem seems to be that:

[msuchy@toolbox/~/Dropbox]$ ls -l /run/host/etc/resolv.conf 
lrwxrwxrwx. 1 nobody nobody 35 Jul  8 18:16 /run/host/etc/resolv.conf -> /var/run/NetworkManager/resolv.conf

and /var/run of host is obviously not mounted.

@debarshiray

Interesting. Out of curiosity what's the Fedora version running on the host? I am asking because of the symbolic link into /var/run/NetworkManager/.... I know that systems using systemd-resolved have the same problem.

Anyway, to unbreak this ...

If you have /run/host/monitor inside the toolbox container, then you can point the /etc/resolv.conf symbolic link to /run/host/monitor/resolv.conf. If you don't have /run/host/monitor then re-creating the toolbox will give you that bind-mount.

$ toolbox rm --all --force
$ toolbox create

@xsuchy
Member

xsuchy commented Jul 9, 2019

After toolbox rm --all --force:

[msuchy@toolbox/~]$ ls -l /etc/resolv.conf
lrwxrwxrwx. 1 root root 25 Jul  9 07:27 /etc/resolv.conf -> /run/host/etc/resolv.conf
[msuchy@toolbox/~]$ ls -l /run/host/etc/resolv.conf
lrwxrwxrwx. 1 nobody nobody 35 Jul  9 07:25 /run/host/etc/resolv.conf -> /var/run/NetworkManager/resolv.conf
[msuchy@toolbox/~]$ ls -l /var/run/NetworkManager/resolv.conf
ls: cannot access '/var/run/NetworkManager/resolv.conf': No such file or directory

After sudo ln -s /run/host/monitor/resolv.conf /etc/resolv.conf it started working.

I have Fedora 30.

@xsuchy
Member

xsuchy commented Jul 9, 2019

mock shell in toolbox (with this PR applied) just dnf-installs the chroot and exits. Similarly, mock foo.src.rpm just dnf-installs the chroot and exits with exit code 1.

@xsuchy
Member

xsuchy commented Jul 9, 2019

Hmm, it works with --old-chroot.

BindMountPoint(srcpath='/sys',
               bindpath=rootObj.make_chroot_path('/sys'),
               recursive=True,
               options="nodev,noexec,nosuid,readonly"),
Member

If this worked for --new-chroot, it would reveal the whole /proc/PID tree of the host, which goes against the isolation we want to have in Mock.

Contributor Author

Correct me if I'm wrong, but I think this isn't the case for two reasons:

First, we're not really changing anything - previously the code mounted in a fresh /proc from the outer PID namespace - now it's bind mounting in the same thing. (The main difference is that if parts of /proc or /sys are shadowed by overmounts, the shadowing is preserved - this is why the fresh mount is restricted in the userns case.)

But second, the "essential mounts", as I understand it, aren't actually mounted when systemd-nspawn is run - for USE_NSPAWN, they are only mounted for running package management commands.

Member

Ack, right.

@xsuchy
Member

xsuchy commented Jul 9, 2019

I think that making this work with --new-chroot is overkill, as we do not need another level of isolation in toolbox. We are already in a container and shielded from the host. No need to try to run another container there using systemd-nspawn. So first step: wrap this functionality into if old_chroot: and we make it a feature: Mock runs in Toolbox only with --old-chroot.

@owtaylor
Contributor Author

owtaylor commented Jul 9, 2019

Hmm, it works with --old-chroot.

See: #234 (comment) - "One thing that this patch doesn't do is to try and automatically detect when --old-chroot / use_nspawn=False is needed - that needs to be set manually by the user. If desired, that behavior could be implemented as a separate PR."

@xsuchy
Member

xsuchy commented Jul 10, 2019

See: #234 (comment) - "One thing that this patch doesn't do is to try and automatically detect when --old-chroot / use_nspawn=False is needed - that needs to be set manually by the user. If desired, that behavior could be implemented as a separate PR."

Fair enough - but the current PR breaks --new-chroot (which is the default nowadays) for everybody else. So we have to do something about it.

@owtaylor
Contributor Author

owtaylor commented Jul 10, 2019 via email

@xsuchy
Member

xsuchy commented Jul 10, 2019

Hmm, I tried it once again. And it indeed works on the host.

# source mount propagation status is 'shared' - changes to the mounts
# will still propagate back to the parent namespace. Do the same
# thing as unshare(1) and make all mounts private.
do(['mount', '--make-rprivate', '/'])
Member

This will change the flag on / of host (!) - I am not sure we want this.

Contributor Author

Are you sure? It's after the call to unshare()

Member

Yes, I am sure. Try to replace it with ['touch', '/DBG'] and see where the file will be created.

@owtaylor
Contributor Author

owtaylor commented Jul 30, 2019 via email

@owtaylor
Contributor Author

With my patches:

$ grep /proc /proc/self/mountinfo 
20 90 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:25 - proc proc rw
43 20 0:39 / /proc/sys/fs/binfmt_misc rw,relatime shared:27 - autofs systemd-1 rw,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15100

$ mock --old-chroot shell
# grep /proc /proc/self/mountinfo
350 319 0:4 / /proc rw,relatime - proc proc rw
374 350 202:1 /tmp/mock-selinux-plugin.s0rblc29 /proc/filesystems rw,relatime - ext4 /dev/xvda1 rw,seclabel
# logout

$ grep /proc /proc/self/mountinfo 
20 90 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:25 - proc proc rw
43 20 0:39 / /proc/sys/fs/binfmt_misc rw,relatime shared:27 - autofs systemd-1 rw,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15100

So that all looks right to me - the 'shared' flag was on the host mount before, was cleared inside the inner namespace, and is still on the host mount afterwards.
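
The same check can be done programmatically. Here is a small sketch that reads the optional fields of a /proc/self/mountinfo line (the fields between the mount options and the " - " separator) and reports whether a "shared:N" tag is present. The function names are made up for this example.

```python
def propagation(mountinfo_line):
    """Return (mount_point, optional_fields) for one mountinfo line.

    Optional fields such as 'shared:25' or 'master:3' sit between the
    per-mount options (field 6) and the ' - ' separator.
    """
    left, _, _ = mountinfo_line.partition(" - ")
    fields = left.split()
    return fields[4], fields[6:]


def is_shared(mountinfo_line):
    """True if the mount belongs to a shared peer group."""
    _, optional = propagation(mountinfo_line)
    return any(tag.startswith("shared:") for tag in optional)


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        for line in f:
            mount_point, _ = propagation(line)
            if mount_point in ("/", "/proc", "/sys"):
                print(mount_point, "shared" if is_shared(line) else "not shared")
```

Fed the two /proc lines above, it reports the host mount as shared and the mount inside the chroot as not shared.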

This patch is a bug fix that makes sense independently of the rest of the patchset. Without this change (stock mock), under --old-chroot, the chroot can change the host mount table:

$ mock --old-chroot shell
# mkdir /newmount
# mount -t tmpfs /dev/null /newmount
# logout
$ mount | grep newmount
/dev/null on /var/lib/mock/fedora-29-x86_64/root/newmount type tmpfs (rw,relatime,seclabel)

But with this change, that doesn't happen.

$ mock --old-chroot shell
# mkdir /newmount
# mount -t tmpfs /dev/null /newmount
# logout
$ mount | grep newmount

@owtaylor
Contributor Author

owtaylor commented Aug 1, 2019

Ping? Does that make sense?

raiseExc=0, shell=False, env=self.env)

if src_path == '/dev/tty':
Member

Can you change this to src_path in ['/dev/tty', '/dev/ptmx']?

Additionally, can you add a comment like "This is run only if not bind-mounted"? (It took me some time to comprehend why you moved it inside the for-loop.)

# If mknod gives us a permission error, fall back to a different
# strategy of using a bind mount from root to host. This won't
# work for the loop devices, so just skip them in this case.
if e.errno == errno.EPERM:
Member

Hmm. This is done even on bare metal, not just in containers. Previously, if mknod failed on bare metal, it failed. Now we bind mount the host device. I am not saying it is bad (I am not sure, actually), but it is definitely a change from the past.

Contributor Author

It's fairly difficult to detect "on bare metal" reliably without encoding assumptions about the container system (docker, podman, lxd, systemd-nspawn...), so I've avoided "in a container" conditionalization anywhere in the patchset.

And conditionalizing based on !USE_NSPAWN as Pavel suggests makes the assumption that systemd-nspawn will never ever work nested in a container - which I don't think is necessarily going to be true forever.

IMO, we can make the code simpler and more future proof by just trying the fallback in all cases - maybe it will help if we get an EPERM even on bare metal for some other reason, e.g., SELinux or seccomp.

Making error reporting more confusing is always a concern, but it's not as if the code is expecting mknod to fail and producing a nice informative error message - it's just dying with a traceback in that case.

# avoid problems with kernel restrictions on mounting a new procfs, sysfs
BindMountPoint(srcpath='/proc',
               bindpath=rootObj.make_chroot_path('/proc'),
               recursive=True,
Member

I wonder whether we need recursive here at all. Previously, when we mounted /proc and /sys, we mounted just the top level and did not mount anything within it, e.g., /proc/sys/fs/binfmt_misc.
Why did it work in the past, and why do we need to traverse recursively now?

Contributor Author

The idea is that a container system might bind-mount over parts of /proc and /sys to hide them from processes inside the container. To prevent the container from escaping from this, it's not allowed to mount a new copy of the filesystem, but it's also not allowed to bind mount the filesystem in a way that would reveal the hidden parts - thus the kernel bindmount of /proc will fail unless it's recursive. I can make the comment say more of that instead of just "kernel restrictions".

os.mknod(self.make_chroot_path(i[2]), i[0], i[1])
if os.path.exists(src_path) or "loop" in src_path:
    try:
        os.mknod(chroot_path, i[0], i[1])
Member

Don't you want to fall back only if we are in the USE_NSPAWN scenario? I.e., don't fall back for normal chroot()?

Member

Taking that back ... toolbox actually implies normal chroot() inside a podman container. So I should reverse my question: do we want to fall back if USE_NSPAWN is on? Because in that case mock is run in a non-toolbox environment (normal host), and it might be better to fail on mknod (if that really was failing) instead of trying a useless bind mount.

Contributor Author

See my reply to @xsuchy above.

Member

TBH, I view this PR's approach (running a toolbox/podman wrapper around mock, instead of having mock start containers) as a much more straightforward way to keep mock maintainable in the future. And really superior. While USE_NSPAWN is better than nothing, the current nspawn support isn't complete (--installroot is still not run in a container), and IMO it doesn't make much sense to continue finishing it. Having it dropped one day sounds like a good idea to me.

But still, ATM, USE_NSPAWN is used by default (and is used, e.g., by the copr build system), and in that case falling back from mknod to a bind mount doesn't make much sense. Also, from the other side, trying mknod when we know in advance it will fail (the toolbox case) isn't easy reading. All that is mixed up with the bare host + old-chroot use case. Just saying.

But yeah, we seem to know now what is the change about and why - at least now, so +1 from me.

@xsuchy
Member

xsuchy commented Aug 2, 2019

Sorry for all those comments and questions. I am just trying to comprehend all those changes and make sure that we do not introduce any regression or security issue to current users.

if self.recursive:
    cmd.append('--rbind')
else:
    cmd.append('--bind')
Member

Btw., why --bind doesn't work and --rbind does? (considering that we don't actually need recursive mounts inside chroot for the package builds).

Contributor Author

See above about why we need to --rbind for the /proc and /sys mounts.

Member

Honestly, the question isn't answered here. Moving the discussion to https://bugzilla.redhat.com/show_bug.cgi?id=1745048

The code attempted to do all mock mounts inside a private mount namespace,
by using unshare(CLONE_NEWNS), but it turns out that that isn't effective
if part or all of the original mount tree is marked as 'shared' mounts -
changes to copies of those mounts will still propagate back to the original
mounts. Mark the entire new mount tree as 'private' to avoid such sharing.

If mock is running in a user namespace, the /proc and /sys we
mount into the buildroot will be owned by nobody:nobody (the real
root on the host) not root:root. To avoid failures during package
installation set the %_netsharedpath RPM macro to /proc:/sys to skip
these directories. (Suggestion from Panu Matilainen.)

Implement by pointing HOME to a directory with only ~/.rpmmacros.
(This means that /root/.rpmmacros will now be ignored; system
RPM configuration will still be honored.)

Mounting a fresh sysfs from within a user namespace is allowed only
in limited cases, so always use a bind mount instead to avoid problems.

This needs to be a recursive bind mount so that any mounts on top
of the parent /sys are preserved in the child (the kernel will
fail a plain bind mount).

If mock is running inside a user namespace, then mknod will not succeed,
so use bind mounts instead.

If we are in a user namespace, but not in a PID namespace, then a
fresh mount of /proc will be denied, so similar to /sys, simply
always use a bind-mount of /proc from the host.

@owtaylor
Contributor Author

owtaylor commented Aug 2, 2019

Repushed:

  • With hopefully clearer comments
  • With the src_path in (...) cleanup

Let me know if you want me to try and conditionalize on not-in-a-container or !USE_NSPAWN, or if you want me to remove the %_netsharedpath setting.

@praiskup
Member

praiskup commented Aug 2, 2019

It's just awesome to see that mock can be run in a rootless container. Thanks a lot, Owen, for this work.

@xsuchy
Member

xsuchy commented Aug 4, 2019

I am going to merge this. However, I would welcome it if you continued the work on .rpmmacros and put it in the default container or in toolbox.
Thank you for your work and patience.

@xsuchy xsuchy merged commit 3c6e282 into rpm-software-management:devel Aug 4, 2019
bindpath=rootObj.make_chroot_path('/proc'),
recursive=True,
options="nodev,noexec,nosuid,readonly"),
BindMountPoint(srcpath='/sys',
Member

FTR, changing /sys from mount to bind mount caused this: https://bugzilla.redhat.com/show_bug.cgi?id=1740421


7 participants