-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust buildroot creation to work inside a user namespace. #234
Conversation
I fixed up the one pep8 issue in my code, the other issues seem unrelated. I can add a cleanup patch to fix them to the patchset if desired. |
# to install the buildroot, it doesn't make sense for RPM to try and | ||
# set the permissions on them - and that might fail with permission errors. | ||
with open(os.path.join(rpm_config_home, ".rpmmacros"), "w") as f: | ||
f.write("%_netsharedpath /proc:/sys\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, why this way? Isn't it better to set config_opts['macros']['%_netsharedpath']
as default in util.py? And mention it in sitedefault.cfg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something, but my impression was that config_opts['macros'] affects rpmbuild, but would have no effect on the package management commands used to set up the chroot. [Except for microdnf? Perhaps something is needed for that case as well?]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS this is pretty much the only way to achieve the desired result at the moment if yum/dnf/whatever is run from outside the chroot. Rpm reads its configuration during Python module initialization, and since neither yum or dnf have a way to define arbitrary rpm macros via their configuration or cli-switches, this has to be placed in a file before yum/dnf is launched. If yum/dnf was run from within the target chroot then you could place the configuration there, but from the outside... you can't go changing system-wide configuration even temporarily, so that basically leaves what we have here: override HOME to a temporary directory and create ~/.rpmmacros there.
On the positive side, this could be used to house other similar configurations if needed, for example %_install_langs could be supported this way.
One possibility would be taking this all one step further and generate the temporary ~/.rpmmacros from config_opts['macros'] contents so the same config mechanism could be used for arbitrary macros on the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed DNF RFE and discussed that with DNF team in person:
https://bugzilla.redhat.com/show_bug.cgi?id=1673333
But it is a long run. It will not help with this PR - we can just make it easier later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Just realized/remembered that while module import still always reads in the rpm config, for many years now it's possible to reset + reload the configuration later. So there is an alternative: doing rpm.reloadConfig() inside the chroot. Just FWIW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not sure about that .rpmmacros
. It seems very hackish to me. It is not needed on bare metals hosts. Only in containers. Therefore, I would rather see this done in Toolbox itself or in Container kickstarts https://pagure.io/fedora-kickstarts/tree/master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this could be configured in /etc/rpm in the toolbox-support package which is suppose to be installed in all containers for toolbox, and I think it would get picked up correctly. Reasons I would rather see it here:
- Solve it once, why make someone trying to run mock under podman themselves without the toolbox wrapper hit this error and hopefully google search and find this PR and find this workaround. (Yes, maybe we could get it changed in the Fedora base containers, but might take a while.)
- This is really something that is needed for the container we create rather than the enclosing container; e.g., if mock was enhanced at some point to use systemd-nspawn --private-users to avoid needing root privileges, then you'd need the same thing.
That being said, if you really don't want this, I'd rather have the rest of this PR landed :-)
mock/py/mockbuild/mounts.py
Outdated
cmd.append('--rbind') | ||
else: | ||
cmd.append('--bind') | ||
bind = '--rbind' if self.recursive else '--bind' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is not used at all. Probably some leftover.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, removed.
Generally, it seems to be good. Thou, I am not 100 % sure about that second commit. I will have to check it again when I will have more time whether there is no any regression for ordinary use cases. |
An alternative would be to have a special SysfsMountPoint that first tries mounting a fresh sysfs and falls back to a bind mount. We'd definitely want to go that way if problems do show up. This way seemed simpler to me, especially since it mostly affects --old-chroot. |
@xsuchy - did you have a chance to think any further about what you'd prefer here? |
would be nice to get this landed |
I am trying to get toolbox running, but I am getting:
|
mock/py/mockbuild/buildroot.py
Outdated
@@ -480,15 +496,39 @@ def _setup_devices(self): | |||
kver = os.uname()[2] | |||
self.root_log.debug("kernel version == %s", kver) | |||
for i in devFiles: | |||
src_path = "/" + i[2] | |||
chroot_path = make_chroot_path(i[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be self.make_chroot_path
If you were trying that on a Fedora 30 host, then it's containers/buildah#1504 It's specific to the combination of Fedora 30, Easiest alternative is to try a Fedora 29 host. |
When I try the mock package with this PR (
|
And the actual error is |
Hmm, the second commit is problematic. When I run it ( |
This was hard to track down - it comes down to the behavior of shared mounts - the basic idea is that when a mount is marked shared - then "copies" of that mount - by creating a new mount namespace, or a bind mount - are in the same sharing group, are also marked shared, and changes propagate bi-directionally.
The basic solution here is that when mock creates it's mount namespace, it should do This didn't come up in my testing inside containers, since /sys inside a container is going to be a private mount. Will post a new version of my patchset. |
I've pushed a new version that I think works correctly inside and outside of a container. The main changes are a) enforcing a non-shared mount namespace b) using a bind mount for /proc as well. The second change is necessary because between the initial version and now One thing that this patch doesn't do is to try and automatically detect when --old-chroot / use_nspawn=False is needed - that needs to be set manually by the user. If desired, that behavior could be implemented as a separate PR. |
@xsuchy - any thoughts on the latest version? (I know the description of what went wrong with bind mounts is complex - but I would consider the final 'mount --make-rprivate /' commit to be a straight-up bug fix to what mock what is doing already.) |
My problem is that I cannot really work with toolbox. And therefore I cannot test this. Or work on this. Previously, I was not even able to enter toolbox. This seems to be fixed now. But:
|
It is not just about |
Sorry you are having problems with toolbox! :-( Whats' 'rpm -q podman toolbox'? @debarshiray - any ideas of what is going on? |
$ rpm -q podman toolbox |
@xsuchy when you are inside the toolbox, what does Other than that there seems to be a DNF bug which triggers the |
The problem seems to be that:
and |
Interesting. Out of curiosity what's the Fedora version running on the host? I am asking because of the symbolic link into Anyway, to unbreak this ... If you have
|
After
After I have Fedora 30. |
|
Hmm, it works with |
BindMountPoint(srcpath='/sys', | ||
bindpath=rootObj.make_chroot_path('/sys'), | ||
recursive=True, | ||
options="nodev,noexec,nosuid,readonly"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this would work for --new-chroot
it would reveal all /proc/PIDs tree of host. Which goes against the isolation, which we want to have in Mock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but I think this isn't the case for two reasons:
First, we're not really changing anything - previously the code mounted in a fresh /proc from the outer PID namespace - now it's bind mounting in the same thing. (The main difference is that if parts of /proc /sys are shadowed by overmounts, the shadowing is preserved - this is why the fresh mount is restricted in the userns case.)
But second, the "essential mounts", as I understand it, aren't actually mounted when systemd-nspawn is run - for USE_NSPAWN, they are only mounted for running package management commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, right.
I think that making this work with |
See: #234 (comment) - "One thing that this patch doesn't do is to try and automatically detect when --old-chroot / use_nspawn=False is needed - that needs to be set manually by the user. If desired, that behavior could be implemented as a separate PR." |
Fair enough - but current PR breaks --new-chroot (which is default nowadays) for everybody else. So we have to do something about it. |
On Wed, Jul 10, 2019, 8:02 AM Miroslav Suchý ***@***.***> wrote:
See: #234 (comment)
<#234 (comment)>
- "One thing that this patch doesn't do is to try and automatically detect
when --old-chroot / use_nspawn=False is needed - that needs to be set
manually by the user. If desired, that behavior could be implemented as a
separate PR."
Fair enough - but current PR breaks --new-chroot (which is default
nowadays) for everybody else. So we have to do something about it.
Your comments only mentioned testing --new-chroot inside toolbox. How does
it fail outside of toolbox? It was working for me on F29, but if it's
failing for you I'll do more testing.
|
Hmm, I tried it once again. And it indeed works in host. |
# source mount propagation status is 'shared' - changes to the mounts | ||
# will still propagate back to the parent namespace. Do the same | ||
# thing as unshare(1) and make all mounts private. | ||
do(['mount', '--make-rprivate', '/']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will change the flag on / of host (!) - I am not sure we want this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? It's after the call to unshare()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am sure. Try to replace it with ['touch', '/DBG']
and see where the file will be created.
After the unshare call, I'd expect a private set of mount tables, but
pointing to the same filesystems. So touching a file isn't really the same
thing? But whether the mount propagation flags on the host are changed is
visible in /proc/self/mounts. Will double check that later today.
|
With my patches:
So that all looks right to me - the 'shared' flag was on the host mount before, was cleared inside the inner namespace, and is still on the host mount afterwards. This patch is bug fix that makes sense independent of the rest of the patchset. Without this change (stock mock) - under --old-chroot, the chroot can change the host mount table:
But with this change, that doesn't happen.
|
Ping? Does that make sense? |
mock/py/mockbuild/buildroot.py
Outdated
raiseExc=0, shell=False, env=self.env) | ||
|
||
if src_path == '/dev/tty': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this to src_path in ['/dev/tty', '/dev/ptmx']
?
Additionaly, can you add comment like "This is run only if not bind-mounted". ('cos it took me some time to comprehend why you took it inside of the for-loop).
# If mknod gives us a permission error, fall back to a different | ||
# strategy of using a bind mount from root to host. This won't | ||
# work for the loop devices, so just skip them in this case. | ||
if e.errno == errno.EPERM: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. This is done even on bare metals. Not just for containers. Previously if the mknod failed on bare metal, it failed. Now we bind mount host device. I am not saying it is bad (not sure actually), but it is definitely change from past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fairly difficult to detect "on bare metal" reliably without encoding assumptions about the container system (docker, podman, lxd, systemd-nspawn...), so I've avoided "in a container" conditionalization anywhere in the patchset.
And conditionalizing based on !USE_NSPAWN as Pavel suggests makes the assumption that
systemp-nspawn will never ever work nested in a container - which I don't think is necessarily going to be true forever.
IMO, we can make the code simpler and more future proof by just trying the fallback in all cases - maybe it will help if we get an EPERM even on bare metal because of some other reason, e..g, selinux seccomp.
Making error reporting more confusing is always a concern, but it's not the code is expecting mknod to fail and producing a nice informative error message - it's just dying with a traceback in that case.
# avoid problems with kernel restrictions on mounting a new procfs, sysfs | ||
BindMountPoint(srcpath='/proc', | ||
bindpath=rootObj.make_chroot_path('/proc'), | ||
recursive=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we need the recursive
here at all? Previously when we mounted /proc
and /sys
, it mounted just the top level and did not mounted anything within. E.g., /proc/sys/fs/binfmt_misc
.
Why it worked in past and why we need to recursively traverse now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that a container system might bind-mount over parts of /proc and /sys to hide them from processes inside the container. To prevent the container from escaping from this, it's not allowed to mount a new copy of the filesystem, but it's also not allowed to bind mount the filesystem in a way that would reveal the hidden parts - thus the kernel bindmount of /proc will fail unless it's recursive. I can make the comment say more of that instead of just "kernel restrictions".
os.mknod(self.make_chroot_path(i[2]), i[0], i[1]) | ||
if os.path.exists(src_path) or "loop" in src_path: | ||
try: | ||
os.mknod(chroot_path, i[0], i[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to fall-back only if we are in the USE_NSPAWN
scenario? I.e. don't fallback for normal chroot()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking back ... toolbox actually implies normal chroot() inside podman container. So I should reverse my question: Do we want to fallback if USE_NSPAWN
is on? Because in that case mock is run in non-toolbox environment (normal host) - and it might be better to fail on mknod (if that really was failing) instead of trying useless bind mount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my reply to @xsuchy above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I view this PR approach (running toolbox/podman wrapper around mock, instead of starting containers by mock) as much more straight-forward way to make mock maintainable in future. And really superior. While USE_NSPAWN is better than nothing, the current nspawn support isn't complete (--installroot is still not run in container) and it IMO doesn't make much sense to continue with finishing it.. having it dropped one day sounds like good idea to me.
But still, ATM, USE_NSPAWN is used by default (and e.g. used by copr build system), and in that case - falling down from mknod to bind-mount doesn't make much sense.. Also, from the other side -- trying mknod when we know in advance it will fail (toolbox case) isn't easy reading. All that is mixed up with bare host + old-chroot use-case. Just saying.
But yeah, we seem to know now what is the change about and why - at least now, so +1 from me.
Sorry for all those comments and questions. I am just trying to comprehend all those changes and make sure that we do not introduce any regression or security issue to current users. |
if self.recursive: | ||
cmd.append('--rbind') | ||
else: | ||
cmd.append('--bind') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw., why --bind
doesn't work and --rbind
does? (considering that we don't actually need recursive mounts inside chroot for the package builds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above about why we need to --rbind for the /proc and /sys mounts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, the question isn't answered here, moving here https://bugzilla.redhat.com/show_bug.cgi?id=1745048
The code attempted to do all mock mounts inside a private mount namespace, by using unshare(CLONE_NEWNS), but it turns out that that isn't effective if part or all of the original mount tree are marked as 'shared' mounts - changes to copies of those mounts will still propagate back to the original mounts. Mark the entire new mount tree as 'private' to avoid such sharing.
If mock is running in a user namespace, the the /proc and /sys we mount into the buildroot will be owned by nobody:nobody (the real host on the root) not root:root. To avoid failures during package installation set the %_netsharedpath RPM macro to /proc:/sys to skip these directories. (Suggestion from Panu Matilainen.) Implement by pointing HOME to a directory with only ~/.rpmmacros. (This means that /root/.rpmmacros will now be ignored; system RPM configuration will still be honored.)
Mounting a fresh sysfs from within a user namespace is allowed only in limited cases, so always use a bind mount instead to avoid problems. This needs to be a recursive bind mount so that any mounts on top of the parent /sys are preserved in the child (the kernel will fail a plain bind mount).
If mock is running inside a user namespace, then mknod will not succeed, so use bind mounts instead.
If we are in a user namespace, but not in a PID namespace, then a fresh mount of /proc will be denied, so similar to /sys, simply always use a bind-mount of /proc from the host.
Repushed:
Let me know if you want me to try and conditionalize on not-in-a-container or !USE_NSPAWN, or if you want me to remove the |
Iŧ's just awesome to see that mock can be run in rootless container, thanks a lot Owen for this work. |
I am going to merge this. However, I would welcome if you will continue the work with |
bindpath=rootObj.make_chroot_path('/proc'), | ||
recursive=True, | ||
options="nodev,noexec,nosuid,readonly"), | ||
BindMountPoint(srcpath='/sys', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, changing /sys from mount to bind mount caused this: https://bugzilla.redhat.com/show_bug.cgi?id=1740421
This set of patches is sufficient for me to get mock running in a rootless container. I was testing with https://github.com/debarshiray/fedora-toolbox - but something like
podman run --privileged fedora:29
should work as well.(I developed and tested this against 1.4.13, then rebased to devel - I don't think anything should have broken with the rebase, but it's possible.)
The three changes are: