OCI runtime support for nspawn #9762

Merged
keszybz merged 10 commits into systemd:master from poettering:nspawn-oci on Mar 21, 2019

@poettering
Member

commented Jul 31, 2018

This is a big one, unfortunately.

For full drop-in OCI runtime support two bits are still missing: the OCI hooks need to be executed at the right times, and we need a command line tool that provides runc compatibility. I am on both, but this already grew large enough I didn't want to pile on.

But even without those two bits it's pretty comprehensive and should pretty much work.

@poettering

Member Author

commented Jul 31, 2018

/cc @alban, @iaguis, @dongsupark


<listitem><para>Takes the path to an OCI runtime bundle to invoke, as specified in the <ulink
url="https://github.com/opencontainers/runtime-spec/blob/master/spec.md">OCI Runtime Specification</ulink>. In
this case no <filename>.settings</filename> file is loaded, and the root directory and various settings are

@boucman

boucman Jul 31, 2018

Contributor

you mean .nspawn file here ?

@poettering

poettering Aug 1, 2018

Author Member

i do!

@evverx

Member

commented Jul 31, 2018

This pull request introduces 6 alerts when merging a9451ba into 5a8b164 - view on LGTM.com

new alerts:

  • 6 for Comparison result is always the same

Comment posted by LGTM.com

@poettering poettering force-pushed the poettering:nspawn-oci branch from a9451ba to 7572186 Jul 31, 2018

@evverx

Member

commented Jul 31, 2018

This pull request introduces 6 alerts when merging 7572186 into 5a8b164 - view on LGTM.com

new alerts:

  • 6 for Comparison result is always the same

Comment posted by LGTM.com

#define JSON_VALUE_NULL ((JsonValue) {})

/* We use fake JsonVariant objects for some special values, in order to avoid memory allocations for them. Note that
* effectively this means that there are multiple ways to encode the some objects: via these magic values or as

@dnicolodi

dnicolodi Jul 31, 2018

encode the some objects -> encode some objects

@dnicolodi

dnicolodi Aug 2, 2018

Typo still here :)


/*
In case you wonder why we have our own JSON implementation, here are a couple of reasons why this implementation has
benefits over various other implementations:

@evverx

evverx Aug 1, 2018

Member

I'm not going to say it's a bold move to implement a json parser :-), but if it's going to be implemented, maybe it would be helpful to start fuzzing it right away. I ran a fuzzer compiled with UBSan for about ten seconds and it crashed after receiving relatively normal input like -34444444444541444444 and "0" with

../../src/systemd/src/basic/json.c:1845:48: runtime error: signed integer overflow: 10 * -3444444444454144444 cannot be represented in type 'long'
    #0 0x7fe6beefb805 in json_parse_number /work/build/../../src/systemd/src/basic/json.c:1845:48
    #1 0x7fe6beefa5ca in json_tokenize /work/build/../../src/systemd/src/basic/json.c:2009:29

and

../../src/systemd/src/basic/json.c:350:9: runtime error: index 1 out of bounds for type 'char [0]'
    #0 0x7fab2d4c2042 in json_variant_new_stringn /work/build/../../src/systemd/src/basic/json.c:350:22
    #1 0x7fab2d4c8690 in json_parse_internal /work/build/../../src/systemd/src/basic/json.c:2321:29
    #2 0x7fab2d4c781a in json_parse /work/build/../../src/systemd/src/basic/json.c:2477:16

There were also a couple warnings from clang, the first of which seems legit:

../../src/systemd/src/basic/json.c:2486:32: warning: unused variable 'opened' [-Wunused-variable]
        _cleanup_fclose_ FILE *opened = NULL;
                               ^
../../src/systemd/src/basic/json.c:2949:13: warning: variable 'source' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
        if (variant) {
            ^~~~~~~
../../src/systemd/src/basic/json.c:2955:13: note: uninitialized use occurs here
        if (source && source_line > 0 && source_column > 0)
            ^~~~~~
../../src/systemd/src/basic/json.c:2949:9: note: remove the 'if' if its condition is always true
        if (variant) {
        ^~~~~~~~~~~~~
../../src/systemd/src/basic/json.c:2936:27: note: initialize the variable 'source' to silence this warning
        const char *source;
                          ^
                           = NULL

@poettering

poettering Aug 1, 2018

Author Member

../../src/systemd/src/basic/json.c:1845:48: runtime error: signed integer overflow: 10 * -3444444444454144444 cannot be represented in type 'long'

This one is misleading. If you look at the sources you find an overflow check right after, that divides the result again by 10 and checks if that result is -3444444444454144444 again.
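
For illustration, one way to write such a check so that the multiplication itself can never overflow is to compare against the limit before multiplying. This is only a sketch and not the code from json.c: the function name `parse_int64_sketch` and the use of `int64_t` are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: parses an optionally negative decimal integer.
 * Every range check happens before the arithmetic it guards, so no signed
 * overflow can occur and UBSan has nothing to report. */
static int parse_int64_sketch(const char *s, int64_t *ret) {
        bool negative = false;
        int64_t n = 0;

        if (*s == '-') {
                negative = true;
                s++;
        }

        if (*s == '\0')
                return -EINVAL;

        for (; *s != '\0'; s++) {
                int64_t digit;

                if (*s < '0' || *s > '9')
                        return -EINVAL;

                digit = *s - '0';

                if (negative) {
                        /* Accumulate downwards so that INT64_MIN is representable. */
                        if (n < INT64_MIN / 10)
                                return -ERANGE;
                        if (n * 10 < INT64_MIN + digit)
                                return -ERANGE;
                        n = n * 10 - digit;
                } else {
                        if (n > INT64_MAX / 10)
                                return -ERANGE;
                        if (n * 10 > INT64_MAX - digit)
                                return -ERANGE;
                        n = n * 10 + digit;
                }
        }

        *ret = n;
        return 0;
}
```

With this shape, the fuzzer input quoted above would be rejected with -ERANGE before any out-of-range product is computed.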

@poettering

poettering Aug 1, 2018

Author Member

../../src/systemd/src/basic/json.c:350:9: runtime error: index 1 out of bounds for type 'char [0]'

And this one is misleading too. If you look at the function you'll see we actually allocate the buffer long enough. The field in the structure is the last one in it, and the only reason we specify its size as [0] rather than leave it C99-style unspecified as [] is that we sometimes want arrays of the structure, in which case we don't use the remaining buffer...
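
To make the pattern concrete, here is a minimal sketch of a structure with a trailing character buffer that is allocated larger than the structure itself. It uses the C99 `[]` form, which does not allow arrays of the structure, which is exactly the limitation described above. The type `StringValueSketch` and the function `string_value_new_sketch` are hypothetical names, not the ones used in json.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure: the string payload lives directly behind the header. */
typedef struct StringValueSketch {
        size_t length;
        char string[];  /* C99 flexible array member, must be the last field */
} StringValueSketch;

/* Allocates header and payload in one block, copying the first n bytes of s. */
static StringValueSketch *string_value_new_sketch(const char *s, size_t n) {
        StringValueSketch *v;

        /* Room for the header, n payload bytes and the trailing NUL. */
        v = malloc(offsetof(StringValueSketch, string) + n + 1);
        if (!v)
                return NULL;

        v->length = n;
        memcpy(v->string, s, n);
        v->string[n] = '\0';
        return v;
}
```

Accesses to `string[1]` and beyond are within the allocation here, even though the declared type carries no size.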

@evverx

evverx Aug 1, 2018

Member

Unfortunately, false positives are inevitable, but I don't think they make fuzzers less useful. In this particular case, __attribute__((no_sanitize("signed-integer-overflow"))) and __attribute__((no_sanitize("bounds"))) could probably be used to silence UBSan.

@poettering

poettering Aug 2, 2018

Author Member

I figure I could rewrite the overflow check to not trigger ubsan. But for the [0] bounds check I have no idea how I could trick it to not complain. Any idea?

Quite frankly, ubsan should be smart enough to understand that [0] is special, and generally doesn't actually mean "zero sized array", but instead is similar to C99 [], i.e. more akin to "unspecified size"...

@poettering

poettering Aug 2, 2018

Author Member

@evverx hmm, I figure it would make sense to simply pick up that #9782 and add it to this PR, what do you think?

@evverx

evverx Aug 2, 2018

Member

@poettering it's fine by me, but it shouldn't be merged into master until google/oss-fuzz#1683 is merged.

@evverx

evverx Aug 3, 2018

Member

@poettering google/oss-fuzz#1683 has been merged. I pushed the commit from #9782 on top of your branch.

@evverx

evverx Aug 3, 2018

Member

Fedora CI seems to have vanished. Does anybody know how to bring it back or trigger a build manually?

@evverx

evverx Aug 6, 2018

Member

I'm not sure what's happened, but the fuzzer is gone. It'd probably be better to leave it until later. I'll reopen #9782 so as not to forget that it isn't done yet.


json_dump_with_flags(v, stdout);
if (r < 0)
        return r;

@evverx

evverx Aug 1, 2018

Member

It'd be great if you could also take a look at the LGTM alerts. At least the last two of them don't look like false positives. For example, the if block really seems to be redundant here.

@rhatdan

Contributor

commented Aug 1, 2018

@giuseppe FYI

@poettering poettering force-pushed the poettering:nspawn-oci branch 2 times, most recently from 9913a12 to 0f5b80c Aug 2, 2018

@poettering

Member Author

commented Aug 2, 2018

I force pushed a new version now, addressing all raised issues. Let's see if LGTM likes it more this time. PTAL.

@poettering poettering force-pushed the poettering:nspawn-oci branch 2 times, most recently from a92797e to 470418a Aug 2, 2018

@evverx

Member

commented Aug 2, 2018

test-stat-util seems to have started to fail on Fedora CI with

152/321 test-stat-util                          FAIL     0.19 s (killed by signal 6 SIGABRT)
--- command ---
RPM_OS='linux' _='/usr/bin/python3' LANG='C' container_uuid='05c04656-05d9-479d-aa1d-84652cc6678d' HISTCONTROL='ignoredups' RPM_ARCH='x86_64' HOSTNAME='' OLDPWD='/builddir/build/BUILD' CONFIG_SITE='NONE' RPM_PACKAGE_RELEASE='0.1.20180802134457.470418a.fc29' RPM_BUILD_DIR='/builddir/build/BUILD' RPM_LD_FLAGS='-Wl,-z,relro   -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' NOTIFY_SOCKET='/run/systemd/nspawn/notify' container='systemd-nspawn' USER='mockbuild' RPM_SOURCE_DIR='/builddir/build/SOURCES' PWD='/builddir/build/BUILD/systemd-240/x86_64-redhat-linux-gnu' RPM_PACKAGE_VERSION='240' HOME='/builddir' RPM_BUILD_ROOT='/builddir/build/BUILDROOT/systemd-240-0.1.20180802134457.470418a.fc29.x86_64' MAIL='/var/spool/mail/mockbuild' SHELL='/bin/bash' TERM='vt100' RPM_PACKAGE_NAME='systemd' RPM_DOC_DIR='/usr/share/doc' SHLVL='3' PROMPT_COMMAND='printf "\033]0;<mock-chroot>\007"' LOGNAME='mockbuild' PATH='/builddir/build/BUILD/systemd-240/x86_64-redhat-linux-gnu:/builddir/.local/bin:/builddir/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin' PKG_CONFIG_PATH=':/usr/lib64/pkgconfig:/usr/share/pkgconfig' RPM_OPT_FLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' HISTSIZE='1000' LC_CTYPE='C.UTF-8' SYSTEMD_KBD_MODEL_MAP='/builddir/build/BUILD/systemd-240/src/locale/kbd-model-map' SYSTEMD_LANGUAGE_FALLBACK_MAP='/builddir/build/BUILD/systemd-240/src/locale/language-fallback-map' /builddir/build/BUILD/systemd-240/x86_64-redhat-linux-gnu/test-stat-util
--- stderr ---
Assertion 'device_path_make_canonical(st.st_mode, st.st_rdev, &resolved) >= 0' failed at ../src/test/test-stat-util.c:113, function test_device_path_make_canonical_one(). Aborting.

It probably has something to do with the absence of /run/systemd/inaccessible, but I wouldn't say I'm 100% certain.

@poettering

Member Author

commented Aug 2, 2018

Neat! CI all green now!

@evverx

Member

commented Aug 2, 2018

Interestingly, it seems that LGTM restarted the analysis as soon as the label was removed.

@poettering poettering force-pushed the poettering:nspawn-oci branch from 33c9795 to a85090b Aug 2, 2018

@poettering

Member Author

commented Aug 2, 2018

I force pushed another new version now, fixing the typo @dnicolodi found.

@poettering poettering force-pushed the poettering:nspawn-oci branch 3 times, most recently from 718e55c to 72f29ee Mar 8, 2019

poettering added some commits Mar 5, 2019

capability: keep CAP_SETPCAP while dropping bounding caps
The kernel only allows dropping bounding caps as long as we have
CAP_SETPCAP. Hence, let's keep that before dropping the bounding caps,
and afterwards drop them too.
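
A rough sketch of the bounding-set part of this, using plain prctl(2) calls rather than systemd's own helpers. The function name `bounding_set_drop_sketch` is hypothetical, and the sketch assumes the caller still holds CAP_SETPCAP in its effective set while the loop runs and gives it up only afterwards.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/prctl.h>

/* Hypothetical helper, not systemd's implementation. Drops every capability
 * that is not set in 'keep' from the bounding set of the calling thread. */
static int bounding_set_drop_sketch(uint64_t keep) {
        for (unsigned long cap = 0; cap < 64; cap++) {
                if (keep & (UINT64_C(1) << cap))
                        continue;

                /* Skip capabilities the kernel does not know or that are already gone. */
                if (prctl(PR_CAPBSET_READ, cap, 0, 0, 0) != 1)
                        continue;

                /* This is the call the kernel refuses with EPERM without CAP_SETPCAP. */
                if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) < 0)
                        return -errno;
        }

        return 0;
}
```

Calling it with all bits set in `keep` drops nothing and returns 0, which is a safe way to try it without privileges.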
nspawn: refactor setuid code a bit
Let's separate out the raw uid_t/gid_t handling from the username
handling. This is useful later on.

Also, let's use the right gid_t type for group types wherever
appropriate.
capability: let's protect against the kernel eventually doing more than 64 caps

Everyone will be in trouble then (as quite widely caps are stored in
64bit fields). But let's protect ourselves at least to the point that we
ignore all higher caps for now.
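
The idea can be sketched as a simple clamp on the highest capability number reported by the kernel. The names below are hypothetical, and the limit of 63 is an assumption derived from the 64-bit mask mentioned above, not a value taken from the patch.

```c
#include <stdint.h>

/* Hypothetical helper: ignore any capability that does not fit into a
 * 64-bit mask, i.e. anything above bit index 63. */
static unsigned long clamp_last_cap_sketch(unsigned long reported) {
        return reported > 63 ? 63 : reported;
}

/* Builds a mask with bits 0..last_cap set, after clamping. */
static uint64_t all_caps_mask_sketch(unsigned long reported_last_cap) {
        unsigned long last = clamp_last_cap_sketch(reported_last_cap);

        /* Shifting a 64-bit value by 64 is undefined, so handle the full mask separately. */
        if (last == 63)
                return UINT64_MAX;

        return (UINT64_C(1) << (last + 1)) - 1;
}
```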
nspawn: add support for executing OCI runtime bundles with nspawn
This is a pretty large patch, and adds support for OCI runtime bundles
to nspawn. A new switch --oci-bundle= is added that takes a path to an
OCI bundle. The JSON file included therein is read similarly to a .nspawn
settings file, however with a different feature set.

Implementation-wise this mostly extends the pre-existing Settings object
to carry additional properties for OCI. However, OCI supports some
concepts .nspawn files did not support yet, which this patch also adds:

1. Support for "masking" files and directories. This functionality is now
   also available via the new --inaccessible= cmdline switch, and
   Inaccessible= in .nspawn files.

2. Support for mounting arbitrary file systems. (not exposed through
   nspawn cmdline nor .nspawn files, because probably not a good idea)

3. Ability to configure the console settings for a container. This
   functionality is now also available on the nspawn cmdline in the new
   --console= switch (not added to .nspawn for now, as it is something
   specific to the invocation really, not a property of the container)

4. Console width/height configuration. Not exposed through
   .nspawn/cmdline, but this may be controlled through $COLUMNS and
   $LINES like in most other UNIX tools.

5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on
   the cmdline, since containers likely have different user tables, and
   the existing --user= switch appears to be the better option)

6. OCI hook commands (not exposed in .nspawn/cmdline, as very specific to
   OCI)

7. Creation of additional device nodes in /dev. Most likely not a good
   idea, hence not exposed in .nspawn/cmdline. There's already --bind=
   to achieve the same, which is the better alternative.

8. Explicit syscall filters. This is not a good idea, due to the skewed
   arch support, hence not exposed through .nspawn/cmdline.

9. Configuration of some sysctls on a whitelist. Questionable, not
   supported in .nspawn/cmdline for now.

10. Configuration of all 5 types of capabilities. Not a useful concept,
    since the kernel will reduce the caps on execve() anyway. Not
    exposed through .nspawn/cmdline as this is not very useful hence.

Note that this only implements the OCI runtime logic itself. It does not
provide a runc-compatible command line tool. This is left for a later
PR. Only with that in place tools such as "buildah" can use the OCI
support in nspawn as drop-in replacement.

Currently still missing is OCI hook support, but it's already parsed and
everything, and should be easy to add. Other than that, OCI is
implemented pretty comprehensively.

There's a list of incompatibilities in the nspawn-oci.c file. In a later
PR I'd like to convert this into proper markdown and add it to the
documentation directory.

@poettering poettering force-pushed the poettering:nspawn-oci branch from 72f29ee to a3fc6b5 Mar 15, 2019

@poettering

Member Author

commented Mar 15, 2019

I rebased and added a fix for #11755 on top now.

@keszybz
Member

left a comment

After staring at this for a few hours, I don't see anything wrong. I'm sure there must be something wrong, it just isn't possible to have so many lines bug-free, so this should get tested before we release v242. Let's merge.

r = mount_verbose(m->graceful ? LOG_DEBUG : LOG_ERR, NULL, where, NULL, MS_BIND|MS_RDONLY|MS_REMOUNT, NULL);
if (r < 0)
        return m->graceful ? 0 : r;

@keszybz

keszybz Mar 21, 2019

Member

I was a bit surprised that there's no unmounting performed if the operation fails halfway with m->graceful. In practice it probably doesn't matter that much, because the file is still protected by restrictive file permissions, but I wonder whether it wouldn't be proper to unmount it anyway.

@poettering

poettering Mar 21, 2019

Author Member

will add
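
The cleanup discussed in this thread could look roughly like the following sketch. It uses mount(2) and umount2(2) directly instead of systemd's mount_verbose(), and it assumes that the step preceding the quoted remount is a plain bind mount. The function name `bind_mount_read_only_sketch` is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper: bind mounts 'source' onto 'where' and makes the result
 * read-only. In graceful mode failures are not propagated to the caller. */
static int bind_mount_read_only_sketch(const char *source, const char *where, bool graceful) {
        int r;

        /* Step 1 (assumed): plain bind mount of the source onto the target. */
        if (mount(source, where, NULL, MS_BIND, NULL) < 0)
                return graceful ? 0 : -errno;

        /* Step 2: remount the bind mount read-only, as in the excerpt above. */
        if (mount(NULL, where, NULL, MS_BIND|MS_RDONLY|MS_REMOUNT, NULL) < 0) {
                r = -errno;

                /* Undo step 1 so that no half-configured mount is left behind. */
                (void) umount2(where, MNT_DETACH);

                return graceful ? 0 : r;
        }

        return 0;
}
```

The only difference from the quoted code is the unmount in the failure path of the second step, and graceful mode still reports success to the caller.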

@keszybz

Member

commented Mar 21, 2019

Oh, I'll submit a PR with some follow-up cleanups shortly.

@keszybz keszybz merged commit d0b6a10 into systemd:master Mar 21, 2019

10 checks passed

CentOS CI Build finished.
CentOS CI (Vagrant) Build finished.
LGTM analysis: C/C++ No new or fixed alerts
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No code changes detected
bionic-amd64 autopkgtest finished (success)
bionic-i386 autopkgtest finished (success)
bionic-s390x autopkgtest finished (success)
continuous-integration/travis-ci/pr The Travis CI build passed
semaphoreci The build passed on Semaphore.

anitazha added a commit to anitazha/systemd that referenced this pull request Jun 4, 2019

nspawn: don't hard fail when setting capabilities
The OCI changes in systemd#9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional and log about it.

Fixes systemd#12539

anitazha added a commit to anitazha/systemd that referenced this pull request Jun 4, 2019

nspawn: don't hard fail when setting capabilities
The OCI changes in systemd#9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).

Fixes systemd#12539

anitazha added a commit to anitazha/systemd that referenced this pull request Jun 20, 2019

nspawn: don't hard fail when setting capabilities
The OCI changes in systemd#9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).

Fixes systemd#12539

poettering added a commit that referenced this pull request Jun 20, 2019

nspawn: don't hard fail when setting capabilities
The OCI changes in #9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).

Fixes #12539
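
The policy described in this commit message can be sketched as a small decision function. This is not the code from the commit: the name `resolve_capabilities_sketch`, its parameters and the log text are all made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decides which of the requested capabilities can be set.
 * In OCI mode a capability missing from the bounding set is a hard failure,
 * otherwise it is only logged and then ignored. */
static int resolve_capabilities_sketch(uint64_t requested, uint64_t bounding, bool oci_mode, uint64_t *ret) {
        uint64_t missing = requested & ~bounding;

        if (missing != 0) {
                if (oci_mode)
                        return -EPERM;

                fprintf(stderr, "Capabilities 0x%llx are not in the bounding set, ignoring.\n",
                        (unsigned long long) missing);
        }

        *ret = requested & bounding;
        return 0;
}
```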

edevolder added a commit to edevolder/systemd that referenced this pull request Jun 26, 2019

nspawn: don't hard fail when setting capabilities
The OCI changes in systemd#9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).

Fixes systemd#12539
