
@cyphar cyphar released this Jul 2, 2020 · 4 commits to master since this release

This is intended to be the second-last RC release, with -rc92 having
very few large changes so that we can release runc 1.0 (at long last).

  • The long-awaited hooks changes have been merged into runc. This was
    one of the few remaining spec-related issues which were blocking us
    from releasing runc 1.0. Existing hook users will not be affected by
    this change, but runc now supports additional hooks that we expect
    users to migrate to eventually. The new hooks are:

    • createRuntime (replacement for the now-deprecated prestart)
    • createContainer
    • startContainer
  • A large amount of effort has been undertaken to support cgroupv2
    within runc. The support is still considered experimental, but it is
    mostly functional at this point. Please report any bugs you find when
    running under cgroupv2-only systems.

  • A minor-severity security bug was fixed. The devices list would
    be in allow-by-default mode from the outset, meaning that users would
    have to explicitly specify they wish to deny all device access at the
    beginning of the configuration. While this would normally be
    considered a high-severity vulnerability, all known users of runc had
    worked around this issue several years ago (hence why this fairly
    obvious bug was masked).

    In addition, the devices list code has been massively improved such
    that it will attempt to avoid causing spurious errors in the
    container (such as while writing to /dev/null) when doing devices
    cgroup updates.

  • A security audit of runc was conducted in 2019, and the report PDF is
    now included in the runc repository. The previous release of runc
    has already addressed the security issues found in that report.
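
The new hook names and the explicit deny-all devices rule mentioned above both live in config.json. The following is a minimal sketch of such a fragment, built as a Python dict and serialised to JSON. The field names follow the OCI runtime-spec; the hook binary paths are hypothetical placeholders, and the single allow rule for /dev/null (character device 1:3) is only an illustration.

```python
import json

# Hypothetical hook binaries; only the hook names come from the release notes.
config_fragment = {
    "hooks": {
        "createRuntime": [{"path": "/usr/local/bin/example-create-runtime"}],
        "createContainer": [{"path": "/usr/local/bin/example-create-container"}],
        "startContainer": [{"path": "/usr/local/bin/example-start-container"}],
    },
    "linux": {
        "resources": {
            "devices": [
                # Explicitly deny all device access first ...
                {"allow": False, "access": "rwm"},
                # ... then allow individual devices, here /dev/null (char 1:3).
                {"allow": True, "type": "c", "major": 1, "minor": 3, "access": "rwm"},
            ]
        }
    },
}

rendered = json.dumps(config_fragment, indent=2)
print(rendered)
```

Placing the deny-all rule first is the workaround the note above refers to; later rules then act as an allow-list on top of it.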

Thanks to the following people who made this release possible:

NOTE: For those who are confused by the massive version jump (rc10
to rc91), this was done to avoid issues with SemVer and lexical
comparisons -- there haven't been 90 other release candidates. Please
also note that runc 1.0.0-rc90 is identical to 1.0.0-rc10. See #2399
for more details.

Vote: +7 -0 #0
Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Jun 2, 2020 · 472 commits to master since this release

This release is identical to v1.0.0-rc10 (and thus the version string in
the binary will be v1.0.0-rc10).

The purpose of this release is to resolve an issue with our versioning
scheme (in particular, the format we've used under SemVer means that the
"-rcNN" string suffix is sorted lexicographically rather than in the
classic sort -V order).

Because we cannot do a post-1.0 release yet, this is a workaround to
make sure that systems such as Go modules correctly update to the latest
runc release. See #2399 for more details.
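
The ordering problem can be reproduced in a few lines. This is a sketch, not the code used by Go modules or any other tool: it applies the SemVer rule that a pre-release identifier containing letters is compared as an ASCII string, and contrasts it with a rough imitation of "sort -V". The helper names are made up for this example.

```python
import re

def semver_prerelease_key(identifier):
    """Sort key for a single SemVer pre-release identifier.

    Purely numeric identifiers compare as numbers and sort before
    alphanumeric ones; anything containing letters compares as a string.
    """
    if identifier.isdecimal():
        return (0, int(identifier), "")
    return (1, 0, identifier)

def version_sort_key(identifier):
    """Rough imitation of `sort -V`: runs of digits compare numerically."""
    return [int(part) if part.isdecimal() else part
            for part in re.split(r"(\d+)", identifier)]

tags = ["rc9", "rc10", "rc90", "rc91"]
semver_order = sorted(tags, key=semver_prerelease_key)
version_order = sorted(tags, key=version_sort_key)

print(semver_order)   # ['rc10', 'rc9', 'rc90', 'rc91']
print(version_order)  # ['rc9', 'rc10', 'rc90', 'rc91']
```

Under the SemVer rule "rc10" sorts before "rc9", which is why tools treated -rc9 as newer; jumping to -rc90 and -rc91 restores the intended order under both schemes.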

The next release (which would've originally been called -rc11) will be
1.0.0-rc91. I'm sorry.

Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Jan 24, 2020 · 472 commits to master since this release

This is a hot-fix for v1.0.0~rc9, primarily fixing CVE-2019-19921. Given
that the relevant runtime-spec PR which was considered a blocker has
been merged, the next rc release of runc should be the last one before
1.0.0.

Other notable changes include:

  • Fixing an exec-fifo race that could be triggered under Kubernetes (#2185).
  • Partial cgroupv2 support (see #2209 for remaining issues).

Thanks to the following people who made this release possible:

Vote: +4 -0 #1
Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Oct 5, 2019 · 531 commits to master since this release

This is a hot-fix for v1.0.0~rc8, primarily fixing CVE-2019-16884.

Thanks to the following people who made this release possible:

Vote: +4 -0 #1
Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Apr 26, 2019 · 627 commits to master since this release

This is a hot-fix for v1.0.0-rc7, and fixes a regression on old kernels
(which don't support keycreate labeling). Users are strongly encouraged
to update, as this regression was introduced in 1.0.0-rc7 and has
blocked many users from updating to mitigate CVE-2019-5736.

Bugs: #2032 #2031 #2043

At the moment the only outstanding issue before we can release 1.0.0 is
a set of spec discussions we are having about OCI hooks and how to handle
the integration with existing NVIDIA hooks. We will do our best to
finish this work as soon as we can.

Thanks to the following people who made this release possible:

Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Mar 28, 2019 · 636 commits to master since this release

WARNING: There is a regression in this release for old kernels, which we are working on fixing in #2031.

Due to CVE-2019-5736, we had to do another -rc release so users can update. We
hope to be able to release 1.0.0 in the near future (there is still an
outstanding spec-compliance issue with OCI hooks which we need to resolve
first).

This also updates runc to a vendored commit of the runtime-spec rather than a
full release, which will hopefully be rectified with runc 1.0.0. #k

Security:

  • Mitigate CVE-2019-5736. This is an updated version of the patch series sent
    out on openwall and we encourage users to update. #1982 #1984

    NOTE: This mitigation WILL NOT WORK if you run untrusted containers with
    host uid 0 and give them CAP_SYS_ADMIN (the protection operates through a
    hidden read-only bind-mount which can be re-mounted by CAP_SYS_ADMIN
    privileged users).

    Put simply -- we consider granting CAP_SYS_ADMIN to untrusted containers
    without user namespaces to be fundamentally insecure; as such, we do not
    consider this to be a security issue.

    If you want an additional host-level mitigation, use chattr +i on the
    host file to ensure containers without CAP_LINUX_IMMUTABLE cannot write to
    it -- even with CAP_SYS_ADMIN. But as above, if you give
    CAP_LINUX_IMMUTABLE to a container you will have problems.

    An alternative is to bind-mount a sealed memfd copy of the runc binary over
    the binary (runc will detect this and will not attempt further mitigation,
    because sealed memfds are fundamentally unmodifiable) but this requires
    more in-depth work by administrators.

  • There appear to be production users of --no-pivot-root, which is something
    that we absolutely recommend against and do not consider to be a secure
    configuration -- since pivot_root(2) has many security properties that are
    not possible to provide with just chroot(2).

    However, a specific issue was discovered which we decided to mitigate in
    order to avoid production users being exploited by it. This security issue
    is not eligible for a CVE because it requires an insecure configuration
    (--no-pivot-root). #1962
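
To make the CAP_SYS_ADMIN caveat above concrete, here is a hedged sketch of a check an administrator could run over a parsed config.json. It is a heuristic written for this note, not a runc feature: it flags a config whose process runs as uid 0, holds CAP_SYS_ADMIN in any capability set, and does not request a user namespace. Field names follow the OCI runtime-spec; the function name is hypothetical.

```python
CAPABILITY_SETS = ("bounding", "effective", "inheritable", "permitted", "ambient")

def sys_admin_as_host_root(config):
    """Return True if the config grants CAP_SYS_ADMIN to uid 0 without a userns."""
    process = config.get("process", {})
    capabilities = process.get("capabilities") or {}
    has_sys_admin = any(
        "CAP_SYS_ADMIN" in (capabilities.get(name) or [])
        for name in CAPABILITY_SETS
    )
    runs_as_root = process.get("user", {}).get("uid", 0) == 0
    namespaces = config.get("linux", {}).get("namespaces", [])
    has_user_namespace = any(ns.get("type") == "user" for ns in namespaces)
    return has_sys_admin and runs_as_root and not has_user_namespace

risky = {
    "process": {"user": {"uid": 0},
                "capabilities": {"bounding": ["CAP_SYS_ADMIN"]}},
    "linux": {"namespaces": [{"type": "pid"}, {"type": "mount"}]},
}
with_userns = {
    "process": risky["process"],
    "linux": {"namespaces": [{"type": "pid"}, {"type": "user"}]},
}

print(sys_admin_as_host_root(risky))        # True
print(sys_admin_as_host_root(with_userns))  # False
```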

Features:

  • Add intelrdt support for MBA to runc (a new intelrdt feature available in
    Linux 4.18+). #1919
  • Add support for specifying a CRIU configuration file for checkpoint/restore
    (which makes use of a new org.criu.config annotation). #1933 #1964
  • Add support for "runc exec --preserve-fds". #1995
  • Added support for SELinux labeling of keyrings. #2012

Fixes:

  • Correct handling of "runc kill" when a container is stopped or paused.
    #1934 #1943
  • Error out if built with nokmem and kmemcg limits were requested. #1939
  • Update check-config.sh to be in line with Docker's. #1942
  • Improve handling of kmem and the systemd cgroup driver. #1960
  • Improve resilience of adding setns tasks to cgroups. #1950
  • Remove (broken) detection of .scope for systemd. #1978
  • Fix console hanging with preserve-fds, where not enough fds have actually
    been provided to runc (which is a very common mistake when using
    --preserve-fds). #2000
  • Create bind-mounts when restoring. #1968
  • Fix regression of zombie "runc init" processes. #2023

Thanks to all of the contributors that made this release possible:

With special thanks and well-wishes to Victor Marmol and Rohit Jnagal, who have
both decided to give up their maintainership. Thanks for all of your
contributions over the years, and good luck with your future endeavours!

Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Nov 22, 2018 · 718 commits to master since this release

This is the final feature release of runc before 1.0, rather than 1.0
itself. The reason for this is that, during the preparations for this
release (which was originally meant to be 1.0), it was brought up that
there were several spec-compliance problems. One of these was related to
hook ordering, and upon trying to fix it we found that many users
(notably the NVIDIA OCI hooks) make use of our incorrect hook ordering.
Many of the proposed solutions to this problem require a lot of time
and co-ordination, and thus would stall this release indefinitely.

So, the idea is to have an intermediate release which will mark a
freeze-on-everything-except-spec-compliance-bugs. No other changes will
be included pre-1.0 (aside from security patches obviously).

Features:

  • Upgrade to using Go 1.10. #1711
  • Upgrade to CRIU 3.11. #1711 #1864 #1935 #1936
  • Allow for checkpoint-restore into a foreign network namespace. #1849
  • The "type" field for bind-mounts is now ignored. This is important, because
    many users incorrectly assume that "type" defines a bind-mount and not
    "options". Previously you had to set both. #1753 #1845
  • "setgroups=allow" is now possible in rootless mode, but requires the use of
    the privileged newgidmap helper (fully-rootless still requires
    "setgroups=deny"). #1693
  • Rootless mode can now safely ignore a read-only cgroupfs. #1759 #1806
  • Several aspects of rootless mode are now used inside user namespaces. This
    is necessary for a bunch of useful things (such as running Docker inside a
    user namespace), but did cause some breakages. We think they've all been
    fixed -- but if not please submit an issue! #1688 #1808 #1816 #1862
  • Improve kernel.{domain,host}name sysctl handling, to allow the NIS
    domainname to be set from Docker or other callers without an OCI spec
    change. #1827
  • Add documentation for one of the more confusing parts of runc, how terminals
    are handled (including an explanation of --console-socket). All the gory
    details and recommendations are available in docs/terminals.md. #1730
  • Allow /proc to be bind-mounted over (useful for rootless containers). #1832
  • Ignore ENOSYS for keyctl(2) operations. This is necessary to get Docker
    working with LXC under the default seccomp profile (which is what ChromeOS
    uses). #1893
  • Add support for the Intel RDT/MBA resource control system. #1632 #1913
  • Allow building with completely-disabled kmemcg support, to get around
    problems with broken kernels (RHEL 7.5 can oops with kmemcg accounting
    enabled). #1921 #1922 #1930
  • Add support for cgroup namespaces, which in turn fixes a few other issues we
    encountered with the previous code (which could be moving us to a cgroup
    during Go execution). #1916
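
The bind-mount item above says that "options", not "type", is what marks a mount as a bind-mount. A small sketch of that rule, applied to mount entries shaped like the "mounts" array in config.json. The helper name and the paths are hypothetical, and this is not runc's own code.

```python
def is_bind_mount(mount):
    """A mount entry is a bind-mount if its options include bind or rbind."""
    return any(option in ("bind", "rbind") for option in mount.get("options", []))

# "type" is omitted here: per the note above it is ignored for bind-mounts.
bind_entry = {
    "destination": "/data",
    "source": "/srv/example-data",
    "options": ["rbind", "ro"],
}
# Setting only "type" does not make this a bind-mount.
not_a_bind_entry = {
    "destination": "/data",
    "source": "/srv/example-data",
    "type": "bind",
}

print(is_bind_mount(bind_entry))        # True
print(is_bind_mount(not_a_bind_entry))  # False
```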

Fixes:

  • Namespace creation with user namespaces now plays a bit nicer with SELinux
    and IPC (which had a bug where the in-kernel mqueue mount would have the
    wrong tag if using unshare(CLONE_NEWUSER|CLONE_NEWIPC)). This is done to
    avoid future problems with broken kernel integration. #1562
  • Mild refactor of libcontainer/user. #1749
  • Fix null-pointer-exception when no cgroups were set. #1752
  • Various DBus and systemd related changes for the systemd-cgroup driver.
    #1754 #1772 #1776 #1781 #1805 #1917
  • Apply SELinux label to masked directories. #1756
  • Obey the XDG spec and set the sticky bit on runc's root when using
    XDG_RUNTIME_DIR (in rootless mode). #1760
  • Only configure network namespaces if we are creating them. #1777
  • Fix race in runc-exec against a currently-exiting pid1. #1812
  • Forward GOMAXPROCS to try to reduce the number of threads started by 'runc
    init'. Unfortunately there's no way to stop Go from spawning new threads so
    this is more of a recommendation. #1830
  • Fix tmpcopyup in cases where /tmp is not a private mount. #1873
  • Whitelist /proc/loadavg for bind-mounting. #1882
  • Protect against deletion of runc state directory with a containerid of "..",
    as well as the addition of other path hardening code. #1883
  • Handle duplicated cgroupfs mountpoint entries more sanely, to make runc work
    on distributions that use-and-abuse shared subtrees. #1817
  • Fix console hanging in several cases. #1895 #1897
  • Lock-to-a-thread during 'runc init' to ensure that we don't switch
    threads and run within a different SELinux label. #1814
  • Respect cgroupPath when trying to find the cgroupfs mountpoint (which can
    happen in cases where containers are given different cgroupfs mounts). #1872
  • And many other minor changes, many from first-time contributors! #1746 #1748
    #1749 #1784 #1779 #1785 #1796 #1819 #1825 #1836 #1824 #1820 #1838 #1840
    #1841 #1867 #1871 #1855 #1854 #1874 #1868 #1886 #1892 #1858 #1894 #1908
    #1880 #1910 #1915 #1903 #1922 #1926 #1928 #1925 #1911
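
The ".." container ID item above is an instance of a general path-hardening pattern: never join an untrusted ID onto a root directory without checking where the result lands. The sketch below shows one way to do that check; it is an illustration of the idea and not the validation runc actually performs. The function name and the /run/runc root are assumptions.

```python
import os

def resolve_state_dir(root, container_id):
    """Return the state directory for container_id, refusing escapes from root."""
    if container_id in ("", ".", "..") or "/" in container_id:
        raise ValueError("invalid container id: %r" % container_id)
    root = os.path.abspath(root)
    path = os.path.abspath(os.path.join(root, container_id))
    # The result must be strictly inside root, never root itself or a parent.
    if path == root or os.path.commonpath([root, path]) != root:
        raise ValueError("container id escapes state root: %r" % container_id)
    return path

print(resolve_state_dir("/run/runc", "web"))  # /run/runc/web

try:
    resolve_state_dir("/run/runc", "..")
except ValueError as err:
    print("rejected:", err)
```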

Fixes (for spec violations):

  • Don't set a container to "running" when exec-ing into it (because it might
    be in the "created" state). #1771
  • oom_score_adj is now no longer modified if it was unspecified in config.json
    (this was a spec violation). #1759
  • Set "status" in hook stdin, as well as switch to using *spec.State to avoid
    JSON-representation drift. #1741
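
Since hooks receive the container state as JSON on stdin, a hook can now rely on "status" being present. A minimal sketch of the reading side, assuming the state layout from the OCI runtime-spec (ociVersion, id, status, bundle, plus optional pid and annotations). The sample values are invented, and a real hook would pass sys.stdin instead of the StringIO used here.

```python
import io
import json

REQUIRED_STATE_FIELDS = ("ociVersion", "id", "status", "bundle")

def read_hook_state(stream):
    """Parse the container state a hook receives on stdin."""
    state = json.load(stream)
    missing = [field for field in REQUIRED_STATE_FIELDS if field not in state]
    if missing:
        raise ValueError("state is missing fields: " + ", ".join(missing))
    return state

# Invented sample input standing in for the hook's stdin.
example_stdin = io.StringIO(json.dumps({
    "ociVersion": "1.0.0",
    "id": "example-container",
    "status": "created",
    "pid": 4242,
    "bundle": "/containers/example",
}))

state = read_hook_state(example_stdin)
print(state["id"], state["status"])  # example-container created
```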

Thanks to all of the contributors that made this release possible:

Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Feb 27, 2018 · 910 commits to master since this release

This is planned to be the final -rc release of runc. While we really
haven't followed the rules for release candidates (with huge features
introduced each release, and with massive gaps between releases) the
hope is that once we've released 1.0.0 we will be much more liberal with
releases in future. Let's see how that pans out. :P

Features:

  • Support cgroups in rootless containers. This is a continuation of the
    previous work done, and allows for users that have specialised setups
    (such as having the LXC pam_cg.so module set up) to use cgroups with
    rootless containers. #1540
  • Add support for newuidmap and newgidmap with rootless containers.
    This is a continuation of some previous work, and allows users that
    have /etc/sub{uid,gid} configured to use the shadow-utils setuid
    helpers. Note that this support doesn't restrict users that don't want
    to use setuid binaries at all. #1529
  • runc will now use a chroot when mount namespaces aren't provided in
    the config.json. While chroot does have its (many) downsides, this
    does allow for specialised configurations to work properly. #1702
  • Expose annotations to hooks, so that the hook can have more direct
    information about the container it is being run against. #1687
  • Add "runc exec --additional-gids" support. #1608
  • Allow more signals to be sent with "runc kill" than are defined by
    Go's syscall package. #1706
  • Emit an error if users try to use MS_PRIVATE with --no-pivot, as that
    is simply not safe. #1606
  • Add support for "unbindable" and "runbindable" as rootfs propagation.
    #1655
  • Implement intelrdt support in runc. #1279 #1590
  • Add support for lazy migration with CRIU. This includes the addition
    of "runc checkpoint httpd" which acts as a remote pagefault request
    server. #1541
  • Add MIPS support. #1475
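
For the newuidmap/newgidmap item above, the ranges come from /etc/subuid and /etc/subgid, whose lines have the form name:start:count. The sketch below parses such text and builds a uidMappings list in the OCI runtime-spec layout. Mapping container uid 0 to the calling user and the subordinate range to uid 1 upwards is a common convention, assumed here rather than taken from the release notes; the user name and numbers are invented.

```python
def parse_subid(text, user):
    """Return (start, count) ranges for user from /etc/subuid-style text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, start, count = line.split(":")
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges

def build_uid_mappings(host_uid, ranges):
    """Map container uid 0 to host_uid, then append the subordinate ranges."""
    mappings = [{"containerID": 0, "hostID": host_uid, "size": 1}]
    next_container_id = 1
    for start, count in ranges:
        mappings.append({"containerID": next_container_id,
                         "hostID": start, "size": count})
        next_container_id += count
    return mappings

sample_subuid = "alice:100000:65536\nbob:165536:65536\n"
alice_ranges = parse_subid(sample_subuid, "alice")
uid_mappings = build_uid_mappings(1000, alice_ranges)
print(uid_mappings)
```

Mappings larger than a single id cannot be written by an unprivileged process itself, which is where the setuid helpers come in.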

Fixes:

  • Delay seccomp application as late as possible, to reduce the syscall
    footprint of runc on profiles. #1569

  • Fix --read-only containers with user namespaces, which would
    previously fail under Docker because of privilege problems when trying
    to do the read-only remount. #1572

  • Switch away from stateDirFd entirely. This is an improvement over the
    protections we added for CVE-2016-9962, and protects against many
    other possible container escape bugs. #1570

  • Handle races between "runc start" and "runc delete" over the exec FIFO
    correctly, and avoid blocking "runc start" indefinitely. #1698

  • Correctly generate seccomp profiles that place requirements on syscall
    arguments, as well as multi-argument restrictions. #1616 #1424

  • Prospective patch for remounting of old-root during pivot_root. This
    is intended to solve one of the many "mount leak" bugs that have been
    popping up recently -- caused by lots of container churn and host
    mounts being pinned during container setup. #1500

  • Fix "runc exec" on big-endian architectures. #1727

  • Correct systemd slice expansion to work with cAdvisor. #1722

  • Fix races against systemd cgroup scope creation. #1683

  • Do not wait for signalled processes if libcontainer is running in a
    process that is a subreaper. #1678

  • Remove dependency on libapparmor entirely, and just use
    /proc/$pid/attr directly. #1675

  • Improvements to our integration tests. #1661 #1629 #1528

  • Handle systemd's quirky CPUQuotaPerSecUSec handling in
    fractions-of-a-percent edge-cases. #1651

  • Remove docker/docker import in runc by moving the package to runc.
    #1644

  • Switch from docker's pkg/symlink to cyphar/filepath-securejoin. #1622

  • Enable integration and unit tests on arm64. #1642 #1640

  • Add /proc/scsi to masked paths (mirror of Docker's CVE-2017-16539).
    #1641

  • Add several tests for specconv. #1626 #1619

  • Add more extensive tests for terminal handling. #1357

  • Always write freezer state during retry-loop, to avoid an indefinite
    hang when new tasks are spawned in the container. #1610

  • Create cwd when it doesn't exist in the container. #1604

  • Set initial console size based on process spec, to avoid SIGWINCH
    races where initial console size is completely wrong. #1275

  • Small fixes for static builds. #1579 #1577

  • Use epoll for PTY IO, to avoid issues with systemd's SAK protections.
    #1455

  • Update state.json after a "runc update". #1558

  • Switch to umoci's release scripts, to use a more "standardised" and
    distribution-friendly release scheme. Several makefile-fixes included
    as well. #1554 #1542 #1555

  • Reap "runc:[1:CHILD]" to avoid intermediate zombies building up. #1506

  • Use CRIU's RPC to check the version. #1535

  • Always save own namespace paths rather than the path given during
    start-up, to avoid issues where the path disappears afterwards. #1477

  • Fix that we incorrectly set the owners of devices. This is still (subtly)
    broken in user namespaces, but will be fixed in a future version. #1743

  • Lots of other miscellaneous fixes and cleanups, many of which were
    written by first-time contributors. Thanks for contributing, and
    welcome to the project! #1729 #1724 #1695 #1685 #1703 #1699 #1682
    #1665 #1667 #1669 #1654 #1664 #1660 #1645 #1640 #1621 #1607 #1206
    #1615 #1614 #1453 #1613 #1600 #1599 #1598 #1597 #1593 #1586 #1588
    #1587 #1589 #1575 #1578 #1573 #1561 #1560 #1559 #1556 #1551 #1553
    #1548 #1544 #1545 #1537
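
The CPUQuotaPerSecUSec item above concerns converting a CFS quota and period into systemd's "microseconds of CPU time per second" property. The sketch below shows the arithmetic under the assumption that systemd works in whole percents of a CPU, i.e. in 10 ms steps, so fractional values are rounded up. It illustrates the edge case and is not a copy of runc's implementation.

```python
USEC_PER_SEC = 1_000_000
STEP_USEC = 10_000  # assumed granularity: 1% of one CPU-second

def cpu_quota_per_sec_usec(quota_us, period_us):
    """Convert a CFS quota/period pair to CPU microseconds per second."""
    per_sec = quota_us * USEC_PER_SEC // period_us
    remainder = per_sec % STEP_USEC
    if remainder:
        # Round up so the container never gets less than it asked for.
        per_sec += STEP_USEC - remainder
    return per_sec

print(cpu_quota_per_sec_usec(50_000, 100_000))  # 500000 (exactly 50%)
print(cpu_quota_per_sec_usec(1_500, 100_000))   # 20000  (1.5% rounded up to 2%)
```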

Removals:

  • Andrej Vagin stepped down as a maintainer. Thanks for all of your hard
    work Andrej, and have fun working on your other projects! #1543

Thanks to all of the contributors that made this release possible:

Vote: +5 -0 #2
Signed-off-by: Aleksa Sarai asarai@suse.de


@cyphar cyphar released this Aug 10, 2017 · 1140 commits to master since this release

Features:

  • runc now supports v1.0.0 of the OCI runtime specification. #1527
  • Rootless containers support has been released. The current state of
    this feature is that it only supports single-{uid,gid} mappings as an
    unprivileged user, and cgroups are completely unsupported. Work is
    being done to improve this. #774
  • Rather than relying on CRIU version numbers, actually check if the
    system supports pre-dumping. #1371
  • Allow the PIDs cgroup limit to be updated. #1423
  • Add support for checkpoint/restore of containers with orphaned PTYs
    (which is effectively all containers with terminal=true). #1355
  • Permit prestart hooks to modify the cgroup configuration of a
    container. #1239
  • Add support for a wide variety of mount options. #1460
  • Expose memory.use_hierarchy in MemoryStats. #1378
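
The "single-{uid,gid} mappings" restriction on rootless containers mentioned above translates into a very small piece of config.json: one mapping of size 1 for the uid and one for the gid. A sketch of that fragment, using the OCI runtime-spec field names and the ids of the current process; the function name is hypothetical.

```python
import json
import os

def single_id_mappings(host_uid, host_gid):
    """Map container root (0:0) onto exactly one unprivileged host uid/gid."""
    return {
        "uidMappings": [{"containerID": 0, "hostID": host_uid, "size": 1}],
        "gidMappings": [{"containerID": 0, "hostID": host_gid, "size": 1}],
    }

# Use the invoking user's own ids, as an unprivileged user would.
mappings = single_id_mappings(os.getuid(), os.getgid())
print(json.dumps(mappings, indent=2))
```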

Fixes:

Removals:

  • Remove any semblance of non-Linux support. #1502
  • We no longer use shfmt for testing. #1510

Thanks to all of the contributors that made this release possible:

Vote-Closed: [Wed Aug 9 05:28:38 UTC 2017]
Vote-Results: [+5 -0 /2]


@cyphar cyphar released this Mar 21, 2017 · 1302 commits to master since this release

Features:

  • Add slice management support to the systemd cgroup driver. Checks are
    done to make sure that systemd supports the feature. #1084
  • Support for readonly mount labels. #1112
  • Add a tmpcopyup mount extension for tmpfs mounts that are mounted over
    already existing directories, allowing for the contents of a volume to
    be copied up transparently. #845
  • Switch our pivot_root usage to no longer require temporary
    directories, improving the state of containers running in entirely
    readonly contexts. #1125 #1148
  • Allow updating of rt_period_us and rt_runtime_us in cpu cgroup.
  • Reimplement console handling to use AF_UNIX sockets such that the
    console is created inside the container's (namespaced) devpts
    instance, solving a wide variety of historical pty bugs with runC.
    #1018 #1356
  • Support overlayfs in mounts. #1314
  • Support creating devices with types 'p' and 'u'. #1321
  • Add --preserve-fds=N to create and run commands. #1320
  • Add pre-dump and parent-path to checkpoint. #1001
  • Update to runtime-spec v1.0.0-rc5. #1370
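
The AF_UNIX console handling above relies on passing an open file descriptor from one process to another over a Unix socket. The sketch below demonstrates only that mechanism, inside a single process and with a pipe standing in for the pty master; it is not runc's console code. It assumes Python 3.9 or later on a Unix system, where socket.send_fds and socket.recv_fds are available.

```python
import os
import socket

def pass_fd_over_unix_socket(fd):
    """Send fd across an AF_UNIX socket pair and return the received copy."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # The payload is arbitrary; the descriptor travels as ancillary data.
        socket.send_fds(sender, [b"fd"], [fd])
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
        return fds[0]
    finally:
        sender.close()
        receiver.close()

read_end, write_end = os.pipe()
received = pass_fd_over_unix_socket(write_end)
received_is_new_fd = received != write_end

# Writing through the received descriptor reaches the same pipe.
os.write(received, b"hello")
message = os.read(read_end, 5)
print(message)

for fd in (read_end, write_end, received):
    os.close(fd)
```

The receiver ends up with its own descriptor number referring to the same open file, which is what lets a terminal created inside the container be handed to a process outside it.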

Fixes:

  • Remove check for binding to /. #1090
  • Ensure we log to logrus on command errors. #1089
  • Don't enable kmem limits if they're not specified in the config. #1095
  • Handle cases where specs.Resources.* members would cause null
    dereferences. #1111 #1116
  • Fix bugs in the GetProcessStartTime implementation. #1136
  • Make sysctl config validation checks handle network namespaces more
    gracefully. #1138 #1149
  • Guarantee correct namespace creation ordering. This is part of the
    rootless container patchset, and is also required in certain SELinux
    setups. #977
  • Stop screwing around with '\n' in console output. #1146
  • Fix cpuset.cpu_exclusive handling. #1194
  • Sync HookState with the OCI specification. #1201
  • Split remounting mountpoints and bindmounts, resolving issues with
    mount options being dropped in certain cases. #1222
  • Fix leftover cgroup directory issue. #1196
  • Handle config.Devices and config.MaskPaths in checkpoint. #1110
  • Don't create combined cgroup subsystem names. #1268
  • Ignore cgroupv2 mountpoints, fixing issues with systemd v232. #1266
  • Fix race condition when synchronising with children and grandchildren in
    nsexec.c. #1237
  • Fix state checks to no longer depend on _LIBCONTAINER being present in
    the environment, fixing both bugs as well as being part of the
    rootless container patchset. #1317
  • Fix systemd-notify when using different PID namespaces, and allow
    detach+notify socket. #1308
  • Don't fchown when inheriting stdio, which is necessary for rootless
    containers in certain scenarios. #1354
  • Fix cpu.cfs_quota_us being changed when systemd is reloaded. #1344
  • Add devices to whitelist for LXD, to make runC under LXC/LXD work
    better. #1327
  • Many improvements to testing. #1121 #1131 #1132 #1147
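
The notes above do not say which GetProcessStartTime bugs were fixed. As an illustration of why this kind of code is easy to get wrong, the sketch below reads the start time (field 22) from a /proc/<pid>/stat line in a way that tolerates spaces and parentheses in the process name, by splitting only after the last ")". The function name and the sample line are invented for this example.

```python
def parse_start_time(stat_line):
    """Return field 22 (starttime) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ")" characters, so split only after the final ")".
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 22 sits at index 22 - 3.
    return int(rest[19])

# Synthetic line: every field from 4 onwards holds its own field number.
fields = ["1234", "(my (weird) proc)", "S"] + [str(n) for n in range(4, 53)]
sample_line = " ".join(fields)

print(parse_start_time(sample_line))  # 22
```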

Security:

Thanks to all of the contributors that made this release possible:
