nixos/systemd-nspawn: Allow modifying .nspawn-units with a derivation

Sometimes it's necessary to build a configuration within a `nix-build` for
systemd units. While this is fairly easy for .service-units (where you
can easily define overrides), it's not possible for `systemd-nspawn(1)`.

This is mostly a hack to get dedicated bind-mounts of store paths from
`pkgs.closureInfo` into the configuration without IFD.

In the long term we either want to fix this in systemd or find a
better-suited solution, though.
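
As a rough illustration of the intended use: the `systemd.nspawn.<name>.extraDrvConfig`
option added further down in this commit takes a derivation whose contents are appended
to the generated `.nspawn` file. A minimal sketch (the concrete snippet is an assumption
for illustration, not part of this commit) could look like this:

    { pkgs, ... }: {
      systemd.nspawn.demo.extraDrvConfig =
        let
          # Closure of everything that should be visible inside the container.
          closure = pkgs.closureInfo { rootPaths = [ pkgs.hello ]; };
        in
          # Render one BindReadOnly= line per store path, without IFD.
          pkgs.runCommand "demo-extra-nspawn-config" { } ''
            echo "[Files]" > $out
            while read -r path; do
              echo "BindReadOnly=$path" >> $out
            done < ${closure}/store-paths
          '';
    }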

nixos/containers-next: initialize first draft for new NixOS containers w/networkd

This is the first batch of changes for a new container-module replacing
the current `nixos-container`-subsystem in the long term.

The state in here is still strongly inspired by the `containers`[1]-module:
nspawn-instances are declared using NixOS config for both the host and
the container itself.

For now, this module uses the tentative namespace `nixos.containers`,
but that's subject to change.

This new module also comes with the following key-differences:

* Rather than writing a big abstraction-layer on top, we'll rely on
  `.nspawn`-units[2]. This has the benefit that we can stop adding
  options for each new nspawn-feature (such as MACVLANs, ephemeral
  instances, etc.) because it can be written directly into the
  `.nspawn`-unit using the module system like

      systemd.nspawn.foo.filesConfig = {
        BindReadOnly = /* ... */
      };

  Also, administrators don't need to learn too much about our
  abstractions; they only need to know a few basics about the
  module-system and how to write systemd units.

* This feature strictly enforces `systemd-networkd` on both the
  container & the host. It can be turned off for containers in the
  host-namespace without a private network though.

  The reason for this is that the current `nixos-container`
  implementation has the long-standing bug that the container's uplink
  is broken *until* the container has booted since the host-side of the
  veth-pair is configured in `ExecStartPost=`[3]. This is because
  there's no proper way to take care of it at an earlier stage since
  `systemd-nspawn` creates the interface itself.

  This has e.g. the implication that services inside the container
  wrongly assume that they can reach e.g. an external database via the
  network (since `network{,-online}.target` was reached), even though
  this is not the case due to the unconfigured host-side veth interface.

  However, when using `systemd-networkd(8)` on both sides, this is not
  the case anymore since systemd will automatically take care of
  configuring the network correctly when an nspawn unit starts and
  `networkd` is active.

Apart from a basic draft, this also contains support for RFC1918
IPv4-addresses configured via DHCP and ULA-IPv6 addresses configured via
SLAAC and `radvd(8)`, including support for ephemeral containers.

Further additions such as a better config-activation mechanism
and a tool to manage containers imperatively will follow.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-containers
[2] https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html#
[3] https://github.com/NixOS/nixpkgs/blob/8b0f315b7691adcee291b2ff139a1beed7c50d94/nixos/modules/virtualisation/nixos-containers.nix#L189-L240
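
For orientation, a minimal declarative instance under the tentative namespace, combined
with a raw `.nspawn` setting, might look roughly like the following sketch. The
sub-option names (`system-config` in particular) are taken from later commits in this
squash and may still change:

    {
      nixos.containers.instances.webserver = {
        system-config = { pkgs, ... }: {
          services.nginx.enable = true;
        };
      };

      # nspawn-specific settings are written directly into the generated
      # .nspawn unit instead of being wrapped in dedicated NixOS options.
      systemd.nspawn.webserver.filesConfig.BindReadOnly = [ "/etc/resolv.conf" ];
    }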

nixos/containers-next: implement small wrapper for nspawn port-forwards

This exposes a given `containerPort` to the host address. So if port 80
from the container is forwarded to the host's port 8080 and the
container uses `2001:DB8::42` and the host-side uses `2001:DB8::23` on
the veth-interface, then `[2001:DB8::42]:80` will be available on the
host as `[2001:DB8::23]:8080`.
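
A hedged sketch of how such a forward could be declared; only `containerPort` is named
above, so the surrounding attribute names are assumptions for illustration:

    {
      nixos.containers.instances.webserver = {
        # Hypothetical option path; the commit message only mentions `containerPort`.
        forwardPorts = [
          { containerPort = 80; hostPort = 8080; }
        ];
      };
    }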

nixos/containers-next: implement more advanced networking tests

This change tests various combinations of static & dynamic addressing
and also fixes a bug where `radvd(8)` was erroneously configured for
veth-pairs where it's actually not needed.

This test is also supposed to show how to use `systemd`-configuration to
implement most of the features (for instance there's no custom set of
options to implement MACVLANs) and serves as regression-test for future
`systemd`-updates in NixOS.

Please note that the `ndppd`-hack is only here because QEMU doesn't do
proper IPv6 neighbour resolution. In fact, I left comments whenever some
workarounds were needed for the testing-facility.

nixos/tests/container-migration: init

This test is supposed to demonstrate how to migrate a single container
to the new subsystem. Of course, docs on how to rewrite the config aren't
written yet; this is mainly a POC to show that it's generally possible
by

* Deploying a new configuration (using `nixos.containers`) being
  equivalent to the old one.
* Moving the state from `/var/lib/containers` to `/var/lib/machines`.
* Rebooting the host (unfortunately), because otherwise
  `systemd-networkd` will reach an inconsistent state, at least with
  v247.

For the reboot-part I also had to change the QEMU vm-builder a bit to
actually support persistent boot-disks.
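
On the configuration side, the migration boils down to expressing the same machine with
the new options; a rough sketch (the new option names follow the tentative
`nixos.containers` namespace used in this branch and are not authoritative):

    {
      # Old nixos-container style declaration:
      # containers.ldap.config = { ... }: { services.openldap.enable = true; };

      # Roughly equivalent declaration with the new subsystem:
      nixos.containers.instances.ldap = {
        system-config = { ... }: {
          services.openldap.enable = true;
        };
      };
    }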

nixos/containers-next: allow static configuration for a virtual zone as well

This is already the case for dynamically assigned addresses (e.g. via
SLAAC or DHCPv4) where `0.0.0.0/24` and `::/64` provides a pool of
private IPs. However if such a zone is supposed to be fully static, the
same should be possible as well.

nixos/switch-to-configuration: import old config activation changes

This is basically what I tried in NixOS#84608 at first - being able to reload
or restart a container based on the NixOS-specific
`re{load,start}IfChanged` options for systemd units, but with a few
differences:

* I switched back to using `nsenter(1)` from util-linux for the same
  rationale as in ebb6e38: without
  this, the activation would hang until a timeout is exceeded if the
  service-manager inside the container is reloaded.

* I also disabled `systemd-networkd-wait-online.service` inside the
  container because it'd also hang even if the interfaces are configured
  properly. We should investigate how to fix it / if it was already
  fixed at some point.

Also implemented a small test to ensure that a config-activation works
fine, even with networking.

nixos/containers-next: fix broken machinectl reboot and probably more

It seems that systemd ignores `systemd-nspawn@` (the template unit) if
both an override and a custom unit for the service (i.e.
`systemd-nspawn@containername.service`) exist:

    [root@server:~]# systemctl status systemd-nspawn@ldap
    ● systemd-nspawn@ldap.service
         Loaded: loaded (/nix/store/rm4kigdbzl78iai8jfbgxbslvalk8bwa-unit-systemd-nspawn-ldap.service/systemd-nspawn@ldap.service; linked; vendor preset: enabled)
        Drop-In: /nix/store/fr9zabpvp3077cbb6jnpxm42qxqw9yk2-system-units/systemd-nspawn@.service.d
                 └─overrides.conf
         Active: active (running) since Tue 2021-03-16 15:01:32 UTC; 23min ago

This breaks at least `machinectl reboot`, which needs
`RestartForceExitStatus = 133` to be set. For now, I've added all
settings to the module itself.
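
For reference, the relevant setting corresponds to something like the following sketch in
NixOS-module terms (the commit adds it inside the container module itself rather than as
a user-facing override):

    {
      systemd.services."systemd-nspawn@ldap".serviceConfig = {
        # `machinectl reboot` terminates the container with exit status 133,
        # which must be treated as a restart request rather than a failure.
        RestartForceExitStatus = 133;
      };
    }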

nixos/switch-to-configuration: Implement more generic decisions for config activations in containers

Actually, using `re{load,start}IfChanged` isn't the best decision for
containers because some containers have to be reloaded or restarted
depending on what has changed. For instance, a new bind-mount requires a
`machinectl reboot`, but a change in the NixOS config only needs a
`systemctl reload` (which runs `switch-to-configuration` inside the
container).

To model this, I decided to add four keywords and an option
`activation.strategy` to declarative containers:

* `strategy = "none"` means that the container will be entirely ignored
  by `switch-to-configuration`.

* `strategy = "restart"` will always `machinectl reboot` the container
  if a change was detected.

* `strategy = "reload"` will always `systemctl reload` the container if
  a change was detected.

* `strategy = "dynamic"` will check what has changed inside the
  container. If only the NixOS config inside the container has changed,
  a reload will be scheduled, otherwise a restart.

Also did a nearly full rewrite of the activation test to cover several
corner-cases and combinations of such settings.
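
In configuration terms this looks roughly like the following sketch; only
`activation.strategy` and its four values are taken from this change, the rest of the
instance declaration is illustrative:

    {
      nixos.containers.instances = {
        # Always needs a full `machinectl reboot` on changes, e.g. because
        # new bind-mounts only appear after a reboot of the container.
        ldap.activation.strategy = "restart";

        # Reload (switch-to-configuration inside the container) or restart,
        # depending on what actually changed.
        webserver.activation.strategy = "dynamic";
      };
    }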

nixos/containers-next: add read-only `nixos.containers.rendered` option

This option is an attr-set that maps containers to their NixOS
configuration since `nixos.containers.instances` directly transforms the
config to a NixOS derivation. Also, the raw `nixos.containers.instances`
isn't really usable since it usually contains a list of chunks that are
evaluated by the module-system.

This is actually useful to introspect the configuration just as it's
done with e.g. `resources.machines`[1] in nixops. For instance, I'm
configuring my Prometheus scraping targets like this by gathering all active
exporters in my machines and their containers:

    { config, lib, ... }: with lib;
    let
      containers = flip mapAttrsToList machine.nixos.containers.rendered (const (x: x.config));
    in
      flip concatMap containers
        (c: flip concatMap (attrValues c.services.prometheus.exporters)
          (exporter:
            (optional exporter.enable "${config.networking.fqdn}:${toString exporter.port}")))

[1] https://nixos.mayflower.consulting/blog/2018/10/26/nixops-machine-configs/

nixos/all-tests: register tests

Also add a `jobset.nix` to test this on my self-hosted Hydra (which btw
uses this feature already :p).

nixos/containers-next: make sure that the module works fine with `restrictedEval` being active

This is necessary to get it running on my Hydra.

nixos/containers-next: add test for SSH inside a nspawn machine

Just another small testcase to confirm that the container's network
works fine.

nixos/containers-next: enable private users by default

nixos/systemd-nspawn: make `/etc/systemd/nspawn` mutable

Now only `/etc/systemd/nspawn/<name>.nspawn` will be a symlink rather
than having the full directory as a symlink. This is actually consistent
with `networkd` (both don't have alternate locations for transient units)
and will become necessary when implementing imperative containers since
these should also use nspawn units.

nixos/containers-next: fix eval after 21.05 breaking changes

`stdenv.lib` and `pkgs.utillinux` are deprecated now and cause an
error when disallowing aliases (which is the default when evaluating
nixpkgs).

nixos-nspawn: init

This is a first draft for imperative containers - basically a
replacement for `nixos-container` - based on Python. It's still missing
a few features, but is actually a working POC with the following
key-differences:

* Rather than Perl, Python is used now. While the choice of a language
  is always debatable, I'm pretty convinced that Python is easier to
  access than Perl and a lot more people are willing to write Python
  code (that's for instance the reason why the test-driver was
  eventually ported to Python).

* Similar to `extra-container`[1], this also contains way more features
  than the stock `nixos-container` implementation. This is because we
  basically provide all options from `nixos.containers` and evaluate
  them after that. The additional configs (such as
  `activation`/`network`/etc) are rendered into JSON and can be read by
  the script to imperatively create `.nspawn` & `.network` units.

[1] https://github.com/erikarvstedt/extra-container

nixos/containers-next: implement proper user-namespacing support

Now we're doing proper user-namespacing here as well; for that, a few
filesystem-fixes had to be applied.

For more context, please refer to NixOS#67336.
Also, credits go to the author of the aforementioned PR; I basically
pulled these changes into this branch.

nixos/containers-next: add support for `LoadCredential=`

With user-namespacing set to `pick`[1], bind-mounts will always be owned
by `nouser:nogroup`. This is a problem for secrets since these shouldn't
be world-readable, and with a `nouser:nogroup` owner from another
user-namespace (the `root` inside the container isn't an actual `root`
anymore) the secrets would be unreadable.

To work around this, `LoadCredential=` can be used. Using
`--load-credential` (unfortunately there's no switch for
`.nspawn`-units) passes a secret into a container where it can be
re-used by using the host's credential-ID as `path` in a `.service`-file
inside the container.

So basically

    {
      nixos.containers.instances.foo.credentials = [
        { id = "foo"; path = "/run/secrets/foo";}
      ];
    }

makes the secret available as `/run/host/credentials/foo` and by
specifying

    LoadCredential=foo:foo

in `example.service`, the credential will be readable by the `ExecStart=`
inside `example.service` from `/run/credentials/example.service/foo`.

[1] https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#--private-users=
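
Expressed as NixOS configuration inside the container, the consuming side of the example
above might look roughly like this (the service itself is illustrative; only the
credential flow is taken from the commit message):

    {
      systemd.services.example = {
        serviceConfig.LoadCredential = "foo:foo";
        script = ''
          # The secret passed in from the host is available here at runtime:
          cat "$CREDENTIALS_DIRECTORY/foo"
        '';
      };
    }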

nixos/containers-next-imperative: init

sudo-nspawn: init

This is a slightly modified sudo enabling `--enable-static-sudoers`
which ensures that `sudoers.so` is linked statically into the
executable[1]:

>  --enable-static-sudoers
>        By default, the sudoers plugin is built and installed as a
>        dynamic shared object.  When the --enable-static-sudoers
>        option is specified, the sudoers plugin is compiled directly
>        into the sudo binary.  Unlike --disable-shared, this does
>        not prevent other plugins from being used and the intercept
>        and noexec options will continue to function.

This is necessary here because of user-namespaced `nspawn`-instances:
these have their own UID/GID-range. If a container called `ldap` has
`PrivateUsers=pick` enabled, this may look like this:

    $ ls /var/lib/machines
    drwxr-xr-x 15 vu-ldap-0  vg-ldap-0  15 Mar 11  2021 ldap
    -rw-------  1 root       root        0 Sep 12 16:13 .#ldap.lck
    $ id vu-ldap-0
    uid=1758003200(vu-ldap-0) gid=65534(nogroup) groups=65534(nogroup)

However, this means that bind-mounts (such as `/nix/store`) will be
owned by `nobody:nogroup`, which is a problem for `sudo(8)` since it
expects `sudoers.so` to be owned by `root`.

To work around this, the aforementioned configure-flag will be used to
ensure that this library is statically linked into `bin/sudo` itself. We
cannot do a full static build though since `sudo(8)` still needs to
`dlopen(3)` various other libraries to function properly with PAM.

[1] https://www.sudo.ws/install.html

nixos/switch-to-configuration: fix a few problems with nspawn instances

Config activation of declarative containers used to be error-prone in
some cases:

* If a machine was powered off and had its config changed, the
  activation broke like this:

      systemd-nspawn@ldap.service is not active, cannot reload.

  The easiest workaround is to just skip inactive containers. The
  host-side configuration - i.e. the `nspawn`-unit and (optionally) the
  network configuration - is still activated and will be used on the
  next start.

* Sometimes, `systemd-nspawn@`-instances are marked to be started by the
  diffing-code. This should not happen since `systemd-nspawn@`-instances
  are now treated specially, which means that these will only be started
  if they're newly added.

* If both `dbus.service` and an arbitrary container are reloaded in
  the same transaction (i.e. in the same `systemctl reload`-call), the
  system freezes and becomes unreachable even via `ssh(1)` for about two
  minutes, leaving the following errors in the log:

      Sep 11 21:32:16 roflmayr systemd[1]: Reloading D-Bus System Message Bus.
      Sep 11 21:32:41 roflmayr dbus-send[1868379]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
      Sep 11 21:32:41 roflmayr systemd[1]: dbus.service: Control process exited, code=exited, status=1/FAILURE
      Sep 11 21:32:41 roflmayr systemd[1]: Reload failed for D-Bus System Message Bus.

  While I'm not entirely sure what's going on here, I realized that this
  issue disappears if all services that are scheduled for reload are
  processed before the containers. I guess that this avoids host-side
  system-services interfering with a container's system-manager.

nixos-nspawn: misc improvements & cleanups

This enhances the test-coverage of the script significantly and also
adds fixes for a few existing problems such as

* missing call-traces
* a spurious error when invoking the command without arguments

and cleans the code up a bit.

nixos/containers-next: move to subdir and factor out defaults for containers

This was done because imperative & declarative containers have a common
base configuration that was duplicated before, so moving it into a file
used by both facilities is better here.

To avoid cluttering the `virtualisation/`-subtree of NixOS too much, I
decided to create a new subdir for this.

nixos-nspawn: implement activation & networking

However only in a simplified manner - my main intention was to write a
replacement for the `containers`-module and this was just a side-effect,
so further features should be implemented by the community.

Basically, `nixos-nspawn update` now activates the config on its own,
but without support for `strategy = "dynamic";` to avoid having to
duplicate the Perl implementation here. Instead, one of
`reload`/`restart`/`none` is used as the default and can be overridden
with `nixos-nspawn --reload` / `nixos-nspawn --restart`. Since this is a
completely manual change anyways, this is IMHO good enough for now. The
same applies to `nixos-nspawn rollback`.

Also, the rendered `.network`-units now support addresses just like
declarative containers do with the exception of IPv6 SLAAC because I'd
have to imperatively change `radvd` for this which is out of scope[1].

Finally, the test was enhanced to cover more cases related to the new
features.

[1] Actually, this would introduce too much impurity anyways. Instead,
    `networkd` should implement IPv6  SLAAC for nspawn on its own so we
    can remove `radvd` and properly implement this here.

nixos/activation-scripts: turn off `var`-script for containers

It's already taken care of and only causes `permission denied`-errors
that make config activations seem to have failed even though they haven't.

Revert "nixos/activation-scripts: turn off `var`-script for containers"

This reverts commit 6f281b9ad31cf6d9ef396de788d06ea4e35f8112.

This is actually not a good idea since the `var`-activation-script is
the component that ensures that `/var/empty` exists, which is `$HOME`
for quite a number of services.

nixos/containers-next: only create OS structure in `/var/lib/machines` if it doesn't exist

Creating it again afterwards can screw up the permissions if the
container is using a private user-namespace. This actually solves the
activation issues, and the `var`-script can still be used here.

nixos/tests/containers-next: add testcase for custom `ResolvConf`-setting

nixos/container-migration-test: confirm that nixos-container is still usable after switching to the new API

nixos/containers-next: assert that networkd is used

nixos/tests/containers-next-imperative: ensure that imperative containers can be powered off without state issues

nixos/tests/container-migration: fix eval

nixos/containers-next: fix eval

nixos/qemu-vm: increase /boot to 120M

Otherwise test-cases that install several NixOS generations into `/boot`
will fail with `No space left on device`.

nixos/container-migration: actually move state of containers

nixos/containers-next: fix test

nixos/containers-next: s/literalExample/literalExpression/g

nixos/useHostResolvConf: deprecate option

nixos/containers-next-imperative: fix test

* Don't use underscores in hostnames; this appears to break
  systemd-resolved now.
* Minor fixes for the test.

nixos/containers-next: fix `systemd-networkd-wait-online.service` hanging indefinitely

See NixOS#140669 (comment)
for further context.

Co-authored-by: Franz Pletz <fpletz@fnordicwalking.de>
Co-authored-by: zseri <zseri.devel@ytrizja.de>

nixos/containers-next: config -> system-config

nixos/containers-next: confirm that exposed hostnames also work for services like nginx

nixos/containers-next: review fixes

* Fix naming of migration test.
* Explain why `persistentBootDisk` is needed.
* Document that `jobset.nix` is only temporary and should be removed
  before merging.
* Remove superfluous `touch $out`.

sudo-nspawn: merge with `pkgs.sudo`

The feature can now be activated via `withStaticSudoers`. Also, the
patches aren't needed anymore since these are part of the current
`sudo`-release that's also in `nixpkgs`.
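
A hedged usage sketch, assuming the flag is exposed as a regular override argument:

    { pkgs, ... }: {
      # A sudo whose sudoers.so is linked statically into the binary, so it
      # doesn't have to be owned by the (remapped) root of the user namespace.
      security.sudo.package = pkgs.sudo.override { withStaticSudoers = true; };
    }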

nixos-nspawn: refactor python setup

* Simplify shebangs
* Fix `python3`-inclusion on `nix-shell`-shebang
* Don't `flake8` the code on build.

Co-authored-by: Sandro <sandro.jaeckel@gmail.com>

nixos/qemu-vm: fix manual evaluation

containers-next: Support independent use of container-options.nix

containers-next: Add bindMounts option

containers-next: Don't shut down imperative containers during rebuild
Ma27 authored and m1cr0man committed Dec 6, 2022
1 parent 6b5cf53 commit 42d1ba5
Showing 22 changed files with 2,649 additions and 40 deletions.
18 changes: 18 additions & 0 deletions jobset.nix
@@ -0,0 +1,18 @@
# FIXME remove before merging!

{ nixpkgs }:
let
  release = import ./nixos/release.nix {
    supportedSystems = [ "x86_64-linux" ];
    inherit nixpkgs;
  };
in

{
  container-tests = {
    general = release.tests.containers-next;
    migration = release.tests.containers-migration;
    activation = release.tests.containers-config-activation;
    imperative = release.tests.containers-next-imperative;
  };
}
9 changes: 9 additions & 0 deletions nixos/lib/test-driver/test_driver/machine.py
@@ -494,6 +494,15 @@ def systemctl(self, q: str, user: Optional[str] = None) -> Tuple[int, str]:
            )
        return self.execute("systemctl {}".format(q))

    def wait_until_unit_stops(self, unit: str) -> None:
        def wait_inactive(_: Any) -> bool:
            info = self.get_unit_info(unit)
            state = info["ActiveState"]
            return state == "inactive"

        with self.nested(f"waiting for unit '{unit}' to stop"):
            retry(wait_inactive)

    def require_unit_state(self, unit: str, require_state: str = "active") -> None:
        with self.nested(
            "checking if unit ‘{}’ has reached state '{}'".format(unit, require_state)
1 change: 1 addition & 0 deletions nixos/modules/module-list.nix
@@ -1320,6 +1320,7 @@
  ./virtualisation/container-config.nix
  ./virtualisation/containerd.nix
  ./virtualisation/containers.nix
  ./virtualisation/containers-next
  ./virtualisation/nixos-containers.nix
  ./virtualisation/oci-containers.nix
  ./virtualisation/cri-o.nix
173 changes: 165 additions & 8 deletions nixos/modules/system/activation/switch-to-configuration.pl
@@ -22,6 +22,7 @@
use IPC::Cmd;
use Sys::Syslog qw(:standard :macros);
use Cwd qw(abs_path);
use experimental 'smartmatch';

## no critic(ControlStructures::ProhibitDeepNests)
## no critic(ErrorHandling::RequireCarping)
@@ -299,6 +300,11 @@ sub unrecord_unit {
    return;
}

sub comp_array {
    my ($a, $b) = @_;
    return join("\0", @{$a}) eq join("\0", @{$b});
};

# Compare the contents of two unit files and return whether the unit
# needs to be restarted or reloaded. If the units differ, the service
# is restarted unless the only difference is `X-Reload-Triggers` in the
@@ -322,11 +328,6 @@ sub compare_units { ## no critic(Subroutines::ProhibitExcessComplexity)
        SourcePath
    );

    my $comp_array = sub {
        my ($a, $b) = @_;
        return join("\0", @{$a}) eq join("\0", @{$b});
    };

    # Comparison hash for the sections
    my %section_cmp = map { $_ => 1 } keys(%{$new_unit});
    # Iterate over the sections
@@ -370,7 +371,7 @@ sub compare_units { ## no critic(Subroutines::ProhibitExcessComplexity)
            }
            my @new_value = @{$new_unit->{$section_name}{$ini_key}};
            # If the contents are different, the units are different
            if (not $comp_array->(\@cur_value, \@new_value)) {
            if (not comp_array(\@cur_value, \@new_value)) {
                # Check if only the reload triggers changed or one of the ignored keys
                if ($section_name eq "Unit") {
                    if ($ini_key eq "X-Reload-Triggers") {
@@ -418,6 +419,84 @@ sub compare_units { ## no critic(Subroutines::ProhibitExcessComplexity)
    return $ret;
}


sub compare_nspawn_units {
    # Intentionally trying to keep this similar to compare_units.
    my ($cur_unit, $new_unit) = @_;

    # Keys to ignore
    my %ignored_keys = map { $_ => 1 } qw(
        Parameters
        X-ActivationStrategy
    );

    # Comparison hash for the sections
    my %section_cmp = map { $_ => 1 } keys(%{$new_unit});

    # Iterate over the sections
    foreach my $section_name (keys(%{$cur_unit})) {
        # Missing section in the new unit.
        if (not exists($section_cmp{$section_name})) {
            return 1;
        }

        # Delete the key from the hashmap. Used later to determine
        # if some sections exist in new unit but not in current unit.
        delete $section_cmp{$section_name};

        # Comparison hash for the section contents
        my %ini_cmp = map { $_ => 1 } keys(%{$new_unit->{$section_name}});

        # Iterate over the keys of the section
        foreach my $ini_key (keys(%{$cur_unit->{$section_name}})) {
            # Check that the key exists in the new unit, matches in value or is ignored.
            if (
                exists($ini_cmp{$ini_key}) and (
                    defined($ignored_keys{$ini_key})
                    or comp_array(
                        \@{$cur_unit->{$section_name}{$ini_key}},
                        \@{$new_unit->{$section_name}{$ini_key}}
                    )
                )
            ) {
                # Delete the key from the hashmap. Used later to determine
                # if some keys exist in new unit but not in current unit,
                # or they differ.
                delete $ini_cmp{$ini_key};

            } else {
                # Key is missing or differs to the new unit.
                return 1;
            }
        }

        # Missing key(s) in the current unit.
        # If they are not ignorable, a restart is required.
        foreach my $ini_key (keys(%ini_cmp)) {
            if (not defined($ignored_keys{$ini_key})) {
                return 1;
            }
        }
    }

    # Missing section(s) in the current unit.
    if (%section_cmp) {
        return 1;
    }
    return 0;
}


my $active_containers = `machinectl list`;
sub is_container_running {
    my ($name) = @_;
    if (index($active_containers, $name) != -1) {
        return 1;
    }
    return 0;
}


# Called when a unit exists in both the old systemd and the new system and the units
# differ. This figures out of what units are to be stopped, restarted, reloaded, started, and skipped.
sub handle_modified_unit { ## no critic(Subroutines::ProhibitManyArgs, Subroutines::ProhibitExcessComplexity)
@@ -495,6 +574,11 @@ sub handle_modified_unit { ## no critic(Subroutines::ProhibitManyArgs, Subroutines::ProhibitExcessComplexity)
                }
            }

            # Skip the following logic for systemd-nspawn units
            if (index($unit, "systemd-nspawn@") ne -1) {
                return;
            }

            # If the unit is not socket-activated, record
            # that this unit needs to be started below.
            # We write this to a file to ensure that the
@@ -534,6 +618,60 @@ sub handle_modified_unit { ## no critic(Subroutines::ProhibitManyArgs, Subroutines::ProhibitExcessComplexity)
%units_to_reload = map { $_ => 1 }
    split(/\n/msx, read_file($reload_list_file, err_mode => "quiet") // "");

# Handle nspawn unit changes
my @current_nspawn_units = glob("/etc/systemd/nspawn/*.nspawn");
my @new_nspawn_units = glob("$out/etc/systemd/nspawn/*.nspawn");
foreach my $new_unit_file (@new_nspawn_units) {
    my $container_name = basename($new_unit_file);
    $container_name =~ s/\.nspawn//;
    my $unit_name = "systemd-nspawn\@$container_name.service";

    my $cur_unit_file = $new_unit_file;
    $cur_unit_file =~ s/^$out//;
    if ($cur_unit_file ~~ @current_nspawn_units) {
        my %new_unit_info = parse_unit($new_unit_file);
        my $strategy = $new_unit_info{"Exec"}{"X-ActivationStrategy"}[0] // "dynamic";

        # Skip comparison logic/restart check if ActivationStrategy is "none"
        next if $strategy eq "none";

        my %cur_unit_info = parse_unit($cur_unit_file);
        my $changed = compare_nspawn_units(\%cur_unit_info, \%new_unit_info);

=pod Truth table for restarts
|Strategy|Changed|Reload|Restart|
|--------|-------|------|-------|
|Dynamic |0      |Y     |-      |
|Dynamic |1      |-     |Y      |
|Restart |0      |-     |-      |
|Restart |1      |-     |Y      |
|Reload  |0      |Y     |-      |
|Reload  |1      |Y     |-      |
=cut
        if ($strategy ne "restart" and ($changed == 0 or $strategy eq "reload")) {
            if (is_container_running($container_name) == 1) {
                $units_to_reload{$unit_name} = 1;
            }
        } elsif ($changed == 1) {
            $units_to_restart{$unit_name} = 1;
        }
    } else {
        # Start the unit if it didn't exist before
        $units_to_start{$unit_name} = 1;
    }
}

# Stop all now removed nspawn containers
foreach my $cur_unit_file (@current_nspawn_units) {
    my %cur_unit_info = parse_unit($cur_unit_file);
    unless (-f "$out$cur_unit_file" || $cur_unit_info{"Exec"}{'X-Imperative'}[0] == "1") {
        my $container_name = basename($cur_unit_file);
        $container_name =~ s/\.nspawn//;
        my $unit_name = "systemd-nspawn\@$container_name.service";
        $units_to_stop{$unit_name} = 1;
    }
}

my $active_cur = get_active_units();
while (my ($unit, $state) = each(%{$active_cur})) {
    my $base_unit = $unit;
@@ -871,8 +1009,27 @@ sub filter_units {
# Reload units that need it. This includes remounting changed mount
# units.
if (scalar(keys(%units_to_reload)) > 0) {
    print STDERR "reloading the following units: ", join(", ", sort(keys(%units_to_reload))), "\n";
    system("$new_systemd/bin/systemctl", "reload", "--", sort(keys(%units_to_reload))) == 0 or $res = 4;
    my @to_reload = sort(keys(%units_to_reload));
    print STDERR "reloading the following units: ", join(", ", @to_reload), "\n";

    # Reloading containers & dbus.service in the same transaction causes
    # the system to stall for about 1 minute.
    my (@services, @containers);
    foreach my $s (@to_reload) {
        if (index($s, "systemd-nspawn@") == 0) {
            push @containers, $s;
        } else {
            push @services, $s;
        }
    }

    if (scalar(@services) > 0) {
        system("$new_systemd/bin/systemctl", "reload", "--", @services) == 0 or $res = 4;
    }
    if (scalar(@containers) > 0) {
        system("@systemd@/bin/systemctl", "reload", "--", @containers) == 0 or $res = 4;
    }

    unlink($reload_list_file);
}

71 changes: 50 additions & 21 deletions nixos/modules/system/boot/systemd/nspawn.nix
@@ -6,7 +6,6 @@ with lib;

let
  cfg = config.systemd.nspawn;

  checkExec = checkUnitConfig "Exec" [
    (assertOnlyFields [
      "Boot" "ProcessTwo" "Parameters" "Environment" "User" "WorkingDirectory"
@@ -16,7 +15,7 @@
      "LimitNOFILE" "LimitAS" "LimitNPROC" "LimitMEMLOCK" "LimitLOCKS"
      "LimitSIGPENDING" "LimitMSGQUEUE" "LimitNICE" "LimitRTPRIO" "LimitRTTIME"
      "OOMScoreAdjust" "CPUAffinity" "Hostname" "ResolvConf" "Timezone"
      "LinkJournal" "Ephemeral" "AmbientCapability"
      "LinkJournal" "Ephemeral" "AmbientCapability" "X-ActivationStrategy"
    ])
    (assertValueOneOf "Boot" boolValues)
    (assertValueOneOf "ProcessTwo" boolValues)
@@ -80,24 +79,57 @@ let
          {manpage}`systemd.nspawn(5)` for details.
        '';
      };

      extraDrvConfig = mkOption {
        type = types.nullOr types.package;
        default = null;
        description = ''
          Extra config for an nspawn-unit that is generated via `nix-build`.
          This is necessary since nspawn doesn't support overrides in
          <literal>/etc/systemd/nspawn</literal> natively and sometimes a derivation
          is needed for configs (e.g. to determine all needed store-paths to bind-mount
          into a machine).
        '';
      };
    };

  };

  makeUnit' = name: def:
    if def.extraDrvConfig == null || !def.enable
    then pkgs.runCommand "nspawn-inst" { } ("cat ${makeUnit name def}/${shellEscape name} > $out")
    else pkgs.runCommand "nspawn-${mkPathSafeName name}-custom"
      { preferLocalBuild = true;
        allowSubstitutes = false;
      } (let
        name' = shellEscape name;
      in ''
        if [ ! -f "${def.extraDrvConfig}" ]; then
          echo "systemd.nspawn.${name}.extraDrvConfig is not a file!"
          exit 1
        fi
        touch $out
        cat ${makeUnit name def}/${name'} > $out
        cat ${def.extraDrvConfig} >> $out
      '');

  instanceToUnit = name: def:
    let base = {
      text = ''
        [Exec]
        ${attrsToSection def.execConfig}
    let
      base = {
        text = ''
          [Exec]
          ${attrsToSection def.execConfig}
        [Files]
        ${attrsToSection def.filesConfig}
          [Files]
          ${attrsToSection def.filesConfig}
        [Network]
        ${attrsToSection def.networkConfig}
      '';
    } // def;
    in base // { unit = makeUnit name base; };
          [Network]
          ${attrsToSection def.networkConfig}
        '';
      } // (filterAttrs (n: const (elem n optWhitelist)) def);
      optWhitelist = [ "extraDrvConfig" "enable" ];
    in makeUnit' name base;

in {

@@ -113,17 +145,14 @@ in {

  config =
    let
      units = mapAttrs' (n: v: let nspawnFile = "${n}.nspawn"; in nameValuePair nspawnFile (instanceToUnit nspawnFile v)) cfg;
      units = mapAttrs' (name: value: {
        name = "systemd/nspawn/${name}.nspawn";
        value.source = instanceToUnit "${name}.nspawn" value;
      }) cfg;
    in
      mkMerge [
        (mkIf (cfg != {}) {
          environment.etc."systemd/nspawn".source = mkIf (cfg != {}) (generateUnits {
            allowCollisions = false;
            type = "nspawn";
            inherit units;
            upstreamUnits = [];
            upstreamWants = [];
          });
          environment.etc = units;
        })
        {
          systemd.targets.multi-user.wants = [ "machines.target" ];