nixos/systemd-nspawn: Allow to modify .nspawn-units with a derivation

Sometimes it's necessary to build a configuration for systemd units within a `nix-build`. While this is fairly easy for .service-units (where you can easily define overrides), it's not possible for `systemd-nspawn(1)`.

This is mostly a hack to get dedicated bind-mounts of store paths from `pkgs.closureInfo` into the configuration without IFD. In the long term we either want to fix this in systemd or find a more suitable solution though.

nixos/containers-next: initialize first draft for new NixOS containers w/networkd

This is the first batch of changes for a new container-module replacing the current `nixos-container`-subsystem in the long term. The state in here is still strongly inspired by the `containers`[1]-module to declare nspawn-instances declaratively by using NixOS config for the host and the container itself.

For now, this module uses the tentative namespace `nixos.containers`, but that's subject to change. This new module will also contain the following key-differences:

* Rather than writing a big abstraction-layer on top, we'll rely on `.nspawn`-units[2]. This has the benefits that (1) we can stop adding options for each new nspawn-feature (such as MACVLANs, ephemeral instances, etc.) because it can be directly written into the `.nspawn`-unit using the module system like

      systemd.nspawn.foo.filesConfig = { BindReadOnly = /* ... */ };

  and (2) administrators don't need to learn too much about our abstractions, they only need to know a few basics about the module-system and how to write systemd units.

* This feature strictly enforces `systemd-networkd` on both the container & the host. It can be turned off for containers in the host-namespace without a private network though. The reason for this is that the current `nixos-container` implementation has the long-standing bug that the container's uplink is broken *until* the container has booted since the host-side of the veth-pair is configured in `ExecStartPost=`[3].
  This is because there's no proper way to take care of it at an earlier stage since `systemd-nspawn` creates the interface itself. As a consequence, services inside the container wrongly assume that they can connect to e.g. an external database via the network (since `network{,-online}.target` was reached), even though this is not the case due to the unconfigured host-side veth interface.

  However, when using `systemd-networkd(8)` on both sides, this is not the case anymore since systemd will automatically take care of configuring the network correctly when an nspawn unit starts and `networkd` is active.

Apart from a basic draft, this also contains support for RFC1918 IPv4-addresses configured via DHCP and ULA-IPv6 addresses configured via SLAAC and `radvd(8)` including support for ephemeral containers.

Further additions such as a better config-activation mechanism and a tool to manage containers imperatively will follow.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-containers
[2] https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html#
[3] https://github.com/NixOS/nixpkgs/blob/8b0f315b7691adcee291b2ff139a1beed7c50d94/nixos/modules/virtualisation/nixos-containers.nix#L189-L240

nixos/containers-next: implement small wrapper for nspawn port-forwards

This exposes a given `containerPort` to the host address. So if port 80 from the container is forwarded to the host's port 8080 and the container uses `2001:DB8::42` and the host-side uses `2001:DB8::23` on the veth-interface, then `[2001:DB8::42]:80` will be available on the host as `[2001:DB8::23]:8080`.

nixos/containers-next: implement more advanced networking tests

This change tests various combinations of static & dynamic addressing and also fixes a bug where `radvd(8)` was erroneously configured for veth-pairs where it's actually not needed.

This test is also supposed to show how to use `systemd`-configuration to implement most of the features (for instance there's no custom set of options to implement MACVLANs) and serves as regression-test for future `systemd`-updates in NixOS.

Please note that the `ndppd`-hack is only here because QEMU doesn't do proper IPv6 neighbour resolution. In fact, I left comments whenever some workarounds were needed for the testing-facility.

nixos/tests/container-migration: init

This test is supposed to demonstrate how to migrate a single container to the new subsystem. Of course, docs on how to rewrite config aren't written yet; this is mainly a POC to show that it's generally possible by

* Deploying a new configuration (using `nixos.containers`) that is equivalent to the old one.
* Moving the state from `/var/lib/containers` to `/var/lib/machines`.
* Rebooting the host - unfortunately - because otherwise `systemd-networkd` will reach an inconsistent state - at least with v247.

For the reboot-part I also had to change the QEMU vm-builder a bit to actually support persistent boot-disks.

nixos/containers-next: allow static configuration for a virtual zone as well

This is already the case for dynamically assigned addresses (e.g. via SLAAC or DHCPv4) where `0.0.0.0/24` and `::/64` provide a pool of private IPs. However, if such a zone is supposed to be fully static, the same should be possible as well.

nixos/switch-to-configuration: import old config activation changes

This is basically what I tried in NixOS#84608 at first - being able to reload or restart a container based on the NixOS-specific `re{load,start}IfChanged` options for systemd units, but with a few differences:

* I switched back to using `nsenter(1)` from util-linux for the same rationale as in ebb6e38: without this, the activation would hang until a timeout is exceeded if the service-manager inside the container is reloaded.
* I also disabled `systemd-networkd-wait-online.service` inside the container because it'd also hang even if the interfaces are configured properly. We should investigate how to fix it / if it was already fixed at some point.

Also implemented a small test to ensure that a config-activation works fine, even with networking.

nixos/containers-next: fix broken machinectl reboot and probably more

It seems as if systemd ignores `systemd-nspawn@` (the template unit) if an override exists and a custom unit for the service (i.e. `systemd-nspawn@containername.service`):

    [root@server:~]# systemctl status systemd-nspawn@ldap
    ● systemd-nspawn@ldap.service
         Loaded: loaded (/nix/store/rm4kigdbzl78iai8jfbgxbslvalk8bwa-unit-systemd-nspawn-ldap.service/systemd-nspawn@ldap.service; linked; vendor preset: enabled)
        Drop-In: /nix/store/fr9zabpvp3077cbb6jnpxm42qxqw9yk2-system-units/systemd-nspawn@.service.d
                 └─overrides.conf
         Active: active (running) since Tue 2021-03-16 15:01:32 UTC; 23min ago

This breaks at least `machinectl reboot` which needs `RestartForceExitStatus = 133` as setting. For now, I've added all settings to the module itself.

nixos/switch-to-configuration: Implement more generic decisions for config activations in containers

Actually, using `re{load,start}IfChanged` isn't the best decision for containers because some containers have to be reloaded or restarted depending on what has changed. For instance, a new bind-mount requires a `machinectl reboot`, but a change in the NixOS config only needs a `systemctl reload` (which runs `switch-to-configuration` inside the container).

To model this, I decided to add four keywords and an option `activation.strategy` to declarative containers:

* `strategy = "none"` means that the container will be entirely ignored by `switch-to-configuration`.
* `strategy = "restart"` will always `machinectl reboot` the container if a change was detected.
* `strategy = "reload"` will always `systemctl reload` the container if a change was detected.
* `strategy = "dynamic"` will check what has changed inside the container. If only the NixOS config inside the container has changed, a reload will be scheduled, otherwise a restart.

Also did a nearly full rewrite of the activation test to cover several corner-cases and combinations of such settings.
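
As a rough illustration of the four `activation.strategy` keywords listed above, the decision can be sketched in Python. This is not the actual `switch-to-configuration` code; the function name `choose_action` and the two boolean inputs (`nixos_config_changed`, `host_side_changed`) are hypothetical and only stand in for whatever change detection the real implementation performs.

```python
from typing import Optional

# The four keywords named in the commit message.
VALID_STRATEGIES = {"none", "restart", "reload", "dynamic"}


def choose_action(strategy: str,
                  nixos_config_changed: bool,
                  host_side_changed: bool) -> Optional[str]:
    """Return "reload", "restart" or None (leave the container alone).

    `nixos_config_changed` means the NixOS config inside the container
    changed; `host_side_changed` means anything else changed (for
    instance a new bind-mount). Both inputs are assumptions made for
    this sketch.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown activation strategy: {strategy!r}")

    changed = nixos_config_changed or host_side_changed
    if strategy == "none" or not changed:
        return None

    if strategy in ("restart", "reload"):
        # Fixed strategies ignore *what* changed.
        return strategy

    # "dynamic": reload only if nothing but the inner NixOS config changed.
    return "restart" if host_side_changed else "reload"
```

With these assumptions, `choose_action("dynamic", True, False)` yields a reload, while any host-side change under `"dynamic"` yields a restart.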

nixos/containers-next: add read-only `nixos.containers.rendered` option

This option is an attr-set that maps containers to their NixOS configuration since `nixos.containers.instances` directly transforms the config to a NixOS derivation. Also, the raw `nixos.containers.instances` isn't really usable since it usually contains a list of chunks that are evaluated by the module-system.

This is actually useful to introspect the configuration just as it's done with e.g. `resources.machines`[1] in nixops. For instance, I'm configuring my Prometheus scraping targets like this by gathering all active exporters in my machines and their containers:

    { config, lib, ... }: with lib; let
      containers = flip mapAttrsToList machine.nixos.containers.rendered (const (x: x.config));
    in flip concatMap (attrValues containers)
      (c: flip concatMap (attrValues c.services.prometheus.exporters)
        (exporter: (optional exporter.enable "${config.networking.fqdn}:${toString exporter.port}")))

[1] https://nixos.mayflower.consulting/blog/2018/10/26/nixops-machine-configs/

nixos/all-tests: register tests

Also add a `jobset.nix` to test this on my self-hosted Hydra (which btw uses this feature already :p).

nixos/containers-next: make sure that the module works fine with `restrictedEval` being active

This is necessary to get it running on my Hydra.

nixos/containers-next: add test for SSH inside a nspawn machine

Just another small testcase to confirm that the container's network works fine.

nixos/containers-next: enable private users by default

nixos/systemd-nspawn: make `/etc/systemd/nspawn` mutable

Now only `/etc/systemd/nspawn/<name>.nspawn` will be a symlink rather than having the full directory as a symlink.

This is actually consistent with `networkd` (both don't have alternate locations for transient units) and will become necessary when implementing imperative containers since these should also use nspawn units.

nixos/containers-next: fix eval after 21.05 breaking changes

`stdenv.lib` and `pkgs.utillinux` are deprecated now and cause an error when disallowing aliases (which is the default when evaluating nixpkgs).

nixos-nspawn: init

This is a first draft for imperative containers - basically a replacement for `nixos-container` - based on Python. It's still missing a few features, but is actually a working POC with the following key-differences:

* Rather than Perl, Python is used now. While the choice of a language is always debatable, I'm pretty convinced that Python is more accessible than Perl and a lot more people are willing to write Python code (that's for instance the reason why the test-driver was eventually ported to Python).
* Similar to `extra-container`[1], this also contains way more features than the stock `nixos-container` implementation. This is because we basically provide all options from `nixos.containers` and evaluate them after that. The additional configs (such as `activation`/`network`/etc) are rendered into JSON and can be read by the script to imperatively create `.nspawn` & `.network` units.

[1] https://github.com/erikarvstedt/extra-container

nixos/containers-next: implement proper user-namespacing support

Now we're doing user-namespacing correctly here as well; for that, a few filesystem-fixes had to be applied. For more context, please refer to NixOS#67336.

Credits also go to the author of the aforementioned PR, I basically pulled these changes into this branch.

nixos/containers-next: add support for `LoadCredential=`

With user-namespacing set to `pick`[1], bind-mounts will always be owned by `nouser:nogroup`. This is a problem for secrets since these shouldn't be world-readable and with a `nouser:nogroup` from another user-namespace (the `root` inside the container isn't an actual `root` anymore) the secrets would be unreadable. To work around this, `LoadCredential=` can be used.
In fact, using `--load-credential` - unfortunately there's no switch for `.nspawn`-units - passes a secret into a container where it can be re-used by using the host's credential-ID as `path` in a `.service`-file inside the container.

So basically

    {
      nixos.containers.instances.foo.credentials = [
        { id = "foo"; path = "/run/secrets/foo"; }
      ];
    }

makes the secret available as `/run/host/credentials/foo` and by specifying

    LoadCredential=foo:foo

in `example.service`, the credential will be readable by the `ExecStart=` inside `example.service` from `/run/credentials/example.service/foo`.

[1] https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#--private-users=

nixos/containers-next-imperative: init

sudo-nspawn: init

This is a slightly modified sudo enabling `--enable-static-sudoers` which ensures that `sudoers.so` is linked statically into the executable[1]:

> --enable-static-sudoers
> By default, the sudoers plugin is built and installed as a
> dynamic shared object. When the --enable-static-sudoers
> option is specified, the sudoers plugin is compiled directly
> into the sudo binary. Unlike --disable-shared, this does
> not prevent other plugins from being used and the intercept
> and noexec options will continue to function.

This is necessary here because of user-namespaced `nspawn`-instances: these have their own UID/GID-range. If a container called `ldap` has `PrivateUsers=pick` enabled, this may look like this:

    $ ls /var/lib/machines
    drwxr-xr-x 15 vu-ldap-0 vg-ldap-0 15 Mar 11 2021 ldap
    -rw------- 1 root root 0 Sep 12 16:13 .#ldap.lck
    $ id vu-ldap-0
    uid=1758003200(vu-ldap-0) gid=65534(nogroup) groups=65534(nogroup)

However, this means that bind-mounts (such as `/nix/store`) will be owned by `nobody:nogroup` which is a problem for `sudo(8)` which expects `sudoers.so` being owned by `root`. To work around this, the aforementioned configure-flag will be used to ensure that this library is statically linked into `bin/sudo` itself.
We cannot do a full static build though since `sudo(8)` still needs to `dlopen(3)` various other libraries to function properly with PAM.

[1] https://www.sudo.ws/install.html

nixos/switch-to-configuration: fix a few problems with nspawn instances

Config activation of declarative containers used to be error-prone in some cases:

* If a machine was powered off and had its config changed, the activation broke like this:

      systemd-nspawn@ldap.service is not active, cannot reload.

  The easiest workaround is to just skip inactive containers. The host-side configuration - i.e. the `nspawn`-unit and (optionally) the network configuration - is still activated and will be used on the next start.

* Sometimes, `systemd-nspawn@`-instances are marked to be started by the diffing-code. This should not happen since `systemd-nspawn@`-instances are now treated specially which means that these will only be started if they're newly added.

* If both `dbus.service` and an arbitrary container will be reloaded in the same transaction (i.e. in the same `systemctl reload`-call) this will freeze the system making it unreachable even via `ssh(1)` for about two minutes and leaving the following errors in the log:

      Sep 11 21:32:16 roflmayr systemd[1]: Reloading D-Bus System Message Bus.
      Sep 11 21:32:41 roflmayr dbus-send[1868379]: Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
      Sep 11 21:32:41 roflmayr systemd[1]: dbus.service: Control process exited, code=exited, status=1/FAILURE
      Sep 11 21:32:41 roflmayr systemd[1]: Reload failed for D-Bus System Message Bus.

  While I'm not entirely sure what's going on here, I realized that this issue disappears if all services that are scheduled for reload are processed before the containers.
  I guess that this avoids host-side system-services interfering with a container's system-manager.

nixos-nspawn: misc improvements & cleanups

This enhances the test-coverage of the script significantly and also adds fixes for a few existing problems such as

* missing call-traces
* a spurious error when invoking the command without arguments

and cleans the code up a bit.

nixos/containers-next: move to subdir and factor out defaults for containers

This was done because imperative & declarative containers have a common base configuration that was duplicated before, so moving it into a file used by both facilities is better here.

To avoid cluttering the `virtualisation/`-subtree of NixOS too much, I decided to create a new subdir for this.

nixos-nspawn: implement activation & networking

However only in a simplified manner - my main intention was to write a replacement for the `containers`-module and this was just a side-effect, so further features should be implemented by the community.

Basically, `nixos-nspawn update` now activates the config on its own, but without support for `strategy = "dynamic";` to avoid having to duplicate the Perl implementation here. Instead, either `reload`/`restart`/`none` is the default and can be overridden with `nixos-nspawn --reload` / `nixos-nspawn --restart`. Since this is a completely manual change anyways, this is IMHO good enough for now. The same applies to `nixos-nspawn rollback`.

Also, the rendered `.network`-units now support addresses just like declarative containers do, with the exception of IPv6 SLAAC because I'd have to imperatively change `radvd` for this which is out of scope[1].

Finally, the test was enhanced to cover more cases related to the new features.

[1] Actually, this would introduce too much impurity anyways. Instead, `networkd` should implement IPv6 SLAAC for nspawn on its own so we can remove `radvd` and properly implement this here.
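
To make the interplay between the configured default (`reload`/`restart`/`none`) and the `--reload` / `--restart` overrides from the `nixos-nspawn: implement activation & networking` message more concrete, here is a minimal Python sketch. It is not taken from the `nixos-nspawn` script: the function name `resolve_activation` is hypothetical, and rejecting both flags at once is an assumption of this sketch, not documented behaviour.

```python
from typing import Optional

# "dynamic" is deliberately absent: the commit message states that the
# imperative tool does not support it.
SUPPORTED_DEFAULTS = ("reload", "restart", "none")


def resolve_activation(default: str,
                       reload: bool = False,
                       restart: bool = False) -> Optional[str]:
    """Pick the activation mode for one imperative container.

    `default` is the strategy from the container's config; `reload` and
    `restart` model the two command-line overrides. Returns "reload",
    "restart", or None when nothing should be done.
    """
    if default not in SUPPORTED_DEFAULTS:
        raise ValueError(f"unsupported default strategy: {default!r}")
    if reload and restart:
        # Assumption of this sketch: the flags are mutually exclusive.
        raise ValueError("--reload and --restart cannot be combined")

    if reload:
        return "reload"
    if restart:
        return "restart"
    return None if default == "none" else default
```

For example, a container whose default is `none` is left alone unless one of the flags is passed explicitly.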

nixos/activation-scripts: turn off `var`-script for containers

It's already taken care of and only causes `permission denied`-errors that make config activations seem failed even though they aren't.

Revert "nixos/activation-scripts: turn off `var`-script for containers"

This reverts commit 6f281b9ad31cf6d9ef396de788d06ea4e35f8112.

This is actually not a good idea since the `var`-activation-script is the component that ensures that `/var/empty` exists, which is `$HOME` for quite a number of services.

nixos/containers-next: only create OS structure in `/var/lib/machines` if it doesn't exist

Because after that, this can screw with permissions if the container is using a private user-namespace. This actually solves the activation issues and the `var`-script can still be used in here.

nixos/tests/containers-next: add testcase for custom `ResolvConf`-setting

nixos/container-migration-test: confirm that nixos-container is still usable after switching to the new API

nixos/containers-next: assert that networkd is used

nixos/tests/containers-next-imperative: ensure that imperative containers can be powered off without state issues

nixos/tests/container-migration: fix eval

nixos/containers-next: fix eval

nixos/qemu-vm: increase /boot to 120M

Otherwise test-cases that install several NixOS generations into `/boot` will fail with `No space left on device`.

nixos/container-migration: actually move state of containers

nixos/containers-next: fix test

nixos/containers-next: s/literalExample/literalExpression/g

nixos/useHostResolvConf: deprecate option

nixos/containers-next-imperative: fix test

* Don't use underscores in hostnames, this appears to break systemd-resolved now.
* Minor fixes for the test.

nixos/containers-next: fix `systemd-networkd-wait-online.service` hanging indefinitely

See NixOS#140669 (comment) for further context.

Co-authored-by: Franz Pletz <fpletz@fnordicwalking.de>
Co-authored-by: zseri <zseri.devel@ytrizja.de>

nixos/containers-next: config -> system-config

nixos/containers-next: confirm that exposed hostnames also work for services like nginx

nixos/containers-next: review fixes

* Fix naming of migration test.
* Explain why `persistentBookDisk` is needed.
* Document that `jobset.nix` is only temporary and should be removed before merging.
* Remove superfluous `touch $out`.

sudo-nspawn: merge with `pkgs.sudo`

The feature can now be activated via `withStaticSudoers`. Also, the patches aren't needed anymore since these are part of the current `sudo`-release that's also in `nixpkgs`.

nixos-nspawn: refactor python setup

* Simplify shebangs
* Fix `python3`-inclusion on `nix-shell`-shebang
* Don't `flake8` the code on build.

Co-authored-by: Sandro <sandro.jaeckel@gmail.com>

nixos/qemu-vm: fix manual evaluation

containers-next: Support independent use of container-options.nix

containers-next: Add bindMounts option

containers-next: Don't shut down imperative containers during rebuild