Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nix-diff: trivial diff #1

Closed
wants to merge 3 commits into from
Closed

nix-diff: trivial diff #1

wants to merge 3 commits into from

Conversation

lukegb
Copy link
Owner

@lukegb lukegb commented Dec 2, 2020

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@grahamc
Copy link

grahamc commented Dec 2, 2020

Fancy!

lukegb pushed a commit that referenced this pull request Mar 13, 2022
This effectively fixes the majority of all VM tests which were broken
because `/dev/vda` (or any other block device) wasn't mountable:

      machine # mounting /dev/vda on /...
      machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device[    2.820976] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
      machine # [    2.821757] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS
      machine # [    2.821757] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      machine # [    2.821757] Call Trace:
      machine # [    2.821757]  dump_stack+0x6b/0x83
      machine # [    2.821757]  panic+0x101/0x2c8
      machine # [    2.821757]  do_exit.cold+0x14/0xb3
      machine # [    2.821757]  do_group_exit+0x33/0xa0
      machine # [    2.821757]  __x64_sys_exit_group+0x14/0x20
      machine # [    2.821757]  do_syscall_64+0x33/0x40
      machine # [    2.821757]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      machine # [    2.821757] RIP: 0033:0x7f67ec2800f6
      machine # [    2.821757] Code: 00 4c 8b 0d 2c 5d 11 00 eb 19 66 2e 0f 1f 84 00 00 00 00 00 89 d7 89 f0 0f 05 48 3d 00 f0 ff ff 77 22 f4 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff ff 76 e2 f7 d8 64 41 89 01 eb da 66 2e 0f 1f 84 00
      machine # [    2.821757] RSP: 002b:00007fff8f5a71d8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      machine # [    2.821757] RAX: ffffffffffffffda RBX: 0000000000699704 RCX: 00007f67ec2800f6
      machine # [    2.821757] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
      machine # [    2.821757] RBP: 0000000000000004 R08: 00000000000000e7 R09: ffffffffffffff80
      machine # [    2.821757] R10: 00007f67ec33f3e0 R11: 0000000000000202 R12: 000000000000000b
      machine # [    2.821757] R13: 00007fff8f5a75a8 R14: 0000000000000000 R15: 00000000004fc198
      machine # [    2.821757] Kernel Offset: 0x31e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      machine # [    2.821757] Rebooting in 1 seconds..

This happened because the kernel failed to load modules such as `ext4`
from `boot.initrd.availableKernelModules`[1] on e.g. a `mount(2)` syscall.

The problem is that `kmod` isn't linked against `libpthread.so.0`
anymore because it got merged into `libc.so.6` (however, the .so still
exists), but still needs it:

      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # writev(2, [{iov_base="/nix/store/kdc9n48ksdc1a8y8w512w"..., iov_len=69}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libra"..., iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="libpthread.so.0", iov_len=15}, {iov_base=": ", iov_len=2}, {iov_base="cy
      machine # ) = 184
      machine # exit_group(127)                         = ?
      machine # +++ exited with 127 +++
      machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device
      machine # [   19.167180] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
      machine # [   19.167711] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS

This is not a problem

* inside stage-1 because `LD_LIBRARY_PATH` points to `$out/lib` of
  extra-utils where `libpthread.so.6` also exists.
* on a running system because `${pkgs.glibc}/lib` is part of kmod's
  rpath.

However this is a problem inside the kernel which calls `modprobe` (in
our case `kmod`) to load modules and doesn't know about
`LD_LIBRARY_PATH`. Also, the rpath-reference was nuked.

To work around this, the kernel's `modprobe`
(i.e. `/proc/sys/kernel/modprobe`) now points to a wrapper which
explicitly declares `LD_LIBRARY_PATH`. We can't use `makeWrapper` here
because `modprobe` itself must not be renamed. Otherwise, `kmod` (which
is the link-target of `modprobe`) won't work because it expects
`argv[0] == "modprobe"` to perform modprobe's tasks.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-boot.initrd.availableKernelModules
@lukegb lukegb closed this Dec 5, 2022
@lukegb lukegb deleted the test-shellcheck branch December 5, 2022 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants