fatal runtime error: assertion failed: output.write(&bytes).is_ok() #125952

Open
dpc opened this issue Jun 4, 2024 · 6 comments
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example O-NixOS Operating system: NixOS, https://nixos.org/ T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@dpc
Contributor

dpc commented Jun 4, 2024

So I was recompiling a TypeScript Next.js project in a Nix derivation that previously worked (Nix builds are kind of reproducible, so that's very unexpected), and it failed with a weird error:

guardian-ui> @fedimint/types:build: cache miss, executing a873a14720ccc5d2
guardian-ui>  WARNING  passwd database shell="/noshell" which is not executable (ENOENT: No such file or directory), falling back to /bin/sh
guardian-ui> @fedimint/types:build: fatal runtime error: assertion failed: output.write(&bytes).is_ok()
guardian-ui> @fedimint/types:build:
guardian-ui>  Tasks:    0 successful, 1 total
guardian-ui> Cached:    0 cached, 1 total
guardian-ui>   Time:    1.095s

I suspect my kernel version might be different because I just upgraded to NixOS 24.05 recently.

I traced this panic to

rtassert!(output.write(&bytes).is_ok());

I got a strace output:

guardian-ui> [pid   675] setsid()                    = 675
guardian-ui> [pid   675] ioctl(0, TIOCSCTTY, 0)      = 0
guardian-ui> [pid   675] open("/dev/fd", O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_DIRECTORY) = 17
guardian-ui> [pid   675] fcntl(17, F_SETFD, FD_CLOEXEC) = 0
guardian-ui> [pid   675] getdents64(17, 0x7ffff7b2f4b8 /* 22 entries */, 2048) = 528
guardian-ui> [pid   675] getdents64(17, 0x7ffff7b2f4b8 /* 0 entries */, 2048) = 0
guardian-ui> [pid   675] close(5)                    = 0
guardian-ui> [pid   675] close(6)                    = 0
guardian-ui> [pid   675] close(7)                    = 0
guardian-ui> [pid   675] close(8)                    = 0
guardian-ui> [pid   675] close(9)                    = 0
guardian-ui> [pid   675] close(10)                   = 0
guardian-ui> [pid   675] close(11)                   = 0
guardian-ui> [pid   675] close(12)                   = 0
guardian-ui> [pid   675] close(13)                   = 0
guardian-ui> [pid   675] close(14)                   = 0
guardian-ui> [pid   675] close(15)                   = 0
guardian-ui> [pid   675] close(16)                   = 0
guardian-ui> [pid   675] close(17)                   = -1 EBADF (Bad file descriptor)
guardian-ui> [pid   675] close(18)                   = 0
guardian-ui> [pid   601] <... recvfrom resumed>"", 8, 0, NULL, NULL) = 0
guardian-ui> [pid   675] close(24 <unfinished ...>
guardian-ui> [pid   601] close(17 <unfinished ...>
guardian-ui> [pid   675] <... close resumed>)        = 0
guardian-ui> [pid   675] execve("/build/yarn--1717471544555-0.05173506673993855/yarn", ["/build/yarn--1717471544555-0.051"..., "run", "build"], 0x7ffff7e5b1a0 /* 135 vars */ <unfinished ...>
guardian-ui> [pid   601] <... close resumed>)        = 0
guardian-ui> [pid   675] <... execve resumed>)       = -1 ETXTBSY (Text file busy)
guardian-ui> [pid   675] write(18, "\0\0\0\32NOEX", 8) = -1 EBADF (Bad file descriptor)
guardian-ui> [pid   675] write(2, "fatal runtime error: assertion f"..., 68) = 68
guardian-ui> [pid   675] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
guardian-ui> [pid   675] tkill(675, SIGABRT)         = 0
guardian-ui> [pid   675] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
guardian-ui> [pid   675] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=675, si_uid=1000} ---
guardian-ui> [pid   601] close(14)                   = 0
guardian-ui> [pid   601] close(15)                   = 0
guardian-ui> [pid   601] close(16)                   = 0
guardian-ui> [pid   601] fcntl(11, F_DUPFD_CLOEXEC, 0) = 14
guardian-ui> [pid   601] fcntl(14, F_SETFD, FD_CLOEXEC) = 0
guardian-ui> [pid   601] fcntl(11, F_DUPFD_CLOEXEC, 0) = 15
guardian-ui> [pid   601] fcntl(15, F_SETFD, FD_CLOEXEC) = 0
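A note on reading the trace above: the 8 bytes in the failing `write(18, "\0\0\0\32NOEX", 8)` look like the message a forked child sends back to its parent when `execve` fails. The sketch below decodes that payload, assuming the layout is a 4-byte big-endian errno followed by the ASCII footer `NOEX`; that layout is my reading of the standard library source and not something the trace itself states. The function name `decode_exec_failure` is hypothetical. Note that strace prints `\32` in octal, which is 26 in decimal.

```rust
/// Footer assumed to follow the errno in the child's exec-failure message.
const FOOTER: &[u8; 4] = b"NOEX";

/// Decode an 8-byte payload laid out as: 4-byte big-endian errno + footer.
/// Returns `None` if the length or footer does not match that assumption.
fn decode_exec_failure(payload: &[u8]) -> Option<i32> {
    if payload.len() != 8 || payload[4..] != FOOTER[..] {
        return None;
    }
    let errno = i32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]);
    Some(errno)
}

fn main() {
    // strace shows "\0\0\0\32NOEX"; the octal escape \32 is 0x1a in hex.
    let payload = b"\0\0\0\x1aNOEX";
    match decode_exec_failure(payload) {
        Some(errno) => {
            // Let the OS describe the errno instead of hard-coding a message.
            let err = std::io::Error::from_raw_os_error(errno);
            println!("child reported errno {errno}: {err}");
        }
        None => println!("not an exec-failure payload"),
    }
}
```

Under that assumption the payload carries errno 26, which matches the `ETXTBSY` returned by `execve` a line earlier. The trace also shows `close(18) = 0` before the `execve`, so the descriptor the child then writes to was already closed, which would explain the `EBADF` and the failed assertion.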

I'm not sure where to even report it, and I'm a bit too tired to dig deeper. I'm creating this issue just for reference.

The whole thing can be reproduced with:

nix build 'github:fedimint/ui?rev=1fc0cc6322f4ebb0f0854cd870b79c9971ff4b34#guardian-ui'

I'm going to try it on some other machines and see where it fails and where it works.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 4, 2024
@saethlin saethlin added O-NixOS Operating system: NixOS, https://nixos.org/ T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 4, 2024
@dpc
Contributor Author

dpc commented Jun 4, 2024

On a machine I had around running Ubuntu with Nix and Linux 5.15.0-101-generic, this works. On my two systems with the new NixOS and Linux 6.9.2, it fails.

@dpc
Contributor Author

dpc commented Jun 4, 2024

I have verified that downgrading to Linux kernel 6.8.11 makes the problem go away.

@jieyouxu jieyouxu added S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example C-bug Category: This is a bug. and removed S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 4, 2024
@tbu-
Contributor

tbu- commented Jun 4, 2024

> Nix builds are kind of reproducible, so that's very unexpected

Nix does not take kernel version into account in its reproducibility guarantees.

@dpc
Contributor Author

dpc commented Jun 4, 2024

Yes, everything else is locked in place (kind of). That's why I immediately suspected the kernel might be a problem.

So to sum up: something in very recent Linux kernel versions is breaking some assumptions in the Rust standard library's fork/exec code, which leads to this internal panic. It's hard for me to tell whether it's a kernel regression, whether Rust's stdlib assumptions were incorrect, or whether I'm missing something else entirely.

I happen to be witnessing it because I'm running as recent a kernel version as NixOS can provide, trying to avoid some bcachefs bugs. With time the problem might become more widespread.

@workingjubilee
Contributor

Ah, a relatively small diff, then! Should be easy to find the offending commit. torvalds/linux@f610c35...c8eef17

@dpc
Contributor Author

dpc commented Jun 5, 2024

image

🤔 Dozens of rebuild + reboot cycles... I'll see if I can find the time to do it. No promises. :D
