Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang / deadlock while compiling? #1011

Open
roy-work opened this issue Jun 11, 2021 · 5 comments
Open

Hang / deadlock while compiling? #1011

roy-work opened this issue Jun 11, 2021 · 5 comments

Comments

@roy-work
Copy link

We seem to be seeing a hang / deadlock while compiling with sccache:

62315 root       20   0 9682M  428M  7120 S  0.0  0.3  0:31.03 │     ├─ /usr/local/cargo/bin/sccache
43861 root       20   0  255M  117M 79048 S  0.0  0.1  0:00.80 │     │  ├─ /usr/local/rustup/toolchains/1.50.0-x86_64-unknown-linux-gnu/bin/rustc --crate-name actix_service --edition=2018 /usr/local/cargo/r
43563 root       20   0  427M  241M  145M S  0.0  0.2  0:01.27 │     │  ├─ /usr/local/rustup/toolchains/1.50.0-x86_64-unknown-linux-gnu/bin/rustc --crate-name parse_zoneinfo /usr/local/cargo/registry/src/-e
42975 root       20   0  631M  400M  142M S  0.0  0.3  0:10.65 │     │  ├─ /usr/local/rustup/toolchains/1.50.0-x86_64-unknown-linux-gnu/bin/rustc --crate-name regex /usr/local/cargo/registry/src/-e91f24c38b
38792 root       20   0  939M  693M 89236 S  0.0  0.5  0:30.71 │     │  ├─ /usr/local/rustup/toolchains/1.50.0-x86_64-unknown-linux-gnu/bin/rustc --crate-name log4rs --edition=2018 /usr/local/cargo/registry
36525 root       20   0  638M  426M  160M S  0.0  0.3  0:10.27 │     │  └─ /usr/local/rustup/toolchains/1.50.0-x86_64-unknown-linux-gnu/bin/rustc --crate-name tokio --edition=2018 /usr/local/cargo/registry/

All of these seem to be stuck on a pipe (FD 6, specifically; they're all writing to it, include sccache.)

Specifically, we can grab all the rustc PIDs:

/proc# pgrep rustc
36525
38792
42975
43563
43861

We can see that they're all stuck writing to FD 6:

Output
/proc# strace $(cd 36525/task; for d in *; do printf ' -p %s' "$d"; done)
strace: Process 36525 attached
strace: Process 36627 attached
strace: Process 43551 attached
strace: Process 43553 attached
[pid 43553] write(6, "|", 1 <unfinished ...>
[pid 43551] futex(0x7fbd66d5df4c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 36627] futex(0x7fbd700df9b8, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 36525] futex(0x7fbd6f97f9d0, FUTEX_WAIT, 15701, NULL^Cstrace: Process 36525 detached
 <detached ...>
strace: Process 36627 detached
strace: Process 43551 detached
strace: Process 43553 detached
/proc# strace $(cd 38792/task; for d in *; do printf ' -p %s' "$d"; done)
strace: Process 38792 attached
strace: Process 39121 attached
strace: Process 42643 attached
strace: Process 42648 attached
[pid 42648] write(6, "|", 1 <unfinished ...>
[pid 42643] futex(0x7fd5defe6d2c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 39121] futex(0x7fd5ea8df9b8, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 38792] futex(0x7fd5ea17f9d0, FUTEX_WAIT, 17910, NULL^Cstrace: Process 38792 detached
 <detached ...>
strace: Process 39121 detached
strace: Process 42643 detached
strace: Process 42648 detached
/proc# strace $(cd 42975/task; for d in *; do printf ' -p %s' "$d"; done)
strace: Process 42975 attached
strace: Process 43005 attached
strace: Process 43742 attached
strace: Process 43745 attached
[pid 43745] write(6, "|", 1 <unfinished ...>
[pid 43742] futex(0x7f4b877f958c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43005] futex(0x7f4b920df9b8, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 42975] futex(0x7f4b9197f9d0, FUTEX_WAIT, 21597, NULL^Cstrace: Process 42975 detached
 <detached ...>
strace: Process 43005 detached
strace: Process 43742 detached
strace: Process 43745 detached
/proc# strace $(cd 43563/task; for d in *; do printf ' -p %s' "$d"; done)
strace: Process 43563 attached
strace: Process 43722 attached
strace: Process 44076 attached
strace: Process 44077 attached
[pid 44077] write(6, "|", 1 <unfinished ...>
[pid 44076] futex(0x7fdb4c367dcc, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43722] futex(0x7fdb55adf958, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 43563] futex(0x7fdb5537f9d0, FUTEX_WAIT, 22132, NULL^Cstrace: Process 43563 detached
 <detached ...>
strace: Process 43722 detached
strace: Process 44076 detached
strace: Process 44077 detached
root@agent-9vgr9nLE-vm:/proc# strace $(cd 43861/task; for d in *; do printf ' -p %s' "$d"; done)
strace: Process 43861 attached
strace: Process 44024 attached
strace: Process 44196 attached
strace: Process 44197 attached
[pid 44197] write(6, "|", 1 <unfinished ...>
[pid 44196] futex(0x7f12af3c417c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 43861] futex(0x7f12b957f9d0, FUTEX_WAIT, 22386, NULL <unfinished ...>
[pid 44024] futex(0x7f12b9cdf9b8, FUTEX_WAIT_PRIVATE, 4294967295, NULL^Cstrace: Process 43861 detached
strace: Process 44024 detached
 <detached ...>
strace: Process 44196 detached
strace: Process 44197 detached

And that FD 6 is a pipe, and the same pipe:

/proc# ls -l {36525,38792,42975,43563,43861}/fd/6
l-wx------ 1 root root 64 Jun  3 18:38 36525/fd/6 -> 'pipe:[604355]'
l-wx------ 1 root root 64 Jun  3 18:38 38792/fd/6 -> 'pipe:[604355]'
l-wx------ 1 root root 64 Jun  3 18:38 42975/fd/6 -> 'pipe:[604355]'
l-wx------ 1 root root 64 Jun  3 18:38 43563/fd/6 -> 'pipe:[604355]'
l-wx------ 1 root root 64 Jun  3 18:38 43861/fd/6 -> 'pipe:[604355]'

It's also what sccache itself is stuck on:

# ps -ef | grep sccache
[snip]
root     62315 17967  0 18:35 ?        00:00:31 /usr/local/cargo/bin/sccache
# strace -p 62315
strace: Process 62315 attached
write(6, "|", 1^Cstrace: Process 62315 detached
 <detached ...>

I know that's not a lot to go on; I'm mostly hoping that pipe write looks familiar to someone & then it would be known from there what high-level system is implicated here. I'm not clear on how rustc even gets this pipe, though, since it doesn't seem like any argument would inform it of the existence of this pipe…?

@luser
Copy link
Contributor

luser commented Jun 14, 2021

This is the jobserver pipe, which is implemented in the jobserver crate:
https://github.com/alexcrichton/jobserver-rs/blob/5756d028d0fcf73e62ef8811674925c699b78a41/src/unix.rs#L29

@roy-work
Copy link
Author

Do you know why rustc would be writing to that? (That part seemed confusing to me; while I expected some IPC from sccache, seeing rustc waiting on the same pipe was surprising. I figured sccache would just kick off rustc & wait for the child to finish.)

@luser
Copy link
Contributor

luser commented Jun 17, 2021

rustc has internal parallelism for codegen, it uses the jobserver so it can let cargo dictate the maximum amount of parallelism when multiple rustc processes are running.

@roy-work
Copy link
Author

Ah. Do you know what is the mechanism by which it knows which FD the pipe is? (It doesn't seem to get passed in either the command line args or the environment, but perhaps I've missed something?)

@64
Copy link

64 commented May 31, 2024

Similar hang: #2145

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants