Spurious failures in sccache #40240

Closed
alexcrichton opened this issue Mar 3, 2017 · 22 comments · Fixed by #41623
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason)

Comments

@alexcrichton
Member

alexcrichton commented Mar 3, 2017

Outstanding failures:

Failed to spawn child broken pipe

maybe fixed in #40809?

error: failed to execute compile
caused by: failed to spawn child
caused by: Broken pipe (os error 32)
make[3]: *** [bin/llvm-config] Error 2

Notes:

Error 254

Suspected bug in sccache. Hidden by mozilla/sccache#82 by accident. Sccache update in #40676

maybe fixed in #40809?

notes:

  • One job failed with nul bytes in env, args, or program passed to Command when creating the preprocessor. This seems crazy. Memory corruption?
  • OSX or Linux so far, no Windows.
  • Most indicate just one compilation failing; the last log indicates the entire server crashed with a segfault or something similar

Fixed failures

Unexpected EOF

Hoping to be fixed by mozilla/sccache#79

error: failed to execute compile
caused by: error reading compile response from server
caused by: IoError(Error { repr: Custom(Custom { kind: Other, error: StringError("unexpected EOF") }) })
caused by: unexpected EOF
make[3]: *** [tools/llvm-readobj/CMakeFiles/llvm-readobj.dir/ELFDumper.cpp.o] Error 2

Forcibly closed by remote host

Suspected fixed in mozilla/sccache@5f6932b

PR sent in #40676

error: failed to execute compile
caused by: Failed to send data to or receive data from server
caused by: IoError(Error { repr: Os { code: 10054, message: "An existing connection was forcibly closed by the remote host." } })

notes:

  • Only windows so far
  • Both 32 and 64 bit
  • Ignore segfaults, they're likely mozilla/sccache#83 (Remove dependency on named_pipe)
  • Both beginning and end of the build
  • appears that the server process is crashing
  • suspect, crossbeam segfaulting (there are known segfaults in there)

Wedged in build logs

wedged in mio, likely fixed in #40809

Errors not at the fault of sccache

ar.exe/ranlib errors

More tracked at #40546. Suspected to be unrelated to sccache and instead related to MinGW makefiles (and how they're not always the best...)

Hoping to be fixed by #40548

@alexcrichton
Member Author

First attempt to debug this landed in #40240, although that didn't cover OSX/Windows which seem to be failing a lot as well. Second attempt to debug this is #40442

@alexcrichton
Member Author

I've made a PR to sccache to hopefully squash at least some of the spurious failures we've been seeing (I don't think it will fix all of them though)

@alexcrichton
Member Author

The first failure above is getting an EPIPE on this line which baffles me. I have no idea how an EPIPE would arise during process creation.

@Mark-Simulacrum
Member

Looking at the two lines above, there's Stdio::piped(), which implies that during process creation we're opening the stdout and stderr file descriptors as pipes, which may be the cause of these problems. Not sure if that's the case though.
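For illustration only, here is a minimal sketch of that pattern using the standard library: a child is spawned with stdout and stderr set to `Stdio::piped()`, and a small helper classifies an error as EPIPE. The names `spawn_piped` and `is_broken_pipe` are hypothetical and are not taken from sccache, whose actual spawning code may differ.

```rust
use std::io;
use std::process::{Child, Command, Stdio};

/// Spawn `program` with stdout and stderr captured through pipes.
/// (Hypothetical helper for illustration, not sccache's code.)
fn spawn_piped(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        // Each `piped()` asks the standard library to create a pipe
        // for that stream as part of process creation.
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
}

/// True when the error is a broken pipe (EPIPE, os error 32 on Linux and macOS).
fn is_broken_pipe(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::BrokenPipe
}
```

A caller could log `is_broken_pipe(&e)` on a failed `spawn_piped` call to separate this failure mode from ordinary errors such as a missing executable.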

alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 16, 2017
I've built a local copy with mozilla/sccache#79 and mozilla/sccache#78. Let's
see if that helps rust-lang#40240!
@alexcrichton
Member Author

Looks like "broken pipe" errors are indeed coming from kevent, confirmed now that I've tested on an older mac. tokio-rs/mio#583 should fix.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 4, 2017
I've tracked down what I believe is the last spurious sccache failure on rust-lang#40240
to behavior in mio (tokio-rs/mio#583), and this commit updates the binaries to
a version which has that fix incorporated.
alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 5, 2017
I've tracked down what I believe is the last spurious sccache failure on rust-lang#40240
to behavior in mio (tokio-rs/mio#583), and this commit updates the binaries to
a version which has that fix incorporated.
frewsxcv added a commit to frewsxcv/rust that referenced this issue Apr 5, 2017
…sxcv

travis: Update sccache binaries

I've tracked down what I believe is the last spurious sccache failure on rust-lang#40240
to behavior in mio (tokio-rs/mio#583), and this commit updates the binaries to
a version which has that fix incorporated.
@alexcrichton
Member Author

Haven't seen any new issues in the past week or so, so assuming that we can close this now. Yay!

@TimNN
Contributor

TimNN commented Apr 28, 2017

Again a broken pipe spawn :(

ERROR:sccache::server: [Error(Msg("failed to spawn child"), State { next_error: Some(Error { repr: Os { code: 32, message: "Broken pipe" } }) })] 	SearchForAddressOfSpecialSymbol.cpp.o
ERROR:sccache::server: [Error { repr: Os { code: 32, message: "Broken pipe" } }] 	SearchForAddressOfSpecialSymbol.cpp.o

https://travis-ci.org/rust-lang/rust/jobs/226778218

@TimNN TimNN reopened this Apr 28, 2017
@alexcrichton
Member Author

Oh oops, the fix in mio wasn't actually released until mio 0.6.7 came out on crates.io. I was using a local override previously but when I recently rebuilt sccache I used the crates.io version and forgot that.

To fix that we need to update sccache's Cargo.lock to pull in an updated mio, make new binaries, upload them, and then make a PR.

@TimNN you scared me, my weekend was almost ruined! :)
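As a rough sketch of how a rebuild could guard against this, the check below tests whether a resolved mio version string is at least 0.6.7. It assumes plain `major.minor.patch` versions and that every release from 0.6.7 onward carries the fix; `parse_version` and `has_mio_fix` are hypothetical helpers, not part of sccache or cargo.

```rust
/// Parse "major.minor.patch" into a tuple; returns None for anything else.
/// (Hypothetical helper; pre-release and build suffixes are not handled.)
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim().split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// True when `version` is 0.6.7 or newer. Assumes the fix is present in
/// every release from 0.6.7 onward; unparseable versions count as unfixed.
fn has_mio_fix(version: &str) -> bool {
    match parse_version(version) {
        // Tuples compare lexicographically: major, then minor, then patch.
        Some(v) => v >= (0, 6, 7),
        None => false,
    }
}
```

The version string would come from the mio entry in sccache's Cargo.lock; reading the lockfile itself is left out of this sketch.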

alexcrichton added a commit to mozilla/sccache that referenced this issue Apr 28, 2017
Contains tokio-rs/mio#583 to fix tokio-rs/mio#582, transitively fixing
flakiness discovered in rust-lang/rust#40240.
@alexcrichton
Member Author

I've updated sccache's mio, next step is binaries.

I'll be flying for the next ~20 hours and won't have the ability to do so until I land unfortunately :(

@alexcrichton
Member Author

oops didn't mean to close

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 29, 2017
Pulls in mozilla/sccache@ef0d77543 to fix rust-lang#40240 again after the builds included
in rust-lang#41447 forgot to include the mio fix included in rust-lang#41076.

Closes rust-lang#40240
bors added a commit that referenced this issue Apr 29, 2017
ci: Update sccache build

Pulls in mozilla/sccache@ef0d77543 to fix #40240 again after the builds included
in #41447 forgot to include the mio fix included in #41076.

Closes #40240
@kennytm
Member

kennytm commented Aug 9, 2017

There have been two sccache failures this week, both reporting an invalid checksum.

43652 (failure on Linux):

ERROR:sccache::server: ["ARMFastISel.cpp.o"] fatal error: Error(Io(Error { repr: Custom(Custom { kind: Other, error: StringError("Invalid checksum") }) }), State { next_error: None })

43732 (failure on Windows):

ERROR:sccache::server: ["LoopInfo.cpp.obj"] fatal error: Error(Io(Error { repr: Custom(Custom { kind: Other, error: StringError("Invalid checksum") }) }), State { next_error: None })

Reopen?

@alexcrichton
Member Author

I think this issue is probably no longer useful, @kennytm mind opening a new issue? You can cc mozilla/sccache#171, the upstream report tracking this.

@kennytm
Member

kennytm commented Aug 9, 2017

@alexcrichton Opened #43775.

@alexcrichton
Member Author

Thanks!
