Cascading failure on bots when interrupted when performing git operations #34595

Closed
alexcrichton opened this issue Jul 1, 2016 · 4 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@alexcrichton
Member

This is basically a best guess as to what's happening on the bots, but from time to time it appears that the build directory gets entirely corrupted with error messages like:

extracting C:\bot\slave\auto-win-msvc-64-opt-rustbuild\build\obj\build\cache\2016-05-24\rust-std-beta-x86_64-pc-windows-msvc.tar.gz
extracting C:\bot\slave\auto-win-msvc-64-opt-rustbuild\build\obj\build\cache\2016-05-24\rustc-beta-x86_64-pc-windows-msvc.tar.gz
extracting C:\bot\slave\auto-win-msvc-64-opt-rustbuild\build\obj\build\cache\2016-05-22\cargo-nightly-x86_64-pc-windows-msvc.tar.gz
Synchronizing submodule url for 'src/compiler-rt'
Synchronizing submodule url for 'src/jemalloc'
Synchronizing submodule url for 'src/liblibc'
Synchronizing submodule url for 'src/llvm'
Synchronizing submodule url for 'src/rt/hoedown'
Synchronizing submodule url for 'src/rust-installer'
fatal: Needed a single revision


command did not execute successfully: "git" "submodule" "update"
expected success, got: exit code: 1


Makefile:23: recipe for target 'all' failed
>> rustjob: found 0 remaining processes
Unable to find current revision in submodule path 'src/llvm'
make: *** [all] Error 1

This happened on an MSVC rustbuild bot, but I believe I've seen this before on basically all bots (though I feel the Windows bots are affected more often).

My best guess as to what's happening here is:

  • A build is running, and it's updating the LLVM submodule
  • The build is canceled, for any of several reasons
  • The LLVM update process is interrupted, leaving the git directory in a corrupt state
  • All future builds attempt to use this corrupt directory and fail

I... don't actually know what the best solution here is. May as well track it though!

@alexcrichton alexcrichton added the A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) label Jul 1, 2016
@alexcrichton
Member Author

FWIW the fix now is to log into the bots and delete the obj directory and restart everything.

@alexcrichton
Member Author

Correction, fix is to remove src/llvm, not obj ...

@alexcrichton
Member Author

This unfortunately has happened quite a bit recently. I believe it's isolated to rustbuild currently because the ./configure step (which is where the makefiles deal with submodules) is an uninterruptible step in buildbot. That at least means I think the fix only needs to be in rustbuild!

The only fix here that I can think of is that we detect this failure, then detect where it failed, blow away the entire subdirectory if it exists, then try the whole operation again. That way it can automatically recover from a corrupt directory.
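
A rough sketch of that recovery loop, in case it helps someone get started. This is not rustbuild's actual code: `retry_after_cleanup` and `update_submodule` are hypothetical names, and it assumes that deleting the submodule's directory (e.g. src/llvm, as in the manual fix above) is enough for a second `git submodule update` to succeed.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Runs `op`. If it fails, deletes `dir` (when it exists) and runs `op`
/// exactly once more, returning the result of that second attempt.
fn retry_after_cleanup<F>(dir: &Path, mut op: F) -> io::Result<()>
where
    F: FnMut() -> io::Result<()>,
{
    if op().is_ok() {
        return Ok(());
    }
    // First attempt failed: assume the checkout is corrupt and blow it away.
    if dir.exists() {
        fs::remove_dir_all(dir)?;
    }
    op()
}

/// Hypothetical helper: runs `git submodule update --init <submodule>`
/// from inside `repo` and turns a non-zero exit status into an error.
fn update_submodule(repo: &Path, submodule: &str) -> io::Result<()> {
    let status = Command::new("git")
        .current_dir(repo)
        .args(&["submodule", "update", "--init", submodule])
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("git submodule update failed: {}", status),
        ))
    }
}
```

With these, the call site would look something like `retry_after_cleanup(&repo.join("src/llvm"), || update_submodule(&repo, "src/llvm"))`. Retrying only once is deliberate: if a fresh checkout also fails, the problem is probably not a corrupt directory, and the build should surface the error rather than loop.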

Would be a great way to jump into rustbuild if anyone's interested!

@alexcrichton
Member Author

Buildbots are almost gone now, so closing.
