
Android builder runs out of memory #18116

Closed
jdm opened this issue Aug 16, 2017 · 10 comments
jdm (Member) commented Aug 16, 2017

This failure was first observed while trying to merge #17891 on 8/15 at 7:39am PDT. It looks like one of the following:

rustc: /checkout/src/llvm/lib/Support/SmallVector.cpp:36: void llvm::SmallVectorBase::grow_pod(void*, size_t, size_t): Assertion `NewElts && "Out of memory"' failed.

or

fatal runtime error: allocator memory exhausted
error: Could not compile `script`.

or

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335:20
stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at /checkout/src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/sys_common/backtrace.rs:60
             at /checkout/src/libstd/panicking.rs:380
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:396
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:610
   5: std::panicking::begin_panic
             at /checkout/src/libstd/panicking.rs:571
   6: std::panicking::begin_panic_fmt
             at /checkout/src/libstd/panicking.rs:521
   7: rust_begin_unwind
             at /checkout/src/libstd/panicking.rs:497
   8: core::panicking::panic_fmt
             at /checkout/src/libcore/panicking.rs:71
   9: core::panicking::panic
             at /checkout/src/libcore/panicking.rs:51
  10: __rust_maybe_catch_panic
             at /checkout/src/libcore/macros.rs:32
             at /checkout/src/libpanic_unwind/gcc.rs:101
             at /checkout/src/libpanic_unwind/lib.rs:101
  11: <F as alloc::boxed::FnBox<A>>::call_box
  12: std::sys::imp::thread::Thread::new::thread_start
             at /checkout/src/liballoc/boxed.rs:692
             at /checkout/src/libstd/sys_common/thread.rs:21
             at /checkout/src/libstd/sys/unix/thread.rs:84
  13: start_thread
  14: clone
fatal runtime error: failed to initiate panic, error 5
error: Could not compile `script`.

jdm (Member, Author) commented Aug 16, 2017

I suspect this is a regression from the most recent Rust upgrade (#18046) which finished merging at 7:38am PDT on 8/15.

larsbergstrom (Contributor) commented Aug 16, 2017

8GB was already cutting it close, so we should probably upgrade regardless:
servo/saltfs#711

But there's likely a rust bug there too.

jdm (Member, Author) commented Aug 16, 2017

This gives us a rustc regression range between rust-lang/rust@599be0d and rust-lang/rust@13d94d5.


jdm (Member, Author) commented Aug 16, 2017

14:02 <acrichto> jdm: hm good question! is it "easy" to run rustc on each day to see where it appeared?
14:03 <acrichto> jdm: or is it detectable via -Ztime-passes?
14:03 <acrichto> jdm: if you've got the date range I can also check the perf server to see if anything is awry
14:03 <jdm> acrichto: well, the symptom is that rustc OOMs while building the script crate, but not every single time
14:03 <jdm> acrichto: https://github.com/servo/servo/issues/18116
14:03 <acrichto> eh that's fine, we can still try to detect it
14:04 <jdm> acrichto: the date range is 7/26 and 8/10
14:04 <acrichto> hm
14:05 <acrichto> so here's the graph for that range -- http://perf.rust-lang.org/graphs.html?start=2017-07-26T00%3A00%3A00%2B00%3A00&end=2017-08-12T00%3A00%3A00%2B00%3A00&crates=%7B%22list%22%3A%22All%22%2C%22content%22%3Anull%7D&phases=%7B%22list%22%3A%22All%22%2C%22content%22%3Anull%7D&group_by=crate&type=rss&yaxis=pct
14:05 <acrichto> ish
14:05 <acrichto> the perf server is being very wonky right now
14:05 <acrichto> do you have any idea where it's faulting?
14:05 <acrichto> you may be getting hit by https://github.com/rust-lang/rust/pull/43506
14:06 <jdm> acrichto: no; it's either a general "allocator memory exhausted" or an LLVM assertion
14:06 <acrichto> hm ok an llvm assertion means we're in trans
14:06 <acrichto> which makes that pr highly suspect
14:06 <jdm> acrichto: the version of rustc that is symptomatic includes #43506, yes.
14:06 <acrichto> well, are you building with codegen units or incremental at all?
14:06 jdm checks
14:09 <jdm> acrichto: it looks like we're using codegen-units=4
14:09 <acrichto> hm ok
14:09 <acrichto> mw: you may be interested in this
14:09 <acrichto> mw: so servo's getting OOM "recently" tracked at https://github.com/servo/servo/issues/18116
14:10 <jdm> as a data point, our nightly android build that does not use codegen-units did not fail
14:10 <acrichto> mw: which I suspect may be related to the "async llvm" patch -- https://github.com/rust-lang/rust/pull/43506
14:10 <acrichto> ah yeah that seems pretty relevant
14:10 <jdm> however, that's a single build vs. a bunch of builds, so it's not a smoking gun
14:10 <acrichto> this is specifically failing when using codegen-units
14:10 <acrichto> mw: my impression was that async-llvm would reduce memory consumption, no?
14:10 <acrichto> not increase it?
14:10 <acrichto> although w/ only 4 codegen units it in theory wouldn't reduce it that much
14:10 <acrichto> nor would it spike it that much...
jdm (Member, Author) commented Aug 16, 2017

Two possibilities to test this theory: we could remove the default codegen-units=4 from Cargo.toml, or we could switch the android builder to use --release. Either of these should let us determine whether codegen units are making memory usage significantly worse for us.
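For context, the first option amounts to removing (or commenting out) a profile setting like the following. This is a minimal sketch of how such a setting typically appears in a Cargo.toml, assuming it lives in the dev profile; the exact profile section and location in Servo's manifest are not confirmed by this thread:

```toml
# Hypothetical excerpt from a workspace Cargo.toml.
# With this line present, rustc splits each crate into 4 codegen units,
# trading some runtime performance for faster parallel codegen.
# Removing it falls back to rustc's default, which (at the time) compiled
# each crate as a single codegen unit and could lower peak memory.
[profile.dev]
codegen-units = 4
```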

glennw (Member) commented Aug 16, 2017

@jdm I know that @kvark was recently experiencing severe memory usage issues compiling Servo locally, and I believe these were resolved by disabling codegen units...

SimonSapin (Member) commented Aug 17, 2017

@larsbergstrom suggested looking at each builder from http://build.servo.org/buildslaves . Indeed, servo-linux-cross1 was consistently the one failing in recent builds. It had a salt-minion process consuming ~100% CPU and 1.7 GB RAM. I rebooted it, let’s see if that helps.

SimonSapin (Member) commented Aug 17, 2017

#18073 has landed, so I think this is solved. Feel free to reopen if appropriate.

SimonSapin closed this Aug 17, 2017