codegen_llvm_back: improve allocations #55871

Merged
merged 1 commit into rust-lang:master from ljedrz:llvm_back_allocations on Dec 4, 2018

@ljedrz

ljedrz commented Nov 11, 2018

This commit was split out from #54864. Last time it was causing an LLVM OOM, which was most probably caused by not collecting the globals.

- preallocate vectors of known length
- `extend` instead of `append` where the argument is consumable
- turn 2 `push` loops into `extend`s
- create a vector from a function producing one instead of using `extend_from_slice` on it
- consume `modules` when no longer needed
- return an `impl Iterator` from `generate_lto_work`
- don't `collect` `globals`, as they are iterated over and consumed right afterwards

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.
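For readers outside the PR, the first three bullet points can be sketched as follows. This is an illustrative example only, not the actual rustc code; the function names and data are made up:

```rust
// Illustrative sketch of the allocation patterns described above;
// names and data are hypothetical, not taken from rustc.

fn collect_names(items: &[(u32, String)]) -> Vec<String> {
    // Preallocate when the final length is known up front.
    let mut names = Vec::with_capacity(items.len());
    // A `push` loop becomes a single `extend`.
    names.extend(items.iter().map(|(_, name)| name.clone()));
    names
}

fn merge(mut left: Vec<u8>, right: Vec<u8>) -> Vec<u8> {
    // `extend` can consume `right` by value; `append` would require
    // `&mut right` and leave an empty vector behind.
    left.extend(right);
    left
}

fn main() {
    let items = vec![(1u32, String::from("a")), (2, String::from("b"))];
    assert_eq!(collect_names(&items), ["a", "b"]);
    assert_eq!(merge(vec![1, 2], vec![3]), [1, 2, 3]);
}
```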

@rust-highfive


rust-highfive commented Nov 11, 2018

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@nagisa


nagisa commented Nov 11, 2018

@bors try

Let's do a perf run :)

@bors


bors commented Nov 11, 2018

⌛️ Trying commit dc1b2c7 with merge 684fb37...

bors added a commit that referenced this pull request Nov 11, 2018

Auto merge of #55871 - ljedrz:llvm_back_allocations, r=<try>
codegen_llvm_back: improve allocations

This commit was split out from #54864. Last time it was causing an LLVM OOM, presumably due to aggressive preallocation strategy in `thin_lto`.

This time preallocations are more cautious and there are a few additional memory-related improvements (last 3 points from the list below).

- _gently_ preallocate vectors of known length
- `extend` instead of `append` where the argument is consumable
- turn 2 `push` loops into `extend`s
- create a vector from a function producing one instead of using `extend_from_slice` on it
- consume `modules` when no longer needed
- return an `impl Iterator` from `generate_lto_work`
- don't `collect` `globals`, as they are iterated over and consumed right afterwards

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.
@bors


bors commented Nov 11, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nagisa


nagisa commented Nov 11, 2018

@rust-timer


rust-timer commented Nov 11, 2018

Success: Queued 684fb37 with parent b76ee83, comparison URL.

@rust-timer


rust-timer commented Nov 11, 2018

Finished benchmarking try commit 684fb37

@nagisa


nagisa commented Nov 11, 2018

It seems that both instruction counts and max-rss have suffered a fair hit, while for most benchmarks the minimum rss has also dropped significantly. Essentially this means we have increased deviation, without a clear win in mean rss.

@ljedrz ljedrz force-pushed the ljedrz:llvm_back_allocations branch from dc1b2c7 to 9e8bafc Nov 12, 2018

@ljedrz


ljedrz commented Nov 12, 2018

The reds in the instruction counts were in the high-variance benchmarks, so they could be just noise. The one that is not marked with a ? and had a significant change seems to be style-servo-opt (a huge benchmark), which went down by 2.5% in the clean build. As for max-rss, true, the variance has increased; there are some weird results.

Since there do seem to be possible wins with some of these changes (I'd like to get those minimums from max-rss), I have a suggestion - I have removed the parts of the commit that potentially reserve high capacities (those were also the risky ones in terms of LLVM OOM) and we could rerun perf. The queue is empty anyway :).

@nagisa


nagisa commented Nov 12, 2018

@bors try

@bors


bors commented Nov 12, 2018

⌛️ Trying commit 9e8bafc with merge f7360e5...

bors added a commit that referenced this pull request Nov 12, 2018

Auto merge of #55871 - ljedrz:llvm_back_allocations, r=<try>
codegen_llvm_back: improve allocations

@bors


bors commented Nov 12, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nagisa


nagisa commented Nov 12, 2018

@rust-timer


rust-timer commented Nov 12, 2018

Success: Queued f7360e5 with parent 0195812, comparison URL.

@rust-timer


rust-timer commented Nov 12, 2018

Finished benchmarking try commit f7360e5

@ljedrz


ljedrz commented Nov 12, 2018

Since the changes at this point are all pretty harmless I'd say that the benchmark results are statistical noise.

That being said, these changes don't seem to be beneficial performance-wise, so I'm ok with closing the PR, unless you believe that they are a readability improvement / more idiomatic.

@nagisa


nagisa commented Nov 12, 2018

Uhh, no matter how I look at it, the max-rss results still seem hit-or-miss.

@nikic


nikic commented Nov 12, 2018

@nagisa max-rss has very high variance. Here's how it looks for a random recent commit: http://perf.rust-lang.org/compare.html?start=ca79ecd6940e30d4b2466bf378632efcdf5745c7&end=775eab58835f9bc0f1f01ccbb79725dce0c73b51&stat=max-rss The results for this PR seem to be within the "usual" noise.

@nagisa


nagisa commented Nov 12, 2018

@bors r+

@bors


bors commented Nov 12, 2018

📌 Commit 9e8bafc has been approved by nagisa

pietroalbini added a commit to pietroalbini/rust that referenced this pull request Nov 15, 2018

Rollup merge of rust-lang#55871 - ljedrz:llvm_back_allocations, r=nagisa
codegen_llvm_back: improve allocations



bors added a commit that referenced this pull request Nov 16, 2018

Auto merge of #55990 - pietroalbini:rollup, r=pietroalbini
Rollup of 16 pull requests

Successful merges:

 - #55527 (Implement checked_add_duration for SystemTime)
 - #55564 (test/linkage-visibility: Ignore on musl targets)
 - #55672 (miri: accept extern types in structs if they are the only field)
 - #55827 (A few tweaks to iterations/collecting)
 - #55834 (Forward the ABI of the non-zero sized fields of an union if they have the same ABI)
 - #55867 (do not panic just because cargo failed)
 - #55871 (codegen_llvm_back: improve allocations)
 - #55894 (miri enum discriminant handling: Fix treatment of pointers, better error when it is undef)
 - #55916 (Make miri value visitor useful for mutation)
 - #55921 (Add placeholder types)
 - #55936 (save-analysis: be even more aggressive about ignoring macro-generated defs)
 - #55949 (ty: return impl Iterator from Predicate::walk_tys)
 - #55952 (Update to Clang 7 on CI.)
 - #55953 (#53488 Refactoring UpvarId)
 - #55962 (rustdoc: properly calculate spans for intra-doc link resolution errors)
 - #55963 (Stress test for MPSC)

Failed merges:

r? @ghost
@pietroalbini


pietroalbini commented Nov 16, 2018

@bors r-

Still OOMing on AppVeyor.

@nagisa


nagisa commented Nov 16, 2018

@ljedrz


ljedrz commented Nov 16, 2018

Maybe, but how? These changes shouldn't negatively impact allocations - IMO they should make them easier to optimize.

@nagisa


nagisa commented Nov 16, 2018

> IMO they should make them easier to optimize.

Optimizability is irrelevant if it is indeed what I think it is. The problem is hardly related to the number of allocations, but rather to their size. Alas, rust-perf does not collect the information of interest to tell for sure.

It would be interesting to do a perf run (and they really should be run that way all the time) with overcommit disabled.

@ljedrz


ljedrz commented Nov 16, 2018

> hardly related to the number of allocations

As long as the length of the iterator is known, changing a push loop to an extend reduces the number of allocations to just one.
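A minimal illustration of this point (a toy sketch, not the rustc code): with an exact-size iterator, `Vec::extend` can use the iterator's `size_hint` to reserve the final length in one allocation, whereas a `push` loop grows the vector geometrically and may reallocate several times along the way.

```rust
fn via_push(src: &[u32]) -> Vec<u32> {
    let mut v = Vec::new();
    for &x in src {
        v.push(x); // reallocates whenever capacity runs out
    }
    v
}

fn via_extend(src: &[u32]) -> Vec<u32> {
    let mut v = Vec::new();
    // `slice::Iter` is an `ExactSizeIterator`, so `extend` can
    // reserve all of `src.len()` up front.
    v.extend(src.iter().copied());
    v
}

fn main() {
    let src: Vec<u32> = (0..1_000).collect();
    assert_eq!(via_push(&src), via_extend(&src));
}
```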

@bors


bors commented Nov 17, 2018

☔️ The latest upstream changes (presumably #55627) made this pull request unmergeable. Please resolve the merge conflicts.

@nagisa


nagisa commented Nov 17, 2018

I’m not sure we’re talking about the same thing. Even though the following snippet would OOM on Windows, it would work just fine on UNIXes due to overcommit:

```cpp
auto x = new uint8_t[1ull << 42]; // 4 TiB or thereabouts…
x[0] = 42;
```

This has nothing to do with known length or allocation count.

@rust-lang/infra is it possible to make @rust-timer collect additional information (e.g. max-virtual-mem, in addition to max-rss…)? Which repository should I file an issue in to request this?


While something like this happening is clearly a bug somewhere (I cannot tell where exactly), it is fairly obvious that some of the changes in this commit are making LLVM commit too much memory that likely ends up never being used.

One thing you could do to debug this is to compile stage1 core on your own UNIX machine and see where the maximum virtual memory ends up. If it is significantly larger than the RSS, that would confirm my suspicions.

Another thing you could do is disable overcommit (if you’re on Linux, that’s `sysctl vm.overcommit_memory=2`).
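For reference, the Linux knob in question is the `vm.overcommit_memory` sysctl; a sketch of checking and changing it (changing it requires root):

```shell
# Inspect the current overcommit policy (0 = heuristic, 1 = always overcommit, 2 = strict).
cat /proc/sys/vm/overcommit_memory

# Switch to strict accounting so huge speculative allocations fail up front.
sudo sysctl -w vm.overcommit_memory=2

# Restore the default heuristic policy afterwards.
sudo sysctl -w vm.overcommit_memory=0
```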

Finally, we could also just bisect – there aren’t that many different changes in this PR. We could try landing them one by one (though there’s a danger that all these changes are cumulatively slightly raising the committed memory and none of them would fail CI on their own).

@pietroalbini


pietroalbini commented Nov 17, 2018

> is it possible to make @rust-timer collect additional information (e.g. max-virtual-mem, in addition to max-rss…)? Which repository should I file an issue in to request this?

https://github.com/rust-lang-nursery/rustc-perf


@ljedrz


ljedrz commented Nov 17, 2018

@nagisa ah, ok, thanks for the explanation; I wasn't thinking in terms of a possible memory bug.

I will do some test builds on a Linux machine when I have a bit of free time.

@ljedrz


ljedrz commented Nov 20, 2018

@nagisa I ran `./x.py build src/libcore --stage 1` with and without the changes applied (doing a clean in between) and monitored the VIRT column in htop, but there were no significant differences.

I tried to disable overcommit like you described too, but this command doesn't work in the setup I used (Manjaro Linux).

@nikomatsakis


nikomatsakis commented Nov 26, 2018

What are the current thoughts here? I'm sort of inclined to close this PR as "not worth the trouble", but do you all still want to poke at it? Can I assign the review to someone else (@nagisa)?

```rust
})
.collect::<Vec<_>>();
});
```

@nikic

nikic Nov 26, 2018


You're iterating over globals here and then adding new globals in the loop below. With the collect that's fine, as you'll only iterate existing globals. Without the collect this is going to be an infinite loop and you OOM.
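The hazard described here can be sketched in safe Rust by snapshotting first (hypothetical names; the real code walks LLVM's globals list through unsafe FFI, where nothing stops the iteration from seeing newly added globals):

```rust
// `collect`ing first takes a snapshot of the existing globals, so the
// loop below terminates even though it appends to `globals`. Iterating
// the live list directly (what dropping the `collect` amounted to in
// the FFI code) would keep yielding the globals added here, forever.
fn internalize(mut globals: Vec<String>) -> Vec<String> {
    let snapshot: Vec<String> = globals.clone(); // the "collect" step
    for g in &snapshot {
        globals.push(format!("{g}.wrapped"));
    }
    globals
}

fn main() {
    let out = internalize(vec![String::from("g1")]);
    assert_eq!(out, ["g1", "g1.wrapped"]);
}
```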

@ljedrz

ljedrz Nov 27, 2018


Weird that this doesn't seem to cause issues on architectures other than i686 (at least that's where the OOM was being hit on AppVeyor); I can compile with these changes on x64 without issues.

@ljedrz


ljedrz commented Nov 27, 2018

@nikomatsakis At this point I was interested more in the possibility of uncovering some memory-related bug (as described by @nagisa). I think @nikic is onto something: it might not be a bug but a peculiarity of unsafe code, where a collection can be extended while it is being iterated over.

@ljedrz ljedrz force-pushed the ljedrz:llvm_back_allocations branch from 9e8bafc to ce4bce1 Dec 3, 2018

@ljedrz ljedrz force-pushed the ljedrz:llvm_back_allocations branch from ce4bce1 to 2043d30 Dec 3, 2018

@ljedrz


ljedrz commented Dec 3, 2018

@nikomatsakis I just remembered that the initial version of this PR included a change with a -2.5% win for style-servo-opt, which is considerable, especially since it is a huge benchmark. I rebased, re-included the servo win, and removed the problematic bit that @nikic identified as causing the OOM, so hopefully now it's good to go 🤞.

@nagisa


nagisa commented Dec 3, 2018

@bors


bors commented Dec 3, 2018

📌 Commit 2043d30 has been approved by nagisa

@bors


bors commented Dec 4, 2018

⌛️ Testing commit 2043d30 with merge 431e0ab...

bors added a commit that referenced this pull request Dec 4, 2018

Auto merge of #55871 - ljedrz:llvm_back_allocations, r=nagisa
codegen_llvm_back: improve allocations

This commit was split out from #54864. Last time it was causing an LLVM OOM, which was most probably caused by not collecting the globals.

- preallocate vectors of known length
- `extend` instead of `append` where the argument is consumable
- turn 2 `push` loops into `extend`s
- create a vector from a function producing one instead of using `extend_from_slice` on it
- consume `modules` when no longer needed
- ~~return an `impl Iterator` from `generate_lto_work`~~
- ~~don't `collect` `globals`, as they are iterated over and consumed right afterwards~~

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.
@bors


bors commented Dec 4, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nagisa
Pushing 431e0ab to master...

@bors bors merged commit 2043d30 into rust-lang:master Dec 4, 2018

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful

@ljedrz ljedrz deleted the ljedrz:llvm_back_allocations branch Dec 4, 2018
