Split `Liveness::users` into three. #54211

Merged
bors merged 1 commit into rust-lang:master from nnethercote:keccak-Liveness-memory on Sep 20, 2018

@nnethercote
Contributor

nnethercote commented Sep 14, 2018

This reduces memory usage on some benchmarks because no space is wasted
for padding. For a `check-clean` build of `keccak` it reduces `max-rss`
by 20%.

r? @nikomatsakis, but I want to do a perf run. Locally, I had these results:

  • instructions: slight regression
  • max-rss: big win on "Clean" builds
  • faults: big win on "Clean" and "Nll" builds
  • wall-time: small win on "Clean" and "Nll" builds

So I want to see how a different machine compares.
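
To make the padding argument concrete, here is a minimal sketch of the per-entry size difference. It assumes that `LiveNode` is a 4-byte index newtype and that each element of the original `users` vector held a reader, a writer, and a `used` flag. The `Users` struct and both helper functions are illustrative stand-ins, not the compiler's actual definitions.

```rust
use std::mem::size_of;

// Hypothetical stand-in for rustc's `LiveNode`; assumed to be a 4-byte index newtype.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct LiveNode(u32);

// Assumed shape of one element of the original combined `users` vector.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Users {
    reader: LiveNode,
    writer: LiveNode,
    used: bool,
}

/// Bytes per logical entry when stored as a single `Vec<Users>`.
/// The struct is rounded up to a multiple of its alignment, so the
/// 1-byte `bool` drags in padding bytes.
fn combined_bytes_per_entry() -> usize {
    size_of::<Users>()
}

/// Bytes per logical entry when stored as three parallel `Vec`s.
/// Each vector is densely packed, so there is no per-entry padding.
fn split_bytes_per_entry() -> usize {
    2 * size_of::<LiveNode>() + size_of::<bool>()
}

fn main() {
    println!("combined: {} bytes per entry", combined_bytes_per_entry());
    println!("split:    {} bytes per entry", split_bytes_per_entry());
}
```

Under these assumptions the combined layout needs 12 bytes per entry and the split layout 9, so a quarter of this particular vector was padding. How much that moves `max-rss` depends on how large the vector is relative to everything else, which is what the perf run measures.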

@nnethercote

Contributor

nnethercote commented Sep 14, 2018

@bors try

@bors

Contributor

bors commented Sep 14, 2018

⌛️ Trying commit efae70c with merge 7da277b...

bors added a commit that referenced this pull request Sep 14, 2018

Auto merge of #54211 - nnethercote:keccak-Liveness-memory, r=<try>
Split `Liveness::users` into three.

// separate `Vec`s so that no space is wasted for padding.
users_reader: Vec<LiveNode>,
users_writer: Vec<LiveNode>,
users_used: Vec<bool>,
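
As a rough sketch of how three parallel vectors can stand in for one vector of structs: entry `i` of each vector belongs to the same logical record, and operations that used to copy a whole struct now copy the fields one by one. Only the three field names come from the diff above. `SplitUsers`, `INVALID_NODE`, `copy_entry`, and the `LiveNode` definition are hypothetical names used for illustration.

```rust
/// Hypothetical stand-in for rustc's `LiveNode` (assumed to be a small index newtype).
#[derive(Clone, Copy, Debug, PartialEq)]
struct LiveNode(u32);

/// Hypothetical sentinel meaning "no node".
const INVALID_NODE: LiveNode = LiveNode(u32::MAX);

/// Three parallel vectors; index `i` in each one refers to the same logical entry.
struct SplitUsers {
    users_reader: Vec<LiveNode>,
    users_writer: Vec<LiveNode>,
    users_used: Vec<bool>,
}

impl SplitUsers {
    /// Create `len` entries, all initialised to "no reader, no writer, unused".
    fn new(len: usize) -> Self {
        SplitUsers {
            users_reader: vec![INVALID_NODE; len],
            users_writer: vec![INVALID_NODE; len],
            users_used: vec![false; len],
        }
    }

    /// Copy one logical entry from `src` to `dst`, field by field.
    fn copy_entry(&mut self, dst: usize, src: usize) {
        self.users_reader[dst] = self.users_reader[src];
        self.users_writer[dst] = self.users_writer[src];
        self.users_used[dst] = self.users_used[src];
    }
}

fn main() {
    let mut table = SplitUsers::new(4);
    table.users_reader[0] = LiveNode(7);
    table.users_used[0] = true;
    table.copy_entry(3, 0);
    println!(
        "{:?} {:?} {}",
        table.users_reader[3], table.users_writer[3], table.users_used[3]
    );
}
```

The cost of this layout is three indexing operations instead of one per entry access, which is consistent with a small rise in instruction counts alongside lower memory use.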

@Mark-Simulacrum

Mark-Simulacrum Sep 14, 2018

Member

Is it worth also making this a bitset/bitvec?

@nnethercote

nnethercote Sep 14, 2018

Contributor

#54208 (comment) discusses this. A bitset would reduce memory further but increase instruction counts. `Vec<bool>` might be the best middle ground; let's see how the perf results look with it.
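
For readers unfamiliar with the trade-off, here is a minimal bitset sketch next to a plain `Vec<bool>`. The `BitFlags` type is purely illustrative and is not rustc's own bit-vector type. It shows why a bitset uses about one eighth of the memory but needs a shift and a mask on every access.

```rust
/// Illustrative fixed-length bitset backed by 64-bit words.
struct BitFlags {
    words: Vec<u64>,
    len: usize,
}

impl BitFlags {
    /// Allocate enough 64-bit words to hold `len` bits, all cleared.
    fn new(len: usize) -> Self {
        BitFlags { words: vec![0; (len + 63) / 64], len }
    }

    /// Read bit `i`: one index computation, one shift, one mask.
    fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Write bit `i`: a read-modify-write of the containing word.
    fn set(&mut self, i: usize, value: bool) {
        assert!(i < self.len);
        let mask = 1u64 << (i % 64);
        if value {
            self.words[i / 64] |= mask;
        } else {
            self.words[i / 64] &= !mask;
        }
    }

    /// Bytes of element storage in use (ignores any spare `Vec` capacity).
    fn heap_bytes(&self) -> usize {
        self.words.len() * std::mem::size_of::<u64>()
    }
}

fn main() {
    let n = 1000;
    let mut bits = BitFlags::new(n);
    let mut bools = vec![false; n]; // one byte per flag, plain load/store access

    bits.set(42, true);
    bools[42] = true;
    assert_eq!(bits.get(42), bools[42]);

    println!(
        "bitset: {} bytes, Vec<bool>: {} bytes",
        bits.heap_bytes(),
        bools.len() * std::mem::size_of::<bool>()
    );
}
```

For 1000 flags the bitset stores 128 bytes against 1000 for `Vec<bool>`. Whether the extra bit manipulation outweighs the memory saving is an empirical question, which is why the comment above defers to the perf results.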

@bors

Contributor

bors commented Sep 14, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nnethercote

Contributor

nnethercote commented Sep 14, 2018

@rust-timer

rust-timer commented Sep 14, 2018

Success: Queued 7da277b with parent 4f921d7, comparison URL.

@nikomatsakis

r=me if the perf results look good; note that we would like to rewrite this completely to operate on MIR. But that's a bigger job (@wesleywiser was interested, I think).

@nikomatsakis

Contributor

nikomatsakis commented Sep 14, 2018

Perf results show a small perf hit (5-6%) but a big memory-use win (around 20% in some cases). Interesting.

@nnethercote

Contributor

nnethercote commented Sep 14, 2018

I think this is a rare case where instruction counts are misleading!

Note that keccak is the one most clearly affected, inflate and clap-rs are also affected, and nothing else is. So only look at the results for those three benchmarks; the rest is noise. Also, keccak-check is the only one that measures NLL.

With all that in mind, here are the results just for keccak, including check, debug and opt.

  • cpu-clock: 0--6% better, 0.7% better for nll-check
  • cycles: 0--4% better, 0.6% better for nll-check
  • faults: 4--18% better, 6.6% better for nll-check
  • instructions: 0--3% worse, 0.3% worse for nll-check
  • max-rss: 0--20% better, no change for nll-check
  • task-clock: 0--6% better, 0.7% better for nll-check
  • wall-time: 0--6% better, 0.7% better for nll-check

Instruction counts get worse, but everything else gets better. And we have a simple theoretical explanation for this: less memory traffic. So I think we should land this, but I am happy to defer to @nikomatsakis's decision.

@wesleywiser

Member

wesleywiser commented Sep 14, 2018

I think the other important bit to note is that it seems to be mostly the clean incremental builds which are showing regressions. A 5% regression on the clean incremental time is usually a very, very small amount of clock time since clean incremental builds are usually very fast.

@nnethercote

Contributor

nnethercote commented Sep 15, 2018

> A 5% regression on the clean incremental time is usually a very, very small

At the risk of laboring the point: it's a win, not a regression on the time. It's only a regression on instruction counts.

@wesleywiser

Member

wesleywiser commented Sep 15, 2018

@nikomatsakis

Contributor

nikomatsakis commented Sep 18, 2018

@bors r+

@bors

Contributor

bors commented Sep 18, 2018

📌 Commit efae70c has been approved by nikomatsakis

@nikomatsakis

Contributor

nikomatsakis commented Sep 18, 2018

Sorry, forgot I hadn't already written that. Thanks @nnethercote for the extra details.

@bors

Contributor

bors commented Sep 20, 2018

⌛️ Testing commit efae70c with merge 1d33aed...

bors added a commit that referenced this pull request Sep 20, 2018

Auto merge of #54211 - nnethercote:keccak-Liveness-memory, r=nikomatsakis

Split `Liveness::users` into three.

@bors

Contributor

bors commented Sep 20, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing 1d33aed to master...

@bors bors merged commit efae70c into rust-lang:master Sep 20, 2018

@nnethercote nnethercote deleted the nnethercote:keccak-Liveness-memory branch Sep 20, 2018
