New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline read_{un,}signed_leb128 and opaque::Decoder functions. #37083

Merged
merged 1 commit into from Oct 18, 2016

Conversation

Projects
None yet
7 participants
@nnethercote
Contributor

nnethercote commented Oct 11, 2016

read_unsigned_leb128 is hot within rustc because it's heavily used
during the reading of crate metadata. This commit tweaks its signature
(and that of read_signed_leb128, for consistency) so it can increment
the buffer index directly instead of maintaining its own copy, the
change in which is then used by the caller to advance the index.

This reduces the instruction count (as measured by Cachegrind) for some
benchmarks a bit, e.g. hyper-0.5.0 by 0.7%.

@rust-highfive

This comment has been minimized.

Show comment
Hide comment
@rust-highfive

rust-highfive Oct 11, 2016

Collaborator

r? @erickt

(rust_highfive has picked a reviewer for you, use r? to override)

Collaborator

rust-highfive commented Oct 11, 2016

r? @erickt

(rust_highfive has picked a reviewer for you, use r? to override)

@Mark-Simulacrum

This comment has been minimized.

Show comment
Hide comment
@Mark-Simulacrum

Mark-Simulacrum Oct 11, 2016

Member

Leaving a comment on these methods noting why this was done is probably a good idea so a future clean-up doesn't accidentally remove this optimization, since it's not intuitively a good idea to maintain a pointer to a usize instead of copying it back and forth.

Member

Mark-Simulacrum commented Oct 11, 2016

Leaving a comment on these methods noting why this was done is probably a good idea so a future clean-up doesn't accidentally remove this optimization, since it's not intuitively a good idea to maintain a pointer to a usize instead of copying it back and forth.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 14, 2016

Contributor

FWIW, I'm treating @Mark-Simulacrum's comment as a suggestion worth considering, but I'm still waiting for @erickt to review before I make any changes.

Contributor

nnethercote commented Oct 14, 2016

FWIW, I'm treating @Mark-Simulacrum's comment as a suggestion worth considering, but I'm still waiting for @erickt to review before I make any changes.

@erickt

This comment has been minimized.

Show comment
Hide comment
@erickt

erickt Oct 17, 2016

Contributor

@nnethercote: Looks good to me if you add the comment. This seems a tad magical though. I would have expected llvm to generate similar code. Maybe it's having trouble doing return-value optimization? I wonder if this is an area a MIR optimization could target.

Contributor

erickt commented Oct 17, 2016

@nnethercote: Looks good to me if you add the comment. This seems a tad magical though. I would have expected llvm to generate similar code. Maybe it's having trouble doing return-value optimization? I wonder if this is an area a MIR optimization could target.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Oct 17, 2016

Member

@erickt It's because the function is not marked as #[inline] 😞.

Member

eddyb commented Oct 17, 2016

@erickt It's because the function is not marked as #[inline] 😞.

@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 18, 2016

Contributor

@erickt It's because the function is not marked as #[inline] 😞.

I've added that, and also inlined all the methods within opaque::Decoder and the results are much better than the original change :) More measurements and an updated patch coming soon.

Contributor

nnethercote commented Oct 18, 2016

@erickt It's because the function is not marked as #[inline] 😞.

I've added that, and also inlined all the methods within opaque::Decoder and the results are much better than the original change :) More measurements and an updated patch coming soon.

Inline read_{un,}signed_leb128 and opaque::Decoder functions.
These functions are all hot in rustc and inlining them speeds up most of
the rustc-benchmarks by 1--2%.
@nnethercote

This comment has been minimized.

Show comment
Hide comment
@nnethercote

nnethercote Oct 18, 2016

Contributor

stage1:

futures-rs-test  4.416s vs  4.317s --> 1.023x faster (variance: 1.007x, 1.010x)
helloworld       0.234s vs  0.229s --> 1.021x faster (variance: 1.007x, 1.014x)
html5ever-2016-  5.442s vs  5.389s --> 1.010x faster (variance: 1.008x, 1.005x)
hyper.0.5.0      5.131s vs  5.054s --> 1.015x faster (variance: 1.001x, 1.010x)
inflate-0.1.0    4.902s vs  4.852s --> 1.010x faster (variance: 1.013x, 1.003x)
issue-32062-equ  0.348s vs  0.344s --> 1.012x faster (variance: 1.010x, 1.006x)
issue-32278-big  1.711s vs  1.706s --> 1.003x faster (variance: 1.006x, 1.017x)
jld-day15-parse  1.550s vs  1.541s --> 1.006x faster (variance: 1.016x, 1.007x)
piston-image-0. 12.046s vs 11.915s --> 1.011x faster (variance: 1.037x, 1.009x)
regex.0.1.30     2.557s vs  2.500s --> 1.023x faster (variance: 1.005x, 1.017x)
rust-encoding-0  2.109s vs  2.055s --> 1.026x faster (variance: 1.006x, 1.008x)
syntex-0.42.2   32.859s vs 32.609s --> 1.008x faster (variance: 1.004x, 1.032x)
syntex-0.42.2-i 17.998s vs 17.732s --> 1.015x faster (variance: 1.008x, 1.021x)
Contributor

nnethercote commented Oct 18, 2016

stage1:

futures-rs-test  4.416s vs  4.317s --> 1.023x faster (variance: 1.007x, 1.010x)
helloworld       0.234s vs  0.229s --> 1.021x faster (variance: 1.007x, 1.014x)
html5ever-2016-  5.442s vs  5.389s --> 1.010x faster (variance: 1.008x, 1.005x)
hyper.0.5.0      5.131s vs  5.054s --> 1.015x faster (variance: 1.001x, 1.010x)
inflate-0.1.0    4.902s vs  4.852s --> 1.010x faster (variance: 1.013x, 1.003x)
issue-32062-equ  0.348s vs  0.344s --> 1.012x faster (variance: 1.010x, 1.006x)
issue-32278-big  1.711s vs  1.706s --> 1.003x faster (variance: 1.006x, 1.017x)
jld-day15-parse  1.550s vs  1.541s --> 1.006x faster (variance: 1.016x, 1.007x)
piston-image-0. 12.046s vs 11.915s --> 1.011x faster (variance: 1.037x, 1.009x)
regex.0.1.30     2.557s vs  2.500s --> 1.023x faster (variance: 1.005x, 1.017x)
rust-encoding-0  2.109s vs  2.055s --> 1.026x faster (variance: 1.006x, 1.008x)
syntex-0.42.2   32.859s vs 32.609s --> 1.008x faster (variance: 1.004x, 1.032x)
syntex-0.42.2-i 17.998s vs 17.732s --> 1.015x faster (variance: 1.008x, 1.021x)

@nnethercote nnethercote changed the title from Tweak read_{un,}signed_leb128. to Inline read_{un,}signed_leb128 and opaque::Decoder functions. Oct 18, 2016

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb
Member

eddyb commented Oct 18, 2016

@bors r+

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Oct 18, 2016

Contributor

📌 Commit 6a4bb35 has been approved by eddyb

Contributor

bors commented Oct 18, 2016

📌 Commit 6a4bb35 has been approved by eddyb

@bors

This comment has been minimized.

Show comment
Hide comment
@bors

bors Oct 18, 2016

Contributor

⌛️ Testing commit 6a4bb35 with merge 753ea76...

Contributor

bors commented Oct 18, 2016

⌛️ Testing commit 6a4bb35 with merge 753ea76...

bors added a commit that referenced this pull request Oct 18, 2016

Auto merge of #37083 - nnethercote:uleb128, r=eddyb
Inline read_{un,}signed_leb128 and opaque::Decoder functions.

`read_unsigned_leb128` is hot within rustc because it's heavily used
during the reading of crate metadata. This commit tweaks its signature
(and that of `read_signed_leb128`, for consistency) so it can increment
the buffer index directly instead of maintaining its own copy, the
change in which is then used by the caller to advance the index.

This reduces the instruction count (as measured by Cachegrind) for some
benchmarks a bit, e.g. hyper-0.5.0 by 0.7%.

@bors bors merged commit 6a4bb35 into rust-lang:master Oct 18, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@nnethercote nnethercote deleted the nnethercote:uleb128 branch Oct 18, 2016

@brson brson added the relnotes label Oct 24, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment