New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up the fast path for assert_eq! and assert_ne! #57815

Merged
merged 1 commit into from Feb 13, 2019

Conversation

Projects
None yet
8 participants
@dotdash
Copy link
Contributor

dotdash commented Jan 21, 2019

Currently, the panic!() calls directly borrow the value bindings. This
causes those bindings to always be initialized, i.e. they're initialized
even before the values are even compared. This causes noticeable
overhead in what should be a really cheap operation.

By performing a reborrow of the value in the call to panic!(), we allow
LLVM to optimize that code, so that the extra borrow only happens in the
error case.

We could achieve the same result by dereferencing the values passed to
panic!(), as the format machinery borrows them anyway, but this causes
assertions to fail to compile if one of the values is unsized, i.e. it
would be a breaking change.

Speed up the fast path for assert_eq! and assert_ne!
Currently, the panic!() calls directly borrow the value bindings. This
causes those bindings to always be initialized, i.e. they're initialized
even before the values are even compared. This causes noticeable
overhead in what should be a really cheap operation.

By performing a reborrow of the value in the call to panic!(), we allow
LLVM to optimize that code, so that the extra borrow only happens in the
error case.

We could achieve the same result by dereferencing the values passed to
panic!(), as the format machinery borrows them anyway, but this causes
assertions to fail to compile if one of the values is unsized, i.e. it
would be a breaking change.
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Jan 21, 2019

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 21, 2019

@bors try

Local numbers seem promising, but I'd like to have "official" numbers from prlo as well

screen shot 2019-01-21 at 19 44 26-fullpage

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 21, 2019

⌛️ Trying commit 5a7cd84 with merge 261660d...

bors added a commit that referenced this pull request Jan 21, 2019

Auto merge of #57815 - dotdash:asserts, r=<try>
Speed up the fast path for assert_eq! and assert_ne!

Currently, the panic!() calls directly borrow the value bindings. This
causes those bindings to always be initialized, i.e. they're initialized
even before the values are even compared. This causes noticeable
overhead in what should be a really cheap operation.

By performing a reborrow of the value in the call to panic!(), we allow
LLVM to optimize that code, so that the extra borrow only happens in the
error case.

We could achieve the same result by dereferencing the values passed to
panic!(), as the format machinery borrows them anyway, but this causes
assertions to fail to compile if one of the values is unsized, i.e. it
would be a breaking change.
@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 21, 2019

FWIW, the above numbers are on top of a build with my pending PRs already applied.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Jan 21, 2019

Would it be worth adding a codegen test to make sure these continue to behave nicely?

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 21, 2019

Would it be worth adding a codegen test to make sure these continue to behave nicely?

Good idea. I'll try to add one tomorrow. I will have to check how to do so, it's been a while and I'm out of time for today.

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 21, 2019

Oh, just in case somebody's wondering about it, a good deal of the perf win is probably due to the fact that functions using those macros are now smaller, which allows for them to be inlined, pontentially avoiding even more copies. This was the case for the to_bits function, which I had been investigating, and which wasn't inlined even though it had the inline attribute which even lowers the bar for inlining. With those changes, the function is properly inlined.

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Jan 21, 2019

☀️ Test successful - checks-travis
State: approved= try=True

@hellow554

This comment has been minimized.

Copy link
Contributor

hellow554 commented Jan 22, 2019

Maybe my issue will be solved then? #55914

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 22, 2019

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 22, 2019

Actually, I just checked again, and there's still some extra work happening that's not strictly required.

Assembly looks like this:

assert_eq!

	pushq	%rax
	.cfi_def_cfa_offset 16
	cmpl	%esi, %edi
	jne	.LBB10_1
	movb	$1, %al
	popq	%rcx
	.cfi_def_cfa_offset 8
	retq

assert_eq! (new)

	subq	$104, %rsp
	.cfi_def_cfa_offset 112
	movl	%edi, (%rsp)
	movl	%esi, 4(%rsp)
	cmpl	%esi, %edi
	jne	.LBB11_1
	movb	$1, %al
	addq	$104, %rsp
	.cfi_def_cfa_offset 8
	retq

😢

@RalfJung

This comment has been minimized.

Copy link
Member

RalfJung commented Jan 22, 2019

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Jan 22, 2019

Success: Queued 261660d with parent 7164a9f, comparison URL.

@rust-timer

This comment has been minimized.

Copy link

rust-timer commented Jan 22, 2019

Finished benchmarking try commit 261660d

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 22, 2019

Hm, that's interesting, the number's aren't even close to what I got here. I'll check later whether there is some interaction between this and my other PRs that were included on both sides here.

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 24, 2019

Seems that my baseline compiler was somehow slower than it should be, after cleaning out the build directory of that one and doing a fresh build (x.py build on its own didn't do anything), I get the same numbers as prlo. Sorry for providing wrong nunmbers initially.

I'm experimenting with an improved version of this to get (closer to) a result that is more or less equal to the code that assert(a == b), possibly outlining the panic call as well, but there are currently some roadblocks to that, so it might take a while (or even fail).

Do we want to merge this anyway (possibly with an adapted commit message) or shall I close it for now?

@hellow554

This comment has been minimized.

Copy link
Contributor

hellow554 commented Jan 24, 2019

I did some benchmarks with outlining the panic. It will give me this:

example::unlkly:
        pushq   %rax
        cmpl    %esi, %edi
        jne     .LBB2_1
        popq    %rcx
        retq
.LBB2_1:
        callq   example::moep
        ud2

IMHO, this should be even more simplified to

example::unlkly:
        cmpl    %esi, %edi
        jne     .LBB2_1
        retq
.LBB2_1:
       # [doing stuff]
        callq   example::moep
        ud2

Either way, I don't understand why rust/LLVM does pushq %rax and popq %rcx. Is it a bug?

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 24, 2019

Either way, I don't understand why rust/LLVM does pushq %rax and popq %rcx. Is it a bug?

No, these instructions adjust %rsp to get the correct stack alignment. Using those push/pops leads to shorter instructions than doing a sub on %rsp itself.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Jan 24, 2019

So it sounds like there's still a performance improvement from this change, it's just smaller than originally thought? SGTM if that's the case.

@hellow554

This comment has been minimized.

Copy link
Contributor

hellow554 commented Jan 24, 2019

Either way, I don't understand why rust/LLVM does pushq %rax and popq %rcx. Is it a bug?

No, these instructions adjust %rsp to get the correct stack alignment. Using those push/pops leads to shorter instructions than doing a sub on %rsp itself.

I don't think that this applies here, because you don't need to align the stack at all... Just do a compare, branch (if needed) or return.

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Jan 24, 2019

Either way, I don't understand why rust/LLVM does pushq %rax and popq %rcx. Is it a bug?

No, these instructions adjust %rsp to get the correct stack alignment. Using those push/pops leads to shorter instructions than doing a sub on %rsp itself.

I don't think that this applies here, because you don't need to align the stack at all... Just do a compare, branch (if needed) or return.

You can skip the alignment step in leaf functions, but this is not a leaf function, there's a call to example::moep. And IIRC (just vaguely though) the stack pointer adjustment has to happen in the function prologue, so you can't move it past the branch.

@Dylan-DPC

This comment has been minimized.

Copy link
Member

Dylan-DPC commented Feb 11, 2019

ping from triage @sfackler awaiting your review on this

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Feb 12, 2019

@bors r+ rollup

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 12, 2019

📌 Commit 5a7cd84 has been approved by sfackler

Centril added a commit to Centril/rust that referenced this pull request Feb 13, 2019

Rollup merge of rust-lang#57815 - dotdash:asserts, r=sfackler
Speed up the fast path for assert_eq! and assert_ne!

Currently, the panic!() calls directly borrow the value bindings. This
causes those bindings to always be initialized, i.e. they're initialized
even before the values are even compared. This causes noticeable
overhead in what should be a really cheap operation.

By performing a reborrow of the value in the call to panic!(), we allow
LLVM to optimize that code, so that the extra borrow only happens in the
error case.

We could achieve the same result by dereferencing the values passed to
panic!(), as the format machinery borrows them anyway, but this causes
assertions to fail to compile if one of the values is unsized, i.e. it
would be a breaking change.

Centril added a commit to Centril/rust that referenced this pull request Feb 13, 2019

Rollup merge of rust-lang#57815 - dotdash:asserts, r=sfackler
Speed up the fast path for assert_eq! and assert_ne!

Currently, the panic!() calls directly borrow the value bindings. This
causes those bindings to always be initialized, i.e. they're initialized
even before the values are even compared. This causes noticeable
overhead in what should be a really cheap operation.

By performing a reborrow of the value in the call to panic!(), we allow
LLVM to optimize that code, so that the extra borrow only happens in the
error case.

We could achieve the same result by dereferencing the values passed to
panic!(), as the format machinery borrows them anyway, but this causes
assertions to fail to compile if one of the values is unsized, i.e. it
would be a breaking change.

bors added a commit that referenced this pull request Feb 13, 2019

Auto merge of #58413 - Centril:rollup, r=Centril
Rollup of 13 pull requests

Successful merges:

 - #57693 (Doc rewording)
 - #57815 (Speed up the fast path for assert_eq! and assert_ne!)
 - #58034 (Stabilize the time_checked_add feature)
 - #58057 (Stabilize linker-plugin based LTO (aka cross-language LTO))
 - #58137 (Cleanup: rename node_id_to_type(_opt))
 - #58166 (allow shorthand syntax for deprecation reason)
 - #58196 (Add specific feature gate error for const-unstable features)
 - #58200 (fix str mutating through a ptr derived from &self)
 - #58273 (Rename rustc_errors dependency in rust 2018 crates)
 - #58289 (impl iter() for dyn Error)
 - #58387 (Disallow `auto` trait alias syntax)
 - #58404 (use Ubuntu keyserver for CloudABI ports)
 - #58405 (Remove some dead code from libcore)

Failed merges:

r? @ghost

Centril added a commit to Centril/rust that referenced this pull request Feb 13, 2019

Rollup merge of rust-lang#57815 - dotdash:asserts, r=sfackler
Speed up the fast path for assert_eq! and assert_ne!

Currently, the panic!() calls directly borrow the value bindings. This
causes those bindings to always be initialized, i.e. they're initialized
even before the values are even compared. This causes noticeable
overhead in what should be a really cheap operation.

By performing a reborrow of the value in the call to panic!(), we allow
LLVM to optimize that code, so that the extra borrow only happens in the
error case.

We could achieve the same result by dereferencing the values passed to
panic!(), as the format machinery borrows them anyway, but this causes
assertions to fail to compile if one of the values is unsized, i.e. it
would be a breaking change.

bors added a commit that referenced this pull request Feb 13, 2019

Auto merge of #58415 - Centril:rollup, r=Centril
Rollup of 12 pull requests

Successful merges:

 - #57693 (Doc rewording)
 - #57815 (Speed up the fast path for assert_eq! and assert_ne!)
 - #58034 (Stabilize the time_checked_add feature)
 - #58057 (Stabilize linker-plugin based LTO (aka cross-language LTO))
 - #58137 (Cleanup: rename node_id_to_type(_opt))
 - #58166 (allow shorthand syntax for deprecation reason)
 - #58200 (fix str mutating through a ptr derived from &self)
 - #58273 (Rename rustc_errors dependency in rust 2018 crates)
 - #58289 (impl iter() for dyn Error)
 - #58387 (Disallow `auto` trait alias syntax)
 - #58404 (use Ubuntu keyserver for CloudABI ports)
 - #58405 (Remove some dead code from libcore)

Failed merges:

r? @ghost

@bors bors merged commit 5a7cd84 into rust-lang:master Feb 13, 2019

1 check passed

homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment