Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate excessive null-checks from slice iterators #21886

Merged
merged 2 commits into from Feb 19, 2015

Conversation

Projects
None yet
@dotdash
Copy link
Contributor

dotdash commented Feb 3, 2015

The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from #18193

running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured

Fixes #18193

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Feb 3, 2015

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Feb 3, 2015

From the description I take it this doesn't suffer from the compile time problem mentioned in #21418 (comment)? (Out of interest what changed in LLVM? Just general optimisations?)

Also, I found that the compiler aborted compiling libcore (can't remember if it was stage1 or stage2) when I did a similar change (#21448); this doesn't suffer from that?

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Feb 3, 2015

From the description I take it this doesn't suffer from the compile time problem mentioned in #21418 (comment)?

Right, I compared compile times for (IIRC) rustc_trans and rustc_typeck and there was no measureable difference in compile times with the assume in stage 2 (or maybe it was stage 1 that I tested, shouldn't matter).

I also just bootstrapped a new version of #21418 and that took ~24 minutes from finishing stage0 libcore to finishing stage2 librustc. Considering that just rustc_trans took 13 minutes when the slowdown was there, it looks good to me ;-)

(Out of interest what changed in LLVM? Just general optimisations?)

I don't know.

Also, I found that the compiler aborted compiling libcore (can't remember if it was stage1 or stage2) when I did a similar change (#21448); this doesn't suffer from that?

The benchmark was taken with the stage2 rustc, so no, it didn't abort.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Feb 3, 2015

Could you add the assume to next_back as well?

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Feb 3, 2015

Why next_back? Wouldn't as_mut_slice and into_iter be the places that correspond to as_slice?

I originally had it in next (which is why I forgot to add it to as_mut_slice, because next is generated in a macro that is used for both iter types), and that gave worse results, because the assumption got lost at some point when SROA (I think) replaced the load in next() with the one from as_slice.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Feb 3, 2015

Oh, huh. I totally misread this PR. I was thinking this was in <slice::Iter<T> as Iterator>::next.

@dotdash dotdash force-pushed the dotdash:fast_slice_iter branch from fb2e27b to a2dcc1e Feb 3, 2015

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Feb 3, 2015

I suspect that means that this doesn't solve a case like http://is.gd/pteFUM,

.LBB0_2:
    testq   %rdx, %rdx
    je  .LBB0_4
    addl    (%rdx), %eax
    addq    $4, %rdx
    addq    $-4, %rcx
    jne .LBB0_2

since there's no Vec involved. I wonder if that would be addressed by placing assumptions in the iter/iter_mut constructor?

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Feb 3, 2015

Will try to see if that helps

@@ -427,8 +428,12 @@ impl<T> Vec<T> {
#[stable(feature = "rust1", since = "1.0.0")]
pub fn as_mut_slice<'a>(&'a mut self) -> &'a mut [T] {
unsafe {
let ptr = *self.ptr;
if cfg!(not(stage0)) { // NOTE remove cfg after next snapshot
assume(ptr != 0 as *mut T);

This comment has been minimized.

@jdm

jdm Feb 3, 2015

Contributor

ptr::null_mut() here and all other uses of 0 as *mut T

This comment has been minimized.

@dotdash

dotdash Feb 3, 2015

Author Contributor

Actually, using is_null() works now, that didn't work the first time I tried, either because of an error on my side or because of the ptrtoint instruction it was generated with the old is_null() implementation.
Thanks!

@dotdash dotdash force-pushed the dotdash:fast_slice_iter branch from a2dcc1e to e49f6d6 Feb 3, 2015

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Feb 3, 2015

I wonder if that would be addressed by placing assumptions in the iter/iter_mut constructor?

At least in this case, the assume() gets canonicalized to !nonnull metadata on a load, and that gets lost in the SROA pass. I've added the call anyway, hoping for it to be useful in the future.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Feb 3, 2015

@bors r+ e49f

@nikomatsakis nikomatsakis assigned huonw and unassigned nikomatsakis Feb 3, 2015

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Feb 3, 2015

@bors: rollup

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 4, 2015

rollup merge of rust-lang#21886: dotdash/fast_slice_iter
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from rust-lang#18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes rust-lang#18193
@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Feb 4, 2015

@bors: rollup-

bors added a commit that referenced this pull request Feb 4, 2015

Auto merge of #21886 - dotdash:fast_slice_iter, r=huonw
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from #18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes #18193
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 4, 2015

⌛️ Testing commit e49f6d6 with merge cd0d956...

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 4, 2015

💔 Test failed - auto-win-32-nopt-t

@bluss

This comment has been minimized.

Copy link
Contributor

bluss commented Feb 5, 2015

@huonw, that fold testcase seems to have been resolved now, it vectorizes! (rustc 1.0.0-nightly (ba2f13ef0 2015-02-04 20:03:55 +0000) Probably due to the other awesome nonnull changes?

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Feb 7, 2015

The failure seems legitimate; although, possibly a LLVM bug?

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Feb 9, 2015

Yeah, I can reproduce that in my Windows VM.

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Feb 10, 2015

dotdash added a commit to dotdash/rust that referenced this pull request Feb 14, 2015

Update LLVM to release_36@229036
Disable asserts in the PassInfo cache to improve performance. Also fixes
the crash blocking rust-lang#21886.

Fixes rust-lang#22233

dotdash added a commit to dotdash/rust that referenced this pull request Feb 14, 2015

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 17, 2015

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 18, 2015

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Feb 18, 2015

LLVM update landed. Needs rebase.

dotdash added some commits Feb 2, 2015

Eliminate excessive null-checks from slice iterators
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from #18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes #18193
Avoid ptrtoint when checking if a pointer is null
Casting the pointer to an integer requires a ptrtoint, while casting 0
to a pointer is directly folded to a `null` value.

@dotdash dotdash force-pushed the dotdash:fast_slice_iter branch from e49f6d6 to 7412d1b Feb 18, 2015

@dotdash

This comment has been minimized.

Copy link
Contributor Author

dotdash commented Feb 18, 2015

@bors r=huonw 7412d1b

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 18, 2015

⌛️ Testing commit 7412d1b with merge c103604...

bors added a commit that referenced this pull request Feb 18, 2015

Auto merge of #21886 - dotdash:fast_slice_iter, r=huonw
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from #18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes #18193
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 18, 2015

💔 Test failed - auto-win-32-nopt-t

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Feb 18, 2015

@bors: retry

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 18, 2015

⌛️ Testing commit 7412d1b with merge a1cfc62...

bors added a commit that referenced this pull request Feb 18, 2015

Auto merge of #21886 - dotdash:fast_slice_iter, r=huonw
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from #18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes #18193
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Feb 18, 2015

💔 Test failed - auto-win-32-nopt-t

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Feb 18, 2015

@bors: retry

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 19, 2015

rollup merge of rust-lang#21886: dotdash/fast_slice_iter
The data pointer used in the slice is never null, using assume() to tell
LLVM about it gets rid of various unneeded null checks when iterating
over the slice.

Since the snapshot compiler is still using an older LLVM version, omit
the call in stage0, because compile times explode otherwise.

Benchmarks from rust-lang#18193
````
running 5 tests
test _range    ... bench:     33329 ns/iter (+/- 417)
test assembly  ... bench:     33299 ns/iter (+/- 58)
test enumerate ... bench:     33318 ns/iter (+/- 83)
test iter      ... bench:     33311 ns/iter (+/- 130)
test position  ... bench:     33300 ns/iter (+/- 47)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
````

Fixes rust-lang#18193

@bors bors merged commit 7412d1b into rust-lang:master Feb 19, 2015

1 of 2 checks passed

homu Test failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dotdash dotdash deleted the dotdash:fast_slice_iter branch Feb 22, 2015

dotdash added a commit to dotdash/rust that referenced this pull request Feb 22, 2015

Eliminate more excessive null-checks from slice iterators
This adds the assume() calls back that got lost when rebasing rust-lang#21886.

Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 23, 2015

Rollup merge of rust-lang#22669 - dotdash:fast_slice_iter, r=huonw
 This adds the assume() calls back that got lost when rebasing rust-lang#21886.

Manishearth added a commit to Manishearth/rust that referenced this pull request Feb 23, 2015

Rollup merge of rust-lang#22669 - dotdash:fast_slice_iter, r=huonw
 This adds the assume() calls back that got lost when rebasing rust-lang#21886.

bors added a commit that referenced this pull request Feb 24, 2015

Auto merge of #22669 - dotdash:fast_slice_iter, r=huonw
This adds the assume() calls back that got lost when rebasing #21886.

bors added a commit that referenced this pull request Feb 27, 2015

Auto merge of #22669 - dotdash:fast_slice_iter, r=huonw
This adds the assume() calls back that got lost when rebasing #21886.

bors added a commit that referenced this pull request Feb 28, 2015

Auto merge of #22669 - dotdash:fast_slice_iter, r=huonw
This adds the assume() calls back that got lost when rebasing #21886.
@frol

This comment has been minimized.

Copy link

frol commented May 9, 2018

Could someone, please, point me to a tracking issue or a documentation about the excessive null-checks?

I want to learn more about the cause of extra testq %rsi, %rsi (test rsi, rsi) blocks when I compile the following snippets:

pub fn foo(s: &[u32]) -> u32 {
    s.iter().sum()
}
pub fn foo(s: &[u32]) -> u32 {
    s.iter().sum()
}
pub fn foo(s: Vec<i32>) -> usize {
    s.len()
}

and even

pub fn foo(_s: Vec<i32>) -> usize {
    0
}

produces assembly code which might have been avoided:

playground::foo:
	movq	8(%rdi), %rsi
	testq	%rsi, %rsi
	je	.LBB0_2
	pushq	%rax
	movq	(%rdi), %rdi
	shlq	$2, %rsi
	movl	$4, %edx
	callq	__rust_dealloc@PLT
	addq	$8, %rsp

.LBB0_2:
	xorl	%eax, %eax
	retq
@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented May 9, 2018

In both these cases the length, rather than pointer is being checked. For the slice case as early exit before a vectorised solution is used, for Vec drop case because Vec may actually not have backing storage (Vec::new does not allocate).

@frol

This comment has been minimized.

Copy link

frol commented May 9, 2018

for Vec drop case because Vec may actually not have backing storage (Vec::new does not allocate).

Indeed, I completely forgot that the owned object must be deallocated here.

For the slice case as early exit before a vectorised solution is used

Right...

@nagisa I am sorry for the trouble! Rust is awesome! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.