Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vec::from_elem with primitives should be as fast as calloc #7136

Closed
erickt opened this issue Jun 14, 2013 · 13 comments · Fixed by #10918
Closed

vec::from_elem with primitives should be as fast as calloc #7136

erickt opened this issue Jun 14, 2013 · 13 comments · Fixed by #10918

Comments

@erickt
Copy link
Contributor

erickt commented Jun 14, 2013

While @cmr landed a nice optimization of vec::from_elem in #6876, he said that it's still not performing as fast as doing a malloc and a ptr::set_memory. We should figure out why it is not performing as well as it should be and fix it in order to remove the temptation of using the faster unsafe functions.

@erickt
Copy link
Contributor Author

erickt commented Jun 15, 2013

It turns out this is substantially slower than I expected. With this test benchmark:

extern mod extra;

use std::vec;
use std::ptr;

#[bench]
fn bench_from_elem(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = vec::from_elem(1024, 0u8);
    }
}

#[bench]
fn bench_set_memory(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let mut v: ~[u8] = vec::with_capacity(1024);
        unsafe {
            let vp = vec::raw::to_mut_ptr(v);
            ptr::set_memory(vp, 0, 1024);
            vec::raw::set_len(&mut v, 1024);
        }
    }
}

fn main() {}

I'm getting these results:

running 2 tests
test bench_from_elem ... bench: 16351 ns/iter (+/- 192)
test bench_set_memory ... bench: 384 ns/iter (+/- 6)

This is related to #6623 and #7137.

@erickt
Copy link
Contributor Author

erickt commented Jun 15, 2013

@huonw mentioned in irc that with -O the difference is:

test bench_from_elem ... bench: 799 ns/iter (+/- 0)
test bench_set_memory ... bench: 200 ns/iter (+/- 0)

Which is much better, but still not great.

@emberian
Copy link
Member

memset "cheats" with sse. from_elem spends a lot of time in move_val_init without optimizations so there's a bunch of unnecessary function calls. With optimizations, the thing that's killing it is the copies.

@thestinger
Copy link
Contributor

LLVM knows how to optimize loops to the same code as memcpy/memmove/memset though, as long as you generate good IR.

http://llvm.org/docs/doxygen/html/LoopIdiomRecognize_8cpp_source.html

@luqmana
Copy link
Member

luqmana commented Jun 15, 2013

I added a bench for the literal vec repeat syntax as well:

#[bench]
fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}

Without optimizations:

running 3 tests
test bench_from_elem ... bench: 21803 ns/iter (+/- 597)
test bench_set_memory ... bench: 381 ns/iter (+/- 6)
test bench_vec_repeat ... bench: 2125 ns/iter (+/- 12)

With optimizations (-O):

running 3 tests
test bench_from_elem ... bench: 752 ns/iter (+/- 0)
test bench_set_memory ... bench: 183 ns/iter (+/- 0)
test bench_vec_repeat ... bench: 90 ns/iter (+/- 3)

Looking at the optimized IR, both set_memory and vec_repeat become a memset but set_memory seems to have a lot more overhead. from_elem does not become a memset hence the bad results.

@thestinger
Copy link
Contributor

@erickt: this gets a lot better with the optimization passes from #7466, from ~3x as slow to ~2x as slow

Before (opt-level=2):

test bench_from_elem ... bench: 986 ns/iter (+/- 3)
test bench_set_memory ... bench: 343 ns/iter (+/- 3)
test bench_vec_repeat ... bench: 175 ns/iter (+/- 3)

After (opt-level=2)

test bench_from_elem ... bench: 667 ns/iter (+/- 0)
test bench_set_memory ... bench: 343 ns/iter (+/- 0)
test bench_vec_repeat ... bench: 178 ns/iter (+/- 0)

@thestinger
Copy link
Contributor

@luqmana: the problem is with the code bloat from the surrounding code, rather than set_memory itself - I think we actually produce some IR there with undefined behaviour

@thestinger
Copy link
Contributor

#7682 greatly speeds up with_capacity, fixing half of the issue

Before:

test bench_from_elem ... bench: 683 ns/iter (+/- 3)
test bench_set_memory ... bench: 334 ns/iter (+/- 3)
test bench_vec_repeat ... bench: 176 ns/iter (+/- 0)

After:

test bench_from_elem ... bench: 461 ns/iter (+/- 0)
test bench_set_memory ... bench: 143 ns/iter (+/- 0)
test bench_vec_repeat ... bench: 148 ns/iter (+/- 0)

@thestinger
Copy link
Contributor

The performance of all 3 has improved, but from_elem got slower relative to the others:

test bench_from_elem ... bench: 409 ns/iter (+/- 4)
test bench_set_memory ... bench: 83 ns/iter (+/- 1)
test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)

@thestinger
Copy link
Contributor

It looks like the remaining issue is our pointer arithmetic being slow. The offset and mut_offset functions compile to conversions to and from integers, rather than actual pointer arithmetic with the stricter semantics. I think we need to make the + and - implementations for pointers a compiler feature.

bors added a commit that referenced this issue Jul 30, 2013
Closes #8118, #7136

~~~rust
extern mod extra;

use std::vec;
use std::ptr;

fn bench_from_elem(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = vec::from_elem(1024, 0u8);
    }
}

fn bench_set_memory(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let mut v: ~[u8] = vec::with_capacity(1024);
        unsafe {
            let vp = vec::raw::to_mut_ptr(v);
            ptr::set_memory(vp, 0, 1024);
            vec::raw::set_len(&mut v, 1024);
        }
    }
}

fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}
~~~

Before:

    test bench_from_elem ... bench: 415 ns/iter (+/- 17)
    test bench_set_memory ... bench: 85 ns/iter (+/- 4)
    test bench_vec_repeat ... bench: 83 ns/iter (+/- 3)

After:

    test bench_from_elem ... bench: 84 ns/iter (+/- 2)
    test bench_set_memory ... bench: 84 ns/iter (+/- 5)
    test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)
@thestinger
Copy link
Contributor

Fixed by #8121.

test bench_from_elem ... bench: 84 ns/iter (+/- 1)
test bench_set_memory ... bench: 84 ns/iter (+/- 3)
test bench_vec_repeat ... bench: 84 ns/iter (+/- 2)

@eddyb
Copy link
Member

eddyb commented Dec 10, 2013

This has regressed since that fix, @cmr is attempting a bisect atm.

Update: bisect finished, #8780 is to blame. This loop isn't optimized. Maybe the Finallyalizer drop method needs to be inlined for further optimizations.

@huonw huonw reopened this Dec 10, 2013
@emberian
Copy link
Member

Note that set_memory has regressed by ~16ns, cause is #9933 (df187a0)

@bors bors closed this as completed in 09bf5de Dec 14, 2013
flip1995 pushed a commit to flip1995/rust that referenced this issue Apr 27, 2021
…alid_sugg_macro_expansion, r=llogiq

manual_unwrap_or: fix invalid code suggestion, due to macro expansion

fixes rust-lang#6965

changelog: fix invalid code suggestion in `manual_unwrap_or` lint, due to macro expansion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants