performance issue #15

imhun · 2018-04-28T09:08:52Z

Hi,On my mac book pro (8c 16g), the library has a large performance difference compared to the code I manually implement.
I read the library implementation code, logically it should be the same, but why is there such a big difference in performance?
I didn't read all the library code carefully, could you please give me some advice on whether my test method is wrong or the code of the library itself?
rustc 1.27.0-nightly,the bench result:

running 4 tests    
test get_bitfield ... bench:      39,371 ns/iter (+/- 2,489)
test get_trivial  ... bench:         901 ns/iter (+/- 102)
test set_bitfield ... bench:      39,439 ns/iter (+/- 779)
test set_trivial  ... bench:       2,752 ns/iter (+/- 76)

my trivial code and bench code:

#![feature(test)]

extern crate bit_field;

use bit_field::*;

pub trait BitOper {
    fn get_b(&self, idx: usize) -> bool;
    fn set_b(&mut self, idx: usize, val: bool);
    fn toggle(&mut self, idx: usize);
}

impl BitOper for u8 {
    fn set_b(&mut self, idx: usize, val: bool) {
        assert!(idx < 8);
        if val {
            *self |= 1 << idx;
        } else {
            *self &= !(1 << idx);
        }
    }

    fn get_b(&self, idx: usize) -> bool {
        assert!(idx < 8);
        (self & 1 << idx) != 0
    }

    fn toggle(&mut self, idx: usize) {
        assert!(idx < 8);
        *self ^= 1 << idx;
    }
}

impl BitOper for Vec<u8> {
    fn get_b(&self, idx: usize) -> bool {
        self[idx / 8].get_b((idx % 8))
    }

    fn set_b(&mut self, idx: usize, val: bool) {
        self[idx / 8].set_b((idx % 8), val);
    }

    fn toggle(&mut self, idx: usize) {
        self[idx / 8].toggle((idx % 8));
    }
}

extern crate test;

use test::Bencher;

const len: usize = 256;

#[bench]
fn set_bitfield(b: &mut Bencher) {
    let mut v = vec![0u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            v.as_mut_slice().set_bit(i, true);;
        }
    });
}

#[bench]
fn set_trivial(b: &mut Bencher) {
    let mut v = vec![0u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            v.set_b(i, true);
        }
    });
}

#[bench]
fn get_bitfield(b: &mut Bencher) {
    let mut v = vec![1u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            let b = v.as_mut_slice().get_bit(i);
        }
    });
}

#[bench]
fn get_trivial(b: &mut Bencher) {
    let mut v = vec![1u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            let b = v.get_b(i);
        }
    });
}

kjetilkjeka · 2018-04-28T09:29:51Z

I get similar results on my work computer (Ubuntu 16.04, i7-8700k)

running 4 tests
test get_bitfield ... bench:      25,142 ns/iter (+/- 719)
test get_trivial  ... bench:         676 ns/iter (+/- 18)
test set_bitfield ... bench:      26,329 ns/iter (+/- 1,106)
test set_trivial  ... bench:       1,800 ns/iter (+/- 63)

kjetilkjeka · 2018-04-28T10:03:31Z

The problem seems to be a mix between inlining and const evaluation. I have produced a patch with the following benchmarks:

running 4 tests
test get_bitfield ... bench:         510 ns/iter (+/- 16)
test get_trivial  ... bench:         681 ns/iter (+/- 24)
test set_bitfield ... bench:       1,814 ns/iter (+/- 59)
test set_trivial  ... bench:       1,803 ns/iter (+/- 60)

PR and a couple of new issues inc

kjetilkjeka · 2018-04-28T10:21:01Z

So it looks like @phil-opp already have fixed half the problem (const stuff) in #13

The other half of the problem is that rust will not inline stuff between crate boundaries unless functions are marked with #[inline] or lto is used (allegedly, not tried myself).

After #13 and #16 is addressed i will suggest inlining relevant methods in BitField and BitArray.

phil-opp · 2018-04-28T13:59:45Z

Thanks for reporting and investigating! Seems like we should mark the relevant functions with #[inline].

kjetilkjeka · 2018-04-28T17:24:24Z

BitField and BitArray methods are now similar to the "trivial" functions (on my comp). @huwsun you should try a re-run with the master branch to confirm this works for you as well.

Here are the benchmarks:

running 4 tests
test get_bitfield ... bench:         509 ns/iter (+/- 14)
test get_trivial  ... bench:         678 ns/iter (+/- 171)
test set_bitfield ... bench:       1,802 ns/iter (+/- 94)
test set_trivial  ... bench:       1,793 ns/iter (+/- 49)

imhun · 2018-04-29T01:49:47Z

Good job! Now the result on my benchmark:

running 4 tests
test get_bitfield ... bench:         686 ns/iter (+/- 48)
test get_trivial  ... bench:         672 ns/iter (+/- 45)
test set_bitfield ... bench:       2,716 ns/iter (+/- 66)
test set_trivial  ... bench:       2,787 ns/iter (+/- 263)

phil-opp · 2018-04-29T19:55:35Z

Awesome, thanks so much for your work @kjetilkjeka! Seems like we can close this issue then.

It would be great these benchmarks to the repository. Would you be willing to open a pull request, @huwsun? (I also like the idea of adding a CI test that fails on huge regressions like proposed in #17, but we can always to this in a follow up PR.)

kjetilkjeka mentioned this issue Apr 28, 2018

Should there be benchmarking tests as part of the test suite? #17

Open

phil-opp closed this as completed Apr 29, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

performance issue #15

performance issue #15

imhun commented Apr 28, 2018 •

edited

Loading

kjetilkjeka commented Apr 28, 2018 •

edited

Loading

kjetilkjeka commented Apr 28, 2018

kjetilkjeka commented Apr 28, 2018

phil-opp commented Apr 28, 2018

kjetilkjeka commented Apr 28, 2018 •

edited

Loading

imhun commented Apr 29, 2018

phil-opp commented Apr 29, 2018

performance issue #15

performance issue #15

Comments

imhun commented Apr 28, 2018 • edited Loading

kjetilkjeka commented Apr 28, 2018 • edited Loading

kjetilkjeka commented Apr 28, 2018

kjetilkjeka commented Apr 28, 2018

phil-opp commented Apr 28, 2018

kjetilkjeka commented Apr 28, 2018 • edited Loading

imhun commented Apr 29, 2018

phil-opp commented Apr 29, 2018

imhun commented Apr 28, 2018 •

edited

Loading

kjetilkjeka commented Apr 28, 2018 •

edited

Loading

kjetilkjeka commented Apr 28, 2018 •

edited

Loading