Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance issue #15

Closed
imhun opened this issue Apr 28, 2018 · 7 comments
Closed

performance issue #15

imhun opened this issue Apr 28, 2018 · 7 comments

Comments

@imhun
Copy link
Contributor

imhun commented Apr 28, 2018

Hi,On my mac book pro (8c 16g), the library has a large performance difference compared to the code I manually implement.
I read the library implementation code, logically it should be the same, but why is there such a big difference in performance?
I didn't read all the library code carefully, could you please give me some advice on whether my test method is wrong or the code of the library itself?
rustc 1.27.0-nightly,the bench result:

running 4 tests    
test get_bitfield ... bench:      39,371 ns/iter (+/- 2,489)
test get_trivial  ... bench:         901 ns/iter (+/- 102)
test set_bitfield ... bench:      39,439 ns/iter (+/- 779)
test set_trivial  ... bench:       2,752 ns/iter (+/- 76)

my trivial code and bench code:

#![feature(test)]

extern crate bit_field;

use bit_field::*;

pub trait BitOper {
    fn get_b(&self, idx: usize) -> bool;
    fn set_b(&mut self, idx: usize, val: bool);
    fn toggle(&mut self, idx: usize);
}

impl BitOper for u8 {
    fn set_b(&mut self, idx: usize, val: bool) {
        assert!(idx < 8);
        if val {
            *self |= 1 << idx;
        } else {
            *self &= !(1 << idx);
        }
    }

    fn get_b(&self, idx: usize) -> bool {
        assert!(idx < 8);
        (self & 1 << idx) != 0
    }

    fn toggle(&mut self, idx: usize) {
        assert!(idx < 8);
        *self ^= 1 << idx;
    }
}

impl BitOper for Vec<u8> {
    fn get_b(&self, idx: usize) -> bool {
        self[idx / 8].get_b((idx % 8))
    }

    fn set_b(&mut self, idx: usize, val: bool) {
        self[idx / 8].set_b((idx % 8), val);
    }

    fn toggle(&mut self, idx: usize) {
        self[idx / 8].toggle((idx % 8));
    }
}

extern crate test;

use test::Bencher;

const len: usize = 256;

#[bench]
fn set_bitfield(b: &mut Bencher) {
    let mut v = vec![0u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            v.as_mut_slice().set_bit(i, true);;
        }
    });
}

#[bench]
fn set_trivial(b: &mut Bencher) {
    let mut v = vec![0u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            v.set_b(i, true);
        }
    });
}

#[bench]
fn get_bitfield(b: &mut Bencher) {
    let mut v = vec![1u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            let b = v.as_mut_slice().get_bit(i);
        }
    });
}

#[bench]
fn get_trivial(b: &mut Bencher) {
    let mut v = vec![1u8; len];

    b.iter(|| {
        for i in 0..v.len() * 8 {
            let b = v.get_b(i);
        }
    });
}
@kjetilkjeka
Copy link
Contributor

kjetilkjeka commented Apr 28, 2018

I get similar results on my work computer (Ubuntu 16.04, i7-8700k)

running 4 tests
test get_bitfield ... bench:      25,142 ns/iter (+/- 719)
test get_trivial  ... bench:         676 ns/iter (+/- 18)
test set_bitfield ... bench:      26,329 ns/iter (+/- 1,106)
test set_trivial  ... bench:       1,800 ns/iter (+/- 63)

@kjetilkjeka
Copy link
Contributor

The problem seems to be a mix between inlining and const evaluation. I have produced a patch with the following benchmarks:

running 4 tests
test get_bitfield ... bench:         510 ns/iter (+/- 16)
test get_trivial  ... bench:         681 ns/iter (+/- 24)
test set_bitfield ... bench:       1,814 ns/iter (+/- 59)
test set_trivial  ... bench:       1,803 ns/iter (+/- 60)

PR and a couple of new issues inc

@kjetilkjeka
Copy link
Contributor

So it looks like @phil-opp already have fixed half the problem (const stuff) in #13

The other half of the problem is that rust will not inline stuff between crate boundaries unless functions are marked with #[inline] or lto is used (allegedly, not tried myself).

After #13 and #16 is addressed i will suggest inlining relevant methods in BitField and BitArray.

@phil-opp
Copy link
Owner

Thanks for reporting and investigating! Seems like we should mark the relevant functions with #[inline].

@kjetilkjeka
Copy link
Contributor

kjetilkjeka commented Apr 28, 2018

BitField and BitArray methods are now similar to the "trivial" functions (on my comp). @huwsun you should try a re-run with the master branch to confirm this works for you as well.

Here are the benchmarks:

running 4 tests
test get_bitfield ... bench:         509 ns/iter (+/- 14)
test get_trivial  ... bench:         678 ns/iter (+/- 171)
test set_bitfield ... bench:       1,802 ns/iter (+/- 94)
test set_trivial  ... bench:       1,793 ns/iter (+/- 49)

@imhun
Copy link
Contributor Author

imhun commented Apr 29, 2018

Good job! Now the result on my benchmark:

running 4 tests
test get_bitfield ... bench:         686 ns/iter (+/- 48)
test get_trivial  ... bench:         672 ns/iter (+/- 45)
test set_bitfield ... bench:       2,716 ns/iter (+/- 66)
test set_trivial  ... bench:       2,787 ns/iter (+/- 263)

@phil-opp
Copy link
Owner

Awesome, thanks so much for your work @kjetilkjeka! Seems like we can close this issue then.

It would be great these benchmarks to the repository. Would you be willing to open a pull request, @huwsun? (I also like the idea of adding a CI test that fails on huge regressions like proposed in #17, but we can always to this in a follow up PR.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants