Optimise Rng::gen on arrays? #269

Closed
dhardy opened this issue Feb 23, 2018 · 11 comments
Labels: C-optimisation, P-low (Priority: Low), P-postpone (Waiting on something else)

Comments

@dhardy
Member

dhardy commented Feb 23, 2018

I noticed that Rng::gen and Rng::fill don't do the same thing on arrays; they produce different values, and their performance differs, especially when the element size is small:

  • rng.gen() essentially does [rng.gen(), rng.gen(), ..., rng.gen()]
  • rng.fill(&mut buf) casts buf to &mut [u8] then uses RngCore::fill_bytes — more efficient, and only implemented for integer arrays/slices

We likely want to keep the current behaviour of rng.gen() when the element type is f32, f64 or a complex type, but for integer types, can we optimise it to use fill_bytes?
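
For illustration, a minimal sketch of the two call patterns (assuming a rand 0.8-era API; thread_rng and the array sizes are just for the example):

use rand::{thread_rng, Rng};

fn main() {
    let mut rng = thread_rng();

    // Element-wise path: equivalent to [rng.gen(), rng.gen(), ...]
    let a: [u32; 8] = rng.gen();

    // Byte-filling path: views the buffer as &mut [u8] and uses fill_bytes
    let mut b = [0u32; 8];
    rng.fill(&mut b[..]);

    // Both are uniformly distributed, but from the same RNG state the two
    // paths generally yield different values.
    println!("{:?}\n{:?}", a, b);
}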

@clarfonthey

Perhaps fill and try_fill should only work on byte slices and other arrays can just use distributions?

@pitdicker
Contributor

> Perhaps fill and try_fill should only work on byte slices and other arrays can just use distributions?

That is how they work right now 😄.

This issue asks whether we can/should optimise the rng.gen() implementation for integer arrays to also use fill, as if the array were a slice.

@dhardy
Member Author

dhardy commented Mar 8, 2018

Actually fill also works on integer slices/arrays; it's fill_bytes which is restricted to byte slices.

The point is that you can write let x: [u32; 32] = rng.gen(); in current rand, but it would be much faster to write the following (which produces a different result, though one that is statistically equivalent):

let mut x = [0u32; 32];
rng.fill(&mut x);

@dhardy
Member Author

dhardy commented Jun 2, 2018

Well, the current gen() (aka Standard) implementation for arrays is actually very fast:

test gen_1k_fill                     ... bench:         377 ns/iter (+/- 16) = 2716 MB/s
test gen_1k_gen_array                ... bench:         261 ns/iter (+/- 5) = 3923 MB/s
test gen_1k_gen_iter                 ... bench:         633 ns/iter (+/- 20) = 1617 MB/s
test gen_1k_iter_repeat              ... bench:         383 ns/iter (+/- 11) = 2673 MB/s
test gen_1k_sample_iter              ... bench:         390 ns/iter (+/- 8) = 2625 MB/s

... for u64 elements, at least.

I found a way to make gen() use fill on arrays:

// Specialized blanket implementation:
impl<T> Distribution<T> for Standard where T: AsByteSliceMut + Default {
    #[inline]
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T {
        let mut result: T = Default::default();
        rng.fill(&mut result);
        result
    }
}

The problems are (1) that it conflicts with other implementations (even using #![feature(specialization)] on nightly, all versions I've tried appear to conflict), and (2) that for u64 at least this version actually hurts performance.

If we can sort out the specialisation (when Rust supports it on stable, or alternatively negative trait bounds), and we restrict to small types (u8, i8, probably also u16 and i16), then this may work:

// within the array_impl macro:
        // Specialized implementation for byte arrays:
        impl Distribution<[u8; $n]> for Standard where [u8; $n]: AsByteSliceMut + Default {
            #[inline]
            fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> [u8; $n] {
                let mut result: [u8; $n] = Default::default();
                rng.fill(&mut result);
                result
            }
        }

For now, I believe the best option is to do nothing. (We could implement the above instead of the generic Distribution<[T; $n]> support, but that restricts users: any array type not specifically implemented would cease to be supported, e.g. [[u64; 32]; 4] currently works.)

@dhardy dhardy added the P-postpone (Waiting on something else) label Jun 2, 2018
@dhardy
Member Author

dhardy commented Jun 3, 2019

I propose we close this with documentation only (provided in #812): although using specialisation as above might be possible in the future, value-stability is important, and this specialisation would yield different values for many types (with the actual behaviour depending on the RNG).

It appears that specialization will not be available in the near future (see the specialization tracking issue). Would we want to make a value-breaking change when it becomes available?

Also, any such implementation would have to make a fixed choice about which types to use the fill "optimisation" for, while the fastest method may depend on RNG and platform.

@vks
Collaborator

vks commented Jun 3, 2019

This might be a better option:

diff --git a/src/distributions/other.rs b/src/distributions/other.rs
index 2295f79..883b332 100644
--- a/src/distributions/other.rs
+++ b/src/distributions/other.rs
@@ -137,10 +137,13 @@ macro_rules! array_impl {
     {$n:expr, $t:ident, $($ts:ident,)*} => {
         array_impl!{($n - 1), $($ts,)*}
 
-        impl<T> Distribution<[T; $n]> for Standard where Standard: Distribution<T> {
+        impl<T> Distribution<[T; $n]> for Standard
+        where Standard: Distribution<T>, [T]: crate::AsByteSliceMut, T: Default {
             #[inline]
-            fn sample<R: Rng + ?Sized>(&self, _rng: &mut R) -> [T; $n] {
-                [_rng.gen::<$t>(), $(_rng.gen::<$ts>()),*]
+            fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> [T; $n] {
+                let mut a = Default::default();
+                rng.fill(&mut a);
+                a
             }
         }
     };

The Distribution<T> requirement is not needed, but it avoids implementing the trait for potentially unwanted types. This does not work for [[u64; 32]; 4], but arguably that is not very important and could be fixed by implementing AsByteSliceMut for nested arrays.
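
As a rough sketch of what a nested-array impl might look like (assuming the rand 0.7-era trait shape of as_byte_slice_mut plus to_le; a local stand-in trait is defined here because the real impl would have to live inside rand):

use core::slice;

// Local stand-in with the same shape as rand's AsByteSliceMut.
pub trait AsByteSliceMut {
    fn as_byte_slice_mut(&mut self) -> &mut [u8];
    fn to_le(&mut self);
}

// Hypothetical impl for the nested-array example above: view the whole
// contiguous [[u64; 32]; 4] block as a single byte buffer.
impl AsByteSliceMut for [[u64; 32]; 4] {
    fn as_byte_slice_mut(&mut self) -> &mut [u8] {
        let len = 4 * 32 * core::mem::size_of::<u64>();
        // Safety: the nested array is one contiguous, initialised allocation.
        unsafe { slice::from_raw_parts_mut(self.as_mut_ptr() as *mut u8, len) }
    }

    fn to_le(&mut self) {
        for row in self.iter_mut() {
            for x in row.iter_mut() {
                *x = x.to_le();
            }
        }
    }
}

A real version would presumably be generated by a macro (or const generics) over element types and shapes rather than written per type.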

Of course, this is still a regression for types that don't implement AsByteSliceMut.

@dhardy
Member Author

dhardy commented Jun 3, 2019

This doesn't work for many types, e.g. [f32; 3]. Is byte-buffer gen performance more important than being able to generate arrays of other types?

@vks
Collaborator

vks commented Jun 3, 2019

Fair enough. However, it would be nice if fill could support floats. I think it is fine to close this issue, following your argument in #269 (comment).

@dhardy
Member Author

dhardy commented Jun 3, 2019

Perhaps we got the design of AsByteSliceMut wrong, and instead we should have:

pub trait Fill {
    fn try_fill<R: Rng + ?Sized>(&mut self, rng: &mut R) -> Result<(), Error>;
}

With that, we could implement support for float slices; we could also support gen for any type implementing Fill + Default (so long as we avoided overlapping implementations), which would allow this optimisation and make arr = rng.gen() and rng.fill(&mut arr) value-equivalent on arrays.
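
A rough, self-contained sketch of that direction (the Fill trait is as proposed above; gen_via_fill is a hypothetical stand-in for a Distribution impl on Standard, which would need the overlap problem solved first):

use rand::{thread_rng, Error, Rng};

pub trait Fill {
    fn try_fill<R: Rng + ?Sized>(&mut self, rng: &mut R) -> Result<(), Error>;
}

impl Fill for [f32] {
    fn try_fill<R: Rng + ?Sized>(&mut self, rng: &mut R) -> Result<(), Error> {
        for x in self.iter_mut() {
            *x = rng.gen(); // each element sampled as gen() would sample it
        }
        Ok(())
    }
}

impl<const N: usize> Fill for [f32; N] {
    fn try_fill<R: Rng + ?Sized>(&mut self, rng: &mut R) -> Result<(), Error> {
        let s: &mut [f32] = self;
        s.try_fill(rng)
    }
}

// Hypothetical stand-in for `impl Distribution<T> for Standard where T: Fill + Default`.
fn gen_via_fill<T: Fill + Default, R: Rng + ?Sized>(rng: &mut R) -> T {
    let mut value = T::default();
    value.try_fill(rng).expect("RNG failure");
    value
}

fn main() {
    let mut rng = thread_rng();
    let v: [f32; 3] = gen_via_fill(&mut rng);
    println!("{:?}", v);
}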

Possibly this was not done because fill_bytes and try_fill_bytes are not guaranteed to be semantically equivalent, so a correct implementation would require both (at least for everything using byte filling internally). Now that we have a simpler Error type, it is hard to see try_fill_bytes(buf).unwrap() behaving differently from fill_bytes(buf), although technically it is still possible.
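
For context, a sketch of one way an RngCore implementation can keep the two byte-filling methods in lock-step (the wrapper type here is purely illustrative):

use rand::{Error, RngCore};

// Forward the infallible method to the fallible one, so fill_bytes(buf)
// and try_fill_bytes(buf).unwrap() cannot silently diverge.
struct ForwardingRng<R: RngCore>(R);

impl<R: RngCore> RngCore for ForwardingRng<R> {
    fn next_u32(&mut self) -> u32 {
        self.0.next_u32()
    }
    fn next_u64(&mut self) -> u64 {
        self.0.next_u64()
    }
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        self.try_fill_bytes(dest).expect("RNG failure")
    }
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Error> {
        self.0.try_fill_bytes(dest)
    }
}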

@vks
Collaborator

vks commented Jun 3, 2019

I agree that Fill seems like the better abstraction. Maybe it's better to do that for 0.8 though.

I would not worry too much about fill_bytes vs. try_fill_bytes; we have similar problems with next_u32 and next_u64.

@vks
Collaborator

vks commented Sep 15, 2021

Fill is now implemented and documented, including support for floats.
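
For completeness, usage with a rand release from around the time of this comment (0.8.x), assuming the documented Fill impls cover integer and float slices as stated above:

use rand::{thread_rng, Rng};

fn main() {
    let mut rng = thread_rng();

    // Integer slices, as before:
    let mut ints = [0u64; 4];
    rng.fill(&mut ints[..]);

    // Float slices, via the Fill support mentioned above:
    let mut floats = [0.0f32; 4];
    rng.fill(&mut floats[..]);

    println!("{:?}\n{:?}", ints, floats);
}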
