
Meta: add bit / bit-range sampling to RngCore #1147

Closed
dhardy opened this issue Jul 16, 2021 · 4 comments

Comments


dhardy commented Jul 16, 2021

This is a topic that has come up a few times and deserves its own issue. Related:


dhardy commented Jul 16, 2021

Summarising from #1014:

[@fsaad] Does your API for Rng support efficient and lazy production of random bits one at a time ...

[myself] Good question. No. Theoretically it could, even using different approaches depending on the RNG, but we chose not to do that. For simpler PRNGs we use no buffer at all and for block-based PRNGs we use a block which we read off one word (32-bit) at a time.

Another factor to consider (for both mutable distributions and buffered RNGs) is reproducible behaviour. If we allowed out-of-order usage of random bits/bytes, it would become much harder to reason about, and especially to document, the behaviour, making reproducibility hard. If we don't allow this, then every read has to either be compatible with misaligned buffers (slow) or check and skip ahead to the next alignment point (also slow). At this point we're no longer optimising one RNG+distribution pair but expected usage of the RNG across a number of distributions.

I suppose we could side-step that problem by having a special RNG optimised for bit- or byte-sized reads, though that's another step in complexity.
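Such a bit-oriented RNG might look like the following wrapper (a hypothetical sketch, not part of rand's API; the `BitCache` name and the xorshift source are illustrative assumptions):

```rust
/// Hypothetical wrapper caching bits from a 64-bit source so that
/// single-bit draws are cheap; not part of the rand crate.
struct BitCache<F: FnMut() -> u64> {
    source: F,
    cache: u64,     // buffered random bits; low `remaining` bits are unused so far
    remaining: u32, // number of valid bits left in `cache`
}

impl<F: FnMut() -> u64> BitCache<F> {
    fn new(source: F) -> Self {
        Self { source, cache: 0, remaining: 0 }
    }

    /// Draw one random bit, refilling the cache once every 64 draws.
    fn next_bit(&mut self) -> bool {
        if self.remaining == 0 {
            self.cache = (self.source)();
            self.remaining = 64;
        }
        let bit = self.cache & 1 == 1;
        self.cache >>= 1;
        self.remaining -= 1;
        bit
    }
}

fn main() {
    // A tiny xorshift64 source, for illustration only.
    let mut state = 0x9e3779b97f4a7c15u64;
    let mut cache = BitCache::new(move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    });
    // 64 single-bit draws consume exactly one word from the source.
    let ones: u32 = (0..64).map(|_| cache.next_bit() as u32).sum();
    assert!(ones <= 64);
}
```

The cost of the added branch and bookkeeping is exactly the complexity trade-off discussed above.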

[@newpavlov] I think we can use a single counter for used bits, which would be incremented by 8 for a byte read, 32 for a u32, etc. We would need to perform additional computations to correctly align the read point (e.g. if the index is equal to 9, we would have to align it to 16 for reading a byte and to 64 for reading a u32). It would degrade performance a bit, but I am not sure if it will be noticeable.

As an advantage, this approach will be less wasteful (and thus more performant) for users who primarily use fill_bytes. But it would change the behavior of BlockRng between releases, which may create problems for users who rely on reproducibility.
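The rounding step behind this counter idea can be sketched as a small helper (hypothetical, not rand API; note the comment above aligns a u32 read further, to a 64-bit boundary, while this sketch rounds only to the read's own width):

```rust
/// Round `bit_index` up to the next multiple of `align_bits`, which must
/// be a power of two (8 for a byte read, 32 for a u32 read, and so on).
fn align_index(bit_index: usize, align_bits: usize) -> usize {
    debug_assert!(align_bits.is_power_of_two());
    (bit_index + align_bits - 1) & !(align_bits - 1)
}

fn main() {
    // With 9 bits already consumed, a byte read starts at bit 16.
    assert_eq!(align_index(9, 8), 16);
    // Rounding only to the read's own width, a u32 read starts at bit 32.
    assert_eq!(align_index(9, 32), 32);
    // Already-aligned indices are unchanged.
    assert_eq!(align_index(16, 8), 16);
}
```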


dhardy commented Jul 16, 2021

@riking posted:

Re compromising performance on pure u64: I think you can reduce that cost to a single branch.

The algorithm consumes the random numbers bit-by-bit.
[...] reproducible behavior

First, we discard this as a design constraint when the application is making RNG-dependent range requests. I do not believe any application should expect seeded RNG stability under a reordering of range request sizes.

Deterministic behavior for a seeded RNG under a deterministic request stream should be preserved.

(Note: it is theoretically possible for this to weaken the stream somehow, when considered as a complete system with the application. Probably a fun research paper or game speedrun investigation for someone.)


The "one branch" algorithm is then simple: for any "large enough" requests, bypass or discard the buffer.

Any requests for over half the source RNG bit length (>32.0 bits) should bypass the buffer, as they are unlikely to be followed by small enough requests. Note that 63 and 53 bits are common request lengths; there is little to no profit to be had in attempting to merge leftover bits there.

Any requests for more bits than are contained in the buffer (e.g. buffer has 2 bits remaining, request wants 6) should discard the remaining buffer instead of attempting to merge the leftovers.
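The two rules above might be sketched like this (a hypothetical `BitBuffer`, not part of rand; the >32-bit bypass threshold and the discard-on-shortfall policy follow the comment, while the xorshift source is an illustrative assumption):

```rust
/// Hypothetical bit buffer over a 64-bit source, illustrating the
/// "one branch" policy: large requests bypass the buffer, and requests
/// larger than the remaining buffered bits discard the leftovers.
struct BitBuffer<F: FnMut() -> u64> {
    source: F,
    bits: u64, // buffered random bits (low `len` bits are valid)
    len: u32,  // number of valid bits in `bits`
}

impl<F: FnMut() -> u64> BitBuffer<F> {
    fn new(source: F) -> Self {
        Self { source, bits: 0, len: 0 }
    }

    /// Return `n` random bits (1..=64) in the low bits of the result.
    fn next_bits(&mut self, n: u32) -> u64 {
        let mask = if n == 64 { u64::MAX } else { (1u64 << n) - 1 };
        if n > 32 {
            // Over half the source width: bypass the buffer entirely.
            return (self.source)() & mask;
        }
        if n > self.len {
            // Not enough buffered bits: discard the leftovers and refill.
            self.bits = (self.source)();
            self.len = 64;
        }
        let out = self.bits & mask;
        self.bits >>= n;
        self.len -= n;
        out
    }
}

fn main() {
    // A tiny xorshift64 source, for illustration only.
    let mut state = 0x9e3779b97f4a7c15u64;
    let mut buf = BitBuffer::new(move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    });
    let b = buf.next_bits(6); // served from the buffer
    assert!(b < 64);
    let w = buf.next_bits(53); // > 32 bits: bypasses the buffer
    assert!(w < (1u64 << 53));
}
```

Only the `n > 32` comparison sits on the common path for large requests, which is the single branch the comment refers to.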


dhardy commented Jul 16, 2021

@riking so your suggestion is essentially to add a method like this?

```rust
pub trait RngCore {
    // existing methods here

    /// Request `n_bits` random bits, written to the low bits of `buf`
    ///
    /// If this method is used, all guarantees regarding reproducibility are void.
    fn fill_bits(&mut self, buf: &mut [u8], n_bits: usize) {
        // Default impl over other methods for backwards compatibility;
        // implementors may provide a faster impl as suggested over a buffer
        let n_bytes = (n_bits + 7) / 8;
        self.fill_bytes(&mut buf[..n_bytes]);
        if n_bits % 8 != 0 {
            // mask off the unused high bits of the final byte
            buf[n_bytes - 1] &= (1 << (n_bits % 8)) - 1;
        }
    }
}
```

(Or even add a method `fn bit_buffer(&mut self) -> Option<&mut u8> { None }` which `fill_bits` uses, relying heavily on inlining for performance.)

This may be worth investigating, though I'm a little sceptical of the performance and not entirely keen on the additional complexity of the API.


dhardy commented Oct 15, 2022

I'm going to close this:

  • It adds complexity to the API.
  • Small RNGs would have to choose between a naive (not useful) implementation of this API and adding a buffer, which adds complexity (and likely overhead) to their other methods.
  • If someone were really interested, they should implement a demo showing the (performance) justification for this change.
  • Even if this approach were useful, it would likely only be so in niche applications, and elsewhere would just be overhead; it therefore does not make sense for the baseline.
