
Meta: add bit / bit-range sampling to RngCore #1147

Closed
dhardy opened this issue Jul 16, 2021 · 4 comments

Comments


dhardy commented Jul 16, 2021

This is a topic that has come up a few times and deserves its own issue. Related:


dhardy commented Jul 16, 2021

Summarising from #1014:

[@fsaad] Does your API for Rng support efficient and lazy production of random bits one at a time ...

[myself] Good question. No. Theoretically it could, even using different approaches depending on the RNG, but we chose not to do that. For simpler PRNGs we use no buffer at all and for block-based PRNGs we use a block which we read off one word (32-bit) at a time.

Another factor to consider (for both mutable distributions and buffered RNGs) is reproducible behaviour. If we allowed out-of-order usage of random bits/bytes, it would become much harder to reason about, and especially to document, the behaviour, making reproducibility hard. If we don't allow this, then every read has to either be compatible with misaligned buffers (slow) or check and skip ahead to the next alignment point (also slow). At this point we're no longer optimising one RNG+distribution pair but expected usage of the RNG across a number of distributions.

I suppose we could side-step that problem by having a special RNG optimised for bit- or byte-sized reads, though that's another step in complexity.
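Such a bit-oriented RNG might look like the following wrapper (a hypothetical sketch, not part of rand's API; the `BitCache` name and the xorshift source are illustrative assumptions):

```rust
/// Hypothetical wrapper caching bits from a 64-bit source so that
/// single-bit draws are cheap; not part of the rand crate.
struct BitCache<F: FnMut() -> u64> {
    source: F,
    cache: u64,     // buffered random bits; low `remaining` bits are unused so far
    remaining: u32, // number of valid bits left in `cache`
}

impl<F: FnMut() -> u64> BitCache<F> {
    fn new(source: F) -> Self {
        Self { source, cache: 0, remaining: 0 }
    }

    /// Draw one random bit, refilling the cache once every 64 draws.
    fn next_bit(&mut self) -> bool {
        if self.remaining == 0 {
            self.cache = (self.source)();
            self.remaining = 64;
        }
        let bit = self.cache & 1 == 1;
        self.cache >>= 1;
        self.remaining -= 1;
        bit
    }
}

fn main() {
    // A tiny xorshift64 source, for illustration only.
    let mut state = 0x9e3779b97f4a7c15u64;
    let mut cache = BitCache::new(move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    });
    // 64 single-bit draws consume exactly one word from the source.
    let ones: u32 = (0..64).map(|_| cache.next_bit() as u32).sum();
    assert!(ones <= 64);
}
```

The cost of the added branch and bookkeeping is exactly the complexity trade-off discussed above.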

[@newpavlov] I think we can use a single counter for used bits, which would be incremented by 8 for a byte read, 32 for a u32, etc. We would need to perform additional computations to correctly align the read point (e.g. if the index is equal to 9, we would have to align it to 16 for reading a byte and to 64 for reading a u32). It would degrade performance a bit, but I am not sure if it will be noticeable.

As an advantage, this approach will be less wasteful (and thus more performant) for users who primarily use fill_bytes. But it would change the behavior of BlockRng between releases, which may create problems for users who rely on reproducibility.
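The rounding step behind this counter idea can be sketched as a small helper (hypothetical, not rand API; note the comment above aligns a u32 read further, to a 64-bit boundary, while this sketch rounds only to the read's own width):

```rust
/// Round `bit_index` up to the next multiple of `align_bits`, which must
/// be a power of two (8 for a byte read, 32 for a u32 read, and so on).
fn align_index(bit_index: usize, align_bits: usize) -> usize {
    debug_assert!(align_bits.is_power_of_two());
    (bit_index + align_bits - 1) & !(align_bits - 1)
}

fn main() {
    // With 9 bits already consumed, a byte read starts at bit 16.
    assert_eq!(align_index(9, 8), 16);
    // Rounding only to the read's own width, a u32 read starts at bit 32.
    assert_eq!(align_index(9, 32), 32);
    // Already-aligned indices are unchanged.
    assert_eq!(align_index(16, 8), 16);
}
```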


dhardy commented Jul 16, 2021

@riking posted:

Re compromising performance on pure u64: I think you can reduce that cost to a single branch.

The algorithm consumes the random numbers bit-by-bit.
[...] reproducible behavior

First, we discard this as a design constraint when the application is making RNG-dependent range requests. I do not believe any application should expect seeded RNG stability under a reordering of range request sizes.

Deterministic behavior for a seeded RNG under a deterministic request stream should be preserved.

(Note: it is theoretically possible for this to weaken the stream somehow, when considered as a complete system with the application. Probably a fun research paper or game speedrun investigation for someone.)


The "one branch" algorithm is then simple: for any "large enough" requests, bypass or discard the buffer.

Any requests for over half the source RNG bit length (>32.0 bits) should bypass the buffer, as they are unlikely to be followed by small enough requests. Note that 63 and 53 bits are common request lengths; there is little to no profit to be had in attempting to merge leftover bits there.

Any requests for more bits than are contained in the buffer (e.g. buffer has 2 bits remaining, request wants 6) should discard the remaining buffer instead of attempting to merge the leftovers.
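The two rules above might be sketched like this (a hypothetical `BitBuffer`, not part of rand; the >32-bit bypass threshold and the discard-on-shortfall policy follow the comment, while the xorshift source is an illustrative assumption):

```rust
/// Hypothetical bit buffer over a 64-bit source, illustrating the
/// "one branch" policy: large requests bypass the buffer, and requests
/// larger than the remaining buffered bits discard the leftovers.
struct BitBuffer<F: FnMut() -> u64> {
    source: F,
    bits: u64, // buffered random bits (low `len` bits are valid)
    len: u32,  // number of valid bits in `bits`
}

impl<F: FnMut() -> u64> BitBuffer<F> {
    fn new(source: F) -> Self {
        Self { source, bits: 0, len: 0 }
    }

    /// Return `n` random bits (1..=64) in the low bits of the result.
    fn next_bits(&mut self, n: u32) -> u64 {
        let mask = if n == 64 { u64::MAX } else { (1u64 << n) - 1 };
        if n > 32 {
            // Over half the source width: bypass the buffer entirely.
            return (self.source)() & mask;
        }
        if n > self.len {
            // Not enough buffered bits: discard the leftovers and refill.
            self.bits = (self.source)();
            self.len = 64;
        }
        let out = self.bits & mask;
        self.bits >>= n;
        self.len -= n;
        out
    }
}

fn main() {
    // A tiny xorshift64 source, for illustration only.
    let mut state = 0x9e3779b97f4a7c15u64;
    let mut buf = BitBuffer::new(move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    });
    let b = buf.next_bits(6); // served from the buffer
    assert!(b < 64);
    let w = buf.next_bits(53); // > 32 bits: bypasses the buffer
    assert!(w < (1u64 << 53));
}
```

Only the `n > 32` comparison sits on the common path for large requests, which is the single branch the comment refers to.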


dhardy commented Jul 16, 2021

@riking so your suggestion is essentially to add a method like this?

```rust
pub trait RngCore {
    // existing methods here

    /// Request `n_bits` random bits, written to the low bits of `buf`
    ///
    /// If this method is used, all guarantees regarding reproducibility are void.
    fn fill_bits(&mut self, buf: &mut [u8], n_bits: usize) {
        // Default impl over other methods for backwards compatibility;
        // implementors may provide a faster impl as suggested over a buffer
        let n_bytes = (n_bits + 7) / 8;
        self.fill_bytes(&mut buf[..n_bytes]);
        if n_bits % 8 != 0 {
            // mask off the unused high bits of the final byte
            buf[n_bytes - 1] &= (1 << (n_bits % 8)) - 1;
        }
    }
}
```

(Or even add a method `fn bit_buffer(&mut self) -> Option<&mut u8> { None }` which `fill_bits` uses, relying heavily on inlining for performance.)

This may be worth investigating, though I'm a little sceptical of the performance and not entirely keen on the additional complexity of the API.


dhardy commented Oct 15, 2022

I'm going to close this:

  • It adds complexity to the API.
  • Small RNGs would have to choose between a naive (not useful) implementation of this API and adding a buffer, which adds complexity (and likely overhead) to their other methods.
  • If someone were really interested, they should implement a demo showing the (performance) justification for this change.
  • Even if this approach were useful, it would likely only be so in niche applications, and elsewhere would just be overhead; it therefore does not make sense for the baseline.
