Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add choose_multiple_fill for SliceRandom #982

Closed
mendess opened this issue May 27, 2020 · 2 comments
Closed

Add choose_multiple_fill for SliceRandom #982

mendess opened this issue May 27, 2020 · 2 comments

Comments

@mendess
Copy link

mendess commented May 27, 2020

Background

Currently IteratorRandom provides an allocation free way to sample members from it with choose_multiple_fill, problem is it's O(n) over the size of the iterator. This makes sense but then the docs point to SliceRandom::choose_multiple which is O(amount) but it always allocates a new vec of indexes to back the iterator it returns.

I need to sample exactly 2 members of a slice in a loop, it would be nice to have a way to avoid allocating in a loop that doesn't need to as this will be a major performance hit.

Right now I have to do

let (u, w) = slice
    .choose(&mut rng)
    .and_then(|u| loop {
        let w = slice.choose(&mut rng)?;
        if w != u {
            break Some((u, w));
        }
    })?;

which I doubt is a good way to achieve randomness.

Feature request

Add to SliceRandom

fn choose_multiple_fill(
    &self,
    rng: &mut Rng,
    slice: &mut [Self::Item]
) -> usize

I can try to write a PR for this if it seems feasible but I don't have experience with rng, I would look at the rest of the implementations, specially SliceRandom::choose_multiple for inspiration.

@mendess mendess changed the title choose_multiple_fill for SliceRandom Add choose_multiple_fill for SliceRandom May 27, 2020
@dhardy
Copy link
Member

dhardy commented May 27, 2020

which I doubt is a good way to achieve randomness

Rejection sampling is a very good approach in this case (amount = 2). For more general algorithms, take a look at src/seq/index.rs. There is at least one other algorithm that ended up being coded but not included here.

All these have something in common: they need some buffer to store selected indices in. We can't directly use the output buffer, since we need to avoid sampling any index multiple times but multiple copies of an output value may be valid (if appearing in your slice multiple times). So the best we could do without allocating is:

pub fn sample_buf<R>(rng: &mut R, length: usize, buf: &mut [usize])
where R: Rng + ?Sized { ... }

Unfortunately, we'd have to duplicate all the methods in index.rs, which isn't really desirable (though it isn't that much code).

Note further: IndexVec specifically supports output to Vec<u32> as well as the more general Vec<usize>, since for most cases u32 is large enough and the extra memory-usage of usize (on 64-bit platforms) has real performance impacts. You can't do the same with the method above, so it could be slower despite avoiding allocation — although for large amount you probably need to allocate anyway.

@dhardy
Copy link
Member

dhardy commented Jul 20, 2024

This ("sample exactly 2 members of a slice") was implemented in #1453.

@dhardy dhardy closed this as completed Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants