Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating random unique integers (i.e. non-repeating, random permutations of an integer type) #270

Closed
cipriancraciun opened this issue Feb 25, 2018 · 6 comments
Labels
P-low Priority: Low

Comments

@cipriancraciun
Copy link

Sometimes one needs a source of "random looking" non-repeating integers (non repeating until all the possible values have been generated). In concrete terms the simplest such generator (without the "random looking" property) is a simple counter which generates the values min .. max. Another concrete example would be generating a random permutation of the min .. max range and iterating through that. (However the first is not "random looking" , and the second is not efficient.)

What would be the use-case for such a generator?

  • generating integer "handles" for various objects; although one could use plain counters, they have a few weaknesses: they are extremely "deterministic" and thus guessable, and they are not easily "seen" in logs (imagine trying to visually look through a log which displays many such handles and you're searching for handle "272132", but it appears between other handles in the range 272090 .. 272180; moreover using a find functionality you can easily miss-type 272123 and you'll find the wrong handle);

  • generating small unique identifiers to be used as keys in a map-like structure, identifiers that disperse well and which guarantee that you don't need to check if that key already exists; for example say that we need to generate files in a folder tree with the pattern 01/23/4567 (where 0..7 are the bytes in a u64 value); using an "standard" random generator we'll have to loop in a retry; (or an INSERT in an SQL database with a small primary key, where an error would abort the transaction;)


I did find some implementations for such a generator:

I have even re-implemented the last one in Rust (but only for u32):


However it would be lovely if such a generator would exist in the rand library for the integer types (at least u16, u32, u64 and u128).

The only requirements for such a generator would be:

  • be seedable with an actual "seed";
  • be seedable with an offset in the generated sequence; (i.e. one can start at any point in the "permutation" that would be generated, thus one could store the offset and seed on disk when the application stops, and resume from there when it starts again;)
  • provide a way to access the current offset and seed;

As optional requirements or observations:

  • provide a way to signal that the sequence would "restart" (after having generated all the possible values;)
  • I don't think it's required to generate all possible values; for example the generator could "skip" some of the possible values (perhaps due to the fact that the mathematical inner-workings don't allow it);

I would provide an implementation but I don't have the "mathematical" knowledge to implement or prove them. However if someone points me to such algorithms I could provide the implementation in Rust.

Hope this "feature request" is useful for someone!

@dhardy
Copy link
Member

dhardy commented Feb 25, 2018

I believe Xoroshiro is also capable of generating such distributions (see also what this post says about Xoroshiro).

We thought a lot about error handling, and reporting via both panic and Result is possible (or will be in the next rand version).

Since these generators are not ideal general-purpose random number generators I would prefer not adding them directly to the rand crate, though we could eventually add a sub-crate for such generators, e.g. rand-nonrepeating. Unless someone else steps up and does the work this is going to be low priority, however.

For now I would suggest implementing a stand-alone crate and implementing Rng for your generators.

@pitdicker
Copy link
Contributor

Thanks @cipriancraciun for all the links!

Generating random non-repeating integers reminds me a little of dhardy#82 (comment). We seem to have two options:

  • Track which values are generated, and only return a value in the target range if it is new. This is already implemented in the seq module. And for small ranges shuffle is a very good option
  • Use some LCG-based PRNG that generates values in the range rounded up to the next power of two, and then reject all values that are too large.

Working out that last point shouldn't be too hard (I could give pointers). I am not sure how to expose it in a nice API, but do think it would be a useful addition for rand. It would break our nice separation between PRNG's and distributions / helper functions, but it just seems practical.

I believe Xoroshiro is also capable of generating such distributions

It could, but not easily. It works best with integer sizes. Otherwise we have to mask out bits between at least all three xor-shift steps. And every different number of bits needs three different shift constants. They are not easy to calculate, so we would have to include a list of constants for every number of bits we want to generate.

@pitdicker
Copy link
Contributor

A few of the points you mentioned would be possible, but quite a bit more complex:

  • be seedable with an offset in the generated sequence; → needs jumping
  • provide a way to access the current offset and seed; → PCG should have some code for this..., otherwise use a counter
  • provide a way to signal that the sequence would "restart". → This would require a counter

@vks
Copy link
Collaborator

vks commented Mar 6, 2018

The property you are looking for is usually called equidistribution. You could use a linear congruential generator with a full period, the Mersenne Twister or xorshift/xoroshiro. You might have to adapt the standard implementations to have the period you want as suggested in the other comments.

@hoxxep
Copy link

hoxxep commented Nov 2, 2023

For anyone for whom this might help, I've recently published a rand-unique crate to generate random sequences of unique integers.

It's based on phreshing's unique prng as linked by OP, from which I have built and used a C++ implementation of successfully in the past. This Rust library is using the same logic with support for the common unsigned integer types (u8, u16, u32, u64, and usize).

This is a very simple crate making use of hardcoded primes, quadratic residue, and xor to provide the pseudo-randomness.

My current thought is that a "random unique sequence" is conceptually different enough from a traditional prng that the interfaces should be slightly different. There is naturally an index and an end to a sequence, where there isn't with a prng. The sequence can wrap of course, but often the user might prefer the sequence to terminate.

This is hopefully a decent stab at the beginnings of a random unique sequence. I'm happy to extend the functionality as desired, and any feedback on the interface or algorithm would be welcome. I'm open to doing the work to make rand-unique suitable for merging into the rand crate too if that's a desirable outcome.

I'm currently looking at handling arbitrary length sequences (eg. of length 5 or 1000, instead of only u8/256, u16/65536 etc.). This would be useful for generating permutations of arbitrarily sized collections/slices, or building permuting indexes/views into them. It's taking a bit more thought to make the algorithm suitable for small sequences though. Again, suggestions welcome!

@dhardy
Copy link
Member

dhardy commented Nov 3, 2023

Another approach (but definitely not O(1) and probably only for small numbers of values) is to use choose_multiple.

Thanks all for the thoughts, but I don't think this is anything we want to add to rand, so closing.

@dhardy dhardy closed this as completed Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-low Priority: Low
Projects
None yet
Development

No branches or pull requests

5 participants