Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Faster Vectorized Counter-Based RNG #26472

Closed
vxst opened this issue May 18, 2024 · 7 comments
Closed

ENH: Faster Vectorized Counter-Based RNG #26472

vxst opened this issue May 18, 2024 · 7 comments

Comments

@vxst
Copy link
Contributor

vxst commented May 18, 2024

Proposed new feature or change:

I wrote a random generator utilizing VAES and AVX512 (and AVX2/SSE). The principle is based on the "random123" paper, similar to Philox, but with a larger state (512-bit key and 512-bit counter). On a platform supporting AVX2 (Broadwell or later, which was introduced 10 years ago), it's faster than SFC64 in my benchmarks. It's even faster on platforms supporting VAES.

This will provide users with the performance of SFC64 while having the benefits of a counter-based RNG, such as jumping and a guaranteed 2^256 - 1 period, as the map from the counter to the buffer with the same key is one-to-one.

An implementation of this idea can be found at https://github.com/vxst/qrand. This implementation is not currently in a counter-based mode because I reused the key/buffer to save memory (which is not really required, and I'm in the process of changing it to a counter-based model and running BigCrush).

Is NumPy open to a new random bit generator? I'll make a PR if it's possible to add a new bit generator to the NumPy library (perhaps named Phx512?).

@charris
Copy link
Member

charris commented May 18, 2024

Might take a look at https://github.com/bashtage/randomgen. @bashtage Thoughts?

@vxst
Copy link
Contributor Author

vxst commented May 18, 2024

randomgen library only utilizes full AES to generate crypto-level random numbers, which I believe is a little overkill. Perhaps it's more on "peace of mind" rather than raw performance? I don't know why ARS-7 is not included, it seems to be more than sufficient for any simulation purposes, and is the recommended RNG for CPUs with AES instruction by random123, which is also the paper first proposed Philox. ARS-5 is already crush-resistant. For security, people should turn to /dev/random or OpenSSL.

For my qrand, 3 rounds are crush-resistant with a balanced (randomly generated, or I can perhaps use a bitstream from something like PI, to help explain why chose the specific value, actually just want something not 0x000001, which is too sparse and requires more rounds to balance, e.g., ARS needs 5) addition factor between rounds. However, plain ARS-7, as described by the random123 paper, might be more popular/compatible. On the safe side, perhaps I can implement an ARS-7 with AES-NI / VAES and AVX / SSE first.

On anything that supports AES-NI (or AESE, with support from ARMv8), ARS will be much faster than Philox. The AES round function is actually a hardware-accelerated, battle-proven entropy re-mix tool. Since it's widely adopted (even Rosetta 2 on Mac supports AES-NI), I believe it's a good idea to use it.

@vxst
Copy link
Contributor Author

vxst commented May 18, 2024

Might take a look at https://github.com/bashtage/randomgen. @bashtage Thoughts?

And it's not vectorized, since most resource heavy usage for random would be generating a big array of random numbers, I believe vectorize will prove some if not great performance enhancement.

@vxst
Copy link
Contributor Author

vxst commented May 18, 2024

I believe randomgen is the best library on experimenting different kinds of random generators, but I don't know why @bashtage not include ARS-7 in it(perhaps I missed something like some paper on bad ARS performance?). With a lot of different generators, it's reasonable not to vectorize everything. But perhaps we can include a vectorized ARS in NumPy since it may great benefit users who needs a lot of random numbers, e.g. initialize matrix for deep learning.

@bashtage
Copy link
Contributor

The bit generators provided in randomgen can be used with NumPy Generator to produce random variates using low level code paths. I think the general feeling in NumPy is that the standard for including a new bit generator is very, very high (e.g., he scientific community has coalesced around a particular source of pseudo random ints like it did with Mersennse Twister a could of decades ago). You could submit this as a bit generator to Randomgen.

@vxst
Copy link
Contributor Author

vxst commented May 18, 2024

The bit generators provided in randomgen can be used with NumPy Generator to produce random variates using low level code paths. I think the general feeling in NumPy is that the standard for including a new bit generator is very, very high (e.g., he scientific community has coalesced around a particular source of pseudo random ints like it did with Mersennse Twister a could of decades ago). You could submit this as a bit generator to Randomgen.

Hi @bashtage, I'll open a PR to add my qrand to randomgen. For NumPy, perhaps ARS-7(or 5?) would be better since it has been proposed for a very long time and should be proven. Is there any reasonrandomgen doesn't include ARS? It seems to be the fastest crush resistant counter-based RNG out here, should be comparable to SFC64 in the matter of generation speed.

@vxst
Copy link
Contributor Author

vxst commented May 18, 2024

Will work on add ARS and qrand to randomgen first

@vxst vxst closed this as completed May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants