Add randbytes() method to random.Random #84466
The random module lacks a getrandbytes() method, which leads developers to get creative about how to generate random bytes. It's a common user request.
Python already has three functions to generate random bytes: os.urandom(), os.getrandom() and secrets.token_bytes().
These three functions are based on system entropy, and on Linux they block until the kernel has collected enough entropy: PEP 524. While many users are fine with these functions, there are also use cases such as simulation where security doesn't matter, and it's more about getting reproducible results from a seed. That's what random.Random is about. The numpy module provides a numpy.random.bytes(length) function for such use cases. One example is generating a UUID4 with the ability to reproduce the random UUID from a seed, for testing purposes or to get reproducible behavior. The attached PR implements the getrandbytes() method.
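To make the UUID example concrete, here is a sketch of that use case, written with the randbytes() name the thread eventually settles on (available since Python 3.9); the helper function is invented for illustration:

```python
import random
import uuid

def reproducible_uuid4(rng: random.Random) -> uuid.UUID:
    # Draw 16 bytes from the seeded generator; version=4 makes
    # uuid.UUID set the variant/version bits of a valid UUID4.
    return uuid.UUID(bytes=rng.randbytes(16), version=4)

u1 = reproducible_uuid4(random.Random(850779834))
u2 = reproducible_uuid4(random.Random(850779834))
assert u1 == u2          # same seed, same UUID
assert u1.version == 4
```

With os.urandom() or secrets.token_bytes() there is no way to replay such a UUID in a test, which is exactly the gap the proposed method fills.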
If we have to have this, the method name should be differentiated from getrandbits() because the latter returns an integer. I suggest just random.bytes(n), the same as numpy.
Now, there will be four ;-)
The problem with this is that people who […] (The metaproblem is of course that some functions already do the "poor man's namespacing" in C-style by starting with […])
Do you have another name suggestion that doesn't have a parallelism problem with the existing name? The names getrandbytes() and getrandbits() suggest a parallelism that is incorrect.
I think that the "module owner" ;-P must decide whether the […]
I concur that bytes() isn't a good name, but am still concerned that the proposed name is a bad API decision.
Maybe randbytes()?
I like the "from random import randbytes" name. I concur that "from random import bytes" shadows the builtin bytes() type and so is likely to cause trouble.
I updated my PR to rename the method to randbytes().
The performance of the new method is not my first motivation. My first motivation is to keep consumers of the random module from writing a wrong implementation that would be biased. It's too easy to write biased functions without noticing. Moreover, it seems like we can do something to get reproducible behavior on different architectures (different endianness), which would also be a nice feature. For example, in bpo-13396, Amaury found these two functions in the wild:
As I wrote, users get creative to work around missing features :-) I don't think that these two implementations give the same result on big and little endian.
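To make the endianness hazard concrete, here is a hypothetical sketch (the helper names are invented for illustration, not Amaury's actual findings). getrandbits() itself is platform-independent, but a workaround that converts its result using the machine's native byte order is not:

```python
import random
import sys

def native_order_bytes(gen: random.Random, n: int) -> bytes:
    # Hypothetical workaround: convert getrandbits() output using the
    # native byte order.  sys.byteorder differs between platforms, so
    # the same seed yields different bytes on big- and little-endian
    # machines.
    return gen.getrandbits(n * 8).to_bytes(n, sys.byteorder)

def portable_bytes(gen: random.Random, n: int) -> bytes:
    # Fixing the byte order explicitly makes the result reproducible
    # on any platform.
    return gen.getrandbits(n * 8).to_bytes(n, 'little')

a = portable_bytes(random.Random(1), 8)
b = portable_bytes(random.Random(1), 8)
assert a == b  # same seed, same bytes, everywhere
```

A stdlib randbytes() would pick one convention once, so users never face this choice.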
All Random methods give the same result independently of the endianness and bitness of the platform.
The second one does.
I wrote a quick benchmark:

```python
import pyperf
import random

gen = random.Random()
# gen = random.SystemRandom()
gen.seed(850779834)

if 1:  # hasattr(gen, 'randbytes'):
    func = type(gen).randbytes
elif 0:
    def py_randbytes(gen, n):
        # Build the result 32 bits at a time from getrandbits().
        data = bytearray(n)
        i = 0
        while i < n:
            word = gen.getrandbits(32)
            word = word.to_bytes(4, 'big')
            chunk = min(n - i, 4)   # the last chunk may be shorter than 4 bytes
            data[i:i+chunk] = word[:chunk]
            i += chunk
        return bytes(data)
    func = py_randbytes
else:
    def getrandbits_to_bytes(gen, n):
        # One big getrandbits() call converted in a single step.
        return gen.getrandbits(n * 8).to_bytes(n, 'little')
    func = getrandbits_to_bytes

runner = pyperf.Runner()
for nbytes in (1, 4, 16, 1024, 1024 * 1024):
    runner.bench_func(f'randbytes({nbytes})', func, gen, nbytes)
```

Results on Linux using gcc -O3 (without LTO or PGO), using the C randbytes() implementation as the reference: (results table not preserved in the migration)
So the C randbytes() implementation is always the fastest. Performance of random.SystemRandom().randbytes() (os.urandom(n)), using random.Random().randbytes() (Mersenne Twister) as the reference: (results table not preserved in the migration) os.urandom() is much slower than the Mersenne Twister. Well, that's not surprising: os.urandom() requires at least one syscall (the getrandom() syscall on my Linux machine).
The randbytes() method needs to depend on getrandbits(). It is documented that custom generators can supply their own random() and getrandbits() methods and expect that the other downstream methods all follow. See the attached example, which demonstrates that randbytes() bypasses this framework pattern. Also, I don't want randbytes() in the C extension. We've tried to keep as much of the code as possible in pure Python and only have the Mersenne Twister specific code in the C module. This improves maintainability and makes the code more accessible to a broader audience. Also, please don't change the name of the genrand_int32() function. It was a goal to change as little as possible from the official, standard version of the C code at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html . For the most part, we just want to wrap that code for Python bindings, not modify it.
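As a hypothetical illustration of that framework pattern (the class and its counting scheme are invented for this sketch, not taken from the attached example): a custom generator overrides getrandbits(), and a byte method that routes through getrandbits() automatically honors the override.

```python
import random

class CountingRandom(random.Random):
    """Toy generator: not random at all, just a counter.

    Illustrates the subclassing contract: supply your own basic
    generator method and build downstream methods on top of it, so
    everything flows through the one overridden source.
    """
    def __init__(self):
        super().__init__()
        self.count = 0

    def getrandbits(self, k):
        self.count += 1
        return self.count % (2 ** k)

    def randbytes(self, n):
        # Route through getrandbits() so the override above is honored.
        return self.getrandbits(n * 8).to_bytes(n, 'little')

gen = CountingRandom()
data = gen.randbytes(4)
assert data == b'\x01\x00\x00\x00'  # bytes came from our counter
assert gen.count == 1               # randbytes() used getrandbits()
```

A randbytes() implemented only in the C extension would ignore the getrandbits() override here, which is the bypass being objected to.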
Direct link to MT code that I would like to leave mostly unmodified: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c |
When a new method gets added to a module, it should happen in a way that is in harmony with the module's design. |
I created bpo-40346: "Redesign random.Random class inheritance". |
```
$ ./python -m timeit -s 'import random' 'random.randbytes(10**6)'
200 loops, best of 5: 1.36 msec per loop
$ ./python -m timeit -s 'import random' 'random.getrandbits(10**6*8).to_bytes(10**6, "little")'
50 loops, best of 5: 6.31 msec per loop
```

The Python implementation is only about 5 times slower than the C implementation. I am fine with implementing randbytes() in Python. That would automatically make it depend on the getrandbits() implementation.
Raymond: […]

I don't see how 30 lines make Python so much harder to maintain. These lines make the function 4x to 5x faster. We are not talking about 5% or 10% faster. I think that such an optimization is worth it. When did we decide to stop optimizing Python?

Raymond: […]
I created bpo-40346: "Redesign random.Random class inheritance" for a more generic fix, not just randbytes().

Raymond: […]
This code was already modified to replace "unsigned long" with "uint32_t", for example. I don't think that renaming genrand_int32() to genrand_uint32() makes the code impossible to maintain. Moreover, it seems like http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html has not been updated in 13 years.
Raymond: […]
I created PR 19700, which keeps the optimization (the C implementation in _randommodule.c) while letting Random subclasses implement randbytes() with getrandbits().