Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new method random.getrandbytes() #57605

Closed
amauryfa opened this issue Nov 13, 2011 · 9 comments
Closed

new method random.getrandbytes() #57605

amauryfa opened this issue Nov 13, 2011 · 9 comments
Assignees
Labels
type-feature A feature request or enhancement

Comments

@amauryfa
Copy link
Member

BPO 13396
Nosy @rhettinger, @jcea, @amauryfa, @pitrou
Files
  • getrandbytes.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2013-07-12.06:33:29.826>
    created_at = <Date 2011-11-13.16:43:41.024>
    labels = ['type-feature']
    title = 'new method random.getrandbytes()'
    updated_at = <Date 2013-07-12.06:33:29.824>
    user = 'https://github.com/amauryfa'

    bugs.python.org fields:

    activity = <Date 2013-07-12.06:33:29.824>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2013-07-12.06:33:29.826>
    closer = 'rhettinger'
    components = []
    creation = <Date 2011-11-13.16:43:41.024>
    creator = 'amaury.forgeotdarc'
    dependencies = []
    files = ['23682']
    hgrepos = []
    issue_num = 13396
    keywords = ['patch']
    message_count = 9.0
    messages = ['147557', '147562', '147567', '147569', '147570', '147571', '147573', '147583', '192924']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'jcea', 'amaury.forgeotdarc', 'pitrou', 'nadeem.vawda']
    pr_nums = []
    priority = 'low'
    resolution = 'rejected'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13396'
    versions = ['Python 3.3']

    @amauryfa
    Copy link
    Member Author

    I noticed that several usages of random.getrandbits() actually need bytes. A few examples:

    • Lib/test/test_zlib.py calls "random.getrandbits(8 * _1M).to_bytes()"
    • Twisted uses the %x format and then call .decode('hex')
      Another examples found with Code Search:
    • struct.pack("Q", random.getrandbits(64))
    • for i in range(8): ClientChallenge+= chr(random.getrandbits(8))
    • key = sha1(str(random.getrandbits(8*20))).digest()

    random.getrandbits() itself is not a cheap call: it ends with a call to _PyLong_FromByteArray, and transformation to bytes will involve more copy, conversions from 30bit digits (or 15bit digits) to bytes, or worse.

    This patch adds random.getrandbytes(), which creates a bytes string directly from calls to genrand_int32().

    @amauryfa amauryfa added the type-feature A feature request or enhancement label Nov 13, 2011
    @pitrou
    Copy link
    Member

    pitrou commented Nov 13, 2011

    Good idea, IMO.

    + cum = set()
    + for i in range(100):
    + val = getbytes(span)
    + cum |= set(i for i in range(span) if val[i])
    + self.assertEqual(len(cum), span)

    I find this test a bit strange. Also, perhaps it should be bitwise rather than bytewise.

    @rhettinger rhettinger self-assigned this Nov 13, 2011
    @rhettinger
    Copy link
    Contributor

    How would this work for other random number generators that don't supply genrand_int32()?

    The API for random is supposed to be easily subclassable and reusable for other random number generators. The requirements for those generators is intentionally kept minimal (i.e. they only need to supply random(), seed(), getstate() and setstate() and optionally getrandbits()).

    I'm -0 on making this API fatter.

    @amauryfa
    Copy link
    Member Author

    How would this work for other random number generators that don't
    supply genrand_int32()?

    genrand_int32 is an internal function, only available in C for the Mersenne Twister generator.
    random.SystemRandom() should provide getrandbytes as well, it would just call os.urandom(); I don't know other generators.

    @rhettinger
    Copy link
    Contributor

    genrand_int32 is an internal function,
    only available in C for the Mersenne Twister generator.

    Yes, I know. I'm the one added that code ;-)

    I don't know other generators.

    The problem is that we need an API that will accommodate other random number generators and not be specific to the MersenneTwister. Right now, the starting point for everything in the random module is an underlying generator supplying a random() method returning a float and an optional getrandombits() method which returns an int (possibly a long int). The latter is easily convertible to bytes with to_bytes() which uses a fast O(n) algorithm. ISTM, the getrandbytes() proposal amounts to a micro-optimization of existing capabilities, and it comes at the expense of fattening the API.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 13, 2011

    The problem is that we need an API that will accommodate other random
    number generators and not be specific to the MersenneTwister. Right
    now, the starting point for everything in the random module is an
    underlying generator supplying a random() method returning a float and
    an optional getrandombits() method which returns an int (possibly a
    long int).

    Well, you can provide a default getrandombytes() which calls into
    getrandombits(), and specialize it in the case genrand_int32() exists.

    The latter is easily convertible to bytes with to_bytes() which uses
    a fast O(n) algorithm.

    Well, O(n) doesn't necessarily equate fast. Can Amaury post benchmark
    numbers of getrandbits().to_bytes() vs getrandbytes()?

    @amauryfa
    Copy link
    Member Author

    ./python -m timeit -s "from random import getrandbits" "getrandbits(8000000).to_bytes(1000000, 'little')"
    10 loops, best of 3: 25 msec per loop

    ./python -m timeit -s "from random import getrandbytes" "getrandbytes(1000000)"
    100 loops, best of 3: 9.66 msec per loop

    For the Mersenne Twister, getrandbytes() is ~2.5 times faster (A length of 1000 gives exactly the same ratio)
    When applied to the SytemRandom object, the difference is less impressive (probably because os.urandom is slow) but getrandbytes is still 20ms faster.

    Updated patch to add getrandbytes at the module level, and to SystemRandom.

    @rhettinger
    Copy link
    Contributor

    The differential cost of generating n random bytes is negligible compared to actually doing anything with the bytes once their generated.

    This optimization is close to being a total waste (saving 15 milliseconds for the abnormal case of generating 1 million random bytes).

    @rhettinger
    Copy link
    Contributor

    Closing for the reasons listed above.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants