Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-111881: Import _sha2 lazily in random #111889

Merged
merged 2 commits into from Nov 9, 2023
Merged

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Nov 9, 2023

The random module now imports the _sha2 module laziliy in Random.seed() method for str, bytes and bytearray seeds. Lazy import makes Python startup faster and reduce the number of imported modules at startup.

@AlexWaygood
Copy link
Member

AlexWaygood commented Nov 9, 2023

warnings can also be imported lazily in this module (the cost of importing it is small, but measurable)

The random module now imports the _sha2 module lazily in the
Random.seed() method for str, bytes and bytearray seeds. It also
imports lazily the warnings module in the _randbelow() method for
classes without getrandbits(). Lazy import makes Python startup
faster and reduces the number of imported modules at startup.
@vstinner
Copy link
Member Author

vstinner commented Nov 9, 2023

warnings can also be imported lazily in this module (the cost of importing it is small, but measurable)

Oh right, I updated my PR to also import the warnings module lazily.

It's common in the stdlib to only import warnings when a warning is logged, especially to emit a deprecation warning.

@vstinner
Copy link
Member Author

vstinner commented Nov 9, 2023

This change only impacts uncommon code paths. IMO it's more common to call seed() with an integer, or just not call seed() at all. Also, random.Random subclasses which don't implement getrandbits() should be updated to implement getrandbits(). I suppose that the warnings is a good reminder for that :-)

Note: A few years ago, I proposed adding a BaseRandom class which would make inheritance better defined, but I abandoned this approach: issue gh-84526.

@rhettinger
Copy link
Contributor

+1 for the lazy import of warnings.

-1 for the lazy import of sha import which is already very fast. Also, I've seen production code such as a BloomFilter that would be adversely impacted this because it frequently reseeds with string keys. Seeding with strings isn't rare enough to cripple that code path. It is the easiest way to generate a high quality seed. Oeople do better at thinking up arbitrary words and phrases than they do at choosing an arbitrary numerical seed and the SHA does a good job of extending that choice to a large number of bits and does so in way that is hard to invert.

@vstinner
Copy link
Member Author

vstinner commented Nov 9, 2023

@rhettinger:

-1 for the lazy import of sha import which is already very fast.

Benchmark on moving the import inside seed(): Mean +- std dev: [ref] 5.32 us +- 0.09 us -> [lazy] 5.54 us +- 0.09 us: 1.04x slower. So yes, it has a significant negative impact on performance.

So I added a global variable: with a global variable, there is no impact on performance:

$ env/bin/python -m pyperf compare_to ref.json global.json 
Benchmark hidden because not significant (1): seed

$ env/bin/python -m pyperf compare_to ref.json global.json  -v
Mean +- std dev: [ref] 5.32 us +- 0.09 us -> [global] 5.27 us +- 0.18 us: 1.01x faster
Not significant!

Benchmark script, bench.py:

vstinner@mona$ cat bench.py 
import pyperf
runner = pyperf.Runner()
runner.timeit('seed',
    setup='import random; rnd=random.Random()',
    stmt='rnd.seed(b"abc")')

I ran the benchmark with:

$ ./configure && make
$ ./python -m venv env && env/bin/python -m pip install pyperf
$ env/bin/python bench.py -o ref.json

@vstinner vstinner merged commit b9f814c into python:main Nov 9, 2023
25 checks passed
@vstinner vstinner deleted the import_random branch November 9, 2023 22:10
@vstinner
Copy link
Member Author

vstinner commented Nov 9, 2023

Merged. Thanks for reviews @AlexWaygood and @rhettinger.

@vstinner
Copy link
Member Author

vstinner commented Nov 9, 2023

Benchmark on moving the import inside seed(): Mean +- std dev: [ref] 5.32 us +- 0.09 us -> [lazy] 5.54 us +- 0.09 us: 1.04x slower. So yes, it has a significant negative impact on performance.

It's sad that import is "so slow" when the module is already loaded, but I'm not interested to dig into import performance. Using a global variable is simple, similar to what we had before, and seems to be very efficient (hot code only has to check if _sha256 is None, it's quick).

@rhettinger: I didn't know that passing a string to seed() was common, thanks, it forces me to think about a different approach to avoid any performance issue.

aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
The random module now imports the _sha2 module lazily in the
Random.seed() method for str, bytes and bytearray seeds. It also
imports lazily the warnings module in the _randbelow() method for
classes without getrandbits(). Lazy import makes Python startup
faster and reduces the number of imported modules at startup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants