-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"import random" should import hashlib on demand (nor load OpenSSL) #80740
Comments
Currently, when the random module is imported, the hashlib module is always imported which loads the OpenSSL library, whereas hashlib is only needed when a Random() instance is created with a string seed. For example, "rnd = random.Random()" and "rnd = random.Random(12345)" don't need hashlib. Example on Linux: $ python3
Python 3.7.2 (default, Mar 21 2019, 10:09:12)
>>> import os, sys
>>> 'hashlib' in sys.modules
False
>>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps")
>>> import random
>>> 'hashlib' in sys.modules
True
>>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps")
7f463ec38000-7f463ec55000 r--p 00000000 00:2a 5791335 /usr/lib64/libssl.so.1.1.1b
7f463ec55000-7f463eca5000 r-xp 0001d000 00:2a 5791335 /usr/lib64/libssl.so.1.1.1b
... Attached PR only imports hashlib on demand. Note: I noticed this issue while working on adding OpenSSL 1.1.1 support to Python 3.4 :-) |
In the past, some developers complained when an import has been removed in a stdlib module. I vaguely recall code using "import os" to get the "errno" module from "os.errno". That's "What's New in Python 3.7" contains: "Several undocumented internal imports were removed. One example is that os.errno is no longer available; use import errno directly instead. Note that such undocumented internal imports may be removed any time without notice, even in micro version releases." For this reason, I don't think that stable versions (2.7 and 3.7) should be modified. |
Why do we care about this particular import? It doesn't see slow in any way. In general, we don't do deferred imports unless there is a compelling reason (i.e. it is very slow or it is sometimes unavailable). Otherwise, it is a false optimization. |
Could you explain a bit more, Victor, about why you want to avoid importing hashlib and OpenSSL so much? |
Raymond:
While I was working on adding OpenSSL 1.1.1 to Python 3.4, my _hashopenssl module was broken. In that case, "import random" fails with ImportError because of hashlib failures. I was surprised, since random doesn't need hashlib at startup. I would like to be able to use "import random" even if hashlib is broken. For me, random is a key component, whereas I see hashlib more as optional. (Even if in practice, it should always be available). Raymond:
Brett:
Well, OpenSSL is not a random tiny library. For example, loading it increases Python RSS of around 2.7 MiB. Example with script x.py: import os
os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
import random
os.system(f"grep ^VmRSS /proc/{os.getpid()}/status") Output without the change on Fedora 29: VmRSS: 7396 kB With the change: VmRSS: 7272 kB Another example is that OpenSSL loads the libz.so dynamic library by dependency. I would prefer to minimize Python footprint if possible. |
Raymond:
I ran a benchmark: $ ./python -m perf command -o ref.json -v -- ./python -c 'import random'
$ # apply patch
$ ./python -m perf command -o patch.json -v -- ./python -c 'import random'
$ ./python -m perf compare_to ref.json patch.json
Mean +- std dev: [ref] 13.8 ms +- 0.2 ms -> [patch] 11.8 ms +- 0.2 ms: 1.18x faster (-15%) My PR 12728 makes Python startup 2 ms faster which I consider as quite significant on 13.8 ms. I know that some users are fighting to get a faster Python startup. Mercurial is just one example ;-) |
In general, deferred imports are code smell that should avoided unless really necessary. They create an on-going maintenance burden (there's a reason most modules don't do this and put their imports at the top). FWIW, a broken hashlib is a localized bug, not an optimization problem. It doesn't affect any user with a build that passes the test suite. Running "python -v" shows that "random" is not part of the normal startup, so deferring the import saves zero for normal startup. It only affects modules that specifically import random. IIRC, Mercurial uses hashing extensively, so deferring the import doesn't help them at all. This is minor change, so I suppose we could let it go through; however, it seems somewhat arbitrary and the reasons offered seem dubious. For the most part, it isn't a good practice. |
One other thought. This code has been present for over a decade. There is no evidence that anyone has ever wanted random to defer one of its imports. This seems like an invented problem. |
You could also use the internal _sha512 module. It's always present, small, lean and provides a SHA512 implementation with sufficient performance. |
This is not true at the moment, the _sha512 module is not present when openssl is missing. This is a bug in setup.py that prevents building the _sha512 module when openssl is missing. See bpo-36544. It is possible that this issue (since it started because of hashlib failures) and also bpo-36414 are caused by this problem. |
Thanks for pointing to the other issue. This is clearly a regression and should be fixed ASAP. I have ACKed your PR and pushed it. |
Note: *Technically*, you can disable the compilation of the _sha512 module using "*disabled*" in Modules/Setup, but I'm not sure if it's a common use case. At least, it makes sense to me when we are sure that OpenSSL and _hashlib are available ;-) I didn't want to mention that since I'm not sure that it's really relevant in this discussion. (I was aware of the regression, and hopefully it's now fixed!) |
I suppose we could do this but it borders on telling folks that we're worried about using our own public APIs, that importing hashlib is bad for them. It shouldn't be that way, hashlib is a collection of hash functions -- it is clear why this import isn't small and fast. It suggests that something is wrong with the implementation. The focus on client code in random seems like the wrong focus. That is just typical of what other clients would do. |
I'm +1 on PR 12728. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: