-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
os.urandom() should call getrandom(2) not getentropy(2) on Solaris 11.3 and newer #69191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
A recent Solaris build upgrade resulted in a massive slowdown of a package operation (our package client is written in Python). Some DTrace revealed this was because os.urandom() calls had slowed down by a factor of over 300. This turned out to be caused by an upgrade of Python from 2.7.9 to 2.7.10 which included:
By adding ac_cv_func_getentropy=no to our configure options, we were able to get back the performance we had lost. But our security experts warned: --- OpenBSD only has getentropy(2) and we are compatible with that. The bug is in Python it should not use getentropy(2) for the implementation of os.urandom() unless it builds its own DRBG (Deterministic Random Bit Generator) around that - which will mean it is "caching" and thus only calling getentropy() for seeding very infrequently. You can not substitute getentropy() for getrandom(), if you need randomness then entropy does not suffice. They are using getentropy(2) correctly in the implementation of _PyRandom_Init(). I would personally recommend that the upstream adds os.getentropy() changes As it is the upstream implementation is arguably a security vulnerability since you can not use entropy where you need randomness. where by "upstream" he meant "python.org". So I am filing this bug to request those changes. I can probably prototype this in the reasonable near future, but I wanted to get the bug filed sooner rather than later. |
Attached patch (for Python 3.4) removes code to use getentropy() in os.urandom(). |
I got several long private emails from Theo De Raadt about this issue. I think the gist of it all is that most likely (a) the app most likely shouldn't be calling os.urandom() that often, and (b) Solaris getentropy() is apparently stunningly slow. Theo also seems to imply that "entropy" as a concept is debunked, and OpenBSD getentropy() and getrandom() are essentially the same thing except that the former's interface is faster (because it doesn't use a FD). So it is also possible that using getentropy() instead of getrandom() was unnecessary; possibly the speed difference would be noticeable via Python. Perhaps the getentropy() check can explicitly rule out Solaris (either at the autoconf level or in the random.c source code) if you prefer to keep the getentropy() call on OpenBSD. |
Also, Theo believes that our Mersenne Twister is outdated and os.urandom() is the only reasonable alternative. So we might as well keep it fast. |
Guido, you're clearly talking with someone who knows too much ;-) If we're using the Twister for _anything_ related to crypto-level randomness, then I'd appalled - it was utterly unsuitable for any such purpose from day 1. But as a general-purpose generator for non-crypto uses, it remains an excellent choice, and still a nearly de facto standard for "almost all" such purposes. There are general-purpose generators I like better now, but won't push in that direction until the Twister's "nearly de facto standard" status fades. Better to be a follower than a leader in this area. |
To Theo it's probably inconceivable that anyone would be using random numbers for anything besides crypto security. :-) His favorite is arc4random() (but he notes it's not based on RC4 anymore, but uses chacha -- I have no idea what any of that means :-). He did say userland PRNG a few times. |
(A)RC4 and ChaCha are just two stream ciphers that let you encrypt some data, they work by essentially producing a psuedo-random stream of data in a deterministic manner based off of a key, and than that is XOR'd with the data you want to encrypt. arc4random (ab)uses this and uses "real" entropy (e.g. randomness pulled from random noise on the network and such) as the "key" and then uses the psuedo-random stream of data as the values you get when you ask arc4random for some random data. The actual process is quite a bit more complex then that, but that's the basic gist. Userspace PRNG's are not a very good idea for reasons better explained by an expert: http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/ And yea, using MT for anything that needs a CSPRNG (that is, a Cryptographically Secure Psuedo Random Number Generator) is a real bad idea, because the numbers it outputs are not "really" random. I'm of a mind that the APIs should default to CSPRNGs (so |
Oh yea, and (A)RC4 is broken and shouldn't be used for anything anymore, ChaCha is much better and is pretty great. |
Thread on python-dev: https://mail.python.org/pipermail/python-dev/2015-September/141439.html |
Ok, here is a patch implementing this option. It keeps getentropy() on OpenBSD for os.urandom(), but it disables getentropy() on Solaris for os.urandom(). I don't know if my py_getrandom() function calling syscall(SYS_getrandom, buffer, size, 0) works on Solaris. The flags are hardcoded, and I don't know if the <sys/syscall.h> include is enough to get the syscall() function. (Does Solaris uses the GNU C library?) @jbeck: Can you please test this patch on the default branch of Python? Can you tell if the HAVE_GETRANDOM_SYSCALL check succeed on Solaris? (do you have "#define HAVE_GETRANDOM_SYSCALL 1" in pyconfig.h?) The configure scripts tries to compile the following C program: #include <sys/syscall.h>
int main() {
const int flags = 0;
char buffer[1];
int n;
/* ignore the result, Python checks for ENOSYS at runtime */
(void)syscall(SYS_getrandom, buffer, sizeof(buffer), flags);
return 0;
} |
@Haypo: Yes, that patch works (applied to 3.5.0rc3; proxy problems are preventing me from updating my clone of the default branch at the moment) i.e., pyconfig.h shows: #define HAVE_GETRANDOM_SYSCALL 1 To answer your other questions:
Prior to applying this patch, I had needed to tweak py_getrandom() to recognize EINVAL in addition to ENOSYS as sufficient reason to turn off getrandom_works. I still have that tweak in place, but have not yet tried backing it out to see if things still behave with your patch. Let me know if there is any more information I can provide, and thanks for your attention to this issue. |
In which case getrandom() fails with EINVAL? |
Yes, the syscall was failing with EINVAL. |
Sorry, I don't understand. Do you mean that calling a non-existant syscall on Solaris fails with EINVAL? Calling the getrandom() syscall on Solaris < 11.3 fails with EINVAL? |
Sorry, let me try to clarify. ENOSYS is the correct errno value to check for, for the situation (e.g., Solaris < 11.3) of a non-existent system call. But EINVAL should also be checked for to make sure the system call was invoked with proper parameters. My builds of Python 3.5.0.X (don't recall whether X was a late Beta or release candidate) were core dumping because Python was making that syscall but not checking for EINVAL, and thus assuming the call was valid, when it was not. Sorry, I did not try to debug why it was invalid. But once I added EINVAL, alongside ENOSYS, as a reason to set getrandom_works to 0, then python started behaving properly. |
py_getrandom() calls Py_FatalError() or raises an OSError on EINVAL. The error is not silently ignored.
Py_FatalError() calls abort(). Usually, operating systems dump a core dump in this case. But it is the expected behaviour. Python now refuses to start if the OS random number generator doesn't work. There are similar checks on the system and monotonic clocks for example.
Ah! Finally you explain the problem :-) I wrote py_getrandom() for Linux. I didn't expect other operating systems to implement the getrandom() syscall. I hardcoded the flags (0) for example. py_getrandom() calls directly the syscall, because I like the new cool getrandom() syscall of Linux: it avoids the need of a private file descriptor. It would be much better to call a function of the C library, but the GNU C library didn't expose the function yet... On Solaris, the function is available in C, no need to call directly the syscall. It would be better to call the C function, and check if it's available in configure. Can you please try remove_get_entropy.patch + urandom_solaris.patch? |
Yes, those patches work, with a caveat. While testing this, I found out why I had needed EINVAL earlier (and still do, for now): there is a bug in the Solaris implementation of getrandom(2). If flags are 0 and the buffer size > 1024, then it fails with EINVAL. That is only supposed to happen for a buffer that big if GNRD_RANDOM is set in flags but GNRD_NONBLOCK is not set. So until that bug is fixed, I have to patch py_getrandom() to treat EINVAL like ENOSYS and fall back to using /dev/urandom as if getrandom(2) were not supported. |
Ah! So it was useful to dig the EINVAL issue, it's a bug in the kernel :-) You wrote that the final version of Solaris 11.3 and 12 was not released yet. Can we expect a fix in the kernel before the final version? |
I have queried the engineer who owns the kernel bug and will post an update once I hear back from him. But as it is already almost midnight on Friday in his geo, it may well be Monday before I hear back. |
The owner of the Solaris kernel bug has confirmed that he plans to get |
New changeset 8b0c2c7cb4a7 by Victor Stinner in branch 'default': |
Ok, I pushed a first fix for Python 3.6. The changeset 8b0c2c7cb4a7 keeps getentropy() on OpenBSD, but it explicitly excludes getentropy() on Solaris ("!defined(sun)"). It adds support for the getrandom() function. As the EINVAL error will be fixed in the final version of Solaris 11.3, I chose to not support this specific error. @john Beck: Does it look good to you? Can you test it? You may have to manually modify the code to not pass a value too large to getrandom() until the Solaris kernel is fixed. If John and buildbots are happy, I will backport the change to Python 2.7, 3.4 and 3.5. |
I have tested your patch with 3.5, and it works fine, although I did have to make a minor change to get test_os to pass. In test_os.py you had: ... @unittest.skipIf(USE_GETENTROPY, whereas I came up with this: ... @unittest.skipIf(USE_GETENTROPY, |
New changeset 221e09b89186 by Victor Stinner in branch 'default': |
"I have tested your patch with 3.5, and it works fine, although I did have to make a minor change to get test_os to pass. In test_os.py you had: (...)" Oh right, I fixed test_os too. |
New changeset 835085cc28cd by Victor Stinner in branch '3.5': |
New changeset 202c827f86df by Victor Stinner in branch '2.7': New changeset 83dc79eeaf7f by Victor Stinner in branch '3.4': |
Ok, I pushed fixes for Python 2.7, 3.4, 3.5 and 3.6. Summary for Solaris:
getentropy() is no more used on Solaris. @john Beck: It would be great if you could run test_os on the development branches 2.7, 3.4, 3.5 and default (3.6). |
Confirmed that test_os runs cleanly on Solaris 12, for each of:
|
Thanks! I close the issue, it's now fixed. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: