Performance issues with OpenSSL 1.1.1c caused by DEVRANDOM_WAIT #9078
Side Note: @t8m noticed that this feature can be disabled by the following 'hack': just add an explicit … (see lines 25 to 34 at f308fa2).
I pinged the original reporter on openssl-users and hope he will join our discussion here.
So my opinion is of course still the same, i.e. we should not attempt to outsmart the kernel and/or init system.
I understand not trying to outsmart the local platform, but most OpenSSL users are naive about cryptography, and especially about CSPRNG seeding. We should avoid simple cases where they can make mistakes that result in guessable key material, as in https://crypto.stanford.edu/RealWorldCrypto/slides/nadia.pdf et al.
Switching to getentropy() also had the effect that people got long
delays during boot or when starting services for the first time. So
we're now in the situation that using getentropy() or /dev/urandom
will both make us wait until the kernel tells us it's ready.
So there are at least a few cases that run into this:
- You really have no entropy, you don't even store it
between boots.
- You have entropy from the previous boot, but don't
count it, you're waiting for fresh entropy to avoid
an attack on the stored file.
In the first case, something really needs to wait until we have
entropy. You can argue about who it should be, OpenSSL or
something else. People get it wrong, so you can argue that OpenSSL
should protect them.
The second case is what most people who complain about it
being slow actually run into. Whether you trust that file is a
local policy issue. Most Linux distributions take the safe way
and don't credit the file's entropy, but still use it. I don't
know if they have a setting to say whether to trust it if you're
not worried about such an attack.
Recent Linux kernels will make use of things like RDRAND/RDSEED or
other hardware RNGs that are available in most modern hardware,
and will have enough new entropy very early in the boot process.
So switching to a newer kernel might fix your issues. It will also
switch you to using getentropy().
People running in a VM should make sure that the guest OS gets
entropy from the host OS.
There is one important difference between …
I would revert the commit. @t8m is correct in that this isn't where we should be trying to fix things.
If this isn't where to fix things, then where do you think it should be addressed? In every single application that generates keys?
I think the options are, in order of preference:
- In every Linux based OS
- In every software that reads from /dev/urandom on Linux
- In every application that uses crypto
The problem should fix itself as OSs adopt more recent Linux
kernels. That really is the only fix we can actually expect from
the OSs. Some are stuck on a very old kernel; you can still buy new
devices that ship with a 2.6 kernel.
Okay, so tell me what Apache's mod_ssl should do that's portable for all the OpenSSL versions that it supports. And also what should OpenSSL do when running on an old kernel, for unaware developers porting packages? Right now, OpenSSL is safe but maybe slow to start up, and a careful consumer can rebuild OpenSSL to not be slow. That seems to me to be exactly the right trade-off. Remind me again why you wanted RSA 2K keys as the minimum, Kurt? :)
I think OpenSSL should be attempting to ensure good entropy. Ideally the kernel/OS would but not all do, so having a fallback of libcrypto trying seems prudent.
Adding a flag to only wait once would avoid this difference. Once seeded …
We can try to make things as simple as possible for the user, BUT no simpler.
There is already such a flag, it is called … (see openssl/crypto/rand/rand_unix.c, lines 500 to 502 at 9847599).
Maybe my #9078 (comment) gave the wrong impression that forking processes would be waiting several times. Actually, we don't have that much information about the exact circumstances, apart from the facts cited in the original report.
In particular, we don't know whether the massive delay occurs only at early boot time or always, and we have no information about the OS/kernel version. I hope the original reporter shows up and provides more detail.
@mspncp I should have looked at the code :(
-1 (vote forcing). This one might come down to an OMC vote, since it seems two in favour and two against at present.
I also wonder what happens in FIPS mode, where I'm not sure how this can be done generally. Refer to IG 7.15.
Best I can determine, select on the Linux /dev/random will block any time there is insufficient "fresh entropy" in the kernel entropy pool, not just initially at boot. While OpenSSL will block at most once per process, the delay is unbounded, unpredictable and unacceptable. If /dev/random were to return "ready" for select even though a read(2) would block for lack of entropy, that would be a kernel bug. The original PR must be reverted. If someone wants to block on /dev/random, they'll have to do that explicitly at the application layer. Perhaps …
I agree that select(2) on …
I agree. Given the current progress of the discussion, an OMC vote seems inevitable. If there is no hurry, I would prefer however that we collect more information about the case first. I just sent another invitation to the reporter on the mailing list.
Commit cff8ff87ad76b93874922dd8c99f5f0fbd09e757 actually contains two parts and only the first part needs to be reverted IMO:
(For the second part, see also #8215 (comment) and #8215 (comment)).
I'm not aware that calling a vote requires a strict timeline (I could easily be wrong). I'd also prefer to see a better method to ensure that …
FWIW: The draining effect is a fact which actually can be verified by simple experiments: in one terminal, run …, and in the other terminal, run either … or … several times. Reading from …
Adding a '-1' of course doesn't. But an official announcement of the vote on the mailing list would include a timeline. I just attempted to preempt the latter.
Yes, of course, we don't need to revert part 2, that's a feature, not a bug. Many Linux systems have kernels that are substantially newer than the userland.
If I understood Pauli correctly, the FIPS lab is perfectly happy with …
Perhaps FIPS-mode applications deserve all the punishment they signed up for. :-) If no other options are available the FIPS RNG can wait for /dev/random if required. FIPS aside, let's revert "part 1".
If at some time in the (hopefully near) future we finally have the ability to configure our seed sources, the …
No problem. It's always a pleasure to fight a battle of words with you ;-) And once again I'm amazed how much emotion such a dry subject as a random generator can raise... :-P
If you revert this wait, you will flunk most commercial security audits. I'm not saying they're correct; just saying that, some time back, "everyone" agreed this was the right thing to do. FWIW, I never agreed with it back then either. Just be aware that you will meet political resistance to a change like this.
That's OK, asbestos underwear is on. Reverting to 1.1.1 as released originally and 1.1.1[ab] is quite defensible.
So @mspncp, in 1.1.1c things got more secure, and this removes that, putting the onus on the end-user or application. So, I guess I was right?
Sigh... I'm exhausted. You are wasting your energy on me; in the end it will be the OMC which makes the decisions. Apropos energy: your energy would probably be better spent convincing Theodore Ts'o to rethink his decision not to block …
But we are turning in circles. The kernel has already been improved and people just need to move on to using getrandom() instead of /dev/urandom. Which is what OpenSSL did.
I'll stop; I certainly don't want to (heh) exhaust a vital resource like yourself. "Just upgrade kernels" isn't always feasible, and I think backing out of a security improvement is premature. Shrug. YMMV.
The DEVRANDOM_WAIT feature added a select() call to wait for the `/dev/random` device to become readable before reading from the `/dev/urandom` device. It was introduced in commit 38023b8 in order to mitigate the fact that the `/dev/urandom` device does not block until the initial seeding of the kernel CSPRNG has completed, contrary to the behaviour of the `getrandom()` system call. It turned out that this change had negative side effects on the performance which were not acceptable. After some discussion it was decided to revert this feature and leave it up to the OS resp. the platform maintainer to ensure a proper initialization during early boot time. Fixes openssl#9078 This partially reverts commit 38023b8.
Nope, this is not a good use of anyone's energy either :), find another thing to spend the energy on. The non-blockability of /dev/urandom on Linux is set in stone and it will not change. And moreover, even if the interface were not set in stone, it would not make any sense to change it, as everyone should be moving to getrandom() anyway, which has the desired property and also more advantages, such as not needing a file descriptor and a device on the system.
This occurred to me too after writing it. That's why I posted #9078 (comment).
The DEVRANDOM_WAIT feature added a select() call to wait for the `/dev/random` device to become readable before reading from the `/dev/urandom` device. It was introduced in commit 38023b8 in order to mitigate the fact that the `/dev/urandom` device does not block until the initial seeding of the kernel CSPRNG has completed, contrary to the behaviour of the `getrandom()` system call. It turned out that this change had negative side effects on the performance which were not acceptable. After some discussion it was decided to revert this feature and leave it up to the OS resp. the platform maintainer to ensure a proper initialization during early boot time. Fixes openssl#9078 This partially reverts commit 38023b8. (cherry picked from commit c19c5a6)
The DEVRANDOM_WAIT feature added a select() call to wait for the `/dev/random` device to become readable before reading from the `/dev/urandom` device. It was introduced in commit 38023b8 in order to mitigate the fact that the `/dev/urandom` device does not block until the initial seeding of the kernel CSPRNG has completed, contrary to the behaviour of the `getrandom()` system call. It turned out that this change had negative side effects on performance which were not acceptable. After some discussion it was decided to revert this feature and leave it up to the OS resp. the platform maintainer to ensure a proper initialization during early boot time. Fixes #9078 This partially reverts commit 38023b8. Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Viktor Dukhovni <viktor@openssl.org> (cherry picked from commit a08714e) (Merged from #9118)
As a data point, I've spent a good amount of time chasing sudden hangs of …
Which Linux version is that?
If you run in a virtual machine, you should make sure you actually
get rng data from the host by using something like virtio rng.
Kurt
I think the described behaviour can most likely occur when a recent Linux version is used …
When was this issue manifesting, for TCP connections? Was the delay only happening on the handshake process? Or did it occur repeatedly on data transfer? |
Library startup. |
Thanks for the info! |
I can still see the DEVRANDOM_WAIT-related implementation in version 1.1.1d and am facing the performance issue, i.e. my program takes several minutes to start. Reading through this long discussion, I found that this performance issue was solved by the following commit.
But after checking the log, I found that the revert was itself undone by the following commit.
This isn't so much a performance issue as a security issue. High-quality random numbers are critical to maintaining security; the waiting is there to obtain that high-quality randomness. Not waiting means you may not have as much (or, often, any) security.
It was always a somewhat controversial topic. In my opinion, having the wait does not belong in the OpenSSL library itself but should be handled by external means. If the shared memory marker does not work for some reason, OpenSSL will always wait unnecessarily even though the system RNG was already fully seeded with entropy. But yes, having the kernel RNG sufficiently seeded with quality entropy at system boot is absolutely critical to security.
In most situations, OpenSSL won't wait even if the shared memory marker doesn't work. Depending on the kernel and operating system: select(/dev/random) will succeed immediately and read(/dev/random, 1) will consume one byte. Once everything is properly seeded of course. |
The …
In pull request #8251, a `select()` call on `/dev/random` was added before reading from `/dev/urandom`. Let's call it the `DEVRANDOM_WAIT` feature, named after the preprocessor macro involved (see `openssl/crypto/rand/rand_unix.c`, lines 492 to 514 at f308fa2).

The intention of this change was to mitigate a weakness of the `/dev/urandom` device, namely that it does not block until being initially seeded, contrary to the `getrandom()` system call; see issue #8215. Note that this change does not affect the seeding using the `getrandom()` source, which is the preferred method on newer operating systems.

There was some discussion on the issue thread before PR #8251 was raised. While people unanimously agreed that using `getentropy()` is fine, because it blocks at boot time until the kernel CSPRNG is properly seeded (see also @kroeckx's comment), there was some dissent about whether the `/dev/urandom` weakness should be addressed by OpenSSL. @t8m and I agreed that it is actually the business of the OS (resp. its init system) to take care of this, and not OpenSSL's business (see #8215 (comment) and #8215 (comment)). Pull request #8251 emerged as a compromise; the idea was to wait for `/dev/random` to become readable (without reading any bytes) before reading from `/dev/urandom`. Nobody protested anymore, except for @t8m (here and here), who was not yet a committer at that time, and his protests went unnoticed.

However, there is a recent issue report on openssl-users that this change introduced a regression and has unwanted performance impacts.

I think we need to reconsider pull request #8251 and decide whether it was really a good idea to fix a problem of the kernel resp. the init system in the OpenSSL library, or whether the negative side effects outweigh the benefits. Currently, I'm leaning towards reverting the `DEVRANDOM_WAIT` commits on master and 1.1.1.

What's your opinion, @openssl/committers?