Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rand_unix.c: open random devices on first use only #7437

Closed

Conversation

Projects
None yet
4 participants
@mspncp
Copy link
Contributor

commented Oct 18, 2018

Commit c7504ae (pr #6432) fixed a regression for applications in
chroot environments, which compensated the fact that the new OpenSSL CSPRNG
(based on the NIST DRBG) now reseeds periodically, which the previous
one didn't. Now the reseeding could fail in the chroot environment if the
DEVRANDOM devices were not present anymore and no other entropy source
(e.g. getrandom()) was available.

The solution was to keep the file handles for the DEVRANDOM devices open
by default. In fact, the fix did more than this, it opened the DEVRANDOM
devices early and unconditionally in rand_pool_init(), which had the
unwanted side effect that the devices were opened (and kept open) even
in cases when they were not used at all, for example when the getrandom()
system call was available. Due to a bug (issue #7419) this even happened
when the feature was disabled by the application.

This commit removes the unconditional opening of all DEVRANDOM devices.
They will now only be opened (and kept open) on first use. In particular,
if getrandom() is available, the handles will not be opened unnecessarily.

This change does not introduce a regression for applications compiled for
libcrypto 1.1.0, because the SSLEAY RNG also seeds on first use. So in the
above constellation the CSPRNG will only be properly seeded if it is happens
before the forking and chrooting.

Fixes #7419
Alternative to #7429

rand_unix.c: open random devices on first use only
Commit c7504ae (pr #6432) fixed a regression for applications in
chroot environments, which compensated the fact that the new OpenSSL CSPRNG
(based on the NIST DRBG) now reseeds periodically, which the previous
one didn't. Now the reseeding could fail in the chroot environment if the
DEVRANDOM devices were not present anymore and no other entropy source
(e.g. getrandom()) was available.

The solution was to keep the file handles for the DEVRANDOM devices open
by default. In fact, the fix did more than this, it opened the DEVRANDOM
devices early and unconditionally in rand_pool_init(), which had the
unwanted side effect that the devices were opened (and kept open) even
in cases when they were not used at all, for example when the getrandom()
system call was available. Due  to a bug (issue #7419) this even happened
when the feature was disabled by the application.

This commit removes the unconditional opening of all DEVRANDOM devices.
They will now only be opened (and kept open) on first use. In particular,
if getrandom() is available, the handles will not be opened unnecessarily.

This change does not introduce a regression for applications compiled for
libcrypto 1.1.0, because the SSLEAY RNG also seeds on first use. So in the
above constellation the CSPRNG will only be properly seeded if it is happens
before the forking and chrooting.

Fixes #7419
@paulidale

This comment has been minimized.

Copy link
Contributor

commented Oct 18, 2018

My concern with this approach is the chroot problem that the keep open patch addressed. Delaying the open call means it could happen after chroot and fail because the device nodes aren't present.

@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Oct 18, 2018

Yes, I understand your concerns. But I have two arguments that might help to overcome your concerns:

The failure-if-first-seeding-occurs-after-chroot is not a regression from 1.1.0 to 1.1.1. Because 1.1.0 applications would have failed previously, since also the SSLEAY RNG auto-seeds on first use only. I checked explicitly that if the application did not use or seed the RNG before forking and chrooting, the seeding would have already failed with libcrypto 1.1.0 (and even 1.0.2), and this failure would not have gone unnoticed (via error codes of RAND_bytes() or RAND_status()).

The good news: If the server application initializes an SSL_CTX before forking (which should be the case for web servers, shouldn't it?) then the RNG get's definitely seeded. I verified this for (1.1.1 and 1.1.0 and 1.0.2). The reason for that is because SSL_CTX_new() calls RAND_bytes() to setup RFC5077 ticket keys:

openssl/ssl/ssl_lib.c

Lines 2622 to 2629 in d46f917

/* Setup RFC5077 ticket keys */
if ((RAND_bytes(ret->tlsext_tick_key_name,
sizeof(ret->tlsext_tick_key_name)) <= 0)
|| (RAND_bytes(ret->tlsext_tick_hmac_key,
sizeof(ret->tlsext_tick_hmac_key)) <= 0)
|| (RAND_bytes(ret->tlsext_tick_aes_key,
sizeof(ret->tlsext_tick_aes_key)) <= 0))
ret->options |= SSL_OP_NO_TICKET;

For 1.0.2 it holds provided OPENSSL_NO_TLSEXT is not defined:

openssl/ssl/ssl_lib.c

Lines 2016 to 2023 in 35cf781

#ifndef OPENSSL_NO_TLSEXT
ret->tlsext_servername_callback = 0;
ret->tlsext_servername_arg = NULL;
/* Setup RFC4507 ticket keys */
if ((RAND_bytes(ret->tlsext_tick_key_name, 16) <= 0)
|| (RAND_bytes(ret->tlsext_tick_hmac_key, 16) <= 0)
|| (RAND_bytes(ret->tlsext_tick_aes_key, 16) <= 0))
ret->options |= SSL_OP_NO_TICKET;

@paulidale

This comment has been minimized.

Copy link
Contributor

commented Oct 19, 2018

Failing after chroot would be a behavioural change to 1.1.1. I'm not comfortable with this.

I mostly agree with the second point -- ideally programs do the absolute minimum they need to before giving away permissions and calling chroot. There could be problems for applications that adhere strictly to this. Practically, this isn't all that common.

@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Oct 19, 2018

Failing after chroot would be a behavioural change to 1.1.1. I'm not comfortable with this.

An application that didn't fail with 1.1.0 doesn't fail with 1.1.1 either. And an application that didn't fail although it's RNG was unseeded, has a severe security issue.

@mspncp mspncp added this to the 1.1.1a milestone Oct 28, 2018

@levitte

This comment has been minimized.

Copy link
Member

commented Nov 7, 2018

@mspncp, I'm not sure how you're reasoning. 1.1.1 has already been released, this change would end up in 1.1.1a, and like @paulidale says, this seems to cause a 1.1.1 -> 1.1.1a regression. Your comparison with 1.1.0 or older doesn't change that.

@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

My reasoning is that #6432 fixed the "chroot regression" from 1.1.0 to 1.1.1., but it did so at the cost of introducing a different regression: libcrypto now opens and keeps open device handles, even if they are not used by the application at all. These handles can even leak into child processes after a fork and exec.

Issue #7419 shows that people are disturbed by this behaviour. To be precise: the subject of that issue is only that RAND_keep_random_devices_open(0) fails to disable this unwanted behaviour. But the point is that this unwanted behaviour can't be disabled for existing applications compiled for 1.1.0, without recompiling them. So #6432 broke the compatibility promise and this pull request can be considered a bugfix for that breaking change.

Also I believe that the regression from 1.1.1 -> 1.1.1a is merely hypothetical and that the transition from 1.1.0 to 1.1.1 is much more important and should be as smooth as possible.

@levitte

This comment has been minimized.

Copy link
Member

commented Nov 7, 2018

Hmmmm, I had to think a bit over this. It is a matter of weighing one behaviour change over the other... and I'm frankly unclear if we can call "open and keep all random devices" a bug, which would be the primary reason to do this for 1.1.1a.

However, I'm not sure I understand the whole "chroot problem"... or yeah, I do understand it if you forget to mount dev in your chroot, but that's asking for trouble anyway (oh boy it is...).

@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

However, I'm not sure I understand the whole "chroot problem"... or yeah, I do understand it if you forget to mount dev in your chroot, but that's asking for trouble anyway (oh boy it is...).

Long story short is: with libcrypto 1.1.0 and older, if the CSPRNG was properly seeded before chrooting, it would continue working forever, because it never reseeded from os entropy sources. (Not even the FIPS DRBG would do it; this was addressed only recently in f58001c. But that change does not even risk to add a regression because any errors during reseeding are ignored anyway.)

New in 1.1.1 is the fact that the DRBG not only reseeds on a regular base, but that it also detects reseeding errors and enters an error state. The "chroot fix" was intended for the kind of application which used to work with 1.1.0. Since the LIBEAY RNG seeds on first use, my conclusion is that these applications must have seeded before chrooting in order to function. So IMO it should be sufficient to open the devices an first use and keep them open, in order to facilitate successful reseeding. BTW, an independent mitigation for the chroot problem was to make the getrandom() syscall available as widely as possible and prefer it if it is available.

@paulidale

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2018

There is also the concern that getrandom/getentropy fail after chroot and the/dev devices nodes aren't available anymore.

@levitte

This comment has been minimized.

Copy link
Member

commented Nov 7, 2018

I find it exasperating that we're trying to cater for what is really a chroot management problem.

If you chroot, make damn sure that you have mounted a dev with at least a minimum set of device files. I thought this was chroot jail 101. It shouldn't be a crypto libraries problem if the user hasn't set up their jail properly.

@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

There is also the concern that getrandom/getentropy fail after chroot and the/dev devices nodes aren't available anymore.

Maybe I am thinking too naively, but isn't it so that both random sources finally end up in the same CSPRNG in the kernel? So in which situation would you expect the former entropy source to fail and the latter to succeed?

@paulidale

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2018

We only ever use one of getentropy and getrandom. However, we use a variety of device nodes depending on the host: /dev/random, /dev/prandom, /dev/srandom, /dev/urandom, and /dev/hwrng. Not Unix machines are x86 Linux :)

It suspect it is possible for /dev/urandom to succeed after getentropy failed (after it earlier succeeded).

I also agree with @levitte that this shouldn't be an OpenSSL problem but it was a breaking change in behaviour.

@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

Ok, but comparing /dev/urandom with getentropy() is like comparing apples and oranges. One actually should either use /dev/random and getentropy() together, or /dev/urandom and getrandom(), and not treat them as arbitrarily interchangeable. But this brings us back to our seed source management problem...

@paulidale

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2018

I agree we've a source management problem but as it currently stands we can try to use at most one of the system calls and up to four of the devices nodes.

@levitte

This comment has been minimized.

Copy link
Member

commented Nov 7, 2018

The more I've been thinking of it, the more I come to the conclusion that the "chroot problem" isn't ours and never was. It was pure damn luck that the pre-1.1.1 RNG didn't need to get seeding material from /dev/Xrandom post chroot.

With that in mind, I think I favor this PR over #7429

@t-j-h

This comment has been minimized.

Copy link
Member

commented Nov 7, 2018

I too don't think the "chroot problem" is our problem to transparently solve.

@paulidale
Copy link
Contributor

left a comment

Decision made. I'll withdraw the other PR.

@levitte levitte added the ready label Nov 7, 2018

@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Nov 8, 2018

Thank you, Pauli. Your approval means a lot to me, since it shows that we finally reached a consensus after a productive discussion and not by a runoff vote between competing proposals. I'll merge this evening.

levitte pushed a commit that referenced this pull request Nov 8, 2018

rand_unix.c: open random devices on first use only
Commit c7504ae (pr #6432) fixed a regression for applications in
chroot environments, which compensated the fact that the new OpenSSL CSPRNG
(based on the NIST DRBG) now reseeds periodically, which the previous
one didn't. Now the reseeding could fail in the chroot environment if the
DEVRANDOM devices were not present anymore and no other entropy source
(e.g. getrandom()) was available.

The solution was to keep the file handles for the DEVRANDOM devices open
by default. In fact, the fix did more than this, it opened the DEVRANDOM
devices early and unconditionally in rand_pool_init(), which had the
unwanted side effect that the devices were opened (and kept open) even
in cases when they were not used at all, for example when the getrandom()
system call was available. Due  to a bug (issue #7419) this even happened
when the feature was disabled by the application.

This commit removes the unconditional opening of all DEVRANDOM devices.
They will now only be opened (and kept open) on first use. In particular,
if getrandom() is available, the handles will not be opened unnecessarily.

This change does not introduce a regression for applications compiled for
libcrypto 1.1.0, because the SSLEAY RNG also seeds on first use. So in the
above constellation the CSPRNG will only be properly seeded if it is happens
before the forking and chrooting.

Fixes #7419

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from #7437)

levitte pushed a commit that referenced this pull request Nov 8, 2018

rand_unix.c: open random devices on first use only
Commit c7504ae (pr #6432) fixed a regression for applications in
chroot environments, which compensated the fact that the new OpenSSL CSPRNG
(based on the NIST DRBG) now reseeds periodically, which the previous
one didn't. Now the reseeding could fail in the chroot environment if the
DEVRANDOM devices were not present anymore and no other entropy source
(e.g. getrandom()) was available.

The solution was to keep the file handles for the DEVRANDOM devices open
by default. In fact, the fix did more than this, it opened the DEVRANDOM
devices early and unconditionally in rand_pool_init(), which had the
unwanted side effect that the devices were opened (and kept open) even
in cases when they were not used at all, for example when the getrandom()
system call was available. Due  to a bug (issue #7419) this even happened
when the feature was disabled by the application.

This commit removes the unconditional opening of all DEVRANDOM devices.
They will now only be opened (and kept open) on first use. In particular,
if getrandom() is available, the handles will not be opened unnecessarily.

This change does not introduce a regression for applications compiled for
libcrypto 1.1.0, because the SSLEAY RNG also seeds on first use. So in the
above constellation the CSPRNG will only be properly seeded if it is happens
before the forking and chrooting.

Fixes #7419

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from #7437)

(cherry picked from commit 8cfc197)
@mspncp

This comment has been minimized.

Copy link
Contributor Author

commented Nov 8, 2018

Merged. Thank you everybody!

master

8cfc197 rand_unix.c: open random devices on first use only

1.1.1

abf58ed rand_unix.c: open random devices on first use only

@mspncp mspncp closed this Nov 8, 2018

@mspncp mspncp deleted the mspncp:pr-open-random-devices-on-first-use-only branch Nov 8, 2018

@paulidale

This comment has been minimized.

Copy link
Contributor

commented Nov 8, 2018

Thank you, Pauli.

👍 I just want the best solution we can produce, I'm not (often) precious about the code I write.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.