New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rand_unix: don't discard entropy bytes from syscall_random() and /dev/*random #6990

Closed
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@mspncp
Contributor

mspncp commented Aug 16, 2018

Fixes #6978

Don't discard partial reads from syscall_random() and retry instead.

Don't discard partial reads from /dev/*random as well.

Note: does this need a test? Don't really know how to test this in real life.

@mspncp mspncp changed the title from rand_unix: don't discard entropy bytes from syscall_random() to rand_unix: don't discard entropy bytes from syscall_random() and /dev/*random Aug 16, 2018

@paulidale

This comment has been minimized.

Contributor

paulidale commented Aug 16, 2018

Would a little more sophistication here make sense? I can think of two points that might help:

  1. Rather than retrying the same entropy source immediately, I think it would make sense to move on to the next source and only retry a source once all have failed to provide enough bytes. I.e. make the loop around all of the sources rather than one. This will require changes to our early exit checks that only look to see if some entropy was obtained.
  2. Instead of a fixed iteration limit, would it make sense to iterate until no entropy was gather in the loop? While we are still making progress, keep asking. If the progress stalls, give up.

Of course, if only one source is defined the first item makes no difference but when multiple are it could.

Thoughts?

@paulidale

This comment has been minimized.

Contributor

paulidale commented Aug 16, 2018

I'm willing to live without a test. A test could implement a getrandom() function that always returns a single byte but this only works if that is the only configured source.

To manually test it would be possible to drain /dev/random using something like dd if=/dev/random of=/dev/null iflag=fullblock bs=16 and then run a test where /dev/random is the only source configured.

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 16, 2018

  1. Rather than retrying the same entropy source immediately, I think it would make sense to move on to the next source...

I considered this particular possibility and discarded it, because the DEVRANDOM list contains non-blocking and blocking entropy sources. So I'm not sure if this would be a good idea.

#   define DEVRANDOM "/dev/urandom","/dev/random","/dev/srandom"

  1. Instead of a fixed iteration limit, would it make sense to iterate until no entropy was gather in the loop? While we are still making progress, keep asking. If the progress stalls, give up.

The main point about my choice was that it's bound to terminate. Your proposal sounds like a good alternative. Or maybe a mixture of our ideas, i.e. reset the retry counter whenever we read more than 0 bytes?

To manually test it would be possible to drain /dev/random using something like dd if=/dev/random of=/dev/null iflag=fullblock bs=16 and then run a test where /dev/random is the only source configured.

Sounds like a good idea. I will give it a try.

Thoughts?

I'm not in a hurry to merge. We can discuss this without rush.

@paulidale

This comment has been minimized.

Contributor

paulidale commented Aug 16, 2018

I'd be fine with a making progress and an iteration limit (either an absolute one or one that is reset on progress as you suggest).

What about opening the /dev/random devices with the O_NONBLOCK flag so they return rather than blocking? The error handling would need to check for either EAGAIN or EWOULDBLOCK (either of which might not be defined and if both are they might or might not have the same value) so it gets a little more complicated.

I agree, let's take the time to do this properly.

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 17, 2018

What about opening the /dev/random devices with the O_NONBLOCK flag so they return rather than blocking?

Wouldn't that mean that we would busy loop on /dev/random when it is drained and/or fall through the loop without obtaining enough entropy ? I'm not sure whether this is a good idea.

@paulidale

This comment has been minimized.

Contributor

paulidale commented Aug 17, 2018

If we add non-blocking I/O for the DEVRANDOM sources, we'd also need the fall through to check other sources -- that was the suggestion you commented about or have I misunderstood something?

We'd still need the iteration limiting of course.

rand_pool_add_end(pool, bytes, 8 * bytes);
entropy_available = rand_pool_entropy_available(pool);
bytes_needed = rand_pool_bytes_needed(pool, 1 /*entropy_factor*/);
while(bytes_needed != 0 && attempts-- > 0) {

This comment has been minimized.

@dot-asm

dot-asm Aug 17, 2018

Contributor

Style nit. It's function(arg), but while (condition).

This comment has been minimized.

@mspncp

mspncp Aug 17, 2018

Contributor

Thank, I overlooked that one. Fixed in fe675d2.

buffer = rand_pool_add_begin(pool, bytes_needed);
bytes = syscall_random(buffer, bytes_needed);
if (bytes > 0) {
rand_pool_add_end(pool, (size_t)bytes, 8 * (size_t)bytes);

This comment has been minimized.

@dot-asm

dot-asm Aug 17, 2018

Contributor

Here is interesting question. Is explicit cast actually required? And if not, are we better off without it? Cast is not actually required because of explicit bytes > 0 check, and implicit conversion would work predictably. But are we better off with it? I'm not saying that I have an answer, but I can't help noting that with unsigned multiplication being specified as modulo word-size operation, and signed can turning undefined if multiplicands are large enough, explicit cast renders outcome of multiplication by 8 unconditionally valid. And without it we would give undefined behaviour sanitizer to catch it for us... Does it mean that a check for result of multiplication to be guaranteed appropriate? Again, not that I know the definitive answer, but formally speaking yes...

This comment has been minimized.

@mspncp

mspncp Aug 17, 2018

Contributor

Here is interesting question. Is explicit cast actually required?

I really love your interesting questions. Very often I learn a lot when following the discussions they trigger :-). I must admit that it was a pure reflex that I placed the cast in anticipation of a compiler warning. You were right, there is no warning about sign conversion when I remove the casts. But the reason you gave seems to be incorrect, there is also no warning even if I replace if (bytes > 0) by if (bytes < 0). The true cause seems to be that -Wsign-conversion is disabled even when configuring --strict-warnings. Here's my configuration:

/usr/bin/perl ./Configure debug-linux-x86_64 threads shared enable-egd enable-crypto-mdebug -g --strict-warnings --prefix=/home/msp/install/openssl

I tried to add -Wsign-conversion explicitely and quickly gave up, because there were tons of warnings.

This comment has been minimized.

@mspncp

mspncp Aug 17, 2018

Contributor

But putting compilers and sanitizers aside: Yes, I agree with you that because of the if-clause it is safe to omit the cast.

This comment has been minimized.

@dot-asm

dot-asm Aug 17, 2018

Contributor

On non-direct, vaguely related note, one can only wonder how did they reason declaring call as one taking size_t and returning int. Of course that large lengths are not something anybody would ever ask for, but declaration asks for question what happens if you ask for 4GB? Would it return 2GB-1? Again, just a side note...

This comment has been minimized.

@mspncp

mspncp Aug 17, 2018

Contributor

@dot-asm please have a look at 23d7c09 where I addressed your comment about the return value of syscall_random().

This comment has been minimized.

@dot-asm

dot-asm Aug 18, 2018

Contributor

Note that I didn't actually suggest to change the declaration, only wondered what did they think when declaring underlying call like that. I mean syscall_random is our function, but it calls system-specific one that accepts size_t and returns int, and remark was about the latter. But now when I look at it in morning light, I realize that it was misdirected, because underlying call doesn't return amount of bytes, but error or success. Duh! This however makes question applicable to syscall_random...

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 17, 2018

@paulidale I am very sceptical about your suggestion to introduce non-blocking I/O for blocking random devices (see below). But independently of my scepticism you are surely leaving the path of a simple bugfix for #6978 and heading towards a redesign of multiple entropy source handling (#4394). As such I would say that it is not an option in the current late beta phase.

You might argument that already my handling of EINTR is more than a bug fix for #6978, i.e, more than just keeping the partially read input. I have to agree, but in this case things seem more simple and predictable. Doing an immediate retry is the normal way of handling EINTR and probably not even a retry counter would be necessary. (Shall I remove it?)

My argument against non-blocking I/O is simple: If you don't want to block, use /dev/urandom. In fact, in the case where you have both /dev/urandom and /dev/random device nodes available, the order in the DEVRANDOM list guarantees that the former is used first. And a read from it will never block except for the early boot stage when the kernel random pool is not initialized yet, in which case it is essential to block and wait.

Looping through the devices in the case of EAGIN/EWOULDBLOCK does not make sense for me: It is hard for me to imagine that a non-blocking read from /dev/urandom could return EAGAIN/EWOULDBLOCK and immediately afterwards a read from /dev/random would succeed. Even worse, in the case where only /dev/random is available, you would either busy-loop on EGAIN/EWOULDBLOCK errors or drop out of the loop before receiving enough entropy, if the random pool is drained.

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 17, 2018

Travis failure is unrelated:

fatal: unable to access 'https://github.com/openssl/openssl.git/': Could not resolve host: github.com
@paulidale

This comment has been minimized.

Contributor

paulidale commented Aug 17, 2018

@mspncp you are right, it's too late in the release cycle for a bigger change. Let's fix the bug and rework the entropy gathering properly later.

The way things are setup, the complete entropy request will always be fulfilled from /dev/urandom, unless that device doesn't exist. For FIPS we might need to remove /dev/urandom from the DEVRANDOM list which will make things more interesting.

In the rework, we'll have to consider the case where DEVRANDOM is the only source and blocking makes sense versus having alternative sources where non-blocking does. A discussion for a future time :)

@mspncp mspncp force-pushed the mspncp:pr-retry-syscall-random branch from fe675d2 to 86b272d Aug 17, 2018

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 17, 2018

@paulidale, unfortunately your approval overlapped with some changes of mine. In particular I addressed a comment of @dot-asm and I implemented a reset of the attempt counter on every successfull read. Please have another look.

My changes require reapproval.

@@ -88,6 +86,8 @@ int syscall_random(void *buf, size_t buflen);
|| defined(OPENSSL_SYS_VMS) || defined(OPENSSL_SYS_VXWORKS) \
|| defined(OPENSSL_SYS_UEFI))
static ssize_t syscall_random(void *buf, size_t buflen);

This comment has been minimized.

@mspncp

mspncp Aug 18, 2018

Contributor

Note: The declaration was moved out of sight of the MSVC compiler to fix this AppVeyor build failure.

bytes = bytes_needed;
{
ssize_t bytes;
/* Maximum allowed number of consecutive unsuccessfull attempts */

This comment has been minimized.

@t-j-h

t-j-h Aug 18, 2018

Member

unsuccessful (only one l)

This comment has been minimized.

@mspncp

mspncp Aug 18, 2018

Contributor

Thanks, fixed!

@@ -217,7 +217,8 @@ static size_t sysctl_random(char *buf, size_t buflen)
* Just return an error on older NetBSD versions.
*/
#if defined(__NetBSD__) && __NetBSD_Version__ < 400000000
return 0;
errno = ENOSYS;
return -1;

This comment has been minimized.

@dot-asm

dot-asm Aug 18, 2018

Contributor

There is another return 0 few lines up.

This comment has been minimized.

@mspncp

mspncp Aug 18, 2018

Contributor

done, see b81dc27

This comment has been minimized.

@mspncp

mspncp Aug 18, 2018

Contributor

it's now in 7722d3d.

@@ -256,7 +257,7 @@ int syscall_random(void *buf, size_t buflen)
extern int getentropy(void *bufer, size_t length) __attribute__((weak));
if (getentropy != NULL)
return getentropy(buf, buflen) == 0 ? buflen : 0;
return getentropy(buf, buflen) == 0 ? (ssize_t)buflen : -1;

This comment has been minimized.

@dot-asm

dot-asm Aug 18, 2018

Contributor

If we look at purely formal side, conversion from unsigned to signed is claimed undefined in presence of most significant bit. Again, I do recognize that such large buflen won't ever be passed. (It's internal function and we have full control over arguments, right?) Yet one can wonder if it would be a readability issue. I mean if somebody looks at the function alone, would it be reasonable to assume that reader knows that buflen is always small enough? Maybe a commentary or soft assert would be appropriate... Side note!

This comment has been minimized.

@mspncp

mspncp Aug 18, 2018

Contributor

done, see 89b93e2

if (!ossl_assert(buflen <= OSSL_SSIZE_MAX)) {
errno = EINVAL;
return -1;
}

This comment has been minimized.

@dot-asm

dot-asm Aug 18, 2018

Contributor

Well, this one is actually redundant, because sysctl_random is called from syscall_random, which already performed the check. One can even wonder following. Wouldn't check for buflen <= OSSL_SSIZE_MAX/8 in rand_pool_acquire_entropy be more appropriate? Because it's multiplied by 8 when it's passed to rand_pool_add_end and you don't want that to overflow. Again, I'm not saying that it would, ever, but if we take this path, then it would be only appropriate if we take it all the way. On related note double-check cast in equality test in rand_pool_acquire_entropy...

This comment has been minimized.

@mspncp

mspncp Aug 18, 2018

Contributor

Well, this one is actually redundant, because sysctl_random is called from syscall_random

Indeed, I considered adding only a comment about it, but then I chose the more defensive solution. I'll remove the redundand check and will also consider your other suggestion about rand_pool_acquire_entropy().

@mspncp mspncp force-pushed the mspncp:pr-retry-syscall-random branch from 469503c to aa69fb3 Aug 18, 2018

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 18, 2018

Thanks @dot-asm, your mentioning of rand_pool_acquire_entropy() was a good hint. A rough estimate of the upper bound for the buffer size yields 2^13. So I removed the asserts and replaced them by comments. I.e., I discarded my last fixups and added aa69fb3.

@dot-asm

This comment has been minimized.

Contributor

dot-asm commented Aug 18, 2018

I suppose one can call it paradox of modularization. I mean you break things up to smaller pieces so that one can better concentrate on details, but then you risk loosing context and making things more complicated than they have to be :-) :-) :-)

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 18, 2018

Thanks @dot-asm. Now that you approved, I will autosquash the fixups, so @paulidale can have a second look at the final outcome.

mspncp added some commits Aug 17, 2018

rand_unix.c: assimilate syscall_random() with getrandom(2)
Change return value type to ssize_t and ensure that a negative value
is returned only if a corresponding errno is set.
rand_unix.c: don't discard entropy bytes from syscall_random()
Fixes #6978

Don't discard partial reads from syscall_random() and retry instead.
rand_unix.c: don't discard entropy bytes from /dev/*random
Don't discard partial reads from /dev/*random and retry instead.

@mspncp mspncp force-pushed the mspncp:pr-retry-syscall-random branch from 7722d3d to 9c9d8e0 Aug 18, 2018

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 18, 2018

Rebased branch autosquashing fixups without further changes.

@paulidale

Reapproved.

* Note: 'buflen' equals the size of the buffer which is used by the
* get_entropy() callback of the RAND_DRBG. It is roughly bounded by
*
* 2 * DRBG_MINMAX_FACTOR * (RAND_DRBG_STRENGTH : 8) = 2^13

This comment has been minimized.

@paulidale

paulidale Aug 19, 2018

Contributor

: 8 should that be / 8 to keep C syntax?

This comment has been minimized.

@paulidale

paulidale Aug 19, 2018

Contributor

I'd prefer this changed but it isn't essential.

This comment has been minimized.

@mspncp

mspncp Aug 19, 2018

Contributor

Ups, you're right! I'll change it. Seems like I had too many math sessions with my daughter recently ;-).

This comment has been minimized.

@paulidale

paulidale Aug 19, 2018

Contributor

I know the feeling :)

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 19, 2018

@paulidale, I noticed an inconsistency between the loops for OPENSSL_RAND_SEED_GETRANDOM and OPENSSL_RAND_SEED_DEVRANDOM: The assignment

entropy_available = rand_pool_entropy_available(pool);

is inside the loop for OPENSSL_RAND_SEED_GETRANDOM and after the loop for OPENSSL_RAND_SEED_DEVRANDOM. I added a fixup which moves it out of the loop in the former case, not for efficiency but for consistency reasons. Please confirm b859aa9.

@paulidale

This comment has been minimized.

Contributor

paulidale commented on b859aa9 Aug 19, 2018

Looks fine from here. I'm a little disappointed I didn't notice this either.

if (bytes > 0) {
rand_pool_add_end(pool, bytes, 8 * bytes);
bytes_needed -= bytes;
attempts = 3; /* reset counter after successfull attempt */

This comment has been minimized.

@t-j-h

t-j-h Aug 19, 2018

Member

One more typo here - successfull -> successful

@t-j-h

t-j-h approved these changes Aug 19, 2018

Please fix the one comment typo - but approved.

@t-j-h t-j-h added ready and removed pending 2nd review labels Aug 19, 2018

@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 19, 2018

Please fix the one comment typo

Done. Puh, that typo was really persistent...

levitte pushed a commit that referenced this pull request Aug 19, 2018

rand_unix.c: assimilate syscall_random() with getrandom(2)
Change return value type to ssize_t and ensure that a negative value
is returned only if a corresponding errno is set.

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from #6990)

levitte pushed a commit that referenced this pull request Aug 19, 2018

rand_unix.c: don't discard entropy bytes from syscall_random()
Fixes #6978

Don't discard partial reads from syscall_random() and retry instead.

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from #6990)

levitte pushed a commit that referenced this pull request Aug 19, 2018

rand_unix.c: don't discard entropy bytes from /dev/*random
Don't discard partial reads from /dev/*random and retry instead.

Reviewed-by: Andy Polyakov <appro@openssl.org>
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from #6990)
@mspncp

This comment has been minimized.

Contributor

mspncp commented Aug 19, 2018

Merged, thank you everybody!

@mspncp mspncp closed this Aug 19, 2018

@mspncp mspncp deleted the mspncp:pr-retry-syscall-random branch Aug 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment