rand usage regression 2.29.2 -> ~2.30 #496

Closed
matthew-l-weber opened this issue Aug 12, 2017 · 6 comments


@matthew-l-weber

New configure-time logic and new conditional code were added to
lib/randutils.c between versions 2.29.2 and 2.30. The logic
determines whether the glibc wrapper or the raw syscall is used for
random-number calls. We have seen this cause problems with
a 4.1 kernel and glibc 2.25. A tool like parted, when run at boot,
hangs for ~40x its usual time, and gdb shows it blocking in the
getrandom() call in util-linux, even though an entropy check
reports adequate entropy from the hardware RNG used by rngd
before parted is run.
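
For readers unfamiliar with this kind of selection, here is a rough sketch of how a build can choose between the libc wrapper and the raw syscall. It is not a copy of lib/randutils.c: the macro name HAVE_GETRANDOM and the wrapper name my_getrandom() are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_getrandom, if the kernel headers define it */
#include <sys/types.h>
#include <unistd.h>        /* syscall() */

#if defined(HAVE_GETRANDOM)            /* assumed macro set by configure */
# include <sys/random.h>
/* libc provides the wrapper (glibc >= 2.25) */
static inline ssize_t my_getrandom(void *buf, size_t len, unsigned int flags)
{
    return getrandom(buf, len, flags);
}
#elif defined(SYS_getrandom)
/* libc has no wrapper, but the syscall number is known */
static inline ssize_t my_getrandom(void *buf, size_t len, unsigned int flags)
{
    return (ssize_t) syscall(SYS_getrandom, buf, len, flags);
}
#else
/* neither is available: report "not implemented" so callers can fall back */
static inline ssize_t my_getrandom(void *buf, size_t len, unsigned int flags)
{
    (void) buf; (void) len; (void) flags;
    errno = ENOSYS;
    return -1;
}
#endif
```

Either of the first two branches ends up in the same kernel code path, so the blocking behaviour described above would not depend on which branch configure picks.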

We did notice that if we straced parted and let all of that output reach the console, it did not block for the full 40 sec before returning. Our theory was that the UART output generated entropy. We saw something similar with networking enabled: the tool returned much faster. So we wonder whether these commits use an API that does not draw on the hardware RNG output, since our setup reported a value of ~3000 when we checked the available entropy.

For now I have reverted the following commits on top of 2.30.2 in my Buildroot build:
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?h=stable/v2.30&id=b192dd6943e5bb5d2a3773b2c9b06cbd4eb28258
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?h=stable/v2.30&id=cc01c2dca4f62e36505570d5cb15f868aa44bf54

@Bubu

Bubu commented Aug 12, 2017

@kerolasa Maybe you have some ideas for this? I recently debugged a similar problem with dbus blocking in libexpat during boot due to a call to getrandom() [1].

[1] https://git.buildroot.net/buildroot/commit/?id=5a5e76381f8b000baa09c902ca89d45725c47f04

@karelzak
Collaborator

getrandom() uses the /dev/urandom pool. The current status of the pool is available in:

  /proc/sys/kernel/random/entropy_avail
  /proc/sys/kernel/random/poolsize 
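
A minimal C sketch for reading these two files at boot is shown here. The helper names read_long_from_file() and print_entropy_status() are made up for illustration, and the output format simply mirrors the status lines quoted later in this thread.

```c
#include <stdio.h>

/* Hypothetical helper: read one decimal integer from a procfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_long_from_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: print both counters; returns -1 if either is unreadable. */
int print_entropy_status(void)
{
    long avail, size;

    if (read_long_from_file("/proc/sys/kernel/random/entropy_avail", &avail) != 0 ||
        read_long_from_file("/proc/sys/kernel/random/poolsize", &size) != 0)
        return -1;

    printf("Entropy Avail [%ld]\nPool Size [%ld]\n", avail, size);
    return 0;
}
```

Calling print_entropy_status() right before the slow tool runs gives a snapshot of both values at that point in boot.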

In util-linux we ask for relatively small amounts of random data. The question is why parted asks so many times.

Anyway, I guess you have to initialize the urandom pool. For example, systemd provides systemd-random-seed.service; have you enabled this service?

I'll improve the getrandom() usage in util-linux. Right now it cannot cope with a partially filled buffer and repeat the syscall for the rest. That's a mistake. The old /dev/urandom code was more friendly to the kernel.

karelzak added a commit that referenced this issue Aug 14, 2017
The getrandom() does not have to return all requested bytes (missing
entropy or when interrupted by signal). The current implementation in
util-linux stupidly asks for all random data again, rather than only
for missing bytes.

The current code also does not care if we repeat our requests for
ever; that's bad.

This patch uses the same way as we already use for reading from
/dev/urandom. It means:

 * repeat getrandom() for only missing bytes
 * limit number of unsuccessful request (16 times)
 * fallback to /dev/urandom on ENOSYS (old kernel or so...)

Addresses: #496
Signed-off-by: Karel Zak <kzak@redhat.com>
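
The three points in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: fill_random() and fill_from_urandom() are hypothetical names, GRND_NONBLOCK is my choice of flag, and there is no delay between retries.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK; needs glibc >= 2.25 */
#include <sys/types.h>
#include <unistd.h>

#define MAX_FAILED_TRIES 16   /* the limit named in the commit message */

/* Hypothetical fallback: read n bytes from /dev/urandom, retrying short reads. */
static int fill_from_urandom(unsigned char *p, size_t n)
{
    int fd = open("/dev/urandom", O_RDONLY);
    int fails = 0;

    if (fd < 0)
        return -1;
    while (n > 0 && fails < MAX_FAILED_TRIES) {
        ssize_t r = read(fd, p, n);

        if (r <= 0) {
            fails++;
            continue;
        }
        p += r;                 /* keep what we got, ask only for the rest */
        n -= (size_t) r;
        fails = 0;
    }
    close(fd);
    return n == 0 ? 0 : -1;
}

/* Hypothetical helper: fill buf with nbytes random bytes.
 * Returns 0 on success, -1 if too many consecutive requests failed. */
int fill_random(void *buf, size_t nbytes)
{
    unsigned char *p = buf;
    size_t left = nbytes;
    int fails = 0;

    while (left > 0 && fails < MAX_FAILED_TRIES) {
        ssize_t r = getrandom(p, left, GRND_NONBLOCK);

        if (r > 0) {
            p += r;             /* request only the missing bytes next time */
            left -= (size_t) r;
            fails = 0;
        } else if (r < 0 && errno == ENOSYS) {
            return fill_from_urandom(p, left);   /* kernel lacks getrandom() */
        } else {
            fails++;            /* EAGAIN, EINTR, or no data: count and retry */
        }
    }
    return left == 0 ? 0 : -1;
}
```

A caller would check the return value and decide what to do when no random data is available, rather than looping indefinitely.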
@karelzak
Collaborator

@Bubu please git pull and test util-linux from the master branch in your environment.

The getrandom() code now behaves like the old /dev/urandom-based code, so it is more friendly to the kernel and does not repeat unsuccessful requests forever.

@matthew-l-weber
Author

We do start rngd, as our first S00 startup script, before using the parted tool. Below are our status values after it starts (it is running in the background at this point):
Entropy Avail [3095]
Pool Size [4096]

With the following commits applied to 2.30.1, I still see the same ~40-50sec delay.
0001-lib-randutils.c-Fall-back-gracefully-when-kernel-doe.patch
0002-lib-randutils.c-More-paranoia-in-getrandom-call.patch
0003-lib-randutils-improve-getrandom-usage.patch

I did notice while testing, though, that things started to work once "random: nonblocking pool is initialized" was printed to the console (after the delay). It's interesting that entropy_avail can read a high value while the pool stays in an uninitialized state for that long. I'll continue to investigate, but any ideas are appreciated.
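
One way to tell "pool not yet initialized" apart from "entropy_avail looks low" is to ask getrandom() for a single byte with GRND_NONBLOCK. Per the getrandom(2) man page, the call fails with EAGAIN while the urandom source is still uninitialized. A small sketch, where the function name urandom_pool_ready() is made up for illustration:

```c
#include <errno.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK; needs glibc >= 2.25 */
#include <sys/types.h>

/* Hypothetical probe.
 * Returns 1 if the pool is initialized, 0 if not yet (EAGAIN),
 * and -1 on any other error (for example ENOSYS on an old kernel). */
int urandom_pool_ready(void)
{
    unsigned char byte;
    ssize_t r;

    do {
        r = getrandom(&byte, 1, GRND_NONBLOCK);
    } while (r < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (r == 1)
        return 1;
    if (errno == EAGAIN)
        return 0;
    return -1;
}
```

Running such a probe from the same startup script, just before parted, would show whether the delay lines up with the pool initialization message.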

@matthew-l-weber
Author

Found the issue. I'm using a 4.1 kernel with this bug.
https://www.spinics.net/lists/linux-crypto/msg24584.html

With the kernel patched, I don't need any util-linux patches. Sorry for the confusion on this; the bump to glibc 2.25 plus the util-linux update uncovered this issue in my system.

@karelzak
Collaborator

No problem, the issue forced me to review and improve our getrandom()-based code. So, thanks! ;-)

karelzak added a commit that referenced this issue Sep 21, 2017