rand usage regression 2.29.2 -> ~2.30 #496
Comments
@kerolasa Maybe you have some ideas for this? I recently debugged a similar problem with dbus blocking in libexpat during boot due to a call to getrandom() [1].
[1] https://git.buildroot.net/buildroot/commit/?id=5a5e76381f8b000baa09c902ca89d45725c47f04
The getrandom() uses /dev/urandom pool. The current status of the pool is available in
In util-linux we ask for a relatively small amount of random data. The question is why parted asks so many times. Anyway, I guess you have to initialize the urandom pool. For example, systemd provides systemd-random-seed.service; have you enabled this service? I'll improve util-linux's getrandom() usage. Currently it is not able to use a partially filled buffer and repeat the syscall. That's a mistake. The old /dev/urandom code was more friendly to the kernel.
getrandom() does not have to return all requested bytes (missing entropy, or when interrupted by a signal). The current implementation in util-linux stupidly asks for all the random data again, rather than only for the missing bytes. The current code also does not care if we repeat our requests forever; that's bad.
This patch uses the same approach we already use for reading from /dev/urandom. It means:
* repeat getrandom() for only the missing bytes
* limit the number of unsuccessful requests (16 times)
* fall back to /dev/urandom on ENOSYS (old kernel or so...)
Addresses: #496
Signed-off-by: Karel Zak <kzak@redhat.com>
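The three points in the commit message can be sketched roughly as below. This is a minimal illustration, not the actual lib/randutils.c code: the function names `fill_random()` and `fill_from_urandom()`, the use of `GRND_NONBLOCK`, the 125 ms pause between failed attempts, and resetting the attempt counter after progress are all assumptions made for the example. Only the retry limit of 16 and the ENOSYS fallback come from the message itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>   /* getrandom(), available in glibc >= 2.25 */
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define MAX_TRIES 16      /* limit of unsuccessful requests, per the commit message */

/* Hypothetical fallback: read the remaining bytes from /dev/urandom. */
static int fill_from_urandom(unsigned char *buf, size_t len)
{
    int tries = 0;
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n <= 0) {
            if (++tries >= MAX_TRIES) {
                close(fd);
                return -1;
            }
            continue;
        }
        buf += n;               /* advance past the bytes we already have */
        len -= (size_t)n;
        tries = 0;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: fill buf with len random bytes, 0 on success. */
static int fill_random(unsigned char *buf, size_t len)
{
    int tries = 0;
    /* Assumed pause between failed attempts; not taken from the source. */
    const struct timespec pause = { 0, 125000000L };

    while (len > 0) {
        /* Ask only for the bytes still missing, never the whole buffer again. */
        ssize_t n = getrandom(buf, len, GRND_NONBLOCK);

        if (n > 0) {
            buf += n;
            len -= (size_t)n;
            tries = 0;
            continue;
        }
        if (n < 0 && errno == ENOSYS)   /* kernel without the syscall */
            return fill_from_urandom(buf, len);
        if (++tries >= MAX_TRIES)       /* do not retry forever */
            return -1;
        nanosleep(&pause, NULL);
    }
    return 0;
}
```

A caller would use it as `unsigned char key[16]; if (fill_random(key, sizeof(key)) != 0) { /* handle failure */ }`. What to do once the retry limit is reached is left to the caller in this sketch.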
@Bubu please git-pull and test util-linux from the master branch in your environment. The getrandom() code now behaves like the old /dev/urandom-based code, so it's more friendly to the kernel and does not repeat unsuccessful requests forever...
We do start rngd before using the parted tool, as our first S00 startup script. Below are our status values after it starts (it is running in the background at this point). With the following commits applied to 2.30.1, I still see the same ~40-50 sec delay. I did notice while testing, though, that things started to work after "random: nonblocking pool is initialized" was printed to the screen (after the delay). It's interesting that the entropy avail can read a high value while the pool stays in an uninitialized state for that long. I'll continue to investigate, but any ideas are appreciated.
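One way to check for the uninitialized state described here, independently of the entropy-available counter, is to make a one-byte getrandom() request with `GRND_NONBLOCK`. According to getrandom(2), such a call fails with EAGAIN while the urandom source has not yet been initialized. The helper name `urandom_pool_ready()` is hypothetical, and this is only a diagnostic sketch, not something util-linux or parted is known to do.

```c
#include <errno.h>
#include <sys/random.h>   /* getrandom(), available in glibc >= 2.25 */

/*
 * Hypothetical probe.
 * Returns 1 if the pool is initialized, 0 if getrandom() would block
 * (EAGAIN), and -1 on any other error (for example ENOSYS).
 */
static int urandom_pool_ready(void)
{
    unsigned char byte;

    if (getrandom(&byte, 1, GRND_NONBLOCK) == 1)
        return 1;
    return errno == EAGAIN ? 0 : -1;
}
```

Calling this from an early boot script helper, before and after rngd starts, would show whether the pool state changes at the same moment the kernel prints its "nonblocking pool is initialized" message.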
Found the issue. I'm using a 4.1 kernel with this bug. With the kernel patched, I don't require any util-linux patches. Sorry for the confusion on this; a bump to glibc 2.25 plus the util-linux update uncovered this issue in my system.
No problem, the issue forced me to review and improve our getrandom()-based code. So... thanks! ;-)
New logic was added at configure time, along with new conditional code
in lib/randutils.c, between versions 2.29.2 and >= 2.30. The logic
determines whether the glibc or the syscall API should be used for
random-number calls. This has been observed causing issues in a
configuration with a 4.1 kernel and glibc 2.25. A tool like
parted, when used at boot, hangs for ~40x the usual time and, when
debugged with gdb, shows blocking on the getrandom() call in util-linux, even though
an entropy check from a hardware RNG used by rngd is adequate
before the parted tool is used.
We did notice that if we ran the parted tool under strace and let all of the output hit the console, it didn't block for the complete 40 sec before returning. So our theory was that entropy was being created via the UART output. We noticed something similar when we enabled networking: the tool would return much faster. So we wondered whether these commits actually use an API that doesn't leverage the hardware RNG output, as ours was set up with a value of ~3000 when we checked the entropy quality.
Reverted the following commits against 2.30.2 for my Buildroot build.
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?h=stable/v2.30&id=b192dd6943e5bb5d2a3773b2c9b06cbd4eb28258
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?h=stable/v2.30&id=cc01c2dca4f62e36505570d5cb15f868aa44bf54