rngd and the LRNG #173

smuellerDD · 2022-05-31T06:17:05Z

When using the rngd on the LRNG patch set [1], the rngd starts an endless loop. The endless loop repeatedly shows "Added 648/1280 bits of entropy". The issue is the following

The architecture of the LRNG is such that it reports a pool size such as the following:

# cat /proc/sys/kernel/random/poolsize      
1288

which you can never fill by simply writing data into the kernel with the IOCTL RNDADDENTROPY. The reason is that the poolsize shows the maximum number of all entropy pools present in the LRNG. But when writing data to the kernel, you can only update one pool. I.e. after the RNDADDENTROPY IOCTL

# cat /proc/sys/kernel/random/entropy_avail 
456

may go up, but will reach a point where no additional IOCTL can change this value.

Thus when the rngd random_add_entropy performs RNDADDENTROPY followed by RNDGETENTCNT, eventually it will not see the value going up.

I see the following solutions which I would like to discuss:

the LRNG is changed such that it only reports the pool contents of the pool that is externally changeable
the rngd is changed such that the loop stops if the rngd finds it cannot update the entropy

IMHO the 2nd option sounds more like it because even in the upstream random.c implementation there could be an issue where you cannot update the pool and thus run into an endless loop.

I have tested option 1 and it makes the rngd working on the LRNG. But IMHO this is not a good solution as it does not reflect reality.

[1] https://github.com/smuellerDD/lrng

The text was updated successfully, but these errors were encountered:

nhorman · 2022-05-31T12:44:37Z

I should point out that if the LRNG patch has altered /proc/sys/kernel/random/poolsize's meaning, that could be construed as a alteration of userspace ABI, and should likely be modified from a philosophical standpoint. This also goes for the write_wakeup_threshold value. It seems like the LRNG patch has changed the ABI such that the previous meaning of the value (the amount of entropy in the pool at which a process blocking on a POLLIN event will be scheduled) to something else, which is really not good.

How do you propose detecting a failure to add entropy to the pool? I see you are suggesting that we follow the call to RNDADDENTROPY with a call to GETENTCNT, terminating the loop on detection of failure to make the entropy count go up, but I'm a bit leery of that heuristic, as it would cause all our watermarking to be broken (i.e. if someone was draining entropy, we would just continually attempt to fill the pool, even if it was close to full already).

What would be better is if you re-introduced a mechanism into the kernel to emulate what write_wakeup_threshold did previously. From the admin guide:

write_wakeup_threshold: when the entropy count drops below this
(as a number of bits), processes waiting to write to /dev/random
are woken up. This file is writable for compatibility purposes, but
writing to it has no effect on any RNG behavior.

From what I can see that sysctl has been disconnected, meaning that any applications using it to control when poll operations are triggered are broken, that's what really needs to be fixed.

Barring that an alternate method needs to be developed to allow for applications to detect when more entropy should be added.

smuellerDD · 2022-05-31T12:57:38Z

Am Dienstag, 31. Mai 2022, 14:44:47 CEST schrieb Neil Horman: Hi Neil,

I should point out that if the LRNG patch has altered /proc/sys/kernel/random/poolsize's meaning, that could be construed as a alteration of userspace ABI, and should likely be modified from a philosophical standpoint. This also goes for the write_wakeup_threshold value. It seems like the LRNG patch has changed the ABI such that the previous meaning of the value (the amount of entropy in the pool at which a process blocking on a POLLIN event will be scheduled) to something else, which is really not good.

Well, IMHO it is still ABI compliant as the LRNG has multiple pools and only one of them is relevant for the user space interaction. As I said, I could trivially change the LRNG to be 100% identical to what random.c provides. But that would not reflect reality. But on the other hand, the LRNG has another interface where the different pool values are reflected.

How do you propose detecting a failure to add entropy to the pool?

The return code of the IOCTL should indicate a failure.

I see you are suggesting that we follow the call to RNDADDENTROPY with a call to GETENTCNT, terminating the loop on detection of failure to make the entropy count go up, but I'm a bit leery of that heuristic, as it would cause all our watermarking to be broken (i.e. if someone was draining entropy, we would just continually attempt to fill the pool, even if it was close to full already).

Agreed, but why are you not waiting for a wakeup from the kernel using select() and thus using the write_wakeup_threshold as mentioned below? The kernel would wake you up and you would not need to call GETENTCNT to check in user space whether to refill.

What would be better is if you re-introduced a mechanism into the kernel to emulate what write_wakeup_threshold did previously. From the admin guide: * ``write_wakeup_threshold``: when the entropy count drops below this (as a number of bits), processes waiting to write to ``/dev/random`` are woken up. This file is writable for compatibility purposes, but writing to it has no effect on any RNG behavior.

This is exactly what the LRNG does. But it applies only to the user-space manageable entropy pool. The jitterentropy-rngd and the haveged rely on this behavior and they fare quite well with it.

From what I can see that sysctl has been disconnected, meaning that any applications using it to control when poll operations are triggered are broken, that's what really needs to be fixed.

I do not see that. What I see the rngd seems to be doing is while (avail_entropy < poolsize) { RNDADDENTROPY(new entropy); avail_entropy = GETENTCNT; } And if avail_entropy does not change, you enter an endless loop - the one I see. If I change the LRNG code such that poolsize reports only the user-space managable pool and GETENTCNT only returns the entropy level in that user-space managable pool, the rngd works well. Thus, why not simply call RNDADDENTROPY without any loop checking the available entropy and wait until the kernel tells you to insert new entropy using select() on the /dev/random FD?

Barring that an alternate method needs to be developed to allow for applications to detect when more entropy should be added.

Ciao Stephan

nhorman · 2022-05-31T14:23:24Z

Am Dienstag, 31. Mai 2022, 14:44:47 CEST schrieb Neil Horman: Hi Neil,
I should point out that if the LRNG patch has altered /proc/sys/kernel/random/poolsize's meaning, that could be construed as a alteration of userspace ABI, and should likely be modified from a philosophical standpoint. This also goes for the write_wakeup_threshold value. It seems like the LRNG patch has changed the ABI such that the previous meaning of the value (the amount of entropy in the pool at which a process blocking on a POLLIN event will be scheduled) to something else, which is really not good.
Well, IMHO it is still ABI compliant as the LRNG has multiple pools and only one of them is relevant for the user space interaction. As I said, I could trivially change the LRNG to be 100% identical to what random.c provides. But that would not reflect reality. But on the other hand, the LRNG has another interface where the different pool values are reflected.

I'm not sure of your reasoning here. I agree the LRNG introduces multiple pools and that the existing ABI for random.c is insufficient to allow for the expression of multiple pool entropy counts, but amalgamating them into a single value is causing behavioral issues, and exporting an entropy poolsize that can never be reached (when previously it could), strikes me as an ABI alteration.

How do you propose detecting a failure to add entropy to the pool?
The return code of the IOCTL should indicate a failure.
So, looking at the head of Linus' tree, I see that random_ioctl confirms that the entirety of a given entropy block was written, and returns -EFAULT if it wasn't. Thats potentially useable, but its aliased with fault resulting from a failed get_user call. If we can return a different error code here (E2BIG or ENOSPC perhaps), then we could use this as a detector for a full poll definitively

I see you are suggesting that we follow the call to RNDADDENTROPY with a call to GETENTCNT, terminating the loop on detection of failure to make the entropy count go up, but I'm a bit leery of that heuristic, as it would cause all our watermarking to be broken (i.e. if someone was draining entropy, we would just continually attempt to fill the pool, even if it was close to full already).
Agreed, but why are you not waiting for a wakeup from the kernel using select() and thus using the write_wakeup_threshold as mentioned below? The kernel would wake you up and you would not need to call GETENTCNT to check in user space whether to refill.

I am. Se calls to random_sleep in rng-tools. We block on POLLOUT events on the random_fd, executed whenever we detect that the pool is full. Waking up from this call is supposed to be controlled by the write_wakeup_threshold.

However, it seems that the mechanism by which poll triggers are made is now also significantly different from what it used to be. It appears that instead of waking up when the entropy pool is below the write_wakeup_threshold, pollout is triggered anytime a block of entropy is credited to the pool. write_wakeup_threshold appears to be entirely disconnected from any functionality at the moment, which strikes me as further ABI change.

If write_wakeup_threshold worked as documented, we could use that and ignore the full condition from GETENTCNT (or an error from ADDENTROPY, even though it would generate more syscalls), but as it is currently, since rngd is the process adding entropy to the pool, if a kernel isn't for whatever reason adding entropy on its own via add_timer_randomness, its entirely possible that rngd will deadlock on a call to poll(random_fd)

What would be better is if you re-introduced a mechanism into the kernel to emulate what write_wakeup_threshold did previously. From the admin guide: * write_wakeup_threshold: when the entropy count drops below this (as a number of bits), processes waiting to write to /dev/random are woken up. This file is writable for compatibility purposes, but writing to it has no effect on any RNG behavior.
This is exactly what the LRNG does. But it applies only to the user-space manageable entropy pool. The jitterentropy-rngd and the haveged rely on this behavior and they fare quite well with it.
How? How does LRNG introduce a machinsm that preforms what the write_wakeup_threshold does that isn't considered an ABI breakage?
I'm not saying I won't adapt to that mechanism, I will, but I'm not thrilled that the mechanism was changed without having user space backwards compatibility in place first.

From what I can see that sysctl has been disconnected, meaning that any applications using it to control when poll operations are triggered are broken, that's what really needs to be fixed.
I do not see that. What I see the rngd seems to be doing is while (avail_entropy < poolsize) { RNDADDENTROPY(new entropy); avail_entropy = GETENTCNT; } And if avail_entropy does not change, you enter an endless loop - the one I see. If I change the LRNG code such that poolsize reports only the user-space managable pool and GETENTCNT only returns the entropy level in that user-space managable pool, the rngd works well. Thus, why not simply call RNDADDENTROPY without any loop checking the available entropy and wait until the kernel tells you to insert new entropy using select() on the /dev/random FD?

Because I never had to do that before, this is why I'm asserting this is ABI breakage.

First to your point about how rngd behaves, it actually does 2 things:

It adds entropy, then checks to see if the pool is full, exactly as you mention (this was done to aid in aggressive entropy refilling when its drained quickly - i.e. we wanted to be able to keep filling the pool without having to go through an extra kernel trap for the poll call, see rngd commit fa6f75d
Once the pool is full, we sleep blocking on POLLOUT in random_sleep, waiting for the entropy pool to drop to a configured level (write_wakeup_threshold) at which time we begin filling the entropy pool again. This has been in place since 2017, commit 5e6f8f9

It seems to me that (1) no longer works, Regardless of what I do. I could certainly modify rngd to only use the poll check, but I'm opposed to that for three reasons:

What I currently have should work, and it has in the past. I don't dispute that its sub-optimal code, but it worked, and from what I see, there no reason it should not have worked, and continue to work (that is to say, I should be able to do a comparison of poolsize to the value returned from GETENTCNT, and conclude weather or not the entropy pool is full). Because that is no longer the case, I am aserting that this is an ABI break. While I can make the modification to avoid this comparison, that's not how this should have worked. There should have been a new abi established, along with a deprecation of the old one, such that applications could make the switch without having to scramble to fix breakages that were revealed as a result of the change.
Making the change you suggest re-introduces the problem I was trying to avoid in commit fa6f75d. That is to say, If I go to sleep in a poll after every write to /dev/random, rngd has to go through a scheduling cycle to get back on the CPU to add more entropy if it is being drained rapidly. That creates a significant performance impact. Perhaps its less of an issues now that /dev/random no longer blocks, I'm not sure, but I don't like the idea of relying on it, especially since we now need to straddle kernel versions where /dev/random may or may not block based on the kernel version we are running with.
Polling on /dev/random no longer follows the semantics of write_wake_threshold (or more specifically, it may or may not, dependent on what kernel version is being used). In the latest kernel version write_wakeup_threshold appears to be ignored, instead waking any waiters when entropy Is added to the user pool. This implies that rngd will either wait forever to be activated when sleeping on a poll, creating a deadlock, or be woken at random times when the kernel adds to the pool, only for rngd to have to go through the potentially unneeded activity of having to add entropy to what may be an already full pool.

This really feels like a mess to me right now. It seems the existing ABI has been repruposed to a new behavioral model, which while perhaps superior to the old model, simply discarded any old users, in the assumption they would adapt, which is backwards from the way its supposed to be.

I'm not really sure what to do here. I think the "right" philosophical thing to do, is restore the behavior of write_wake_threshold as well as the behavior of the random_ioctl, so that applications continue to function as they previously expected, but that's likely impractical, as you've already started forcing adoption on other applications such that the new model is somewhat de-facto, which leaves me in a lousy position.

I think perhaps the best/only real course of action here that's even loosely palatable would be to:

restore the functionality of write_wake_threshold, such that when the amount of entropy in a pool falls below that mark, any process waiting on poll for /dev/random wakes up. it may wake up more often than that, but it should at least wake up at that point.
Have rngd check for the presence of /proc/lrng_type, and based on its presence or absence, using that as a toggle to determine if poolsize and the value of GENENTCNT can be safely used to determine pool full status. if we can't use that measurement, then we rely solely on write_wake_threshold to trigger refills (which underscores the need for it to function as documented) I really don't like having to do that, but given that its already out in the wild, and reverting this behavior will cause as many problems as it solves, I don't see a way around it.

Barring that an alternate method needs to be developed to allow for applications to detect when more entropy should be added.
Ciao Stephan

smuellerDD · 2022-05-31T14:42:01Z

Am Dienstag, 31. Mai 2022, 16:23:35 CEST schrieb Neil Horman: Hi Neil,

> Am Dienstag, 31. Mai 2022, 14:44:47 CEST schrieb Neil Horman: Hi Neil, > I should point out that if the LRNG patch has altered > /proc/sys/kernel/random/poolsize's meaning, that could be construed as a > alteration of userspace ABI, and should likely be modified from a > philosophical standpoint. This also goes for the write_wakeup_threshold > value. It seems like the LRNG patch has changed the ABI such that the > previous meaning of the value (the amount of entropy in the pool at which > a process blocking on a POLLIN event will be scheduled) to something > else, which is really not good. Well, IMHO it is still ABI compliant as > the LRNG has multiple pools and only one of them is relevant for the user > space interaction. As I said, I could trivially change the LRNG to be > 100% identical to what random.c provides. But that would not reflect > reality. But on the other hand, the LRNG has another interface where the > different pool values are reflected. I'm not sure of your reasoning here. I agree the LRNG introduces multiple pools and that the existing ABI for random.c is insufficient to allow for the expression of multiple pool entropy counts, but amalgamating them into a single value is causing behavioral issues, and exporting an entropy poolsize that can never be reached (when previously it could), strikes me as an ABI alteration.

Ok, then I will revise it.

> How do you propose detecting a failure to add entropy to the pool? > The return code of the IOCTL should indicate a failure. So, looking at the head of Linus' tree, I see that random_ioctl confirms that the entirety of a given entropy block was written, and returns -EFAULT if it wasn't. Thats potentially useable, but its aliased with fault resulting from a failed get_user call. If we can return a different error code here (E2BIG or ENOSPC perhaps), then we could use this as a detector for a full poll definitively > I see you are suggesting that we follow the call to RNDADDENTROPY with a > call to GETENTCNT, terminating the loop on detection of failure to make > the entropy count go up, but I'm a bit leery of that heuristic, as it > would cause all our watermarking to be broken (i.e. if someone was > draining entropy, we would just continually attempt to fill the pool, > even if it was close to full already). Agreed, but why are you not > waiting for a wakeup from the kernel using select() and thus using the > write_wakeup_threshold as mentioned below? The kernel would wake you up > and you would not need to call GETENTCNT to check in user space whether > to refill. I am. Se calls to random_sleep in rng-tools. We block on POLLOUT events on the random_fd, executed whenever we detect that the pool is full. Waking up from this call is supposed to be controlled by the write_wakeup_threshold. However, it seems that the mechanism by which poll triggers are made is now also significantly different from what it used to be. It appears that instead of waking up when the entropy pool is below the write_wakeup_threshold, pollout is triggered anytime a block of entropy is credited to the pool. write_wakeup_threshold appears to be entirely disconnected from any functionality at the moment, which strikes me as further ABI change.

Are you referring to random.c or the LRNG? If you refer to the LRNG, I am not sure why you say that. The wakeup happens at the right time when the user- space managable pool entropy level falls below the threshold. It works precisely as it is defined IMHO. This is exactly how jitterentropy-rngd, haveged, and virtio-rng are woken up at the right time to inject data and then get back to sleep. All those work even without the discussed change mentioned at the beginning.

If write_wakeup_threshold worked as documented, we could use that and ignore the full condition from GETENTCNT (or an error from ADDENTROPY, even though it would generate more syscalls), but as it is currently, since rngd is the process adding entropy to the pool, if a kernel isn't for whatever reason adding entropy on its own via add_timer_randomness, its entirely possible that rngd will deadlock on a call to poll(random_fd)

With the current random.c it will deadlock as the kernel always thinks it has enough entropy. Thus you could select() with a timeout to periodically wake up and inject some data and go back to sleep.

> What would be better is if you re-introduced a mechanism into the kernel > to emulate what write_wakeup_threshold did previously. From the admin > guide: * ``write_wakeup_threshold``: when the entropy count drops below > this (as a number of bits), processes waiting to write to ``/dev/random`` > are woken up. This file is writable for compatibility purposes, but > writing to it has no effect on any RNG behavior. This is exactly what the > LRNG does. But it applies only to the user-space manageable entropy pool. > The jitterentropy-rngd and the haveged rely on this behavior and they > fare quite well with it. How? How does LRNG introduce a machinsm that preforms what the write_wakeup_threshold does that isn't considered an ABI breakage? I'm not saying I won't adapt to that mechanism, I will, but I'm not thrilled that the mechanism was changed without having user space backwards compatibility in place first.

I am surprised that you say that the LRNG deviates in this interface from random.c. The LRNG DOES wake you up if it has insufficient entropy in its entropy pool managable by external. It works exactly as defined in random(4). On my test system: $ cat /proc/sys/kernel/random/write_wakeup_threshold 448

--> when the user-space manageable entropy pool has less entropy the LRNG wakes the sleeper up. And that works perfectly well with the hwrng and virtio-rng: # cat /sys/devices/virtual/misc/hw_random/rng_current virtio_rng.0 # ps -efa | grep rng root 180 2 0 Mai23 ? 00:00:00 [hwrng] -> causing a reseed lrng_es_aux: obtained 256 bits by collecting 320 bits of entropy from aux pool, 192 bits of entropy remaining

--> the pool is immediately replenished by virtio-rng

> From what I can see that sysctl has been disconnected, meaning that any > applications using it to control when poll operations are triggered are > broken, that's what really needs to be fixed. I do not see that. What I > see the rngd seems to be doing is while (avail_entropy < poolsize) { > RNDADDENTROPY(new entropy); avail_entropy = GETENTCNT; } And if > avail_entropy does not change, you enter an endless loop - the one I see. > If I change the LRNG code such that poolsize reports only the user-space > managable pool and GETENTCNT only returns the entropy level in that > user-space managable pool, the rngd works well. Thus, why not simply call > RNDADDENTROPY without any loop checking the available entropy and wait > until the kernel tells you to insert new entropy using select() on the > /dev/random FD? Because I never had to do that before, this is why I'm asserting this is ABI breakage. First to your point about how rngd behaves, it actually does 2 things: 1) It adds entropy, then checks to see if the pool is full, exactly as you mention (this was done to aid in aggressive entropy refilling when its drained quickly - i.e. we wanted to be able to keep filling the pool without having to go through an extra kernel trap for the poll call, see rngd commit fa6f75d

-> this repeated check is what causes the failure with the LRNG currently

2) Once the pool is full, we sleep blocking on POLLOUT in random_sleep, waiting for the entropy pool to drop to a configured level (write_wakeup_threshold) at which time we begin filling the entropy pool again. This has been in place since 2017, commit 5e6f8f9

-> this works as intended with the LRNG

It seems to me that (1) no longer works, Regardless of what I do. I could certainly modify rngd to only use the poll check, but I'm opposed to that for three reasons: 1) What I currently have should work, and it has in the past. I don't dispute that its sub-optimal code, but it worked, and from what I see, there no reason it should not have worked, and continue to work (that is to say, I should be able to do a comparison of poolsize to the value returned from GETENTCNT, and conclude weather or not the entropy pool is full).

As mentioned at the top, this is where the LRNG deviates conceptually. And if I adjust it, the rngd behaves properly.

Because that is no longer the case, I am aserting that this is an ABI break. While I can make the modification to avoid this comparison, that's not how this should have worked. There should have been a new abi established, along with a deprecation of the old one, such that applications could make the switch without having to scramble to fix breakages that were revealed as a result of the change.

Ok, there is a new API that lists the individual pools

2) Making the change you suggest re-introduces the problem I was trying to avoid in commit fa6f75d. That is to say, If I go to sleep in a poll after every write to /dev/random, rngd has to go through a scheduling cycle to get back on the CPU to add more entropy if it is being drained rapidly. That creates a significant performance impact. Perhaps its less of an issues now that /dev/random no longer blocks, I'm not sure, but I don't like the idea of relying on it, especially since we now need to straddle kernel versions where /dev/random may or may not block based on the kernel version we are running with. 3) Polling on /dev/random no longer follows the semantics of write_wake_threshold (or more specifically, it may or may not, dependent on what kernel version is being used). In the latest kernel version write_wakeup_threshold appears to be ignored, instead waking any waiters when entropy Is added to the user pool. This implies that rngd will either wait forever to be activated when sleeping on a poll, creating a deadlock, or be woken at random times when the kernel adds to the pool, only for rngd to have to go through the potentially unneeded activity of having to add entropy to what may be an already full pool. This really feels like a mess to me right now. It seems the existing ABI has been repruposed to a new behavioral model, which while perhaps superior to the old model, simply discarded any old users, in the assumption they would adapt, which is backwards from the way its supposed to be.

Ok, I see that it should be adjusted on my side then. I will do that, the patch is rather small.

I'm not really sure what to do here. I think the "right" philosophical thing to do, is restore the behavior of write_wake_threshold as well as the behavior of the random_ioctl, so that applications continue to function as they previously expected, but that's likely impractical, as you've already started forcing adoption on other applications such that the new model is somewhat de-facto, which leaves me in a lousy position. I think perhaps the best/only real course of action here that's even loosely palatable would be to: 1) restore the functionality of write_wake_threshold, such that when the amount of entropy in a pool falls below that mark, any process waiting on poll for /dev/random wakes up. it may wake up more often than that, but it should at least wake up at that point. 2) Have rngd check for the presence of /proc/lrng_type, and based on its presence or absence, using that as a toggle to determine if poolsize and the value of GENENTCNT can be safely used to determine pool full status. if we can't use that measurement, then we rely solely on write_wake_threshold to trigger refills (which underscores the need for it to function as documented) I really don't like having to do that, but given that its already out in the wild, and reverting this behavior will cause as many problems as it solves, I don't see a way around it.

With the mentioned change to the LRNG, you do not need to change anything. You have convinced me that my ABI is slightly off.

> Barring that an alternate method needs to be developed to allow for > applications to detect when more entropy should be added. Ciao Stephan

Ciao Stephan

nhorman · 2022-05-31T14:51:29Z

Are you referring to random.c or the LRNG? If you refer to the LRNG, I am not
sure why you say that. The wakeup happens at the right time when the user-
space managable pool entropy level falls below the threshold. It works
precisely as it is defined IMHO.
I'm looking at random_poll in random.c. Am I missing something about what would trigger a pullout event here?

I am surprised that you say that the LRNG deviates in this interface from
random.c. The LRNG DOES wake you up if it has insufficient entropy in its
entropy pool managable by external. It works exactly as defined in random(4).

On my test system:

$ cat /proc/sys/kernel/random/write_wakeup_threshold
448

Its possible that I'm missing something here. I'm looking at the random_poll function in random.c, which waits on the crng_init_wait queue, that is woken up only on calls to _credit_init_bits.

If the lrng patches do something different, and are still pending somewhere other than linus' tree, please point them out to me. This may be a non-issue if I'm looking at the wrong code

smuellerDD · 2022-05-31T15:52:18Z

Am Dienstag, 31. Mai 2022, 16:41:53 CEST schrieb Stephan Mueller: Hi Stephan,

Am Dienstag, 31. Mai 2022, 16:23:35 CEST schrieb Neil Horman: Hi Neil, > > Am Dienstag, 31. Mai 2022, 14:44:47 CEST schrieb Neil Horman: Hi Neil, > > I should point out that if the LRNG patch has altered > > /proc/sys/kernel/random/poolsize's meaning, that could be construed as a > > alteration of userspace ABI, and should likely be modified from a > > philosophical standpoint. This also goes for the write_wakeup_threshold > > value. It seems like the LRNG patch has changed the ABI such that the > > previous meaning of the value (the amount of entropy in the pool at > > which > > a process blocking on a POLLIN event will be scheduled) to something > > else, which is really not good. Well, IMHO it is still ABI compliant as > > the LRNG has multiple pools and only one of them is relevant for the > > user > > space interaction. As I said, I could trivially change the LRNG to be > > 100% identical to what random.c provides. But that would not reflect > > reality. But on the other hand, the LRNG has another interface where the > > different pool values are reflected. > > I'm not sure of your reasoning here. I agree the LRNG introduces multiple > pools and that the existing ABI for random.c is insufficient to allow for > the expression of multiple pool entropy counts, but amalgamating them into > a single value is causing behavioral issues, and exporting an entropy > poolsize that can never be reached (when previously it could), strikes me > as an ABI alteration. Ok, then I will revise it.

After changing the LRNG to only report the pool size / available entropy from the user-visible pool, the rngd works flawless. The rngd is inserting data, the kernel wakes it up when it needs more data, the rngd goes to sleep. But again, avail_entropy now will not report the full entropy rate any more, but /proc/lrng_type would. This is currently my concern as this value could be zero until any user-space caller provides data. I.e. it does not reflect reality. But /proc/lrng_type does. Do you think that this patch should be aired? Ciao Stephan

nhorman · 2022-05-31T16:26:33Z

ok, that sounds good. Given that lrng is the new API here, I'm personally ok with that. The existing API needs to maintain functionality with applications that rely on it, even if its not an accurate reflection of reality. As long as there is an interface (in this case /proc/lrng_type) that does reflect reality, I think we're ok.

But more opinions never hurt, it certainly would be a bad idea to circulate this idea and get additional input

smuellerDD · 2022-05-31T17:26:05Z

Am Dienstag, 31. Mai 2022, 18:26:44 CEST schrieb Neil Horman: Hi Neil,

ok, that sounds good. Given that lrng is the new API here, I'm personally ok with that. The existing API needs to maintain functionality with applications that rely on it, even if its not an accurate reflection of reality. As long as there is an interface (in this case /proc/lrng_type) that does reflect reality, I think we're ok. But more opinions never hurt, it certainly would be a bad idea to circulate this idea and get additional input

Here is the solution so far that I put into a separate branch. https://github.com/smuellerDD/lrng/commit/ 44100475f59b4213d8ac8dad683a51bc1d54cd38 But this implies after booting without any rngd, I see # cat /proc/sys/kernel/random/entropy_avail 0 # cat /proc/sys/kernel/random/poolsize 512 # cat /proc/sys/kernel/random/write_wakeup_threshold 448 # cat /proc/lrng_type ... LRNG entropy level: 574 ... After spinning up the rngd # cat /proc/sys/kernel/random/entropy_avail 448 # cat /proc/sys/kernel/random/write_wakeup_threshold 384 # cat /proc/lrng_type ... LRNG entropy level: 1160 ... Ciao Stephan

nhorman · 2022-05-31T17:51:52Z

I'm not sure I understand why entropy_avail would be zero here. Shouldn't it reflect entropy gathered by the kernel through its various processes? Or is this because with the lrng patch random/entropy_avail is zero because those in-kerenel entropy sources fill other pools?

smuellerDD · 2022-06-01T05:22:13Z

Am Dienstag, 31. Mai 2022, 19:52:02 CEST schrieb Neil Horman: Hi Neil,

I'm not sure I understand why entropy_avail would be zero here. Shouldn't it reflect entropy gathered by the kernel through its various processes? Or is this because with the lrng patch random/entropy_avail is zero because those in-kerenel entropy sources fill other pools?

The latter. See the contents of /proc/lrng_type: # cat /proc/lrng_type DRNG name: drbg_nopr_ctr_aes256 LRNG security strength in bits: 256 Number of DRNG instances: 4 Standards compliance: SP800-90C LRNG minimally seeded: true LRNG fully seeded: true LRNG entropy level: 712 Entropy Source 0 properties: Name: IRQ Hash for operating entropy pool: sha512 Available entropy: 448 per-CPU interrupt collection size: 1024 Standards compliance: SP800-90B High-resolution timer: true Continuous compression: true Entropy Source 1 properties: Name: Scheduler Hash for operating entropy pool: sha512 Available entropy: 0 per-CPU scheduler event collection size: 1024 Standards compliance: SP800-90B High-resolution timer: true Entropy Source 2 properties: Name: JitterRNG Available entropy: 256 Enabled: true Entropy Source 3 properties: Name: CPU Hash for compressing data: N/A Available entropy: 8 Data multiplier: 1 Entropy Source 4 properties: Name: Auxiliary Hash for operating entropy pool: sha512 Available entropy: 0 The entropy pool user space can influence is the last one, the auxiliary pool. That one would now feed the sysctl interfaces as well as the RNDGETENTCNT IOCTL. After starting the rngd, the auxiliary pool would be filled: Entropy Source 4 properties: Name: Auxiliary Hash for operating entropy pool: sha512 Available entropy: 448 Ciao Stephan

smuellerDD · 2022-10-11T07:36:16Z

Am Dienstag, 31. Mai 2022, 16:51:40 CEST schrieb Neil Horman: Hi Neil,

> Are you referring to random.c or the LRNG? If you refer to the LRNG, I am > not sure why you say that. The wakeup happens at the right time when the > user- space managable pool entropy level falls below the threshold. It > works precisely as it is defined IMHO. I'm looking at random_poll in random.c. Am I missing something about what would trigger a pullout event here?

Hehe, it took me a while to figure out how that works for my ESDM tool. Here it is: poll_wait registers one wait queue for the write poll and one for the read poll. This wait queue is triggered by random.c when it wants readers/writers to occur. In this case, the poll() framework wakes up and calls random_poll again. Now, the mask can be set and returned via the poll() framework.

> I am surprised that you say that the LRNG deviates in this interface from > random.c. The LRNG DOES wake you up if it has insufficient entropy in its > entropy pool managable by external. It works exactly as defined in > random(4). > > On my test system: > > $ cat /proc/sys/kernel/random/write_wakeup_threshold > 448 Its possible that I'm missing something here. I'm looking at the random_poll function in random.c, which waits on the crng_init_wait queue, that is woken up only on calls to _credit_init_bits.

Right.

…

If the lrng patches do something different, and are still pending somewhere other than linus' tree, please point them out to me. This may be a non-issue if I'm looking at the wrong code

__poll_t lrng_random_poll(struct file *file, poll_table *wait) { __poll_t mask; poll_wait(file, &lrng_init_wait, wait); poll_wait(file, &lrng_write_wait, wait); mask = 0; if (lrng_state_operational()) mask |= EPOLLIN | EPOLLRDNORM; if (lrng_need_entropy() || lrng_state_exseed_allow(lrng_noise_source_user)) { lrng_state_exseed_set(lrng_noise_source_user, false); mask |= EPOLLOUT | EPOLLWRNORM; } return mask; }

--> it is identical to random.c. There it waits for read/write polls in different wait queues which are kicked at the right time. Then the poll framework calls the function again and now the LRNG can set the proper mask. Look at the time when lrng_init_wait and lrng_write_wait are kicked. lrng_write_wait is the one that is kicked when new entropy is requested. So, I do not see any differences here. Ciao Stephan

nhorman · 2022-10-11T11:47:22Z

Ok, I'm sorry, its been so long since we discussed this, I'm not sure where we left off. Your most recent post describes how the lrng wakes up in the kernel, but IIRC we were discussing how to make the proc interface behave in a backwards compatible way as I recall?

smuellerDD · 2022-10-11T11:54:49Z

Am Dienstag, dem 11.10.2022 um 04:47 -0700 schrieb Neil Horman:

Ok, I'm sorry, its been so long since we discussed this, I'm not sure where we left off. Your most recent post describes how the lrng wakes up in the kernel, but IIRC we were discussing how to make the proc interface behave in a backwards compatible way as I recall?

The LRNG has been updated as you suggested to be 100% compatible now. The rngd works with LRNG starting from v46. Thus, nothing is left open. :-) It may be the case that I forgot to close the issue :-( Thanks Stephan

nhorman · 2022-10-11T11:56:15Z

No worries, that feels right to me. Thanks for all your help!

nhorman closed this as completed Oct 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rngd and the LRNG #173

rngd and the LRNG #173

smuellerDD commented May 31, 2022

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented Jun 1, 2022 via email

smuellerDD commented Oct 11, 2022 via email

nhorman commented Oct 11, 2022

smuellerDD commented Oct 11, 2022 via email

nhorman commented Oct 11, 2022

rngd and the LRNG #173

rngd and the LRNG #173

Comments

smuellerDD commented May 31, 2022

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented May 31, 2022 via email

nhorman commented May 31, 2022

smuellerDD commented Jun 1, 2022 via email

smuellerDD commented Oct 11, 2022 via email

nhorman commented Oct 11, 2022

smuellerDD commented Oct 11, 2022 via email

nhorman commented Oct 11, 2022