Fix the DRBG reseed propagation [1.1.1] #12759
Conversation
|
The two commits ba5b2dfaa46c and d86bb47aab98 were left separated intentionally to facilitate comparing the old and the new fix. They will be squashed in the end, unless you prefer them to remain separated. |
else
    tsan_store(&drbg->reseed_counter,
               tsan_load(&drbg->parent->reseed_counter));
}
|
|
bernd-edlinger
Sep 1, 2020
Member
After all, all that was missing from my original solution was to convert some increments and assignments to atomic operations.
No, that completely misses the point.
The race happens here: drbg->parent->reseed_counter
can be incremented after get_entropy returns.
So the reseed_counter no longer tells you
which generation was used above.
mspncp
Sep 1, 2020
Author
Contributor
I don't care about race conditions between different threads. All I need is that a call to RAND_add() triggers an immediate reseed on a subsequent RAND_bytes() call within the same thread.
mspncp
Sep 1, 2020
Author
Contributor
And this is what my solution achieves, doesn't it?
bernd-edlinger
Sep 1, 2020
Member
Okay, if you don't care about other threads using outdated seed material, then that is fine.
mspncp
Sep 1, 2020
Author
Contributor
Assume a multithreaded application had called RAND_add() in one of several threads in OpenSSL <= 1.1.0. Then, unless the application provided its own synchronization, there wouldn't have been any guarantee either whether another thread would pick up the fresh entropy or not.
mspncp
Sep 1, 2020
•
Author
Contributor
No, I don't think we talk past each other: I understand that the RAND_bytes() call of thread 2 can fail to pick up the entropy provided by thread 1 if it is executed between the reseeding (holding the lock) and the incrementing of the counter (using only atomics). But my point is that thread 2 can also fail to pick up the entropy of thread 1 in RAND_bytes(), if thread 1 is paused in RAND_add() immediately before the reseeding.
Because calling RAND_bytes() and RAND_seed() concurrently from two different threads is inherently a race condition, no matter how you slice it (pun intended). In order to ensure that the entropy of thread 1 is picked up by thread 2 before the latter does something important, you need extra synchronization by the application.
mspncp
Sep 1, 2020
•
Author
Contributor
I also understand that in your case the failure is more permanent, because the secondary DRBG will think it's "up to date". But who cares: it is properly seeded and will reseed automatically when its generate counter reaches the reseed interval.
bernd-edlinger
Sep 1, 2020
Member
Yes, but I said: when thread 2 continues to call RAND_bytes(), it will not call get_entropy again, since the
reseed_counters agree but the entropy was from a previous seeding.
The reseed interval may be hours...
mspncp
Sep 1, 2020
•
Author
Contributor
With due respect, I think now you are missing the point. Your setup is a little academic and does not represent a real-life use case: a CTR-DRBG which has been seeded with 256 bits of entropy is considered cryptographically secure for up to 2^48 generate requests before a reseeding is mandated by NIST SP 800-90Ar1. In libcrypto, the threshold for a public DRBG before it reseeds from the primary DRBG is way below that limit, namely 2^16 generate requests or 7 minutes.
So even if the thread missed the entropy shower poured by another unrelated thread, that lapse won't last longer than 7 minutes. If you consider that too much, you can lower the threshold as you like.
The main purpose of regular reseeding is to have a countermeasure against the case where the internal state of the DRBG gets compromised (e.g. by stealing it or by some side-channel attack), because in that case the entire future output of the DRBG becomes fully predictable (but not the past output). To mitigate this problem, the internal state needs to be refreshed periodically. For normal usage of RAND_bytes(), the above thresholds are fully sufficient, and it is irrelevant whether the public DRBG misses some entropy shower of an unrelated thread or not.
Conversely, if you are generating a top-secret master key for the next thirty years and you feed extra entropy into the CSPRNG using RAND_add() to improve prediction resistance (the DRBG has better ways to do that), then your entropy dose had better take immediate effect on the subsequent RAND_bytes() call. It's not enough to succeed on the second attempt, because there won't be a second try. If you fail the first attempt, you lose.
OpenSSL 1.1.1 guarantees that the call sequence RAND_add(); RAND_bytes(); continues to work as it did in older versions, but the guarantee holds only within the same thread: for the previous reasons, and because this can be implemented in a simple lock-free fashion. And I believe that it is important for the performance of a multithreaded web application to avoid locking the primary DRBG, otherwise it might become a bottleneck.
paulidale
Sep 8, 2020
Contributor
An aside: RAND_bytes() locked in 1.0.2, and thus every request would be reseeded after RAND_add(), such as the reseeding was.
@bernd-edlinger I don't think this change in semantics is a major issue because of the automatic reseeding @mspncp mentioned. Still, it was worthwhile raising it.
|
(Note: I unresolved the two conversations again because I don't want an essential part of the discussion to be hidden from the reader by GitHub) |
|
LGTM. I thought my head would hurt looking at this, and it does. |
crypto/rand/drbg_lib.c
Outdated
Show resolved
Hide resolved
if (drbg->parent == NULL)
    tsan_counter(&drbg->reseed_counter);
else
    tsan_store(&drbg->reseed_counter,
paulidale
Sep 8, 2020
Contributor
I don't understand why tsan_store is required since only this thread accesses this counter. Still, it's harmless to have it here.
mspncp
Sep 9, 2020
Author
Contributor
I also pondered whether it is necessary, but then I left it because I had the case in mind where the child DRBG itself is accessed concurrently by "grandchildren" DRBGs.
This does not occur in our standard DRBG triple setup, but might be a setup in some application. (The new EVP_RAND does support chaining by the application, doesn't it?) Of course the application needs to guard concurrent access to the child DRBG (in the middle of the chain) by a lock, but that does not guarantee an atomic write operation of the reseed_counter, if the grandchildren use only atomics with relaxed semantics and not a read lock (of an rwlock). At least that's how I understood @bernd-edlinger's explanations.
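The chained setup under discussion can be pictured with a small sketch: a DRBG in the middle of a chain is both a child (of the primary) and a parent (of "grandchildren"). Because the grandchildren read the middle DRBG's counter with a relaxed atomic load, the middle DRBG must also write its own copy atomically rather than with a plain assignment. This is a toy model with illustrative names, not the actual OpenSSL structures or the real tsan_assist macros.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* toy model: each node in the chain carries its own reseed counter */
struct toy_chain_drbg {
    struct toy_chain_drbg *parent;      /* NULL for the primary */
    _Atomic unsigned int reseed_counter;
};

/* A middle DRBG syncs its counter from its parent. The store must be
 * atomic because grandchildren may be reading it concurrently. */
static void toy_sync_from_parent(struct toy_chain_drbg *drbg)
{
    unsigned int p = atomic_load_explicit(&drbg->parent->reseed_counter,
                                          memory_order_relaxed);
    atomic_store_explicit(&drbg->reseed_counter, p, memory_order_relaxed);
}
```

In the standard primary/public/private triple the middle DRBG has no children of its own, which is why the atomic store looks redundant there; the sketch shows the longer chain where it is not.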
paulidale
Sep 9, 2020
Contributor
Good point.
|
This pull request is ready to merge |
This reverts commit 1f98527.
[will be squashed with the previous commit using the following commit message]

In a nutshell, reseed propagation is a compatibility feature with the sole purpose to support the traditional way of (re-)seeding manually by calling 'RAND_add()' before 'RAND_bytes()'. It ensures that the former has an immediate effect on the latter *within the same thread*, but it does not care about immediate reseed propagation to other threads. The implementation is lock-free, i.e., it works without taking the lock of the primary DRBG.

Pull request #7399 not only fixed the data race issue #7394 but also changed the original implementation of the seed propagation unnecessarily. This commit reverts most of the changes of commit 1f98527 and intends to fix the data race while retaining the original simplicity of the seed propagation.

- use atomics with relaxed semantics to load and store the seed counter
- add a new member drbg->enable_reseed_propagation to simplify the overflow treatment of the seed propagation counter
- don't handle races between different threads

This partially reverts commit 1f98527.
|
I had to rebase to resolve two merge conflicts which were caused by whitespace changes:
<<<<<<< variant A
drbg->reseed_next_counter = tsan_load(&drbg->reseed_prop_counter);
if (drbg->reseed_next_counter) {
drbg->reseed_next_counter++;
if (!drbg->reseed_next_counter)
drbg->reseed_next_counter = 1;
}
>>>>>>> variant B
####### Ancestor
drbg->reseed_next_counter = tsan_load(&drbg->reseed_prop_counter);
if (drbg->reseed_next_counter) {
drbg->reseed_next_counter++;
if(!drbg->reseed_next_counter)
drbg->reseed_next_counter = 1;
}
======= end
<<<<<<< variant A
drbg->reseed_next_counter = tsan_load(&drbg->reseed_prop_counter);
if (drbg->reseed_next_counter) {
drbg->reseed_next_counter++;
if (!drbg->reseed_next_counter)
drbg->reseed_next_counter = 1;
}
>>>>>>> variant B
####### Ancestor
drbg->reseed_next_counter = tsan_load(&drbg->reseed_prop_counter);
if (drbg->reseed_next_counter) {
drbg->reseed_next_counter++;
if(!drbg->reseed_next_counter)
drbg->reseed_next_counter = 1;
}
======= end |
|
|
The original names were more intuitive: the generate_counter counts the
number of generate requests, and the reseed_counter counts the number
of reseedings (of the principal DRBG).
reseed_gen_counter -> generate_counter
reseed_prop_counter -> reseed_counter
This partially reverts commit 35a3450.
|
The last force-push replaces the two last commits, because I commuted the fixup with the renaming. These are the tree changes I made. |
|
Sorry, your approval came too quick :-) |
|
SLGTM |
|
I'm okay for the 24 hour wait to be considered done at this point. The subsequent changes are trivial or asked for. |
|
Thanks for the review and the exemption from the grace period. Nevertheless, I'll sleep over it and do the merge tomorrow morning. :-) |
|
I'm so sorry, but I had a last look and found another tiny issue, which is fixed in e0eed1a. |
|
Good catch. |
|
24 hours has passed since 'approval: done' was set, but as this PR has been updated in that time the label 'approval: ready to merge' is not being automatically set. Please review the updates and set the label manually. |
In a nutshell, reseed propagation is a compatibility feature with the sole purpose to support the traditional way of (re-)seeding manually by calling 'RAND_add()' before 'RAND_bytes()'. It ensures that the former has an immediate effect on the latter *within the same thread*, but it does not care about immediate reseed propagation to other threads. The implementation is lock-free, i.e., it works without taking the lock of the primary DRBG.

Pull request #7399 not only fixed the data race issue #7394 but also changed the original implementation of the seed propagation unnecessarily. This commit reverts most of the changes of commit 1f98527 and intends to fix the data race while retaining the original simplicity of the seed propagation.

- use atomics with relaxed semantics to load and store the seed counter
- add a new member drbg->enable_reseed_propagation to simplify the overflow treatment of the seed propagation counter
- don't handle races between different threads

This partially reverts commit 1f98527.

Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from #12759)
The original names were more intuitive: the generate_counter counts the
number of generate requests, and the reseed_counter counts the number
of reseedings (of the principal DRBG).
reseed_gen_counter -> generate_counter
reseed_prop_counter -> reseed_counter
This partially reverts commit 35a3450.
Reviewed-by: Paul Dale <paul.dale@oracle.com>
(Merged from #12759)
Before explaining the issue and the fix, let me recap briefly what reseed propagation is all about. In a nutshell, it is a compatibility feature to support the traditional way of (re-)seeding manually by calling 'RAND_add()' before 'RAND_bytes()'. I'll explain it for the 'public' DRBG; the situation for the 'private' DRBG is analogous.
Reseed Propagation
For performance reasons, every thread has its own 'public' DRBG instance, and all public DRBGs are chained to a common 'primary' DRBG, which seeds from the operating system. The primary DRBG is shared by all threads and guarded by a mutex. Every DRBG reseeds independently when its number of generate requests or its elapsed time interval since the last (re-)seeding exceeds a certain threshold. This is the 'normal' way to reseed according to the NIST standard.
When a thread of the application calls RAND_add() to seed the CSPRNG, it expects an immediate effect on the output of a subsequent RAND_bytes() call within the same thread. However, the former triggers a reseeding of the primary DRBG, whereas the latter generates the output using the thread-local 'public' DRBG. In this specific situation, OpenSSL needs to guarantee an immediate propagation of the reseeding to the 'public' DRBG within the same thread.
The original implementation
Again for performance reasons, the seed propagation was implemented in a non-blocking fashion, i.e. without taking the lock of the primary DRBG: the primary DRBG maintains a reseed_counter (not to be confused with the generate_counter, which counts the generate requests), which is incremented automatically when it reseeds.
openssl/crypto/rand/drbg_lib.c
Lines 612 to 615 in 1708e3e
The secondary DRBG has a copy of the reseed_counter, and whenever the two counters are out of sync, the secondary DRBG will reseed automatically.
openssl/crypto/rand/drbg_lib.c
Lines 439 to 444 in 1708e3e
In other words, the reseed_counter was used like a revision counter to detect and propagate the reseeding. Note that eventually the reseeding will propagate to the public DRBGs of all threads, but it is not guaranteed that it will happen immediately.
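The revision-counter scheme described above can be modeled with a short sketch. All names here (toy_drbg, parent_reseed_counter, ...) are illustrative, not the actual OpenSSL identifiers, and the entropy transfer is elided.

```c
#include <assert.h>
#include <stdatomic.h>

/* primary DRBG's revision counter, shared between all threads */
static _Atomic unsigned int parent_reseed_counter = 1;

/* each thread-local public DRBG keeps a private copy */
struct toy_drbg {
    unsigned int reseed_counter;
};

/* RAND_add() path: reseed the primary and bump its counter
 * (in OpenSSL this happens while holding the primary's lock) */
static void toy_primary_reseed(void)
{
    atomic_fetch_add_explicit(&parent_reseed_counter, 1,
                              memory_order_relaxed);
}

/* RAND_bytes() path: the public DRBG reseeds from the primary iff
 * the counters are out of sync, then copies the parent's value.
 * Returns 1 if a reseed was triggered, 0 otherwise. */
static int toy_public_generate(struct toy_drbg *drbg)
{
    unsigned int parent = atomic_load_explicit(&parent_reseed_counter,
                                               memory_order_relaxed);
    if (drbg->reseed_counter != parent) {
        /* ... pull fresh entropy from the primary here ... */
        drbg->reseed_counter = parent;  /* copy, don't increment */
        return 1;
    }
    return 0;
}
```

Note that no matter how many times the primary reseeded in the meantime, the public DRBG catches up with a single reseed, because the counter is copied rather than incremented.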
The issue
Part 1: the thread sanitizer warning
The original implementation did not use atomics, because I assumed that integer read and write operations were atomic with relaxed semantics, which would be sufficient for synchronization within the same thread. (In fact, I did not know much about memory ordering and acquire/release/relaxed semantics at that time.) This assumption turned out to be naive when @bernd-edlinger reported a thread sanitizer (TSAN) warning about a data race in RAND_DRBG_generate() in issue #7394:
The data race affects the aforementioned operations happening in different threads:
thread 8, frame 0
openssl/crypto/rand/drbg_lib.c
Lines 612 to 615 in 1708e3e
thread 27, frame 0
openssl/crypto/rand/drbg_lib.c
Lines 439 to 444 in 1708e3e
It was considered that the data race might be harmless but nevertheless it would be better to fix the warning (see #7394 (comment) and ff.).
Part 2: another data race
During the analysis of the issue, @bernd-edlinger noticed another potential race
The fix
Bernd's pull request #7399 fixed the data races, but at the price that the original simplicity of the implementation was lost, in particular through the introduction of the new 'reseed_next_counter' member. Unfortunately, https://github.com/openssl/openssl/pull/7399/files#diff-9181ac017a6177a5f2619f65c9b7a346R411 also introduced a breaking change of the original semantics of the seed propagation: instead of copying the value from the parent, the secondary DRBGs now increment their counter by themselves. IIUC, a secondary DRBG which is behind the primary DRBG by three counts will reseed three times until it considers itself in sync with the primary DRBG.
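The semantic difference can be boiled down to two toy functions counting how many catch-up reseeds a secondary DRBG performs before it considers itself in sync. This is an illustrative sketch of the behavioral contrast, not the actual code from either patch.

```c
#include <assert.h>

/* Increment-own-counter semantics (as introduced by the fix under
 * discussion): one reseed per missed count until the counters agree. */
static int catchup_reseeds_increment(unsigned int child, unsigned int parent)
{
    int n = 0;
    while (child != parent) {
        child++;    /* child bumps its own counter: one reseed per step */
        n++;
    }
    return n;
}

/* Copy-from-parent semantics (the original scheme): at most one
 * reseed, regardless of how far behind the child is. */
static int catchup_reseeds_copy(unsigned int child, unsigned int parent)
{
    return child != parent ? 1 : 0;
}
```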
Alternative fix
My alternative proposal intends to fix the data race while retaining the original simplicity.
Note that in the meantime, reseed_counter has been renamed to reseed_prop_counter and generate_counter to reseed_gen_counter (which is another issue to be fixed).

Atomics with relaxed semantics should be sufficient
Since the incrementing of the reseed_counter of the principal DRBG is protected by the lock, my understanding is that it should be sufficient to read the value using an atomic operation with relaxed semantics, see also the comments by @kroeckx and @bernd-edlinger.
#7394 (comment)
#7394 (comment)
#7394 (comment)
See also this discussion between @dot-asm and @bernd-edlinger:
#7399 (comment) and ff.
Simplify the counter wrap
The counter wrap was complicated unnecessarily by the fact that seed_counter == 0 had the special meaning of turning seed propagation off. I removed that extra complexity by adding a separate enable_reseed_propagation member.

Remove the 'reseed_next_counter'
The 'reseed_next_counter' was introduced to fix a race between two different threads. But it is not the intention of the seed propagation to enforce reseed propagation across threads.
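A minimal sketch of the counter-wrap simplification described above: with a dedicated on/off flag, the counter may wrap through any value, including 0, without a special case. Names are illustrative, not the actual struct members or functions.

```c
#include <assert.h>
#include <limits.h>

struct toy_prop_drbg {
    int enable_reseed_propagation;   /* replaces "counter == 0 means off" */
    unsigned int reseed_counter;
};

/* record a reseed for propagation purposes */
static void toy_note_reseed(struct toy_prop_drbg *drbg)
{
    if (drbg->enable_reseed_propagation)
        drbg->reseed_counter++;      /* unsigned wrap through 0 is harmless */
}
```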
Restore the original names
A lot of additional confusion between the two counters generate_counter and reseed_counter has been created by the renaming to reseed_gen_counter and reseed_prop_counter on master, which subsequently has partially been reverted. This turned out to be a footgun during the provider replumbing (I hope you remember that, @paulidale). For that reason, I am restoring the original names here on 1.1.1, and I will do the same on master.

What about the master branch?
On the master branch, the implementation has deviated even more from the simple original idea.
openssl/providers/implementations/rands/drbg_local.h
Lines 181 to 192 in 714a1bb
It seems that 'get_parent_reseed_count()' nowadays takes the parent's lock before reading the reseed count, which completely thwarts the original intent of having a lock-free implementation.
openssl/providers/implementations/rands/drbg.c
Lines 137 to 146 in 714a1bb
I am working on a fix for master which will hopefully be in time for the beta1 freeze. I chose to raise the pull request for 1.1.1 first, because that's the root cause of the problem and it was easier to start with.
I'd be happy to get some feedback not only from @paulidale, but also from @bernd-edlinger and @kroeckx.