Fix data race in RAND_DRBG_generate #7399
Conversation
These memory barrier discussions always make my head explode, and I ask myself why compilers and CPUs are allowed to push out-of-order execution beyond all limits, such that an average programmer can no longer write correct code without a deep understanding of their computer's hardware. (As a mathematician, I prefer functional programming, because it is thread-safe without assumptions on the underlying hardware: no mutations, no race conditions ;-) ). Anyhow, I'd really like to understand what you are doing here, but I will need some more time to dig deeper.
This is not true. You don't need to understand the hardware, only the compiler. It's sufficient to recognize that barriers are about what the compiler can do with loads and stores around them. That's it. The hardware is supposed to comply, no questions asked, out-of-order or not. For example, a release barrier means that all preceding stores are complete. Imagine you initialize a structure, and you want to ensure that it's completely initialized past point X. This point is a release barrier. Note that it doesn't provide any synchronization in the sense that another thread can't "wait" on a release barrier. The only guarantee another thread gets is that if it somehow "knows" the barrier was passed, then the structure is "stable". And naturally, an acquire barrier means that no following loads are executed prior to the barrier. That is all there is to it.
Well, it’s easy to understand the definition of memory barriers, but it’s hard to understand all implications of the presence (and more importantly, the absence) of memory barriers in multithreaded applications. If it were easy, we wouldn’t be discussing it all the time...
crypto/rand/drbg_lib.c
@@ -689,7 +679,8 @@ int RAND_DRBG_generate(RAND_DRBG *drbg, unsigned char *out, size_t outlen,
             reseed_required = 1;
     }
     if (drbg->reseed_prop_counter > 0 && drbg->parent != NULL) {
-        if (drbg->reseed_prop_counter != drbg->parent->reseed_prop_counter)
+        if (drbg->reseed_prop_counter
+            != tsan_load(&drbg->parent->reseed_prop_counter))
Even when it's not needed, I would like to see all references to reseed_prop_counter annotated with one of the macros, just to make sure we actually think about what we expect from it.
Okay, makes sense. Maybe the child drbg could be a parent to another drbg.
There are also references in the test suite, but let's not touch them.
One thing is important here: tsan_store/tsan_load as it stands does not have a formal atomicity guarantee. That means that if one CPU does a store, another CPU's load may observe an inconsistent value. Therefore my patch only uses the return value of tsan_load in a != expression. However, when the parent is locked in the reseed code path, we see a consistent value. That's the trick.
@@ -230,7 +231,8 @@ struct rand_drbg_st {
      * is added by RAND_add() or RAND_seed() will have an immediate effect on
      * the output of RAND_bytes() resp. RAND_priv_bytes().
      */
-    unsigned int reseed_prop_counter;
+    TSAN_QUALIFIER unsigned int reseed_prop_counter;
I'll note that tsan_assist.h has this comment.
* Special note about TSAN_QUALIFIER. It might be undesired to use it in
* a shared header. Because whether operation on specific variable or member
* is atomic or not might be irrelevant in other modules. In such case one
* can use TSAN_QUALIFIER in cast specifically when it has to count.
Disclaimer: what follows is based on what is being said rather than on in-depth code analysis.
Things don't add up here. If one doesn't make the assumption that loads/stores are single-instruction operations, then locking guarantees nothing if another thread doesn't do locking as well. And from what I hear, it doesn't. I mean, considering T1: BTW, here is a related question. Assuming that loads/stores are single-instruction [on naturally aligned data], and T2 doesn't do locking. Would
I may be wrong, but the way I read the code, this happens: the parent is updating the counter with both a locked mutex and tsan_load/store. (*): This access may fail and return a wrong value, but the error is unlikely and not persistent.
Regarding the barriers, yes, they are necessary.
The point I'm trying to make is the following. If one doesn't make the assumption that [integer] loads/stores are single-instruction operations, then the solution would have to be entirely different. Elsewhere I made a remark that tsan_ld_acq's and tsan_st_rel's occurrences have to be #if-conditional with locked fall-back code paths. And if one doesn't make the abovementioned assumption, then the occurrences of all tsan macros would have to be made #if-conditional with fall-back code paths. Now, I'm not suggesting not to make the assumption. On the contrary, I'm saying it's already effectively made, and it's safe to make it on all supported platforms. And as for unsupported platforms on which the assumption would not be safe to make, it would be appropriate to just declare incompatibility with multi-threading (and add threads to disable in the platform definition). Why? Because not making the assumption will result in fall-back code that will never be exercised, never ironed out in real life. Not to mention that it's likely to turn out to be wasted effort. I mean, there is no such platform among the supported ones, so we are talking about a possibility, and it might just as well never materialize.
On quick second thought, it might be possible to arrange a dedicated platform-specific #if section in tsan_assist.h, so that one doesn't have to put it as "go away". To recap: the dispute started with the observation that the last-resort tsan macros are plain references such as #define tsan_load(p) (*(p)), and the assertion that this doesn't fly in a "spherical vacuum". It's a formally correct assertion, but the suggestion is to deal with it when/if we get our hands on a "spherical vacuum".
@@ -178,6 +178,8 @@ size_t rand_drbg_get_entropy(RAND_DRBG *drbg,
                                    prediction_resistance,
                                    NULL, 0) != 0)
             bytes = bytes_needed;
+        drbg->reseed_next_counter
+            = tsan_load(&drbg->parent->reseed_prop_counter);
This is the default callback. I'm not sure if we should be doing this here. Do we support custom callbacks where they still use the parent?
I wouldn't know how to implement custom callback functions, since the RAND_DRBG structure is only defined in the local include crypto/rand/rand_lcl.h and no access functions exist.
My point is rather that this now uses something in the rand_lcl.h file while it didn't before, so another callback cannot do this.
Well, I think when a non-default callback is used it will not work with the case parent != NULL,
but it might work with the case parent == NULL.
I wonder if the parent should be checked to be == NULL in RAND_DRBG_set_callbacks.
Okay, I moved the case where parent == NULL to the caller, and check that parent == NULL in RAND_DRBG_set_callbacks.
crypto/rand/rand_lib.c
        drbg->reseed_next_counter++;
        if (!drbg->reseed_next_counter)
            drbg->reseed_next_counter = 1;
    }
I think I can move this block to the caller; only the case parent != NULL needs to stay. And deny changing get_entropy when parent != NULL.
You really lost me. The solution gets more complicated with every change. And prohibiting RAND_DRBG_set_callbacks() for child DRBGs goes down the wrong path IMO. Maybe we should try a restart and go back to the original problem, because the original task was relatively simple:

Assume that a specific thread issues a RAND_add() call (which reseeds the master DRBG). Then this reseeding

- must have an immediate effect on a subsequent RAND_bytes() call (serviced by the per-thread public DRBG) of the same thread. An analogue must hold for RAND_priv_bytes().
- Calls to RAND_bytes() resp. RAND_priv_bytes() by other threads should ideally notice the reseeding of the master DRBG "soon", but it does not really matter whether they do it right away or on the next call.
It might be the case that I messed the solution up because I did not pay enough attention to technical issues like memory barriers, but I refuse to accept that such a simple problem requires such a complicated solution. :-/
@@ -761,7 +769,8 @@ int RAND_DRBG_set_callbacks(RAND_DRBG *drbg,
                             RAND_DRBG_get_nonce_fn get_nonce,
                             RAND_DRBG_cleanup_nonce_fn cleanup_nonce)
 {
-    if (drbg->state != DRBG_UNINITIALISED)
+    if (drbg->state != DRBG_UNINITIALISED
+        || drbg->parent != NULL)
Prohibiting RAND_DRBG_set_callbacks() for child DRBGs is a severe restriction of the API and goes down the wrong path IMO :-/
Sorry, but I do not see how the callback can do the magic that rand_drbg_get_entropy does when drbg->master != NULL. How is it supposed to do rand_drbg_lock(drbg->parent)?
Oh, you are right! We did so many changes to the DRBG code in the last year that I wasn't quite aware anymore that the chaining was kept entirely private. And thinking twice about all these problems, we will probably have an easier life if we never make this feature public. In other words: just forget my previous comment. :-)
The original problem is that there are race conditions that make the child DRBG However it looks like I made the mistake of starting a code review on the data flow of
I guess this flow of information from child to parent and back is the source of all problems, in particular because the parent is accessed concurrently. Actually, if a thread calls RAND_add(), then it would be enough if that thread set a 'reseed_required' flag in both its thread-local public DRBG and the master DRBG. Then a subsequent call of RAND_bytes() would have the desired effect of triggering a reseeding of both DRBGs. If the master was already reseeded in the meantime by a concurrent thread, that would not matter. Would this approach make life easier?
The use case I am concerned about is: thread 1 calls RAND_add with lousy entropy. Then thread 2 calls RAND_bytes and picks up some entropy from the parent ... gets interrupted ... and now thread 3 calls RAND_add with good entropy and increments the reseed_counter from 2 to 3. Thread 2 wakes up and grabs reseed counter 3. The best solution would be to have get_entropy return the reseed_counter together with the entropy. Second best is what I am trying with this PR.
By the way, how is this going to work with a custom get_entropy?
If I see that right, then RAND_seed is going to fail, right?
If the entropy is "lousy", i.e., if it is below the security strength, then starting with pr #7456 the random input will be treated as additional_data only (i.e., with entropy=0) and the DRBG does a full blown reseed getting all its entropy from the trusted source, see here.
Ok, meanwhile master has been fully reseeded twice.
As I said, thread 3 does not care about whether or when thread 2 reseeds; it just cares whether it can itself call RAND_bytes() immediately afterwards and whether the entropy it just pushed in will be mixed into the random data it gets returned. Also, it is wrong that thread 2 "never has a reason to reseed", because even if thread 2 doesn't notice the master's reseeding, it will automatically reseed when its reseed_interval or reseed_time_interval has elapsed. And when that happens, it will obtain random data from the master, which has been fully reseeded twice in the meantime. Of course it would be nice if thread 2 reseeded right after thread 3's RAND_add() call, but if it doesn't happen, nobody cares.
You are right. It is going to fail. It just won't work, and it is not intended to work. PERIOD. This code is a workaround; it is a compromise between the totally diametrically opposed reseeding philosophies of the old RAND API and the new RAND_DRBG API. It is there to keep existing applications working, by bringing both worlds together.
All this RAND_add() and RAND_seed() stuff is not part of the NIST DRBG design, but it is the way existing programs reseed. And the way they do it must continue working, because we promised not to break them. Newer applications should make a clear decision whether they want to stick with the old reseeding pattern or use the new one. But don't mix both, otherwise they are asking for trouble. Sorry in advance if the first sentence sounded rude, but it feels to me like I have explained this stuff a thousand times (not to you) since I started working on the DRBG over a year ago. Meanwhile it really exhausts me. If you really want to know all the painful details, you might want to start by reading #4328 and/or NIST SP 800-90Ar1 from beginning to end.
After a night of sleep and with less emotion, my answer is: yes, you are completely right. The way it is currently implemented, RAND_seed() will stop working if somebody implements their own callback. If we want to support using both reseeding mechanisms at the same time, then this part (the picking up of the added entropy) needs to be moved out of the callback and into the internal DRBG implementation at some time in the near future. I'll give it a thought.

My personal opinion is that the application's seeding attempts would become completely irrelevant if the random generator would just work and reseed automatically. No application programmer would miss the burden of having to seed their RNG properly. In 1.0.2, application programmers had to provide their own locking callbacks, otherwise OpenSSL would gloriously fail in multithreaded programs. Now since 1.1.0, OpenSSL does its locking completely by itself and the callbacks are gone. Did anybody complain about not being able to provide their own locking mechanism anymore? So why don't we just deprecate RAND_seed() and make it a no-op in future versions?

It is no question that people should retain control over the choice of entropy source, but they should be enabled to do it simply by editing a config file. Also, it should be possible to add support for new RNG hardware without patching OpenSSL or recompiling the applications, simply by writing some sort of engine for the new hardware RNG, which can then be selected in the config file. If this were implemented, who would still care about RAND_seed()? (Any application which really needs more control over the reseeding can use the locking callbacks, which are more powerful.) My favorite solution is that we deprecate RAND_seed()
@kroeckx @paulidale, I'd be happy to hear your independent opinion about what I wrote in my previous post.
Merged to master as a83dc59.
1.1.1 PR here: #7502
I think it's the same reason why my cherry-pick failed, see #7456 (comment). It was caused by the renaming of I am thinking of backporting the renaming (only), otherwise we will face this problem more often.
It seems there was more than one variable renamed:
I reopened this one, because the cherry-pick has been placed on hold until the renaming has been backported. (See discussion in #7502)
In commit 8bf3665 some renamings and typo fixes were made while adding back the DRBG-HMAC and DRBG-HASH implementation. Since the commit could not be backported, a lot of unnecessary differences between master and 1.1.1 were introduced. These differences result in tiresome merge conflicts when cherry-picking. To minimize these merge conflicts, this patch ports all 'non-feature' changes of commit 8bf3665 (e.g., renamings of private variables, fixes of typographical errors, comment changes) manually back to 1.1.1. The commits a83dc59 (openssl#7399) and 8817215 (openssl#7456) failed to cherry-pick previously to 1.1.1; with this patch they both cherry-pick without conflicts.
Fix is in #7505. If you don't mind, I'll take care of the cherry-picking as soon as the fix is merged to 1.1.1.
In commit 8bf3665 some renamings and typo fixes were made while adding back the DRBG-HMAC and DRBG-HASH implementation. Since the commit could not be backported, a lot of unnecessary differences between master and 1.1.1 were introduced. These differences result in tiresome merge conflicts when cherry-picking. To minimize these merge conflicts, this patch ports all 'non-feature' changes of commit 8bf3665 (e.g., renamings of private variables, fixes of typographical errors, comment changes) manually back to 1.1.1. The commits a83dc59 (#7399) and 8817215 (#7456) failed to cherry-pick previously to 1.1.1; with this patch they both cherry-pick without conflicts. Reviewed-by: Richard Levitte <levitte@openssl.org> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> (Merged from #7505)
Finally merged to 1.1.1, thanks!
[will be squashed with the previous commit using the following commit message]

In a nutshell, reseed propagation is a compatibility feature with the sole purpose to support the traditional way of (re-)seeding manually by calling 'RAND_add()' before 'RAND_bytes()'. It ensures that the former has an immediate effect on the latter *within the same thread*, but it does not care about immediate reseed propagation to other threads. The implementation is lock-free, i.e., it works without taking the lock of the primary DRBG.

Pull request openssl#7399 not only fixed the data race issue openssl#7394 but also changed the original implementation of the seed propagation unnecessarily. This commit reverts most of the changes of commit 1f98527 and intends to fix the data race while retaining the original simplicity of the seed propagation.

- use atomics with relaxed semantics to load and store the seed counter
- add a new member drbg->enable_reseed_propagation to simplify the overflow treatment of the seed propagation counter
- don't handle races between different threads

This partially reverts commit 1f98527.
In a nutshell, reseed propagation is a compatibility feature with the sole purpose to support the traditional way of (re-)seeding manually by calling 'RAND_add()' before 'RAND_bytes()'. It ensures that the former has an immediate effect on the latter *within the same thread*, but it does not care about immediate reseed propagation to other threads. The implementation is lock-free, i.e., it works without taking the lock of the primary DRBG.

Pull request #7399 not only fixed the data race issue #7394 but also changed the original implementation of the seed propagation unnecessarily. This commit reverts most of the changes of commit 1f98527 and intends to fix the data race while retaining the original simplicity of the seed propagation.

- use atomics with relaxed semantics to load and store the seed counter
- add a new member drbg->enable_reseed_propagation to simplify the overflow treatment of the seed propagation counter
- don't handle races between different threads

This partially reverts commit 1f98527. Reviewed-by: Paul Dale <paul.dale@oracle.com> (Merged from #12759)
This commit is the port of commit 958fec7 from the 1.1.1 stable branch to master.

In a nutshell, reseed propagation is a compatibility feature with the sole purpose to support the traditional way of (re-)seeding manually by calling 'RAND_add()' before 'RAND_bytes()'. It ensures that the former has an immediate effect on the latter *within the same thread*, but it does not care about immediate reseed propagation to other threads. The implementation is lock-free, i.e., it works without taking the lock of the primary DRBG.

Pull request openssl#7399 not only fixed the data race issue openssl#7394 but also changed the original implementation of the seed propagation unnecessarily. This commit reverts most of the changes of commit 1f98527 and intends to fix the data race while retaining the original simplicity of the seed propagation.

- use atomics with relaxed semantics to load and store the seed counter
- add a new member drbg->enable_reseed_propagation to simplify the overflow treatment of the seed propagation counter
- don't handle races between different threads

This partially reverts commit a83dc59.
Fixes #7394
Okay, this is my idea of how to fix it without relying on relaxed atomic semantics.