New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't get a lock when querying the parent reseed_count (alternative version) #20970
Conversation
Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not. In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate. This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case. Partially fixes openssl#20286
Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all. Partially fixes openssl#20286
Needs some docs I think. |
if (drbg == NULL || drbg->lock == NULL) | ||
return 1; | ||
return CRYPTO_THREAD_write_lock(drbg->lock); | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just wrong: the primary DRBG will no longer be protected by a lock and it is accessed by multiple threads. Chaos ensues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the case. The primary DRBG is still locked. It just does not lock in response to the evp lock/unlock calls any more. The DRBG manages its own locking instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Should this be incrementing a counter/metric or something, so we can "get a hint" if something needs to happen here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what purpose? It's slightly tricky to do. It would probably have to be implemented as an atomic to give reliable results. But I'm not sure what we then do with that information? It can't really be used for any decision making. An increment of the counter just tells you that a thread somewhere thinks a lock is necessary for the thing it wants to do. You could validate that the number of lock calls equals the number of unlock calls....but you can only really do that when the drbg is finished with. I suppose you could detect that we've had more unlock calls that lock calls immediately. But these are all "should not happen" scenarios which could only really be used for "assert" checking. I'm not sure the extra cost of an atomic is worth it for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, then, perhaps it should just be removed? If it's not actually be used to "hint at" anything, it's kinda pointless to keep it. Doesn't block my approval, though.
Also adding an OTC hold here: are we willing to accept non-thread safe access to the single primary DRBG instance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it reasonable to no longer export the lock and unlock RAND calls?
i.e. remove them from the dispatch table.
Callers should be null checking these before calling them.
We should, at the minimum, deprecate them.
PROV_DRBG *drbg = (PROV_DRBG *)vdrbg; | ||
int ret; | ||
|
||
if (drbg->lock != NULL && !CRYPTO_THREAD_write_lock(drbg->lock)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth moving the lock/unlock code into drbg.c?
It's the same code & functions for all three DRBGs, so seems like unnecessary duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure it's worth it. It's like one if line for lock and similarly for unlock.
I've added some performance figures for this PR here |
OTC: master only as there is some risk involved and the patch is invasive. |
Can the OTC explain the risk? |
Because I think this needs documentation and understanding the risks are important to document. |
This is a fairly significant change in the implementation of the DRBGs and moves some of the ctx param "getter" support outside of lock protection altogether. Locking code is tricky to get right and there's enough code changing that bugs might creep in which is undesirable in a stable release. If we get this wrong then its likely to have broad impacts because it is in the DRBG code - which is used from more or less everywhere. The risk is a lot less in master which will see several rounds of alpha and beta testing before it is released (which stable release patch updates do not go through). |
I don't think we can remove them any time soon. DRBGs may already have been written that rely on this. We're probably stuck with them for now. We could possibly deprecate them with a view to removing them in the future. On the other hand they are harmless. Another PR in any case. |
Ping for second review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions, but I want that one comment reviewed.
As an aside, does OpenSSL have a standard for functions ending in unlocked
, locked
and no_lock
, such that theose suffixes are used consistently? And how to ensure the meaning of unlocked
vs. no_lock
is enforced?
if (drbg == NULL || drbg->lock == NULL) | ||
return 1; | ||
return CRYPTO_THREAD_write_lock(drbg->lock); | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Should this be incrementing a counter/metric or something, so we can "get a hint" if something needs to happen here?
} | ||
|
||
/* Interpreted as a hint only and ignored as for ossl_drbg_lock() */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: In a similar vein, a counter/metric to track this? Even if just in a debug build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
@@ -487,13 +486,16 @@ int ossl_prov_drbg_uninstantiate(PROV_DRBG *drbg) | |||
/* | |||
* Reseed |drbg|, mixing in the specified data | |||
* | |||
* Requires that drbg->lock is already locked for write, if non-null. | |||
* Acquires the drbg->lock for writing, if non-null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this updated comment accurate? I'm not seeing a lock acquisition, except in the wrapper to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. This should be against the wrapper. Fixed!
@@ -212,7 +212,7 @@ OSSL_FUNC_rand_clear_seed_fn ossl_drbg_clear_seed; | |||
\ | |||
for (i = 0; i < OSSL_NELEM(v); i++) \ | |||
if ((v)[i] != 0) \ | |||
return 0; \ | |||
goto err; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already part of the code, but changing flow control within a macro makes me shudder. Would this be better implemented as a function doing a memcmp()
-like thing (yeah, I know aliasing) with a return value to change the flow control?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - I agree. But fixing this starts touching various unrelated bits of code that I'd rather not bring into the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible future change. But yeah, out of context for this PR.
I think we need some kind of policy here. We can't get rid of having these APis defined and making an attempt to call them if they are non-NULL. That would be a breaking change and would have to go via the deprecation process. Callers really should be null checking all provider API function pointers and not calling them if they are NULL. If this is being done, removal of these from the dispatch table is harmless. |
040ea69
to
cf1211d
Compare
Fixup pushed. @paulidale / @t8m / @tmshort - please take another look/reconfirm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good from my end.
if (drbg == NULL || drbg->lock == NULL) | ||
return 1; | ||
return CRYPTO_THREAD_write_lock(drbg->lock); | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, then, perhaps it should just be removed? If it's not actually be used to "hint at" anything, it's kinda pointless to keep it. Doesn't block my approval, though.
This pull request is ready to merge |
and merged! |
Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not. In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate. This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20970)
Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20970)
Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not. In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate. This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl/openssl#20970)
Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl/openssl#20970)
@paulidale @t8m |
OpenSSL Technical Committee (OTC) discussed this and decided that it should be for master only and not backported: |
This is an alternative fix to #20944.
This enables the parent reseed_count to be queried without a lock and only requires changes to our own DRBG implementations. It makes no change to the EVP_RAND layer and nor does it add or amend the libcrypto->provider interface. So, there are no API or behaviour changes required for this.
Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not.
In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate.
This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case.
Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all.
Partially fixes #20286