Don't get a lock when querying the parent reseed_count (alternative version) #20970

mattcaswell · 2023-05-15T15:10:09Z

This is an alternative fix to #20944.

This enables the parent reseed_count to be queried without a lock and only requires changes to our own DRBG implementations. It makes no change to the EVP_RAND layer and nor does it add or amend the libcrypto->provider interface. So, there are no API or behaviour changes required for this.

Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not.

In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate.

This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case.

Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all.

Partially fixes #20286

Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not. In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate. This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case. Partially fixes openssl#20286

Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all. Partially fixes openssl#20286

richsalz · 2023-05-15T17:16:35Z

Needs some docs I think.

paulidale · 2023-05-15T21:45:14Z

providers/implementations/rands/drbg.c

-    if (drbg == NULL || drbg->lock == NULL)
-        return 1;
-    return CRYPTO_THREAD_write_lock(drbg->lock);
+    return 1;


This is just wrong: the primary DRBG will no longer be protected by a lock and it is accessed by multiple threads. Chaos ensues.

This is not the case. The primary DRBG is still locked. It just does not lock in response to the evp lock/unlock calls any more. The DRBG manages its own locking instead.

Q: Should this be incrementing a counter/metric or something, so we can "get a hint" if something needs to happen here?

For what purpose? It's slightly tricky to do. It would probably have to be implemented as an atomic to give reliable results. But I'm not sure what we then do with that information? It can't really be used for any decision making. An increment of the counter just tells you that a thread somewhere thinks a lock is necessary for the thing it wants to do. You could validate that the number of lock calls equals the number of unlock calls....but you can only really do that when the drbg is finished with. I suppose you could detect that we've had more unlock calls that lock calls immediately. But these are all "should not happen" scenarios which could only really be used for "assert" checking. I'm not sure the extra cost of an atomic is worth it for that.

So, then, perhaps it should just be removed? If it's not actually be used to "hint at" anything, it's kinda pointless to keep it. Doesn't block my approval, though.

paulidale · 2023-05-15T21:46:10Z

Also adding an OTC hold here: are we willing to accept non-thread safe access to the single primary DRBG instance?

paulidale

Is it reasonable to no longer export the lock and unlock RAND calls?
i.e. remove them from the dispatch table.
Callers should be null checking these before calling them.

We should, at the minimum, deprecate them.

paulidale · 2023-05-16T07:35:20Z

providers/implementations/rands/drbg_ctr.c

+    PROV_DRBG *drbg = (PROV_DRBG *)vdrbg;
+    int ret;
+
+    if (drbg->lock != NULL && !CRYPTO_THREAD_write_lock(drbg->lock))


Is it worth moving the lock/unlock code into drbg.c?
It's the same code & functions for all three DRBGs, so seems like unnecessary duplication.

I'm not really sure it's worth it. It's like one if line for lock and similarly for unlock.

mattcaswell · 2023-05-23T16:09:31Z

I've added some performance figures for this PR here

t8m · 2023-05-30T09:08:55Z

OTC: master only as there is some risk involved and the patch is invasive.

richsalz · 2023-05-30T13:57:12Z

Can the OTC explain the risk?

richsalz · 2023-05-30T13:57:33Z

Because I think this needs documentation and understanding the risks are important to document.

mattcaswell · 2023-05-30T15:50:23Z

This is a fairly significant change in the implementation of the DRBGs and moves some of the ctx param "getter" support outside of lock protection altogether. Locking code is tricky to get right and there's enough code changing that bugs might creep in which is undesirable in a stable release. If we get this wrong then its likely to have broad impacts because it is in the DRBG code - which is used from more or less everywhere. The risk is a lot less in master which will see several rounds of alpha and beta testing before it is released (which stable release patch updates do not go through).

mattcaswell · 2023-05-30T16:55:07Z

Is it reasonable to no longer export the lock and unlock RAND calls?
i.e. remove them from the dispatch table.
Callers should be null checking these before calling them.

We should, at the minimum, deprecate them.

I don't think we can remove them any time soon. DRBGs may already have been written that rely on this. We're probably stuck with them for now. We could possibly deprecate them with a view to removing them in the future. On the other hand they are harmless. Another PR in any case.

mattcaswell · 2023-05-30T16:55:23Z

Ping for second review

tmshort

A few suggestions, but I want that one comment reviewed.

As an aside, does OpenSSL have a standard for functions ending in unlocked, locked and no_lock, such that theose suffixes are used consistently? And how to ensure the meaning of unlocked vs. no_lock is enforced?

tmshort · 2023-05-30T18:38:53Z

providers/implementations/rands/drbg.c

-    if (drbg == NULL || drbg->lock == NULL)
-        return 1;
-    return CRYPTO_THREAD_write_lock(drbg->lock);
+    return 1;


Q: Should this be incrementing a counter/metric or something, so we can "get a hint" if something needs to happen here?

tmshort · 2023-05-30T18:39:47Z

providers/implementations/rands/drbg.c

 }

+/* Interpreted as a hint only and ignored as for ossl_drbg_lock() */


Q: In a similar vein, a counter/metric to track this? Even if just in a debug build?

tmshort · 2023-05-30T18:41:35Z

providers/implementations/rands/drbg.c

@@ -487,13 +486,16 @@ int ossl_prov_drbg_uninstantiate(PROV_DRBG *drbg)
 /*
 * Reseed |drbg|, mixing in the specified data
 *
- * Requires that drbg->lock is already locked for write, if non-null.
+ * Acquires the drbg->lock for writing, if non-null.


Is this updated comment accurate? I'm not seeing a lock acquisition, except in the wrapper to this function.

Oops. This should be against the wrapper. Fixed!

tmshort · 2023-05-30T18:59:21Z

providers/implementations/rands/drbg_local.h

@@ -212,7 +212,7 @@ OSSL_FUNC_rand_clear_seed_fn ossl_drbg_clear_seed;
                                            \
        for (i = 0; i < OSSL_NELEM(v); i++) \
            if ((v)[i] != 0)                \
-                return 0;                   \
+                goto err;                   \


It's already part of the code, but changing flow control within a macro makes me shudder. Would this be better implemented as a function doing a memcmp()-like thing (yeah, I know aliasing) with a return value to change the flow control?

Yeah - I agree. But fixing this starts touching various unrelated bits of code that I'd rather not bring into the scope of this PR.

Possible future change. But yeah, out of context for this PR.

paulidale · 2023-05-30T22:36:32Z

Is it reasonable to no longer export the lock and unlock RAND calls?
i.e. remove them from the dispatch table.
Callers should be null checking these before calling them.
We should, at the minimum, deprecate them.

I don't think we can remove them any time soon. DRBGs may already have been written that rely on this. We're probably stuck with them for now. We could possibly deprecate them with a view to removing them in the future. On the other hand they are harmless. Another PR in any case.

I think we need some kind of policy here.

We can't get rid of having these APis defined and making an attempt to call them if they are non-NULL. That would be a breaking change and would have to go via the deprecation process.

Callers really should be null checking all provider API function pointers and not calling them if they are NULL. If this is being done, removal of these from the dispatch table is harmless.

mattcaswell · 2023-05-31T15:10:34Z

Fixup pushed. @paulidale / @t8m / @tmshort - please take another look/reconfirm

tmshort

Good from my end.

tmshort · 2023-05-31T18:08:00Z

providers/implementations/rands/drbg.c

-    if (drbg == NULL || drbg->lock == NULL)
-        return 1;
-    return CRYPTO_THREAD_write_lock(drbg->lock);
+    return 1;


So, then, perhaps it should just be removed? If it's not actually be used to "hint at" anything, it's kinda pointless to keep it. Doesn't block my approval, though.

openssl-machine · 2023-06-01T21:00:17Z

This pull request is ready to merge

paulidale · 2023-06-01T21:30:18Z

and merged!

Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not. In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate. This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20970)

Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from #20970)

Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not. In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate. This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl/openssl#20970)

Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all. Partially fixes #20286 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Todd Short <todd.short@me.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from openssl/openssl#20970)

JerryDevis · 2023-07-11T11:48:21Z

@paulidale @t8m
Will this commit be picked into the 3.1 or 3.0 branch?

mattcaswell · 2023-07-11T13:07:22Z

OpenSSL Technical Committee (OTC) discussed this and decided that it should be for master only and not backported:

#20970 (comment)

mattcaswell added 2 commits May 15, 2023 16:06

mattcaswell mentioned this pull request May 15, 2023

Don't get a write lock when querying the parent reseed_count or strength #20944

Closed

github-actions bot added the severity: fips change The pull request changes FIPS provider sources label May 15, 2023

mattcaswell mentioned this pull request May 15, 2023

3.0 performance degraded due to locking #20286

Open

t8m added triaged: refactor The issue/pr requests/implements refactoring tests: exempted The PR is exempt from requirements for testing labels May 15, 2023

paulidale reviewed May 15, 2023

View reviewed changes

paulidale approved these changes May 16, 2023

View reviewed changes

mattcaswell mentioned this pull request May 23, 2023

Avoid taking a write lock in RAND_get_rand_method() #20929

Closed

t8m removed approval: otc review pending This pull request needs review by an OTC member hold: need otc decision The OTC needs to make a decision branch: 3.0 Merge to openssl-3.0 branch branch: 3.1 Merge to openssl-3.1 labels May 30, 2023

t8m approved these changes May 30, 2023

View reviewed changes

t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels May 30, 2023

tmshort requested changes May 30, 2023

View reviewed changes

fixup! Refactor the DRBG implementations to manage locking themselves

cf1211d

mattcaswell force-pushed the drbg-locks-alt branch from 040ea69 to cf1211d Compare May 31, 2023 15:02

mattcaswell added approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member and removed approval: done This pull request has the required number of approvals labels May 31, 2023

tmshort approved these changes May 31, 2023

View reviewed changes

t8m approved these changes May 31, 2023

View reviewed changes

t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member labels May 31, 2023

paulidale approved these changes May 31, 2023

View reviewed changes

openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels Jun 1, 2023

paulidale closed this Jun 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't get a lock when querying the parent reseed_count (alternative version) #20970

Don't get a lock when querying the parent reseed_count (alternative version) #20970

mattcaswell commented May 15, 2023

richsalz commented May 15, 2023

paulidale May 15, 2023

mattcaswell May 16, 2023

tmshort May 30, 2023 •

edited

mattcaswell May 31, 2023

tmshort May 31, 2023

paulidale commented May 15, 2023

paulidale left a comment

paulidale May 16, 2023

mattcaswell May 30, 2023

mattcaswell commented May 23, 2023

t8m commented May 30, 2023

richsalz commented May 30, 2023

richsalz commented May 30, 2023

mattcaswell commented May 30, 2023

mattcaswell commented May 30, 2023

mattcaswell commented May 30, 2023

tmshort left a comment

tmshort May 30, 2023 •

edited

tmshort May 30, 2023 •

edited

mattcaswell May 31, 2023

tmshort May 30, 2023

mattcaswell May 31, 2023

tmshort May 30, 2023

mattcaswell May 31, 2023

tmshort May 31, 2023

paulidale commented May 30, 2023

mattcaswell commented May 31, 2023 •

edited

tmshort left a comment

tmshort May 31, 2023

openssl-machine commented Jun 1, 2023

paulidale commented Jun 1, 2023

JerryDevis commented Jul 11, 2023

mattcaswell commented Jul 11, 2023

		}

		/* Interpreted as a hint only and ignored as for ossl_drbg_lock() */

Don't get a lock when querying the parent reseed_count (alternative version) #20970

Don't get a lock when querying the parent reseed_count (alternative version) #20970

Conversation

mattcaswell commented May 15, 2023

richsalz commented May 15, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tmshort May 30, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paulidale commented May 15, 2023

paulidale left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattcaswell commented May 23, 2023

t8m commented May 30, 2023

richsalz commented May 30, 2023

richsalz commented May 30, 2023

mattcaswell commented May 30, 2023

mattcaswell commented May 30, 2023

mattcaswell commented May 30, 2023

tmshort left a comment

Choose a reason for hiding this comment

tmshort May 30, 2023 • edited

Choose a reason for hiding this comment

tmshort May 30, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paulidale commented May 30, 2023

mattcaswell commented May 31, 2023 • edited

tmshort left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openssl-machine commented Jun 1, 2023

paulidale commented Jun 1, 2023

JerryDevis commented Jul 11, 2023

mattcaswell commented Jul 11, 2023

tmshort May 30, 2023 •

edited

tmshort May 30, 2023 •

edited

tmshort May 30, 2023 •

edited

mattcaswell commented May 31, 2023 •

edited