Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't get a lock when querying the parent reseed_count (alternative version) #20970

Closed
wants to merge 3 commits into from

Conversation

mattcaswell
Copy link
Member

This is an alternative fix to #20944.

This enables the parent reseed_count to be queried without a lock and only requires changes to our own DRBG implementations. It makes no change to the EVP_RAND layer and nor does it add or amend the libcrypto->provider interface. So, there are no API or behaviour changes required for this.

Previously the EVP layer would call lock and unlock functions on the underlying DRBG implementation to say when a lock should be acquired and released. This gives the DRBG implementation no say as to what kind of lock should obtained (e.g. read/write) or even whether a lock is actually needed or not.

In reality we know whether a DRBG is supposed to be in locking mode or not because the "enable_locking()" function will have been called if locks should be used. Therefore we re-interpret the lock and unlock functions as "hints" from the EVP layer which we ignore. Instead we acquire locks only when we need them. By knowing the context we can obtain either a read or a write lock as appropriate.

This may mean that in some rare cases we acquire the locks more than once for a single EVP call, if the EVP call makes several calls to the underlying DRBG. But in practice almost all EVP calls only make one such call. EVP_RAND_generate() is an example of a call where multiple DRBG calls may be made. One of these gets the "max_request" parameter (which is constant for all of our own DRBGs) and it may make several calls to the DRBG generate call - but only if the requested size is very large which will rarely be the case.

Even if a DRBG has locking enabled on it, there are certain parameters which are still safe to obtain even without a lock. The max_request value is constant for all our DRBGs. The reseed_counter does not matter if we get it wrong - so it is safe to avoid the lock. So if all we are reading are those parameters then we take no lock at all.

Partially fixes #20286

Previously the EVP layer would call lock and unlock functions on the
underlying DRBG implementation to say when a lock should be acquired and
released. This gives the DRBG implementation no say as to what kind of
lock should obtained (e.g. read/write) or even whether a lock is actually
needed or not.

In reality we know whether a DRBG is supposed to be in locking mode or
not because the "enable_locking()" function will have been called if
locks should be used. Therefore we re-interpret the lock and unlock
functions as "hints" from the EVP layer which we ignore. Instead we
acquire locks only when we need them. By knowing the context we can obtain
either a read or a write lock as appropriate.

This may mean that in some rare cases we acquire the locks more than once
for a single EVP call, if the EVP call makes several calls to the underlying
DRBG. But in practice almost all EVP calls only make one such call.
EVP_RAND_generate() is an example of a call where multiple DRBG calls may
be made. One of these gets the "max_request" parameter (which is constant
for all of our own DRBGs) and it may make several calls to the DRBG generate
call - but only if the requested size is very large which will rarely be
the case.

Partially fixes openssl#20286
Even if a DRBG has locking enabled on it, there are certain parameters
which are still safe to obtain even without a lock. The max_request
value is constant for all our DRBGs. The reseed_counter does not matter
if we get it wrong - so it is safe to avoid the lock. So if all we are
reading are those parameters then we take no lock at all.

Partially fixes openssl#20286
@mattcaswell mattcaswell added branch: master Merge to master branch approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member hold: need otc decision The OTC needs to make a decision branch: 3.0 Merge to openssl-3.0 branch branch: 3.1 Merge to openssl-3.1 labels May 15, 2023
@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label May 15, 2023
@t8m t8m added triaged: refactor The issue/pr requests/implements refactoring tests: exempted The PR is exempt from requirements for testing labels May 15, 2023
@richsalz
Copy link
Contributor

Needs some docs I think.

if (drbg == NULL || drbg->lock == NULL)
return 1;
return CRYPTO_THREAD_write_lock(drbg->lock);
return 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just wrong: the primary DRBG will no longer be protected by a lock and it is accessed by multiple threads. Chaos ensues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the case. The primary DRBG is still locked. It just does not lock in response to the evp lock/unlock calls any more. The DRBG manages its own locking instead.

Copy link
Contributor

@tmshort tmshort May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Should this be incrementing a counter/metric or something, so we can "get a hint" if something needs to happen here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what purpose? It's slightly tricky to do. It would probably have to be implemented as an atomic to give reliable results. But I'm not sure what we then do with that information? It can't really be used for any decision making. An increment of the counter just tells you that a thread somewhere thinks a lock is necessary for the thing it wants to do. You could validate that the number of lock calls equals the number of unlock calls....but you can only really do that when the drbg is finished with. I suppose you could detect that we've had more unlock calls that lock calls immediately. But these are all "should not happen" scenarios which could only really be used for "assert" checking. I'm not sure the extra cost of an atomic is worth it for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, then, perhaps it should just be removed? If it's not actually be used to "hint at" anything, it's kinda pointless to keep it. Doesn't block my approval, though.

@paulidale
Copy link
Contributor

Also adding an OTC hold here: are we willing to accept non-thread safe access to the single primary DRBG instance?

Copy link
Contributor

@paulidale paulidale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to no longer export the lock and unlock RAND calls?
i.e. remove them from the dispatch table.
Callers should be null checking these before calling them.

We should, at the minimum, deprecate them.

PROV_DRBG *drbg = (PROV_DRBG *)vdrbg;
int ret;

if (drbg->lock != NULL && !CRYPTO_THREAD_write_lock(drbg->lock))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth moving the lock/unlock code into drbg.c?
It's the same code & functions for all three DRBGs, so seems like unnecessary duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure it's worth it. It's like one if line for lock and similarly for unlock.

@mattcaswell
Copy link
Member Author

I've added some performance figures for this PR here

@t8m
Copy link
Member

t8m commented May 30, 2023

OTC: master only as there is some risk involved and the patch is invasive.

@t8m t8m removed approval: otc review pending This pull request needs review by an OTC member hold: need otc decision The OTC needs to make a decision branch: 3.0 Merge to openssl-3.0 branch branch: 3.1 Merge to openssl-3.1 labels May 30, 2023
@richsalz
Copy link
Contributor

Can the OTC explain the risk?

@richsalz
Copy link
Contributor

Because I think this needs documentation and understanding the risks are important to document.

@mattcaswell
Copy link
Member Author

This is a fairly significant change in the implementation of the DRBGs and moves some of the ctx param "getter" support outside of lock protection altogether. Locking code is tricky to get right and there's enough code changing that bugs might creep in which is undesirable in a stable release. If we get this wrong then its likely to have broad impacts because it is in the DRBG code - which is used from more or less everywhere. The risk is a lot less in master which will see several rounds of alpha and beta testing before it is released (which stable release patch updates do not go through).

@mattcaswell
Copy link
Member Author

Is it reasonable to no longer export the lock and unlock RAND calls?
i.e. remove them from the dispatch table.
Callers should be null checking these before calling them.

We should, at the minimum, deprecate them.

I don't think we can remove them any time soon. DRBGs may already have been written that rely on this. We're probably stuck with them for now. We could possibly deprecate them with a view to removing them in the future. On the other hand they are harmless. Another PR in any case.

@mattcaswell
Copy link
Member Author

Ping for second review

@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels May 30, 2023
Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions, but I want that one comment reviewed.

As an aside, does OpenSSL have a standard for functions ending in unlocked, locked and no_lock, such that theose suffixes are used consistently? And how to ensure the meaning of unlocked vs. no_lock is enforced?

if (drbg == NULL || drbg->lock == NULL)
return 1;
return CRYPTO_THREAD_write_lock(drbg->lock);
return 1;
Copy link
Contributor

@tmshort tmshort May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Should this be incrementing a counter/metric or something, so we can "get a hint" if something needs to happen here?

}

/* Interpreted as a hint only and ignored as for ossl_drbg_lock() */
Copy link
Contributor

@tmshort tmshort May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: In a similar vein, a counter/metric to track this? Even if just in a debug build?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

@@ -487,13 +486,16 @@ int ossl_prov_drbg_uninstantiate(PROV_DRBG *drbg)
/*
* Reseed |drbg|, mixing in the specified data
*
* Requires that drbg->lock is already locked for write, if non-null.
* Acquires the drbg->lock for writing, if non-null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this updated comment accurate? I'm not seeing a lock acquisition, except in the wrapper to this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. This should be against the wrapper. Fixed!

@@ -212,7 +212,7 @@ OSSL_FUNC_rand_clear_seed_fn ossl_drbg_clear_seed;
\
for (i = 0; i < OSSL_NELEM(v); i++) \
if ((v)[i] != 0) \
return 0; \
goto err; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already part of the code, but changing flow control within a macro makes me shudder. Would this be better implemented as a function doing a memcmp()-like thing (yeah, I know aliasing) with a return value to change the flow control?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I agree. But fixing this starts touching various unrelated bits of code that I'd rather not bring into the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible future change. But yeah, out of context for this PR.

@paulidale
Copy link
Contributor

Is it reasonable to no longer export the lock and unlock RAND calls?
i.e. remove them from the dispatch table.
Callers should be null checking these before calling them.
We should, at the minimum, deprecate them.

I don't think we can remove them any time soon. DRBGs may already have been written that rely on this. We're probably stuck with them for now. We could possibly deprecate them with a view to removing them in the future. On the other hand they are harmless. Another PR in any case.

I think we need some kind of policy here.

We can't get rid of having these APis defined and making an attempt to call them if they are non-NULL. That would be a breaking change and would have to go via the deprecation process.

Callers really should be null checking all provider API function pointers and not calling them if they are NULL. If this is being done, removal of these from the dispatch table is harmless.

@mattcaswell mattcaswell added approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member and removed approval: done This pull request has the required number of approvals labels May 31, 2023
@mattcaswell
Copy link
Member Author

mattcaswell commented May 31, 2023

Fixup pushed. @paulidale / @t8m / @tmshort - please take another look/reconfirm

Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good from my end.

if (drbg == NULL || drbg->lock == NULL)
return 1;
return CRYPTO_THREAD_write_lock(drbg->lock);
return 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, then, perhaps it should just be removed? If it's not actually be used to "hint at" anything, it's kinda pointless to keep it. Doesn't block my approval, though.

@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member labels May 31, 2023
@openssl-machine openssl-machine added approval: ready to merge The 24 hour grace period has passed, ready to merge and removed approval: done This pull request has the required number of approvals labels Jun 1, 2023
@openssl-machine
Copy link
Collaborator

This pull request is ready to merge

@paulidale
Copy link
Contributor

and merged!

@paulidale paulidale closed this Jun 1, 2023
openssl-machine pushed a commit that referenced this pull request Jun 1, 2023
Previously the EVP layer would call lock and unlock functions on the
underlying DRBG implementation to say when a lock should be acquired and
released. This gives the DRBG implementation no say as to what kind of
lock should obtained (e.g. read/write) or even whether a lock is actually
needed or not.

In reality we know whether a DRBG is supposed to be in locking mode or
not because the "enable_locking()" function will have been called if
locks should be used. Therefore we re-interpret the lock and unlock
functions as "hints" from the EVP layer which we ignore. Instead we
acquire locks only when we need them. By knowing the context we can obtain
either a read or a write lock as appropriate.

This may mean that in some rare cases we acquire the locks more than once
for a single EVP call, if the EVP call makes several calls to the underlying
DRBG. But in practice almost all EVP calls only make one such call.
EVP_RAND_generate() is an example of a call where multiple DRBG calls may
be made. One of these gets the "max_request" parameter (which is constant
for all of our own DRBGs) and it may make several calls to the DRBG generate
call - but only if the requested size is very large which will rarely be
the case.

Partially fixes #20286

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Todd Short <todd.short@me.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #20970)
openssl-machine pushed a commit that referenced this pull request Jun 1, 2023
Even if a DRBG has locking enabled on it, there are certain parameters
which are still safe to obtain even without a lock. The max_request
value is constant for all our DRBGs. The reseed_counter does not matter
if we get it wrong - so it is safe to avoid the lock. So if all we are
reading are those parameters then we take no lock at all.

Partially fixes #20286

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Todd Short <todd.short@me.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from #20970)
MrE-Fog pushed a commit to MrE-Fog/opensslxxixx that referenced this pull request Jun 4, 2023
Previously the EVP layer would call lock and unlock functions on the
underlying DRBG implementation to say when a lock should be acquired and
released. This gives the DRBG implementation no say as to what kind of
lock should obtained (e.g. read/write) or even whether a lock is actually
needed or not.

In reality we know whether a DRBG is supposed to be in locking mode or
not because the "enable_locking()" function will have been called if
locks should be used. Therefore we re-interpret the lock and unlock
functions as "hints" from the EVP layer which we ignore. Instead we
acquire locks only when we need them. By knowing the context we can obtain
either a read or a write lock as appropriate.

This may mean that in some rare cases we acquire the locks more than once
for a single EVP call, if the EVP call makes several calls to the underlying
DRBG. But in practice almost all EVP calls only make one such call.
EVP_RAND_generate() is an example of a call where multiple DRBG calls may
be made. One of these gets the "max_request" parameter (which is constant
for all of our own DRBGs) and it may make several calls to the DRBG generate
call - but only if the requested size is very large which will rarely be
the case.

Partially fixes #20286

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Todd Short <todd.short@me.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl/openssl#20970)
MrE-Fog pushed a commit to MrE-Fog/opensslxxixx that referenced this pull request Jun 4, 2023
Even if a DRBG has locking enabled on it, there are certain parameters
which are still safe to obtain even without a lock. The max_request
value is constant for all our DRBGs. The reseed_counter does not matter
if we get it wrong - so it is safe to avoid the lock. So if all we are
reading are those parameters then we take no lock at all.

Partially fixes #20286

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Todd Short <todd.short@me.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from openssl/openssl#20970)
@JerryDevis
Copy link
Contributor

@paulidale @t8m
Will this commit be picked into the 3.1 or 3.0 branch?

@mattcaswell
Copy link
Member Author

OpenSSL Technical Committee (OTC) discussed this and decided that it should be for master only and not backported:

#20970 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: ready to merge The 24 hour grace period has passed, ready to merge branch: master Merge to master branch severity: fips change The pull request changes FIPS provider sources tests: exempted The PR is exempt from requirements for testing triaged: refactor The issue/pr requests/implements refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3.0 performance degraded due to locking
7 participants