New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear method store / query cache confusion #18151
Conversation
Looks like there are fips related problems in the CIs. I assume the intention is to backport this to 3.0 since this is a serious bug there? |
Ah, I forgot to update the FIPS checksums (and also to actually test for myself with FIPS enabled).
Yes. Labels updated. |
So from some FIPS related failures, it looks like there is actually an expectation that a method store gets cleared under some pretty special circumstances, all revolving around OSSL_SELF_TEST |
Making this a draft for some time. It appears that there's one bit that actually needs to be able to drop the methods of a provider:
Although, I'm not quite sure if clearing the method store is necessary or optimal, or from another point of view, if it's enough. |
IMO it makes sense. If after that the failing provider just won't respond to any query operations anymore. Only methods for the particular provider should be cleared though. |
Yeah... I had a chat with @mattcaswell and we figured out together what's actually going on and why. Dropping the methods is the correct response for this... but should be done selectively, i.e. only those associated with the provider for which the self test failed. |
I've added a fixup commit for a few details, and another commit that adds clearly defined functionality to remove all methods that belong to a specified provider. The latter is used to restore |
Tests are passing on my laptop, so I'm making this PR ready to review again |
crypto/provider_core.c
Outdated
@@ -1152,7 +1152,25 @@ static int provider_flush_store_cache(const OSSL_PROVIDER *prov) | |||
CRYPTO_THREAD_unlock(store->lock); | |||
|
|||
if (!freeing) | |||
return evp_method_store_flush(prov->libctx); | |||
return evp_method_store_cache_flush(prov->libctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I found infuriating is that we do this for EVP stuff only, but not OSSL_STORE, OSSL_ENCODER or OSSL_DECODER. I'm not addition the other implementation types in this PR, but to be complete, we really should add them (in a separate effort!), and also probably make a mechanism that helps us avoid this sort of hard coding, as it's just too easy to forget the gazillion places to touch as soon as a new (non_EVP) implementation type comes along
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance the code could be refactored so they shared most of it?
CI is relevant. |
Yikes, memory stuff. I'll deal with that on Monday, I'm pretty finished for today |
Looks ok to me, subject to the memory stuff in CIs being sorted. |
It would be good to know what impact this has on the decoder performance. |
The only measurement I personally have is a huge reduction of method constructions when calling |
0d70990
to
3c098fa
Compare
Only the address_ub_sanitizer job that's still failing. That's some sort of progress at least. |
Of course we forgot the label. I'm counting the 24h grace period from the last approval, though (more than 24hs ago) |
This is a misused function, as it was called during query cache flush, when the provider operation bits were meant to record if methods for a certain operation has already been added to the method store. Fixes #18150 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151)
The existing pre- and post-condition functions are supposed to check if methods have already been created and stored, using provider operation bits. This is supposed to only be done for "permanent" method stores. However, the way the pre-condition was called, it could not know if the set of implementations to be stored is likely to end up in a "permanent" or a temporary store. It needs access to the |no_store| flag returned by the provider's operation query function, because that call was done after the pre-condition was called. This requires a bit of refactoring, primarly of |algorithm_do_this()|, but also of |ossl_method_construct_precondition()|. Fixes #18150 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151)
When evp_method_store_flush() flushed the query cache, it also freed all methods in the EVP method store, through an unfortunate call of ossl_method_store_flush_cache() with an argument saying that all methods should indeed be dropped. To undo some of the confusion, ossl_method_store_flush_cache() is renamed to ossl_method_store_cache_flush_all(), and limited to do only that. Some if the items in the internal ALGORITHM structure are also renamed and commented to clarify what they are for. Fixes #18150 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151)
This adds ossl_method_store_remove_all_provided(), which selectively removes methods from the given store that are provided by the given provider. This also adds the EVP specific evp_method_store_remove_all_provided(), which matches ossl_method_store_remove_all_provided() but can also retrieve the correct store to manipulate for EVP functions. This allows us to modify ossl_provider_self_test() to do the job it's supposed to do, but through clearly defined functions instead of a cache flushing call that previously did more than that. ossl_provider_deactivate() is also modified to remove methods associated with the deactivated provider, and not just clearing the cache. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151)
The `alg_cleanup` didn't properly clear the OPENSSL_SA leaf that it had just freed the contents of. Fortunately, `ossl_sa_ALGORITHM_doall_arg()` allows us to pass the store pointer itself as an extra argument, which allows a modified `alg_cleanup` to complete the job. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151)
In child library contexts, which contain child "clones" of the providers the application has in store, one of these children will always be the provider that creates the child library context; let's call them self-refering child providers. For these self-refering child providers, we don't increment the parent provider reference count, nor do we free the parent provider, as those become self defeating and hinder the teardown and unloading process when the application cleans up. For non self-refering child providers, we must retain this propagation of reference count to the parent, so that aren't torn down too early, i.e. when there's still a "foreign" reference (fetched algorithm). Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151)
evp_method_store_flush() and evp_method_store_remove_all_provided() only cover EVP operations, but not encoders, decoders and store loaders. This adds corresponding methods for those as well. Without this, their method stores are never cleaned up when the corresponding providers are deactivated or otherwise modified. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151)
Merged master: 3.0: |
This is a misused function, as it was called during query cache flush, when the provider operation bits were meant to record if methods for a certain operation has already been added to the method store. Fixes #18150 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151) (cherry picked from commit 20b6d85)
The existing pre- and post-condition functions are supposed to check if methods have already been created and stored, using provider operation bits. This is supposed to only be done for "permanent" method stores. However, the way the pre-condition was called, it could not know if the set of implementations to be stored is likely to end up in a "permanent" or a temporary store. It needs access to the |no_store| flag returned by the provider's operation query function, because that call was done after the pre-condition was called. This requires a bit of refactoring, primarly of |algorithm_do_this()|, but also of |ossl_method_construct_precondition()|. Fixes #18150 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151) (cherry picked from commit 10937d5)
When evp_method_store_flush() flushed the query cache, it also freed all methods in the EVP method store, through an unfortunate call of ossl_method_store_flush_cache() with an argument saying that all methods should indeed be dropped. To undo some of the confusion, ossl_method_store_flush_cache() is renamed to ossl_method_store_cache_flush_all(), and limited to do only that. Some if the items in the internal ALGORITHM structure are also renamed and commented to clarify what they are for. Fixes #18150 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151) (cherry picked from commit 60640d7)
This adds ossl_method_store_remove_all_provided(), which selectively removes methods from the given store that are provided by the given provider. This also adds the EVP specific evp_method_store_remove_all_provided(), which matches ossl_method_store_remove_all_provided() but can also retrieve the correct store to manipulate for EVP functions. This allows us to modify ossl_provider_self_test() to do the job it's supposed to do, but through clearly defined functions instead of a cache flushing call that previously did more than that. ossl_provider_deactivate() is also modified to remove methods associated with the deactivated provider, and not just clearing the cache. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151) (cherry picked from commit 2e4d067)
The `alg_cleanup` didn't properly clear the OPENSSL_SA leaf that it had just freed the contents of. Fortunately, `ossl_sa_ALGORITHM_doall_arg()` allows us to pass the store pointer itself as an extra argument, which allows a modified `alg_cleanup` to complete the job. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151) (cherry picked from commit 03454ba)
In child library contexts, which contain child "clones" of the providers the application has in store, one of these children will always be the provider that creates the child library context; let's call them self-refering child providers. For these self-refering child providers, we don't increment the parent provider reference count, nor do we free the parent provider, as those become self defeating and hinder the teardown and unloading process when the application cleans up. For non self-refering child providers, we must retain this propagation of reference count to the parent, so that aren't torn down too early, i.e. when there's still a "foreign" reference (fetched algorithm). Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151) (cherry picked from commit 4da7663)
evp_method_store_flush() and evp_method_store_remove_all_provided() only cover EVP operations, but not encoders, decoders and store loaders. This adds corresponding methods for those as well. Without this, their method stores are never cleaned up when the corresponding providers are deactivated or otherwise modified. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from #18151) (cherry picked from commit 32e3c07)
Stop emptying the method stores or clearing the corresponding provider
synchronisation records (operation bits) when flushing the query cache.
Rename some internal things to better express what they are for.
This also refactors the core method construction loop so the provider
synchronisation records (operation bits) are correctky tested.
Fixes #18150