Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear method store / query cache confusion #18151

Closed
wants to merge 8 commits into from

Conversation

levitte
Copy link
Member

@levitte levitte commented Apr 22, 2022

Stop emptying the method stores or clearing the corresponding provider
synchronisation records (operation bits) when flushing the query cache.
Rename some internal things to better express what they are for.

This also refactors the core method construction loop so the provider
synchronisation records (operation bits) are correctky tested.

Fixes #18150

@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Apr 22, 2022
@mattcaswell
Copy link
Member

Looks like there are fips related problems in the CIs.

I assume the intention is to backport this to 3.0 since this is a serious bug there?

@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

Ah, I forgot to update the FIPS checksums (and also to actually test for myself with FIPS enabled).

I assume the intention is to backport this to 3.0 since this is a serious bug there?

Yes. Labels updated.

@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

So from some FIPS related failures, it looks like there is actually an expectation that a method store gets cleared under some pretty special circumstances, all revolving around OSSL_SELF_TEST

@levitte levitte marked this pull request as draft April 22, 2022 12:40
@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

Making this a draft for some time. It appears that there's one bit that actually needs to be able to drop the methods of a provider:

OSSL_PROVIDER_self_test() is used to run a provider's self tests on
demand. If the self tests fail then the provider will fail to provide
any further services and algorithms. OSSL_SELF_TEST_set_callback(3) may
be called beforehand in order to display diagnostics for the running
self tests.

Although, I'm not quite sure if clearing the method store is necessary or optimal, or from another point of view, if it's enough.

@t8m t8m added the triaged: bug The issue/pr is/fixes a bug label Apr 22, 2022
@t8m
Copy link
Member

t8m commented Apr 22, 2022

Although, I'm not quite sure if clearing the method store is necessary or optimal, or from another point of view, if it's enough.

IMO it makes sense. If after that the failing provider just won't respond to any query operations anymore. Only methods for the particular provider should be cleared though.

@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

Although, I'm not quite sure if clearing the method store is necessary or optimal, or from another point of view, if it's enough.

IMO it makes sense. If after that the failing provider just won't respond to any query operations anymore. Only methods for the particular provider should be cleared though.

Yeah... I had a chat with @mattcaswell and we figured out together what's actually going on and why. Dropping the methods is the correct response for this... but should be done selectively, i.e. only those associated with the provider for which the self test failed.

@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

I've added a fixup commit for a few details, and another commit that adds clearly defined functionality to remove all methods that belong to a specified provider. The latter is used to restore ossl_provider_self_test()'s intended functionality.

@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

Tests are passing on my laptop, so I'm making this PR ready to review again

@levitte levitte marked this pull request as ready for review April 22, 2022 14:50
@@ -1152,7 +1152,25 @@ static int provider_flush_store_cache(const OSSL_PROVIDER *prov)
CRYPTO_THREAD_unlock(store->lock);

if (!freeing)
return evp_method_store_flush(prov->libctx);
return evp_method_store_cache_flush(prov->libctx);
Copy link
Member Author

@levitte levitte Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I found infuriating is that we do this for EVP stuff only, but not OSSL_STORE, OSSL_ENCODER or OSSL_DECODER. I'm not addition the other implementation types in this PR, but to be complete, we really should add them (in a separate effort!), and also probably make a mechanism that helps us avoid this sort of hard coding, as it's just too easy to forget the gazillion places to touch as soon as a new (non_EVP) implementation type comes along

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance the code could be refactored so they shared most of it?

crypto/provider_core.c Outdated Show resolved Hide resolved
@t8m t8m added approval: review pending This pull request needs review by a committer approval: otc review pending This pull request needs review by an OTC member labels Apr 22, 2022
@t8m
Copy link
Member

t8m commented Apr 22, 2022

CI is relevant.

@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

Yikes, memory stuff. I'll deal with that on Monday, I'm pretty finished for today

@mattcaswell
Copy link
Member

Looks ok to me, subject to the memory stuff in CIs being sorted.

@mattcaswell
Copy link
Member

It would be good to know what impact this has on the decoder performance.

@levitte
Copy link
Member Author

levitte commented Apr 22, 2022

It would be good to know what impact this has on the decoder performance.

The only measurement I personally have is a huge reduction of method constructions when calling OSSL_DECODER_CTX_new_for_pkey() multiple times with the same input, i.e. the methods are only constructed once instead of every time. @paulidale has a bigger thing that I hope he'll try this PR on.

@levitte levitte force-pushed the fix-18150 branch 3 times, most recently from 0d70990 to 3c098fa Compare April 25, 2022 04:52
@levitte
Copy link
Member Author

levitte commented Apr 25, 2022

Only the address_ub_sanitizer job that's still failing. That's some sort of progress at least.
Only having "Indirect leaks" indicates that there's some sort of mutual reference between two allocated objects, but no root object pointing at them... that's most likely some cleanup that doesn't do its job quite right

@levitte
Copy link
Member Author

levitte commented Apr 25, 2022

@hlandau, could the memory issue I encounter here possibly be related to #18139 (fixed in #18141)?

@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels May 5, 2022
@levitte
Copy link
Member Author

levitte commented May 5, 2022

Of course we forgot the label. I'm counting the 24h grace period from the last approval, though (more than 24hs ago)

openssl-machine pushed a commit that referenced this pull request May 5, 2022
This is a misused function, as it was called during query cache flush,
when the provider operation bits were meant to record if methods for a
certain operation has already been added to the method store.

Fixes #18150

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
The existing pre- and post-condition functions are supposed to check if
methods have already been created and stored, using provider operation
bits.  This is supposed to only be done for "permanent" method stores.

However, the way the pre-condition was called, it could not know if the
set of implementations to be stored is likely to end up in a "permanent"
or a temporary store.  It needs access to the |no_store| flag returned
by the provider's operation query function, because that call was done
after the pre-condition was called.

This requires a bit of refactoring, primarly of |algorithm_do_this()|,
but also of |ossl_method_construct_precondition()|.

Fixes #18150

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
When evp_method_store_flush() flushed the query cache, it also freed
all methods in the EVP method store, through an unfortunate call of
ossl_method_store_flush_cache() with an argument saying that all
methods should indeed be dropped.

To undo some of the confusion, ossl_method_store_flush_cache() is
renamed to ossl_method_store_cache_flush_all(), and limited to do
only that.  Some if the items in the internal ALGORITHM structure are
also renamed and commented to clarify what they are for.

Fixes #18150

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
This adds ossl_method_store_remove_all_provided(), which selectively
removes methods from the given store that are provided by the given
provider.

This also adds the EVP specific evp_method_store_remove_all_provided(),
which matches ossl_method_store_remove_all_provided() but can also
retrieve the correct store to manipulate for EVP functions.

This allows us to modify ossl_provider_self_test() to do the job it's
supposed to do, but through clearly defined functions instead of a
cache flushing call that previously did more than that.

ossl_provider_deactivate() is also modified to remove methods associated
with the deactivated provider, and not just clearing the cache.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
The `alg_cleanup` didn't properly clear the OPENSSL_SA leaf that it
had just freed the contents of.  Fortunately, `ossl_sa_ALGORITHM_doall_arg()`
allows us to pass the store pointer itself as an extra argument, which
allows a modified `alg_cleanup` to complete the job.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
In child library contexts, which contain child "clones" of the
providers the application has in store, one of these children will
always be the provider that creates the child library context; let's
call them self-refering child providers.

For these self-refering child providers, we don't increment the parent
provider reference count, nor do we free the parent provider, as those
become self defeating and hinder the teardown and unloading process
when the application cleans up.

For non self-refering child providers, we must retain this propagation
of reference count to the parent, so that aren't torn down too early,
i.e. when there's still a "foreign" reference (fetched algorithm).

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
evp_method_store_flush() and evp_method_store_remove_all_provided()
only cover EVP operations, but not encoders, decoders and store loaders.
This adds corresponding methods for those as well.  Without this, their
method stores are never cleaned up when the corresponding providers are
deactivated or otherwise modified.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)
@levitte
Copy link
Member Author

levitte commented May 5, 2022

Merged

master:
20b6d85 Drop ossl_provider_clear_all_operation_bits() and all uses of it
10937d5 Refactor method construction pre- and post-condition
60640d7 Don't empty the method store when flushing the query cache
2e4d067 Make it possible to remove methods by the provider that provides them
03454ba Complete the cleanup of an algorithm in OSSL_METHOD_STORE
4da7663 For child libctx / provider, don't count self-references in parent
32e3c07 Add method store cache flush and method removal to non-EVP operations

3.0:
7496913 Drop ossl_provider_clear_all_operation_bits() and all uses of it
27c4ac7 Refactor method construction pre- and post-condition
8b76db9 Don't empty the method store when flushing the query cache
215708c Make it possible to remove methods by the provider that provides them
106f5c2 Complete the cleanup of an algorithm in OSSL_METHOD_STORE
a860a58 For child libctx / provider, don't count self-references in parent
37a6e9e Add method store cache flush and method removal to non-EVP operations

@levitte levitte closed this May 5, 2022
openssl-machine pushed a commit that referenced this pull request May 5, 2022
This is a misused function, as it was called during query cache flush,
when the provider operation bits were meant to record if methods for a
certain operation has already been added to the method store.

Fixes #18150

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)

(cherry picked from commit 20b6d85)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
The existing pre- and post-condition functions are supposed to check if
methods have already been created and stored, using provider operation
bits.  This is supposed to only be done for "permanent" method stores.

However, the way the pre-condition was called, it could not know if the
set of implementations to be stored is likely to end up in a "permanent"
or a temporary store.  It needs access to the |no_store| flag returned
by the provider's operation query function, because that call was done
after the pre-condition was called.

This requires a bit of refactoring, primarly of |algorithm_do_this()|,
but also of |ossl_method_construct_precondition()|.

Fixes #18150

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)

(cherry picked from commit 10937d5)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
When evp_method_store_flush() flushed the query cache, it also freed
all methods in the EVP method store, through an unfortunate call of
ossl_method_store_flush_cache() with an argument saying that all
methods should indeed be dropped.

To undo some of the confusion, ossl_method_store_flush_cache() is
renamed to ossl_method_store_cache_flush_all(), and limited to do
only that.  Some if the items in the internal ALGORITHM structure are
also renamed and commented to clarify what they are for.

Fixes #18150

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)

(cherry picked from commit 60640d7)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
This adds ossl_method_store_remove_all_provided(), which selectively
removes methods from the given store that are provided by the given
provider.

This also adds the EVP specific evp_method_store_remove_all_provided(),
which matches ossl_method_store_remove_all_provided() but can also
retrieve the correct store to manipulate for EVP functions.

This allows us to modify ossl_provider_self_test() to do the job it's
supposed to do, but through clearly defined functions instead of a
cache flushing call that previously did more than that.

ossl_provider_deactivate() is also modified to remove methods associated
with the deactivated provider, and not just clearing the cache.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)

(cherry picked from commit 2e4d067)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
The `alg_cleanup` didn't properly clear the OPENSSL_SA leaf that it
had just freed the contents of.  Fortunately, `ossl_sa_ALGORITHM_doall_arg()`
allows us to pass the store pointer itself as an extra argument, which
allows a modified `alg_cleanup` to complete the job.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)

(cherry picked from commit 03454ba)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
In child library contexts, which contain child "clones" of the
providers the application has in store, one of these children will
always be the provider that creates the child library context; let's
call them self-refering child providers.

For these self-refering child providers, we don't increment the parent
provider reference count, nor do we free the parent provider, as those
become self defeating and hinder the teardown and unloading process
when the application cleans up.

For non self-refering child providers, we must retain this propagation
of reference count to the parent, so that aren't torn down too early,
i.e. when there's still a "foreign" reference (fetched algorithm).

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)

(cherry picked from commit 4da7663)
openssl-machine pushed a commit that referenced this pull request May 5, 2022
evp_method_store_flush() and evp_method_store_remove_all_provided()
only cover EVP operations, but not encoders, decoders and store loaders.
This adds corresponding methods for those as well.  Without this, their
method stores are never cleaned up when the corresponding providers are
deactivated or otherwise modified.

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from #18151)

(cherry picked from commit 32e3c07)
@levitte levitte deleted the fix-18150 branch May 5, 2022 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: done This pull request has the required number of approvals branch: master Merge to master branch branch: 3.0 Merge to openssl-3.0 branch severity: fips change The pull request changes FIPS provider sources triaged: bug The issue/pr is/fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Method store and query cache confusion in our code
4 participants