Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure the thread keys are always allocated in the same order #5911

Closed

Conversation

bernd-edlinger
Copy link
Member

Fixes: #5899

This is a possible memory leak and a race condition in base_inited
The crypto base module needs to be initialized always first.

@bernd-edlinger
Copy link
Member Author

This PR is for master only, I will need to back-port this one and #5863 for 1.1.0 as it looks like.

crypto/init.c Outdated

static CRYPTO_ONCE load_crypto_nodelete = CRYPTO_ONCE_STATIC_INIT;
DEFINE_RUN_ONCE_STATIC(ossl_init_load_crypto_nodelete)
{
#if !defined(OPENSSL_NO_DSO) && !defined(OPENSSL_USE_NODELETE)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a funny observation if configured with "./config no-pic" everything is linked statically
but nevertheless this code seems to be executed

@@ -376,7 +376,7 @@ int CRYPTO_memcmp(const void * in_a, const void * in_b, size_t len);
# define OPENSSL_INIT_ENGINE_CAPI 0x00002000L
# define OPENSSL_INIT_ENGINE_PADLOCK 0x00004000L
# define OPENSSL_INIT_ENGINE_AFALG 0x00008000L
# define OPENSSL_INIT_reserved_internal 0x00010000L
# define OPENSSL_INIT_BASE_ONLY 0x00010000L
Copy link
Member

@kroeckx kroeckx Apr 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the documentation needs to be updated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

crypto/err/err.c Outdated
@@ -664,6 +664,9 @@ ERR_STATE *ERR_get_state(void)
{
ERR_STATE *state = NULL;

if (!OPENSSL_init_crypto(OPENSSL_INIT_BASE_ONLY, NULL))
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to get called in drbg_lib.c before all RUN_ONCE()s too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, maybe as another PR.
It would be good for instance, if OPENSSL_thread_stop()
was able to clean up that thread state.
I don't know if that this function does currently with the per thread DRBGs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some lack of internal documentation on how to properly use those thread local keys, and it took a while for me to make sure it all passed without leaking. What I currently remember:
We only want 1 function that sets a destructor for the local key to be called when the thread it stopped. This is because detection of thread stop works differently on Windows, and so we have 1 function that cleans up all thread local data. You need to modify ossl_init_thread_stop and ossl_init_thread_start to keep track that the threads has thread local data or not, and so needs to be cleaned up or not. And you need to call ossl_init_thread_start in your RUN_ONCE funtion.

Anyway, we have thread local data for error, async and rand. My guess is that they all need to call this new function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async does not need to change.
The problem with the error module is that the init function creates a recursion,
So OPENSSL_INIT_BASE_ONLY is only needed here.
The other modules just have to make sure they call OPENSSL_init_crypto
before they allocate their thread local keys.
And they should better use ossl_init_thread_start which rand doesn't do currently.

@kroeckx
Copy link
Member

kroeckx commented Apr 8, 2018

The travis failures seem to be caused by the PR.

engines. This not a default option.

=item OPENSSL_INIT_BASE_ONLY

This option is for internal use.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then does it need to be in a public header? If you move it to an internal header, please make sure that we don't reuse it by adding comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this define really hurts the eye and given it's name one would anyway guess
what it does.

@bernd-edlinger
Copy link
Member Author

The travis failures seem to be caused by the PR.

I dont understand what is going on with the travis.
I dont even see the sources being compiled, and if there are for instance some warnings.
When test starts just every test fails without printing any useful information....

@bernd-edlinger bernd-edlinger force-pushed the fix_thread_init_order branch 4 times, most recently from 6b09860 to 2768e71 Compare April 9, 2018 21:00
@bernd-edlinger
Copy link
Member Author

Hi @richsalz @kroeckx

The build failures were caused by a recursive call from
ossl_init_load_crypto_nodelete => ERR_set_mark => ERR_get_state =>
=> OPENSSL_init_crypto(OPENSSL_INIT_LOAD_CRYPTO_STRINGS,NULL) =>
=> deadlock at RUN_ONCE(&load_crypto_nodelete,ossl_init_load_crypto_nodelete)

I had no better idea than to invent a new internal API to put the current state of the
error module on a "shelve" and restore the state after this critical section completes.

I consider Rich's confirmation void, because of amount of code changes.
Please re-confirm if you agree with the way I want to resolve this.

Thanks,
Bernd.

@richsalz
Copy link
Contributor

Have you considered a global table, or set of variables, for all openssl per-thread keys and initializing that near the front of startup? I think breaking the modularity might be a better trade-off for this.

@bernd-edlinger
Copy link
Member Author

Actually I have found it difficult to come up with any solution.
The combination of the error module with the multi-threaded initialization
code is highly complex, and when I removed the hack at the
"BIG FAT WARNING" comment, the hell broke loose.

@bernd-edlinger
Copy link
Member Author

The many RUN_ONCE sections are much more frightening than the the per-thread keys.
It is unclear to me if there are cyclic dependencies between the different RUN_ONCE
functions, especially because almost all invoke OPENSSL_init_crypto, and if there
is a failure then it is suddenly a deadlock. Maybe it is only a priority inversion,
that happens if different thread race to initialize the openssl machinery.

@richsalz richsalz added the approval: done This pull request has the required number of approvals label Apr 18, 2018
@richsalz
Copy link
Contributor

Reconfirm

@@ -4527,3 +4527,5 @@ SM2_plaintext_size 4468 1_1_1 EXIST::FUNCTION:SM2
conf_ssl_name_find 4469 1_1_0i EXIST::FUNCTION:
conf_ssl_get_cmd 4470 1_1_0i EXIST::FUNCTION:
conf_ssl_get 4471 1_1_0i EXIST::FUNCTION:
err_shelve_state 4472 1_1_0i EXIST::FUNCTION:
err_unshelve_state 4473 1_1_0i EXIST::FUNCTION:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are they exported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not intentionally, but I did see some travis failures, when I did not export them.
On second thought they may be useful even at other places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are in an internal header file, and only used in libcrypto itself, there really isn't a reason to have them in the .num file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont recall exactly what was the reason but initially I did not have these lines, and
one travis build was a red X, also err_free_strings_int is listed in .num
I can try to find out if that can be avoided, OTOH these functions might be used even
in libssl one day.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without this hunk I get:

/usr/bin/perl util/mkdef.pl crypto linux > libcrypto.map
Error: err_shelve_state does not have a number assigned
Makefile:771: recipe for target 'libcrypto.map' failed
make[1]: *** [libcrypto.map] Error 25
make[1]: Leaving directory '/home/ed/OPC/openssl'
Makefile:170: recipe for target 'all' failed
make: *** [all] Error 2

I don't know how to work around that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mkdef.pl reads include/internal/err.h, that's why. Under the circumstances, and considering we might have use for them in libssl as well, I think it's ok to export them, rather than splitting up the declaration of interal err functions in even smaller header files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general put stuff you don't want to be exported but to be available to all of libcrypto in crypto/include/internal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to export unless we need to as a general rule. It is easy to export later if required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So create another header file for these two under crypto/include/internal... yeahok, just make sure to name it so it doesn't clash with files in include/internal

crypto/err/err.c Outdated
@@ -694,13 +705,35 @@ ERR_STATE *ERR_get_state(void)
return state;
}

void *err_shelve_state(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document new functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are more or less meant as internal functions.
similar to err_free_strings_int, which is not documented either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do you mean add comments to the new functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, add comments. The documentation requirement exists for all functions, but only the public functions are in pod files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it can be something very simple

@@ -130,7 +149,7 @@ DEFINE_RUN_ONCE_STATIC(ossl_init_base)
*/
# endif
DSO_free(dso);
ERR_pop_to_mark();
err_unshelve_state(err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seems to do the same as the old code and looks wrong to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood things, and it's probably fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, while ERR_pop_to_mark does not always restore the error state (i.e. overflow),
err_unshelve_state is actually guaranteed to restore the exact state.

[to be squashed]
@bernd-edlinger
Copy link
Member Author

Okay, updated to review comments.

Please re-confirm.

crypto/err/err.c Outdated
return (void*)-1;

if (!RUN_ONCE(&err_init, err_do_init))
return (void*)-1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, maybe it would be better to return an error code.
If this fails and the calling function continues it may end up in the recursive code path.
Theoretically. You may call me paranoid.

@bernd-edlinger
Copy link
Member Author

I'm going to merge today in the afternoon. I'll assume @richsalz's approval is still valid.

levitte pushed a commit that referenced this pull request Apr 20, 2018
Fixes: #5899

Reviewed-by: Rich Salz <rsalz@openssl.org>
Reviewed-by: Richard Levitte <levitte@openssl.org>
(Merged from #5911)
@bernd-edlinger
Copy link
Member Author

Merged to master. A back-port to 1.1.0 is on the way.

Thanks!

bernd-edlinger added a commit to bernd-edlinger/openssl that referenced this pull request Apr 20, 2018
levitte pushed a commit that referenced this pull request Apr 21, 2018
Back-port of #5911
Fixes: #5899

Reviewed-by: Rich Salz <rsalz@openssl.org>
(Merged from #6037)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval: done This pull request has the required number of approvals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants