Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use after free on ENGINE_finish(rdrand_engine). #548

Closed
dwmw2 opened this issue May 4, 2020 · 8 comments · Fixed by #550
Closed

Use after free on ENGINE_finish(rdrand_engine). #548

dwmw2 opened this issue May 4, 2020 · 8 comments · Fixed by #550

Comments

@dwmw2
Copy link
Contributor

dwmw2 commented May 4, 2020

Commit caec0c7 has caused SoftHSM to stop exiting cleanly when running the auth-pkcs11 test from OpenConnect self-tests. Instead of exiting, it just stops:

#0  __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x555555cbb320) at pthread_rwlock_common.c:917
#1  __GI___pthread_rwlock_wrlock (rwlock=0x555555cbb320) at pthread_rwlock_wrlock.c:27
#2  0x00007ffff6e49e7d in CRYPTO_THREAD_write_lock (lock=<optimized out>) at crypto/threads_pthread.c:78
#3  0x00007ffff6daf745 in ENGINE_finish (e=0x555555b8af90) at crypto/engine/eng_init.c:101
#4  0x00007fffe99a54d7 in OSSLCryptoFactory::~OSSLCryptoFactory (this=0x55555583a1b0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:240
#5  0x00007fffe99a5559 in OSSLCryptoFactory::~OSSLCryptoFactory (this=0x55555583a1b0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:227
#6  0x00007ffff7a12680 in __run_exit_handlers (status=1, listp=0x7ffff7b98738 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
    at exit.c:108
#7  0x00007ffff7a127c0 in __GI_exit (status=<optimized out>) at exit.c:139
#8  0x0000555555559878 in main ()

This appears to be true for the GnuTLS build of OpenConnect but not the OpenSSL build.

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 4, 2020

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 4, 2020

Valgrind says...

==228434== Invalid read of size 4
==228434==    at 0x54D9316: __pthread_rwlock_wrlock_full (pthread_rwlock_common.c:581)
==228434==    by 0x54D9316: pthread_rwlock_wrlock (pthread_rwlock_wrlock.c:27)
==228434==    by 0x58ADE7C: CRYPTO_THREAD_write_lock (threads_pthread.c:78)
==228434==    by 0x5813744: ENGINE_finish (eng_init.c:101)
==228434==    by 0x134794D6: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:240)
==228434==    by 0x13479558: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:259)
==228434==    by 0x552767F: __run_exit_handlers (exit.c:108)
==228434==    by 0x55277BF: exit (exit.c:139)
==228434==    by 0x404043: main (main.c:1553)
==228434==  Address 0x14bb74c8 is 24 bytes inside a block of size 56 free'd
==228434==    at 0x4839A0C: free (vg_replace_malloc.c:540)
==228434==    by 0x583FBE1: OPENSSL_cleanup (init.c:601)
==228434==    by 0x583FBE1: OPENSSL_cleanup (init.c:497)
==228434==    by 0x552767F: __run_exit_handlers (exit.c:108)
==228434==    by 0x55277BF: exit (exit.c:139)
==228434==    by 0x404043: main (main.c:1553)
==228434==  Block was alloc'd at
==228434==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==228434==    by 0x584859D: CRYPTO_zalloc (mem.c:230)
==228434==    by 0x58ADE0B: CRYPTO_THREAD_lock_new (threads_pthread.c:29)
==228434==    by 0x58137B9: do_engine_lock_init (eng_lib.c:25)
==228434==    by 0x58137B9: do_engine_lock_init_ossl_ (eng_lib.c:21)
==228434==    by 0x54DBD7E: __pthread_once_slow (pthread_once.c:116)
==228434==    by 0x58ADEFC: CRYPTO_THREAD_run_once (threads_pthread.c:118)
==228434==    by 0x5813878: ENGINE_new (eng_lib.c:33)
==228434==    by 0x58152FD: ENGINE_rdrand (eng_rdrand.c:70)
==228434==    by 0x58152FD: engine_load_rdrand_int (eng_rdrand.c:85)
==228434==    by 0x583F8BC: ossl_init_engine_rdrand (init.c:353)
==228434==    by 0x583F8BC: ossl_init_engine_rdrand_ossl_ (init.c:347)
==228434==    by 0x54DBD7E: __pthread_once_slow (pthread_once.c:116)
==228434==    by 0x58ADEFC: CRYPTO_THREAD_run_once (threads_pthread.c:118)
==228434==    by 0x584004C: OPENSSL_init_crypto (init.c:723)
==228434==    by 0x584004C: OPENSSL_init_crypto (init.c:620)

dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 4, 2020
@dwmw2
Copy link
Contributor Author

dwmw2 commented May 4, 2020

Ah, you're cleaning up engines after OpenSSL's own atexit handler has cleaned everything up.

Proof of concept fix at dwmw2@f9c24e0

@dwmw2 dwmw2 changed the title Deadlock on ENGINE_finish(rdrand_engine). Use after free on ENGINE_finish(rdrand_engine). May 5, 2020
@rijswijk
Copy link
Contributor

Thanks for report and PoC fix. I checked the OSSL documentation, and it seems that the atexit() behaviour was introduced in 1.1.0. So while the PoC works, it should check for versions greater than 1.1.0 (right now it looks like it checks for 1.1.1).

Could you submit a PR based on the PoC fix with the change mentioned above? That'd be much appreciated!

dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 12, 2020
dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 12, 2020
dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 12, 2020
As of 1.1.0, OpenSSL registers its own atexit() handler to call
OPENSSL_cleanup(). If our own atexit() handler subsequently tries to,
for example, unreference an ENGINE, then it'll crash or deadlock with a
use-after-free. This is issue opendnssec#548.

Fix it by hooking registering a callback OpenSSL_atexit() to be called
when OPENSSL_cleanup() is called. It sets a flag which prevents any
further touching of OpenSSL objects — which would otherwise happen
fairly much immediately thereafter when our own OSSLCryptoFactory
destructor gets called by the C++ runtime's own atexit() handler.

Fixes: opendnssec#548
dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 12, 2020
…y shut down

As of 1.1.0, OpenSSL registers its own atexit() handler to call
OPENSSL_cleanup(). If our own code subsequently tries to, for example,
unreference an ENGINE, then it'll crash or deadlock with a use after
free.

Fix it by registering a callback with OPENSSL_atexit() to be called when
OPENSSL_cleanup() is called. It sets a flag which prevents any further
touching of OpenSSL objects — which would otherwise happen fairly much
immediately thereafter when our own OSSLCryptoFactory destructor gets
called by the C++ runtime's own atexit() handler.

Fixes: opendnssec#548
dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 12, 2020
…y shut down

As of 1.1.0, OpenSSL registers its own atexit() handler to call
OPENSSL_cleanup(). If our own code subsequently tries to, for example,
unreference an ENGINE, then it'll crash or deadlock with a use after
free.

Fix it by registering a callback with OPENSSL_atexit() to be called when
OPENSSL_cleanup() is called. It sets a flag which prevents any further
touching of OpenSSL objects — which would otherwise happen fairly much
immediately thereafter when our own OSSLCryptoFactory destructor gets
called by the C++ runtime's own atexit() handler.

Fixes: opendnssec#548
halderen added a commit that referenced this issue May 12, 2020
Issue #548: Don't clean up engines after OpenSSL has already shut down
airtower-luna added a commit to airtower-luna/mod_gnutls that referenced this issue May 12, 2020
The hung CI jobs on fedora:32 and debian:sid have likely been caused
by a SoftHSM bug (see opendnssec/SoftHSMv2#548),
but the failure to upload their logs seems to be an issue with the
build config.
@dwmw2
Copy link
Contributor Author

dwmw2 commented May 13, 2020

I think we need to revert this. It causes problems if the SoftHSM DSO is unloaded before OpenSSL shuts down: https://bugzilla.redhat.com/show_bug.cgi?id=1831086#c8

I'll see if I can find a way to tell that OpenSSL has been shut down without registering an OpenSSL_atexit() handler for it (which can't be unregistered when we unload).

I wonder if we can register a system atexit() handler, and rely on ordering such that it runs before the C++ runtime's atexit handler which calls the OSSLCryptoFactory destructor? Ordering is guaranteed, I think:

DESCRIPTION
       The atexit() function registers the given function to be called at nor‐
       mal process termination, either via exit(3) or via return from the pro‐
       gram's main().  Functions so registered are called in the reverse order
       of their registration; no arguments are passed.

@dwmw2
Copy link
Contributor Author

dwmw2 commented May 13, 2020

Argh, of course it works for my test just to register ossl_factory_shutdown() with atexit() instead of OpenSSL_atexit() but of course it would have the same problem when the DSO is unloaded, since we can't unregister atexit() handlers either.

@halderen
Copy link
Member

halderen commented May 13, 2020 via email

dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 13, 2020
We can't register our own atexit() or OPENSSL_atexit() handler because
there's no way to unregister it when the SoftHSM DSO is unloaded. This
causes the crash reported at https://bugzilla.redhat.com/1831086#c8

Instead of using that method to set a flag showing that OPENSSL_cleanup()
has occurred, instead test directly by calling OPENSSL_init_crypto() for
something that *would* do nothing, but will fail if OPENSSL_cleanup()
has indeed been run already.

Fixes: c2cc065 "Issue opendnssec#548: Don't clean up engines after OpenSSL
                   has already shut down"
@Aearsis
Copy link
Contributor

Aearsis commented May 13, 2020

I was just writing a proposal for this hack :) It is not nice (ERR_R_INIT_FAIL is not used exclusively, we are not checking success in OSSLCryptoFactory, ...), but fighting fire with fire is acceptable in this case.

dwmw2 added a commit to dwmw2/SoftHSMv2 that referenced this issue May 21, 2020
We can't register our own atexit() or OPENSSL_atexit() handler because
there's no way to unregister it when the SoftHSM DSO is unloaded. This
causes the crash reported at https://bugzilla.redhat.com/1831086#c8

Instead of using that method to set a flag showing that OPENSSL_cleanup()
has occurred, instead test directly by calling OPENSSL_init_crypto() for
something that *would* do nothing, but will fail if OPENSSL_cleanup()
has indeed been run already.

Fixes: c2cc065 "Issue opendnssec#548: Don't clean up engines after OpenSSL
                   has already shut down"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants