Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SoftHSMv2 crashes while running a destructor for OSSLCryptoFactory on a PKCS11 module unload #408

Closed
abbra opened this issue Jul 26, 2018 · 7 comments

Comments

@abbra
Copy link
Contributor

commented Jul 26, 2018

When using SoftHSMv2 as a PKCS11 module loaded by NSS library via p11-kit-proxy on Fedora (Rawhide), we see a reproducible crash. The crash happens when Mutex class destructor attempts to access MutexFactory instance while OSSLCryptoFactory destructor removes Mutex instances from the locks array:

Thread 1 "ns-slapd" hit Breakpoint 1, OSSLCryptoFactory::~OSSLCryptoFactory (this=0x555555a868a0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:201
201	OSSLCryptoFactory::~OSSLCryptoFactory()
(gdb) c
Continuing.

Thread 1 "ns-slapd" hit Breakpoint 1, OSSLCryptoFactory::~OSSLCryptoFactory (this=0x555555a868a0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:201
201	OSSLCryptoFactory::~OSSLCryptoFactory()
(gdb) c
Continuing.

Thread 1 "ns-slapd" hit Breakpoint 2, MutexFactory::i() () at MutexFactory.cpp:121
121	{
(gdb) c
Continuing.

Thread 1 "ns-slapd" hit Breakpoint 2, MutexFactory::i() () at MutexFactory.cpp:121
121	{
(gdb) n
342	/usr/include/c++/8/bits/unique_ptr.h: No such file or directory.
(gdb) n
Mutex::~Mutex (this=0x555555a868e0, __in_chrg=<optimized out>) at MutexFactory.cpp:58
58			MutexFactory::i()->DestroyMutex(handle);
(gdb) n

Thread 1 "ns-slapd" hit Breakpoint 2, MutexFactory::i() () at MutexFactory.cpp:121
121	{
(gdb) n
342	/usr/include/c++/8/bits/unique_ptr.h: No such file or directory.
(gdb) n

Thread 1 "ns-slapd" received signal SIGSEGV, Segmentation fault.
0x00007fffe7f8e220 in ?? ()
(gdb) bt full
#0  0x00007fffe7f8e220 in None ()
#1  0x00007fffe7ae875f in Mutex::~Mutex (this=0x555555a868e0, __in_chrg=<optimized out>) at MutexFactory.cpp:58
        this = 0x555555a868e0
        __func__ = "~Mutex"
        __func__ = "~Mutex"
#2  0x00007fffe7ae879d in Mutex::~Mutex (this=0x555555a868e0, __in_chrg=<optimized out>) at MutexFactory.cpp:52
        __func__ = "~Mutex"
#3  0x00007fffe7aef380 in OSSLCryptoFactory::~OSSLCryptoFactory (this=<optimized out>, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:230
        i = 0
        _locks = 0x555555a868c0
        __func__ = "~OSSLCryptoFactory"
#4  0x00007fffe7aef3ed in OSSLCryptoFactory::~OSSLCryptoFactory (this=0x555555a868a0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:201
        __func__ = "~OSSLCryptoFactory"
#5  0x00007ffff4a01027 in __cxa_finalize () at /lib64/libc.so.6
#6  0x00007fffe7aa8bc7 in __do_global_dtors_aux () at SoftHSM.cpp:10976
#7  0x00007fffffffdca0 in None ()
#8  0x00007ffff7de98ed in _dl_close_worker () at /lib64/ld-linux-x86-64.so.2
(gdb) print instance
No symbol "instance" in current context.
(gdb) quit
A debugging session is active.

	Inferior 1 [process 29116] will be killed.

Quit anyway? (y or n) y

another sample, with a bit of debugging code added and visible:


    Thread 1 "ns-slapd" hit Breakpoint 1, OSSLCryptoFactory::~OSSLCryptoFactory (this=0x555555a868a0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:201
    201     OSSLCryptoFactory::~OSSLCryptoFactory()
    (gdb) n
     
    Thread 1 "ns-slapd" hit Breakpoint 1, OSSLCryptoFactory::~OSSLCryptoFactory (this=0x555555a868a0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:201
    201     OSSLCryptoFactory::~OSSLCryptoFactory()
    (gdb) n
    215             delete rng;
    (gdb) n
    224             _locks = locks;
    (gdb) n
    225             locks = NULL;
    (gdb) n
    226             ERROR_MSG("About to recycle mutexes. locks: %p, nlocks: %d", _locks, nlocks);
    (gdb) n
    227             for (unsigned i = 0; i < nlocks; i++)
    (gdb) n
    229                     ERROR_MSG("Recycle mutex %d: %p", i, _locks[i]);
    (gdb) n
    230                     MutexFactory::i()->recycleMutex(_locks[i]);
    (gdb) n
     
    Thread 1 "ns-slapd" hit Breakpoint 2, Mutex::~Mutex (this=0x555555a868e0, __in_chrg=<optimized out>) at MutexFactory.cpp:52
    52      Mutex::~Mutex()
    (gdb) n
     
    Thread 1 "ns-slapd" hit Breakpoint 2, Mutex::~Mutex (this=0x555555a868e0, __in_chrg=<optimized out>) at MutexFactory.cpp:52
    52      Mutex::~Mutex()
    (gdb) n
    54              ERROR_MSG("Mutex %p tries to destroy handle %p, isValid: %d", this, handle, isValid);
    (gdb) n
    55              if (isValid)
    (gdb) print isValid
    $1 = true
    (gdb) n
    57                      ERROR_MSG("About to call DestroyMutex using factory instance %p", MutexFactory::i());
    (gdb) n
    58                      MutexFactory::i()->DestroyMutex(handle);
    (gdb) step
    MutexFactory::i() () at MutexFactory.cpp:121
    121     {
    (gdb) n
    342     /usr/include/c++/8/bits/unique_ptr.h: No such file or directory.
    (gdb) n
     
    Thread 1 "ns-slapd" received signal SIGSEGV, Segmentation fault.
    0x00007fffe7f8e220 in ?? ()
    (gdb) bt full
    #0  0x00007fffe7f8e220 in None ()
    #1  0x00007fffe7ae875f in Mutex::~Mutex (this=0x555555a868e0, __in_chrg=<optimized out>) at MutexFactory.cpp:58
            this = 0x555555a868e0
            __func__ = "~Mutex"
            __func__ = "~Mutex"
    #2  0x00007fffe7ae879d in Mutex::~Mutex (this=0x555555a868e0, __in_chrg=<optimized out>) at MutexFactory.cpp:52
            __func__ = "~Mutex"
    #3  0x00007fffe7aef380 in OSSLCryptoFactory::~OSSLCryptoFactory (this=<optimized out>, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:230
            i = 0
            _locks = 0x555555a868c0
            __func__ = "~OSSLCryptoFactory"
    #4  0x00007fffe7aef3ed in OSSLCryptoFactory::~OSSLCryptoFactory (this=0x555555a868a0, __in_chrg=<optimized out>) at OSSLCryptoFactory.cpp:201
            __func__ = "~OSSLCryptoFactory"
    #5  0x00007ffff4a01027 in __cxa_finalize () at /lib64/libc.so.6
    #6  0x00007fffe7aa8bc7 in __do_global_dtors_aux () at main.cpp:1178
    #7  0x00007fffffffdca0 in None ()
    #8  0x00007ffff7de98ed in _dl_close_worker () at /lib64/ld-linux-x86-64.so.2
    (gdb)

I tried to analyze what happens and added a debug output. The result looks like this (here and above source code lines are off compared to the upstream because of ERROR_MSG() calls I added):

 OSSLCryptoFactory.cpp(226): About to recycle mutexes. locks: 0x555555a868c0, nlocks: 1
 OSSLCryptoFactory.cpp(229): Recycle mutex 0: 0x555555a868e0 
 MutexFactory.cpp(54): Mutex 0x555555a868e0 tries to destroy handle 0x555555a86870, isValid: 1
 MutexFactory.cpp(57): About to call DestroyMutex using factory instance 0x555555a82e80
 MutexFactory.cpp(183): Factory 0x555555a82e80, DestroyMutex for 0x555555a86870, we are enabled? 1
 MutexFactory.cpp(186): Calling destroyMutex (func 0x7fffe7f8e220) with mutex handle 0x555555a86870
 MutexFactory.cpp(124): New MutexFactory about to be created 
 MutexFactory.cpp(104): MutexFactory 0x555555a82e80 created
 MutexFactory.cpp(48): New Mutex 0x555555a86850, handle 0x555555a82de0 is added
 OSSLCryptoFactory.cpp(106): New locks array created for 1 locks: 0x555555a868c0
 MutexFactory.cpp(48): New Mutex 0x555555a868e0, handle 0x555555a86870 is added
 OSSLCryptoFactory.cpp(110): New lock added to locks[0] = 0x555555a868e0
 MutexFactory.cpp(48): New Mutex 0x555555ac7320, handle 0x555555ac72a0 is added
 MutexFactory.cpp(48): New Mutex 0x555555ac62d0, handle 0x555555ac62f0 is added
 MutexFactory.cpp(48): New Mutex 0x555555ac6380, handle 0x555555ac63a0 is added
 MutexFactory.cpp(54): Mutex 0x555555ac6380 tries to destroy handle 0x555555ac63a0, isValid: 1
 MutexFactory.cpp(57): About to call DestroyMutex using factory instance 0x555555a82e80
 MutexFactory.cpp(183): Factory 0x555555a82e80, DestroyMutex for 0x555555ac63a0, we are enabled? 1
 MutexFactory.cpp(186): Calling destroyMutex (func 0x7fffe7f8e220) with mutex handle 0x555555ac63a0
 MutexFactory.cpp(60): Called DestroyMutex. Done
 MutexFactory.cpp(54): Mutex 0x555555ac62d0 tries to destroy handle 0x555555ac62f0, isValid: 1
 MutexFactory.cpp(57): About to call DestroyMutex using factory instance 0x555555a82e80
 MutexFactory.cpp(183): Factory 0x555555a82e80, DestroyMutex for 0x555555ac62f0, we are enabled? 1
 MutexFactory.cpp(186): Calling destroyMutex (func 0x7fffe7f8e220) with mutex handle 0x555555ac62f0
 MutexFactory.cpp(60): Called DestroyMutex. Done
 MutexFactory.cpp(54): Mutex 0x555555ac7320 tries to destroy handle 0x555555ac72a0, isValid: 1
 MutexFactory.cpp(57): About to call DestroyMutex using factory instance 0x555555a82e80
 MutexFactory.cpp(183): Factory 0x555555a82e80, DestroyMutex for 0x555555ac72a0, we are enabled? 1
 MutexFactory.cpp(186): Calling destroyMutex (func 0x7fffe7f8e220) with mutex handle 0x555555ac72a0
 MutexFactory.cpp(60): Called DestroyMutex. Done
 OSSLCryptoFactory.cpp(226): About to recycle mutexes. locks: 0x555555a868c0, nlocks: 1
 OSSLCryptoFactory.cpp(229): Recycle mutex 0: 0x555555a868e0 
 MutexFactory.cpp(54): Mutex 0x555555a868e0 tries to destroy handle 0x555555a86870, isValid: 1
 MutexFactory.cpp(57): About to call DestroyMutex using factory instance 0x555555a82e80
 MutexFactory.cpp(183): Factory 0x555555a82e80, DestroyMutex for 0x555555a86870, we are enabled? 1
 MutexFactory.cpp(186): Calling destroyMutex (func 0x7fffe7f8e220) with mutex handle 0x555555a86870

The last line is where things crash, an instrumented code is

// Destructor
Mutex::~Mutex()
{
        ERROR_MSG("Mutex %p tries to destroy handle %p, isValid: %d", this, handle, isValid);
        if (isValid)
        {
                ERROR_MSG("About to call DestroyMutex using factory instance %p", MutexFactory::i());
                MutexFactory::i()->DestroyMutex(handle);
                isValid = false;
                ERROR_MSG("Called DestroyMutex. Done");
        }
}

...


CK_RV MutexFactory::DestroyMutex(CK_VOID_PTR mutex)
{
        ERROR_MSG("Factory %p, DestroyMutex for %p, we are enabled? %d", this, mutex, enabled);
        if (!enabled) return CKR_OK;

        ERROR_MSG("Calling destroyMutex (func %p) with mutex handle %p", this->destroyMutex, mutex);
        return (this->destroyMutex)(mutex);
}

so we destruct on calling MutexFactory::i(), which means instance went out of scope and was killed by the unique_ptr.

While both OSSLCryptoFactory instance and MutexFactory instance are defined in the same compilation unit and in a correct order so that MutexFactory is supposed to destruct after OSSLCryptoFactory, there seems to be some weird state of the unique_ptr that causes a crash.

Let me know if there is something that can be fixed here. This crash is a blocker for Fedora 29 release as it prevents completely to deploy FreeIPA domain controller with DNSSEC support. Even if you would not enable DNSSEC, a mere fact that SoftHSMv2 is installed makes 389-ds LDAP server to crash when using NSS crypto library because SoftHSMv2 PKCS11 module is installed system-wide and is loaded by p11-kit.

@abbra

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2018

Appropriate Fedora bug is https://bugzilla.redhat.com/show_bug.cgi?id=1607635. Apparently, we had this reported during Spring 2018 but never were able to reproduce until now.

@abbra

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2018

Thanks to Lukas Slebodnik, here is a trivial reproducer: just run certutil to create an empty NSS database on Rawhide:

sh-4.4$ rm -rf p11_nssdb/
sh-4.4$ mkdir p11_nssdb
sh-4.4$ /usr/bin/certutil -d sql:./p11_nssdb -N --empty-password
Segmentation fault (core dumped)

sh-4.4$ rm -rf p11_nssdb/
sh-4.4$ mkdir p11_nssdb
sh-4.4$ valgrind /usr/bin/certutil -d sql:./p11_nssdb -N --empty-password
==18432== Memcheck, a memory error detector
==18432== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==18432== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==18432== Command: /usr/bin/certutil -d sql:./p11_nssdb -N --empty-password
==18432== 
==18432== Jump to the invalid address stated on the next line
==18432==    at 0x5437680: ???
==18432==    by 0x559B804: UnknownInlinedFun (MutexFactory.cpp:56)
==18432==    by 0x559B804: Mutex::~Mutex() (MutexFactory.cpp:56)
==18432==    by 0x55A25BC: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:222)
==18432==    by 0x55A25FC: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:225)
==18432==    by 0x4B3092B: __run_exit_handlers (in /usr/lib64/libc-2.27.9000.so)
==18432==    by 0x4B30A5F: exit (in /usr/lib64/libc-2.27.9000.so)
==18432==    by 0x4B1A0E9: (below main) (in /usr/lib64/libc-2.27.9000.so)
==18432==  Address 0x5437680 is not stack'd, malloc'd or (recently) free'd
==18432== 
==18432== 
==18432== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==18432==  Access not within mapped region at address 0x5437680
==18432==    at 0x5437680: ???
==18432==    by 0x559B804: UnknownInlinedFun (MutexFactory.cpp:56)
==18432==    by 0x559B804: Mutex::~Mutex() (MutexFactory.cpp:56)
==18432==    by 0x55A25BC: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:222)
==18432==    by 0x55A25FC: OSSLCryptoFactory::~OSSLCryptoFactory() (OSSLCryptoFactory.cpp:225)
==18432==    by 0x4B3092B: __run_exit_handlers (in /usr/lib64/libc-2.27.9000.so)
==18432==    by 0x4B30A5F: exit (in /usr/lib64/libc-2.27.9000.so)
==18432==    by 0x4B1A0E9: (below main) (in /usr/lib64/libc-2.27.9000.so)
==18432==  If you believe this happened as a result of a stack
==18432==  overflow in your program's main thread (unlikely but
==18432==  possible), you can try to increase the size of the
==18432==  main thread stack using the --main-stacksize= flag.
==18432==  The main thread stack size used in this run was 8388608.
==18432== 
==18432== HEAP SUMMARY:
==18432==     in use at exit: 86,045 bytes in 39 blocks
==18432==   total heap usage: 10,804 allocs, 10,765 frees, 2,647,294 bytes allocated
==18432== 
==18432== LEAK SUMMARY:
==18432==    definitely lost: 40 bytes in 1 blocks
==18432==    indirectly lost: 72 bytes in 3 blocks
==18432==      possibly lost: 0 bytes in 0 blocks
==18432==    still reachable: 85,933 bytes in 35 blocks
==18432==         suppressed: 0 bytes in 0 blocks
==18432== Rerun with --leak-check=full to see details of leaked memory
==18432== 
==18432== For counts of detected and suppressed errors, rerun with: -v
==18432== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
@nmav

This comment has been minimized.

Copy link
Contributor

commented Jul 27, 2018

Note that checking the code in OSSLCryptoFactory::OSSLCryptoFactory it seems to be using CRYPTO_num_locks() which is a stub in openssl 1.1.0. I have not figured out why could that crash but most likely getting rid of that part of the code when compiling with openssl 1.1.0 would be an improvement as it is not needed.

@abbra

This comment has been minimized.

Copy link
Contributor Author

commented Jul 27, 2018

My understanding is that while accessing OS-specific mutex helpers via a MutexFactory instance allows us to be able to use helpers set by a caller application, we rely on undefined behavior in C++ on how such global static instances get destroyed past each other. One possibility here is to make that instance a part of a crypto factory as all places that want to access these mutexes have access to the crypto factory as well.

abbra added a commit to abbra/SoftHSMv2 that referenced this issue Jul 31, 2018
If a PKCS11 API caller provided own mutex handling callbacks,
we need to ensure they aren't used after C_Finalize is called
and SoftHSM instance is recycled.

Inability to do so may lead to a situation where callbacks might
be provided by a different dynamically loaded object which is removed
after C_Finalize() call. Thus, callback pointers become invalid and
calling them leads to crashes.

Fixes: opendnssec#408

Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
@bellgrim bellgrim closed this in #409 Aug 5, 2018
@bellgrim bellgrim added the bug label Aug 5, 2018
@bellgrim bellgrim added this to the 2.5.0 milestone Aug 5, 2018
@mouse07410

This comment has been minimized.

Copy link

commented Aug 14, 2019

@bellgrim It's been more than a year - any update...?

@abbra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 14, 2019

@mouse07410 the fix was merged already a year ago in #409

@mouse07410

This comment has been minimized.

Copy link

commented Aug 15, 2019

@abbra thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.