SoftHSMv2 crashes while running a destructor for OSSLCryptoFactory on a PKCS11 module unload #408
Comments
|
The corresponding Fedora bug is https://bugzilla.redhat.com/show_bug.cgi?id=1607635. Apparently, this was reported in spring 2018, but we were never able to reproduce it until now. |
|
Thanks to Lukas Slebodnik, here is a trivial reproducer: just run certutil to create an empty NSS database on Rawhide: |
|
Note that checking the code in |
|
My understanding is that accessing OS-specific mutex helpers via a MutexFactory instance lets us use helpers set by the caller application, but it also means we depend on the order in which such global static instances are destroyed relative to each other. Accessing one of them after it has been destroyed is undefined behavior in C++. One possibility here is to make that instance a part of the crypto factory, since every place that wants to access these mutexes has access to the crypto factory as well. |
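A minimal sketch of that possibility follows. It is not the fix that was merged, and the class names are simplified stand-ins rather than SoftHSM's actual classes. It assumes the crypto factory owns the mutex factory as a member declared before the lock array. C++ destroys non-static data members in reverse declaration order, so the locks are released while the mutex factory is still alive. The `events()` log is a hypothetical helper used only to make the order observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical event log, only used to observe the destruction order.
inline std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Simplified stand-in for a mutex factory (not SoftHSM's real class).
class MutexFactory {
public:
    ~MutexFactory() { events().push_back("~MutexFactory"); }
    void destroyMutex() { events().push_back("destroyMutex"); }
};

// Each Mutex needs its factory to still be alive when it is destroyed.
class Mutex {
public:
    explicit Mutex(MutexFactory& factory) : factory_(factory) {}
    ~Mutex() { factory_.destroyMutex(); }
private:
    MutexFactory& factory_;
};

// Simplified stand-in for a crypto factory that owns both objects.
class CryptoFactory {
public:
    explicit CryptoFactory(std::size_t lockCount) {
        for (std::size_t i = 0; i < lockCount; ++i) {
            locks_.push_back(std::make_unique<Mutex>(mutexFactory_));
        }
    }
private:
    // Declared first, therefore destroyed last.
    MutexFactory mutexFactory_;
    // Declared second, therefore destroyed first.
    std::vector<std::unique_ptr<Mutex>> locks_;
};
```

With this layout the ordering no longer depends on how two independent global statics are torn down: destroying a `CryptoFactory` always releases every lock before the mutex factory's destructor runs.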
If a PKCS11 API caller provided its own mutex handling callbacks, we need to ensure they aren't used after C_Finalize is called and the SoftHSM instance is recycled. Failure to do so may lead to a situation where the callbacks were provided by a different dynamically loaded object that is removed after the C_Finalize() call. The callback pointers then become invalid, and calling them leads to crashes.

Fixes: opendnssec#408
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
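The idea in this commit message can be illustrated with a small sketch. It is an assumption-laden illustration, not the code from the merged patch: `MutexCallbacks`, `CallbackRegistry`, and the demo callbacks are hypothetical names. The caller's function pointers are stored on initialization and cleared on finalization, so a later call cannot reach into a shared object that may already have been unloaded.

```cpp
#include <cassert>

// Hypothetical callback signatures; the real PKCS#11 types differ.
using CreateMutexFn = int (*)(void** mutex);
using DestroyMutexFn = int (*)(void* mutex);

struct MutexCallbacks {
    CreateMutexFn create = nullptr;
    DestroyMutexFn destroy = nullptr;
};

// Hypothetical registry holding caller-provided callbacks.
class CallbackRegistry {
public:
    // Called from the initialization path: remember the caller's callbacks.
    void initialize(const MutexCallbacks& fromCaller) {
        callbacks_ = fromCaller;
        enabled_ = true;
    }

    // Called from the finalization path: forget the pointers, because the
    // object that provided them may be unloaded afterwards.
    void finalize() {
        callbacks_ = MutexCallbacks{};
        enabled_ = false;
    }

    // Returns false instead of calling through a pointer that was cleared.
    bool createMutex(void** mutex) const {
        if (!enabled_ || callbacks_.create == nullptr) return false;
        return callbacks_.create(mutex) == 0;
    }

    bool destroyMutex(void* mutex) const {
        if (!enabled_ || callbacks_.destroy == nullptr) return false;
        return callbacks_.destroy(mutex) == 0;
    }

private:
    MutexCallbacks callbacks_;
    bool enabled_ = false;
};

// Demo callbacks standing in for ones supplied by a caller application.
static int g_created = 0;
static int g_destroyed = 0;

static int demoCreate(void** mutex) {
    static int dummy = 0;
    *mutex = &dummy;
    ++g_created;
    return 0;
}

static int demoDestroy(void*) {
    ++g_destroyed;
    return 0;
}
```

After `finalize()`, `destroyMutex()` reports failure and never touches the stale pointer, which is the property the commit message asks for.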
|
@bellgrim It's been more than a year - any update...? |
|
@mouse07410 the fix was merged already a year ago in #409 |
|
@abbra thanks! |
When using SoftHSMv2 as a PKCS11 module loaded by the NSS library via p11-kit-proxy on Fedora (Rawhide), we see a reproducible crash. The crash happens when the Mutex class destructor attempts to access the MutexFactory instance while the OSSLCryptoFactory destructor removes Mutex instances from the locks array:
Another sample, with a bit of debugging code added and visible:
I tried to analyze what happens and added some debug output. The result looks like this (here and above, the source line numbers are off compared to upstream because of the ERROR_MSG() calls I added):
The last line is where things crash. The instrumented code is
so the destructor crashes on calling MutexFactory::i(), which means the instance went out of scope and was already destroyed by the unique_ptr.
Both the OSSLCryptoFactory instance and the MutexFactory instance are defined in the same compilation unit, and in the correct order, so MutexFactory is supposed to be destroyed after OSSLCryptoFactory. Even so, the unique_ptr seems to end up in some unexpected state that causes the crash.
Let me know if there is something that can be fixed here. This crash is a blocker for the Fedora 29 release because it completely prevents deploying a FreeIPA domain controller with DNSSEC support. Even if you do not enable DNSSEC, the mere fact that SoftHSMv2 is installed causes the 389-ds LDAP server to crash when using the NSS crypto library, because the SoftHSMv2 PKCS11 module is installed system-wide and is loaded by p11-kit.