Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/init.c: Fix a locking issue in OPENSSL_init_crypto() #13288

Closed

Conversation

mspncp
Copy link
Contributor

@mspncp mspncp commented Oct 31, 2020

OPENSSL_init_crypto() passes its settings argument to
ossl_init_config() through a static variable called
conf_settings, because ossl_init_config() is a
run-once function which can't take any parameters.

This workaround is thread-safe, because passing a non-null
settings argument is well defined and allowed only during
early initialization in single-threaded mode.

Nevertheless, the 'init_lock' was added in commit bf24111
(Avoid a race condition in loading config settings, 2016-02-09)
to protect this workaround. This probably happened to fix a
thread sanitizer warning. I believe it was a false false
positive caused by unnecessary writes of null values to the
conf_settings variable.

Since commit e6c5461 (Load the default config file before
working with default properties, 2020-07-31) the unnecessary
lock is causing a lock-order-inversion, so it needs to be
removed. Instead, we prevent concurrency warnings by avoiding
to update conf_settings unnecessarily.

Fixes #12869

Checklist
  • documentation is added or updated
  • tests are added or updated

@mspncp mspncp added branch: master Merge to master branch triaged: bug The issue/pr is/fixes a bug labels Oct 31, 2020
@mspncp mspncp added this to the 3.0.0 milestone Oct 31, 2020
OPENSSL_init_crypto() passes its `settings` argument to
`ossl_init_config()` through a static variable called
`conf_settings`, because ossl_init_config() is a
run-once function which can't take any parameters.

This workaround is thread-safe, because passing a non-null
`settings` argument is well defined and allowed only during
early initialization in single-threaded mode.

Nevertheless, the 'init_lock' was added in commit bf24111
(Avoid a race condition in loading config settings, 2016-02-09)
to protect this workaround. This probably happened to fix a
thread sanitizer warning. I believe it was a false positive
caused by unnecessary writes of null values to the
`conf_settings` variable.

Since commit e6c5461 (Load the default config file before
working with default properties, 2020-07-31) the unnecessary
lock is causing a lock-order-inversion, so it needs to be
removed. Instead, we prevent concurrency warnings by avoiding
to update `conf_settings` unnecessarily.

Fixes openssl#12869
@mspncp mspncp force-pushed the pr-crypto-init-fix-locking-issue branch from b58b2f7 to 9ffbf0f Compare October 31, 2020 23:43
@mspncp
Copy link
Contributor Author

mspncp commented Oct 31, 2020

@bernd-edlinger I'd be interested in your expert feedback.

Copy link
Contributor

@paulidale paulidale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Wait for @bernd-edlinger's input though.

@bernd-edlinger
Copy link
Member

I think the thread sanitizer is simply not designed to look for cycles involving
mutexes and pthread_once functions. In other words, it is pure luck that you
see a thread sanitizer warning now, but the underlying problem will not go
away so easily.

@mspncp
Copy link
Contributor Author

mspncp commented Nov 1, 2020

But would you say that the code is ok under the assumption that the application is only allowed to pass a non-zero pointer before using libcrypto resp. creating additional threads?

@bernd-edlinger
Copy link
Member

I think the problem is when the application passes zero.
I spent a while looking at the sanitizer output in #12869,
and I don't think this is a false positive warning.
Just replace M38 by the internal mutex that is held,
while the run_once function executes.

@mspncp
Copy link
Contributor Author

mspncp commented Nov 1, 2020

Is there really a lock lurking inside pthread_once()? I would have expected that it's just some atomic test and set.

@mspncp
Copy link
Contributor Author

mspncp commented Nov 1, 2020

Is there really a lock lurking inside pthread_once()? I would have expected that it's just some atomic test and set.

See pthread_once.c.

@mspncp
Copy link
Contributor Author

mspncp commented Nov 1, 2020

@bernd-edlinger I assume you are talking about the wait operations on the once_control? But those happen outside the init_routine, do you really think they could cause a lock order inversion?

@mspncp
Copy link
Contributor Author

mspncp commented Nov 1, 2020

Editorial note: When this PR gets merged, I would like to add the following line to the commit message to acknowledge @jwalch's contribution, who did the initial analysis of #12869 and suggested a first fix in #13285.

    Co-authored-by: jwalch <jeremy.walch@gmail.com>

@jwalch, is that ok for you?

@mspncp
Copy link
Contributor Author

mspncp commented Nov 1, 2020

@bernd-edlinger I assume you are talking about the wait operations on the once_control? But those happen outside the init_routine, do you really think they could cause a lock order inversion?

My understanding is that pthread_once() will take the fast path except for the first time, so no locking is involved under the assumptions of the commit message.

@jwalch
Copy link
Contributor

jwalch commented Nov 1, 2020

@jwalch, is that ok for you?

shrug If you want to, that's fine, but I honestly don't care about getting "credit" in the commit message. What analysis I've contributed is there in the issues / PRs regardless. I do appreciate your concern for properly acknowledging contributions, however.

@bernd-edlinger
Copy link
Member

@bernd-edlinger I assume you are talking about the wait operations on the once_control? But those happen outside the init_routine, do you really think they could cause a lock order inversion?

Yes, exactly, one thread sets __PTHREAD_ONCE_INPROGRESS
and enters the init_routine (); while the second threads starts to wait
for __PTHREAD_ONCE_DONE, which only happens after the first
thread returns from init_routine. The use of the atomics just obscures
the semantic from the sanitizer.

@bernd-edlinger
Copy link
Member

My understanding is that pthread_once() will take the fast path except for the first time, so no locking is involved under the assumptions of the commit message.

Yes, but what happens if there is no call with settings != NULL,
and two or more threads race to call EVP_MD_fetch, EVP_MAC_fetch
or RAND_bytes and one calls OPENSSL_init_ssl.

So while you are right that the mutex is unnecessary when the OPENSSL_init_crypto
is invoked in the right way, it has the nice side-effect that we now know there is
a potential deadlock.

@paulidale
Copy link
Contributor

paulidale commented Nov 2, 2020

Is a TSAN build where there is locking around the atomic operations worthwhile? I.e. make TSAN more aware of what is happening.

I don't see why it would help but it's easy to miss things when dealing with concurrency.

@mspncp
Copy link
Contributor Author

mspncp commented Nov 4, 2020

@bernd-edlinger after digging more deeply into the issue I agree with your #13288 (comment) and #13288 (comment) that removing lock M38 around the run-once call does not fix the true issue, which is a potential race of different threads competing for a (first) call to OPENSSL_init_crypto(). Although my pull request might improve the situation because it reduces the likelihood of a deadlock (the lock in OPENSSL_init_crypto() effectively becomes a one-time lock), I still have a bad feeling about this until the problem is properly understood. I tried to untangle the lock cycles mentally, but I failed. So I had a call with @mattcaswell and explained the problem to him. He will try to help me figure it out in the next days. For his convenience, I post the results of my preliminary analysis. I won't merge this pull request, until the problem is fully understood.

The two locking cycles

I reproduced issue #12869 with a recent copy of master. The TSAN report is still the same, it reports two cycles. I separated them into different files and added markers in the call stack at the interesting locations.

If you look at the diff of those two reports, you can see that it's essentially the same cycle, only with M70 replaced by M89. The difference is caused by the fact that one lock is taken by an EVP_MAC_fetch(), the other one by an EVP_MD_fetch().

--- M38-M62-M70-M38.log	2020-11-03 01:18:53.837246012 +0100
+++ M38-M62-M89-M38.log	2020-11-02 23:41:47.972695248 +0100
@@ -1,5 +1,5 @@
 WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=26018)
-  Cycle in lock order graph: M38 (0x7b10000005c0) => M62 (0x7b1000000800) => M70 (0x7b1000000740) => M38
+  Cycle in lock order graph: M38 (0x7b10000005c0) => M62 (0x7b1000000800) => M89 (0x7b1000000740) => M38
 
   Mutex M62 acquired here while holding mutex M38 in main thread:
     #0 pthread_rwlock_wrlock /var/tmp/portage/sys-devel/gcc-9.3.0-r1/work/gcc-9.3.0/libsanitizer/tsan/tsan_interceptors.cc:1324 (libtsan.so.0+0x2f2ba)
@@ -30,11 +30,11 @@
     #4 apps_startup apps/openssl.c:66 (openssl+0x7a187)
     #5 main apps/openssl.c:260 (openssl+0x7a2cf)
 
-  Mutex M70 acquired here while holding mutex M62 in main thread:
+  Mutex M89 acquired here while holding mutex M62 in main thread:
     #0 pthread_rwlock_wrlock /var/tmp/portage/sys-devel/gcc-9.3.0-r1/work/gcc-9.3.0/libsanitizer/tsan/tsan_interceptors.cc:1324 (libtsan.so.0+0x2f2ba)
     #1 CRYPTO_THREAD_write_lock crypto/threads_pthread.c:86 (libcrypto.so.3+0x2da193)
     #2 ossl_property_write_lock crypto/property/property.c:127 (libcrypto.so.3+0x31f360)
-==> #3 ossl_method_store_add crypto/property/property.c:249 (libcrypto.so.3+0x31f9b1) M62[provider store lock]=>M70[method store lock]
+==> #3 ossl_method_store_add crypto/property/property.c:249 (libcrypto.so.3+0x31f9b1) M62[provider store lock]=>M89[method store lock]
     #4 put_evp_method_in_store crypto/evp/evp_fetch.c:168 (libcrypto.so.3+0x27bef5)
     #5 ossl_method_construct_this crypto/core_fetch.c:95 (libcrypto.so.3+0x2c0214)
     #6 algorithm_do_this crypto/core_algorithm.c:65 (libcrypto.so.3+0x2bfb51)
@@ -44,10 +44,14 @@
     #10 ossl_method_construct crypto/core_fetch.c:124 (libcrypto.so.3+0x2c03e4)
     #11 inner_evp_generic_fetch crypto/evp/evp_fetch.c:316 (libcrypto.so.3+0x27c57f)
     #12 evp_generic_fetch crypto/evp/evp_fetch.c:363 (libcrypto.so.3+0x27c7e5)
-    #13 EVP_MAC_fetch crypto/evp/mac_meth.c:155 (libcrypto.so.3+0x294423)
-    #14 fipsinstall_main apps/fipsinstall.c:425 (openssl+0x6c502)
-    #15 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
-    #16 main apps/openssl.c:293 (openssl+0x7a458)
+    #13 EVP_MD_fetch crypto/evp/digest.c:961 (libcrypto.so.3+0x255de9)
+    #14 ossl_prov_digest_fetch providers/common/provider_util.c:131 (libcrypto.so.3+0x478880)
+    #15 ossl_prov_digest_load_from_params providers/common/provider_util.c:154 (libcrypto.so.3+0x4789cf)
+    #16 hmac_set_ctx_params providers/implementations/macs/hmac_prov.c:262 (libcrypto.so.3+0x461a20)
+    #17 EVP_MAC_CTX_set_params crypto/evp/mac_lib.c:156 (libcrypto.so.3+0x29362e)
+    #18 fipsinstall_main apps/fipsinstall.c:445 (openssl+0x6c5fc)
+    #19 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
+    #20 main apps/openssl.c:293 (openssl+0x7a458)
 
   Mutex M62 previously acquired by the same thread here:
     #0 pthread_rwlock_rdlock /var/tmp/portage/sys-devel/gcc-9.3.0-r1/work/gcc-9.3.0/libsanitizer/tsan/tsan_interceptors.cc:1294 (libtsan.so.0+0x2ef7a)
@@ -57,39 +61,50 @@
     #4 ossl_method_construct crypto/core_fetch.c:124 (libcrypto.so.3+0x2c03e4)
     #5 inner_evp_generic_fetch crypto/evp/evp_fetch.c:316 (libcrypto.so.3+0x27c57f)
     #6 evp_generic_fetch crypto/evp/evp_fetch.c:363 (libcrypto.so.3+0x27c7e5)
-    #7 EVP_MAC_fetch crypto/evp/mac_meth.c:155 (libcrypto.so.3+0x294423)
-    #8 fipsinstall_main apps/fipsinstall.c:425 (openssl+0x6c502)
-    #9 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
-    #10 main apps/openssl.c:293 (openssl+0x7a458)
+    #7 EVP_MD_fetch crypto/evp/digest.c:961 (libcrypto.so.3+0x255de9)
+    #8 ossl_prov_digest_fetch providers/common/provider_util.c:131 (libcrypto.so.3+0x478880)
+    #9 ossl_prov_digest_load_from_params providers/common/provider_util.c:154 (libcrypto.so.3+0x4789cf)
+    #10 hmac_set_ctx_params providers/implementations/macs/hmac_prov.c:262 (libcrypto.so.3+0x461a20)
+    #11 EVP_MAC_CTX_set_params crypto/evp/mac_lib.c:156 (libcrypto.so.3+0x29362e)
+    #12 fipsinstall_main apps/fipsinstall.c:445 (openssl+0x6c5fc)
+    #13 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
+    #14 main apps/openssl.c:293 (openssl+0x7a458)
 
-  Mutex M38 acquired here while holding mutex M70 in main thread:
+  Mutex M38 acquired here while holding mutex M89 in main thread:
     #0 pthread_rwlock_wrlock /var/tmp/portage/sys-devel/gcc-9.3.0-r1/work/gcc-9.3.0/libsanitizer/tsan/tsan_interceptors.cc:1324 (libtsan.so.0+0x2f2ba)
     #1 CRYPTO_THREAD_write_lock crypto/threads_pthread.c:86 (libcrypto.so.3+0x2da193)
-==> #2 OPENSSL_init_crypto crypto/init.c:542 (libcrypto.so.3+0x2c57f7) M70[method store lock]=>M38[init lock]
+==> #2 OPENSSL_init_crypto crypto/init.c:542 (libcrypto.so.3+0x2c57f7) M89[method store lock]=>M38[init lock]
     #3 ossl_ctx_global_properties crypto/property/property.c:103 (libcrypto.so.3+0x31f1e5)
-==> #4 ossl_method_store_fetch crypto/property/property.c:360 (libcrypto.so.3+0x31ffc4) M70[method store lock]
+==> #4 ossl_method_store_fetch crypto/property/property.c:360 (libcrypto.so.3+0x31ffc4) M89[method store lock]
     #5 get_evp_method_from_store crypto/evp/evp_fetch.c:130 (libcrypto.so.3+0x27bd69)
     #6 ossl_method_construct crypto/core_fetch.c:130 (libcrypto.so.3+0x2c0424)
     #7 inner_evp_generic_fetch crypto/evp/evp_fetch.c:316 (libcrypto.so.3+0x27c57f)
     #8 evp_generic_fetch crypto/evp/evp_fetch.c:363 (libcrypto.so.3+0x27c7e5)
-    #9 EVP_MAC_fetch crypto/evp/mac_meth.c:155 (libcrypto.so.3+0x294423)
-    #10 fipsinstall_main apps/fipsinstall.c:425 (openssl+0x6c502)
-    #11 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
-    #12 main apps/openssl.c:293 (openssl+0x7a458)
+    #9 EVP_MD_fetch crypto/evp/digest.c:961 (libcrypto.so.3+0x255de9)
+    #10 ossl_prov_digest_fetch providers/common/provider_util.c:131 (libcrypto.so.3+0x478880)
+    #11 ossl_prov_digest_load_from_params providers/common/provider_util.c:154 (libcrypto.so.3+0x4789cf)
+    #12 hmac_set_ctx_params providers/implementations/macs/hmac_prov.c:262 (libcrypto.so.3+0x461a20)
+    #13 EVP_MAC_CTX_set_params crypto/evp/mac_lib.c:156 (libcrypto.so.3+0x29362e)
+    #14 fipsinstall_main apps/fipsinstall.c:445 (openssl+0x6c5fc)
+    #15 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
+    #16 main apps/openssl.c:293 (openssl+0x7a458)
 
-  Mutex M70 previously acquired by the same thread here:
+  Mutex M89 previously acquired by the same thread here:
     #0 pthread_rwlock_rdlock /var/tmp/portage/sys-devel/gcc-9.3.0-r1/work/gcc-9.3.0/libsanitizer/tsan/tsan_interceptors.cc:1294 (libtsan.so.0+0x2ef7a)
     #1 CRYPTO_THREAD_read_lock crypto/threads_pthread.c:73 (libcrypto.so.3+0x2da150)
     #2 ossl_property_read_lock crypto/property/property.c:122 (libcrypto.so.3+0x31f309)
-==> #3 ossl_method_store_fetch crypto/property/property.c:351 (libcrypto.so.3+0x31ff3d) M70[method store lock]
+==> #3 ossl_method_store_fetch crypto/property/property.c:351 (libcrypto.so.3+0x31ff3d) M89[method store lock]
     #4 get_evp_method_from_store crypto/evp/evp_fetch.c:130 (libcrypto.so.3+0x27bd69)
     #5 ossl_method_construct crypto/core_fetch.c:130 (libcrypto.so.3+0x2c0424)
     #6 inner_evp_generic_fetch crypto/evp/evp_fetch.c:316 (libcrypto.so.3+0x27c57f)
     #7 evp_generic_fetch crypto/evp/evp_fetch.c:363 (libcrypto.so.3+0x27c7e5)
-    #8 EVP_MAC_fetch crypto/evp/mac_meth.c:155 (libcrypto.so.3+0x294423)
-    #9 fipsinstall_main apps/fipsinstall.c:425 (openssl+0x6c502)
-    #10 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
-    #11 main apps/openssl.c:293 (openssl+0x7a458)
+    #8 EVP_MD_fetch crypto/evp/digest.c:961 (libcrypto.so.3+0x255de9)
+    #9 ossl_prov_digest_fetch providers/common/provider_util.c:131 (libcrypto.so.3+0x478880)
+    #10 ossl_prov_digest_load_from_params providers/common/provider_util.c:154 (libcrypto.so.3+0x4789cf)
+    #11 hmac_set_ctx_params providers/implementations/macs/hmac_prov.c:262 (libcrypto.so.3+0x461a20)
+    #12 EVP_MAC_CTX_set_params crypto/evp/mac_lib.c:156 (libcrypto.so.3+0x29362e)
+    #13 fipsinstall_main apps/fipsinstall.c:445 (openssl+0x6c5fc)
+    #14 do_cmd apps/openssl.c:411 (openssl+0x7aa88)
+    #15 main apps/openssl.c:293 (openssl+0x7a458)
 
 SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) crypto/threads_pthread.c:86 in CRYPTO_THREAD_write_lock
-==================

The locations of the locks in the source code

The following patch marks the locations in the source code where the locks are taken:

commit 97293ace93d24e29464fac830aaaa9d53411050e
Author: Dr. Matthias St. Pierre <matthias.st.pierre@ncp-e.com>
Date:   Tue Nov 3 20:04:31 2020 +0100

    Lock order inversion

diff --git a/crypto/init.c b/crypto/init.c
index cfd4eab9ed..dd04132abe 100644
--- a/crypto/init.c
+++ b/crypto/init.c
@@ -539,7 +539,7 @@ int OPENSSL_init_crypto(uint64_t opts, const OPENSSL_INIT_SETTINGS *settings)
 
     if (opts & OPENSSL_INIT_LOAD_CONFIG) {
         int ret;
-        CRYPTO_THREAD_write_lock(init_lock);
+        CRYPTO_THREAD_write_lock(init_lock);  /* M38[init_lock] */
         conf_settings = settings;
         ret = RUN_ONCE(&config, ossl_init_config);
         conf_settings = NULL;
diff --git a/crypto/property/property.c b/crypto/property/property.c
index 9cfca81190..8be2de8f40 100644
--- a/crypto/property/property.c
+++ b/crypto/property/property.c
@@ -100,7 +100,7 @@ OSSL_PROPERTY_LIST **ossl_ctx_global_properties(OSSL_LIB_CTX *libctx,
                                                 int loadconfig)
 {
 #ifndef FIPS_MODULE
-    if (loadconfig && !OPENSSL_init_crypto(OPENSSL_INIT_LOAD_CONFIG, NULL))
+    if (loadconfig && !OPENSSL_init_crypto(OPENSSL_INIT_LOAD_CONFIG, NULL)) /* M70,M89[method store lock]=>M38[init lock] */
         return NULL;
 #endif
     return ossl_lib_ctx_get_data(libctx, OSSL_LIB_CTX_GLOBAL_PROPERTIES,
@@ -246,7 +246,7 @@ int ossl_method_store_add(OSSL_METHOD_STORE *store, const OSSL_PROVIDER *prov,
      * A write lock is used unconditionally because we wend our way down to the
      * property string code which isn't locking friendly.
      */
-    ossl_property_write_lock(store);
+    ossl_property_write_lock(store);  /* M62[provider store lock]=>M70,M89[method store lock] */
     ossl_method_cache_flush(store, nid);
     if ((impl->properties = ossl_prop_defn_get(store->ctx, properties)) == NULL) {
         impl->properties = ossl_parse_property(store->ctx, properties);
@@ -348,7 +348,7 @@ int ossl_method_store_fetch(OSSL_METHOD_STORE *store, int nid,
      * This only needs to be a read lock, because queries never create property
      * names or value and thus don't modify any of the property string layer.
      */
-    ossl_property_read_lock(store);
+    ossl_property_read_lock(store);   /* M70,M89[method store lock] */
     alg = ossl_method_store_retrieve(store, nid);
     if (alg == NULL) {
         ossl_property_unlock(store);
diff --git a/crypto/provider_core.c b/crypto/provider_core.c
index 52e68641ae..6348dfda4f 100644
--- a/crypto/provider_core.c
+++ b/crypto/provider_core.c
@@ -226,7 +226,7 @@ OSSL_PROVIDER *ossl_provider_find(OSSL_LIB_CTX *libctx, const char *name,
 #endif
 
         tmpl.name = (char *)name;
-        CRYPTO_THREAD_write_lock(store->lock);
+        CRYPTO_THREAD_write_lock(store->lock); /* M38[init_lock]=>M62[provider store lock] */
         if ((i = sk_OSSL_PROVIDER_find(store->providers, &tmpl)) == -1
             || (prov = sk_OSSL_PROVIDER_value(store->providers, i)) == NULL
             || !ossl_provider_up_ref(prov))
@@ -742,7 +742,7 @@ int ossl_provider_forall_loaded(OSSL_LIB_CTX *ctx,
 #endif
 
     if (store != NULL) {
-        CRYPTO_THREAD_read_lock(store->lock);
+        CRYPTO_THREAD_read_lock(store->lock); /* M62[provider store lock] */
 
         provider_activate_fallbacks(store);
 

@mspncp
Copy link
Contributor Author

mspncp commented Nov 4, 2020

I forgot to mention in my last post: the following change by @mattcaswell added the dependency

M70,M89[method store lock]=>M38[init lock]

which closed the cycle.

 #ifndef FIPS_MODULE
-    if (loadconfig && !OPENSSL_init_crypto(OPENSSL_INIT_LOAD_CONFIG, NULL))
+    if (loadconfig && !OPENSSL_init_crypto(OPENSSL_INIT_LOAD_CONFIG, NULL)) /* M70,M89[method store lock]=>M38[init lock] */
         return NULL;
 #endif

It was introduced in commit e6c5461 in order to fix issue #12565.

@mspncp
Copy link
Contributor Author

mspncp commented Nov 5, 2020

Closing in favour of #13321.

@mspncp mspncp closed this Nov 5, 2020
@mspncp mspncp deleted the pr-crypto-init-fix-locking-issue branch November 5, 2020 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch: master Merge to master branch triaged: bug The issue/pr is/fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test: thread sanitizer reports lock-order-inversions in test_fipsinstall
4 participants