Check for NULL locks #23616

hyc · 2024-02-17T20:40:23Z

This will happen if a library calls SSL_CTX_free() in its destructor, which runs after OpenSSL's atexit() handler OPENSSL_cleanup() has already destroyed all the locks. This commit fixes a SEGV in that case.

Related to #23575

Checklist

This will happen if a library calls SSL_CTX_free() in its destructor, which runs after OpenSSL's atexit() handler OPENSSL_cleanup() has already destroyed all the locks. This commit fixes a SEGV in that case.

davidben · 2024-02-18T01:48:34Z

Given the myriad of ways that the atexit behavior can go wrong, I don't think it makes sense to play whack-a-mole trying to patch individual functions for broken invariants. Doing so could even obscure more serious problems. For example, if an fd to /dev/urandom is closed but not reset to -1, another part of the program could open another fd in its place, and then your program may silently use bad entropy in a security-critical component.

The only real fix is for OpenSSL to not tear itself down while it may still be in use.

hyc · 2024-02-18T10:26:53Z

your program may silently use bad entropy in a security-critical component.

So you're saying it's better for a security-critical component to crash and become a denial of service, than to detect an error and return an error code.

Except you're being inconsistent, since the same kind of check already exists just a few lines further down in the current code:

openssl/crypto/threads_pthread.c

Line 711 in 1a571a5

if (lock == NULL)

hyc · 2024-02-18T10:42:00Z

Btw, your fuzzcheck job failed because it's out of disk space.
https://github.com/hyc/openssl/actions/runs/7944169917/job/21689368952

2024-02-17T20:57:10.4107118Z /usr/bin/ld: final link failed: No space left on device
2024-02-17T20:57:10.4112347Z /usr/bin/ld: final link failed: No space left on device
2024-02-17T20:57:10.4319161Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.4333353Z make[1]: *** [Makefile:19005: test/fatalerrtest] Error 1
2024-02-17T20:57:10.4338346Z make[1]: *** Waiting for unfinished jobs....
2024-02-17T20:57:10.4346369Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.4364303Z make[1]: *** [Makefile:18960: test/exdatatest] Error 1
2024-02-17T20:57:10.6447205Z /usr/bin/ld/usr/bin/ld: final link failed: : final link failed: No space left on deviceNo space left on device
2024-02-17T20:57:10.6448514Z 
2024-02-17T20:57:10.6624368Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.6639259Z make[1]: *** [Makefile:19044: test/fips_version_test] Error 1
2024-02-17T20:57:10.6640359Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.6656258Z make[1]: *** [Makefile:19029: test/ffc_internal_test] Error 1
2024-02-17T20:57:10.6657504Z make[1]: Leaving directory '/src/openssl32'
2024-02-17T20:57:10.6691625Z make: *** [Makefile:2879: build_sw] Error 2
2024-02-17T20:57:14.2947813Z 2024-02-17 20:57:14,294 - root - ERROR - Building fuzzers failed.
2024-02-17T20:57:14.2949146Z 2024-02-17 20:57:14,294 - root - ERROR - Error building fuzzers for (commit: 1a571a59759465812b71f8f1ff3bcbd786839077, pr_ref: None).

kroeckx · 2024-02-18T11:07:44Z

The out of disk space is a known issue

davidben · 2024-02-18T14:33:45Z

@hyc It's not my CI job. While I contribute to OpenSSL and am quite familiar with the problem space, I'm not a decision maker for project. I work on BoringSSL, which solves this problem by not calling atexit in the first place. However, as OpenSSL does not seem to have decided what they wish to do here, all I can do is offer advice in comments.

This PR may address a symptom, but it ignores the underlying problem and misses that far, far worse symptoms can occur. Ignoring broken invariants piecemeal is a recipe security vulnerabilities. That you're only crashing on NULL dereference is a mercy. For a security-critical component, a clean crash is far better than use-after-free, overflow a buffer, race condition or signing ECDSA with bad entropy. Those will result in much worse! (Of course, it's best to do none of these. The way to do that is to fix OpenSSL to stop calling atexit.)

As for consistency, that's a free function. All free functions in OpenSSL behave like C free and C++ delete in accepting NULL. That's not a broken invariant but happens often as part of normal operation, so it's convenient for free functions to accept NULL. (It simplifies cleanup code.) Whereas trying to lock a NULL lock should never happen in normal operation and can only be a broken invariant elsewhere in the system.

tom-cosgrove-arm · 2024-02-19T08:58:16Z

So you're saying it's better for a security-critical component to crash and become a denial of service, than to detect an error and return an error code.

If "detect an error and return an error code" would always happen, then no. But since errors in very low-level routines like locks aren't always caught, and can't always be propagated back up the stack (a void function calls something that calls something ... that needs a lock) then there WILL be times that the error is ignored, and there just has to be one unlucky place that this becomes a serious vulnerability.

And "crash and become a denial of service" is definitely better than "serious vulnerability".

With the NULL-pointer crash, it is possible to know what the underlying problem was (lock was destroyed) - if this had been silently ignored and propagated back up the stack to something that was several levels deep it would have been much harder to figure out what had happened.

While I am not a decision maker for the project, I too think this change is potentially unsafe.

t8m · 2024-02-19T12:11:23Z

I totally agree with @davidben and @tom-cosgrove-arm on this one. The decision maker on technical issues is OTC. If any OTC member disagrees with my triage (resolved: wont fix - i.e., this won't be merged), he can add OTC hold label and it will be discussed by OTC as a whole.

kroeckx · 2024-03-12T12:31:05Z

Any reason to keep this open?

t8m · 2024-03-12T12:53:12Z

Closing now

Check for NULL locks

1a571a5

This will happen if a library calls SSL_CTX_free() in its destructor, which runs after OpenSSL's atexit() handler OPENSSL_cleanup() has already destroyed all the locks. This commit fixes a SEGV in that case.

t8m added resolved: wont fix The issue has been confirmed but won't be fixed triaged: bug The issue/pr is/fixes a bug labels Feb 19, 2024

t8m closed this Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check for NULL locks #23616

Check for NULL locks #23616

hyc commented Feb 17, 2024

davidben commented Feb 18, 2024

hyc commented Feb 18, 2024

hyc commented Feb 18, 2024

kroeckx commented Feb 18, 2024 via email

davidben commented Feb 18, 2024 •

edited

tom-cosgrove-arm commented Feb 19, 2024

t8m commented Feb 19, 2024

kroeckx commented Mar 12, 2024

t8m commented Mar 12, 2024

Check for NULL locks #23616

Check for NULL locks #23616

Conversation

hyc commented Feb 17, 2024

Checklist

davidben commented Feb 18, 2024

hyc commented Feb 18, 2024

hyc commented Feb 18, 2024

kroeckx commented Feb 18, 2024 via email

davidben commented Feb 18, 2024 • edited

tom-cosgrove-arm commented Feb 19, 2024

t8m commented Feb 19, 2024

kroeckx commented Mar 12, 2024

t8m commented Mar 12, 2024

davidben commented Feb 18, 2024 •

edited