Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for NULL locks #23616

Closed
wants to merge 1 commit into from
Closed

Check for NULL locks #23616

wants to merge 1 commit into from

Conversation

hyc
Copy link

@hyc hyc commented Feb 17, 2024

This will happen if a library calls SSL_CTX_free() in its destructor, which runs after OpenSSL's atexit() handler OPENSSL_cleanup() has already destroyed all the locks. This commit fixes a SEGV in that case.

Related to #23575

Checklist

This will happen if a library calls SSL_CTX_free() in its destructor,
which runs after OpenSSL's atexit() handler OPENSSL_cleanup() has
already destroyed all the locks. This commit fixes a SEGV in that case.
@davidben
Copy link
Contributor

Given the myriad of ways that the atexit behavior can go wrong, I don't think it makes sense to play whack-a-mole trying to patch individual functions for broken invariants. Doing so could even obscure more serious problems. For example, if an fd to /dev/urandom is closed but not reset to -1, another part of the program could open another fd in its place, and then your program may silently use bad entropy in a security-critical component.

The only real fix is for OpenSSL to not tear itself down while it may still be in use.

@hyc
Copy link
Author

hyc commented Feb 18, 2024

your program may silently use bad entropy in a security-critical component.

So you're saying it's better for a security-critical component to crash and become a denial of service, than to detect an error and return an error code.

Except you're being inconsistent, since the same kind of check already exists just a few lines further down in the current code:

if (lock == NULL)

@hyc
Copy link
Author

hyc commented Feb 18, 2024

Btw, your fuzzcheck job failed because it's out of disk space.
https://github.com/hyc/openssl/actions/runs/7944169917/job/21689368952

2024-02-17T20:57:10.4107118Z /usr/bin/ld: final link failed: No space left on device
2024-02-17T20:57:10.4112347Z /usr/bin/ld: final link failed: No space left on device
2024-02-17T20:57:10.4319161Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.4333353Z make[1]: *** [Makefile:19005: test/fatalerrtest] Error 1
2024-02-17T20:57:10.4338346Z make[1]: *** Waiting for unfinished jobs....
2024-02-17T20:57:10.4346369Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.4364303Z make[1]: *** [Makefile:18960: test/exdatatest] Error 1
2024-02-17T20:57:10.6447205Z /usr/bin/ld/usr/bin/ld: final link failed: : final link failed: No space left on deviceNo space left on device
2024-02-17T20:57:10.6448514Z 
2024-02-17T20:57:10.6624368Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.6639259Z make[1]: *** [Makefile:19044: test/fips_version_test] Error 1
2024-02-17T20:57:10.6640359Z clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
2024-02-17T20:57:10.6656258Z make[1]: *** [Makefile:19029: test/ffc_internal_test] Error 1
2024-02-17T20:57:10.6657504Z make[1]: Leaving directory '/src/openssl32'
2024-02-17T20:57:10.6691625Z make: *** [Makefile:2879: build_sw] Error 2
2024-02-17T20:57:14.2947813Z 2024-02-17 20:57:14,294 - root - ERROR - Building fuzzers failed.
2024-02-17T20:57:14.2949146Z 2024-02-17 20:57:14,294 - root - ERROR - Error building fuzzers for (commit: 1a571a59759465812b71f8f1ff3bcbd786839077, pr_ref: None).

@kroeckx
Copy link
Member

kroeckx commented Feb 18, 2024 via email

@davidben
Copy link
Contributor

davidben commented Feb 18, 2024

@hyc It's not my CI job. While I contribute to OpenSSL and am quite familiar with the problem space, I'm not a decision maker for project. I work on BoringSSL, which solves this problem by not calling atexit in the first place. However, as OpenSSL does not seem to have decided what they wish to do here, all I can do is offer advice in comments.

This PR may address a symptom, but it ignores the underlying problem and misses that far, far worse symptoms can occur. Ignoring broken invariants piecemeal is a recipe security vulnerabilities. That you're only crashing on NULL dereference is a mercy. For a security-critical component, a clean crash is far better than use-after-free, overflow a buffer, race condition or signing ECDSA with bad entropy. Those will result in much worse! (Of course, it's best to do none of these. The way to do that is to fix OpenSSL to stop calling atexit.)

As for consistency, that's a free function. All free functions in OpenSSL behave like C free and C++ delete in accepting NULL. That's not a broken invariant but happens often as part of normal operation, so it's convenient for free functions to accept NULL. (It simplifies cleanup code.) Whereas trying to lock a NULL lock should never happen in normal operation and can only be a broken invariant elsewhere in the system.

@tom-cosgrove-arm
Copy link
Contributor

So you're saying it's better for a security-critical component to crash and become a denial of service, than to detect an error and return an error code.

If "detect an error and return an error code" would always happen, then no. But since errors in very low-level routines like locks aren't always caught, and can't always be propagated back up the stack (a void function calls something that calls something ... that needs a lock) then there WILL be times that the error is ignored, and there just has to be one unlucky place that this becomes a serious vulnerability.

And "crash and become a denial of service" is definitely better than "serious vulnerability".

With the NULL-pointer crash, it is possible to know what the underlying problem was (lock was destroyed) - if this had been silently ignored and propagated back up the stack to something that was several levels deep it would have been much harder to figure out what had happened.

While I am not a decision maker for the project, I too think this change is potentially unsafe.

@t8m t8m added resolved: wont fix The issue has been confirmed but won't be fixed triaged: bug The issue/pr is/fixes a bug labels Feb 19, 2024
@t8m
Copy link
Member

t8m commented Feb 19, 2024

I totally agree with @davidben and @tom-cosgrove-arm on this one. The decision maker on technical issues is OTC. If any OTC member disagrees with my triage (resolved: wont fix - i.e., this won't be merged), he can add OTC hold label and it will be discussed by OTC as a whole.

@kroeckx
Copy link
Member

kroeckx commented Mar 12, 2024

Any reason to keep this open?

@t8m
Copy link
Member

t8m commented Mar 12, 2024

Closing now

@t8m t8m closed this Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
resolved: wont fix The issue has been confirmed but won't be fixed triaged: bug The issue/pr is/fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants