consult dlerror() only if a dl*() call fails #74469
The string returned from dlerror() is purely diagnostic and should not itself be used to determine whether a previous call to dlopen() or dlsym() has failed. Those functions are documented with specific return values that signal failure; i.e., returning NULL.

If we assume a non-NULL return from dlerror() means the prior dlsym() call failed, we are vulnerable to a race with another thread outside of Rust control concurrently inducing dynamic linking operations. This manifests on illumos systems with an intermittent spurious failure from rustc:

`error: ld.so.1: rustc: fatal: _ex_unwind: can't find symbol`

The illumos libc checks for the existence of an "_ex_unwind" symbol via dlsym() under some conditions when a thread exits, as part of an old contract with a particular C++ standard library. If another thread exits at the same time that rustc is attempting to load a plugin, we can hit this race and report an error that does not belong to us.
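The race described above can be illustrated without touching the real dynamic linker. The following is a minimal simulation, not the rustc code: `FakeLinker`, its methods, the handle value `0x1000`, and the `interleave` flag are all hypothetical stand-ins, and the "other thread" is replayed deterministically as a plain method call between our operation and our error check.

```rust
/// Hypothetical stand-in for the dynamic linker's shared "last error" slot.
struct FakeLinker {
    last_error: Option<String>,
}

impl FakeLinker {
    /// Mimics dlerror(): returns the pending message, if any, and clears it.
    fn dlerror(&mut self) -> Option<String> {
        self.last_error.take()
    }

    /// Mimics a failed lookup made by foreign code (for example libc during
    /// thread exit), which leaves an error message behind.
    fn foreign_failed_lookup(&mut self, symbol: &str) {
        self.last_error = Some(format!("{}: can't find symbol", symbol));
    }

    /// Mimics our own operation succeeding: a non-zero (non-NULL) handle.
    fn our_successful_open(&mut self) -> usize {
        0x1000
    }
}

/// Racy variant: any pending dlerror() message is taken to be our failure.
fn open_trusting_dlerror(linker: &mut FakeLinker, interleave: bool) -> Result<usize, String> {
    let _stale = linker.dlerror(); // discard any earlier error
    let handle = linker.our_successful_open();
    if interleave {
        // Another thread's lookup fails between our call and our check.
        linker.foreign_failed_lookup("_ex_unwind");
    }
    match linker.dlerror() {
        Some(msg) => Err(msg),
        None => Ok(handle),
    }
}

/// Variant proposed in this PR: the return value decides success or failure,
/// and dlerror() is consulted only to describe a failure.
fn open_checking_return(linker: &mut FakeLinker, interleave: bool) -> Result<usize, String> {
    let handle = linker.our_successful_open();
    if interleave {
        linker.foreign_failed_lookup("_ex_unwind");
    }
    if handle != 0 {
        return Ok(handle);
    }
    Err(linker.dlerror().unwrap_or_else(|| "unknown error".to_owned()))
}
```

With the interleaving enabled, the first variant reports the foreign `_ex_unwind` message as its own failure even though the open succeeded, while the second returns the handle.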
(rust_highfive has picked a reviewer for you, use r? to override)
let s = CStr::from_ptr(last_error).to_bytes();
Err(str::from_utf8(s).unwrap().to_owned())
Suggested change:
- let s = CStr::from_ptr(last_error).to_bytes();
- Err(str::from_utf8(s).unwrap().to_owned())
+ let s = CStr::from_ptr(last_error).to_str().unwrap();
+ Err(s.to_owned())
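As a small illustration of why the suggestion is a pure simplification, the two conversions can be placed side by side. The function names `via_bytes` and `via_to_str` are hypothetical and exist only for this comparison; both take an already-constructed `&CStr` so that no unsafe code is needed.

```rust
use std::ffi::CStr;
use std::str;

/// Original form: go through the raw bytes, then validate them as UTF-8.
fn via_bytes(msg: &CStr) -> String {
    let s = msg.to_bytes();
    str::from_utf8(s).unwrap().to_owned()
}

/// Suggested form: `CStr::to_str` performs the same UTF-8 validation directly.
fn via_to_str(msg: &CStr) -> String {
    msg.to_str().unwrap().to_owned()
}
```

Both versions panic on a message that is not valid UTF-8, so the suggestion changes readability only, not behavior.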
// dlerror reports the most recent failure that occurred during a
// dynamic linking operation and then clears that error; we call
// once in advance of our operation in an attempt to discard any
// stale prior error report that may exist:
let _old_error = libc::dlerror();
Is this call still needed? Surely any prior error will be replaced if there's a new error.
As @ollie27 said, there's no need to do this anymore if we don't use the return value of `dlerror` to determine whether an error occurred.
if ptr::null() != result {
    Ok(result)
} else {
Since the `else` block now has a condition inside, could you switch to an early return for the happy path?
// We should only check dlerror() in the event that the operation
// fails, which we determine by checking for a NULL return. This
// covers at least dlopen() and dlsym().
Can you document these semantics at the function level? Specifically, if `f` returns a null pointer, this function returns `Err` with the string in `dlerror`.

Also, just to be sure, do all the functions we pass to this helper return `NULL` and only `NULL` to indicate an error? There's no `(void *) 1` weirdness or something?
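A rough sketch of what such a documented helper could look like, with the early return requested earlier in this review. This is not the code from the PR: the names `check_for_null` and `open_library` are hypothetical, the `dlfcn.h` functions are declared by hand instead of coming from the `libc` crate, and `RTLD_LAZY = 1` is an assumption that holds on Linux and macOS but should be checked against the platform headers.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int, c_void};

// Hand-written POSIX <dlfcn.h> declarations so the sketch needs no extra crate.
extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlerror() -> *mut c_char;
}

// Assumption: value of RTLD_LAZY on Linux and macOS.
const RTLD_LAZY: c_int = 1;

/// Runs `f` and treats a null return value as failure.
///
/// * If `f` returns a non-null pointer, returns `Ok` with that pointer and
///   never calls `dlerror`.
/// * If `f` returns a null pointer, returns `Err` with the string reported by
///   `dlerror`, or a placeholder if no message is pending.
///
/// Only suitable for functions that signal failure with NULL and only NULL.
fn check_for_null<T>(f: impl FnOnce() -> *mut T) -> Result<*mut T, String> {
    let result = f();
    if !result.is_null() {
        // Happy path: return early without consulting dlerror.
        return Ok(result);
    }
    let last_error = unsafe { dlerror() };
    if last_error.is_null() {
        return Err("unknown dynamic linking error".to_owned());
    }
    // Lossy conversion avoids a panic on a non-UTF-8 message.
    Err(unsafe { CStr::from_ptr(last_error) }
        .to_string_lossy()
        .into_owned())
}

/// Hypothetical wrapper: opens a library, judging failure by the NULL return.
fn open_library(path: &str) -> Result<*mut c_void, String> {
    let c_path = CString::new(path).map_err(|e| e.to_string())?;
    check_for_null(|| unsafe { dlopen(c_path.as_ptr(), RTLD_LAZY) })
}
```

Calling `open_library` with a path that does not exist should yield an `Err` carrying the linker's message, while a successful load never reads `dlerror` at all.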
For `dlsym` at least, the current approach is explicitly recommended on Linux and seems to be necessary on illumos as well, since `NULL` can indicate either a "symbol not found" error or a found symbol with the value `NULL`. We should be checking the return value of `dlopen`, but we will need to find a different workaround here.
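For reference, the `dlsym` sequence being discussed (clear `dlerror`, call `dlsym`, then check `dlerror` again) can be sketched as follows. The function name `lookup_symbol` is hypothetical, the declarations are hand-written, and `RTLD_LAZY = 1` is an assumption valid on Linux and macOS. The sketch does nothing about the race reported in this PR: a failing lookup on another thread between the calls could still be misattributed on systems where the error state is shared.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int, c_void};
use std::ptr;

// Hand-written POSIX <dlfcn.h> declarations so the sketch needs no extra crate.
extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlerror() -> *mut c_char;
}

// Assumption: value of RTLD_LAZY on Linux and macOS.
const RTLD_LAZY: c_int = 1;

/// Looks up `name` in the running program and the libraries loaded with it.
///
/// `Ok(ptr)` may hold a null pointer: that means the symbol exists and its
/// value is NULL, which is why the return value alone cannot signal failure.
fn lookup_symbol(name: &str) -> Result<*mut c_void, String> {
    let c_name = CString::new(name).map_err(|e| e.to_string())?;
    unsafe {
        // dlopen(NULL) returns a handle for the main program; here the
        // return value is enough to detect failure.
        let handle = dlopen(ptr::null(), RTLD_LAZY);
        if handle.is_null() {
            return Err("dlopen(NULL) failed".to_owned());
        }
        // Discard any stale error so a later message belongs to this lookup.
        let _ = dlerror();
        let sym = dlsym(handle, c_name.as_ptr());
        // A non-null dlerror() after the call is what marks the failure.
        let err = dlerror();
        if err.is_null() {
            Ok(sym)
        } else {
            Err(CStr::from_ptr(err).to_string_lossy().into_owned())
        }
    }
}
```

The asymmetry with `dlopen` is visible in the body: the handle is judged by its return value, while the symbol lookup has to rely on `dlerror`.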
The current approach is specifically mandated for
@jclulow closing this due to inactivity. When you have the time, you can submit a new PR that works in a way that addresses the above concerns. Thanks for taking the time to contribute.
This works around behavior observed on illumos in rust-lang#74469, in which foreign code (libc according to the OP) was racing with rustc to check `dlerror`.
Refactor dynamic library error checking on *nix

The old code was checking `dlerror` more often than necessary, since (unlike `dlsym`) checking the return value of [`dlopen`](https://www.man7.org/linux/man-pages/man3/dlopen.3.html) is enough to indicate whether an error occurred. In the first commit, I've refactored the code to minimize the number of system calls needed. It should be strictly better than the old version.

The second commit is an optional addendum which fixes the issue observed on illumos in rust-lang#74469, a PR I reviewed that was ultimately closed due to inactivity. I'm not sure how hard we try to work around platform-specific bugs like this, and I believe that, due to the way that `dlerror` is specified in the POSIX standard, libc implementations that want to run on conforming systems cannot call `dlsym` in multi-threaded programs.