Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize is_main_thread by using a libc call #616

Merged
merged 1 commit into from
May 21, 2024

Conversation

notgull
Copy link
Contributor

@notgull notgull commented May 19, 2024

Previous, the "is_main_thread" function would always make a call to the
"NSThread::isMainThread_class" function to determine if the current thread is
the main thread. This requires an objc dispatch call every time it is called. As
this function or MainThreadMarker is used frequently throughout the objc2
ecosystem it takes a lot of time.

Crucially, the main-threadedness of a thread will never change. So we can avoid
making this objc call every time if we cache this state in a thread-local
variable. This commit does that for is_main_thread, and also makes it so
MainThreadMarker::new() uses the is_main_thread function.

Inspired by the code in async-task here:
https://github.com/smol-rs/async-task/blob/3065c372e1ef1611230195ad7f3aae80ffde8261/src/runnable.rs#L421-L428

@notgull
Copy link
Contributor Author

notgull commented May 19, 2024

iOS failure is not this PR's fault, I think

@madsmtm
Copy link
Owner

madsmtm commented May 19, 2024

Thanks for the PR!

Though did you actually measure this? The implementation of +[NSThread isMainThread] itself is extremely simple, and message sending is quite fast, I'd have suspected thread locals to be much heaver? Is the value that Rust can see through the thread-local, and optimize redundant reads from it? (e.g. if calling is_main_thread() twice, the second call can be elided).

In any case, I suspect there's an even better solution here: pthread_main_np, which +[NSThread isMainThread] itself is based on. The implementation could be something like:

#[cfg(feature = "libc")]
use libc::pthread_main_np;

#[link(name = "c", kind = "dylib")]
#[cfg(not(feature = "libc"))]
extern "C" {
    // Avoid a dependency on `libc` if possible
    fn pthread_main_np() -> std::os::raw::c_int;
}

#[inline]
fn is_main_thread() -> bool {
    // SAFETY: TODO
    unsafe { pthread_main_np() } != 0
}

@notgull
Copy link
Contributor Author

notgull commented May 19, 2024

thread_local is a bit slow but I'd be surprised if it's slower than a dynamic library call.

I haven't measured this, I was going to once I had access to a Darwin machine (or more likely a VM).

@madsmtm
Copy link
Owner

madsmtm commented May 20, 2024

I threw one together:

use criterion::{criterion_group, criterion_main, Criterion};
use objc2_foundation::NSThread;

extern "C" {
    fn pthread_main_np() -> std::os::raw::c_int;
}

pub fn is_main_thread() -> bool {
    unsafe { pthread_main_np() != 0 }
}

pub fn is_main_thread_cached() -> bool {
    std::thread_local! {
        static IS_MAIN_THREAD: bool = is_main_thread();
    }

    IS_MAIN_THREAD
        .try_with(|x| *x)
        .unwrap_or_else(|_| is_main_thread())
}

pub fn is_main_thread_nsthread() -> bool {
    NSThread::isMainThread_class()
}

fn criterion_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("main");
    group.bench_function("is_main_thread", |b| b.iter(|| is_main_thread()));
    group.bench_function("is_main_thread_cached", |b| {
        b.iter(|| is_main_thread_cached())
    });
    group.bench_function("is_main_thread_nsthread", |b| {
        b.iter(|| is_main_thread_nsthread())
    });
    group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

And got the following result (on a Macbook Pro M2):

main/is_main_thread           time:   [1.6967 ns 1.6992 ns 1.7015 ns]
main/is_main_thread_cached    time:   [2.5315 ns 2.5363 ns 2.5412 ns]
main/is_main_thread_nsthread  time:   [4.2286 ns 4.2408 ns 4.2590 ns]

If I enable LTO, the result is quite different:

main/is_main_thread           time:   [1.6856 ns 1.6886 ns 1.6913 ns]
main/is_main_thread_cached    time:   [461.23 ps 461.97 ps 462.81 ps]
main/is_main_thread_nsthread  time:   [3.7592 ns 3.7614 ns 3.7640 ns]

Not sure what to make of this - is there a way to specialize the implementation based on whether LTO is enabled or not? Are there factors here that I'm not considering?

@notgull
Copy link
Contributor Author

notgull commented May 20, 2024

I don't think there's a way, and even if there was I would classify it as "hacky". I have no idea why this would happen; is there something weird on macOS with LTO/dynamic linking?

I would expect this to be a bug in macOS's LTO, so my way forwards to be to use pthread_main_np and expect that bug to be fixed soon.

@notgull notgull changed the title Optimize is_main_thread by using a thread-local variable Optimize is_main_thread by using a libc call May 20, 2024
@madsmtm
Copy link
Owner

madsmtm commented May 20, 2024

is there something weird on macOS with LTO/dynamic linking?

Hmm, what do you mean? It's faster using LTO, as you'd expect?

@@ -41,7 +41,22 @@ pub fn is_multi_threaded() -> bool {
/// Whether the current thread is the main thread.
#[cfg(feature = "NSThread")]
pub fn is_main_thread() -> bool {
NSThread::isMainThread_class()
#[cfg(feature = "libc")]
use libc::pthread_main_np;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not always available on GNUStep, so I suspect you might need to keep using NSThread::isMainThread_class() on feature = "gnustep-1-7" to make CI happy?

@@ -41,7 +41,33 @@ pub fn is_multi_threaded() -> bool {
/// Whether the current thread is the main thread.
#[cfg(feature = "NSThread")]
pub fn is_main_thread() -> bool {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could you #[inline] this and the imp functions, so that they can be inlined even without LTO?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also, I'm doing merge commits in this repo, so if you could squash, that'd be nice!

Previous, the "is_main_thread" function would always make a call to the
"NSThread::isMainThread_class" function to determine if the current thread is
the main thread. This requires an objc dispatch call every time it is called. As
this function or MainThreadMarker is used frequently throughout the objc2
ecosystem it takes a lot of time.

The NSThread function just calls libc::pthread_main_np under the hood. Benchmarks
show that calling the libc function instead is four times faster. So on platforms
that have it we just call that instead.

Signed-off-by: John Nunley <dev@notgull.net>
@madsmtm madsmtm merged commit c6fd57a into madsmtm:master May 21, 2024
19 checks passed
@madsmtm madsmtm added enhancement New feature or request A-framework Affects the framework crates and the translator for them labels May 21, 2024
@notgull notgull deleted the opt-mtm branch May 21, 2024 13:56
@madsmtm madsmtm mentioned this pull request Jun 2, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-framework Affects the framework crates and the translator for them enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants