Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace uses of "static mut" with UnsafeCell #26550

Closed
jdm opened this issue May 17, 2020 · 2 comments
Closed

Replace uses of "static mut" with UnsafeCell #26550

jdm opened this issue May 17, 2020 · 2 comments

Comments

@jdm
Copy link
Member

@jdm jdm commented May 17, 2020

Given rust-lang/rust#53639, it seems safer to rely on tools that won't let us accidentally alias mutable data like UnsafeCell.

  • components/background_hang_monitor/sampler_linux.rs
  • components/profile_traits/energy.rs
  • components/script/dom/eventtarget.rs
  • components/script/dom/bindings/codegen/CodegenRust.py
  • components/profile/heartbeats.rs
@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented May 18, 2020

UnsafeCell is !Sync, so it can’t be used in a static directly. The most direct port would be to make a type that wraps UnsafeCell and implements Sync, but this transition seems to be a good time to reexamine the soundness of that unsafe code. It looks like some of these uses could be AtomicPtr.

@SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented May 26, 2020

I did the examining. Here is what I found, in increasing levels of worrisome. TL;DR: see the parts in bold.

components/script/dom/eventtarget.rs

No mutability there, using const instead of static mut just works. (Using static complains that *const c_char: !Sync.)

components/script/dom/bindings/codegen/CodegenRust.py

This generates code for "proxy handlers", pointers that are initialized once during startup in the script::init::init function, and then never modified again. They are read in generated bindings code. Two options here to remove static mut:

  • Use AtomicPtr in static. This would require no unsafe code, but would theoretically have run-time cost in bindings code because previously-unsynchronized reads are now atomic. I don’t have a good sense of how hot this code is, or how big/small the overhead is for uncontended atomic operations. Does making reads of proxy handlers atomic in DOM bindings sound acceptable?

  • To keep those reads unsynchronized, we could make a type that wraps UnsafeCell and implements Sync, to use it in a static. Its APIs would be unsafe fn and require users to ensure external synchronization. In this case, script::init::init must not run concurrently with bindings code. Note that we already have a code comment in that vein:

    // Important that this call is done in a single-threaded fashion, we
    // can't defer it after `create_constellation` has started.
    let js_engine_setup = if !opts.multiprocess {
    Some(script::init())

Either way, if somehow we fail to run initialization code before bindings code, we’ll end up giving null pointers to SpiderMonkey for proxy handlers. I don’t know if SpiderMonkey checks for null (possibly to safely crash) or if this would be UB.

components/background_hang_monitor/sampler_linux.rs

The static mut contains a SharedState struct with one AtomicPtr field (easy enough to move to a static) and three Option<PosixSemaphore> fields where PosixSemaphore wraps UnsafeCell<libc::sem_t>.

My understanding is that libc::sem_t is thread-safe after it’s initialized, but here the code will repeatedly (re)initialize the semaphores and assign the options to Some(…), and later assign None which runs a destructor and cleans up/destroys the semaphores.

I don’t fully understand this code, but it appears to be sound if rather fragile. The idea is that there’s one sampler thread that conceptually owns the shared state. Nothing else touches that state without synchronizing with that thread (through the semaphores). This is what stops data races (if all of the code involved is perfectly bug-free). Sampled threads only ever touch the shared state in SIGPROF signal handler. That signal is sent by the sampler thread, which waits until the relevant work is finished before it collects data and moves on to the next sampled thread.

Short of redesigning the whole thing, we can replace static mut with static and, again, a type that wraps UnsafeCell and implements Sync, with unsafe fn APIs that rely on external synchronization.

Is there anyone still around that understands this code better than I do? It would be nice to add code comments there to explain the safety reasoning. (Or have I just become the designated Abyss Domain Expert?)

components/profile_traits/energy.rs

The bottom line is that we have a shared global “instance” of the energymon C library, but I don’t know whether that library tries to be thread-safe. https://github.com/energymon/energymon/blob/master/README.md does not contain the word “thread”. The Rust bindings methods take &self rather &mut self, which is is slightly encouraging.

If we assume that it is thread-safe (which the code effectively already does), it seems most correct for the bindings to implement Sync and Send on their wrapper struct. Or we make our own struct that wraps that wrapper, and implement the traits there.

Then we can replace static mut + manual std::sync::Once with static + once_cell::sync::Lazy, which is essentially a macro-less lazy_static!. Mutability is only used for run-time initialization.

components/profile/heartbeats.rs

As far as I can tell, we may have data races here. There’s a shared global hash map of instances of a the heartbeats C library, protected by a by an ad-hoc AtomicBool-based spinlock / mutex. It seems easy enough to move to an actual Mutex in a static. The thread-safety of the library itself is again a question, and again assuming it’s fine the Rust bindings should implement Send or we should in a new wrapper type. (Mutex<T>: Sync + Send if T: Send, but this does not requires T: Sync.)

The worrying part is that the C library takes a function pointer for a callback. In that callback we access and mutate the shared global data without taking the lock (the AtomicBool-based spinlock).

In the original implementation of this code in PR #7179 there wasn’t a Rust spinlock. The callback’s doc comment stated:

When this is called from native C, the heartbeat is safely locked

What does this mean? Does the hearbeats C library maintain its own lock? By the way, what threads does it call the callback from? Does it create its own threads? None of this is discussed in https://github.com/libheartbeats/heartbeats-simple/blob/master/README.md or https://github.com/libheartbeats/heartbeats-simple/blob/0aa634e/inc/heartbeat.h#L44-L60. Even if the C library does its own locking, that doesn’t help synchronizing with Rust code outside of that callback that mutates that same global hash map.

PR #15458 added the spinlock and changed the doc-comment to:

When this is called from native C, the heartbeat is safely locked internally and the global lock is held. If calling from this file, you must already hold the global lock!

Assuming “the global lock” refers to the Rust spinlock, it is indeed held maybe_create_heartbeat calls Heartbeat::new. However the C library can call the callback later, after the lock has been released.

Would locking a Rust Mutex from the callback be appropriate? Could it lead to dealocks?

bors-servo added a commit that referenced this issue Jun 4, 2020
Remove support for energy and heartbeats profiling

Both are disabled by default (energy at compile-time, heartbeats with a run-time option). Neither is tested of CI. Neither has been used in a long time. They might have Undefined Behavior: #26550 (comment). They each depend on a mostly-unmaintained C library. The thread-safety expectation of those libraries are unknown.
@bors-servo bors-servo closed this in 00b57b4 Jun 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

2 participants
You can’t perform that action at this time.