RwLock and Mutex on Window theoretically allows undefined behavior in safe code #35836

Open
retep998 opened this Issue Aug 19, 2016 · 31 comments

@retep998
Member

retep998 commented Aug 19, 2016

According to this back and forth on a Microsoft blog post, it is currently undefined behavior to even try to acquire an SRWLock recursively, even recursive read locks.

It might (and probably will) cause the lock to fail to fulfil its contract in the future (e.g., allow two simultaneous exclusive acquisitions). And since wait nodes are threaded on the stack, it can result in stack memory corruption.

Also apparently NT keyed events have no stability guarantee so the current implementation of parking_lot on Windows could theoretically break with a new version of Windows. No longer an issue as parking_lot uses the stable WaitOnAddress on newer Windows.

So uh, what do?

@nagisa

Contributor

nagisa commented Aug 19, 2016

@Amanieu

Contributor

Amanieu commented Aug 19, 2016

I don't see any easy way out of this. Consider this example:

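use std::{mem, sync::RwLock};

let lock = RwLock::new(());
mem::forget(lock.read()); // leak the guard: the read lock is never released
lock.read(); // the same thread acquires the read lock again, recursively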

To avoid undefined behavior we would have to either:

  • Have the RwLock track the thread IDs of all active readers so that we can detect a recursive lock.
  • Have each thread track the RwLocks that it has locked in thread local storage.

Both of these solutions require O(n) space and a lot of overhead to manage it.
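
The second option could be sketched roughly as follows. This is only an illustration built on top of std::sync::RwLock, not a proposed patch: CheckedRwLock, CheckedReadGuard, HELD_READS and NEXT_LOCK_ID are hypothetical names, the per-lock ID is an assumption (an address would stop identifying the lock once it is moved), and only read locks are tracked. It shows where the O(n) space goes: every thread keeps a set with one entry per read lock it holds.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{RwLock, RwLockReadGuard};

// Source of unique lock IDs (hypothetical helper, not part of std).
static NEXT_LOCK_ID: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // IDs of the locks on which the current thread holds a read lock.
    static HELD_READS: RefCell<HashSet<usize>> = RefCell::new(HashSet::new());
}

/// Hypothetical wrapper that refuses recursive read locks.
pub struct CheckedRwLock<T> {
    id: usize,
    inner: RwLock<T>,
}

pub struct CheckedReadGuard<'a, T> {
    id: usize,
    guard: RwLockReadGuard<'a, T>,
}

impl<T> CheckedRwLock<T> {
    // Cannot be a `const fn`: every lock needs a fresh ID.
    pub fn new(value: T) -> Self {
        CheckedRwLock {
            id: NEXT_LOCK_ID.fetch_add(1, Ordering::Relaxed),
            inner: RwLock::new(value),
        }
    }

    /// Returns `None` if this thread already holds a read lock on `self`,
    /// so the underlying lock is never acquired recursively.
    pub fn read(&self) -> Option<CheckedReadGuard<'_, T>> {
        let first = HELD_READS.with(|held| held.borrow_mut().insert(self.id));
        if !first {
            return None;
        }
        // Poisoning is ignored for brevity.
        let guard = self.inner.read().unwrap_or_else(|e| e.into_inner());
        Some(CheckedReadGuard { id: self.id, guard })
    }
}

impl<'a, T> Deref for CheckedReadGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.guard
    }
}

impl<'a, T> Drop for CheckedReadGuard<'a, T> {
    fn drop(&mut self) {
        // Runs before the inner guard field is dropped, i.e. before the unlock.
        HELD_READS.with(|held| {
            held.borrow_mut().remove(&self.id);
        });
    }
}
```

With this wrapper, the mem::forget example above leaves the ID in the thread's set, so the second read() returns None instead of reaching the platform lock. The cost is a thread-local lookup and a hash-set update on every read lock and unlock.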
@nagisa

Contributor

nagisa commented Aug 19, 2016

…or document recursive reader lock as UB for now?

@retep998

Member Author

retep998 commented Aug 19, 2016

@nagisa So we're going to accept, even temporarily, that a safe function in std can invoke UB?

@Amanieu

Contributor

Amanieu commented Aug 19, 2016

@nagisa But the code I've just written is entirely safe code. We can't allow UB in safe code.

@durka

Contributor

durka commented Aug 20, 2016

Is there not a non-slim R/W lock primitive on Windows?

@nagisa

Contributor

nagisa commented Aug 20, 2016

No, there isn't.


@andlabs

andlabs commented Aug 20, 2016

Even though the Windows API might not have non-slim RW locks, it does have loads of other synchronization objects that you can combine to form a recursive RW lock implementation. I wouldn't know what the best ones to use are, sorry.

@yilangmok

yilangmok commented Aug 20, 2016

Boost.Interprocess provides a very complete mutex implementation that allows recursive lock acquisition and upgradeability. (I am not familiar with Rust, but I'd assume you have some way to call into C/C++ code if you're able to call Windows functions).

http://www.boost.org/doc/libs/1_61_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.sharable_upgradable_mutexes

@retep998

Member Author

retep998 commented Aug 20, 2016

We are definitely not depending on Boost. We can however look at their implementation to create our own mutex inspired by it.

@nagisa

Contributor

nagisa commented Aug 20, 2016

That page is for inter-process implementations of synchronisation primitives. It has considerably different design constraints compared to inter-thread stuff (documented here). The implementation of RWLocks in Boost seems to use a number (3) of semaphores in addition to some state.

@yilangmok

yilangmok commented Aug 21, 2016

Thanks for the correction @nagisa. I knew Boost had a RWLock, so I assumed the first page I found would be the right one. Oops!

@retep998, of course it's up to the Rust community to decide what to use or not use. However, I've found that for some areas (including concurrency and cryptography), it's best to use an existing library that has been battle tested. Boost provides primitives that are well tested, performant and cross-platform. Why not consider it?

@retep998

Member Author

retep998 commented Aug 21, 2016

@yilangmok Because Boost is a massive dependency to pull in. Right now libstd is lightweight and pure Rust. Pulling in Boost would be hugely detrimental to libstd's weight. Also, Boost is C++, so it would depend on the C++ runtime and standard library as well. Furthermore, Rust is only able to easily work with C APIs; C++ is a massively complicated beast. Basically, I don't see any chance of Rust depending on Boost for something as simple as mutexes.

@alexcrichton

Member

alexcrichton commented Aug 23, 2016

Discussion at @rust-lang/libs triage today concluded that this seems like a good bug to fix, but not high-enough priority for P-high

@nagisa

Contributor

nagisa commented Aug 24, 2016

A nice suggestion from the comment section of the linked blog post:

Why can’t you keep using SRW locks and just keep a separate (possibly thread-local) variable to tell you whether you’ve already acquired the lock or not?

Extending the Windows RWLock structure like this:

pub struct RWLock {
    inner: UnsafeCell<c::SRWLOCK>,
    // TLS slot index; the slot is non-null while the current thread holds a read lock.
    thread_local_reading: c::DWORD,
}

impl RWLock {
    pub /*const*/ fn new() -> RWLock { // cannot be const anymore
        RWLock {
            inner: UnsafeCell::new(c::SRWLOCK_INIT),
            // Free the slot in drop! A failed allocation is not handled here.
            thread_local_reading: unsafe { c::TlsAlloc() },
        }
    }
    #[inline]
    pub unsafe fn read(&self) {
        if !c::TlsGetValue(self.thread_local_reading).is_null() { panic!("recursive read") }
        c::TlsSetValue(self.thread_local_reading, 1 as c::LPVOID);
        c::AcquireSRWLockShared(self.inner.get())
    }
    /* similar for try_read */

    #[inline]
    pub unsafe fn read_unlock(&self) {
        c::TlsSetValue(self.thread_local_reading, 0 as c::LPVOID);
        c::ReleaseSRWLockShared(self.inner.get())
    }
}

would fix the issue. Of course, the TLS safeguard could be moved to any of the layers above the plain wrapper; the idea stays the same.

Alas, this limits the number of locks a user could have at the same time to the number of available TLS slots, which is 1088.

@mwinterb

mwinterb commented Aug 25, 2016

Is the license for the source code in David Butenhof's "Programming with POSIX Threads" book known? The "best" information that I could find was from the pthreads-win32 COPYING file which has this to say:

The file tests/rwlock7.c is derived from code written by
Dave Butenhof for his book 'Programming With POSIX(R) Threads'.
The original code was obtained by free download from his website
http://home.earthlink.net/~anneart/family/Threads/source.html
and did not contain a copyright or author notice. It is assumed to
be freely distributable.

Currently, the source is available on the Informit.com website, which appears to be associated with the current publisher:
http://www.informit.com/store/programming-with-posix-threads-9780201633924

If the license is acceptable for use in Rust, I've written a fairly direct translation of his rwlock_t to use the Windows Vista+ primitives. By source examination, as it prefers readers over writers, recursive read locks will succeed, recursive write locks will deadlock, and acquiring a write lock while the same thread holds a read lock will deadlock. The text for the book describes how to convert it to a writer preference, which should make most recursive operations deadlock eventually, but as that text is not monetarily freely available, I have not provided that as an alternative. There should be no memory corruptions during normal operations†.

As I'm not a Rust developer, it's in C (maybe with some accidental C++ constructs). Someone more experienced in Rust could likely translate it, but that task feels like a bad fit for "my first Rust program".

The translation is available here:
https://gist.github.com/mwinterb/fcf29c312950e2c51ffa47822c8c5241

It's likely not the most efficient implementation of a reader/writer mutex as it is 10 years old, but it's O(1) space and has no dynamic memory allocations, so presumably pub const fn new() -> RWLock can remain const.

† Internally, some counters use int, so if there are more than INT_MAX read acquisitions, pending readers, or pending writers, undefined behavior will technically occur in the C implementation.

@Amanieu

Contributor

Amanieu commented Aug 26, 2016

@nagisa That could work with my thread_local crate which gives you per-object thread-local storage.

However, I still think that the proper solution to this is to use the parking_lot crate, which provides implementations of Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed in an RFC (rust-lang/rfcs#1632), but in the end it was not accepted.

@nagisa

Contributor

nagisa commented Aug 26, 2016

On 2016-08-25 18:26:57-0700, Amanieu wrote:

@nagisa That could work with my thread_local crate
which gives you per-object thread-local storage.

However I still think that the proper solution to this is to use the
parking_lot crate which provides implementations of
Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed
in an RFC (rust-lang/rfcs#1632) but in the end it was not accepted.

I do not disagree. I’m just brainstorming simple, quick-to-implement fixes that could paper over the immediate issue without us making any sort of large-scale change.


@mattico

Contributor

mattico commented Mar 12, 2017

I'm interested in fixing this.

I think the best long-term solution is to use parking_lot in the stdlib. For now though, @nagisa's solution (above) is much better than UB, no? It's very unlikely that anyone is using more than 1,088 TLS slots and RwLocks combined, and we can at least panic if that happens.

I'll send in a PR to do that unless someone has a better idea.

@mattico

Contributor

mattico commented Mar 13, 2017

Making RwLock::new() non-const shouldn't be an issue because:

  1. const_fn is unstable so who cares 😄
  2. Everyone should be using sync::RwLock, which does not have a const constructor

except panicking.rs in libstd uses the underlying const-fn impl now that StaticRWLock has been removed.

Not an insurmountable problem but there's going to have to be some hackery, like an ad-hoc reimplementation of lazy_static! - which should be in std, IMO.

@alexcrichton

Member

alexcrichton commented Mar 13, 2017

@mattico I think such a solution would limit a process to at most 1088 instances of RwLock, right? If that's so that seems... unfortunately too low :(

@mattico

Contributor

mattico commented Mar 13, 2017

Oh that's per process, I thought it was per-thread. That might actually be an issue, yes.

@mattico

Contributor

mattico commented Mar 13, 2017

Maybe allocate an array per thread (how big?) to use as a bit-set for the RwLock flags, and store a pointer to it in TLS. Then add an index into the set to each RwLock. Now we're only using one TLS per thread that has a RwLock, but it's a bit more of a hack 🙁 . I'll write it up and see how bad it is.

@Amanieu

Contributor

Amanieu commented Mar 13, 2017

@mattico Have a look at the thread_local crate. It allows you to have per-object TLS without using up TLS indexes.

@mattico

Contributor

mattico commented Mar 14, 2017

It might be better to handle this in the sync::RwLock impl instead of the windows platform implementation. It allows folks who know what they're doing (like std) to continue to use the unsafe API. This issue is similar to lock poisoning which we already deal with at this level. It is unfortunate to introduce platform-specific code into sync, though.

@CodesInChaos

CodesInChaos commented Feb 15, 2018

Shouldn't this be tagged as a soundness issue, since it allows UB in safe code?

@retep998

Member Author

retep998 commented Feb 15, 2018

Nobody has been able to actually demonstrate UB in safe code using this yet. It's only theoretical at the moment.

We really just need to migrate to parking_lot already.

@durka

Contributor

durka commented Feb 15, 2018

What is that supposed to mean? It's either defined or not. Do you mean it won't be tagged I-unsound unless someone demonstrates memory corruption?

@RalfJung

Member

RalfJung commented Aug 7, 2018

Nobody has been able to actually demonstrate UB in safe code using this yet. It's only theoretical at the moment.

The burden of proof is on the code author to show that their code does not have UB. If the code has UB, it might do anything, including working perfectly well. One cannot show absence of UB by example.


The Windows Mutex implementation is also wrong (lack of proper initialization), and RwLock is UB on macOS as well (and possibly other non-Linux POSIX platforms) because macOS also says recursive locking of an RwLock is UB when a write lock is involved. (Only recursive write and read-write locking though. Making recursive reads UB is just... wtf, what were they thinking?? And the documentation for the relevant function does not even mention this "detail". Wow.)

Oh and the Mutex situation is horrible on all POSIX platforms because there is no const fn way to initialize a sane Mutex. So Mutex initialization is split into two parts in the std::sys layer, and using the "half-initialized" one (which we do in a couple places in libstd) makes recursive locking UB as well. Together with the restrictions about moving things, the API provided by sys::mutex is just plain horrible and I am somewhat nervous about the fact that it might be used incorrectly somewhere in libstd.

I implemented a reentrancy checker for Mutex (which is easier than RwLock because there are no read locks), and it makes my microbenchmarks slower by ~20%. Not great.
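
For illustration, one way such a checker might look is sketched below. The actual implementation is not shown in this thread, so everything here is an assumption: CheckedMutex, CheckedMutexGuard and thread_token are hypothetical names, and the sketch wraps std::sync::Mutex rather than the sys layer. The idea is to record the owning thread in an atomic and check it before touching the underlying lock.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

// Cheap token for the current thread: the address of a thread-local is
// non-zero and distinct among all threads that are alive at the same time.
fn thread_token() -> usize {
    thread_local!(static TOKEN: u8 = 0);
    TOKEN.with(|t| t as *const u8 as usize)
}

/// Hypothetical wrapper that panics on a recursive lock attempt.
pub struct CheckedMutex<T> {
    owner: AtomicUsize, // 0 means "not locked"
    inner: Mutex<T>,
}

pub struct CheckedMutexGuard<'a, T> {
    owner: &'a AtomicUsize,
    guard: MutexGuard<'a, T>,
}

impl<T> CheckedMutex<T> {
    pub fn new(value: T) -> Self {
        CheckedMutex { owner: AtomicUsize::new(0), inner: Mutex::new(value) }
    }

    pub fn lock(&self) -> CheckedMutexGuard<'_, T> {
        let me = thread_token();
        // Only this thread ever stores `me`, and it clears the field before
        // unlocking, so reading `me` back means this thread holds the lock.
        if self.owner.load(Ordering::Relaxed) == me {
            panic!("recursive lock of CheckedMutex");
        }
        // Poisoning is ignored for brevity.
        let guard = self.inner.lock().unwrap_or_else(|e| e.into_inner());
        self.owner.store(me, Ordering::Relaxed);
        CheckedMutexGuard { owner: &self.owner, guard }
    }
}

impl<'a, T> Deref for CheckedMutexGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.guard
    }
}

impl<'a, T> DerefMut for CheckedMutexGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.guard
    }
}

impl<'a, T> Drop for CheckedMutexGuard<'a, T> {
    fn drop(&mut self) {
        // Clear the owner while the lock is still held; the inner guard
        // field is dropped (and the mutex unlocked) after this body.
        self.owner.store(0, Ordering::Relaxed);
    }
}
```

The extra atomic load and two stores sit on every lock/unlock pair, which gives a sense of where overhead like the one measured above can come from. A leaked guard leaves the owner set, so a later lock() on the same thread panics rather than re-entering the platform lock. One known gap in this sketch: a token can be reused by a new thread after the old one exits.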

@RalfJung

Member

RalfJung commented Aug 7, 2018

Oh, and we are also using SRWLock for Mutex on Windows, so it will have the same UB.


We really just need to migrate to parking_lot already.

I have been thinking about implementing a lazily initialized const fn-capable Mutex for POSIX and maybe even rearrange things a bit to get rid of the restriction about not moving a Mutex around.

But, given all the other problems in particular around RwLock (where at least two major platforms just do not have a reasonable platform-specific implementation) -- it seems more and more reasonable to just roll our own implementation. I also hear that parking_lot is faster than the platform implementations even where those have reasonable semantics.

So, how realistic is it to use parking_lot? Alternatively, we might be able to rearrange the abstractions a bit, moving the (un)park stuff to sys (implemented directly on top of the platform APIs; whether mutexes are recursive should not be a concern here) and then implement Mutex, RwLock and Condvar on top of that ourselves... which I guess is what parking_lot does anyway?

@RalfJung RalfJung changed the title RwLock on Windows theoretically allows undefined behavior in safe code RwLock and Mutex on Window theoretically allows undefined behavior in safe code Aug 13, 2018

bors added a commit that referenced this issue Aug 23, 2018

Auto merge of #53311 - RalfJung:windows-mutex, r=retep998
Window Mutex: Document that we properly initialize the SRWLock

See #35836

kennytm added a commit to kennytm/rust that referenced this issue Aug 24, 2018

Rollup merge of rust-lang#53311 - RalfJung:windows-mutex, r=retep998
Window Mutex: Document that we properly initialize the SRWLock

See rust-lang#35836
@alexcrichton

Member

alexcrichton commented Sep 20, 2018

I've opened an internals post to continue more long-form discussion on the topic of continuing to fix these issues.

bors added a commit that referenced this issue Apr 3, 2019

Auto merge of #56410 - faern:add-parking-lot, r=<try>
Use the parking_lot locking primitives

This PR adds the [`parking_lot`](https://crates.io/crates/parking_lot) code to libstd and uses it for the `sys_common::{mutex,rwlock,condvar,remutex}` implementations.

This has been discussed in https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439/9

Thanks @Amanieu for mentoring when doing this, and basically all the code is his as well of course.

Fixes #35836
Fixes #53127