RwLock and Mutex on Window theoretically allows undefined behavior in safe code #35836

retep998 · 2016-08-19T21:52:16Z

According to this back and forth on a Microsoft blog post, it is currently undefined behavior to even try to acquire an SRWLock recursively, even recursive read locks.

It might (and probably will) cause the lock to fail to fulfil its contract in the future (e.g,. allow two simultaneous exclusive acquisitions). And since wait nodes are threaded on the stack, it can result in stack memory corruption.

~~Also apparently NT keyed events have no stability guarantee so the current implementation of parking_lot on Windows could theoretically break with a new version of Windows.~~ No longer an issue as parking_lot uses the stable WaitOnAddress on newer Windows.

So uh, what do?

The text was updated successfully, but these errors were encountered:

nagisa · 2016-08-19T21:53:23Z

cc @rust-lang/libs

Amanieu · 2016-08-19T22:13:53Z

I don't see any easy way out of this. Consider this example:

let lock = RwLock::new();
mem::forget(lock.read());
lock.read();

To avoid undefined behavior we would have to either:

Have the RwLock track the thread IDs of all active readers so that we can detect a recursive lock.
Have each thread track the RwLocks that it has locked in thread local storage.
Both of these solutions require O(n) space and a lot of overhead to manage it.

nagisa · 2016-08-19T22:25:39Z

…or document recursive reader lock as UB for now?

retep998 · 2016-08-19T22:27:04Z

@nagisa So we're going to accept, even temporarily, that a safe function in std can invoke UB?

Amanieu · 2016-08-19T22:27:04Z

@nagisa But the code I've just written is entirely safe code. We can't allow UB in safe code.

durka · 2016-08-20T02:44:06Z

Is there not a non-slim R/W lock primitive on Windows?

nagisa · 2016-08-20T10:46:12Z

No, there isn't.

On Aug 20, 2016 5:44 AM, "Alex Burka" notifications@github.com wrote:

Is there not a non-slim R/W lock primitive on Windows?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#35836 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AApc0gCFIgk5JgZR7j6tqhtgee6hWulpks5qhmoWgaJpZM4Jo4pj
.

andlabs · 2016-08-20T14:05:21Z

Even though the Windows API might not have non-slim RW locks, it does have loads of other synchronization objects that you can combine to form a recursive RW lock implementation. I wouldn't know what the best ones to use are, sorry.

yilangmok · 2016-08-20T21:29:32Z

Boost.Interprocess provides a very complete mutex implementation that allows recursive lock acquisition and upgradeability. (I am not familiar with Rust, but I'd assume you have some way to call into C/C++ code if you're able to call Windows functions).

http://www.boost.org/doc/libs/1_61_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.sharable_upgradable_mutexes

retep998 · 2016-08-20T22:02:11Z

We are definitely not depending on Boost. We can however look at their implementation to create our own mutex inspired by it.

nagisa · 2016-08-20T23:34:09Z

That page is for inter-process implementations of synchronisation primitives. It has considerably different design constraints compared to inter-thread stuff (documented here). The implementation of RWLocks in Boost seems to use a number (3) of semaphores in addition to some state.

yilangmok · 2016-08-21T00:20:18Z

Thanks for the correction @nagisa. I knew Boost had a RWLock, so I assumed the first page I found would be the right one. Oops!

@retep998, of course it's up to the Rust community to decide what to use or not use. However, I've found that for some areas (including concurrency and cryptography), it's best to use an existing library that has been battle tested. Boost provides primitives that are well tested, performant and cross-platform. Why not consider it?

retep998 · 2016-08-21T00:31:33Z

@yilangmok Because Boost is a massive dependency to pull in. Right now libstd is lightweight and pure Rust. Pulling in Boost would be hugely detrimental to libstd's weight. Also Boost is C++ so it would depend on the C++ runtime and standard library as well. Furthermore Rust is only able to easily work with C APIs, C++ is a massively complicated beast. Basically, I don't see any chances of Rust depending on Boost for something as simple as mutexes.

alexcrichton · 2016-08-23T05:39:36Z

Discussion at @rust-lang/libs triage today concluded that this seems like a good bug to fix, but not high-enough priority for P-high

nagisa · 2016-08-24T20:53:27Z

A nice suggestion from the comment section of the linked blog post:

Why can’t you keep using SRW locks and just keep a separate (possibly thread-local) variable to tell you whether you’ve already acquired the lock or not?

Extending the Windows RWLock structure as thus:

pub struct RWLock { 
    inner: UnsafeCell<c::SRWLOCK>,
    thread_local_reading: DWORD,
}

impl RWLock {
    pub /*const*/ fn new() -> RWLock { // cannot const anymore
        RWLock { 
           inner: UnsafeCell::new(c::SRWLOCK_INIT),
           thread_local_reading: c::TlsAlloc(), // free in drop!
        }
    }
    #[inline]
    pub unsafe fn read(&self) {
        if c::TlsGetValue(self.thread_local_reading) != 0 { panic!("recursive read") }
        TlsSetValue(self.thread_local_reading, 1);
        c::AcquireSRWLockShared(self.inner.get())
    }
    /* similar for try_read */

    #[inline]
    pub unsafe fn read_unlock(&self) {
        TlsSetValue(self.thread_local_reading, 0);
        c::ReleaseSRWLockShared(self.inner.get())
    }
}

would allow to fix the issue. Of course the TLS safeguard could be moved to any of the layers above plain wrapper; idea stays the same.

Alas, this limits the number of locks user could have at the same time to number of available TLS slots, which is 1088.

mwinterb · 2016-08-25T22:37:06Z

Is the license for the source code in David Butenhof's "Programming with POSIX Threads" book known? The "best" information that I could find was from the pthreads-win32 COPYING file which has this to say:

The file tests/rwlock7.c is derived from code written by
Dave Butenhof for his book 'Programming With POSIX(R) Threads'.
The original code was obtained by free download from his website
http://home.earthlink.net/~anneart/family/Threads/source.html
and did not contain a copyright or author notice. It is assumed to
be freely distributable.

Currently, the source is available on the Informit.com website, which appears to be associated with the current publisher:
http://www.informit.com/store/programming-with-posix-threads-9780201633924

If the license is acceptable for use in Rust, I've written a fairly direct translation of his rwlock_t to use the Windows Vista+ primitives. By source examination, as it prefers readers over writers, recursive read locks will succeed, recursive write locks will deadlock, and acquiring a write lock while the same thread holds a read lock will deadlock. The text for the book describes how to convert it to a writer preference, which should make most recursive operations deadlock eventually, but as that text is not monetarily freely available, I have not provided that as an alternative. There should be no memory corruptions during normal operations†.

As I'm not a Rust developer, it's in C (maybe with some accidental C++ constructs). Someone more experienced in Rust could likely translate it, but that task feels like a bad fit for "my first Rust program".

The translation is available here:
https://gist.github.com/mwinterb/fcf29c312950e2c51ffa47822c8c5241

It's likely not the most efficient implementation of a reader/writer mutex as it is 10 years old, but it's O(1) space and has no dynamic memory allocations, so presumably pub const fn new() -> RWLock can remain const.

† Internally, some counters use int, so if there are more than INT_MAX read acquisitions, pending readers, or pending writers, undefined behavior will technically occur in the C implementation.

Amanieu · 2016-08-26T01:26:26Z

@nagisa That could work with my thread_local crate which gives you per-object thread-local storage.

However I still think that the proper solution to this is to use the parking_lot crate which provides implementations of Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed in a RFC (rust-lang/rfcs#1632) but it in the end it was not accepted.

nagisa · 2016-08-26T13:18:24Z

On 2016-08-25 18:26:57-0700, Amanieu wrote:

@nagisa That could work with my thread_local crate
which gives you per-object thread-local storage.

However I still think that the proper solution to this is to use the
parking_lot crate which provides implementations of
Mutex, RwLock and Condvar that work on all platforms including Windows XP. This was proposed
in a RFC (rust-lang/rfcs#1632) but it in the end it was not accepted.

I do not disagree. I’m just brainstorming for simple and quick-to-implement fixes which could paper
over immediate issue without us doing any sort of large scale change.

-- You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#35836 (comment)

mattico · 2017-03-12T23:48:42Z

I'm interested in fixing this.

I think the best long-term solution is to use parking_lot in the stdlib. For now though, @nagisa's solution (above) is much better than UB, no? It's very unlikely that anyone is using more than 1,088 TLS + RwLock, and we can at least panic if that happens.

I'll send in a PR to do that unless someone has a better idea.

mattico · 2017-03-13T05:08:13Z

Making RwLock::new() non-const shouldn't be an issue because:

const_fn is unstable so who cares 😄
Everyone should be using sync::RWLock which does not have a const constructor

except panicking.rs in libstd uses the underlying const-fn impl now that StaticRWLock has been removed.

Not an insurmountable problem but there's going to have to be some hackery, like an ad-hoc reimplementation of lazy_static! - which should be in std, IMO.

alexcrichton · 2017-03-13T22:30:47Z

@mattico I think such a solution would limit a process to at most 1088 instances of RwLock, right? If that's so that seems... unfortunately too low :(

mattico · 2017-03-13T23:07:11Z

Oh that's per process, I thought it was per-thread. That might actually be an issue, yes.

mattico · 2017-03-13T23:26:12Z

Maybe allocate an array per thread (how big?) to use as a bit-set for the RwLock flags, and store a pointer to it in TLS. Then add an index into the set to each RwLock. Now we're only using one TLS per thread that has a RwLock, but it's a bit more of a hack 🙁 . I'll write it up and see how bad it is.

Amanieu · 2017-03-13T23:28:20Z

@mattico Have a look at the thread_local crate. It allows you to have per-object TLS without using up TLS indexes.

durka · 2018-02-15T21:29:25Z

What is that supposed to mean? It's either defined or not. Do you mean it won't be tagged I-unsound unless someone demonstrates memory corruption?

RalfJung · 2018-08-07T10:20:29Z

Nobody has been able to actually demonstrate UB in safe code using this yet. It's only theoretical at the moment.

The burden of proof is on the code author to show that their code does not have UB. If the code has UB, it might do anything, including working perfectly well. One cannot show absence of UB by example.

The Windows Mutex implementation is also wrong (lack of proper initialization), and RwLock is UB on macOS as well (and possibly other non-Linux POSIX platforms) because macOS also says recursive locking of an RwLock is UB when a write lock is involved. (Only recursive write and read-write locking though. Making recursive reads UB is just... wtf, what were they thinking?? And the documentation for the relevant function does not even mention this "detail". Wow.)

Oh and the Mutex situation is horrible on all POSIX platforms because there is no const fn way to initialize a sane Mutex. So Mutex initialization is split into two parts in the std::sys layer, and using the "half-initialized" one (which we do in a couple places in libstd) makes recursive locking UB as well. Together with the restrictions about moving things, the API provided by sys::mutex is just plain horrible and I am somewhat nervous about the fact that it might be used incorrectly somewhere in libstd.

I implemented a reentrancy checker for Mutex (which is easier than RwLock) because there are no read locks), and it makes my microbenchmarks slower by ~20%. Not great.

RalfJung · 2018-08-07T10:27:24Z

Oh, and we are also using SRWLock for Mutex on Windows, so it will have the same UB.

We really just need to migrate to parking_lot already.

I have been thinking about implementing a lazily initialized const fn-capable Mutex for POSIX and maybe even rearrange things a bit to get rid of the restriction about not moving a Mutex around.

But, given all the other problems in particular around RwLock (where at least two major platforms just do not have a reasonable platform-specific implementation) -- it seems more and more reasonable to just roll our own implementation. I also hear that parking_lot is faster than the platform implementations even where those have reasonable semantics.

So, how realistic is it to use parking_lot? Alternatively, we might be able to rearrange the abstractions a bit, moving the (un)park stuff to sys (implemented directly on top of the platform APIs; whether mutexes are recursive should not be a concern here) and then implement Mutex, RwLock and Condvar on top of that ourselves... which I guess is what parking_lot does anyway?

Window Mutex: Document that we properly initialize the SRWLock See rust-lang#35836

alexcrichton · 2018-09-20T18:10:17Z

I've opened an internals post to continue more long-form discussion on the topic of continuing to fix these issues.

@Amanieu

Use the parking_lot locking primitives This PR adds the [`parking_lot`](https://crates.io/crates/parking_lot) code to libstd and uses it for the `sys_common::{mutex,rwlock,condvar,remutex}` implementations. This has been discussed in https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439/9 Thanks @Amanieu for mentoring when doing this, and basically all the code is his as well of course. Fixes #35836 Fixes #53127

@Amanieu

Use the parking_lot locking primitives This PR adds the [`parking_lot`](https://crates.io/crates/parking_lot) code to libstd and uses it for the `sys_common::{mutex,rwlock,condvar,remutex}` implementations. This has been discussed in https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439/9 Thanks @Amanieu for mentoring when doing this, and basically all the code is his as well of course. Fixes #35836 Fixes #53127

RalfJung · 2020-02-29T22:32:00Z

I just looked at the SRW lock docs again and now it says explicitly

Exclusive mode SRW locks cannot be acquired recursively. If a thread tries to acquire a lock that it already holds, that attempt will fail (for TryAcquireSRWLockExclusive) or deadlock (for AcquireSRWLockExclusive)

So, uh, did they retroactively define behavior? And if yes, how far back in terms of platform support does this guarantee go?

RalfJung · 2020-05-04T07:11:07Z

Cc @retep998 @mati865 would be good if someone more knowledgeable in Windows matters could decide if there still is a bug here.

We have a direct contradiction between the statements in this post (including @retep998 asking "is it really actually UB" and the answer being "yes") and the docs change added here. The former says

It's a programming error. It is your responsibility as a programmer not to call AcquireSRWLockShared or AcquireSRWLockExclusive from a thread that has already acquired the lock. Failing to comply with this rule will result in undefined behavior.

and the latter now says (and thus presumably guarantees)

Exclusive mode SRW locks cannot be acquired recursively. If a thread tries to acquire a lock that it already holds, that attempt will fail (for TryAcquireSRWLockExclusive) or deadlock (for AcquireSRWLockExclusive)

Looks like Raymond Chen and whoever changed the docs have a different interpretation of what "cannot be acquired recursively" means. The docs version makes total sense; actually counting how often the current thread holds the write lock would require extra state so a simple naive ("slim") RwLock implementation will deadlock on recursive write locks.

I wonder if there is any way to ask Raymond Chen again in light of this new information?

mati865 · 2020-05-04T15:24:07Z

I don't know anyone other than retep998 who could know those parts of Windows API well enough.

ghost · 2020-05-25T20:33:24Z

I wanted a simple mutex that is faster than std::sync::Mutex but didn't want to pull in the whole parking_lot dependency so I wrote this crate: https://github.com/stjepang/simple-mutex

The implementation is fairly simple (it should be reviewable at least). Not quite as efficient as parking_lot but still better than std. The overhead is two words + something allocated on the heap on first case of contention.

I wonder if we should consider implementing a simpler version of parking_lot with reasonable performance and get that into the standard library. Think something with 20% of its complexity and 80% of performance.

RalfJung · 2020-06-11T08:28:39Z

@rustbot ping windows

If any one of you is familiar enough with the details of SRWLocks on Windows, would be great if you could look into the question posed here. :)

rustbot · 2020-06-11T08:28:41Z

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some [instructions] for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3
[instructions]: https://rustc-dev-guide.rust-lang.org/notification-groups/windows.html

cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @retep998 @rylev @sivadeilra @spastorino

kennykerr · 2020-06-11T15:41:22Z

If the lock is already held by the calling thread, AcquireSRWLockExclusive will hang and TryAcquireSRWLockExclusive will return false. Internally, there is a lock bit and this is used to check whether the lock may be acquired irrespective of thread.

m-ou-se · 2020-10-01T11:37:53Z

So this is solved then?

The official documentation is pretty clear now. And @kennykerr verified it. :)

And as @RalfJung already mentioned, making an SRW lock do anything other than deadlock on a recursive lock() call would require more effort. There's not really any space in such a lock to keep track of the thread that locked it.

The only thing left is the contradictory comments on that archived blog post. But those are a bit vague (and a bit aggressive 👀), and feels mostly like miscommunication. The author seems to misunderstand that a deadlock is the outcome we want, as opposed to succesfully recursively locking it: "The fact that they cannot be acquired recursively is what makes them slim!" (They are slim because they don't have to keep track of the thread id, which would mean recursive lock() deadlocks, which is good, that's what we want.) The author also says he isn't sure what Peter means with memory unsafe. There are not many languages where getting a deadlock in this case would be a good thing, so the response seems mostly something like 'just don't do this because it's always wrong, and since you shouldn't do this, we're not making any promises'. And that make sense from a C or C++ perspective, where there's not much reason to specify the difference between a deterministic deadlock and undefined behaviour, as both are unwanted and should be avoided anyway. Things are a bit different for Rust, where the distinction is very important to guard the unsafe/safe border. Luckily they decided to do make this promise after all and update the documentation.

RalfJung · 2020-10-03T10:31:09Z

Ah I did not realize we have a Windows engineer confirming our interpretation of the docs. :)

Indeed, based on all that, I am fine with closing this issue. But I'd leave the final decision to the libs team -- Cc @rust-lang/libs.

RalfJung · 2021-01-25T17:58:43Z

FWIW, there was another comment on the docs issue, that confirms that Rust is using SRWLocks correctly.

alexcrichton added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 20, 2016

alexcrichton added P-medium Medium priority E-help-wanted Call for participation: Help is requested to fix this issue. and removed I-nominated labels Aug 23, 2016

retep998 mentioned this issue Oct 6, 2016

Document that RwLock might panic #37010

Closed

This was referenced Aug 7, 2018

Windows: Mutex not properly initialized #53126

Closed

Window Mutex: Document that we properly initialize the SRWLock #53311

Merged

RalfJung changed the title ~~RwLock on Windows theoretically allows undefined behavior in safe code~~ RwLock and Mutex on Window theoretically allows undefined behavior in safe code Aug 13, 2018

kennytm added a commit to kennytm/rust that referenced this issue Aug 24, 2018

Rollup merge of rust-lang#53311 - RalfJung:windows-mutex, r=retep998

ab5a71b

Window Mutex: Document that we properly initialize the SRWLock See rust-lang#35836

RalfJung mentioned this issue Aug 29, 2018

Unix: RwLock is incorrect #53127

Closed

retep998 mentioned this issue Jan 2, 2019

Use the parking_lot locking primitives #56410

Closed

Centril added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Jan 2, 2019

RalfJung mentioned this issue May 4, 2020

Slim Reader Writer locks are not reentrant MicrosoftDocs/win32#218

Merged

Elinvynia added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020

LeSeulArtichaut removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 9, 2020

rustbot added the O-windows Operating system: Windows label Jun 11, 2020

Amanieu closed this as completed Oct 3, 2020

RwLock and Mutex on Window theoretically allows undefined behavior in safe code #35836

RwLock and Mutex on Window theoretically allows undefined behavior in safe code #35836

Comments

retep998 commented Aug 19, 2016 • edited Loading

nagisa commented Aug 19, 2016

Amanieu commented Aug 19, 2016

nagisa commented Aug 19, 2016

retep998 commented Aug 19, 2016 • edited Loading

Amanieu commented Aug 19, 2016

durka commented Aug 20, 2016

nagisa commented Aug 20, 2016

andlabs commented Aug 20, 2016 • edited Loading

yilangmok commented Aug 20, 2016

retep998 commented Aug 20, 2016

nagisa commented Aug 20, 2016

yilangmok commented Aug 21, 2016

retep998 commented Aug 21, 2016 • edited Loading

alexcrichton commented Aug 23, 2016

nagisa commented Aug 24, 2016

mwinterb commented Aug 25, 2016

Amanieu commented Aug 26, 2016

nagisa commented Aug 26, 2016

mattico commented Mar 12, 2017 • edited Loading

mattico commented Mar 13, 2017

alexcrichton commented Mar 13, 2017

mattico commented Mar 13, 2017

mattico commented Mar 13, 2017

Amanieu commented Mar 13, 2017

durka commented Feb 15, 2018

RalfJung commented Aug 7, 2018 • edited Loading

RalfJung commented Aug 7, 2018 • edited Loading

alexcrichton commented Sep 20, 2018

RalfJung commented Feb 29, 2020

RalfJung commented May 4, 2020

mati865 commented May 4, 2020

ghost commented May 25, 2020

RalfJung commented Jun 11, 2020

rustbot commented Jun 11, 2020

kennykerr commented Jun 11, 2020

m-ou-se commented Oct 1, 2020 • edited Loading

RalfJung commented Oct 3, 2020

RalfJung commented Jan 25, 2021

retep998 commented Aug 19, 2016 •

edited

Loading

retep998 commented Aug 19, 2016 •

edited

Loading

andlabs commented Aug 20, 2016 •

edited

Loading

retep998 commented Aug 21, 2016 •

edited

Loading

mattico commented Mar 12, 2017 •

edited

Loading

RalfJung commented Aug 7, 2018 •

edited

Loading

RalfJung commented Aug 7, 2018 •

edited

Loading

m-ou-se commented Oct 1, 2020 •

edited

Loading