[runtime] Thread safety for weak references #1454
Conversation
```swift
let iterations = 200_000
for i in 1...iterations {
  let box = WeakBox(Thing())
  dispatch_async(q) {
```
We have race test infrastructure (stdlib/private/StdlibUnittest/RaceTest.swift) that makes it possible to write such tests in a cross-platform way. Could you use that instead of dispatch?
@rjmccall What are the restrictions on the uses of the Take calls? (The issues I observed seemed to be in the assignment operations.) Why can you assume that assignment has exclusive access? Aren't those calls used to assign to a pre-existing var? One couldn't assume exclusive access of a var captured by a closure.
The problem is due to not knowing the previous state of the destination.
The only …
In general, Swift does not guarantee correctness in the presence of read/write, write/write, or anything/destroy data races on a variable. It is one of the few undefined-behavior rules that we embrace. This applies to weak references just as much as it applies to strong references. Eventually, we want to provide a language concurrency model that makes races impossible outside of explicitly unsafe code. For now, it's the user's responsibility to ensure that these do not occur by properly synchronizing modifications of variables.

That's not to say that you aren't fixing a real bug. It's not supposed to be a race condition for two threads to simultaneously read from a weak reference, but right now it is, because of the way that reads can clear the reference when they detect that the object is undergoing deallocation. That's something that needs to be fixed.
It's not easy to fix this particular race, however, because it isn't safe for other threads to access the referent at all if a thread is decrementing the unowned reference count, but other readers will naturally read the reference and try to increment the reference count of the referent. We basically need to have some way to tell other threads that there are reads in progress. One idea floated for fixing that is to use an activity count; please refer to the discussion thread with Mike Ash on swift-dev starting on 12/10. I'm not sure what the current status of his fix is, though.
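To make the activity-count idea concrete, here is a minimal sketch, not the runtime's actual code: all names (`WeakSlotSketch`, `Referent`, `ActivityCount`, `readWeak`, `onDeinit`) are hypothetical, and default (seq_cst) atomics are used to keep the protocol simple.

```cpp
#include <atomic>

// Hypothetical sketch: readers announce themselves before touching the
// referent, and the deallocating thread nulls the slot first, then waits
// for in-flight readers to drain before dropping the unowned refcount.
struct WeakSlotSketch {
  std::atomic<void *> Referent{nullptr};
  std::atomic<int> ActivityCount{0};

  void *readWeak() {
    ActivityCount.fetch_add(1);           // announce the read
    void *ptr = Referent.load();          // may be null if already cleared
    // ... try-retain ptr here; the referent cannot be freed while
    // ActivityCount is nonzero ...
    ActivityCount.fetch_sub(1);           // read finished
    return ptr;
  }

  void onDeinit() {
    Referent.store(nullptr);              // new readers now see null
    while (ActivityCount.load() != 0) {}  // wait out in-flight readers
    // ... only now is it safe to unowned-release the old referent ...
  }
};
```

Because the null store precedes the drain loop, any reader that announces itself after the drain completes can only observe the nulled slot, so no reader can retain a referent whose unowned count is being dropped.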
```swift
    _ = box.weak
  }
  dispatch_async(q) {
    _ = box.weak
```
You might also want to use `_blackHole()` from StdlibUnittest to ensure that the compiler does not eliminate the unused weak load.
The implementation of unknownWeak that I'm referring to is the non-trivial implementation in SwiftObject.mm. This is only an issue for platforms with ObjC compatibility requirements.
I see; I forgot to search the *.mm files. (I always do.)
I had thought about something like the activity count, but assumed the WeakReference struct's ABI was set in stone. Is it changeable? The CAS solution as I implemented it is not perfect, but it's better: the window of opportunity for badness is much smaller, without the fix being too onerous.
The ABI is not set in stone yet.
It's probably worth doing, then.
Here's another attempt; it's a rather simple spinlock approach. Thinking about options that expand the WeakReference struct, I can't come up with a way to implement it as a lock-free algorithm. If the read-read race is to be resolved with a read-write lock (which is what Mike Ash's activity-count suggestion looks like to me), I'm not convinced the extra complexity over a spinlock is warranted (please advise).

That said, the tests that I called "direct capture" now complete without crashing, but the "class instance property" ones do not. When this occurs, the crashed thread has attempted to dereference the spinlock's illegal pointer value, but the call stack does not contain any of the modified functions; the shared weak reference must be getting read as thread-local in error, but I don't know where at the moment.
We need a lock that can yield to the scheduler, since iOS's scheduler doesn't guarantee preemption. If a background thread has taken the lock, a spinlock will deadlock any higher-priority threads that contend for it.
Agreed; however, I'm unsure of the appropriate way to yield (I hope there is one).
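For illustration, here is a hedged sketch of one spin-then-yield shape; `waitWhileLocked` and the threshold are invented for the example, and `sched_yield` is just one possible primitive. Note that yielding only avoids burning the timeslice; it does not by itself donate priority to the lock holder.

```cpp
#include <atomic>
#include <cstdint>
#include <sched.h>  // sched_yield

// Hypothetical wait loop: spin briefly on the cheap path, then yield the
// timeslice so the thread holding the lock has a chance to run.
static void waitWhileLocked(std::atomic<uintptr_t> &word, uintptr_t lockBit) {
  int spins = 0;
  while (word.load(std::memory_order_relaxed) & lockBit) {
    if (++spins >= 100) {  // threshold is arbitrary for the sketch
      sched_yield();
      spins = 0;
    }
  }
}
```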
If the strong and weak counts could both be CAS'd at once, I think this combination would successfully be lock-free.
We might be able to pull that off on 64-bit platforms, yeah. On 32-bit, the object header refcounts are only 32-bit aligned, and I don't think either i386 or armv7 can do an unaligned 64-bit atomic cmpxchg. We could align the refcounts, at the cost of four bytes per object, maybe.
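As a sketch of what a combined CAS could look like: the layout below (two adjacent 32-bit counts folded into one 64-bit atomic word, strong in the low half, unowned in the high half) and the helper `tryRetainIfAlive` are assumptions for illustration, not the runtime's actual header.

```cpp
#include <atomic>
#include <cstdint>

// Assumed layout: one 64-bit word holding both refcounts, so both can be
// inspected and updated in a single compare-exchange.
struct PackedCounts {
  std::atomic<uint64_t> word{(1ull << 32) | 1ull};  // unowned=1, strong=1
};

// Bump the strong count only if the object is still alive (strong > 0).
// Overflow of the low half into the high half is ignored in this sketch.
static bool tryRetainIfAlive(PackedCounts &rc) {
  uint64_t old = rc.word.load(std::memory_order_relaxed);
  for (;;) {
    if (uint32_t(old) == 0)
      return false;  // strong count already hit zero; don't resurrect
    // compare_exchange_weak reloads `old` on failure, so we just retry.
    if (rc.word.compare_exchange_weak(old, old + 1,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed))
      return true;
  }
}
```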
I don't understand how a CAS on WeakReference.Value can eliminate the nulling-out race. Could you spell that one out? There's a tempting solution that looks like that, but it is not actually correct, because simply having loaded the reference does not guarantee its validity. You basically need transactional memory for anything like that to work.

Other than this weak-reference-nulling issue, there is no way to validly get a race between an unowned-retain and the final unowned-release that triggers deallocation. You cannot perform an unowned retain without some reason to think that the referent is valid. For example, unowned reference variables maintain an invariant of owning an unowned-retain of their referent. As long as that invariant holds, it's safe to load/copy from the variable because the unowned reference count is always at least 1. Anything that would race with the load/copy would have to be a change to the value of the variable (or destroying it), or in other words, a store; and a load/store race is undefined behavior.
You're right, in both cases I forgot that the pointer copied from Value becomes untrustworthy on the very next clock cycle if there's a race. So the lock-free thoughts go out the window. As for racing between a retain and a release: I was thinking of a situation that starts with one weak reference, visible to 2 threads; strong count is 1. Thread A gets …
New version: uses 2 bits from the low (alignment) bits of the pointer as flags.
@gparker42 Is …
Maybe. I don't know what the kernel folks can promise, but I have been unable to make simple tests of …
```cpp
auto ptr = __atomic_fetch_or(&ref->Value, WR_READING, __ATOMIC_RELAXED);
while (ptr & WR_READING) {
  short c = 0;
  while (ref->Value & WR_READING) {
```
This is not guaranteed to actually reload the data.
The intent is to not hit the cache line too often. See line 730: an atomic_fetch occurs after line 724 returned true. If it's preferable to just repeatedly call atomic_fetch, I can change it.
You need to at least do something to prevent the compiler from folding the repeated loads into a single one.
My (weak) circumstantial evidence was that the benchmarked spin time scales with the counter limit at line 725, though I haven't looked at the generated assembly. Given that this depends on the current abilities of the optimizer and we agree that it isn't a high contention case, would you think it's justified to simply do atomic_fetch repeatedly? Otherwise I'm not entirely sure what would ensure repeated loads.
Repeated atomic_fetch would be fine. You're doing a relaxed load, so this really should just compile down to an ordinary load; it's just that the compiler will not be able to combine the loads.
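A sketch of the shape being suggested; the function and parameter names here are illustrative, not from the patch. Each iteration performs a fresh relaxed atomic load, which on typical targets is still a single ordinary load instruction, but which the compiler may not coalesce across iterations the way it can with a plain non-atomic read.

```cpp
#include <atomic>
#include <cstdint>

// Spin until the reading bit clears; the relaxed load each iteration is
// the only required "work", and the compiler must re-issue it every time.
static void spinWhileReading(const std::atomic<uintptr_t> &value,
                             uintptr_t readingBit) {
  while (value.load(std::memory_order_relaxed) & readingBit) {
    // spin
  }
}
```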
@swift-ci Please test.
That test failure isn't this patch's fault.
Should I squash the later changes?
If you wouldn't mind.
```cpp
  WR_NATIVEMASK = WR_NATIVE | swift::heap_object_abi::ObjCReservedBitsMask,
};
```
Please add a static_assert that WR_READING doesn't interfere with normal pointer bits, i.e. that WR_READING < alignof(void*).
Sorry about the long delay reviewing this, but with that static_assert and the squash, this should be good for 3.0.
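The requested assertion might look roughly like this; the concrete flag values are assumptions for the sketch, and only the relation to `alignof(void*)` is the point.

```cpp
#include <cstdint>

// Illustrative flag values; the actual enum in the patch may differ.
enum : uintptr_t {
  WR_READING = 1 << 0,  // a read of the weak reference is in progress
  WR_NATIVE  = 1 << 1,  // referent is a native Swift object
};

// The flags must live below the pointer's alignment so they can never
// collide with the bits of a real, properly aligned object pointer.
static_assert(WR_READING < alignof(void *),
              "WR_READING must not interfere with normal pointer bits");
```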
It has been fairly easy to cause the runtime to crash on multithreaded read-read access to weak references (e.g. https://bugs.swift.org/browse/SR-192). Although weak references are value types, they can get elevated to the heap in multiple ways, such as when captured by a closure or when used as a property in a class object instance. In such cases, race conditions involving weak references could cause the runtime to perform multiple decrement operations of the unowned reference count for a single increment; this eventually causes early deallocation, leading to use-after-free, modify-after-free and double-free errors. This commit changes the weak reference operations to use a spinlock rather than assuming thread-exclusive access, when appropriate. With this change, the crasher discussed in SR-192 no longer encounters crashes due to modify-after-free or double-free errors.
Dispatch-based tests exist because (on OS X) they are more likely to encounter the race condition than `StdlibUnittest`'s `runRaceTest()` is.
Added the static_assert and squashed.
I do think we should bump weak references out to occupy two words for ABI stability, but we don't need to do that in this patch. Committing.
Thanks for taking care of this!
Thanks!
Thanks for taking this on @glessard, great to see this fixed!
@jckarter I got lots of help from you guys!
It has been fairly easy to cause the runtime to crash on multithreaded accesses to weak references (e.g. SR-192). Although weak references are value types, they can get elevated to the heap in multiple ways, such as closure capture and property use in a class object. In such cases, race conditions involving weak references could cause the runtime to perform multiple decrement operations of the unowned reference count for a single increment; this eventually caused early deallocation, leading to use-after-free, modify-after-free and double-free errors.
This commit changes the weak reference operations to use atomic operations rather than thread-local logic. In particular, when the weak reference needs to be nulled, it is not done unconditionally, but via an atomic_compare_exchange operation, with the release operation only performed on success in order to avoid duplicated decrement operations. The assign operations assume the destination may be visible to multiple threads; the init operations assume the destination is local to the current thread. In all cases it is assumed that the source may be visible to multiple threads.
With this change, the crasher discussed in SR-192 no longer encounters modify-after-free or double-free crashes.
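A minimal sketch of the nulling strategy that the commit message describes; `clearAndRelease`, `unownedRelease`, and the slot layout are stand-ins for illustration, not the actual runtime entry points. The key property is that only the thread whose compare-exchange succeeds performs the decrement, so two racing readers can never both release for the same stored pointer.

```cpp
#include <atomic>
#include <cstdint>

void unownedRelease(void *object);  // stand-in for the runtime's release

// Null out a weak slot via compare-exchange; release only on success, so
// exactly one decrement happens per stored referent even under races.
static void clearAndRelease(std::atomic<uintptr_t> &slot, uintptr_t expected) {
  if (slot.compare_exchange_strong(expected, 0,
                                   std::memory_order_acq_rel,
                                   std::memory_order_relaxed)) {
    unownedRelease(reinterpret_cast<void *>(expected));
  }
  // On failure, another thread already cleared or reassigned the slot;
  // releasing here as well would be the double-decrement bug.
}
```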