-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments say that SeekableHttpReaderEngine can deadlock #50
Comments
Ha. I considered trying to model this in the type system before declaring 1.0, on the assumption that I'd have got something wrong here. I should have! The comment is at least wrong. However a nested mutex probably won't solve things. Usually we want to claim I'll do some more thinking about this later! |
One option may be to put an Anyway, let me know your thoughts. If I get a chance, I'll have a deeper look at this code and see if I can understand it deeply enough to refactor. |
I'm working on it. |
OK I've convinced myself that it's fine. Let me try to persuade you. There is effectively one main mutex, the Guarded by the There are two places where the
So, although you're right that the Effectively The comments are still definitely wrong though! |
Ah, that does make sense, and the PR looks like a better approach! |
We hit some deadlock in this area of code recently. Our server was having issues and returning incomplete responses every so often. Ended up with a few threads stuck here. We fixed our webserver and I've been trying to reproduce it locally, I think what happened is the http error is returned here but the reader isn't put back into state so the other threads waiting to read are deadlocked. Just tricky crashing it on purpose with the readers waiting and returning an error on the other thread. If I can get a reproduction going I'd probably just check if it's an error, return the reader to the state and then return the error. Could try to put the reader into a mutex somehow that always fails-safe when it loses scope but sounds like more work for a bit better error handling. |
Ah, good spot! Yes I'll definitely improve handling of the error here so that other threads don't get stuck. Work started here: #70 |
ripunzip/src/unzip/seekable_http_reader.rs
Lines 254 to 265 in 5103ab0
Yet
I expect that the "invariant" was written to avoid this deadlock? It's a little too strong -- all you need is to always claim the mutexes in the same order. Of course, it's only strong if it's followed :)
The way this is written also looks like it's trying to accomplish "critical sections" -- mutual exclusion of pieces of code. The Rust Mutex isn't designed for that, and thus relies on some head-scratching by the developer. In this case, I think the "only one thread" assertion is saying that only one thread holds the STATE mutex, and waits on the condvar (releasing but then re-acquiring the mutex), then enters its own read-in-progress block. So that's equivalent to saying that "this deadlock" can't happen because only one thread can hold the STATE mutex. Which is tautological and doesn't imply the deadlock cannot happen.
One way to enforce an order-of-acquisition invariant in the type system is to put one mutex inside the other, e.g.,
That particular way of writing it is not very ergonomic, but maybe introducing some additional types to model the situation as protecting data instead of critical sections would help.
I can take a crack at trying to refactor this to be more correct, if I haven't missed some major piece of information in the above.
The text was updated successfully, but these errors were encountered: