Ensure panic safety #24
Hmm. Ordinarily, I'm not a big fan of trying to be panic safe, and would rather that we just "tear it down". But in the case of salsa, since we control all the mutable state, I suppose that as long as we propagate the panic cleanly, we can be sure that the system is in a consistent state afterwards. (And if not, then there is some hidden state that would probably mess up the incremental system anyway.)
So I think what we need to do here... The current strategy when executing a query is that we lock the map and swap in a placeholder during execution. That is done here (lines 248 to 264 in 4a6a626), and the value …
Under ordinary circumstances, we eventually invoke `overwrite_placeholder` (lines 456 to 466 in 4a6a626).
You can see that this function also has the job of waking up other threads that may be blocked on us. The danger is that a panic could occur and we would wind up with the placeholder left in there. That is bad for many reasons: future calls from this thread will error out as if a cycle occurred, and other threads that are blocked waiting for a message will hang indefinitely. I think that what we want to do is to install an "RAII guard" right here, basically after the … (line 265 in 4a6a626). This guard would hold on to a reference to the key and look something like this:

```rust
struct PanicGuard<'me, DB, Q> {
    map: &'me RwLock<FxHashMap<Q::Key, QueryState<DB, Q>>>,
    key: &'me Q::Key,
}

impl<'me, DB, Q> Drop for PanicGuard<'me, DB, Q> {..}
```

and `overwrite_placeholder` would take ownership of the guard:

```rust
fn overwrite_placeholder(
    &self,
    runtime: &Runtime<DB>,
    descriptor: &DB::QueryDescriptor,
    key: &Q::Key,
    memo: Memo<DB, Q>,
    new_value: &StampedValue<Q::Value>,
    panic_guard: PanicGuard<'_, DB, Q>,
) {
    // No panic occurred, do not run the panic-guard destructor:
    std::mem::forget(panic_guard);

    // ... as before ...
}
```

If a panic does occur, then I think we probably just want to acquire the write lock on the …

Then we need some tests. =)
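To make the guard idea concrete, here is a minimal, std-only sketch. It assumes a much simplified storage: a `RwLock<HashMap<String, QueryState>>` where the hypothetical `QueryState` has just an `InProgress` placeholder and a `Memoized(u32)` value. `execute` and this `overwrite_placeholder` are illustrative stand-ins, not salsa's actual code, and waking up blocked threads is left out.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical, simplified per-key state (not salsa's real `QueryState`).
#[derive(Debug, Clone, PartialEq)]
enum QueryState {
    /// Placeholder swapped in while the query executes.
    InProgress,
    /// The finished result.
    Memoized(u32),
}

type Map = RwLock<HashMap<String, QueryState>>;

/// Removes the placeholder for `key` when dropped. On the success path it is
/// handed to `overwrite_placeholder` and forgotten, so `drop` only runs when
/// the query unwinds.
struct PanicGuard<'me> {
    map: &'me Map,
    key: &'me str,
}

impl Drop for PanicGuard<'_> {
    fn drop(&mut self) {
        // Recover the map even if the lock is poisoned: panicking again inside
        // `drop` while already unwinding would abort the process.
        let mut map = self.map.write().unwrap_or_else(|poisoned| poisoned.into_inner());
        map.remove(self.key);
    }
}

fn execute(map: &Map, key: &str, compute: impl FnOnce() -> u32) -> u32 {
    // Swap in the placeholder; the write lock is released at the end of this statement.
    map.write().unwrap().insert(key.to_string(), QueryState::InProgress);
    let panic_guard = PanicGuard { map, key };
    let value = compute(); // may panic, in which case `panic_guard` is dropped
    overwrite_placeholder(map, key, value, panic_guard);
    value
}

fn overwrite_placeholder(map: &Map, key: &str, value: u32, panic_guard: PanicGuard<'_>) {
    // No panic occurred, do not run the panic-guard destructor.
    std::mem::forget(panic_guard);
    map.write().unwrap().insert(key.to_string(), QueryState::Memoized(value));
}
```

Calling `execute` inside `std::panic::catch_unwind` with a panicking closure leaves no `InProgress` entry behind, so a later call for the same key recomputes instead of being mistaken for a cycle.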
There is a Zulip thread dedicated to this issue.
So this test case may belong in a new test file within `parallel`, since it doesn't require the threads to be `true_parallel` or `race`. Both threads need to … Ideally … (line 120 in 4a6a626).
Once it panics, it should … What I could use a pointer on is how to ensure the … We could add a panic …
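@kleimkuhler Hmm. I imagine the simplest test case would not require threads. For example, imagine we have two queries, `foo` and `bar`, and you have:

```rust
// `MyDatabase` is a hypothetical stand-in for the query-group trait.
fn foo(db: &impl MyDatabase) -> usize {
    // Panics unless the input `bar` is 1.
    assert_eq!(db.bar(), 1);
    db.bar()
}
```

and then you invoke … This will panic and, presently, leave an … Now if you catch that panic (via … But that is incorrect.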
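The tail of that example is cut off, so here is one reading of it as a sketch: call `foo` while `bar` is 2, catch the panic with `std::panic::catch_unwind`, set `bar` to 1, and call `foo` again. `ToyDb`, `Slot`, `ResetOnUnwind`, and the "cycle detected" error are hypothetical and model only a single memo slot; the `panic_safe` flag switches between the current behavior (the `InProgress` marker is left behind) and a guard that clears it on unwind.

```rust
use std::cell::{Cell, RefCell};
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical memo slot for the derived query `foo`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Slot {
    InProgress,
    Memoized(usize),
}

/// Toy database with one input (`bar`) and one derived query (`foo`).
struct ToyDb {
    bar: Cell<usize>,
    foo_slot: RefCell<Option<Slot>>,
    /// When false, mimics the current behavior: nothing is cleaned up on panic.
    panic_safe: bool,
}

/// Clears the `InProgress` marker if `foo` unwinds.
struct ResetOnUnwind<'a>(&'a RefCell<Option<Slot>>);

impl Drop for ResetOnUnwind<'_> {
    fn drop(&mut self) {
        *self.0.borrow_mut() = None;
    }
}

impl ToyDb {
    fn new(bar: usize, panic_safe: bool) -> Self {
        ToyDb { bar: Cell::new(bar), foo_slot: RefCell::new(None), panic_safe }
    }

    fn set_bar(&self, value: usize) {
        self.bar.set(value);
        // Changing an input discards memoized results, but a leftover
        // `InProgress` marker still looks like a query that is running.
        let mut slot = self.foo_slot.borrow_mut();
        if matches!(*slot, Some(Slot::Memoized(_))) {
            *slot = None;
        }
    }

    fn foo(&self) -> Result<usize, &'static str> {
        match *self.foo_slot.borrow() {
            Some(Slot::InProgress) => return Err("cycle detected"),
            Some(Slot::Memoized(v)) => return Ok(v),
            None => {}
        }
        *self.foo_slot.borrow_mut() = Some(Slot::InProgress);
        let guard = if self.panic_safe { Some(ResetOnUnwind(&self.foo_slot)) } else { None };
        // Query body from the example: panics unless `bar` is 1.
        assert_eq!(self.bar.get(), 1);
        let value = self.bar.get() * 10;
        std::mem::forget(guard); // success: skip the reset
        *self.foo_slot.borrow_mut() = Some(Slot::Memoized(value));
        Ok(value)
    }
}

/// Runs the scenario and returns what the second call to `foo` sees.
fn second_call_after_panic(panic_safe: bool) -> Result<usize, &'static str> {
    let db = ToyDb::new(2, panic_safe);
    let first = panic::catch_unwind(AssertUnwindSafe(|| db.foo()));
    assert!(first.is_err(), "first call should panic");
    db.set_bar(1);
    db.foo()
}
```

Without the guard, the second call returns `Err("cycle detected")` even though the input was fixed; with it, the second call recomputes and returns `Ok(10)`.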
But you are absolutely right that we should test the parallel behavior. In particular, I think my write-up neglected to mention an important detail. I wrote:
That is correct, but we also have to inspect the removed key to see if anyone is waiting for us and, if so, propagate a panic to them. I was imagining we would modify our channel so that instead of sending the final result …

I think for testing you might like to create this scenario where: …
Unfortunately we can't quite force that scenario to occur yet (as you can see from the tests). Maybe it's worth modifying the library here; the problem is that we have no hook that … (The only thing I can imagine doing better would be to somehow modify the salsa runtime to send signals, when a blocking event occurs or for other things, via a channel to some other thread, which would then signal …)
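The description of the channel change is cut off, but one way to read "instead of sending the final result" is to send an outcome that can also say "the query panicked". A minimal sketch with `std::sync::mpsc`, where `WaitResult`, `NotifyOnUnwind`, `run_query`, `wait_for`, and `waiter_outcome` are hypothetical names rather than salsa's API:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical message for blocked threads, replacing the bare value.
#[derive(Debug, PartialEq)]
enum WaitResult<V> {
    Completed(V),
    Panicked,
}

/// When dropped, tells every remaining waiter that the query panicked.
struct NotifyOnUnwind {
    waiters: Vec<Sender<WaitResult<u32>>>,
}

impl Drop for NotifyOnUnwind {
    fn drop(&mut self) {
        for tx in self.waiters.drain(..) {
            // The waiter may already be gone; ignore send errors.
            let _ = tx.send(WaitResult::Panicked);
        }
    }
}

/// Runs `compute` on behalf of `waiters` and forwards the outcome to them.
fn run_query(compute: impl FnOnce() -> u32, waiters: Vec<Sender<WaitResult<u32>>>) -> u32 {
    let mut guard = NotifyOnUnwind { waiters };
    let value = compute(); // if this panics, `guard` notifies the waiters
    for tx in std::mem::take(&mut guard.waiters) {
        let _ = tx.send(WaitResult::Completed(value));
    }
    value // `guard` now holds no senders, so its drop sends nothing
}

/// A blocked thread: re-raises the panic instead of hanging.
fn wait_for(rx: Receiver<WaitResult<u32>>) -> u32 {
    match rx.recv() {
        Ok(WaitResult::Completed(value)) => value,
        Ok(WaitResult::Panicked) | Err(_) => panic!("the query we were blocked on panicked"),
    }
}

/// Spawns one waiter and one worker; returns what the waiter ended up with.
fn waiter_outcome(query_panics: bool) -> Option<u32> {
    let (tx, rx) = channel();
    let waiter = thread::spawn(move || wait_for(rx));
    let worker = thread::spawn(move || {
        run_query(move || if query_panics { panic!("query failed") } else { 42 }, vec![tx])
    });
    let _ = worker.join();
    waiter.join().ok()
}
```

Taking the senders out of the guard on success (rather than calling `std::mem::forget`) means the guard can always be dropped normally and only ever reports a panic to waiters it still owns.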
One thing that feels slightly off (but maybe is OK!) is that this will catch unexpected panics in salsa itself. An alternative is to create a wrapper around …
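If the concern is intercepting panics that originate in salsa's own code, one alternative (an assumption about where the truncated sentence was heading, not a settled design) is to catch panics only around the user's query function with `std::panic::catch_unwind`, restore the storage, and then continue unwinding with `std::panic::resume_unwind`. `call_user_query` and its `on_panic` callback are hypothetical:

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical wrapper: runs `on_panic` only for panics raised by `query_fn`,
/// then re-raises the original panic payload.
fn call_user_query<V>(query_fn: impl FnOnce() -> V, on_panic: impl FnOnce()) -> V {
    // `AssertUnwindSafe` is needed because `query_fn` may capture references
    // to state that is not `UnwindSafe`; the cleanup below takes responsibility.
    match panic::catch_unwind(AssertUnwindSafe(query_fn)) {
        Ok(value) => value,
        Err(payload) => {
            // Restore a consistent state (e.g. remove the placeholder) ...
            on_panic();
            // ... then keep unwinding with the same payload.
            panic::resume_unwind(payload)
        }
    }
}
```

Unlike a `Drop` guard, this runs the cleanup only for panics raised inside `query_fn`; a panic elsewhere in the runtime skips it. Note that `catch_unwind` cannot intercept anything when a crate is built with `panic = "abort"`.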
(Note that with #63 you can figure out when one thread is blocking for another and thus induce panics more precisely.)
To be honest, I can't remember whether this was by design, but recovering gracefully from internal salsa panics seems like a feature, not a bug, no? Another option would be to use …
@matklad points out here that it may be very simple to implement the parallel case after all; in fact, maybe it even works now! Basically, just the act of dropping the channel without sending anything should cause the "dependent cases" to panic.
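The standard library's `std::sync::mpsc` channel behaves the way this relies on: once every `Sender` has been dropped, `Receiver::recv` returns `Err(RecvError)` instead of blocking forever. A minimal sketch, assuming the waiting thread holds the receiving end and the computing thread owns the only sender (`wait_result` is a hypothetical helper):

```rust
use std::sync::mpsc::{channel, RecvError};
use std::thread;

/// Simulates one thread blocked on another thread's query result.
fn wait_result(query_panics: bool) -> Result<u32, RecvError> {
    let (tx, rx) = channel::<u32>();
    let worker = thread::spawn(move || {
        if query_panics {
            // Unwinding drops the captured `tx` without sending anything.
            panic!("query failed");
        }
        tx.send(42).unwrap();
    });
    // Blocks until a value arrives or every sender has been dropped.
    let result = rx.recv();
    let _ = worker.join();
    result
}
```

The blocked thread can then turn `Err(RecvError)` into its own panic, which is all that "propagating the panic to dependents" requires in this model.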
When a query panics due to a bug during execution, the panic should be propagated to the caller, but the runtime should be left in a consistent state.
This is important for the IDE use case: a language server is a long-lived process, and it is exposed to incomplete code comparatively more often than a command-line compiler. As a result, panics due to bugs are expected to be both more numerous and more annoying.
Currently, I believe we leave `InProgress` marks in the storage in the case of a panic, which effectively poisons the query in a way that can't be fixed by changing inputs.