-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race conditions when throwing an exception #95
Comments
@ktf Does this happen after sending edit: nvm, |
@rbx I believe, in order to fix this, we need to intercept any exception, set the state machine into proper error state (which will notify the hanging control plugin threads), and then re-throw probably. Also, we need to enhance the control plugin to handle the error state properly. This could very well be a regression from the times before we stripped out the state machine thread from the device. |
What is the actual symptom/issue here? Does the device not shut down? Intercepting any exceptions from plugin seems like an overkill to me and may be really tricky since those may be running in different threads. What about notyfing the plugins out of the catch (which can be either one that Giulio already has, or an extra one inside our DeviceRunner/RunStateMachine. A note on error state - it is built in a way that will keep the device hanging (for attaching debugger) and will not allow any subsequent state changes. |
Yes, I assume the stack trace is collected in the hanging state. Only then it makes any sense to me. You can see, that the program waits in the destructor of the control plugin on one of its threads (The exception has led to the situation, that the device runner went out of scope and began destruction). All control plugin threads are also waiting. At that point no thread will progress the device state machine any more.
Not from plugin, but from the device. Basically wrap the states with a
Well, at least the way it is hanging now, is not this intended state of hanging. |
@dennisklein: Yes, I think there was also a @rbx: yes, it's hanging there in the destructor of the |
Agreed, it should react properly if anything else (device, user code, ..?) sets error state. What I am unsure about, is if an exception should in all cases lead to the error state. We could let it propagate until the What do you think? |
Going through the error state will make it unrecoverable (error state does not allow any further state transitions). |
If any exception escapes any user code we call, I would say this is always a proper unrecoverable error state. I am open to having a separate error state for this, if the existing one is meant for something else (e.g. set by user code only), but I would prefer if we signal plugins through the state machine change signals. |
@ktf We believe we have fixed the problem in It should not require any changes on your side. The intended behaviour is, that any uncaught exception may escape the Other plugins, that spawn long running threads, are now notified through the existing StateChange subscription plugin API and may react accordingly. We transition the device state machine to the error state now when we see an uncaught exception. |
Great, thanks.
…--
Ciao,
Giulio
|
@ktf Thank you for helping to improve FairMQ! We believe your issue has been resolved in release |
Looks like there is a race condition when trowing an exception and catching it inside the
ConditonalRun()
method of aFairMQDevice
. This seems to be yet again another Control plugin related issue. Traceback as follows:notice how
runDataProcessing.cxx:733
is acatch (std::exception& e)
. Moving theDeviceRunner
out of thetry..catch
does not work either.The text was updated successfully, but these errors were encountered: