NEST Deadlock if exception thrown in update #718

Closed
maharjun opened this issue May 11, 2017 · 2 comments

@maharjun maharjun commented May 11, 2017

It appears that the following situation leads to a deadlock in the v2.12.0 commit. When an exception is raised on one thread but not on the other threads in the same simulation slice, only that thread's loop is terminated by the following condition in the while loop. The remaining threads then stall at the next barrier, resulting in a deadlock.

    } while ( to_do_ > 0 and not exit_on_user_signal_
      and not exceptions_raised.at( thrd ) );

On a related note, how feasible would it be to implement error handling across MPI processes? For example, each MPI process could send an additional integer during the allgather to signal its state (error or not). Then, if any process is in an error state, the exception message could be gathered and a consistent exception thrown across all processes.
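The decision logic of that proposal could be sketched roughly as follows. This is only a model under stated assumptions: it makes no MPI calls, the result of the allgather is represented as a plain vector, and `RankStatus` and `collective_error` are hypothetical names that do not exist in NEST.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-process status that would be exchanged in the allgather.
// error == 0 means "no exception on this process".
struct RankStatus
{
  int error;
  std::string message;
};

// Every process would call this on the same gathered vector, so every
// process reaches the same decision and can throw the same exception.
// Returns an empty string if no process reported an error.
std::string collective_error( const std::vector< RankStatus >& gathered )
{
  std::string combined;
  for ( std::size_t rank = 0; rank < gathered.size(); ++rank )
  {
    if ( gathered[ rank ].error != 0 )
    {
      if ( not combined.empty() )
      {
        combined += "; ";
      }
      combined += "rank " + std::to_string( rank ) + ": " + gathered[ rank ].message;
    }
  }
  return combined;
}
```

Because the input is identical on every process after the allgather, a non-empty result can be turned into the same exception everywhere, which avoids one process leaving the simulation loop while the others keep waiting in a collective call.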

@heplesser heplesser commented May 11, 2017

@maharjun Thank you for reporting this. Do you have a reproducer that we could use as the basis of a regression test?

I believe that the fix is reasonably straightforward: We can introduce a loop in the omp master section beginning at simulation_manager.cpp:837 to check whether any thread has raised an exception and set a flag accordingly, which is then checked in the while condition.
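A minimal sketch of that idea, assuming OpenMP, might look like the following. It is a stand-alone toy and not the actual NEST code: `run_slices`, `exception_in_any_thread`, and the simulated failure on thread 0 are all hypothetical, and only `to_do`, `thrd`, and `exceptions_raised` echo names from the snippet quoted in the report.

```cpp
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#endif

// Runs up to `slices` slices; thread 0 simulates an exception in slice
// `failing_slice`. Returns how many slices each thread entered.
std::vector< int > run_slices( int slices, int failing_slice )
{
  int team_size = 1;
  std::vector< int > entered;
  std::vector< char > exceptions_raised;
  bool exception_in_any_thread = false; // shared flag set by the master

#pragma omp parallel
  {
#ifdef _OPENMP
    const int thrd = omp_get_thread_num();
#else
    const int thrd = 0;
#endif

#pragma omp single
    {
#ifdef _OPENMP
      team_size = omp_get_num_threads();
#endif
      entered.assign( team_size, 0 );
      exceptions_raised.assign( team_size, 0 );
    } // implicit barrier: vectors are sized before any thread uses them

    int to_do = slices;
    int slice = 0;
    do
    {
      ++entered[ thrd ];
      if ( thrd == 0 and slice == failing_slice )
      {
        exceptions_raised[ thrd ] = 1; // stands in for a caught exception
      }

#pragma omp barrier
#pragma omp master
      {
        // Collapse the per-thread flags into one flag that all threads test.
        for ( char raised : exceptions_raised )
        {
          if ( raised )
          {
            exception_in_any_thread = true;
          }
        }
      }
#pragma omp barrier

      ++slice;
      --to_do;
    } while ( to_do > 0 and not exception_in_any_thread );
  }
  return entered;
}
```

The point of the shared flag is that every thread evaluates the same loop condition after the second barrier, so all threads leave the loop in the same slice and none is left waiting at a barrier the others never reach.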

@heplesser heplesser self-assigned this May 11, 2017

@maharjun maharjun commented May 11, 2017

I have code in the following repository that causes a deadlock, although I'm not sure how one detects a deadlock in a unit test. The neuron model and network setup, along with all the instructions to install and test, are in the following branch of my repository:

https://github.com/IGITUGraz/SpikeDetectorFuse/tree/NESTExceptionDeadlockTest
