NEST Deadlock if exception thrown in update #718

Closed
maharjun opened this Issue May 11, 2017 · 2 comments

maharjun (Contributor) commented May 11, 2017

It appears that the following situation leads to a deadlock at the v2.12.0 commit. When an exception is raised on one thread but not on the others in the same simulation slice, the following condition in the while loop terminates the loop for that thread only. The remaining threads then stall at the next barrier, which deadlocks the simulation.

    } while ( to_do_ > 0 and not exit_on_user_signal_
      and not exceptions_raised.at( thrd ) );

On a related note, how feasible would it be to implement error handling across MPI processes? For example, each process could send an additional integer during the allgather to signal its state (error or not). Then, only if at least one process is in an error state, the exception messages would be gathered and a consistent exception thrown on all processes.
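A minimal, single-process sketch of the two-step protocol proposed above. Each simulated rank contributes a status integer; the hypothetical `gather_statuses` stands in for the extra integer sent during the allgather, so every rank sees the same status vector, and the failing rank's message is fetched only when some status is nonzero. No MPI calls are made, and `RankState`, `gather_statuses`, `throw_if_any_rank_failed` and `RemoteException` are illustrative names, not NEST API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-rank state at the end of a slice.
struct RankState {
  int status;           // 0 = ok, 1 = an exception was caught on this rank
  std::string message;  // what() of the caught exception, empty if ok
};

// Hypothetical exception that every rank throws identically.
class RemoteException : public std::runtime_error {
public:
  RemoteException(int rank, const std::string& msg)
    : std::runtime_error("rank " + std::to_string(rank) + ": " + msg) {}
};

// Stand-in for the extra integer in the allgather: every rank ends up with
// the same vector holding one status per rank.
std::vector<int> gather_statuses(const std::vector<RankState>& world) {
  std::vector<int> statuses;
  statuses.reserve(world.size());
  for (const RankState& r : world) statuses.push_back(r.status);
  return statuses;
}

// Called on every rank after the gather. All ranks see identical statuses,
// so either all of them throw the same exception or none does.
void throw_if_any_rank_failed(const std::vector<RankState>& world) {
  const std::vector<int> statuses = gather_statuses(world);
  for (std::size_t r = 0; r < statuses.size(); ++r) {
    if (statuses[r] != 0) {
      // Conditional second step: only now is the failing rank's message
      // needed. With MPI it would have to be sent from rank r (for example
      // by a broadcast); this sketch reads it from `world` directly.
      throw RemoteException(static_cast<int>(r), world[r].message);
    }
  }
}
```

Reporting the lowest failing rank keeps the thrown message identical on every rank even when several ranks fail in the same slice.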

heplesser (Contributor) commented May 11, 2017

@maharjun Thank you for reporting this. Do you have a reproducer that we could use as the basis of a regression test?

I believe the fix is reasonably straightforward: we can introduce a loop in the omp master section beginning at simulation_manager.cpp:837 that checks whether any thread has raised an exception and sets a flag accordingly. That flag is then checked in the while condition.
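A minimal sketch of the proposed fix, written with `std::thread` and a small hand-rolled barrier instead of OpenMP so that it is self-contained. Thread 0 plays the role of the omp master section: between two barriers it scans the per-thread `exceptions_raised` flags and publishes a shared `any_exception` flag, which every thread tests in its loop condition. `Barrier`, `run_simulation`, `any_exception`, `fail_thread` and `fail_slice` are hypothetical names; this is not the actual NEST code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Reusable barrier: the last thread to arrive advances the generation.
class Barrier {
public:
  explicit Barrier(std::size_t n) : n_(n) {}
  void wait() {
    std::unique_lock<std::mutex> lock(m_);
    const std::size_t gen = generation_;
    if (++waiting_ == n_) {
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  std::size_t n_, waiting_ = 0, generation_ = 0;
};

// Runs n_slices slices on n_threads threads; thread `fail_thread` throws in
// slice `fail_slice` (pass -1 for no failure). Returns the slices completed
// by each thread.
std::vector<int> run_simulation(int n_threads, int n_slices, int fail_thread, int fail_slice) {
  Barrier barrier(n_threads);
  std::vector<char> exceptions_raised(n_threads, 0);  // char, not bool: no bit packing
  std::vector<int> slices_done(n_threads, 0);
  int to_do = n_slices;        // written only by thread 0, between barriers
  bool any_exception = false;  // written only by thread 0, between barriers

  auto worker = [&](int thrd) {
    do {
      try {
        if (thrd == fail_thread && slices_done[thrd] == fail_slice)
          throw std::runtime_error("update failed");
        ++slices_done[thrd];
      } catch (const std::exception&) {
        exceptions_raised[thrd] = 1;  // record, do not leave the loop yet
      }
      barrier.wait();  // every thread has finished this slice
      if (thrd == 0) {  // stands in for the omp master section
        --to_do;
        for (char raised : exceptions_raised) any_exception = any_exception || raised;
      }
      barrier.wait();  // to_do and any_exception are now visible to all
    } while (to_do > 0 && !any_exception);  // same decision on every thread
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < n_threads; ++t) threads.emplace_back(worker, t);
  for (std::thread& th : threads) th.join();
  return slices_done;
}
```

With `run_simulation(4, 10, 1, 2)` every thread leaves the loop after the third slice, so all joins return. If the condition tested `exceptions_raised[thrd]` instead, as in the snippet from the original report, thread 1 would exit alone and the other three would block forever in the next `barrier.wait()`.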

heplesser self-assigned this May 11, 2017

maharjun (Contributor) commented May 11, 2017

I have code that causes a deadlock. I'm not sure one can detect a deadlock in a unit test, but the following branch of my repository contains the neuron model and network setup, along with all the instructions to install and test it:

https://github.com/IGITUGraz/SpikeDetectorFuse/tree/NESTExceptionDeadlockTest
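One way to turn a hang into a test failure, sketched under the assumption that the reproducer can be invoked from C++: run it on a separate thread and give up after a timeout. `finishes_within` is a hypothetical helper, not part of NEST or its test suite, and a timeout is only a heuristic, since a slow but correct run fails it too.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Runs `body` on a detached thread and waits at most `timeout` for it.
// Returns true if it finished (normally or by throwing) in time, false if it
// is presumed deadlocked. On timeout the thread is abandoned; the test is
// expected to report failure and let the process exit.
bool finishes_within(std::function<void()> body, std::chrono::milliseconds timeout) {
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();
  std::thread([body = std::move(body), done]() {
    try {
      body();
      done->set_value();
    } catch (...) {
      done->set_exception(std::current_exception());
    }
  }).detach();
  return finished.wait_for(timeout) == std::future_status::ready;
}
```

A detached thread is used on purpose: a `std::future` returned by `std::async` blocks in its destructor until the task completes, so it would hang the test along with the deadlocked simulation.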
