New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional timers and local spike counter for profiling and performance measurements #434

Merged
merged 7 commits into from Aug 25, 2016

Conversation

Projects
None yet
4 participants
@wschenck
Contributor

wschenck commented Jul 29, 2016

In this pull request two timers and one counter are added to the nest kernel.

The timers contain (1) the time spent in gather_events for spike communication via MPI and (2) the time spent in gather_events for collocating spike buffers. The counter contains the number of spike events generated on the specific rank (both on- and offgrid). All these measurements are for the last call to simulate. They are reset to zero at the beginning of a new call to simulate within prepare_simulation.

These measurements are useful for light-weight profiling and collecting of performance data.

@heplesser

This comment has been minimized.

Show comment
Hide comment
@heplesser

heplesser Aug 2, 2016

Contributor

@wschenck Looks good to me. What is the reason for having the stopwatches as member variables instead of local variable where they are needed? And do we really need two of them?

Contributor

heplesser commented Aug 2, 2016

@wschenck Looks good to me. What is the reason for having the stopwatches as member variables instead of local variable where they are needed? And do we really need two of them?

@wschenck

This comment has been minimized.

Show comment
Hide comment
@wschenck

wschenck Aug 3, 2016

Contributor

@heplesser You are right, there is no real reason for the stopwatches being member variables. And one could live with only one of them. I will update the pull request in the beginning of next week. I tend towards defining the single stopwatch as a static local variable then.

Contributor

wschenck commented Aug 3, 2016

@heplesser You are right, there is no real reason for the stopwatches being member variables. And one could live with only one of them. I will update the pull request in the beginning of next week. I tend towards defining the single stopwatch as a static local variable then.

@@ -577,12 +596,26 @@ EventDeliveryManager::deliver_events( thread t )
void
EventDeliveryManager::gather_events( bool done )
{
// IMPORTANT: Ensure that gather_events(..) is called from a single thread and
// NOT from a parallel OpenMP region!!!

This comment has been minimized.

@heplesser

heplesser Aug 3, 2016

Contributor

Is there any way to enforce this by an assertion?

@heplesser

heplesser Aug 3, 2016

Contributor

Is there any way to enforce this by an assertion?

This comment has been minimized.

@apeyser

apeyser Aug 8, 2016

Contributor

! omp_in_parallel() might do the trick

@apeyser

apeyser Aug 8, 2016

Contributor

! omp_in_parallel() might do the trick

This comment has been minimized.

@wschenck

wschenck Aug 13, 2016

Contributor

I looked into this: Unfortunately, gather_events(..) is called from within a parallel region, although only by one thread, which is enforced by "omp single" and "omp master". However, omp_in_parallel() and also omp_get_num_threads() refer to the superordinate parallel region and don't perform as expected for this reason.
Anyhow, the problem that gather_events(..) should be called only by a single thread exists regardless of the StopWatch introduced here. In this way, the problem is independant from this specific pull request. But of course, it would be good to find a better solution than to just indicate this requirement not at all or just by a comment.

@wschenck

wschenck Aug 13, 2016

Contributor

I looked into this: Unfortunately, gather_events(..) is called from within a parallel region, although only by one thread, which is enforced by "omp single" and "omp master". However, omp_in_parallel() and also omp_get_num_threads() refer to the superordinate parallel region and don't perform as expected for this reason.
Anyhow, the problem that gather_events(..) should be called only by a single thread exists regardless of the StopWatch introduced here. In this way, the problem is independant from this specific pull request. But of course, it would be good to find a better solution than to just indicate this requirement not at all or just by a comment.

@heplesser

This comment has been minimized.

Show comment
Hide comment
@heplesser

heplesser Aug 3, 2016

Contributor

@wschenck Good. Since the stopwatch is only to be used in serial context, a static local should not cause any problems.

Contributor

heplesser commented Aug 3, 2016

@wschenck Good. Since the stopwatch is only to be used in serial context, a static local should not cause any problems.

@apeyser

This comment has been minimized.

Show comment
Hide comment
@apeyser

apeyser Aug 8, 2016

Contributor

👍

Contributor

apeyser commented Aug 8, 2016

👍

@wschenck

This comment has been minimized.

Show comment
Hide comment
@wschenck

wschenck Aug 13, 2016

Contributor

My travis tests failed only for the configurations with MPI. The corresponding error message is shown below. It occurs in "make installcheck" and looks really weird. I doubt that this is the fault of my code changes. See in the following:

Phase 1: Testing if SLI can execute scripts and report errors

Running test 'selftests/test_pass.sli'... *** Error in `mpirun': munmap_chunk(): invalid pointer: 0x00002ad9a33e111d ***
Failed: unexpected exit code 134
Running mpirun -np 1 /home/travis/build/wschenck/nest-simulator/result/bin/nest /home/travis/build/wschenck/nest-simulator/result/share/doc/nest/selftests/test_pass.sli
modprobe: ERROR: could not insert 'fglrx': No such device
Error: Fail to load fglrx kernel module!
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly

Contributor

wschenck commented Aug 13, 2016

My travis tests failed only for the configurations with MPI. The corresponding error message is shown below. It occurs in "make installcheck" and looks really weird. I doubt that this is the fault of my code changes. See in the following:

Phase 1: Testing if SLI can execute scripts and report errors

Running test 'selftests/test_pass.sli'... *** Error in `mpirun': munmap_chunk(): invalid pointer: 0x00002ad9a33e111d ***
Failed: unexpected exit code 134
Running mpirun -np 1 /home/travis/build/wschenck/nest-simulator/result/bin/nest /home/travis/build/wschenck/nest-simulator/result/share/doc/nest/selftests/test_pass.sli
modprobe: ERROR: could not insert 'fglrx': No such device
Error: Fail to load fglrx kernel module!
Error! Fail to load fglrx kernel module! Maybe you can switch to root user to load kernel module directly

@jougs

This comment has been minimized.

Show comment
Hide comment
@jougs

jougs Aug 14, 2016

Contributor

The failing is not due to your code, but is a known Travis issue at the moment. I've reported it to them, see here for details: travis-ci/packer-templates#221

Contributor

jougs commented Aug 14, 2016

The failing is not due to your code, but is a known Travis issue at the moment. I've reported it to them, see here for details: travis-ci/packer-templates#221

@heplesser

This comment has been minimized.

Show comment
Hide comment
@heplesser

heplesser Aug 25, 2016

Contributor

👍 from me and merging.

Contributor

heplesser commented Aug 25, 2016

👍 from me and merging.

@heplesser heplesser merged commit b5ede29 into nest:master Aug 25, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment