-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove superfluous OpenMP barriers #2098
Conversation
@stinebuu I think we should wait for the original reviewers on this one, even if that should delay it to NEST 3.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the removal of these barriers is sound and reasonable with minimal changes to the measured runtime of deliver_spike_data.
@@ -446,7 +442,6 @@ EventDeliveryManager::gather_spike_data_( const thread tid, | |||
decrease_buffer_size_spike_data_ = false; | |||
} | |||
} | |||
#pragma omp barrier | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we don't need this barrier since gather_completed_checker_.any_false() also invokes a barrier.
@@ -706,7 +701,6 @@ EventDeliveryManager::gather_target_data( const thread tid ) | |||
if ( gather_completed_checker_.all_true() ) | |||
{ | |||
set_complete_marker_target_data_( assigned_ranks, send_buffer_position ); | |||
#pragma omp barrier | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree here: Superfluous barrier because of preceding invocation of gather_completed_checker_.all_true().
@@ -726,7 +720,6 @@ EventDeliveryManager::gather_target_data( const thread tid ) | |||
|
|||
const bool distribute_completed = distribute_target_data_buffers_( tid ); | |||
gather_completed_checker_[ tid ].logical_and( distribute_completed ); | |||
#pragma omp barrier | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree here: Superfluous barrier because of following invocation of gather_completed_checker_.any_false().
@@ -736,7 +729,6 @@ EventDeliveryManager::gather_target_data( const thread tid ) | |||
buffer_size_target_data_has_changed_ = kernel().mpi_manager.increase_buffer_size_target_data(); | |||
} | |||
} | |||
#pragma omp barrier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree here: Superfluous barrier because of preceding invocation of gather_completed_checker_.any_false().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with removing it, but I think the reason is that we will hit the following implicit barrier when checking for the while condition.
// Exit gather loop if all local threads and remote processes are | ||
// done. | ||
#pragma omp barrier | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With barrier, we measure for deliver_spike_data the runtime of the slowest thread; without barrier, we measure most likely the average runtime over all threads. I think, both variants are ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code provisions made for benchmarking should not reduce speed under production conditions, so I think it is sensible to remove this barrier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this fall into the category of performance instrumentation that was discussed some time ago? We could make this barrier dependent on a cmake flag, but it might also confuse more than it helps. In any case, for sure this important difference in measurement must also be expressed in the documentation of the kernel timers!
@stinebuu, could you check if this difference is clear enough in the docs?
Would this be a problem for comparing benchmarks between NEST versions with/without this barrier?
Sorry for not answering earlier, I completely missed that some of the large number of emails on the nest-simulator mailing list contained my name. I will change the settings of my email program to avoid this in the future. |
Hi @wschenck! Thank you for your review! |
Honestly, I don't think that it does not get much better than it is with the review by @wschenck. Together with the tests that @stinebuu carried out already earlier on Piz Daint and JUSUF, I think we should once the question by @terhorstd above is resolved. @wschenck: how bad would the difference between old and new measurements be for the timers you mention above? |
@jougs: I expect slight differences only for the timer deliver_spike_data. From a max operation (before) to a mean operation (after code change in PR) over threads. The resulting difference depends on the variance between threads. I guess that this variance is very small so that the measurement differences would be neglible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work @stinebuu. I agree, it appears to be save to remove all these barriers.
@@ -736,7 +729,6 @@ EventDeliveryManager::gather_target_data( const thread tid ) | |||
buffer_size_target_data_has_changed_ = kernel().mpi_manager.increase_buffer_size_target_data(); | |||
} | |||
} | |||
#pragma omp barrier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with removing it, but I think the reason is that we will hit the following implicit barrier when checking for the while condition.
This PR fixes #1441.
Tested on Piz Daint and JUSUF.