
Refactor functions to use MPI_Allreduce #2474

Merged: 4 commits merged into nest:master from use_mpi_allreduce on Oct 20, 2022

Conversation

hakonsbm (Contributor)

Fixes #1925 by refactoring MPIManager::any_true() and RandomManager::check_rng_synchrony() to use MPI_Allreduce.
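
For context, the change replaces per-value exchange logic with a single collective call. Below is a minimal sketch of how an "any rank reports true" check can be expressed with MPI_Allreduce; it illustrates the technique only, is not the code merged in this PR, and the function name and signature are assumptions.

```cpp
#include <mpi.h>

// Sketch: returns true on every rank if at least one rank passes true.
// Assumes MPI_Init() has already been called by the surrounding code.
bool
any_true_sketch( const bool my_bool )
{
  int local = my_bool ? 1 : 0;
  int global = 0;
  // Logical OR across all ranks; every rank receives the combined result.
  MPI_Allreduce( &local, &global, 1, MPI_INT, MPI_LOR, MPI_COMM_WORLD );
  return global != 0;
}
```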

@hakonsbm added the labels S: Normal (handle this with default priority), T: Maintenance (work to keep up the quality of the code and documentation), and I: No breaking change (previously written code will work as before, no one should note anything changing aside from the fix) on Sep 20, 2022
@hakonsbm added this to "PRs in progress" in Kernel via automation on Sep 20, 2022
heplesser (Contributor) left a comment:


Looks good, but I think there is a corner case for which I suggest a solution.

Review comments on nestkernel/random_manager.cpp and nestkernel/mpi_manager.cpp (resolved).
Kernel automation moved this from "PRs in progress" to "PRs pending approval" on Sep 20, 2022
hakonsbm (Contributor, Author)

@heplesser I didn't think of that corner case, thanks for pointing it out. However, I don't agree with moving the local non-equality check into the equal_cross_ranks() function. That function has one job: to check whether a value is equal across all ranks. Local non-equality is a special case that should be handled locally, before checking whether values are equal across ranks. Another problem with handling local non-equality inside equal_cross_ranks() by passing local_min and local_max values is what to do when there is only a single value, as when checking the rank-synchronized RNGs.

When there is a local non-equality, this has to be communicated to all ranks. I've taken your suggestion of using -inf instead of -1 to flag a local non-equality; after communication it becomes the result of values[ 1 ] and in that way raises an exception in all MPI processes. Please have another look.
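
To make the -inf flag concrete, here is a minimal sketch of a cross-rank equality check built on a single MPI_Allreduce, with -inf used as the sentinel for a local non-equality. It is not the merged NEST code; equal_across_ranks_sketch() and check_sync_sketch() are hypothetical names used only for illustration.

```cpp
#include <mpi.h>
#include <cmath>
#include <limits>

// Sketch: true on every rank only if all ranks pass the same finite value.
// Packing { value, -value } lets one MPI_MAX reduction deliver both the
// global maximum ( result[ 0 ] ) and, via negation, the global minimum.
bool
equal_across_ranks_sketch( const double value )
{
  double send[ 2 ] = { value, -value };
  double result[ 2 ] = { 0.0, 0.0 };
  MPI_Allreduce( send, result, 2, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD );

  const double global_max = result[ 0 ];
  const double global_min = -result[ 1 ];

  // A -inf sent by any rank (the local non-equality flag) drags the global
  // minimum down to -inf, so every rank sees the mismatch and can raise
  // the same exception.
  return not std::isinf( global_min ) and global_min == global_max;
}

// Sketch of the calling side: a local non-equality is flagged with -inf
// before the value is compared across ranks.
bool
check_sync_sketch( const double local_value, const bool locally_equal )
{
  const double value = locally_equal ? local_value : -std::numeric_limits< double >::infinity();
  return equal_across_ranks_sketch( value );
}
```

Keeping the -inf flagging on the calling side, as argued above, lets the reduction helper stay agnostic about how the local check is performed.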

heplesser (Contributor)

@hakonsbm You have a point. My idea was to isolate the "-inf" magic entirely in the equal_cross_... function, but then one should probably also have an equal_across_vps() function. Your solution is fine and I will approve.

hakonsbm (Contributor, Author)

@mlober Friendly ping

mlober (Contributor) commented on Oct 19, 2022:

Hi, sorry for the super late reply, I completely missed this. I looked through everything and I like your solution. Thanks for the improvement!

heplesser merged commit 52ff016 into nest:master on Oct 20, 2022
Kernel automation moved this from "PRs pending approval" to "Done (PRs and issues)" on Oct 20, 2022
hakonsbm deleted the use_mpi_allreduce branch on October 20, 2022 at 07:31
Projects: Kernel (Done)
Development: Successfully merging this pull request may close the issue "Implement functions with MPI_Allreduce".
3 participants