Changes for processors grid numa style #4097
Merged
As core counts per socket grow and data movement becomes a larger bottleneck, it can pay to optimize the NUMA mapping within a node. This adds a processors option that sets the number of NUMA domains for the processors grid numa style, with a default of 2. It also simplifies the NUMA mapping algorithm and optimizes more aggressively to reduce communication between NUMA domains.
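To illustrate why the number of NUMA domains matters for the mapping, here is a minimal sketch, not the algorithm in this PR. It assumes each NUMA domain owns a rectangular block of MPI subdomains and that traffic between NUMA domains scales with the surface area of that block. The function names (`factor_triples`, `block_surface`, `best_numa_block`) and the `cell` parameter are hypothetical and exist only for this example.

```python
def factor_triples(n):
    """Yield every (a, b, c) of positive integers with a * b * c == n."""
    for a in range(1, n + 1):
        if n % a:
            continue
        m = n // a
        for b in range(1, m + 1):
            if m % b == 0:
                yield (a, b, m // b)


def block_surface(shape, cell=(1.0, 1.0, 1.0)):
    """Surface area of a block of px * py * pz subdomains.

    `cell` is the (assumed uniform) edge length of one subdomain in x, y, z.
    """
    px, py, pz = shape
    lx, ly, lz = px * cell[0], py * cell[1], pz * cell[2]
    return 2.0 * (lx * ly + ly * lz + lx * lz)


def best_numa_block(cores_per_node, numa_domains=2, cell=(1.0, 1.0, 1.0)):
    """Pick the block shape per NUMA domain with the smallest surface area.

    The default of 2 domains mirrors the default described in this PR;
    everything else here is an illustrative assumption.
    """
    if numa_domains < 1 or cores_per_node % numa_domains:
        raise ValueError("cores_per_node must be divisible by numa_domains")
    per_domain = cores_per_node // numa_domains
    # Ties are broken by the shape tuple so the result is deterministic.
    return min(factor_triples(per_domain),
               key=lambda s: (block_surface(s, cell), s))


if __name__ == "__main__":
    for domains in (2, 4):
        shape = best_numa_block(48, domains)
        print(domains, shape, block_surface(shape))
```

With more NUMA domains per node, each block holds fewer subdomains, so the block shape that minimizes cross-domain surface changes. That is the kind of trade-off a configurable domain count lets the mapping account for.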
Summary
Related Issue(s)
Author(s)
Mike Brown, Intel
Licensing
By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).
Backward Compatibility
I made the number of NUMA domains a separate keyword for processors, which preserves backward compatibility. The alternative, which might fit the LAMMPS command style more naturally but would break compatibility, is to make it an argument to 'numa'.
Implementation Notes
Post Submission Checklist
Further Information, Files, and Links