Add Kokkos support for atom sorting on device #3740

stanmoore1 · 2023-04-19T18:53:25Z

Summary

Currently, atom sorting is done on the host CPU when using Kokkos. This is expensive because all of the atom data must be moved and the sort itself is performed in serial. This PR adds Kokkos support for sorting atoms on the device.
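For readers unfamiliar with the operation, here is a minimal sketch of spatial atom sorting, assuming a simple scheme where atoms are assigned to cubic bins and reordered by bin index so that nearby atoms sit next to each other in memory. It is not the LAMMPS or Kokkos implementation; the function name `spatial_sort` and the binning details are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def spatial_sort(positions: List[Vec3], box_lo: Vec3, box_hi: Vec3,
                 binsize: float) -> List[int]:
    """Return a permutation that orders atoms by the spatial bin they fall in.

    Hypothetical illustration only: bins are laid out on a regular grid
    and numbered with x varying fastest.
    """
    # Number of bins per dimension (at least one).
    nbins = [max(1, int((hi - lo) / binsize)) for lo, hi in zip(box_lo, box_hi)]

    def bin_index(pos: Vec3) -> int:
        idx = []
        for d in range(3):
            frac = (pos[d] - box_lo[d]) / (box_hi[d] - box_lo[d])
            i = int(frac * nbins[d])
            # Clamp atoms that sit on or slightly outside the box boundary.
            idx.append(min(max(i, 0), nbins[d] - 1))
        ix, iy, iz = idx
        return (iz * nbins[1] + iy) * nbins[0] + ix

    # sorted() is stable, so atoms in the same bin keep their relative order.
    return sorted(range(len(positions)), key=lambda i: bin_index(positions[i]))


positions = [(9.0, 9.0, 9.0), (0.5, 0.5, 0.5), (9.5, 9.5, 9.5), (0.1, 0.1, 0.1)]
order = spatial_sort(positions, (0.0, 0.0, 0.0), (10.0, 10.0, 10.0), binsize=5.0)
```

In this toy case the two atoms near the origin and the two atoms near the far corner each end up adjacent in the new ordering. A device version would compute the bin indices and apply the permutation in parallel rather than in a serial loop.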

Related Issue(s)

None

Author(s)

Stan Moore (SNL)

Licensing

By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).

Backward Compatibility

Yes

stanmoore1 · 2023-04-19T20:33:19Z

For 8M LJ atoms running for 1000 timesteps on a single V100 GPU, this PR gives a ~25% speedup when sorting every 100 timesteps with the new code, compared with the old code and its default of sorting every 1000 timesteps. Part of the speedup is due to a better sorting binsize, and the other part is due to sorting more often on the device, which improves cache access.

stanmoore1 · 2023-04-19T20:42:24Z

This gives a marginal speedup for the 256k Rhodo benchmark, but only when using the old default sorting binsize of 6 Angstrom instead of 12; otherwise it gives a slowdown. Not sure how to get a better heuristic for the sorting binsize.

stanmoore1 · 2023-04-19T20:42:43Z

@weinbe2 @arghdos

stanmoore1 · 2023-04-19T20:48:02Z

Looks like I need to run performance tests to see if the majority of styles get a speedup or slowdown with the change in default sorting binsize.

stanmoore1 · 2023-04-19T21:07:55Z

3 regression tests are failing on the GPU; will debug those too.

sjplimp · 2023-04-19T22:48:25Z

@stanmoore1 My recollection is that the optimal sort frequency was also different on CPU vs GPU? Is that a default setting that should be different between the two modes?

stanmoore1 · 2023-04-19T22:50:12Z

@sjplimp good point, we should consider changing the default from 1000 to 100, will do some tests.

stanmoore1 · 2023-04-19T23:08:07Z

Regressions appear to be false positives related to changing binsize affecting the pRNG in fix langevin and fix gle.
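A minimal sketch of why a change in atom order can shift such results without indicating a bug, assuming (this is an assumption, not something stated in the thread) that the random numbers are drawn from a single stream in local atom order. The helper `draw_per_atom` is hypothetical and is not LAMMPS code.

```python
import random


def draw_per_atom(atom_ids, seed):
    """Draw one random number per atom, consuming the stream in list order."""
    rng = random.Random(seed)
    return {atom_id: rng.random() for atom_id in atom_ids}


# Same seed and same atoms, but a different local ordering after sorting.
original = draw_per_atom([1, 2, 3, 4], seed=12345)
resorted = draw_per_atom([3, 1, 4, 2], seed=12345)
```

Both runs consume the same sequence of random numbers, but each atom receives a different one, so trajectories diverge numerically even though the statistics are unchanged.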

stanmoore1 · 2023-04-20T18:50:04Z

The sorting binsize change seems to help large LJ but hurts most other benchmarks like ReaxFF, Tersoff, EAM, and Rhodo so I will revert.

stanmoore1 · 2023-04-20T19:58:17Z

Looks like sorting every 100 helps LJ but slows down other benchmarks too.

stanmoore1 · 2023-04-20T20:30:27Z

This is ready to merge from my POV. Noted that LJ can get a speedup from using a different binsize and sorting every 100 timesteps, but this won't be the default behavior since it hurts other benchmarks.

stanmoore1 · 2023-04-20T20:39:39Z

Actually seeing a slowdown for ReaxFF, need to double check

sjplimp · 2023-04-20T20:43:54Z

@stanmoore1 I don't understand if the atom_modify sort settings affect this new on-GPU sorting. If they do, it seems like there should be info added to the atom_modify doc page to explain how to adjust these settings for both CPUs and GPUs, and also what the defaults are for both. Note that the atom_modify settings are for bin size and frequency.

stanmoore1 · 2023-04-20T20:45:54Z

@sjplimp no I reverted all the changes to the atom_modify sort settings since they cause performance regressions for some benchmarks, keeping the old defaults for now. So the only change in this PR is to sort atoms on the GPU instead of CPU.

sjplimp · 2023-04-20T21:04:15Z

@stanmoore1 Hi Stan - my point is that the atom_modify sort doc page has no discussion of CPU vs GPU, including for the defaults. If you expect the user to do something different for GPUs, then that should be explained, even if it is just guidance on what to try for GPUs. Are you saying the user should not do anything different in their input script for GPUs (for sorting), and that the code should now simply run faster (for sorting) b/c it is being done on the GPU? However you also seem to be saying that sometimes sorting on the GPU is slower than on the CPU? Which doesn't make that much sense to me.

stanmoore1 · 2023-04-20T21:37:10Z

The atom_modify sort defaults are sortfreq = 1000 and binsize = cutneighmax/2.0. For the LJ benchmark on GPUs, using sortfreq = 100 and binsize = cutneighmax is better but that hurts performance of other benchmarks like Tersoff on the GPU so we can't make that the default for GPUs. I could mention this in the docs.
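The two settings can be made concrete with a small sketch. The helper names are hypothetical, and `count_sorts` assumes a sort happens on every timestep that is a multiple of sortfreq, which should be read as an upper bound rather than as the exact LAMMPS scheduling logic.

```python
def default_sort_binsize(cutneighmax: float) -> float:
    """Default sorting bin size quoted in the thread: cutneighmax / 2."""
    return cutneighmax / 2.0


def count_sorts(nsteps: int, sortfreq: int) -> int:
    """Upper bound on sorts in a run, assuming one sort every sortfreq steps."""
    if sortfreq <= 0:
        return 0  # treat a non-positive frequency as sorting disabled
    return nsteps // sortfreq


# A cutneighmax of 12 Angstrom gives the 6 Angstrom default bin size;
# the LJ-friendly alternative discussed above uses binsize = cutneighmax.
default_bin = default_sort_binsize(12.0)
lj_bin = 12.0

# Over a 1000-step run, the two sort frequencies compared in this thread.
sorts_default = count_sorts(1000, 1000)
sorts_frequent = count_sorts(1000, 100)
```

The 6 vs 12 Angstrom values match the bin sizes mentioned for the Rhodo benchmark earlier in the thread, under the assumption that its cutneighmax is 12 Angstrom.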

sjplimp · 2023-04-20T22:00:36Z

If you mean atom_modify sort, then yes, I think that is a good thing to do, and that is the best place to do it. You should probably also mention it on the Kokkos package page under 7.4 Accelerator packages in the manual.

stanmoore1 · 2023-04-20T22:59:32Z

@sjplimp will do

stanmoore1 · 2023-04-27T17:14:17Z

The performance regression I saw with ReaxFF is related to #3756.

stanmoore1 · 2023-04-27T20:44:29Z

This PR is ready to merge from my POV, but should be merged after #3756 since I already merged those changes into this PR.

stanmoore1 · 2023-04-27T20:46:14Z

Actually forgot to update the docs, will do that now.

athomps

I approve.

Add Kokkos support for atom sorting on device

5cb3d15

stanmoore1 added enhancement kokkos_package labels Apr 19, 2023

stanmoore1 self-assigned this Apr 19, 2023

stanmoore1 requested a review from sjplimp as a code owner April 19, 2023 18:53

stanmoore1 added 2 commits April 19, 2023 12:56

Need to set var

f5e55bb

Update docs

cf2e55f

stanmoore1 marked this pull request as draft April 19, 2023 19:24

stanmoore1 added 4 commits April 19, 2023 13:31

whitespace

b58368d

Merge branch 'develop' of https://github.com/lammps/lammps into kk_sort

b7ea2cc

Add missing BinOp struct

28d31de

Fix typo

313b3a6

stanmoore1 marked this pull request as ready for review April 19, 2023 20:24

Revert binsize change

b511681

stanmoore1 assigned akohlmey and unassigned stanmoore1 Apr 20, 2023

stanmoore1 requested a review from athomps April 20, 2023 20:31

Revert docs

7c7e626

stanmoore1 assigned stanmoore1 and unassigned akohlmey Apr 20, 2023

akohlmey added this to the Stable Release Summer 2023 milestone Apr 24, 2023

Merge branch 'develop' of github.com:lammps/lammps into kk_sort

89aa45e

stanmoore1 mentioned this pull request Apr 27, 2023

Fix bug in atom sorting with triclinic boxes #3756

Merged

Merge branch 'triclinic_sort' of github.com:stanmoore1/lammps into kk…

4705f46

…_sort

Fix small issue

7791ab7

stanmoore1 force-pushed the kk_sort branch from 7962ff1 to 7791ab7 Compare April 27, 2023 17:26

stanmoore1 assigned akohlmey and unassigned stanmoore1 Apr 27, 2023

stanmoore1 assigned stanmoore1 and unassigned akohlmey Apr 27, 2023

stanmoore1 added 2 commits April 27, 2023 15:17

Add a couple notes to the docs

50adf2b

Small tweak to docs

b17f9ac

stanmoore1 assigned akohlmey and unassigned stanmoore1 Apr 27, 2023

athomps approved these changes Apr 28, 2023

sjplimp approved these changes Apr 29, 2023

Merge branch 'develop' into kk_sort

e679936

akohlmey merged commit 41a0196 into lammps:develop May 1, 2023
4 checks passed
