[Feature Request] KOKKOS fix rigid #1624

Open · ag949 opened this issue Aug 6, 2019 · 37 comments
@ag949 commented Aug 6, 2019

Summary

Porting fix rigid to KOKKOS

Detailed Description

In the past I've benefited significantly from using LAMMPS accelerated with KOKKOS. I am trying to run large-scale simulations with rigid bodies on GPUs. Is porting fix rigid planned for the future?

@jaj52 commented Aug 6, 2019

This will be super useful!

@Pakketeretet2 (Collaborator)

I would like to take a look because I am interested in it myself but I would have to bother @stanmoore1 a lot with Kokkos-related questions, since porting a complicated fix like rigid seems to be a lot harder than porting a pair style.
Are you interested in fix rigid or fix rigid/small?

@ag949 (Author) commented Aug 29, 2019

This sounds great, thank you. As you say, it doesn't look trivial. We would be interested in fix rigid.

@stanmoore1 (Contributor)

I'm happy to answer questions about Kokkos.

@Pakketeretet2 (Collaborator)

OK, I will take a look at it. Unfortunately, I am more interested in rigid/small but I'll see what I can do for either. I think rigid/small would be more suitable for GPUs and the like because it seems more similar to what Hoomd-Blue does.
@stanmoore1, should a simple port be reasonably performant or are there special things I should worry about when porting fixes like this? Or should I just port and get it working first, and then worry about performance?

@stanmoore1 (Contributor)

@Pakketeretet2 yes, a simple port is fine, i.e. just thread over loops of atoms. After we have a port we'll worry about performance; I can do some tuning. I used C++11 lambdas extensively in #1650 and it made porting easier since I didn't have to mess with functors or tags.
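
For illustration, a minimal sketch of what "threading over loops of atoms" with a lambda can look like (not code from the actual port; the names d_v, d_f, dtf, and nlocal are placeholders for device views of velocities and forces, a timestep prefactor, and the local atom count):

  // sketch only: a serial loop over local atoms turned into a parallel_for
  Kokkos::parallel_for("FixRigidKokkos::final_integrate", nlocal,
    KOKKOS_LAMBDA(const int i) {
      d_v(i,0) += dtf * d_f(i,0);
      d_v(i,1) += dtf * d_f(i,1);
      d_v(i,2) += dtf * d_f(i,2);
    });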

@Pakketeretet2 (Collaborator)

OK, since fix propel/self is pretty much done from a programming point of view, I have started with this now.

@stanmoore1 (Contributor)

Great, feel free to ping me either here on Github or via email if you have questions.

@Pakketeretet2 (Collaborator)

Will do. For a second I couldn't find the lambdas anywhere but I found them. They look nicer than C++11 lambdas so they were hard to spot. :)

@Pakketeretet2 (Collaborator)

The latest changes with minimize seem to cause linking problems with clang on my laptop, but it might be something dumb on my part. I'll look into it more tomorrow.

@akohlmey (Member)

The latest changes with minimize seem to cause linking problems with clang on my laptop, but it might be something dumb on my part. I'll look into it more tomorrow.

You are building LAMMPS with cmake, right?
There is a small adjustment needed:

$ git diff
  diff --git a/cmake/Modules/Packages/KOKKOS.cmake b/cmake/Modules/Packages/KOKKOS.cmake
  index 0e5bf70a4..cc1e05162 100644
  --- a/cmake/Modules/Packages/KOKKOS.cmake
  +++ b/cmake/Modules/Packages/KOKKOS.cmake
  @@ -17,6 +17,8 @@ if(PKG_KOKKOS)
                            ${KOKKOS_PKG_SOURCES_DIR}/atom_vec_kokkos.cpp
                            ${KOKKOS_PKG_SOURCES_DIR}/comm_kokkos.cpp
                            ${KOKKOS_PKG_SOURCES_DIR}/comm_tiled_kokkos.cpp
  +                         ${KOKKOS_PKG_SOURCES_DIR}/min_kokkos.cpp
  +                         ${KOKKOS_PKG_SOURCES_DIR}/min_linesearch_kokkos.cpp
                            ${KOKKOS_PKG_SOURCES_DIR}/neighbor_kokkos.cpp
                            ${KOKKOS_PKG_SOURCES_DIR}/neigh_list_kokkos.cpp
                            ${KOKKOS_PKG_SOURCES_DIR}/neigh_bond_kokkos.cpp

@Pakketeretet2 (Collaborator) commented Sep 12, 2019 via email

@Pakketeretet2 (Collaborator)

@stanmoore1 In some places the rigid code calls functions in math_extra that expect arrays of (implicit) size three. I am sure Kokkos provides a way to slice an array (subview or something). Two related questions:

  1. These functions are inline. Will it work to just call them from within a Kokkos::parallel_for, or do the functions themselves need porting?
  2. If calling them works, how do I properly grab a sub-part of a Kokkos array as an array? Is that possible? I am looking for the equivalent of angmom[ibody], where angmom is a double**, or (angmom+3*ibody) if angmom is stored as a flat double *.

@stanmoore1 (Contributor)

Any function that is called in a Kokkos::parallel_for needs to be declared with KOKKOS_INLINE_FUNCTION so the GPU __device__ decorators are added. For example, see https://github.com/lammps/lammps/blob/master/src/KOKKOS/math_special_kokkos.h#L53.

If a Kokkos view is 2D LayoutRight, then you can get a 1D slice view using Kokkos::subview(d_view,i,Kokkos::ALL);. For a flat array, you can get a raw pointer using view.data()+offset but that circumvents Kokkos bounds checking.
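
As a sketch of both points (the helper function and the view names below are made up for illustration, not LAMMPS code):

  // any helper called inside a parallel_for needs the device decoration
  KOKKOS_INLINE_FUNCTION
  void scale3(const double s, double *v) {
    v[0] *= s; v[1] *= s; v[2] *= s;
  }

  // inside a kernel: slice one row of a 2D LayoutRight view d_angmom
  auto row = Kokkos::subview(d_angmom, ibody, Kokkos::ALL());
  scale3(0.5, row.data());   // row.data() gives a contiguous double* of length 3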

@stanmoore1 (Contributor)

For fix rigid/small that has the body struct, you can just make a Kokkos View of type body. Then the calls are simply MathExtra::transpose_matvec(rot,d_body[i].angmom,wbody);
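
A rough sketch of that idea (the struct members and names are illustrative, not the actual FixRigidSmall definitions, and DeviceType is assumed to be the usual template parameter):

  struct Body {                       // plain-old-data, so it can live in a View
    double mass;
    double xcm[3], vcm[3], angmom[3];
  };

  Kokkos::View<Body*, DeviceType> d_body("rigid/small/kk:body", nlocal_body);
  // inside a kernel, d_body(i).angmom then behaves like a double[3]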

@stanmoore1 (Contributor)

Actually I'm not 100% sure if MathExtra::transpose_matvec(rot,d_body[i].angmom,wbody); would work, would have to try it out.

@Pakketeretet2 (Collaborator)

OK, before I can actually test any of that, I stumbled upon another thing: rigid uses a bunch of additional storage for center-of-mass positions, velocities, and forces. It looks like there are no associated Kokkos containers or data masks for those yet; at least, I cannot see any masks for them in atom_masks.h nor any containers in atom_kokkos.h.

What is the preferred way to add such things? These arrays only have to be the size of nbody, which will probably be significantly smaller than nlocal.

@stanmoore1 (Contributor)

Just to be clear, are you working on fix rigid or fix rigid/small?

@Pakketeretet2 (Collaborator) commented Sep 13, 2019

fix rigid.
EDIT: I plan to port both, but started with rigid. If you think rigid/small will be significantly easier I can also try that one first.

@stanmoore1 (Contributor) commented Sep 13, 2019

I would add Kokkos dualviews inside fix rigid/kk and call sync/modify directly on them. No need to add them to atom_masks.h. In other words, any array that is declared in fix_rigid.h would have a corresponding Kokkos view in fix_rigid_kokkos.h.
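
Sketched out, that suggestion could look roughly like the following in fix_rigid_kokkos.h (names and the exact view types are illustrative only, not the eventual implementation):

  // one DualView per plain array from fix_rigid.h, sized to nbody
  Kokkos::DualView<double*[3], LMPDeviceType> k_xcm;   // mirrors double **xcm
  Kokkos::DualView<double*[3], LMPDeviceType> k_vcm;   // mirrors double **vcm
  Kokkos::DualView<double*[3], LMPDeviceType> k_fcm;   // mirrors double **fcm

  // usage pattern: sync to the space about to touch the data,
  // mark it modified afterwards
  k_xcm.sync<LMPDeviceType>();
  // ... device kernel reads/writes k_xcm.d_view ...
  k_xcm.modify<LMPDeviceType>();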

@stanmoore1 (Contributor)

@Pakketeretet2 (Collaborator)

Ah, that is brilliant! So, just to make sure I fundamentally understand this: whenever you do a memoryKK->create_array or memoryKK->grow_array, Kokkos wraps that memory in a dualview so that afterwards you have a handle to the arrays on both the host and the device? And then in device code you use d_view and in host code you use h_view, and you just have to make sure to sync it whenever required? Does Kokkos take care of memcpy'ing it to the device from host when you create an array? It seems to but I am not sure.

Do you think it makes sense for me to make a pull request to push my changes to, so you can review the code as we go? That might put more strain on your time than me just bugging you here, though.

Overall I'm pleasantly surprised by how straightforward it seems to port to Kokkos. I will try to take notes that might be useful for me and/or others later.

@Pakketeretet2 (Collaborator) commented Sep 15, 2019 via email

@stanmoore1 (Contributor) commented Sep 16, 2019

whenever you do a memoryKK->create_array or memoryKK->grow_array, Kokkos wraps that memory in a dualview so that afterwards you have a handle to the arrays on both the host and the device?

This is correct. Typically memoryKK is used when you need to access the host pointer in legacy code since it takes the pointer from the host Kokkos view and assigns it to the legacy host pointer. If the array is only going to be accessed on the device, then using Kokkos::DualView directly is preferred since it skips the aliasing part.

And then in device code you use d_view and in host code you use h_view, and you just have to make sure to sync it whenever required?

Correct

Does Kokkos take care of memcpy'ing it to the device from host when you create an array? It seems to but I am not sure.

It is a little complicated. If the host view is marked as modified, it will do the create/grow on the host, otherwise it will default to the device.

However, when I do memoryKK->create_kokkos(k_xcm, xcm, "rigid/kk:xcm") the data in xcm is now all zeros, but the data in the host and device views is still zeros. I assume that is because there has to be an explicit synchronization.

By default Kokkos will initialize a view to all zeros when it is created or the new part when it is resized.

For data that has an atom mask the atomKK->sync and atomKK->modified took care of that, but how does that work for my own custom arrays?

You just call one of these:

k_view.modify<DeviceType>();
k_view.sync<DeviceType>();
k_view.modify<LMPHostType>();
k_view.sync<LMPHostType>();

@stanmoore1 (Contributor)

I suggest creating a work-in-progress PR, and I can review it as I have time.

@Pakketeretet2 (Collaborator)

Am working on it in PR #1680.

The syncing is still not entirely clear to me. Do these calls do the following? If so, the function names make some sense to me; otherwise they don't. :)
k_view.modify<DeviceType>(); --> Mark that the data in k_view was modified on the device
k_view.sync<DeviceType>(); --> Sync the data in k_view on the device (from the host)
k_view.modify<LMPHostType>(); --> Mark that the data in k_view was modified on the host
k_view.sync<LMPHostType>(); --> Sync the data in k_view on the host (from the device)

@stanmoore1 (Contributor)

Yes:

k_view.modify<DeviceType>(); --> Mark that the data in k_view was modified on the device
k_view.sync<DeviceType>(); --> Sync the data in k_view on the device (from the host)
k_view.modify<LMPHostType>(); --> Mark that the data in k_view was modified on the host
k_view.sync<LMPHostType>(); --> Sync the data in k_view on the host (from the device)

You never want to concurrently mark modified on both host and device. Also if you sync but nothing is modified it is just a no-op.

@stanmoore1 (Contributor)

So basically before any kernel where the data is used (Kokkos or legacy), first sync to the execution space (host or device). Then after the data is changed by the kernel, mark it as modified on the corresponding execution space. This will ensure you always have synced data.
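
Put together, the pattern for a custom array could look like this (a sketch with a hypothetical DualView k_xcm over nbody rigid bodies, not code from the actual PR):

  k_xcm.sync<LMPDeviceType>();            // make sure the device copy is current
  auto d_xcm = k_xcm.d_view;              // capture the device view in the lambda
  Kokkos::parallel_for("update_xcm", nbody,
    KOKKOS_LAMBDA(const int ibody) {
      d_xcm(ibody,0) += 0.0;              // placeholder for the real COM update
    });
  k_xcm.modify<LMPDeviceType>();          // device copy is now the newest

  k_xcm.sync<LMPHostType>();              // before legacy host code reads xcm again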

@Pakketeretet2 (Collaborator) commented Sep 23, 2019

Sorry to keep bothering you @stanmoore1,
"It is a little complicated. If the host view is marked as modified, it will do the create/grow on the host, otherwise it will default to the device."
This is still not clear to me. Or rather, it is, I think, but does not answer my original question.
In FixRigid::FixRigid(), some arrays (let's take the body array as example, because that one has me stumped) are allocated and filled with data, on the host. I want to preserve these values but move them to the Kokkos array because I need them inside device kernels. However, because the host array is filled in the base constructor, the data is already in the host array by the time I would call memoryKK->create(). Is there a neat Kokkos function to move the data to the kokkos array at the creation of the Kokkos array? I can work around it using temporary arrays and memcpy's, but there are a lot of arrays and I would like to avoid all that boilerplate. If not, I suppose I can just write a simple function in FixRigid to use and then it can be moved to an appropriate location later.
EDIT: From memory_kokkos.h it seems like there is none. Did you work around this issue in another way or did you simply never encounter it?

The good news is that, when I manually fixed the body array using temporary arrays, I do get the correct numbers in the thermo output for the first step, and porting the initial_integrate and final_integrate functions should be a lot easier than figuring out all the memory stuff.

@stanmoore1 (Contributor)

@Pakketeretet2

In FixRigid::FixRigid(), some arrays (let's take the body array as example, because that one has me stumped) are allocated and filled with data, on the host. I want to preserve these values but move them to the Kokkos array because I need them inside device kernels. However, because the host array is filled in the base constructor, the data is already in the host array by the time I would call memoryKK->create().

Ah yes, this is tricky. Normally I would just override the grow_arrays method, but a virtual call made in the base class constructor will not dispatch to the derived class override. I think this is poor design in FixRigid, so I would suggest moving this memory allocation from the constructor to an init function; then you can override the grow_arrays method like normal.

@Pakketeretet2 (Collaborator)

I think moving initialization to init() makes sense, both conceptually and in that it would solve this specific problem. However, I would not like to touch it at this point because I have not verified yet whether or not the other FixRigid variants depend on some of these flags being set in their own constructors, so for now I will just memcpy the host data to a temporary array and copy it back into the kokkos array.

@stanmoore1 (Contributor) commented Sep 24, 2019

For now you could just temporarily wrap the pointers into a host Kokkos view then copy up to the device, for example see https://github.com/stanmoore1/lammps/blob/fft/src/KOKKOS/fft3d_kokkos.cpp#L91

@stanmoore1 (Contributor) commented Sep 24, 2019

In other words you can use something like

HAT::t_int_1d h_body(body,atom->nlocal);
k_body.h_view = h_body;
k_body.modify<LMPHostType>();
k_body.sync<DeviceType>();

@Pakketeretet2 (Collaborator)

In other words you can use something like

HAT::t_int_1d h_body(body,atom->nlocal);
k_body.h_view = h_body;
k_body.modify<LMPHostType>();
k_body.sync<DeviceType>();

Right. I suppose this works because the host view "takes ownership" of the original array and the view is copied by value. That looks cleaner than what I have now; I will copy it in later.

@Pakketeretet2 (Collaborator) commented Oct 1, 2019

In other words you can use something like

HAT::t_int_1d h_body(body,atom->nlocal);
k_body.h_view = h_body;
k_body.modify<LMPHostType>();
k_body.sync<DeviceType>();

This does not seem to work. Even after a call to modify() and sync(), k_body.d_view.size() is reported to be 0, and a call to memoryKK->grow_kokkos() leads to the data being overwritten with zeros again.

Turns out I need to do this:

  HAT::t_int_1d h_body(body,nmax);
  k_body.modify<LMPHostType>();
  k_body.h_view = h_body;
  k_body.d_view = create_mirror_view(DeviceType(), k_body.h_view);
  k_body.sync<DeviceType>();

Is this because my k_body has not been initialized at this point yet? As in, I cannot do memoryKK->create_kokkos(k_body, body, ...) because that overwrites body, but create_kokkos does seem to "couple" the d_view and h_view together, which I might otherwise have to do explicitly?
Even Kokkos::deep_copy didn't seem to do anything.

@Pakketeretet2 (Collaborator)

So I semi-abandoned the porting of the functions and instead kept everything on the host but with KOKKOS arrays just to make sure the output is identical to the original code. That seems to be the case now for the rigid example, so I will now selectively port the functions using your comments on my PR.
I hope that does not generate too much noise.

@stanmoore1 (Contributor)

I updated #1680 to latest master. This is still on my to-do list.
