Atomics with Serial Backend - Default should be Disable? #549

nmhamster · 2016-11-21T23:35:11Z

I understand some of the background to why we have atomics enabled for the serial backend, but I think the default should be that atomic operations are disabled and result in standard loads/stores when the serial backend is used. This would assume a Kokkos CMake/make option to re-enable them if needed. Given that a good number of codes still use the serial backend on MPI-only platforms, this seems like unnecessary overhead. Removing it would enable more optimization and possibly vectorization for this case.

hcedwar · 2016-11-21T23:45:01Z

This would only be viable for a Serial-only configuration. Otherwise, the cost of detecting whether we are within a parallel region or not would swamp the savings.

nmhamster · 2016-11-21T23:47:08Z

@hcedwar I think you tell people that if they are using the serial backend, we won't compile thread-safe atomics etc. in there unless they use the override. There are definitely Trilinos builds which are serial-only.

mhoemmen · 2016-11-21T23:50:22Z

What if people use Kokkos::Serial in thread-parallel tasks?

crtrott · 2016-11-21T23:59:18Z

What Carter meant was that if we enable both OpenMP and Serial, atomics are still enabled.
If Serial is the ONLY enabled backend, then we shortcut the atomics.

mhoemmen · 2016-11-22T03:35:01Z

Would it make sense for Kokkos::atomic_* to be functions of the execution space? Users would write, for example,

execution_space::atomic_add (&x[i], 42.0);

or for an execution space instance,

instance.atomic_add (&x[i], 42.0);
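
A minimal sketch of what this proposal might look like. SerialSketch, ThreadedSketch, sketch_lock and accumulate_sketch are hypothetical names invented for illustration, not Kokkos classes or functions, and a mutex stands in for a real hardware atomic so that the example stays in standard C++11:

```cpp
#include <cassert>
#include <mutex>

// One process-wide lock standing in for a hardware atomic (assumption of this sketch).
inline std::mutex& sketch_lock() {
  static std::mutex m;
  return m;
}

// Hypothetical stand-in for a serial execution space; NOT the Kokkos class.
struct SerialSketch {
  // Only one thread can ever touch dest, so a plain update is enough.
  template <class T>
  static void atomic_add(T* dest, T val) { *dest += val; }
};

// Hypothetical stand-in for a threaded execution space; NOT the Kokkos class.
struct ThreadedSketch {
  // Several threads may update dest, so the read-modify-write is guarded.
  template <class T>
  static void atomic_add(T* dest, T val) {
    std::lock_guard<std::mutex> guard(sketch_lock());
    *dest += val;
  }
};

// Generic code names the space once; the matching atomic_add is picked at compile time.
template <class ExecSpace>
double accumulate_sketch(int n) {
  double x = 0.0;
  for (int i = 0; i < n; ++i) ExecSpace::atomic_add(&x, 42.0);
  return x;
}
```

Because atomic_add is a static member in this sketch, both spellings from the comment would compile: ExecSpace::atomic_add(&x, 42.0) and, given an object of the space type, instance.atomic_add(&x, 42.0).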

hcedwar · 2016-11-22T18:05:06Z

This would deviate way too much from the ISO/C++ standard, and from where the C++20 standard is headed for allowing atomic operations on non-std::atomic types.

Best we can do is when Kokkos is configured with ONLY the Kokkos::Serial back-end that the atomic implementations decay to non-atomic operations. This is still high-risk because an application may be using pthreads or OpenMP unknown to Kokkos, then use Kokkos atomics, then get silent data corruption, and then blame Kokkos.

Would be best to have a Kokkos configuration option that says each program process is guaranteed to be internally serial (a.k.a., MPI only for parallelism) so atomic operations should decay to non-atomic operations. Perhaps, KOKKOS_USING_SERIAL_PROCESS
Naming must conform to how issue #150 is resolved.
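
A rough sketch of how such a configuration option could make an atomic decay to a plain update. The macro name is the one floated above and is defined by hand here purely for illustration; the namespace serial_process_sketch, the helper lock() and the mutex-based fallback are assumptions of this sketch, not Kokkos code:

```cpp
#include <cassert>
#include <mutex>

// Name as floated in the comment above; defined here by hand for illustration.
// A real build would set it (or not) at configure time.
#define KOKKOS_USING_SERIAL_PROCESS 1

namespace serial_process_sketch {

// Stand-in for a real hardware atomic; only used when the macro is absent.
inline std::mutex& lock() {
  static std::mutex m;
  return m;
}

// Returns the previous value, like a fetch-and-add.
template <class T>
T atomic_fetch_add(T* dest, T val) {
#if !defined(KOKKOS_USING_SERIAL_PROCESS)
  // Thread-aware path: guard the read-modify-write.
  std::lock_guard<std::mutex> guard(lock());
#endif
  // With the macro defined, this is an ordinary load, add, and store.
  T old = *dest;
  *dest = old + val;
  return old;
}

}  // namespace serial_process_sketch
```

The risk described above applies directly: if the macro is set while the application uses pthreads or OpenMP behind Kokkos's back, the plain path races silently.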

crtrott · 2016-11-22T18:36:59Z

Yeah, this is what I was thinking as well. And I agree with Carter that the default should be safe. So the option is to turn off atomics if and only if Serial is the only configured backend.

nmhamster · 2016-11-22T18:39:42Z

Can we make that the default if only serial is configured so that all the MPI only codes running at the lab (which is quite a few right now) benefit?

crtrott · 2016-11-22T19:56:28Z

So you think the scenario Carter brought up and I talked about earlier is unlikely enough to make it worth the risk?

mhoemmen · 2016-11-22T20:38:36Z

Internal apps might very well enable OpenMP (the "one configuration to rule them all" approach), so in practice, it might be harder than this to turn off atomic updates.

nmhamster · 2016-11-23T14:55:01Z

In general I think we should try to remove the overheads for MPI only configurations as much as possible. We are paying thousands of cycles which are not needed at all in the common (MPI only) case. My gut feeling is to say that if you enable Serial we make no guarantees unless you provide an override to force us to compile thread aware operations.

mhoemmen · 2016-11-23T21:00:35Z

@nmhamster wrote:

In general I think we should try to remove the overheads for MPI only configurations as much as possible.

I agree :-) The issue is that the internal apps that want MPI only to perform well, may actually turn on OpenMP or Pthreads in Kokkos, even though they may not be using it. I'm not sure if we need to worry about that case, though. If they bother to turn on OpenMP, it's their fault if they don't use it. I don't think Kokkos can reasonably figure out globally, at configuration or compile time, whether or not users actually use an enabled execution space.

ibaned · 2016-11-25T19:11:12Z

Is there any way to disable atomics currently? If not, it would be nice to have the option, though I don't know about defaults. In my case, I get Valgrind false positives from Kokkos atomic assign operations in Tpetra in serial runs (it compares an uninitialized value to itself). Being able to compile without atomics would make debugging easier.

mhoemmen · 2016-11-25T22:29:58Z

@ibaned wrote:

In my case, I get Valgrind false positives from Kokkos atomic assign operations in Tpetra in serial runs (it compares an uninitialized value to itself).

If Kokkos::Serial is Tpetra's execution space, Tpetra does not use atomic_assign by default. Thus, these warnings are truly spurious. If Kokkos::OpenMP is Tpetra's execution space, but OpenMP is only using 1 thread, then Tpetra does use atomic_assign by default.

hcedwar · 2017-01-11T19:25:39Z

One avenue for pursuing issue #607

stanmoore1 · 2017-10-09T23:00:53Z

Right now in LAMMPS and SPARTA we check if there is only a single thread and no GPUs, and if so we template out all the Kokkos atomics manually, which is a huge pain, but has to be done for performance. I wish Kokkos provided a more convenient way to do this and atomics were automatically disabled for the Serial backend.
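
For readers unfamiliar with the pattern, here is a small sketch of "templating out" atomics by hand. It is not the LAMMPS or SPARTA code; add_force, scatter_forces, run_scatter and force_lock are hypothetical names, and a mutex stands in for the real atomic:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Stand-in for a hardware atomic (assumption of this sketch).
inline std::mutex& force_lock() {
  static std::mutex m;
  return m;
}

// NeedAtomic is a compile-time flag, so the <false> instantiation contains
// only the plain update and the compiler is free to optimize it.
template <bool NeedAtomic>
void add_force(double* f, double val) {
  if (NeedAtomic) {
    std::lock_guard<std::mutex> guard(force_lock());
    *f += val;
  } else {
    *f += val;
  }
}

// Hypothetical kernel: scatter-add a contribution into per-atom forces.
template <bool NeedAtomic>
void scatter_forces(std::vector<double>& force,
                    const std::vector<std::size_t>& idx, double val) {
  for (std::size_t i : idx) add_force<NeedAtomic>(&force[i], val);
}

// The runtime check happens once, outside the hot loop.
inline void run_scatter(std::vector<double>& force,
                        const std::vector<std::size_t>& idx, double val,
                        int num_threads, bool on_gpu) {
  if (num_threads == 1 && !on_gpu) {
    scatter_forces<false>(force, idx, val);
  } else {
    scatter_forces<true>(force, idx, val);
  }
}
```

The pain point is that every kernel using atomics needs the extra template parameter and a dispatch like run_scatter, which is the bookkeeping the comment above wishes Kokkos handled automatically.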

ibaned · 2017-10-09T23:04:42Z

Moving this out of Backlog and onto an upcoming deadline will put it on our radar and get it considered again. The October release is happening now, so it's on the December list.

ibaned · 2017-10-11T18:50:51Z

So, the VPIC code would also like this feature.

mhoemmen · 2017-10-11T21:15:52Z

It still confuses me why atomic operations are not a function on / of the execution space, but whatever ;-P.

ibaned · 2017-10-11T21:19:54Z

why atomic operations are not a function on / of the execution space

Since the functions don't take any parameters indicating the execution space, I think they have no fast way to determine which space they're in, and that is what prevents this. I suppose if they were templated on the execution space this would be trivial to implement.

mhoemmen · 2017-10-12T04:10:37Z

@ibaned There's also no way to implement an MPI 3 RDMA back-end without atomics being a function / templated on the execution space. Even if the standard doesn't go that way, I wish Kokkos would. The two least idiomatic parts of Kokkos for me are the KOKKOS_*FUNCTION macros and atomics, precisely because they have no customization hooks.

C++ right now doesn't have a way to customize function attributes (e.g., to make them a function of a template parameter), so we're stuck with the macros. However, we could make atomics take an execution or memory space instance argument (that would cover the MPI RDMA case, among others).

ibaned · 2017-10-12T12:48:43Z

C++ right now doesn't have a way to customize function attributes (e.g., to make them a function of a template parameter), so we're stuck with the macros

Yep, this is unfortunate indeed.

However, we could make atomics take an execution or memory space instance argument

I see no issue with doing the following: change atomic functions to take execution space in some way, and create backwards compatibility wrappers that call DefaultExecutionSpace variants.
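
A sketch of that idea, under stated assumptions: the namespace compat_sketch, the tag types Serial and Threads, and the alias DefaultExecutionSpace are hypothetical stand-ins defined locally (not the Kokkos types), and a mutex again stands in for a real atomic:

```cpp
#include <cassert>
#include <mutex>

namespace compat_sketch {

// Hypothetical tag types standing in for execution spaces.
struct Serial {};
struct Threads {};

// Assumption: the default space would be fixed at configure time.
using DefaultExecutionSpace = Serial;

inline std::mutex& lock() {
  static std::mutex m;
  return m;
}

// New-style overloads: the execution space is an explicit argument.
template <class T>
void atomic_add(Serial, T* dest, T val) {
  *dest += val;  // single thread, plain update
}

template <class T>
void atomic_add(Threads, T* dest, T val) {
  std::lock_guard<std::mutex> guard(lock());
  *dest += val;
}

// Backwards-compatibility wrapper: old two-argument call sites keep compiling
// and forward to the default execution space's variant.
template <class T>
void atomic_add(T* dest, T val) {
  atomic_add(DefaultExecutionSpace{}, dest, val);
}

}  // namespace compat_sketch
```

Existing code calling atomic_add(&x, 1.5) would be unchanged, while new code could opt in with atomic_add(Threads{}, &x, 2.5).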

mhoemmen · 2017-10-12T22:39:52Z

@ibaned I would be happy to add the atomics overloads myself, if OK with the Kokkos folks.

ibaned · 2017-10-12T23:01:04Z

@mhoemmen I'd be wary of spending much time on it, esp. if @hcedwar or @crtrott want a different design.

[kokkos#607] [kokkos#549] A few details:

- Accepting volatile pointers was necessary for compatibility with existing calls which pass in volatile pointers, hence the const_cast.
- Special implementations of atomic_increment were needed to get equal performance in the one application I tested (it was doing its own serial special cases before).
- Compilers have a harder time matching templates as opposed to overloads, so some call sites had to be modified to specify the scalar type explicitly.
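
To illustrate the first and third details, here is a small sketch. It is not the code from the commit; atomic_add_serial_sketch and deduction_demo are hypothetical names, and the function shows only the general shape of a serial-only template that accepts a volatile pointer:

```cpp
#include <cassert>

// Serial-only sketch: accept volatile T* so older call sites still compile,
// then cast volatile away so the update is an ordinary load/add/store.
template <class T>
void atomic_add_serial_sketch(volatile T* dest, T val) {
  T* plain = const_cast<T*>(dest);
  *plain += val;
}

inline double deduction_demo() {
  // The object itself is NOT declared volatile; only the pointer type is.
  // Writing through a non-volatile pointer to an object that was defined
  // volatile would be undefined behavior.
  double x = 0.0;
  volatile double* p = &x;

  // T is deduced as double from both arguments.
  atomic_add_serial_sketch(p, 1.0);

  // atomic_add_serial_sketch(p, 1);  // would not compile: T deduced as both double and int

  // Naming the scalar type explicitly resolves the conflict; 1 converts to 1.0.
  atomic_add_serial_sketch<double>(p, 1);

  return x;
}
```

With a set of non-template overloads, the literal 1 would simply convert to double; with a single template, both arguments take part in deduction and must agree, which is presumably why some call sites needed the explicit scalar type.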

ibaned · 2018-02-20T22:40:54Z

I've just implemented this, will open a PR shortly.

nmhamster added the Enhancement and Question labels Nov 21, 2016

ibaned mentioned this issue Nov 25, 2016

Tpetra::MultiVector pack & unpack uses atomic ops even for Kokkos::Serial trilinos/Trilinos#883

Closed

hcedwar added this to the Backlog milestone Feb 15, 2017

ibaned modified the milestones: Backlog, 2017 December Oct 9, 2017

hcedwar modified the milestones: 2017 December, 2018 February Nov 1, 2017

hcedwar assigned dsunder Nov 1, 2017

hcedwar modified the milestones: 2017 December, 2018 February Nov 29, 2017

ibaned assigned ibaned and unassigned dsunder Feb 7, 2018

ibaned mentioned this issue Feb 20, 2018

Serial-only atomics implementation #1426

Merged

crtrott added the InDevelop label and removed the Question label Feb 24, 2018

ndellingwood closed this as completed Mar 7, 2018

ndellingwood mentioned this issue Mar 8, 2018

Kokkos + KokkosKernels Promotion To Version 2.6.00 trilinos/Trilinos#2351

Merged
