Query hardware acceleration for accumulates #6

jdinan · 2018-09-12T12:20:51Z

Issue

On some platforms, a subset of accumulate operations, identified by the triple (function, op, datatype), is fast because it is offloaded to hardware, while the rest are slow because they are implemented with active messages.

Proposed Solution

Use the MPI tools interface (MPI_T) to query whether a particular <op, datatype> combination is fast or slow.

jeffhammond · 2018-09-12T12:45:17Z

Aries can perform single-precision floating-point atomics, but only one element at a time, whereas the host CPU can process as many as 32 elements at a time using (non-atomic) SIMD instructions. Because invoking a callback on the CPU has a fixed overhead, which path is faster depends on the message size. The interface design needs to take this into account.

devreal added this to the MPI-5 milestone Mar 1, 2022
