Query hardware acceleration for accumulates #6

Open
jdinan opened this issue Sep 12, 2018 · 1 comment

@jdinan

jdinan commented Sep 12, 2018

Issue

On some platforms, a subset of accumulate operations (function, op, datatype) is fast (offloaded) and the rest are slow (active messages).

Proposed Solution

Use the MPI tools interface to query whether a particular <op, datatype> combination is fast or slow.
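
To make the shape of such a query concrete, here is a minimal sketch in plain C. Everything in it is an assumption for illustration: the `acc_op`, `acc_type`, and `acc_path` enums, the `query_accumulate_path` function, and the table contents are hypothetical and are not part of MPI or the MPI tools interface. A real interface would presumably take `MPI_Op` and `MPI_Datatype` handles and get its answers from the implementation.

```c
/* Hypothetical stand-ins for MPI_Op and MPI_Datatype handles. */
typedef enum { ACC_OP_SUM, ACC_OP_PROD, ACC_OP_MAX, ACC_OP_COUNT } acc_op;
typedef enum { ACC_TYPE_INT32, ACC_TYPE_FLOAT, ACC_TYPE_DOUBLE, ACC_TYPE_COUNT } acc_type;

/* The two paths named in the issue. Zero is the slow path, so any
 * combination not listed in the table defaults to "active message". */
typedef enum { ACC_PATH_ACTIVE_MESSAGE = 0, ACC_PATH_OFFLOADED = 1 } acc_path;

/* Placeholder capability table. The entries are invented for this
 * sketch and do not describe any real network hardware. */
static const acc_path capability[ACC_OP_COUNT][ACC_TYPE_COUNT] = {
    [ACC_OP_SUM] = {
        [ACC_TYPE_INT32] = ACC_PATH_OFFLOADED,
        [ACC_TYPE_FLOAT] = ACC_PATH_OFFLOADED,
    },
    [ACC_OP_MAX] = {
        [ACC_TYPE_INT32] = ACC_PATH_OFFLOADED,
    },
};

/* Report which path a given <op, datatype> pair would take.
 * Out-of-range arguments are treated conservatively as the slow path. */
acc_path query_accumulate_path(acc_op op, acc_type type)
{
    if ((unsigned)op >= (unsigned)ACC_OP_COUNT ||
        (unsigned)type >= (unsigned)ACC_TYPE_COUNT) {
        return ACC_PATH_ACTIVE_MESSAGE;
    }
    return capability[op][type];
}
```

A caller could use the result to choose between issuing the accumulate directly and restructuring the communication, for example by checking `query_accumulate_path(ACC_OP_SUM, ACC_TYPE_FLOAT) == ACC_PATH_OFFLOADED` before entering a hot loop.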

@jeffhammond
Member

Aries can do single-precision floating-point atomics, but only one element at a time, whereas the host CPU can do as many as 32 ops at a time using non-atomic SIMD instructions. Given the fixed overhead of invoking a callback on the CPU, which path is faster depends on the message size. This needs to be taken into account when designing the interface.
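
The message-size dependence can be sketched with a simple linear cost model. This is only an illustration under stated assumptions: the `acc_cost_model` struct, its field names, and the functions below are hypothetical, and any timing values plugged in are placeholders rather than measurements of Aries or any CPU. The model charges the NIC a per-element cost and charges the CPU a fixed callback overhead plus a per-vector cost.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost parameters; all values are supplied by the caller. */
typedef struct {
    double nic_ns_per_element; /* one offloaded atomic, one element at a time */
    double cpu_callback_ns;    /* fixed overhead to invoke the CPU callback */
    double cpu_ns_per_vector;  /* one non-atomic SIMD operation */
    size_t simd_width;         /* elements per SIMD operation (32 in the comment) */
} acc_cost_model;

/* Offloaded path: cost grows linearly with the element count. */
double nic_time_ns(const acc_cost_model *m, size_t n)
{
    return (double)n * m->nic_ns_per_element;
}

/* Active-message path: fixed overhead plus ceil(n / simd_width) vector ops. */
double cpu_time_ns(const acc_cost_model *m, size_t n)
{
    size_t vectors = (n + m->simd_width - 1) / m->simd_width;
    return m->cpu_callback_ns + (double)vectors * m->cpu_ns_per_vector;
}

/* True when the offloaded path is predicted to be strictly faster. */
bool offload_is_faster(const acc_cost_model *m, size_t n)
{
    return nic_time_ns(m, n) < cpu_time_ns(m, n);
}
```

With placeholder values of 100 ns per offloaded element, a 1000 ns callback overhead, 10 ns per vector, and a width of 32, this model prefers offload up to 10 elements and the CPU from 11 elements on. The point is that a yes/no answer per <op, datatype> pair cannot express such a crossover, so a query interface may also need a count argument or a threshold.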

@devreal added this to the MPI-5 milestone Mar 1, 2022