Make a new library above RAJA with common views, containers without dynamic storage, and sequential algorithms that work on GPUs, much like https://github.com/nvidia/libcudacxx.
These are mainly things that exist in the standard library, like std::array, but that we can't use in device code because they are not marked host device. Another way to think of them: things that don't take an exec policy like seq_exec/cuda_exec.
Places that would make sense for these additions are camp (https://github.com/LLNL/camp) or DESUL (https://github.com/desul/desul).
Things to add to this library:
Stuff from the std library:
a. array
b. vector?
c. span
d. mdspan
e. sort
f. scan
g. binary search
h. math functions (abs, min, max, sqrt, ...)
Try to put host device requirements into the type system.
a. Consider having host, host device, and device versions of stuff.
b. This could allow some seq/par requirements to be checked at compile time in a GPU build, at least to some extent.
@trws Here's an idea to potentially reduce code duplication across projects by expanding camp to have more containers/views and algorithms that are commonly used in device code.
These are the things we currently use or would use. We have implementations of almost all of these in CARE.
Containers: (If needed, these could be views except for array)
Something like std::array
Something like std::vector (we don't use push_back)
Something like std::map or std::flat_map that gives binary search capability
Something like std::set or std::flat_set that gives binary search capability
Algorithms that act on scalars:
abs (I know there is fabs, but I would like a templated version that always does the right thing)
max/min (again, there are fmin and fmax, but templated versions would be great, especially initializer-list and/or variadic-template versions that handle any number of scalars). We definitely want something to get rid of the MAX and MIN macros that many codes have defined.
swap
copysign
Templated versions of a lot of math functions would be nice, but perhaps that is out of scope or belongs somewhere else (maybe all of these except swap belong somewhere else).
Algorithms that act on arrays (note that these are at the level of a single thread, not launching kernels, so "sequential" I guess):
min/max/minmax
find/search
copy
is_sorted
binary_search
lower_bound
upper_bound
sort
unique
Algorithms that act on arrays and do launch kernels:
fill
copy
min/max/minmax
find/search
count
iota
accumulate
inclusive/exclusive_scan
is_sorted
sort
unique
compress (I'm not sure if there is a std algorithm equivalent, but basically, given an array and a list of indices, it makes a new array with only the selected indices, or with everything except the given indices)
sort_pairs
There are probably other algorithms that I'm missing, but this is a pretty core set.