Make a common library of GPU enabled containers and algorithms #1539

Open
MrBurmark opened this issue Aug 22, 2023 · 2 comments

MrBurmark commented Aug 22, 2023

Make a new library above RAJA with common views, containers without dynamic storage, and sequential algorithms that work on GPUs, a lot like https://github.com/nvidia/libcudacxx.
These are mainly things that exist in the standard library, like std::array, but that we can't use in device code because they are not marked host device. Another way to think of these is as things that don't take an exec policy like seq_exec/cuda_exec.
Places where it would make sense to add these are camp (https://github.com/LLNL/camp) or DESUL (https://github.com/desul/desul).
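For concreteness, here is a minimal sketch of the kind of thing meant here, assuming a hypothetical `HD` portability macro and `hd_array` name (neither is an existing camp/RAJA API): a std::array-like container whose members are annotated so they can be called from device code.

```cpp
#include <cstddef>

// Hypothetical portability macro: host device under CUDA/HIP compilers, empty otherwise.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Minimal std::array-like container whose members are callable in device code.
template <typename T, std::size_t N>
struct hd_array {
  T data_[N];

  HD constexpr T&       operator[](std::size_t i)       { return data_[i]; }
  HD constexpr const T& operator[](std::size_t i) const { return data_[i]; }
  HD constexpr std::size_t size() const { return N; }
  HD constexpr T*       begin()       { return data_; }
  HD constexpr const T* begin() const { return data_; }
  HD constexpr T*       end()         { return data_ + N; }
  HD constexpr const T* end()   const { return data_ + N; }
};
```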

Things to add to this library.

  1. Stuff from the std library
    a. array
    b. vector?
    c. span
    d. mdspan
    e. sort
    f. scan
    g. binary search
    h. math functions (abs, min, max, sqrt, ...)
    i.
  2. Error handling from Brandon
  3. Stuff from the cuda std library (https://nvidia.github.io/libcudacxx/)
  4. ...

Other things to think about.

  1. Try to put host device requirements into the type system.
    a. Consider having host, host device, and device versions of stuff.
    b. This could allow some seq/par requirements to be checked at compile time in a GPU build.
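As a rough illustration of what putting host/device requirements into the type system could look like (every name below is hypothetical, not an existing RAJA or camp API), containers could carry an execution-space tag that gets checked against the exec policy at compile time:

```cpp
#include <type_traits>

// Hypothetical tags describing where a container's data/operations are usable.
struct host_tag {};
struct device_tag {};
struct host_device_tag {};

// Each exec policy would advertise the space it runs in (seq_exec/cuda_exec
// here are stand-ins, not the real RAJA policy types).
struct seq_exec {};
struct cuda_exec {};

template <typename ExecPolicy> struct exec_space;
template <> struct exec_space<seq_exec>  { using type = host_tag; };
template <> struct exec_space<cuda_exec> { using type = device_tag; };

// A host_device container is compatible with any policy; otherwise the tags must match.
template <typename ContainerSpace, typename PolicySpace>
inline constexpr bool space_compatible_v =
    std::is_same_v<ContainerSpace, host_device_tag> ||
    std::is_same_v<ContainerSpace, PolicySpace>;

// Compile-time check a loop-launch wrapper could perform before using a container.
template <typename ExecPolicy, typename Container>
constexpr void check_space(const Container&) {
  static_assert(space_compatible_v<typename Container::space,
                                   typename exec_space<ExecPolicy>::type>,
                "Container cannot be used with this execution policy");
}
```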
MrBurmark (Member Author)

@trws Here's an idea to potentially reduce code duplication across projects by expanding camp to have more containers/views and algorithms that are commonly used in device code.


adayton1 commented Aug 30, 2023

These are the things we currently use or would use. We have implementations of almost all of these in CARE.

Containers (if needed, these could be views, except for array):

  • Something like std::array
  • Something like std::vector (we don't use push_back)
  • Something like std::map or std::flat_map that gives binary search capability
  • Something like std::set or std::flat_set that gives binary search capability

Algorithms that act on scalars:

  • abs (I know there is fabs, but I would like a templated version that always does the right thing)
  • max/min (again, there are fmin and fmax, but a templated version would be great, especially an initializer list and/or variadic template version to handle any number of scalars) - we definitely want something to get rid of the MAX and MIN macros that many codes have defined; see the sketch after this list
  • swap
  • copysign
  • Templated versions of a lot of math functions would be nice, but perhaps that is out of scope or belongs somewhere else (maybe all of these except swap belong somewhere else)
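A sketch of the kind of macro-free, variadic max helper being asked for (the `HD` macro and `tmax` name are illustrative, not an existing API); a matching min would simply flip the comparison:

```cpp
// Hypothetical host/device annotation, as in the earlier sketch.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Templated max usable from host or device code; all arguments must share one type.
template <typename T>
HD constexpr T tmax(T a, T b) { return (a < b) ? b : a; }

// Variadic overload so any number of scalars can be compared without a macro.
template <typename T, typename... Ts>
HD constexpr T tmax(T a, T b, Ts... rest) { return tmax(tmax(a, b), rest...); }

// Usage: double biggest = tmax(a, b, c, d);
```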

Algorithms that act on arrays (note that these are at the level of a single thread, not launching kernels, so "sequential" I guess):

  • min/max/minmax
  • find/search
  • copy
  • is_sorted
  • binary_search
  • lower_bound
  • upper_bound
  • sort
  • unique
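For these single-thread algorithms, the main gap is simply that the std versions are not device-callable. A minimal device-callable lower_bound, as an illustrative sketch (not an existing CARE/RAJA function):

```cpp
// Hypothetical host/device annotation, as in the earlier sketches.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Single-thread binary search mirroring std::lower_bound, callable in device code.
template <typename Iter, typename T>
HD Iter hd_lower_bound(Iter first, Iter last, const T& value) {
  auto count = last - first;
  while (count > 0) {
    auto step = count / 2;
    Iter it = first + step;
    if (*it < value) {
      first = it + 1;
      count -= step + 1;
    } else {
      count = step;
    }
  }
  return first;
}
```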

Algorithms that act on arrays and do launch kernels:

  • fill
  • copy
  • min/max/minmax
  • find/search
  • count
  • iota
  • accumulate
  • inclusive/exclusive_scan
  • is_sorted
  • sort
  • unique
  • compress (I'm not sure if there is a std algorithm equivalent, but basically, given an array and a list of indices, it makes a new array with only the selected indices or with everything but the given indices; sketched at the end of this comment)
  • sort_pairs

There are probably other algorithms that I'm missing, but this is a pretty core set.
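For the "keep only the selected indices" flavor of compress, here is a rough sketch in terms of RAJA::forall (the free function and its signature are made up for illustration; CARE's actual interface may differ):

```cpp
#include <RAJA/RAJA.hpp>

// Gathers in[indices[i]] into out[i] for each selected index, launching a
// kernel according to ExecPolicy (e.g. a CUDA policy in a GPU build).
// RAJA_HOST_DEVICE marks the lambda so it can run on the device.
template <typename ExecPolicy, typename T>
void compress(const T* in, const int* indices, int num_indices, T* out) {
  RAJA::forall<ExecPolicy>(RAJA::TypedRangeSegment<int>(0, num_indices),
                           [=] RAJA_HOST_DEVICE (int i) {
                             out[i] = in[indices[i]];
                           });
}
```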
