Use primary context in CUDA GPU code. #3432

benmenadue · 2022-09-05T23:46:48Z

Summary

This patch changes the CUDA context creation inside LAMMPS to instead use
the "primary" context for the requested device; this then makes LAMMPS
compatible with other libraries that also use CUDA via its runtime API.

It's not a complete fix: other libraries that use their own context via the
driver API could still break LAMMPS (e.g. if they switch to their own
context after LAMMPS has initialised its context and don't switch back
before returning). It does at least cover packages that use the CUDA
runtime library.

Unfortunately, making all driver calls in LAMMPS "context aware" might
require more significant changes to the interfaces, as the context may not
be available at the call sites (for example, the _device_alloc functions
in nvd_memory.h don't take a UCL_Device argument).

Related Issue(s)

Fixes: #3431

Author(s)

Ben Menadue, National Computational Infrastructure, Australian National University

Licensing

By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).

Backward Compatibility

These changes should not have any impact on backwards compatibility.

Implementation Notes

As recommended in the CUDA documentation, this patch switches LAMMPS to use cuDevicePrimaryCtxRetain and cuDevicePrimaryCtxRelease instead of cuCtxCreate and cuCtxDestroy.

Post Submission Checklist

Since LAMMPS uses the low-level driver API of CUDA, it needs to ensure that it is in the correct context when invoking such functions. At the moment it creates and switches to its own context inside `UCL_Device::set` but then assumes that the driver is still in that context for subsequent calls into CUDA; if another part of the program uses a different context (such as the CUDA runtime using the "primary" context) this will cause failures inside LAMMPS. This patch changes the context creation to instead use the primary context for the requested device. While it's not perfect, in that it still doesn't ensure that it's in the correct context before making driver API calls, it at least allows it to work with libraries that use the runtime API.
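The failure mode described above can be illustrated with a small toy model. This is only a sketch: `ToyDriver`, `Context`, `runtime_library_call` and `still_in_own_context` are hypothetical names invented for illustration, nothing here calls CUDA or LAMMPS, and the model assumes (as the description above does) that a runtime-API library ends up working in the primary context of the device.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Toy model only: these are NOT CUDA or LAMMPS types.
struct Context { int device; };

class ToyDriver {
 public:
  // Models cuCtxCreate: a brand-new context that also becomes current.
  Context *create(int device) {
    assert(device >= 0);
    unique_.push_back(std::make_unique<Context>(Context{device}));
    current_ = unique_.back().get();
    return current_;
  }

  // Models cuDevicePrimaryCtxRetain: one shared context per device.
  // As with the real call, retaining does not make it current.
  Context *retain_primary(int device) {
    assert(device >= 0);
    auto &slot = primary_[device];
    if (!slot) slot = std::make_unique<Context>(Context{device});
    return slot.get();
  }

  void set_current(Context *ctx) { current_ = ctx; }
  Context *current() const { return current_; }

 private:
  std::vector<std::unique_ptr<Context>> unique_;
  std::map<int, std::unique_ptr<Context>> primary_;
  Context *current_ = nullptr;
};

// Stand-in for a runtime-API library: it works in the primary context.
inline void runtime_library_call(ToyDriver &drv, int device) {
  drv.set_current(drv.retain_primary(device));
}

// True if the application is still in its own context after the
// other library has run, i.e. its next driver call would be safe.
inline bool still_in_own_context(bool use_primary) {
  ToyDriver drv;
  Context *mine = nullptr;
  if (use_primary) {
    mine = drv.retain_primary(0);
    drv.set_current(mine);
  } else {
    mine = drv.create(0);
  }
  runtime_library_call(drv, 0);
  return drv.current() == mine;
}
```

In this model `still_in_own_context(false)` is false, because the other library leaves the primary context current, while `still_in_own_context(true)` is true, because both sides share the same context. A library that switches to its own driver-API context and does not switch back would still break the second case, which matches the caveat above.
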

wmbrownIntel

@benmenadue I think it is OK to make this the default behavior, but please add a preprocessor option for the original behavior (e.g. #ifndef GERYON_UNIQUE_CONTEXT). Thanks!
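A minimal sketch of what such a compile-time switch could look like is below. The macro name is taken from the example in the comment above, so the name used in the merged code may differ. `context_strategy` is a hypothetical helper that only reports which pair of driver calls a build would select; it does not call CUDA.

```cpp
#include <cassert>  // only needed for the usage check shown below
#include <string>

// Hypothetical helper, not a LAMMPS function. Compile with
// -DGERYON_UNIQUE_CONTEXT to select the original behaviour.
inline std::string context_strategy() {
#ifdef GERYON_UNIQUE_CONTEXT
  // Original behaviour: LAMMPS creates and destroys its own context.
  return "cuCtxCreate/cuCtxDestroy";
#else
  // New default: share the device's primary context.
  return "cuDevicePrimaryCtxRetain/cuDevicePrimaryCtxRelease";
#endif
}
```

In a default build (macro not defined), `assert(context_strategy() == "cuDevicePrimaryCtxRetain/cuDevicePrimaryCtxRelease");` holds, so existing users would need to opt in explicitly to get the old per-application context.
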

akohlmey · 2022-09-09T19:14:48Z

> @benmenadue I think it is OK to make this the default behavior, but please add preprocessor option for original behavior (e.g. #ifndef GERYON_UNIQUE_CONTEXT). Thanks!

@wmbrownIntel implemented your request

rbberger requested a review from ndtrung81 September 6, 2022 15:03

akohlmey self-assigned this Sep 6, 2022

akohlmey added the gpu_package and gpu_unit_tests labels Sep 6, 2022

akohlmey added this to the Stable Release Spring 2023 milestone Sep 6, 2022

Merge branch 'develop' into benmenadue/develop

0d2db98

ndtrung81 requested a review from wmbrownIntel September 7, 2022 13:42

akohlmey removed the gpu_unit_tests label Sep 7, 2022

wmbrownIntel reviewed Sep 8, 2022

add preprocessor flags to select between the changed and the old code variant

167abe9

akohlmey added the gpu_unit_tests label Sep 9, 2022

akohlmey requested review from wmbrownIntel and sjplimp September 9, 2022 19:14

akohlmey removed the gpu_unit_tests label Sep 9, 2022

akohlmey self-requested a review September 9, 2022 19:59

akohlmey approved these changes Sep 9, 2022

wmbrownIntel approved these changes Sep 9, 2022

akohlmey merged commit 1364033 into lammps:develop Sep 9, 2022
