Cannot rely on scoped_current_device_fallback_t #316

eyalroz · 2022-04-13T22:19:51Z

(This was exposed while looking into #313)

In several implementations of API wrapper functions which don't take a context handle, we use the context::current::detail_::scoped_current_device_fallback_t class to make sure we have some, any, current context when performing some operation. Example: cuda::memory::device::typed_set<T>().
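To make the pattern concrete, here is a minimal, self-contained sketch of such a scoped "fallback" guard. It deliberately avoids CUDA and is not the library's actual code: `toy_driver`, `context_id` and `scoped_fallback` are hypothetical names that model a per-thread context stack, and the real class may well behave differently.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the driver's per-thread context stack.
// Hypothetical: this is NOT the CUDA API, just a model of it.
using context_id = int;

struct toy_driver {
    std::vector<context_id> stack;

    bool has_current() const { return !stack.empty(); }
    context_id current() const { return stack.back(); }
    void push(context_id c) { stack.push_back(c); }
    void pop() { stack.pop_back(); }
};

// RAII guard: if no context is current on construction, push a fallback
// context; on destruction, pop it again (only if we pushed it).
class scoped_fallback {
public:
    scoped_fallback(toy_driver& driver, context_id fallback)
        : driver_(driver), pushed_(!driver.has_current())
    {
        if (pushed_) { driver_.push(fallback); }
    }
    ~scoped_fallback()
    {
        if (pushed_) { driver_.pop(); }
    }
    scoped_fallback(const scoped_fallback&) = delete;
    scoped_fallback& operator=(const scoped_fallback&) = delete;

private:
    toy_driver& driver_;
    bool pushed_;
};

// Returns the context that is current while the guard is alive.
context_id context_seen_inside_guard(toy_driver& driver, context_id fallback)
{
    scoped_fallback guard(driver, fallback);
    return driver.current();
}
```

Note that the guard only guarantees that some context is current; it does not care which one, and that is exactly the assumption this issue calls into question.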

Unfortunately, CUDA is crueller than we thought. In some, or all, of these cases it actually requires the specific context in which the relevant handles or addresses were created/allocated, as with our example. That means we have to somehow pass the relevant context (and perhaps device) handle into those functions - as parameters or via wrapper objects.

I am worried we might need to burden the memory region class with a context handle :-( ... and we may even want to hide some of the memory API functions which take raw pointers, or pointers + length only - since these will become quite unwieldy if they always need to take a context wrapper. Will we need to create a context memory member? Anyway, that looks like it might be a rather big change.

The functions using this class are currently:

cuda::memory::host::allocate()
cuda::memory::copy() <- this is the doozy... lots of functions depend on this one. We may also need to split this one into same-context and different-contexts variants.
cuda::memory::pointer::detail_::get_attribute()
cuda::memory::pointer::detail_::get_attributes()

I hope there aren't any more.


eyalroz · 2022-04-15T09:45:05Z

It turns out that the point is not passing the correct context. Rather, it's a combination of two requirements:

Some, any, context must be current
The context in which the allocation was made must not have been destroyed before the allocation was used - even if, supposedly, the allocation is not "context-specific" (e.g. pinned host memory, managed memory).

This affects copying and setting of managed memory as well.
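One way to picture the second requirement is an allocation that holds on to a context reference for its whole lifetime, so that the context cannot disappear while the memory is still in use. The following is a toy model only, with no CUDA calls: `toy_context`, `toy_primary_context_pool` and `toy_host_allocation` are hypothetical names, and `std::shared_ptr` merely stands in for a retain/release style reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model of a context; it exists as long as someone references it.
struct toy_context { int device; };

// Models a device's primary context: created on the first retain(),
// destroyed when the last reference to it is dropped.
class toy_primary_context_pool {
public:
    explicit toy_primary_context_pool(int device) : device_(device) {}

    std::shared_ptr<toy_context> retain()
    {
        auto ctx = slot_.lock();
        if (!ctx) {
            ctx = std::make_shared<toy_context>(toy_context{device_});
            slot_ = ctx;
        }
        return ctx;
    }

    bool alive() const { return !slot_.expired(); }

private:
    int device_;
    std::weak_ptr<toy_context> slot_;
};

// An allocation which is not tied to one particular context (think of
// pinned host memory) still keeps one context reference until it is freed.
class toy_host_allocation {
public:
    toy_host_allocation(toy_primary_context_pool& pool, std::size_t size)
        : keep_alive_(pool.retain()), bytes_(size) {}

    std::size_t size() const { return bytes_.size(); }

private:
    std::shared_ptr<toy_context> keep_alive_; // released when the allocation dies
    std::vector<unsigned char> bytes_;
};
```

In this model the context outlives every allocation that references it, regardless of what other code does with its own references. Making some context current at the point of use (the first requirement) is a separate concern.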

…ext, primary contexts, and ensuring their existence in various circumstances:
* Renamed: `context::current::detail_::scoped_current_device_fallback_t` -> `scoped_existence_ensurer_t` `context::current::detail_::scoped_context_existence_ensurer`
* `context::current::scoped_override_t` now has a ctor which accepts `primary_context_t&&`'s - to hold on to their PC reference which they are about to let go of.
* Moved: `context::current::scoped_override_t` is now implemented in the multi-wrapper implementations directory; consequently:
* Moved the implementations of `module_t::get_kernel()` and `module::create<Creator>` to the multi-wrapper directory, since they use `context::current::scoped_override_t`.
* Added inclusion of `cuda/api/multi_wrapper_impls/module.hpp` to some example code.
* Made a device current in some examples to avoid having no current context when executing certain operations with no wrappers (e.g. memcpy with host-side addresses).
* When allocating managed or pinned-host memory, now increasing the reference count of some context by 1 (choosing the primary context of device 0 since that's the safest), and decreasing it again on destruction. That guarantees that operations involving that allocated memory will not occur with no constructed contexts.
* Corresponding comment changes on the `allocate()` and `free()` methods for pinned-host and managed memory.
* Factored out the code in `context_t::is_primary()` to a function, `cuda::context::current::detail_::is_primary`, which can now also be used via `cuda::context::current::is_primary()`.
* Kernel launch functions now ensure a launch only occurs / is enqueued within a current context (any context).
* Getting the current device now ensures its primary context is also active (which getting an arbitrary device does not do).
* Added doxygen comment for `device::detail_::wrap()` mentioning the primary context reference behavior.


eyalroz added bug task labels Apr 13, 2022

eyalroz self-assigned this Apr 13, 2022

eyalroz mentioned this issue Apr 13, 2022

examples fail on kepler GPU #313

Closed

eyalroz added the resolved-on-development label Apr 15, 2022

eyalroz mentioned this issue Apr 15, 2022

Leaking a current context and a primary context reference unit in scoped_current_device_fallback_t #317

Closed

eyalroz closed this as completed in bb59a97 May 9, 2022
