Context managers cannot be used to repeatedly pin/map an existing array: the call to cuMemHostUnregister is delayed by the same mechanism as device memory deallocation, hence in many cases the memory will still be pinned on subsequent context manager invocations, raising CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED.
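Since the original code samples did not survive, here is a host-only sketch of the failure mode (no GPU needed; Driver, Context, AlreadyRegisteredError, and the pinned helper are hypothetical stand-ins for the driver's registration state, Numba's deallocation queue, CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED, and cuda.pinned):

```python
from contextlib import contextmanager

class AlreadyRegisteredError(Exception):
    """Stands in for CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED."""

class Driver:
    """Hypothetical stand-in for the driver's host-registration state."""
    def __init__(self):
        self.registered = set()

    def mem_host_register(self, ptr):
        # cuMemHostRegister fails if the range is already pinned
        if ptr in self.registered:
            raise AlreadyRegisteredError(hex(ptr))
        self.registered.add(ptr)

    def mem_host_unregister(self, ptr):
        self.registered.discard(ptr)

class Context:
    """Mimics the deferred-deallocation queue: finalizers wait here
    until the next flush instead of running immediately."""
    def __init__(self, driver):
        self.driver = driver
        self.pending = []

    def defer(self, finalizer):
        self.pending.append(finalizer)

    def flush(self):
        for finalizer in self.pending:
            finalizer()
        self.pending = []

@contextmanager
def pinned(ctx, ptr, eager=False):
    """Simplified cuda.pinned(); eager=True models unregistering on
    __exit__ instead of routing through the queue."""
    ctx.driver.mem_host_register(ptr)
    try:
        yield
    finally:
        if eager:
            ctx.driver.mem_host_unregister(ptr)
        else:
            ctx.defer(lambda: ctx.driver.mem_host_unregister(ptr))

# Current behaviour: the deferred unregister leaves the range pinned,
# so a second pin of the same buffer fails.
ctx = Context(Driver())
with pinned(ctx, 0x1000):
    pass
try:
    with pinned(ctx, 0x1000):
        pass
except AlreadyRegisteredError:
    print("deferred unregister: ALREADY_REGISTERED on second pin")

# Eager unregister on exit: repeated pinning of the same buffer works.
ctx = Context(Driver())
for _ in range(3):
    with pinned(ctx, 0x1000, eager=True):
        pass
print("eager unregister: repeated pinning OK")
```

Flushing the queue between the two context-manager invocations also clears the stale registration, which matches the observation that the error only occurs "in many cases" rather than always.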
Are there good reasons for routing finalizers that wrap cuMemHostUnregister through the deallocation queue instead of calling them immediately?
And what about cuMemFreeHost, i.e., deallocation of memory that was allocated with cuda.{pinned,mapped}_array? It seems odd that a chunk of host memory has to wait in line together with device-memory objects to be freed; however, I don't think I fully appreciate the implications of these events for asynchronous execution and for system freezes in the case of corrupt contexts.
Thanks for the report; I can reproduce. I think your assessment of the problem is correct, and evidently some thought needs to go into a suitable fix. Thanks for pointing out test cases and providing initial thoughts, very useful.
I think you are right that on exit from with cuda.pinned(arr): the array should be unpinned without delay. The same goes for mapped.
Delayed device deallocation is needed to avoid breaking asynchronous execution, because device arrays can be created automatically and go out of scope in odd places. Pinned and mapped arrays, by contrast, are explicitly created by users, so they won't have the same problem of going out of scope unknowingly. So I don't think they need to have delayed cleanup.
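This asymmetry can be sketched in plain host-side Python (hypothetical names throughout; CPython's refcounting stands in for the unpredictable point at which an implicitly created device array gets collected):

```python
import weakref
from collections import deque

class DeallocQueue:
    """Mimics the deferred-deallocation list: finalizers accumulate
    here and are only invoked at an explicit, known-safe flush point."""
    def __init__(self):
        self._pending = deque()

    def add(self, finalizer):
        self._pending.append(finalizer)

    def flush(self):
        while self._pending:
            self._pending.popleft()()

freed = []  # records which handles have actually been released

class DeviceArray:
    """Implicitly created; may be garbage-collected at an arbitrary
    point (e.g. while a kernel is in flight), so its cleanup is only
    *enqueued* when the object dies, never run directly."""
    def __init__(self, queue, handle):
        self.handle = handle
        # The finalizer must not run driver calls itself; it just
        # hands a release callback to the queue.
        weakref.finalize(self, queue.add, lambda h=handle: freed.append(h))

queue = DeallocQueue()
arr = DeviceArray(queue, "d_ptr_0")
del arr                  # object dies at an "odd place" ...
assert freed == []       # ... but nothing is freed yet: safe for async work
queue.flush()            # a known-safe point, e.g. next allocation
assert freed == ["d_ptr_0"]
```

An explicit context manager like cuda.pinned has no such uncertainty: the user marks the end of the pinned region with the with-block's exit, so the unregister call can run right there instead of taking this detour.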