New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Testhound/host array performance warning #7064
Testhound/host array performance warning #7064
Conversation
@testhound thank you for submitting this, I have added it to the queue for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the PR!
After a quick first look, I note that the warning can be triggered by copying arrays (or anything that an array can be created from) to the device:
from numba import cuda
cuda.to_device((1, 2, 3))
produces the output:
/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py:789: NumbaPerformanceWarning: Host array used in CUDA kernel will incur copy overhead to/from device.
warn(NumbaPerformanceWarning(msg))
I think this is undesirable - can the implementation be changed so that explicit copies to/from the device don't raise a warning?
It can also be triggered by setting an item of a device array:
from numba import cuda
arr = cuda.device_array(10)
arr[0] = 2
I'm not as sure as for the first case, but I think this is also undesirable, on the basis that it is more likely to be apparent to the user that setting an item of an array from host code requires a transfer / synchronization. I also think this is a case that occurs far more rarely in practice - new users are often caught out by the performance implications of launching a kernel on a host array, but I have never seen a question involving a performance issue introduced by setting an item of a device array.
A couple of other small points:
-
Tests pass locally for me with hardware and the simulator.
-
test_multithreads
triggers the warning when running the testsuite, because it launches a kernel on host arrays - it should probably be modified to not cause the warning. Presently editing it like:diff --git a/numba/cuda/tests/cudapy/test_multithreads.py b/numba/cuda/tests/cudapy/test_multithreads.py index ea7fa7eee..30afd3eb0 100644 --- a/numba/cuda/tests/cudapy/test_multithreads.py +++ b/numba/cuda/tests/cudapy/test_multithreads.py @@ -27,7 +27,7 @@ def check_concurrent_compiling(): foo[1, 1](x) return x - arrays = [np.arange(10) for i in range(10)] + arrays = [cuda.to_device(np.arange(10)) for i in range(10)] expected = np.arange(10) expected[0] += 1 with ThreadPoolExecutor(max_workers=4) as e:
still emits the warning because the copy triggers it - however, this should go away once the trigger on copying is fixed.
… want to implcitly copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the changes! Tests pass locally for me with the hardware and simulator, and warnings are no longer emitted from test_multithreads
There are a couple of suggested changes on the diff - the test setup seems not to be saving the correct config variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the updates - this is looking good, and tests are passing locally for me with hardware and the simulator.
With some manual experimentation I wasn't able to force a warning to appear that wouldn't be expected, or to initiate an implicit host / device transfer for an array without a warning. It still seems to be possible to force a transfer with a host record as a parameter, but as with setting an item, I don't believe that's a common pitfall so I don't think it needs to be addressed - if we do choose to add a mode that makes implicit transfers an error in future, we may want to include that case then.
There are a couple of small suggestions:
- One or two minor edits to comments / formatting.
- The test using managed memory needs to be skipped on Windows, because managed memory is not properly supported by Numba on Windows - sometimes it causes crashes which we're yet to get to the bottom of. There are a couple of suggestions on the diff that will skip this test on Windows.
RE test failure, I don't think it can be associated with this patch, the code paths involved are entirely separate. Issue #7136 tracks. Please just ignore the failing test for now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the updates! This still tests out OK for me locally with the hardware and simulator.
@esc / @stuartarchibald could this have a CUDA buildfarm run please?
Buildfarm ID: |
Passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch @testhound . Thanks for reviewing @gmarkall .
This pull request addresses: #6904 (Produce a NumbaPerformanceWarning whenever a kernel is called on a host array).
The PR adds a new environment variable NUMBA_CUDA_WARN_ON_HOST_MEMORY_LAUNCH that determines whether a warning will be emitted.