
Testhound/host array performance warning #7064

Merged

Conversation

testhound (Contributor)

This pull request addresses: #6904 (Produce a NumbaPerformanceWarning whenever a kernel is called on a host array).

The PR adds a new environment variable NUMBA_CUDA_WARN_ON_HOST_MEMORY_LAUNCH that determines whether a warning will be emitted.
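For background on what counts as a "host array" here: objects resident in GPU memory advertise the CUDA Array Interface, while plain host containers do not. A minimal sketch of such a check (the names `is_device_array` and `FakeDeviceArray` are illustrative stand-ins, not Numba's internals):

```python
import array

def is_device_array(obj):
    """Return True when `obj` exposes the CUDA Array Interface."""
    # Device-resident arrays advertise this attribute; host containers
    # (NumPy arrays, lists, array.array, ...) do not.
    return hasattr(obj, "__cuda_array_interface__")

class FakeDeviceArray:
    # Stand-in for a real device array; a genuine one would describe
    # its GPU allocation in this dictionary.
    __cuda_array_interface__ = {"shape": (3,), "typestr": "<f8",
                                "data": (0, False), "version": 3}

host = array.array("d", [1.0, 2.0, 3.0])
print(is_device_array(host), is_device_array(FakeDeviceArray()))  # False True
```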

@esc esc added CUDA CUDA related issue/PR 3 - Ready for Review labels May 27, 2021
@esc esc added this to the Numba 0.54 RC milestone May 27, 2021
esc (Member) commented May 27, 2021

@testhound thank you for submitting this, I have added it to the queue for review.

gmarkall (Member) left a comment

Many thanks for the PR!

After a quick first look, I note that the warning can be triggered by copying arrays (or anything that an array can be created from) to the device:

from numba import cuda
cuda.to_device((1, 2, 3))

produces the output:

/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devicearray.py:789: NumbaPerformanceWarning: Host array used in CUDA kernel will incur copy overhead to/from device.
  warn(NumbaPerformanceWarning(msg))

I think this is undesirable - can the implementation be changed so that explicit copies to/from the device don't emit a warning?
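One possible shape for that change (purely a sketch; `auto_device` and the `user_explicit` flag here are hypothetical, not necessarily Numba's actual API) is to thread a flag through the copy path so that only implicit copies warn:

```python
import warnings

class NumbaPerformanceWarning(UserWarning):
    """Local stand-in for numba.core.errors.NumbaPerformanceWarning."""

def auto_device(obj, user_explicit=False):
    # Only implicit copies (e.g. a kernel launch handed a host array)
    # should warn; an explicit cuda.to_device() call would pass
    # user_explicit=True and stay silent.
    if not user_explicit:
        warnings.warn(NumbaPerformanceWarning(
            "Host array used in CUDA kernel will incur copy "
            "overhead to/from device."))
    return obj  # a real implementation would return a device copy
```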

It can also be triggered by setting an item of a device array:

from numba import cuda
arr = cuda.device_array(10)
arr[0] = 2

I'm not as sure as for the first case, but I think this is also undesirable, on the basis that it is more likely to be apparent to the user that setting an item of an array from host code requires a transfer / synchronization. I also think this is a case that occurs far more rarely in practice - new users are often caught out by the performance implications of launching a kernel on a host array, but I have never seen a question involving a performance issue introduced by setting an item of a device array.

A couple of other small points:

  • Tests pass locally for me with hardware and the simulator.

  • test_multithreads triggers the warning when running the testsuite, because it launches a kernel on host arrays - it should probably be modified to not cause the warning. Presently editing it like:

    diff --git a/numba/cuda/tests/cudapy/test_multithreads.py b/numba/cuda/tests/cudapy/test_multithreads.py
    index ea7fa7eee..30afd3eb0 100644
    --- a/numba/cuda/tests/cudapy/test_multithreads.py
    +++ b/numba/cuda/tests/cudapy/test_multithreads.py
    @@ -27,7 +27,7 @@ def check_concurrent_compiling():
             foo[1, 1](x)
             return x
     
    -    arrays = [np.arange(10) for i in range(10)]
    +    arrays = [cuda.to_device(np.arange(10)) for i in range(10)]
         expected = np.arange(10)
         expected[0] += 1
         with ThreadPoolExecutor(max_workers=4) as e:

    still emits the warning because the copy triggers it - however, this should go away once the trigger on copying is fixed.

Review comment on docs/source/reference/envvars.rst (outdated, resolved)
@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review labels May 27, 2021
@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels May 28, 2021
gmarkall (Member) left a comment

Many thanks for the changes! Tests pass locally for me with the hardware and simulator, and warnings are no longer emitted from test_multithreads.

There are a couple of suggested changes on the diff - the test setup seems not to be saving the correct config variable.

Two review comments on numba/cuda/testing.py (outdated, resolved)
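The save/restore pattern being asked for in the test setup might look like the following sketch, where `config` and the `CUDA_WARN_ON_HOST_MEMORY_LAUNCH` attribute are hypothetical stand-ins for Numba's real config module (the attribute name simply mirrors the environment variable minus its NUMBA_ prefix):

```python
import unittest

class FakeConfig:
    # Hypothetical stand-in for numba.core.config.
    CUDA_WARN_ON_HOST_MEMORY_LAUNCH = 0

config = FakeConfig()

class TestWithConfigOverride(unittest.TestCase):
    def setUp(self):
        # Save the *same* variable that is about to be overridden, so
        # tearDown restores the right one.
        self._saved = config.CUDA_WARN_ON_HOST_MEMORY_LAUNCH
        config.CUDA_WARN_ON_HOST_MEMORY_LAUNCH = 1

    def tearDown(self):
        config.CUDA_WARN_ON_HOST_MEMORY_LAUNCH = self._saved

    def test_flag_enabled(self):
        self.assertEqual(config.CUDA_WARN_ON_HOST_MEMORY_LAUNCH, 1)
```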
@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels May 28, 2021
@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Jun 1, 2021
gmarkall (Member) left a comment

Many thanks for the updates - this is looking good, and tests are passing locally for me with hardware and the simulator.

With some manual experimentation I wasn't able to force a warning to appear that wouldn't be expected, or to initiate an implicit host / device transfer for an array without a warning. It still seems to be possible to force a transfer with a host record as a parameter, but as with setting an item, I don't believe that's a common pitfall so I don't think it needs to be addressed - if we do choose to add a mode that makes implicit transfers an error in future, we may want to include that case then.

There are a couple of small suggestions:

  • One or two minor edits to comments / formatting.
  • The test using managed memory needs to be skipped on Windows, because managed memory is not properly supported by Numba on Windows - sometimes it causes crashes which we're yet to get to the bottom of. There are a couple of suggestions on the diff that will skip this test on Windows.
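The Windows skip being suggested could be sketched with `unittest.skipIf` (the test class, test name, and skip message below are illustrative, not the PR's actual diff):

```python
import sys
import unittest

# Managed memory is not reliably supported by Numba on Windows, so the
# managed-memory test is skipped on that platform.
skip_on_windows = unittest.skipIf(
    sys.platform.startswith('win'),
    'Managed memory is not supported on Windows')

@skip_on_windows
class TestManagedMemoryWarning(unittest.TestCase):
    def test_placeholder(self):
        # Placeholder body standing in for the real managed-memory test.
        self.assertEqual(1 + 1, 2)
```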

Review comments: numba/core/config.py (outdated, resolved); numba/cuda/testing.py (outdated, resolved); numba/cuda/tests/cudapy/test_warning.py (one resolved, one outdated and resolved)
stuartarchibald (Contributor)

Re: the test failure - I don't think it can be associated with this patch; the code paths involved are entirely separate. Issue #7136 tracks it. Please just ignore the failing test for now!

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Jun 24, 2021
@gmarkall gmarkall removed the 4 - Waiting on author Waiting for author to respond to review label Jun 24, 2021
@gmarkall gmarkall added 4 - Waiting on CI Review etc done, waiting for CI to finish Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm labels Jun 24, 2021
gmarkall (Member) left a comment

Many thanks for the updates! This still tests out OK for me locally with the hardware and simulator.

@esc / @stuartarchibald could this have a CUDA buildfarm run please?

stuartarchibald (Contributor)

Buildfarm ID: numba_smoketest_cuda_yaml_78.

stuartarchibald (Contributor)

> Buildfarm ID: numba_smoketest_cuda_yaml_78.

Passed.

@stuartarchibald stuartarchibald added 5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed and removed 4 - Waiting on CI Review etc done, waiting for CI to finish Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm labels Jun 25, 2021
stuartarchibald (Contributor) left a comment

Thanks for the patch @testhound. Thanks for reviewing @gmarkall.
