Make the NRT use the "unsafe" allocation API by default. #8200

stuartarchibald · 2022-06-28T10:30:36Z

This patch adds to the NRT C API to create equivalent "safe" and
"unsafe" functions for allocation associated with meminfo. The
NRT context is updated to only use the "safe" functions if
NRT_DEBUG is enabled. This is to increase performance by only
calling memset(3) to generate debug markers at allocation time
when in NRT_DEBUG mode. The "aligned" allocators are updated to
calculate alignment offsets avoiding modulo if the alignment is
specified as a power of 2.

Fixes #7887

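The power-of-2 alignment shortcut described above can be illustrated in Python. This is only a sketch of the general technique, not the NRT C code; the function names `is_power_of_two` and `alignment_offset` are hypothetical and chosen for illustration.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def alignment_offset(address: int, align: int) -> int:
    """Bytes to add to `address` to reach the next multiple of `align`.

    When `align` is a power of two, the remainder is taken with a bit
    mask instead of the modulo operator; both give the same result.
    """
    if is_power_of_two(align):
        remainder = address & (align - 1)  # cheap mask, no division
    else:
        remainder = address % align        # general fallback
    return 0 if remainder == 0 else align - remainder


if __name__ == "__main__":
    # 0x1003 needs 13 more bytes to reach the next 16-byte boundary.
    print(alignment_offset(0x1003, 16))
    # An address that is already aligned needs no offset.
    print(alignment_offset(0x1000, 16))
```

The mask works because, for a power-of-2 `align`, the low bits selected by `align - 1` are exactly the remainder of dividing by `align`.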

stuartarchibald · 2022-06-28T10:33:40Z

Locally, using this:

import numba as nb
import numpy as np

@nb.njit('void()')
def test():
    for i in range(1000):
        np.empty(2)

n = 20000
for x in range(n):
    test()

On main, fastest time in 10 runs was 4.17s.
On this branch, fastest time in 10 runs was 3.67s, i.e. >10% quicker.

gmarkall

I think this looks good in general. A couple of points from OOB discussion:

* At present the safe and unsafe functions are duplicates only differing in the use of memset. One possible change is to have only one variant of the functions where the memset() call is wrapped in NRT_Debug(). However this would require a rebuild of NRT to use the "safe" versions, whereas the expected use case is to ask a user to use the safe version, which is much easier when the change can be made through a config flag.
* Now that the performance of the memset call is not an issue, it would be better to memset the entire allocation, so that unintended uses of the values can be detected for the entire allocation. From OOB discussion I understand you will be making this change.
* I haven't yet got round to measuring performance myself - I'll do that shortly.
* I imagine that in most cases the returned pointer will be aligned, so I think it's reasonable to attempt to avoid the modulo - however, I would like to know if this makes any performance difference.
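The trade-off between the two allocator variants discussed above can be modelled in a few lines of Python. This is a toy sketch, not the NRT implementation: the function names and the marker byte values `ALLOC_MARKER` and `FREE_MARKER` are assumptions made for illustration, and a `bytearray` stands in for a raw C allocation (Python zero-fills it, whereas C `malloc` leaves memory uninitialised).

```python
ALLOC_MARKER = 0xCB  # hypothetical marker written at allocation time
FREE_MARKER = 0xDE   # hypothetical marker written at deallocation time


def allocate_unsafe(size: int) -> bytearray:
    """Fast path: hand back the buffer without writing any markers."""
    return bytearray(size)


def allocate_safe(size: int) -> bytearray:
    """Debug path: fill the whole allocation with a marker byte so reads
    of never-written memory are recognisable anywhere in the buffer."""
    buf = allocate_unsafe(size)
    buf[:] = bytes([ALLOC_MARKER]) * size
    return buf


def free_safe(buf: bytearray) -> None:
    """Debug path: overwrite the whole region on release so use after
    free shows up as a distinct marker pattern."""
    buf[:] = bytes([FREE_MARKER]) * len(buf)


def pick_allocator(debug_nrt: bool):
    """Choose the variant at run time from a config flag, so switching
    to the "safe" functions needs no rebuild."""
    return allocate_safe if debug_nrt else allocate_unsafe


if __name__ == "__main__":
    buf = pick_allocator(debug_nrt=True)(4)
    print(buf.hex())  # every byte carries the allocation marker
    free_safe(buf)
    print(buf.hex())  # every byte carries the free marker
```

Keeping two separate functions and selecting between them from a flag mirrors the point made in the review: the choice is a run-time configuration decision rather than a compile-time one.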

gmarkall · 2022-07-05T10:30:10Z

numba/core/config.py

@@ -220,7 +220,7 @@ def optional_str(x):
        # (up to and including IR generation)
        DEBUG_FRONTEND = _readenv("NUMBA_DEBUG_FRONTEND", int, 0)

-        # Enable debug prints in nrtdynmod
+        # Enable debug prints in nrtdynmod and use of "safe" API functions


I was going to suggest updating the documentation for NUMBA_DEBUG_NRT but it appears not to be documented at all. Is it worth adding a note on this to the documentation (in the list of environment variables) and/or in the Debugging Leaks section?
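For reference, one way to switch the flag on for a single run is to set the environment variable for a child process. This is a minimal sketch that assumes, as Numba does for its other `NUMBA_*` settings, that the variable is read when `numba` is imported; the helper names `nrt_debug_env` and `run_with_nrt_debug` are hypothetical.

```python
import os
import subprocess
import sys


def nrt_debug_env(base=None):
    """Return a copy of the environment with NUMBA_DEBUG_NRT enabled."""
    env = dict(os.environ if base is None else base)
    env["NUMBA_DEBUG_NRT"] = "1"
    return env


def run_with_nrt_debug(script_path):
    """Run a script in a fresh interpreter so the flag is already set
    when that process imports numba."""
    return subprocess.run([sys.executable, script_path], env=nrt_debug_env())
```

Using a fresh process avoids the question of whether `numba` has already been imported, and therefore already configured, in the current one.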

gmarkall · 2022-07-05T10:54:19Z

WRT benchmarking, I modified the example slightly to use timeit:

import numba as nb
import numpy as np

@nb.njit('void()')
def test():
    for i in range(1000):
        np.empty(2)


def call_test():
    n = 20000
    for x in range(n):
        test()

%timeit call_test()

I get:

1.1 s ± 5.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
833 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

before and after, which seems to be a reduction in execution time of about 24%.
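As a quick check of the arithmetic, the relative reductions quoted in this thread can be recomputed from the reported timings. The helper name `percent_reduction` is just for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in execution time, as a percentage of `before`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Fastest-of-10 timings from the first benchmark: 4.17 s -> 3.67 s
    print(f"{percent_reduction(4.17, 3.67):.1f}%")   # about 12%
    # %timeit means from the second benchmark: 1.1 s -> 833 ms
    print(f"{percent_reduction(1.1, 0.833):.1f}%")   # about 24%
```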

gmarkall · 2022-07-05T11:00:02Z

I also tried undoing the alignment optimization - for the above benchmark it makes about 1% difference in execution time on an i7-6700K, so I think it's definitely worth keeping considering other machines (small ARM SBCs, etc) will probably have a much worse modulo operation.

stuartarchibald · 2022-07-05T11:32:16Z

Thanks for confirming the benchmark results, @gmarkall, and thanks also for reviewing. I've responded to #8200 (review) in be17e75 by `memset`ing the entire allocated/deallocated region with marker bytes and documenting the env var `NUMBA_DEBUG_NRT`.

stuartarchibald · 2022-07-05T11:35:01Z

Docs build is failing, am going to have to merge main.

This is to pick up changes in the doc build config.

gmarkall

Thanks for the updates. This now looks good!

stuartarchibald added the 2 - In Progress label Jun 28, 2022

stuartarchibald mentioned this pull request Jun 28, 2022

Bad multithreaded performance of allocations #8101

Closed

stuartarchibald added 3 - Ready for Review and removed 2 - In Progress labels Jun 28, 2022

stuartarchibald marked this pull request as ready for review June 28, 2022 12:20

stuartarchibald requested review from sklam and esc as code owners June 28, 2022 12:20

stuartarchibald added the "Effort - medium" label Jun 28, 2022

gmarkall self-assigned this Jun 28, 2022

gmarkall added this to the Numba 0.57 RC milestone Jun 28, 2022

gmarkall requested changes Jul 5, 2022

Respond to review comments:

be17e75

* `memset` full regions with marker bytes when in "safe" mode.
* Add docs for env var `NUMBA_DEBUG_NRT`.

Merge 'main' into wip/make_nrt_unsafe_api_default

1da2112

This is to pick up changes in the doc build config.

stuartarchibald added the "4 - Waiting on reviewer" label and removed the "3 - Ready for Review" label Jul 5, 2022

gmarkall approved these changes Jul 5, 2022

gmarkall added the "5 - Ready to merge" label and removed the "4 - Waiting on reviewer" label Jul 5, 2022

sklam merged commit 216436c into numba:main Jul 6, 2022

stuartarchibald mentioned this pull request Jul 8, 2022

Make NRT stats counters optional, off by default #8156

Closed
