New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the NRT use the "unsafe" allocation API by default. #8200
Make the NRT use the "unsafe" allocation API by default. #8200
Conversation
This patch adds to the NRT C API to create equivalent "safe" and "unsafe" functions for allocation associated with meminfo. The NRT context is updated to only use the "safe" functions if `NRT_DEBUG` is enabled. This is to increase performance by only calling `memset(3)` to generate debug markers at allocation time when in `NRT_DEBUG` mode. The "aligned" allocators are updated to calculate alignment offsets avoiding modulo if the alignment is specified as a power of 2. Fixes numba#7887
Locally, using this: import numba as nb
import numpy as np
@nb.njit('void()')
def test():
for i in range(1000):
np.empty(2)
n = 20000
for x in range(n):
test() On |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good in general. A couple of points from OOB discussion:
- At present the safe and unsafe functions are duplicates only differing in the use of
memset
. One possible change is to have only one variant of the functions where thememset()
call is wrapped inNRT_Debug()
. However this would require a rebuild of NRT to use the "safe" versions, whereas the expected use case is to ask a user to use the safe version, which is much easier when the change can be made through a config flag. - Now that the performance of the memset call is not an issue, it would be better to memset the entire allocation, so that unintended uses of the values can be detected for the entire allocation. From OOB discussion I understand you will be making this change.
- I haven't yet got round to measuring performance myself - I'll do that shortly.
- I imagine that in most cases the returned pointer will be aligned, so I think it's reasonable to attempt to avoid the modulo - however, I would like to know if this makes any performance difference.
@@ -220,7 +220,7 @@ def optional_str(x): | |||
# (up to and including IR generation) | |||
DEBUG_FRONTEND = _readenv("NUMBA_DEBUG_FRONTEND", int, 0) | |||
|
|||
# Enable debug prints in nrtdynmod | |||
# Enable debug prints in nrtdynmod and use of "safe" API functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to suggest updating the documentation for NUMBA_DEBUG_NRT
but it appears not to be documented at all. Is it worth adding a note on this to the documentation (in the list of environment variables) and/or in the Debugging Leaks section?
WRT benchmarking, I modified the example slightly to use import numba as nb
import numpy as np
@nb.njit('void()')
def test():
for i in range(1000):
np.empty(2)
def call_test():
n = 20000
for x in range(n):
test()
%timeit call_test() I get:
before and after, which seems to be a reduction in execution time of about 24%. |
I also tried undoing the alignment optimization - for the above benchmark it makes about 1% difference in execution time on an i7-6700K, so I think it's definitely worth keeping considering other machines (small ARM SBCs, etc) will probably have a much worse modulo operation. |
* `memset` full regions with marker bytes when in "safe" mode. * Add docs for env var `NUMBA_DEBUG_NRT`.
Thanks for confirming the benchmark results @gmarkall. Thanks also for reviewing, I've responded to #8200 (review) in be17e75 by |
Docs build is failing, am going to have to merge |
This is to pick up changes in the doc build config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. This now looks good!
This patch adds to the NRT C API to create equivalent "safe" and
"unsafe" functions for allocation associated with meminfo. The
NRT context is updated to only use the "safe" functions if
NRT_DEBUG
is enabled. This is to increase performance by onlycalling
memset(3)
to generate debug markers at allocation timewhen in
NRT_DEBUG
mode. The "aligned" allocators are updated tocalculate alignment offsets avoiding modulo if the alignment is
specified as a power of 2.
Fixes #7887