Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the NRT use the "unsafe" allocation API by default. #8200

Merged
merged 3 commits into from Jul 6, 2022

Conversation

stuartarchibald
Copy link
Contributor

This patch adds to the NRT C API to create equivalent "safe" and
"unsafe" functions for allocation associated with meminfo. The
NRT context is updated to only use the "safe" functions if
NRT_DEBUG is enabled. This is to increase performance by only
calling memset(3) to generate debug markers at allocation time
when in NRT_DEBUG mode. The "aligned" allocators are updated to
calculate alignment offsets avoiding modulo if the alignment is
specified as a power of 2.

Fixes #7887

This patch adds to the NRT C API to create equivalent "safe" and
"unsafe" functions for allocation associated with meminfo. The
NRT context is updated to only use the "safe" functions if
`NRT_DEBUG` is enabled. This is to increase performance by only
calling `memset(3)` to generate debug markers at allocation time
when in `NRT_DEBUG` mode. The "aligned" allocators are updated to
calculate alignment offsets avoiding modulo if the alignment is
specified as a power of 2.

Fixes numba#7887
@stuartarchibald
Copy link
Contributor Author

Locally, using this:

import numba as nb
import numpy as np

@nb.njit('void()')
def test():
    for i in range(1000):
        np.empty(2)

n = 20000
for x in range(n):
    test()

On main, fastest time in 10 runs was 4.17s.
On this branch, fastest time in 10 runs was 3.67s, i.e. >10% quicker.

@stuartarchibald stuartarchibald marked this pull request as ready for review June 28, 2022 12:20
@stuartarchibald stuartarchibald added the Effort - medium Medium size effort needed label Jun 28, 2022
@gmarkall gmarkall self-assigned this Jun 28, 2022
@gmarkall gmarkall added this to the Numba 0.57 RC milestone Jun 28, 2022
Copy link
Member

@gmarkall gmarkall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good in general. A couple of points from OOB discussion:

  • At present the safe and unsafe functions are duplicates only differing in the use of memset. One possible change is to have only one variant of the functions where the memset() call is wrapped in NRT_Debug(). However this would require a rebuild of NRT to use the "safe" versions, whereas the expected use case is to ask a user to use the safe version, which is much easier when the change can be made through a config flag.
  • Now that the performance of the memset call is not an issue, it would be better to memset the entire allocation, so that unintended uses of the values can be detected for the entire allocation. From OOB discussion I understand you will be making this change.
  • I haven't yet got round to measuring performance myself - I'll do that shortly.
  • I imagine that in most cases the returned pointer will be aligned, so I think it's reasonable to attempt to avoid the modulo - however, I would like to know if this makes any performance difference.

@@ -220,7 +220,7 @@ def optional_str(x):
# (up to and including IR generation)
DEBUG_FRONTEND = _readenv("NUMBA_DEBUG_FRONTEND", int, 0)

# Enable debug prints in nrtdynmod
# Enable debug prints in nrtdynmod and use of "safe" API functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest updating the documentation for NUMBA_DEBUG_NRT but it appears not to be documented at all. Is it worth adding a note on this to the documentation (in the list of environment variables) and/or in the Debugging Leaks section?

@gmarkall
Copy link
Member

gmarkall commented Jul 5, 2022

WRT benchmarking, I modified the example slightly to use timeit:

import numba as nb
import numpy as np

@nb.njit('void()')
def test():
    for i in range(1000):
        np.empty(2)


def call_test():
    n = 20000
    for x in range(n):
        test()

%timeit call_test()

I get:

1.1 s ± 5.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
833 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

before and after, which seems to be a reduction in execution time of about 24%.

@gmarkall
Copy link
Member

gmarkall commented Jul 5, 2022

I also tried undoing the alignment optimization - for the above benchmark it makes about 1% difference in execution time on an i7-6700K, so I think it's definitely worth keeping considering other machines (small ARM SBCs, etc) will probably have a much worse modulo operation.

* `memset` full regions with marker bytes when in "safe" mode.
* Add docs for env var `NUMBA_DEBUG_NRT`.
@stuartarchibald
Copy link
Contributor Author

Thanks for confirming the benchmark results @gmarkall. Thanks also for reviewing, I've responded to #8200 (review) in be17e75 by memseting the entire allocated/deallocated region with marker bytes and documenting the env var NUMBA_DEBUG_NRT.

@stuartarchibald
Copy link
Contributor Author

Docs build is failing, am going to have to merge main.

This is to pick up changes in the doc build config.
@stuartarchibald stuartarchibald added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review labels Jul 5, 2022
Copy link
Member

@gmarkall gmarkall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. This now looks good!

@gmarkall gmarkall added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Jul 5, 2022
@sklam sklam merged commit 216436c into numba:main Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge Effort - medium Medium size effort needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance of Numba Runtime Allocations
3 participants