New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Dispatcher to remove unnecessary indirection #6349
Conversation
Associated changes to types of functions to avoid casts are required. Some casts of malloc returns also added.
Mixed public / private access controls for members results in a non-standard-layout class. A standard-layout class is required for offsetof, used in the implementation of the `Dispatcher._can_compile` member in Python.
This is required in `Dispatcher_members` for the `_can_compile` member.
From PR numba#6349 / the grm-dispatcher-refactor branch.
This required moving the initialization of `exact_match_required` in `Dispatcher_call` up before any gotos to avoid a compilation error: ``` numba/_dispatcher.cpp: In function ‘PyObject* Dispatcher_call(Dispatcher*, PyObject*, PyObject*)’: numba/_dispatcher.cpp:635:1: error: jump to label ‘CLEANUP’ [-fpermissive] CLEANUP: ^~~~~~~ numba/_dispatcher.cpp:579:22: note: from here goto CLEANUP; ^~~~~~~ numba/_dispatcher.cpp:586:9: note: crosses initialization of ‘int exact_match_required’ int exact_match_required = self->can_compile ? 1 : self->exact_match_required; ^~~~~~~~~~~~~~~~~~~~ numba/_dispatcher.cpp:635:1: error: jump to label ‘CLEANUP’ [-fpermissive] CLEANUP: ^~~~~~~ numba/_dispatcher.cpp:553:18: note: from here goto CLEANUP; ^~~~~~~ numba/_dispatcher.cpp:586:9: note: crosses initialization of ‘int exact_match_required’ int exact_match_required = self->can_compile ? 1 : self->exact_match_required; ^~~~~~~~~~~~~~~~~~~~ ```
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
if (argct == 0) { | ||
// Nullary function: trivial match on first overload | ||
matches = 1; | ||
selected = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. If there are multiple definitions, it should be an ambiguous resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could there be multiple definitions with no arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Note that this was just moved from https://github.com/numba/numba/pull/6349/files#diff-56cb1358ce78e8b6ba8ea05ac6a688da59d236c53f039245b01cb24a15364161L31-L35)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok.
btw, there shouldn't be multiple definitions for no arguments. It will have to be a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for undertaking this refactor, it's remove a lot of indirection that was previously required due to Python 2.7/General C++ cross compatibility issues. The changes look good in general with just a few minor comments to resolve. Once done I'll push it through the build farm. Thanks again!
@stuartarchibald Many thanks for the review - I believe all comments are now addressed. |
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix-ups, looks good.
Buildfarm ID: |
Buildfarm ID: |
passed |
passed. |
@sklam given the inherent risk involved when this part of the code is changed, did you want to take a look at this too? I've reviewed and the changes seem good, build farm is happy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good.
I also did a quick benchmark on the dispatcher overhead with the following ipython script:
from numba import njit
@njit
def arity_0():
return
@njit
def arity_1(x):
return
@njit
def arity_2(x, y):
return
@njit
def arity_n(*args):
return
%timeit arity_0()
%timeit arity_1(1)
%timeit arity_2(1, 2)
%timeit arity_n(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
On master 3d18f14, i'm getting:
62.5 ns ± 0.768 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
135 ns ± 3.04 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
150 ns ± 2.72 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
407 ns ± 10.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
With this patch, I get:
59 ns ± 1.49 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
121 ns ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
149 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
387 ns ± 1.49 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
There's a clear reduction of overhead; esp, for 0-arity function.
This PR refactors the dispatcher so that the split between the C and C++ implementations no longer exists - the entire implementation is now in C++. The consensus in recent discussions was that this split is probably an artefact of the constraints of various C and C++ compilers of years past that no longer apply. This refactoring is mostly to remove the indirection the split caused, with one or two more modern idioms applied.
I've left in my commit history whilst making the changes so that the working is apparent. In summary, the changes required:
_dispatcher.h
and_dispatcherimpl.cpp
._dispatcher.c
to_dispatcher.cpp
Dispatcher
object from_dispatcherimpl.cpp
into theDispatcherObject
in_dispatcher.cpp
- note that this was then renamed toDispatcher
.dispatcher_resolve
) they are replaced with calls to theDispatcher
methods instead (e.g.self->resolve
).string.h
etc.) with C++ ones (cstring
etc).malloc
/free
withnew
/delete
operators(char*)
as necessary)._typeof.c
is included by a C++ source, it has anextern "C"
block added as appropriate.