Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Dispatcher to remove unnecessary indirection #6349

Merged
merged 13 commits into from Nov 25, 2020

Conversation

gmarkall
Copy link
Member

@gmarkall gmarkall commented Oct 12, 2020

This PR refactors the dispatcher so that the split between the C and C++ implementations no longer exists - the entire implementation is now in C++. The consensus in recent discussions was that this split is probably an artefact of the constraints of various C and C++ compilers of years past that no longer apply. This refactoring is mostly to remove the indirection the split caused, with one or two more modern idioms applied.

I've left in my commit history whilst making the changes so that the working is apparent. In summary, the changes required:

  • Removal of _dispatcher.h and _dispatcherimpl.cpp.
  • Renaming _dispatcher.c to _dispatcher.cpp
  • Moving the methods and members of the Dispatcher object from _dispatcherimpl.cpp into the DispatcherObject in _dispatcher.cpp - note that this was then renamed to Dispatcher.
  • Where the C proxy methods were used (e.g. dispatcher_resolve) they are replaced with calls to the Dispatcher methods instead (e.g. self->resolve).
  • Replacement of C-style includes (string.h etc.) with C++ ones (cstring etc).
  • Replacement of malloc / free with new / delete operators
  • Some other minor changes to enable compilation of the dispatcher code with a C++ (e.g. casting string literals to (char*) as necessary).
  • Now that _typeof.c is included by a C++ source, it has an extern "C" block added as appropriate.

Associated changes to types of functions to avoid casts are required.
Some casts of malloc returns also added.
Mixed public / private access controls for members results in a
non-standard-layout class. A standard-layout class is required for
offsetof, used in the implementation of the `Dispatcher._can_compile`
member in Python.
This is required in `Dispatcher_members` for the `_can_compile` member.
@gmarkall gmarkall marked this pull request as ready for review October 12, 2020 16:02
gmarkall added a commit to gmarkall/numba that referenced this pull request Oct 13, 2020
From PR numba#6349 / the grm-dispatcher-refactor branch.
This required moving the initialization of `exact_match_required` in
`Dispatcher_call` up before any gotos to avoid a compilation error:

```
numba/_dispatcher.cpp: In function ‘PyObject* Dispatcher_call(Dispatcher*, PyObject*, PyObject*)’:
numba/_dispatcher.cpp:635:1: error: jump to label ‘CLEANUP’ [-fpermissive]
 CLEANUP:
 ^~~~~~~
numba/_dispatcher.cpp:579:22: note:   from here
                 goto CLEANUP;
                      ^~~~~~~
numba/_dispatcher.cpp:586:9: note:   crosses initialization of ‘int exact_match_required’
     int exact_match_required = self->can_compile ? 1 : self->exact_match_required;
         ^~~~~~~~~~~~~~~~~~~~
numba/_dispatcher.cpp:635:1: error: jump to label ‘CLEANUP’ [-fpermissive]
 CLEANUP:
 ^~~~~~~
numba/_dispatcher.cpp:553:18: note:   from here
             goto CLEANUP;
                  ^~~~~~~
numba/_dispatcher.cpp:586:9: note:   crosses initialization of ‘int exact_match_required’
     int exact_match_required = self->can_compile ? 1 : self->exact_match_required;
         ^~~~~~~~~~~~~~~~~~~~
```
@stuartarchibald
Copy link
Contributor

/AzurePipelines run

@azure-pipelines
Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines +147 to +151
if (argct == 0) {
// Nullary function: trivial match on first overload
matches = 1;
selected = 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. If there are multiple definitions, it should be an ambiguous resolution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could there be multiple definitions with no arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok.

btw, there shouldn't be multiple definitions for no arguments. It will have to be a mistake.

@stuartarchibald stuartarchibald added the Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm label Nov 19, 2020
Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for undertaking this refactor, it's remove a lot of indirection that was previously required due to Python 2.7/General C++ cross compatibility issues. The changes look good in general with just a few minor comments to resolve. Once done I'll push it through the build farm. Thanks again!

numba/_dispatcher.cpp Show resolved Hide resolved
numba/_dispatcher.cpp Show resolved Hide resolved
numba/_dispatcher.cpp Outdated Show resolved Hide resolved
numba/_dispatcher.cpp Show resolved Hide resolved
numba/_dispatcher.cpp Show resolved Hide resolved
@stuartarchibald stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review labels Nov 19, 2020
@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Nov 19, 2020
@gmarkall
Copy link
Member Author

@stuartarchibald Many thanks for the review - I believe all comments are now addressed.

Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix-ups, looks good.

@stuartarchibald stuartarchibald added 4 - Waiting on CI Review etc done, waiting for CI to finish and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Nov 19, 2020
@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_137.

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cpu_86.

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cpu_86.

passed

@stuartarchibald
Copy link
Contributor

Buildfarm ID: numba_smoketest_cuda_137.

passed.

@stuartarchibald stuartarchibald added BuildFarm Passed For PRs that have been through the buildfarm and passed and removed Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm labels Nov 23, 2020
@stuartarchibald
Copy link
Contributor

@sklam given the inherent risk involved when this part of the code is changed, did you want to take a look at this too? I've reviewed and the changes seem good, build farm is happy.

Copy link
Member

@sklam sklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good.

I also did a quick benchmark on the dispatcher overhead with the following ipython script:

from numba import njit


@njit
def arity_0():
    return


@njit
def arity_1(x):
    return

@njit
def arity_2(x, y):
    return


@njit
def arity_n(*args):
    return

%timeit arity_0()
%timeit arity_1(1)
%timeit arity_2(1, 2)
%timeit arity_n(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

On master 3d18f14, i'm getting:

62.5 ns ± 0.768 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
135 ns ± 3.04 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
150 ns ± 2.72 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
407 ns ± 10.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

With this patch, I get:

59 ns ± 1.49 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
121 ns ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
149 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
387 ns ± 1.49 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

There's a clear reduction of overhead; esp, for 0-arity function.

@sklam sklam added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on CI Review etc done, waiting for CI to finish labels Nov 25, 2020
@sklam sklam merged commit 92861d0 into numba:master Nov 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants