Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: Implement two level dispatcher for execute_node #2246

Merged
merged 15 commits into from
Jun 18, 2020

Conversation

icexelloss
Copy link
Contributor

@icexelloss icexelloss commented Jun 12, 2020

What is the change

This PR implements a TwoLevelDispatcher and replace the multipledispatch.Dispatcher with TwoLevelDispatcher to define ibis.pandas.execute_node

The original dispatcher is found to be slow when
(1) dispatch the first execute_node
(2) dispatch after new signature for execute_node is registered (this happens when user uses UDF)

In this alternative dispatcher, neither of these two issue appears. For details, see docstring of TwoLevelDispatcher

How is the change tested

  • ibis/pandas/tests/test_dispatcher.py is added
  • Also replies on existing pandas backend tests.
  • Manually tested tracing and verified tracing still works.

Benchmark

A benchmark is performed by running all the pandas backend tests and record "time to dispatch" for all signatures.

Here is the top 10 slowest dispatch in the original implementation (with the corresponding time in the new implementation):
image

Here is the top 10 slowest dispatch in the new implementation (with the corresponding time in the original implementation):
image

@icexelloss icexelloss changed the title Pandas dispatch FEAT: Implement two level dispatcher for execute_node Jun 12, 2020
@jreback jreback added expressions Issues or PRs related to the expression API pandas The pandas backend labels Jun 12, 2020
@jreback jreback added this to the Next Feature Release milestone Jun 12, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conceptually looks fine. the perf of ops seems slighy slower when they are not the worst case ops (cast float64, series). is this noticeable? can we specifically register UDF ops with the 2-level and leave everything else with the 1-level (does this make it way more complicated)?

  • i think this makes sense to push upstream to multidispatch as well (ok in here until / unless this patch is accepted & released)

ibis/pandas/dispatcher.py Show resolved Hide resolved
@icexelloss
Copy link
Contributor Author

icexelloss commented Jun 15, 2020

I am leaning towards not optimize this further:

  • The performance is good enough. Even for an execution that triggers all of the first 10 slowest dispatches, the overhead is still < 0.2s. It's not perfect but I don't think it's noticeable either.

  • Support both 1-level and 2-level in a single dispatcher does make it more complicated. - for a given signature, we would first try to decide whether it's registered in 1-level or 2-level dispatchers, which by itself can be quite expensive.

  • I don't want to commit to further optimization before researching if other "multi dispatch" libraries are better suited for ibis. There is also multimethod and multimethods (yes, they are different) library that does similar things. This can potentially be more work so I'd like to keep it out of this PR.

del execute_node.funcs[ops.Add, int, MyObject]
del execute_node._meta_dispatcher.funcs[(ops.Add,)].funcs[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to consider a method for this on the dispatcher itself? (other wise this leaks into the impl).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added delitem for this

@icexelloss icexelloss requested a review from jreback June 16, 2020 15:52
@jreback jreback merged commit dcf2966 into ibis-project:master Jun 18, 2020
@icexelloss
Copy link
Contributor Author

Thanks! @jreback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API pandas The pandas backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants