Feature Request: Create a Global Setting for Enabling numba engine #33966

Closed
jtelleriar opened this issue May 4, 2020 · 6 comments · Fixed by #35182
Labels: API Design, Enhancement, numba (numba-accelerated operations)

Comments

@jtelleriar

Would it be possible to create a global pandas setting that enables the numba engine by default wherever it is supported?

For example, in pandas.DataFrame.apply, .transform, etc.

Thanks!

@jtelleriar added the Enhancement and Needs Triage labels May 4, 2020
@TomAugspurger added the API Design label and removed the Needs Triage label May 4, 2020
@TomAugspurger
Contributor

cc @mroeschke. Seems reasonable.

@jorisvandenbossche
Member

We currently have the pd.options.compute.use_numexpr and pd.options.compute.use_bottleneck options, so the interface could resemble those.
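For reference, a minimal sketch of how those two existing flags are read and toggled today. It only uses the `compute.use_numexpr` and `compute.use_bottleneck` options named above, plus the standard `pd.get_option` and `pd.option_context` helpers; nothing here touches numba.

```python
import pandas as pd

# Read the current values of the existing compute flags.
before = pd.get_option("compute.use_numexpr")
print("use_numexpr:", before)
print("use_bottleneck:", pd.get_option("compute.use_bottleneck"))

# Temporarily switch numexpr off; the previous value is restored on exit.
with pd.option_context("compute.use_numexpr", False):
    inside = pd.get_option("compute.use_numexpr")

after = pd.get_option("compute.use_numexpr")
```

A numba flag modeled on these would presumably be read and set the same way, under whatever name is chosen for it.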

@mroeschke
Member

I am +0 to the idea in theory, but there are some considerations that may not make this entirely convenient for the user.

  • The numba behavior today doesn't have any "fall back" behavior the way numexpr and bottleneck do (I believe those two fall back). This was an intentional decision to make things simpler on our end.

  • Usage-wise, the numba and cython engines are not totally interchangeable. For example, for groupby.transform, the UDF signature needs to be exactly def f(values, index, ...) for engine='numba', while any signature works for engine='cython'. So functions cannot easily be reused between engines.
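To make the signature point concrete, here is a small pure-Python sketch. The helper `has_numba_style_signature` and both UDFs are hypothetical names made up for illustration; this is not how pandas validates UDFs, only a way to show why one function does not fit both engines.

```python
import inspect


def has_numba_style_signature(func):
    """Return True if the first two parameters are named values, index."""
    params = list(inspect.signature(func).parameters)
    return params[:2] == ["values", "index"]


# Shape required for engine='numba': the first two arguments are values and index.
# (In pandas these would be NumPy arrays; plain lists keep the sketch dependency-free.)
def scale_numba_style(values, index, factor=2.0):
    return [v * factor for v in values]


# Shape that is fine for engine='cython': any signature works.
def scale_any_style(group, factor=2.0):
    return group * factor


print(has_numba_style_signature(scale_numba_style))
print(has_numba_style_signature(scale_any_style))
```

The second function would have to be rewritten before it could be passed to the numba engine, which is why flipping a global switch is not transparent for existing code.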

@mroeschke
Member

Just to be clear about the behavior for this feature:

  1. groupby.transform(..., engine='cython') & compute.use_numba = True would use the numba engine
  2. Internally we won't "fall back" to the cython behavior, mainly to reduce code complexity (groupby.transform(..., engine='numba') already doesn't fall back either)
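The two points above can be sketched in plain Python. Everything here is a hypothetical stand-in (`resolve_engine_global_wins`, `run_udf`, and the two `_path` helpers are not pandas internals); the sketch only mirrors the described behavior, where the global flag wins over engine='cython' and a failure on the numba path is not retried on the cython path.

```python
def resolve_engine_global_wins(engine="cython", use_numba=False):
    """Pick the engine as described above: the global flag overrides 'cython'."""
    if engine not in ("cython", "numba"):
        raise ValueError(f"unknown engine: {engine!r}")
    return "numba" if (use_numba or engine == "numba") else "cython"


def _cython_path(func, values):
    # Stand-in for the cython engine: the UDF is called with the values only.
    return func(values)


def _numba_path(func, values):
    # Stand-in for the numba engine: the UDF must accept (values, index).
    index = list(range(len(values)))
    return func(values, index)


def run_udf(func, values, engine="cython", use_numba=False):
    chosen = resolve_engine_global_wins(engine, use_numba)
    # Deliberately no try/except: an error on the numba path propagates
    # instead of silently falling back to the cython path.
    if chosen == "numba":
        return _numba_path(func, values)
    return _cython_path(func, values)


def double(values):
    return [v * 2 for v in values]


print(run_udf(double, [1, 2]))  # default path, cython stand-in
```

Calling `run_udf(double, [1, 2], use_numba=True)` raises a `TypeError` in this sketch, because `double` does not take an `index` argument and nothing retries it on the other path.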

@jorisvandenbossche
Member

> groupby.transform(..., engine='cython') & compute.use_numba = True would use the numba engine

If the user explicitly specifies engine="cython", can't that override the global config? Or why would that be complex (it's not a fallback or anything)?

@mroeschke
Member

engine='cython' is already the default keyword argument for the operations that have a numba option available.
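Because 'cython' is the default value, a function receiving engine='cython' cannot tell whether the user typed it or simply left it out. One possible way around that, offered here as an assumption and not as something decided in this thread, is to default the keyword to None and let only None defer to the global flag. The function name below is hypothetical.

```python
def resolve_engine_with_sentinel(engine=None, use_numba=False):
    """Hypothetical resolution: None means 'not specified, consult the global flag'."""
    if engine is None:
        return "numba" if use_numba else "cython"
    if engine not in ("cython", "numba"):
        raise ValueError(f"unknown engine: {engine!r}")
    # An explicit choice always wins over the global flag.
    return engine


# Engine left unspecified: the global flag decides.
print(resolve_engine_with_sentinel(use_numba=True))
# Engine given explicitly: the global flag is ignored.
print(resolve_engine_with_sentinel("cython", use_numba=True))
```

The trade-off is a change to the keyword's default in every affected method signature, in exchange for letting an explicit engine='cython' override the global setting.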

@mroeschke added the numba label Jul 8, 2020
@jreback added this to the 1.1 milestone Jul 13, 2020
5 participants