Falling back silently to engine='python' whenever numexpr is not installed causes a query call to break more easily #30005

Open
hwalinga opened this issue Dec 3, 2019 · 2 comments

@hwalinga
Contributor

hwalinga commented Dec 3, 2019

query provides an interesting interface for querying data from a pandas DataFrame. It also opens up the possibility of having these queries processed by different kinds of engines. Today the options are 'python' and 'numexpr', with 'numexpr' as the default.

The problem is that numexpr is often not installed, in which case pandas falls back to the 'python' engine. This wouldn't matter if the only difference were speed, but the 'python' engine supports more features than 'numexpr'. As a result, somebody can write code that works perfectly on their own machine (without numexpr) and then unexpectedly breaks on another machine where numexpr happens to be installed. This was the case with issue #29027, for example.

I think such a hidden pitfall should not exist, and there should be some way to prevent this from happening, or to warn the user when it can happen.

Some possibilities I thought about:

  • Make engine='python' the default. Simple and probably the best solution.

  • Keep engine='numexpr' as the default and require an explicit engine='python' whenever numexpr is not installed. (So do not silently fall back.)

  • Alternatively, emit a warning whenever numexpr is not installed, which can be suppressed by setting engine='python' explicitly. (So fall back with sound.)

  • Instead, raise an error whenever numexpr is not installed.

  • Make numexpr a required dependency.

  • Detect whether .query is called with syntax that numexpr does not allow, and emit a warning whenever engine='python' is not set explicitly. I don't know if this is possible. It might also mean more work in the future whenever the parser changes, but it is probably the ideal solution.
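The "fall back with sound" option above could look roughly like the sketch below. It is not pandas code: `resolve_engine` is a hypothetical helper name, and the check only tests whether a `numexpr` module can be found, which is an assumption about how availability would be detected (pandas' own check may differ, for example by also checking the version).

```python
import importlib.util
import warnings


def resolve_engine(engine=None):
    """Pick the engine name to pass to DataFrame.query (hypothetical helper).

    An explicitly requested engine is returned unchanged and never warns.
    With engine=None, 'numexpr' is chosen when the module can be found;
    otherwise the fallback to 'python' is announced with a warning.
    """
    if engine is not None:
        return engine
    # Assumption: "installed" means the top-level module is importable.
    if importlib.util.find_spec("numexpr") is not None:
        return "numexpr"
    warnings.warn(
        "numexpr is not installed; falling back to engine='python'. "
        "Pass engine='python' explicitly to silence this warning.",
        UserWarning,
        stacklevel=2,
    )
    return "python"
```

A caller would then write something like `df.query(expr, engine=resolve_engine())`, so that code relying on the fallback gets a visible hint instead of silently depending on what is installed.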

@jreback
Contributor

jreback commented Dec 3, 2019

the last is what we already do by raising a SyntaxError whenever the query cannot be parsed

it is rather simple to actually fall back from this automatically

@hwalinga
Contributor Author

hwalinga commented Dec 3, 2019

You mean that whenever the parser is called with invalid syntax while the numexpr engine is being used implicitly, pandas would automatically fall back and check whether the python engine works?

That could be a solution. I would still add a warning when it falls back automatically. It is still somewhat awkward, since it will fall back even when the syntax is really wrong and cannot work on either engine.
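A minimal sketch of that "try numexpr, then fall back with a warning" idea follows. It is deliberately independent of pandas internals: `run_with_fallback` and its parameters are hypothetical names, the caller supplies a function that runs the query for a given engine, and falling back only on `SyntaxError` is an assumption taken from the comment above.

```python
import warnings


def run_with_fallback(evaluate, engines=("numexpr", "python"),
                      fallback_on=(SyntaxError,)):
    """Call evaluate(engine) for each engine in order (hypothetical helper).

    Returns the first successful result. If an engine other than the first
    one produced it, a warning names the engine that was actually used.
    If every engine fails, the last error is re-raised.
    """
    if not engines:
        raise ValueError("at least one engine is required")
    last_error = None
    for position, engine in enumerate(engines):
        try:
            result = evaluate(engine)
        except fallback_on as exc:
            # Remember the failure and try the next engine.
            last_error = exc
            continue
        if position > 0:
            warnings.warn(
                f"query failed with engine={engines[0]!r}; "
                f"fell back to engine={engine!r}",
                UserWarning,
                stacklevel=2,
            )
        return result
    # Genuinely invalid syntax fails on every engine and ends up here.
    raise last_error
```

With pandas this could be used as `run_with_fallback(lambda engine: df.query(expr, engine=engine))`. The awkward case mentioned above is still visible: a query that is wrong for both engines is parsed twice before the error reaches the caller.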
