DataFrame.query provides an interesting interface for querying data in a pandas DataFrame. It also makes it possible to have these queries processed by different kinds of engines. Today the options are 'python' and 'numexpr', with 'numexpr' being the default.
The problem is that numexpr is often not installed, in which case pandas silently falls back to the 'python' engine. That would be fine if the only difference were speed, but the 'python' engine supports more features than 'numexpr'. As a result, somebody can write code that works perfectly on their own machine but unexpectedly breaks on another machine where numexpr happens to be installed. This was the case, for example, with issue #29027.
I think such a hidden pitfall should not exist, and there should be some way to prevent this from happening, or at least to warn the user when it can happen.
Some possibilities I thought about:
Make engine='python' the default. Simple and probably the best solution.
Keep engine='numexpr' as the default and require an explicit engine='python' whenever numexpr is not installed. (So do not fall back silently.)
Alternatively, emit a warning whenever numexpr is not installed, which can be suppressed by setting engine='python' explicitly. (So fall back, but with sound.)
Or instead raise an error whenever numexpr is not installed.
Make numexpr a required dependency.
Detect when .query is called with syntax that numexpr does not support, and emit a warning whenever engine='python' is not set explicitly. I don't know if this is possible. It might also mean more work in the future whenever the parser changes, but it is probably the ideal solution.
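To make the "fall back silently", "fall back with sound" and "raise an error" options concrete, here is a minimal sketch of how the engine choice could be resolved. The helper `resolve_engine`, its `on_missing` parameter and the `numexpr_installed` override are all hypothetical names made up for illustration; none of this is pandas' actual code.

```python
import importlib.util
import warnings


def resolve_engine(engine=None, on_missing="warn", numexpr_installed=None):
    """Pick a query engine (hypothetical helper, not part of pandas).

    `on_missing` decides what happens when no engine was requested
    explicitly and numexpr is unavailable: "silent", "warn" or "raise".
    `numexpr_installed` overrides the detection, which is handy in tests.
    """
    if on_missing not in ("silent", "warn", "raise"):
        raise ValueError(f"unknown on_missing policy: {on_missing!r}")

    if engine is not None:
        # An explicit choice is always respected and never warns.
        return engine

    if numexpr_installed is None:
        # find_spec returns None when the top-level package cannot be found.
        numexpr_installed = importlib.util.find_spec("numexpr") is not None

    if numexpr_installed:
        return "numexpr"

    if on_missing == "raise":
        raise ImportError(
            "numexpr is not installed; pass engine='python' explicitly"
        )
    if on_missing == "warn":
        warnings.warn(
            "numexpr is not installed; falling back to engine='python'. "
            "Pass engine='python' explicitly to silence this warning.",
            UserWarning,
            stacklevel=2,
        )
    return "python"
```

With `on_missing="silent"` this mirrors the fallback described above; `"warn"` and `"raise"` correspond to the second group of options. In all three cases, passing `engine='python'` explicitly skips the check entirely.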
You mean something like this: whenever the parser is called with syntax that is invalid for the numexpr engine (used implicitly), automatically fall back and see whether the python engine does work?
That could be a solution. I would still emit a warning when it falls back automatically. It remains somewhat awkward, though, because it will also fall back when the syntax is genuinely wrong and cannot work on either engine.
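A rough sketch of that auto-fallback idea, assuming each engine can be modelled as a plain callable that either returns a result or raises. `evaluate_with_fallback`, `strict_engine` and `permissive_engine` are hypothetical stand-ins, not pandas internals; the toy engines just use `eval` on trusted literals.

```python
import warnings


def evaluate_with_fallback(expr, primary, fallback):
    """Try primary(expr); if it fails, retry with fallback(expr) and warn."""
    try:
        return primary(expr)
    except Exception as primary_error:
        try:
            result = fallback(expr)
        except Exception as fallback_error:
            # Neither engine accepts the expression: surface the error,
            # keeping the primary failure as its cause.
            raise fallback_error from primary_error
        warnings.warn(
            f"primary engine failed ({primary_error}); "
            "the fallback engine was used instead",
            UserWarning,
            stacklevel=2,
        )
        return result


def strict_engine(expr):
    # Toy stand-in for a restricted engine: rejects attribute/method access.
    if "." in expr:
        raise NotImplementedError(f"unsupported syntax: {expr!r}")
    return eval(expr, {"__builtins__": {}}, {})


def permissive_engine(expr):
    # Toy stand-in for a more capable engine.
    return eval(expr, {"__builtins__": {}}, {})
```

Calling `evaluate_with_fallback("'abc'.upper()", strict_engine, permissive_engine)` succeeds with a warning, while an expression such as `"1 +"` is first tried on both engines and only then raises. That second case is the awkward part mentioned above.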