Dataframe.query uses numexpr even when 'compute.use_numexpr' is False #32556

Closed
teto opened this issue Mar 9, 2020 · 12 comments · Fixed by #42668

@teto

teto commented Mar 9, 2020

Code Sample, a copy-pastable example if possible

# Your code here
import pandas as pd
import io

content = io.StringIO('''
time,val
212.23, 32
''')

pd.set_option('compute.use_numexpr', False)
print("use numexpr? %d" % pd.get_option('compute.use_numexpr'))

df = pd.read_csv(
    content,
    sep=',',
    usecols=['time', 'val'],
    dtype= {'val': 'Int64'},
    # parse_dates=date_cols,
)
q = "val == 32"
# passing `engine="python"` fixes it
df_dest = df.query(q, )

This throws with pandas 1.0:

Traceback (most recent call last):
  File "test.py", line 21, in <module>
    df_dest = df.query(q, )
  File "/nix/store/0yyjswv58jk0fv1miyklmppvpd0p45sy-python3.7-pandas-1.0.1/lib/python3.7/site-packages/pandas/core/frame.py", line 3231, in query
    res = self.eval(expr, **kwargs)
  File "/nix/store/0yyjswv58jk0fv1miyklmppvpd0p45sy-python3.7-pandas-1.0.1/lib/python3.7/site-packages/pandas/core/frame.py", line 3346, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/nix/store/0yyjswv58jk0fv1miyklmppvpd0p45sy-python3.7-pandas-1.0.1/lib/python3.7/site-packages/pandas/core/computation/eval.py", line 337, in eval
    ret = eng_inst.evaluate()
  File "/nix/store/0yyjswv58jk0fv1miyklmppvpd0p45sy-python3.7-pandas-1.0.1/lib/python3.7/site-packages/pandas/core/computation/engines.py", line 73, in evaluate
    res = self._evaluate()
  File "/nix/store/0yyjswv58jk0fv1miyklmppvpd0p45sy-python3.7-pandas-1.0.1/lib/python3.7/site-packages/pandas/core/computation/engines.py", line 114, in _evaluate
    return ne.evaluate(s, local_dict=scope)
  File "/nix/store/k6wr7w16q0aq9mfkf0w15wxlhx76sxqc-python3.7-numexpr-2.7.1/lib/python3.7/site-packages/numexpr/necompiler.py", line 822, in evaluate
    zip(names, arguments)]
  File "/nix/store/k6wr7w16q0aq9mfkf0w15wxlhx76sxqc-python3.7-numexpr-2.7.1/lib/python3.7/site-packages/numexpr/necompiler.py", line 821, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "/nix/store/k6wr7w16q0aq9mfkf0w15wxlhx76sxqc-python3.7-numexpr-2.7.1/lib/python3.7/site-packages/numexpr/necompiler.py", line 703, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object

Problem description

DataFrame.query fails when using UInt* dtypes; see #25369 (comment).
My first idea for working around this was to disable numexpr with pd.set_option('compute.use_numexpr', False), but that is not enough: query still goes through numexpr unless engine="python" is passed explicitly in the call.
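
The workaround described above can be sketched as follows. This is a minimal illustration that reuses the data from the report; it only shows the explicit `engine="python"` argument and does not change how the global option behaves.

```python
import io

import pandas as pd

# Same data as in the report, with a nullable integer column
content = io.StringIO("time,val\n212.23,32\n")
df = pd.read_csv(content, usecols=["time", "val"], dtype={"val": "Int64"})

# Selecting the pure-Python engine explicitly keeps numexpr out of the call,
# regardless of the value of 'compute.use_numexpr'
df_dest = df.query("val == 32", engine="python")
print(df_dest)
```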

@glemaitre
Contributor

I stumbled into this issue as well. I am adding a minimal example which can be used as a non-regression test:

import pandas as pd
pd.set_option("compute.use_numexpr", False)
df = pd.DataFrame(
    {"A": [True, False, True, False, None, None],
     "B": [1, 2, 3, 4, 5, 6]}
)
df.query('A.isnull()')

@glemaitre
Contributor

glemaitre commented May 4, 2020

After checking #33006, I am not sure what the right fix is, or what the original semantics of compute.use_numexpr were meant to be. The docs are quite general about the impact of the option; maybe it was intended to apply to evaluate only. That said, it would be really handy to have an option that forces the engine globally for the query method.

@jreback Since you commented on the PR, maybe you have some insight.
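
Until such a global switch exists, one user-side approximation is a thin wrapper that consults the option before calling query. The helper name `query_respecting_option` is hypothetical and not part of pandas; this is a sketch of the idea, assuming an explicit `engine` argument should still take precedence.

```python
import pandas as pd


def query_respecting_option(df, expr, **kwargs):
    """Hypothetical helper: call df.query, forcing the Python engine
    when 'compute.use_numexpr' is False and no engine was given."""
    if "engine" not in kwargs and not pd.get_option("compute.use_numexpr"):
        kwargs["engine"] = "python"
    return df.query(expr, **kwargs)


df = pd.DataFrame(
    {"A": [True, False, True, False, None, None],
     "B": [1, 2, 3, 4, 5, 6]}
)

# option_context restores the previous option value on exit
with pd.option_context("compute.use_numexpr", False):
    result = query_respecting_option(df, "A.isnull()")
print(result)
```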

@quangngd
Contributor

quangngd commented May 7, 2020

> I stumbled into this issue as well. I am adding a minimal example which can be used as a non-regression test:
>
> import pandas as pd
> pd.set_option("compute.use_numexpr", False)
> df = pd.DataFrame(
>     {"A": [True, False, True, False, None, None],
>      "B": [1, 2, 3, 4, 5, 6]}
> )
> df.query('A.isnull()')

@glemaitre
This test could be fixed by either fixing this issue or #25369.
If the dtype bug is fixed (so that numexpr works), the test would silently pass while this issue remains, i.e. numexpr would still be used even though pd.set_option("compute.use_numexpr", False) has been set.

@jreback jreback added this to the Contributions Welcome milestone Nov 26, 2020
@rhshadrach
Member

rhshadrach commented Nov 27, 2020

The following demonstrates the issue and raises on master:

import pandas as pd
from pandas.core.computation.eval import _check_engine

# With numexpr disabled, the default engine should resolve to 'python'
with pd.option_context('compute.use_numexpr', False):
    result = _check_engine(None)
    assert result == 'python'

Can resolve this as in #33006 with the above code as a test.
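
The behavior that test expects can be sketched in isolation. The function `resolve_engine` below is a hypothetical stand-in written for illustration; it is not pandas' `_check_engine` and not the patch from #33006. It assumes the default engine should be numexpr only when numexpr is both installed and enabled by the option.

```python
from typing import Optional

VALID_ENGINES = ("numexpr", "python")


def resolve_engine(
    engine: Optional[str], numexpr_installed: bool, use_numexpr: bool
) -> str:
    """Hypothetical sketch of default-engine selection that honors the option."""
    if engine is None:
        # Fall back to the Python engine unless numexpr is available AND enabled
        return "numexpr" if (numexpr_installed and use_numexpr) else "python"
    if engine not in VALID_ENGINES:
        raise KeyError(f"Invalid engine {engine!r}, valid engines are {VALID_ENGINES}")
    return engine


# numexpr installed but disabled through the option: default is 'python'
assert resolve_engine(None, numexpr_installed=True, use_numexpr=False) == "python"
# an explicitly requested engine is left untouched
assert resolve_engine("numexpr", numexpr_installed=True, use_numexpr=False) == "numexpr"
```

Testing the resolved engine name, rather than the result of a query, avoids the problem raised earlier in the thread where a fix to the dtype bug would let a query-based test pass silently.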

@yashhd

yashhd commented Feb 8, 2021

Hello, I'm Yash and I've been asked to contribute to an open source project for my Software Development course. I'd be interested to work on this issue and it'd be great if I could be assigned to this. Thank you.

@jreback
Contributor

jreback commented Feb 8, 2021

you can assign yourself with /take

@yashhd

yashhd commented Feb 11, 2021

@jreback I tried doing that but I'm still unable to assign myself.

@jreback
Contributor

jreback commented Feb 12, 2021

@yashhd

yashhd commented Feb 18, 2021

@jreback Thank you so much. I'll have a look.

@yashhd

yashhd commented Feb 25, 2021

take

@taytzehao
Contributor

take

@taytzehao taytzehao removed their assignment Apr 12, 2021
@saehuihwang
Contributor

take

@jreback jreback modified the milestones: Contributions Welcome, 1.4 Jul 26, 2021