Query abs(column) == [1, 2] fails #31848

Open
klieret opened this issue Feb 10, 2020 · 4 comments
Labels
Bug expressions pd.eval, query

Comments

@klieret
Contributor

klieret commented Feb 10, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({"col": [1, -1, 2, -2]})

# Works:
df.query("abs(col) == 1")

# Works:
df.query("col == [1, -1, 2, -2]")

# Fails:
df.query("abs(col) == [1, 2]")

traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-42-1b78845d9a47> in <module>()
----> 1 df.query("abs(col) == [1, 2]")

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
   3229         kwargs["level"] = kwargs.pop("level", 0) + 1
   3230         kwargs["target"] = None
-> 3231         res = self.eval(expr, **kwargs)
   3232 
   3233         try:

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   3344         kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
   3345 
-> 3346         return _eval(expr, inplace=inplace, **kwargs)
   3347 
   3348     def select_dtypes(self, include=None, exclude=None) -> "DataFrame":

~/.local/lib/python3.6/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    335         eng = _engines[engine]
    336         eng_inst = eng(parsed_expr)
--> 337         ret = eng_inst.evaluate()
    338 
    339         if parsed_expr.assigner is None:

~/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py in evaluate(self)
     71 
     72         # make sure no names in resolvers and locals/globals clash
---> 73         res = self._evaluate()
     74         return reconstruct_object(
     75             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

~/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py in _evaluate(self)
    112         scope = env.full_scope
    113         _check_ne_builtin_clash(self.expr)
--> 114         return ne.evaluate(s, local_dict=scope)
    115 
    116 

~/.local/lib/python3.6/site-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    832     _numexpr_last = dict(ex=compiled_ex, argnames=names, kwargs=kwargs)
    833     with evaluate_lock:
--> 834         return compiled_ex(*arguments, **kwargs)
    835 
    836 

ValueError: operands could not be broadcast together with shapes (4,) (2,) 

Problem description

A ValueError is raised for the query df.query("abs(col) == [1, 2]").

Expected Output

Same as df.query("col == [1, -1, 2, -2]")

Output of pd.show_versions()


INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.2.final.0
python-bits      : 64
OS               : Linux
OS-release       : 2.6.32-754.23.1.el6.x86_64
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.1
numpy            : 1.17.1
pytz             : 2018.3
dateutil         : 2.6.1
pip              : 19.1.1
setuptools       : 38.5.1
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : 1.7.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.1.1
html5lib         : 0.9999999
pymysql          : None
psycopg2         : None
jinja2           : 2.10
IPython          : 6.2.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.1.1
matplotlib       : 3.1.1
numexpr          : 2.6.9
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : 3.5.1
tabulate         : 0.8.3
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : 0.45.1
@klieret klieret changed the title Query abs(column) == [1, 2] fails Query abs(column) == [1, 2] fails Feb 10, 2020
@MarcoGorelli
Member

MarcoGorelli commented Feb 10, 2020

It throws

ValueError: operands could not be broadcast together with shapes (4,) (2,)

Indeed, your DataFrame is of length 4, and you're comparing it to a list of length 2.

As for why

df.query("abs(col) == 1")

works, see broadcasting
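To illustrate the broadcasting point, here is a minimal NumPy sketch. It only mimics the elementwise comparison that the traceback ends in; it is not what pandas or numexpr run internally, and the variable names are made up for the example.

```python
import numpy as np

col = np.array([1, -1, 2, -2])

# A scalar is broadcast against every element, so this comparison works.
scalar_mask = np.equal(np.abs(col), 1)
print(scalar_mask)

# Shapes (4,) and (2,) are incompatible, so NumPy refuses to broadcast them.
try:
    np.equal(np.abs(col), np.array([1, 2]))
    broadcast_error = None
except ValueError as exc:
    broadcast_error = str(exc)

print(broadcast_error)
```

np.equal is used instead of the == operator because the ufunc raises on mismatched shapes, while the behaviour of == on arrays in this situation has changed between NumPy versions.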

@klieret
Contributor Author

klieret commented Feb 10, 2020

Thanks for your quick reply, @MarcoGorelli

However, this works:

df.query("col == [1, 2]")

returning the first and third row, so this doesn't just seem to be about broadcasting. The column == [...] syntax rather seems to be checking whether the value of the column is in the list (like the isin method).

See also this link: https://jereze.com/fr/code/pandas-query-filter-list/

So having the corresponding example with abs(col) not working still seems like a bug to me.
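A small sketch of the behaviour described above, comparing the column == [...] query against an explicit isin call on the same example DataFrame. The names via_query and via_isin are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, -1, 2, -2]})

# With a plain column on the left, query treats == [...] as a membership test.
via_query = df.query("col == [1, 2]")

# The same selection written with isin.
via_isin = df[df["col"].isin([1, 2])]

print(via_query)
print(via_query.equals(via_isin))
```

Both selections keep the first and third rows, which matches the result reported in this comment.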

@MarcoGorelli
Member

However, this works:

df.query("col == [1, 2]")

Good catch, thanks for the report, @klieret!

@MarcoGorelli
Member

In pandas/core/computation/expr.py, we have:

>>> left
abs(col)
>>> is_term(left)
False

but

>>> left
col
>>> is_term(left)
True

So in the latter case, the op is converted from == to .isin, while in the former case it isn't.
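Until that is addressed, one possible workaround (a sketch, not a fix for the bug itself) is to keep the function call out of the query string, so the left-hand side is either handled outside query or is a plain column. The column name abs_col below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, -1, 2, -2]})

# Option 1: skip query() and call isin on the absolute values directly.
direct = df[df["col"].abs().isin([1, 2])]

# Option 2: store abs(col) in a helper column (hypothetical name "abs_col"),
# so the left-hand side of == is a plain column again, then drop the helper.
helper = (
    df.assign(abs_col=df["col"].abs())
    .query("abs_col == [1, 2]")
    .drop(columns="abs_col")
)

print(direct)
print(direct.equals(helper))
```

Both options select all four rows here, which is the expected output given in the original report.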

@jbrockmendel jbrockmendel added the expressions pd.eval, query label Feb 25, 2020