df.query() does not support column name 'class' #18221

Closed
michaelaye opened this Issue Nov 10, 2017 · 3 comments

michaelaye commented Nov 10, 2017

Code Sample, a copy-pastable example if possible

indices_to_plot = df.query('class>0')

Problem description

Above code results in this error traceback:

Traceback (most recent call last):

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-33-6e077c50ac68>", line 2, in <module>
    indices_to_plot = df.query('class>0')

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/site-packages/pandas/core/frame.py", line 2297, in query
    res = self.eval(expr, **kwargs)

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/site-packages/pandas/core/frame.py", line 2366, in eval
    return _eval(expr, inplace=inplace, **kwargs)

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 290, in eval
    truediv=truediv)

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 732, in __init__
    self.terms = self.parse()

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 749, in parse
    return self._visitor.visit(self.expr)

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 310, in visit
    node = ast.fix_missing_locations(ast.parse(clean))

  File "/Users/klay6683/miniconda3/envs/stable/lib/python3.6/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)

  File "<unknown>", line 1
    class >0
          ^
SyntaxError: invalid syntax

My column names are "occ_id, class, et, radius, lon, width, type" and if I execute this query on another column, it works fine:

indices_to_plot = df.query('et>0')

Only the column named 'class' seems to fail.

Expected Output

Sub selection of the dataframe according to the query.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.27.3
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: 0.9.6
IPython: 6.2.1
sphinx: 1.6.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


chris-b1 commented Nov 10, 2017

While the error message could be better, I'm not sure this is something we can support (easily): pandas and numexpr use the Python parser to evaluate these expressions, and class is of course a reserved word in Python.
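To illustrate the point about the parser, here is a minimal sketch using only the standard library. The helper name `parses_as_expression` is made up for this example and is not part of pandas; it simply asks Python's own parser whether a query string is a valid expression, which is roughly the step that fails in the traceback above.

```python
import ast
import keyword


def parses_as_expression(expr):
    """Return True if Python's parser accepts `expr` as an expression.

    Hypothetical helper for illustration only; not a pandas function.
    """
    try:
        ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    return True


# 'class' is a reserved word, so it can never be parsed as a plain name.
print(keyword.iskeyword("class"))
print(keyword.iskeyword("et"))

# The same comparison parses for 'et' but not for 'class'.
print(parses_as_expression("et > 0"))
print(parses_as_expression("class > 0"))
```

Plain boolean indexing such as `df[df['class'] > 0]` passes the column name as an ordinary string rather than as expression text, so it is not affected by this restriction.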


michaelaye commented Nov 10, 2017

Understood, though this paragraph from the docstring made me believe it should work:

The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query.

If it's impossible to use any reserved keywords as column names for query it should be explicitly called out in the docstring, I think.


chris-b1 commented Nov 10, 2017

> If it's impossible to use any reserved keywords as column names for query it should be explicitly called out in the docstring, I think.

Yes, agreed. We may also want to wrap the parsing in a try/except to bubble up a more directed error. PR welcome!
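As a rough sketch of what wrapping the parse step could look like (this is not the code that went into pandas; `parse_query`, `reserved_names` and the allow-list of keywords are all assumptions made for illustration), one option is to catch the `SyntaxError` and, only when the expression contains a reserved word, re-raise with a hint:

```python
import ast
import keyword
import re

# Keywords that are legitimate inside an expression. This allow-list is an
# assumption for the sketch, not something taken from pandas.
_EXPRESSION_KEYWORDS = {
    "and", "or", "not", "in", "is", "if", "else",
    "lambda", "True", "False", "None",
}


def reserved_names(expr):
    """Return words in `expr` that are Python keywords unusable as names.

    Heuristic only: it also matches words inside string literals.
    """
    words = re.findall(r"[A-Za-z_]\w*", expr)
    return [
        w for w in words
        if keyword.iskeyword(w) and w not in _EXPRESSION_KEYWORDS
    ]


def parse_query(expr):
    """Parse `expr`, adding a hint to the error if a reserved word is present."""
    try:
        return ast.parse(expr, mode="eval")
    except SyntaxError as err:
        bad = reserved_names(expr)
        if not bad:
            raise  # unrelated syntax error: leave the original message alone
        raise SyntaxError(
            f"could not parse {expr!r}: {', '.join(bad)} is a reserved "
            "Python keyword and cannot be used as a name in an expression"
        ) from err
```

With this, `parse_query("class > 0")` would raise an error that names `class` explicitly, while `parse_query("et > 0")` returns the parsed tree as before.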

chris-b1 added this to the Next Major Release milestone Nov 10, 2017

jreback modified the milestones: Next Major Release, 0.21.1, 0.22.0 Nov 12, 2017

jreback added a commit that referenced this issue Nov 13, 2017
