BUG: query modifies the frame when you compare with `=` #8664

Closed
TomAugspurger opened this Issue Oct 28, 2014 · 7 comments

Contributor

TomAugspurger commented Oct 28, 2014

I messed up and used = instead of == in a query.

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
df.query('a=1')  # typo: this should have been 'a == 1'

That raises a ValueError, but df was modified anyway:

In [15]: df
Out[15]:
   a  b
0  1  a
1  1  b
2  1  c

versions:

pandas: 0.15.0-6-g403f38d
bottleneck: None
tables: None
numexpr: 2.3.1

Can't look right now.

jreback added the Bug label Oct 28, 2014

jreback added the API Design label Nov 16, 2014

Contributor

dxe4 commented Dec 1, 2014

@TomAugspurger `a` is marked as the "assigner" here: https://github.com/pydata/pandas/blob/e463818e69edee64a277ce97c89999f0c898dd7c/pandas/computation/eval.py#L239, so `env.target[parsed_expr.assigner] = ret` effectively does `df['a'] = 1`. I'm not sure I can help further with this.
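To illustrate why `a=1` ends up with an "assigner" at all, here is a small sketch using Python's standard `ast` module. It is not the pandas parser (pandas builds its own expression visitor), and the helper name `classify` is made up for this example; it only shows that a single `=` parses as an assignment statement with a target name, while `==` parses as a comparison.

```python
import ast


def classify(expr):
    """Return (kind, assigner) for a one-line expression string.

    Hypothetical helper for illustration only; not part of pandas.
    """
    node = ast.parse(expr).body[0]
    if isinstance(node, ast.Assign):
        # 'a=1' is a statement that assigns to the name 'a'
        return "assignment", node.targets[0].id
    if isinstance(node, ast.Expr) and isinstance(node.value, ast.Compare):
        # 'a==1' is a comparison expression with no assignment target
        return "comparison", None
    return "other", None


print(classify("a=1"))   # ('assignment', 'a')
print(classify("a==1"))  # ('comparison', None)
```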

Contributor

dxe4 commented Dec 2, 2014

It looks like eval is designed to mutate, but query is not meant to. To fix this, I can add an extra kwarg to eval indicating whether it is allowed to mutate, if that makes sense. (query calls eval, eval mutates the frame and returns None, and then query raises a ValueError.)
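A toy model of that proposal, as a rough sketch only: `toy_eval`, `toy_query`, and the `allow_assign` keyword are hypothetical names invented here, the "frame" is a plain dict of lists, and only `<column> == <literal>` comparisons are supported. The point is that the query path refuses an assignment before the target is touched.

```python
import ast


def toy_eval(expr, target, allow_assign=True):
    """Evaluate `expr` against `target`, a dict mapping column name -> list.

    Hypothetical stand-in for an eval with a 'may I mutate?' flag.
    """
    node = ast.parse(expr).body[0]
    if isinstance(node, ast.Assign):
        if not allow_assign:
            # Refuse before touching `target`, so the caller's data stays intact.
            raise ValueError("assignment is not allowed here; did you mean '=='?")
        name = node.targets[0].id
        value = ast.literal_eval(node.value)
        n_rows = len(next(iter(target.values())))
        target[name] = [value] * n_rows  # mutates, analogous to df['a'] = 1
        return None
    is_eq = (
        isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Compare)
        and isinstance(node.value.ops[0], ast.Eq)
    )
    if not is_eq:
        raise NotImplementedError("this toy only handles '<column> == <literal>'")
    column = target[node.value.left.id]
    literal = ast.literal_eval(node.value.comparators[0])
    return [x == literal for x in column]  # boolean mask


def toy_query(expr, target):
    """Filter rows by a boolean expression without ever mutating `target`."""
    mask = toy_eval(expr, target, allow_assign=False)
    return {k: [v for v, keep in zip(vals, mask) if keep] for k, vals in target.items()}


data = {"a": [1, 2, 3], "b": ["a", "b", "c"]}
try:
    toy_query("a=1", data)
except ValueError as err:
    print("rejected:", err)
print(data)                     # still {'a': [1, 2, 3], 'b': ['a', 'b', 'c']}
print(toy_query("a==1", data))  # {'a': [1], 'b': ['a']}
```

Calling `toy_eval("a=1", data)` directly, with the default `allow_assign=True`, still overwrites the column, which mirrors the split described above: eval may mutate, query may not.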

jorisvandenbossche added this to the 0.17.0 milestone Feb 25, 2015

Hi @jorisvandenbossche, I'm on GitHub now! We last discussed this bug on Stack Overflow (http://stackoverflow.com/questions/28714469/bug-in-pandas-query-method/): the query() method modifies the data in a column if the query statement uses a single '='. I'd be keen to contribute a fix. However, I can't commit to a strict timeline and may not have much time for it: at work I have moved on to a slightly different assignment where I am not doing much data analysis, so I'm not using pandas for now, and I already have a fair number of other projects outside of work. That said, I enjoy using Python and am quite familiar with it, so if solving this bug is within my reach, I'd definitely like to contribute!

What steps do I need to take to start helping with this bug? Do I have to clone the entire development repo and then create a branch, or something else? What does the workflow look like at the moment?

Contributor

jreback commented Mar 6, 2015

see docs here

  • you fork the repo (only do this once)
  • create a branch
  • add tests which repro the problem
  • make a fix
  • tests should now pass
  • submit a pull-request
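For the "add tests which repro the problem" step, a regression check for this issue might look roughly like the sketch below. The function name is made up, and it assumes a pandas version that provides `pd.testing.assert_frame_equal` and still raises a ValueError for the mistyped query; it is not the test that was actually added to pandas.

```python
import pandas as pd


def check_query_with_single_equals_does_not_mutate():
    # Hypothetical regression check for this issue; the name is illustrative.
    df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
    expected = df.copy()

    raised = False
    try:
        df.query("a=1")  # mistyped comparison from the report
    except ValueError:
        raised = True

    # The query should be rejected, and the frame must be left untouched.
    assert raised, "expected a ValueError for an assignment inside query()"
    pd.testing.assert_frame_equal(df, expected)


check_query_with_single_equals_does_not_mutate()
```

On a build that still has the bug, the final comparison is the part expected to fail, since column `a` would have been overwritten with 1.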

@jreback Thank you for pointing me to the docs and describing the flow. I'll get to work on this, and if I run into any issues, I'll raise them with you.

jreback modified the milestone: 0.17.1, Next Major Release Oct 11, 2015

jreback modified the milestone: Next Major Release, 0.17.1 Nov 13, 2015

jreback modified the milestone: 0.18.0, Next Major Release Dec 9, 2015

Contributor

jreback commented Jan 4, 2016

closed by #11149

jreback closed this Jan 4, 2016

tsstchoi added a commit to tsstchoi/pandas that referenced this issue Feb 26, 2016

Update what's new page with relevant issues for df.query() with in-place operator

The relevant issues added are :
BUG: query with invalid dtypes should fallback to python engine #10486
BUG: query modifies the frame when you compare with `=` #8664
414e30f