BUG: query modifies the frame when you compare with = #8664

Closed
TomAugspurger opened this issue Oct 28, 2014 · 7 comments

@TomAugspurger
Contributor

I messed up and used = instead of == in a query.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
df.query('a=1')  # '=' (assignment) typed by mistake; '==' was intended

That raises a ValueError, but df was modified anyway.

In [15]: df
Out[15]:
   a  b
0  1  a
1  1  b
2  1  c

versions:

pandas: 0.15.0-6-g403f38d
bottleneck: None
tables: None
numexpr: 2.3.1

Can't look right now.

@dxe4
Contributor

dxe4 commented Dec 1, 2014

@TomAugspurger "a" is marked as the "assigner" here: https://github.com/pydata/pandas/blob/e463818e69edee64a277ce97c89999f0c898dd7c/pandas/computation/eval.py#L239, so env.target[parsed_expr.assigner] = ret effectively does df['a'] = 1. I'm not sure I can help further with this.
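
To illustrate why "a" ends up as an assignment target, here is a minimal sketch using only Python's standard ast module. The helper names top_level_kind and assignment_target are made up for this example and are not part of pandas; the sketch only shows how Python itself parses the two spellings, not how pandas processes them internally.

```python
import ast


def top_level_kind(expr):
    """Return the node type name of the single top-level statement in expr."""
    (stmt,) = ast.parse(expr).body
    # A bare expression is wrapped in ast.Expr; report the wrapped node instead.
    if isinstance(stmt, ast.Expr):
        return type(stmt.value).__name__
    return type(stmt).__name__


def assignment_target(expr):
    """Return the assigned name if expr is a simple `name = value`, else None."""
    (stmt,) = ast.parse(expr).body
    if (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.targets[0], ast.Name)
    ):
        return stmt.targets[0].id
    return None


print(top_level_kind("a=1"), assignment_target("a=1"))
print(top_level_kind("a==1"), assignment_target("a==1"))
```

The single '=' parses as an assignment statement with target "a", while '==' parses as a comparison with no target, so an evaluator that honours assignments will write to the column in the first case.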

@dxe4
Contributor

dxe4 commented Dec 2, 2014

It looks like eval is designed to mutate, but query is not meant to. To fix this, I can add an extra kwarg to eval indicating whether mutation is allowed, if that makes sense. (query calls eval; eval mutates and returns None, and query then raises a ValueError.)
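
A rough sketch of the kwarg idea, using a toy evaluator over a plain dict rather than pandas itself. Everything here (toy_eval, toy_query, the allow_assignment parameter, the error messages) is hypothetical and only illustrates the shape of the proposal: reject the assignment before anything is written to the target.

```python
import ast


def _evaluate(node, namespace):
    """Evaluate one expression node against a copy of the namespace."""
    code = compile(ast.Expression(body=node), "<toy>", "eval")
    return eval(code, {"__builtins__": {}}, dict(namespace))


def toy_eval(expr, target, allow_assignment=True):
    """Toy stand-in for an eval that may assign into `target` (a dict)."""
    (stmt,) = ast.parse(expr).body
    if isinstance(stmt, ast.Assign):
        if not allow_assignment:
            # Fail before touching `target`, so the caller's data is unchanged.
            raise ValueError("assignment is not allowed here; did you mean '=='?")
        if len(stmt.targets) != 1 or not isinstance(stmt.targets[0], ast.Name):
            raise ValueError("left-hand side must be a single name")
        target[stmt.targets[0].id] = _evaluate(stmt.value, target)
        return None
    if isinstance(stmt, ast.Expr):
        return _evaluate(stmt.value, target)
    raise ValueError("unsupported statement")


def toy_query(expr, target):
    """Toy stand-in for a query: evaluates, but never allows assignment."""
    return toy_eval(expr, target, allow_assignment=False)
```

With this shape, toy_eval("a = 5", data) still mutates data as before, while toy_query("a = 5", data) raises and leaves data alone.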

@kohaugustine

Hi @jorisvandenbossche, I'm on GitHub now! We last discussed this bug on Stack Overflow (http://stackoverflow.com/questions/28714469/bug-in-pandas-query-method/): the query() method modifies the data in a column when the query string uses a single '='. I'd be keen to help fix it, but I can't commit to a strict timeline and may not have much time for it. At work I have moved to a slightly different assignment that involves little data analysis, so I'm not using pandas for now, and I already have a fair number of other projects outside of work. That said, I enjoy using Python and am quite familiar with it, so if this bug is within my reach, I'd definitely like to contribute!

@kohaugustine

What steps do I need to take to start helping debug this? Do I have to clone the entire development repo and then create a branch, for example? What does the workflow look like at the moment?

@jreback
Contributor

jreback commented Mar 6, 2015

see docs here

  • you fork the repo (only do this once)
  • create a branch
  • add tests which repro the problem
  • make a fix
  • tests should now pass
  • submit a pull-request
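
For the "add tests which repro the problem" step, a regression test could look roughly like the sketch below. It assumes, as the original report states, that the mistaken query raises a ValueError; the test name is made up, and the actual test added to pandas may look different.

```python
import pandas as pd


def test_query_with_single_equals_does_not_mutate():
    df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
    expected = df.copy()  # snapshot taken before the faulty query

    raised = False
    try:
        df.query("a=1")  # '=' instead of '=='
    except ValueError:
        raised = True

    # The query should fail, and the frame must be left untouched.
    assert raised
    assert df.equals(expected)
```

On an affected version such as the 0.15.0 build in the report, the final assertion is the one expected to fail, because column "a" has been overwritten with 1.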

@kohaugustine

@jreback Thank you for pointing me to the docs and for describing the flow. I'll get to work on this and raise any issues I run into with you.

@jreback jreback modified the milestones: 0.17.1, Next Major Release Oct 11, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 9, 2015
@jreback
Contributor

jreback commented Jan 4, 2016

closed by #11149

@jreback jreback closed this as completed Jan 4, 2016
tsstchoi added a commit to tsstchoi/pandas that referenced this issue Feb 26, 2016
…ace operator

The relevant issues added are :
BUG: query with invalid dtypes should fallback to python engine pandas-dev#10486
BUG: query modifies the frame when you compare with `=` pandas-dev#8664