Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: DataFrame.eval, inplace option #9297

Closed
shoyer opened this issue Jan 19, 2015 · 9 comments
Closed

API: DataFrame.eval, inplace option #9297

shoyer opened this issue Jan 19, 2015 · 9 comments
Milestone

Comments

@shoyer
Copy link
Member

shoyer commented Jan 19, 2015

Building off of #9229, it's generally much better to create new dataframes rather than to modify existing ones. It's both more chainable and less prone to the bugs/errors associated with mutable state.

So it seems to me that a saner API design would be to make assignment operations with eval/query return a new frame rather than modify the existing one, i.e., df.eval('x = y ** 2') should be equivalent to df.assign(x = lambda df: df.y ** 2) (as currently defined in #9239).

In my experience, something like df.query('x = 0') is more likely intended to be equivalent to df.query('x == 0') rather than df['x'] = 0, which has the unfortunate effect of also modifying the original data. In fact, I have made this exact mistake before -- and apparently others have as well (see #8664).

Thoughts? The query/eval syntax is marked as experimental, so I think we have the freedom to change this.

CC @TomAugspurger @cpcloud @jreback

@jreback
Copy link
Contributor

jreback commented Jan 19, 2015

pretty sure that this is converted to == or raises I don't recall which
query never should assign - iirc their is a another issue about this where it is reported - so this is a dupe issue

@jreback
Copy link
Contributor

jreback commented Jan 19, 2015

dupe of #8664 (this is a bug an not an API issue).

@jreback jreback closed this as completed Jan 19, 2015
@cpcloud
Copy link
Member

cpcloud commented Jan 19, 2015

I think this is about eval mostly and that it should use the new assign, not the issue that assigning in query is a bug. Reopening.

@cpcloud cpcloud reopened this Jan 19, 2015
@cpcloud
Copy link
Member

cpcloud commented Jan 19, 2015

@jreback I know you're on an issue closing spree, feel free to "reclose". 😊

@jreback
Copy link
Contributor

jreback commented Jan 19, 2015

oh ok. Yeh the query using = is a bug (and covered by #8864). It think eval is fine.

So @shoyer maybe clarify what this issue is addressing.

@cpcloud
Copy link
Member

cpcloud commented Jan 19, 2015

I think he's saying that instead of updating the frame and returning None, we should return a new frame with that column assigned to it. This would be a breaking api change, so I think an inplace flag would be nice to allow people minimal breakage. We could default to True for a few releases and then change to False

@jreback
Copy link
Contributor

jreback commented Jan 19, 2015

ok, I can buy that for a $1

@jreback jreback changed the title API: using = with DataFrame.eval/query should return a new frame instead of being in-place API: DataFrame.eval, inplace option Jan 19, 2015
@shoyer
Copy link
Member Author

shoyer commented Jan 20, 2015

@cpcloud, thanks for your help clarifying. This is indeed my idea here. (Using query instead of eval in the example did not help :).)

@jreback jreback added this to the Next Major Release milestone Oct 8, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 9, 2015
@jreback
Copy link
Contributor

jreback commented Jan 4, 2016

closed by #11149

@jreback jreback closed this as completed Jan 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants