API: DataFrame.eval, inplace option #9297

Closed
shoyer opened this Issue Jan 19, 2015 · 9 comments

Comments

Projects
None yet
3 participants
Member

shoyer commented Jan 19, 2015

Building off of #9229, it's generally much better to create new dataframes rather than to modify existing ones. It's both more chainable and less prone to the bugs/errors associated with mutable state.

So it seems to me that a saner API design would be to make assignment operations with eval/query return a new frame rather than modify the existing one, i.e., df.eval('x = y ** 2') should be equivalent to df.assign(x = lambda df: df.y ** 2) (as currently defined in #9239).

In my experience, something like df.query('x = 0') is more likely intended to be equivalent to df.query('x == 0') rather than df['x'] = 0, which has the unfortunate effect of also modifying the original data. In fact, I have made this exact mistake before -- and apparently others have as well (see #8664).

Thoughts? The query/eval syntax is marked as experimental, so I think we have the freedom to change this.

CC @TomAugspurger @cpcloud @jreback

shoyer added the API Design label Jan 19, 2015

Contributor

jreback commented Jan 19, 2015

pretty sure that this is converted to == or raises I don't recall which
query never should assign - iirc their is a another issue about this where it is reported - so this is a dupe issue

Contributor

jreback commented Jan 19, 2015

dupe of #8664 (this is a bug an not an API issue).

jreback closed this Jan 19, 2015

Member

cpcloud commented Jan 19, 2015

I think this is about eval mostly and that it should use the new assign, not the issue that assigning in query is a bug. Reopening.

cpcloud reopened this Jan 19, 2015

Member

cpcloud commented Jan 19, 2015

@jreback I know you're on an issue closing spree, feel free to "reclose". 😊

Contributor

jreback commented Jan 19, 2015

oh ok. Yeh the query using = is a bug (and covered by #8864). It think eval is fine.

So @shoyer maybe clarify what this issue is addressing.

Member

cpcloud commented Jan 19, 2015

I think he's saying that instead of updating the frame and returning None, we should return a new frame with that column assigned to it. This would be a breaking api change, so I think an inplace flag would be nice to allow people minimal breakage. We could default to True for a few releases and then change to False

Contributor

jreback commented Jan 19, 2015

ok, I can buy that for a $1

jreback changed the title from API: using = with DataFrame.eval/query should return a new frame instead of being in-place to API: DataFrame.eval, inplace option Jan 19, 2015

Member

shoyer commented Jan 20, 2015

@cpcloud, thanks for your help clarifying. This is indeed my idea here. (Using query instead of eval in the example did not help :).)

jreback added this to the Next Major Release milestone Oct 8, 2015

@jreback jreback modified the milestone: 0.18.0, Next Major Release Dec 9, 2015

Contributor

jreback commented Jan 4, 2016

closed by #11149

jreback closed this Jan 4, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment