Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
API: DataFrame.eval, inplace option #9297
Comments
shoyer
added the
API Design
label
Jan 19, 2015
|
pretty sure that this is converted to == or raises I don't recall which |
|
dupe of #8664 (this is a bug an not an API issue). |
jreback
closed this
Jan 19, 2015
|
I think this is about eval mostly and that it should use the new assign, not the issue that assigning in query is a bug. Reopening. |
cpcloud
reopened this
Jan 19, 2015
|
@jreback I know you're on an issue closing spree, feel free to "reclose". |
|
I think he's saying that instead of updating the frame and returning None, we should return a new frame with that column assigned to it. This would be a breaking api change, so I think an inplace flag would be nice to allow people minimal breakage. We could default to True for a few releases and then change to False |
|
ok, I can buy that for a $1 |
jreback
changed the title from
API: using = with DataFrame.eval/query should return a new frame instead of being in-place to API: DataFrame.eval, inplace option
Jan 19, 2015
|
@cpcloud, thanks for your help clarifying. This is indeed my idea here. (Using query instead of eval in the example did not help :).) |
jreback
added this to the
Next Major Release
milestone
Oct 8, 2015
jreback
modified the milestone: 0.18.0, Next Major Release
Dec 9, 2015
|
closed by #11149 |
shoyer commentedJan 19, 2015
Building off of #9229, it's generally much better to create new dataframes rather than to modify existing ones. It's both more chainable and less prone to the bugs/errors associated with mutable state.
So it seems to me that a saner API design would be to make assignment operations with eval/query return a new frame rather than modify the existing one, i.e.,
df.eval('x = y ** 2')should be equivalent todf.assign(x = lambda df: df.y ** 2)(as currently defined in #9239).In my experience, something like
df.query('x = 0')is more likely intended to be equivalent todf.query('x == 0')rather thandf['x'] = 0, which has the unfortunate effect of also modifying the original data. In fact, I have made this exact mistake before -- and apparently others have as well (see #8664).Thoughts? The
query/evalsyntax is marked as experimental, so I think we have the freedom to change this.CC @TomAugspurger @cpcloud @jreback