New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: allow dependent assignment? #14207

Closed
chris-b1 opened this Issue Sep 12, 2016 · 19 comments

Comments

Projects
None yet
8 participants
@chris-b1
Contributor

chris-b1 commented Sep 12, 2016

Since kwarg order will be guaranteed in python 3.6, we could allow this...though would only want to try with a 3.6+ version check, otherwise code could more or less randomly work/not work, depending on dict order.

In [44]: df = pd.DataFrame({'a': [1,2,3]})

In [45]: df.assign(b=1, c=lambda x: x['b'] * 2)
@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Sep 12, 2016

Contributor

xref: #9777 (comment)

This would be a great, but I want to think a bit about exposing a feature that would so easily lead to subtle differences between 3.5 and 3.6. Not sure how best to handle it :/

Either way, we could probably deprecate the sorting of keys (documentation only; can't print a warning each time someone uses .assign).

Contributor

TomAugspurger commented Sep 12, 2016

xref: #9777 (comment)

This would be a great, but I want to think a bit about exposing a feature that would so easily lead to subtle differences between 3.5 and 3.6. Not sure how best to handle it :/

Either way, we could probably deprecate the sorting of keys (documentation only; can't print a warning each time someone uses .assign).

@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Sep 12, 2016

Contributor

FYI, here's the current docs (in a warning block):

Since the function signature of assign is **kwargs, a dictionary, the order of the new columns in the resulting DataFrame cannot be guaranteed to match the order you pass in. To make things predictable, items are inserted alphabetically (by key) at the end of the DataFrame.
All expressions are computed first, and then assigned. So you can't refer to another column being assigned in the same call to assign.

Contributor

TomAugspurger commented Sep 12, 2016

FYI, here's the current docs (in a warning block):

Since the function signature of assign is **kwargs, a dictionary, the order of the new columns in the resulting DataFrame cannot be guaranteed to match the order you pass in. To make things predictable, items are inserted alphabetically (by key) at the end of the DataFrame.
All expressions are computed first, and then assigned. So you can't refer to another column being assigned in the same call to assign.

@chris-b1

This comment has been minimized.

Show comment
Hide comment
@chris-b1

chris-b1 Sep 12, 2016

Contributor

Yeah, the back-compat is awkward, maybe this could be opt-in from an option, so it would at least raise on older pythons, although that's ugly too.
pd.set_option('ordered_assign', True) # raises on py < 3.6

Contributor

chris-b1 commented Sep 12, 2016

Yeah, the back-compat is awkward, maybe this could be opt-in from an option, so it would at least raise on older pythons, although that's ugly too.
pd.set_option('ordered_assign', True) # raises on py < 3.6

@jreback jreback added the Python 3.6 label Nov 17, 2016

@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Feb 7, 2017

Contributor

@mrocklin thoughts on this? IIRC you wanted this initially to make things easier for dask? How much would break for you / dask users if we just removed the sorting of keys?

Contributor

TomAugspurger commented Feb 7, 2017

@mrocklin thoughts on this? IIRC you wanted this initially to make things easier for dask? How much would break for you / dask users if we just removed the sorting of keys?

@mrocklin

This comment has been minimized.

Show comment
Hide comment
@mrocklin

mrocklin Feb 9, 2017

Contributor

I think that Pandas should make whatever decisions make sense for Pandas. Dask.dataframe will adapt as necessary. I suspect that this won't cause a problem. Dask.dataframe will just try the assign operation on an empty dataframe to determine the column ordering. Whatever Pandas does we'll probably do automatically.

Contributor

mrocklin commented Feb 9, 2017

I think that Pandas should make whatever decisions make sense for Pandas. Dask.dataframe will adapt as necessary. I suspect that this won't cause a problem. Dask.dataframe will just try the assign operation on an empty dataframe to determine the column ordering. Whatever Pandas does we'll probably do automatically.

@shoyer

This comment has been minimized.

Show comment
Hide comment
@shoyer

shoyer Feb 9, 2017

Member

Ultimately, I think this we should switch to this behavior and stop sorting.

But I don't see much of a rush. It might be better to wait until Python 3.6 is more widely used, even until pandas 2.0.

Member

shoyer commented Feb 9, 2017

Ultimately, I think this we should switch to this behavior and stop sorting.

But I don't see much of a rush. It might be better to wait until Python 3.6 is more widely used, even until pandas 2.0.

@shoyer

This comment has been minimized.

Show comment
Hide comment
@shoyer

shoyer Feb 9, 2017

Member

I would also be comfortable just changing this for Python 3.6+. My guess is that there aren't that many folks using Python 3.6 in production yet, so this will probably cause minimal issues.

Member

shoyer commented Feb 9, 2017

I would also be comfortable just changing this for Python 3.6+. My guess is that there aren't that many folks using Python 3.6 in production yet, so this will probably cause minimal issues.

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone Sep 21, 2017

@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Sep 21, 2017

Contributor

I think we should sort this out for 0.21. I propose

  • No change to 2.7 - 3.5; order is sorted by key
  • For 3.6+, we use the original order the user passed in
  • We don't allow dependent assignment yet

Maybe in the future, when there's less 3.5 and lower code around, we can allow for dependent assignment.

Contributor

TomAugspurger commented Sep 21, 2017

I think we should sort this out for 0.21. I propose

  • No change to 2.7 - 3.5; order is sorted by key
  • For 3.6+, we use the original order the user passed in
  • We don't allow dependent assignment yet

Maybe in the future, when there's less 3.5 and lower code around, we can allow for dependent assignment.

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Sep 22, 2017

Member

+1 on your summary

Member

jorisvandenbossche commented Sep 22, 2017

+1 on your summary

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Sep 22, 2017

Contributor

sgtm on @TomAugspurger summary as well.

Contributor

jreback commented Sep 22, 2017

sgtm on @TomAugspurger summary as well.

@bobhaffner

This comment has been minimized.

Show comment
Hide comment
@bobhaffner

bobhaffner Sep 22, 2017

Contributor

HI All, Tom said this was a good one for new contributors so I'm going after it.

Contributor

bobhaffner commented Sep 22, 2017

HI All, Tom said this was a good one for new contributors so I'm going after it.

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Sep 25, 2017

Member

Keep this open for the second part: allow dependent assignment somewhere in the future?

Member

jorisvandenbossche commented Sep 25, 2017

Keep this open for the second part: allow dependent assignment somewhere in the future?

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Sep 25, 2017

Contributor

I would create a new separate issue if you really want to. That is a long time from now though.

Contributor

jreback commented Sep 25, 2017

I would create a new separate issue if you really want to. That is a long time from now though.

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Sep 25, 2017

Member

The title and description in the top post is exactly this, so no need to create a new one I think, if we want an issue for it

Member

jorisvandenbossche commented Sep 25, 2017

The title and description in the top post is exactly this, so no need to create a new one I think, if we want an issue for it

@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Dec 15, 2017

Contributor

If you didn't see, as of python 3.7 dictionaries are ordered, as part of the language https://mail.python.org/pipermail/python-dev/2017-December/151283.html

My objection to allowing dependent assignment now was that it'd (potentially) lead to subtle bugs / confusing errors between versions of python. But since Python itself is opening the door, let's slip through too :) I think we should allow dependent assignment in 0.22.

Contributor

TomAugspurger commented Dec 15, 2017

If you didn't see, as of python 3.7 dictionaries are ordered, as part of the language https://mail.python.org/pipermail/python-dev/2017-December/151283.html

My objection to allowing dependent assignment now was that it'd (potentially) lead to subtle bugs / confusing errors between versions of python. But since Python itself is opening the door, let's slip through too :) I think we should allow dependent assignment in 0.22.

@datajanko

This comment has been minimized.

Show comment
Hide comment
@datajanko

datajanko Dec 17, 2017

Contributor

Obviously, I am not good at looking through existing issues. Please accept my aplogogies.

As of #18797 I am also interested in letting assign be able to accept dependent kwargs. I just performed the (in my opinion) naive/direct implementation (after adapting the test cases accordingly) and it works fine - for callables where references to the "depenendet columns" are evaluated lazily. Something like df.assign(b=1, c=df['b']) will - of course - not work.

So if it's okay, that dependent assignments only work for certain callables, I'd be happy to issue a pull request in the next days.

Contributor

datajanko commented Dec 17, 2017

Obviously, I am not good at looking through existing issues. Please accept my aplogogies.

As of #18797 I am also interested in letting assign be able to accept dependent kwargs. I just performed the (in my opinion) naive/direct implementation (after adapting the test cases accordingly) and it works fine - for callables where references to the "depenendet columns" are evaluated lazily. Something like df.assign(b=1, c=df['b']) will - of course - not work.

So if it's okay, that dependent assignments only work for certain callables, I'd be happy to issue a pull request in the next days.

@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Dec 18, 2017

Contributor

if it's okay, that dependent assignments only work for certain callables

Yes, that sounds right. You'll need to use a callable to refer to newly created columns in the .assign.

Contributor

TomAugspurger commented Dec 18, 2017

if it's okay, that dependent assignments only work for certain callables

Yes, that sounds right. You'll need to use a callable to refer to newly created columns in the .assign.

datajanko pushed a commit to datajanko/pandas that referenced this issue Dec 18, 2017

datajanko
ENH: df.assign accepting dependent **kwargs (pandas-dev#14207)
Specifically, 'df.assign(b=1, c=lambda x:x['b'])'
does not throw an exception in python 3.6 and above.
Further details are discussed in Issues pandas-dev#14207 and pandas-dev#18797.
@datajanko

This comment has been minimized.

Show comment
Hide comment
@datajanko

datajanko Dec 18, 2017

Contributor

before issuing pull request, I obviously need to provide an entry in the what's new file. I assume other api changes is the right place, correct? Sorry for asking again, but this is may first coding contribution for an open source project.

Contributor

datajanko commented Dec 18, 2017

before issuing pull request, I obviously need to provide an entry in the what's new file. I assume other api changes is the right place, correct? Sorry for asking again, but this is may first coding contribution for an open source project.

@chris-b1

This comment has been minimized.

Show comment
Hide comment
@chris-b1

chris-b1 Dec 19, 2017

Contributor
Contributor

chris-b1 commented Dec 19, 2017

datajanko pushed a commit to datajanko/pandas that referenced this issue Dec 19, 2017

datajanko
ENH: df.assign accepting dependent **kwargs (pandas-dev#14207)
Specifically, 'df.assign(b=1, c=lambda x:x['b'])'
does not throw an exception in python 3.6 and above.
Further details are discussed in Issues pandas-dev#14207 and pandas-dev#18797.

datajanko pushed a commit to datajanko/pandas that referenced this issue Dec 23, 2017

datajanko
ENH: df.assign accepting dependent **kwargs (pandas-dev#14207)
Specifically, 'df.assign(b=1, c=lambda x:x['b'])'
does not throw an exception in python 3.6 and above.
Further details are discussed in Issues pandas-dev#14207 and pandas-dev#18797.

@jreback jreback modified the milestones: Someday, 0.23.0 Dec 23, 2017

datajanko pushed a commit to datajanko/pandas that referenced this issue Jan 9, 2018

datajanko
ENH: df.assign accepting dependent **kwargs (pandas-dev#14207)
Specifically, 'df.assign(b=1, c=lambda x:x['b'])'
does not throw an exception in python 3.6 and above.
Further details are discussed in Issues pandas-dev#14207 and pandas-dev#18797.

datajanko pushed a commit to datajanko/pandas that referenced this issue Feb 9, 2018

datajanko
ENH: df.assign accepting dependent **kwargs (pandas-dev#14207)
Specifically, 'df.assign(b=1, c=lambda x:x['b'])'
does not throw an exception in python 3.6 and above.
Further details are discussed in Issues pandas-dev#14207 and pandas-dev#18797.

populates dsintro and frame.py with examples and warning

- adds example to frame.py
- reworked warning in dsintro
- reworked Notes in frame.py

Remains open:

frame.py probably is responsible vor travis not passing: doc test that requires python 3.6

jreback added a commit that referenced this issue Feb 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment