Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

current implementation of DataFrame.apply where passed function has side effects is a real "gotcha" #10222

Closed
jrenner opened this issue May 28, 2015 · 6 comments
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@jrenner
Copy link

jrenner commented May 28, 2015

From the documentation for DataFrame.apply:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.

I am sure there are well-thought out reasons why it is currently implemented this way, but this behavior can be a real "gotcha". I grant that having side effects inside an apply function is not good standard practice, but I would argue that there are times when it is a good solution to a problem. ( I can elaborate on this if needed). Thankfully this behavior is documented, but I think it is reasonable to expect that most users will not always be mindful of secondary notes that exist throughout the documentation, and the source of problems caused by this behavior is not at all obvious.

While I don't understand the engineering issues regarding the fast/slow path, I would suggest that some better solution to the problem be introduced.

@jreback
Copy link
Contributor

jreback commented May 28, 2015

there is no reason I have ever seen to actually modify something in an apply. that said, you are welcome to try to fix this if you want. It is doable.

@jreback
Copy link
Contributor

jreback commented May 28, 2015

xref #6753, #2936

@jrenner
Copy link
Author

jrenner commented May 28, 2015

Thanks, I will look into it and see if it something I can tackle

@jreback jreback added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 29, 2015
@jreback
Copy link
Contributor

jreback commented Jan 27, 2016

dupe of #2936

@mpschr
Copy link

mpschr commented Feb 9, 2017

Just stumbled upon this. I don't mean to sound harsh below but I must vocalize a bit my suprise ;)

Given the popularity of the pandas library, we can assume that this 'feature' will seriously impact scientific results and business decisions if the apply function does transform data (twice) and it goes unnoticed - easily possible in settings where many groups are formed in big data sets.

As for possible improvements:

  • The code should emit a warning as it does with deprecation. Although there is a warning somewhere in the doc, this is highly unnatural behaviour and nobody will expect this.
  • Additionally after the groupby has decided on fast or slow path, why not start over with only one path in order to avoid confusion?

I hope the person knowing this piece of code in and out reads this.

Greetings!

@jreback
Copy link
Contributor

jreback commented Feb 9, 2017

@mpschr well, a pull-request to do either/both would be great! (both are not very difficult).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

3 participants