ENH: Updating a dataframe counting the cells that will be updated. #6891

Closed
ccsv opened this issue Apr 16, 2014 · 6 comments
@ccsv

ccsv commented Apr 16, 2014

Add a way to count the cells that change when a DataFrame is updated.

Currently there is no command that reports how many cells were changed when df.update(df2) is performed, so the count cannot easily be written to a log.

See:
http://stackoverflow.com/questions/23102757/python-pandas-updating-dataframe-and-counting-the-number-of-cells-updated

@danielballan
Contributor

It's unclear how this information should be returned to the user. Can you suggest an example of how the usage would work?

Offhand, I think the user should just compare the two DataFrames to get that information. The little-used df.align() might come in handy here.
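
For illustration, a minimal sketch of the comparison approach: keep a copy of the DataFrame before calling df.update(df2), then count the cells that differ afterwards. The sample data and the names before, changed, and n_changed are assumptions made for this example, not part of pandas. Because update modifies df in place and keeps its labels, the two frames share the same index and columns and can be compared directly.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical, not from the issue)
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})
df2 = pd.DataFrame({"a": [1.0, 20.0], "b": [np.nan, 50.0]})

before = df.copy()  # snapshot taken before the in-place update
df.update(df2)      # NaN values in df2 do not overwrite values in df

# A cell counts as changed if the values are unequal and not both NaN
# (NaN != NaN evaluates to True, so the both-NaN case is masked out).
changed = (before != df) & ~(before.isna() & df.isna())
n_changed = int(changed.to_numpy().sum())
print(n_changed)
```

The boolean frame changed can also be summed per column with changed.sum() if a per-column count is more useful than a single total.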

@jreback
Contributor

jreback commented Apr 16, 2014

I suppose we could add an argument like
return_counts

but this seems like a very narrow use case

@cpcloud
Member

cpcloud commented Apr 16, 2014

@ccsv Can you give a bit more detail about your use case?

@ccsv
Author

ccsv commented Apr 17, 2014

@jreback @cpcloud @danielballan
I would say either a list of the cells that were changed or a count of them.
The reason is that if you are working with multiple people and someone makes an accidental update, you can identify the data to 'back out' or erase. I could also use this to create a log of my changes.
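
As a rough sketch of what such a change log might look like, the helper below lists one record per changed cell with its old and new value. The function name changed_cells and the record layout are hypothetical, chosen for this example only. It assumes the two frames share the same index and columns, which holds for a copy taken before df.update(df2).

```python
import numpy as np
import pandas as pd

def changed_cells(before, after):
    """Return one row per cell whose value differs between the two frames.

    Hypothetical helper; assumes both frames have identical index and columns.
    """
    # Unequal values, excluding cells that are NaN on both sides
    mask = (before != after) & ~(before.isna() & after.isna())
    rows, cols = np.nonzero(mask.to_numpy())
    return pd.DataFrame({
        "row": before.index.to_numpy()[rows],
        "column": before.columns.to_numpy()[cols],
        "old": before.to_numpy()[rows, cols],
        "new": after.to_numpy()[rows, cols],
    })

# Illustrative data (hypothetical)
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["x", "y"])
df2 = pd.DataFrame({"b": [30.0]}, index=["y"])

before = df.copy()
df.update(df2)
log = changed_cells(before, df)
print(log)
```

The old column holds the values that would be needed to back out an accidental update, and len(log) gives the count.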

@jtratner
Contributor

This would add a lot of complexity to the internals for, arguably, little gain. If we were to do this, I'd suggest adding a separate method that takes in two DataFrames and returns a description of the differences between them. It's unlikely that we'd produce anything more memory efficient anyways.
That said, I'm not sure there's value in providing some sugar over a few lines of code.
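
To give a sense of the "few lines of code" involved, here is a sketch of a standalone function that takes two DataFrames and summarizes their differences. The name diff_counts is hypothetical and not a pandas API. It uses DataFrame.align with join="outer" so that frames with different labels can be compared, and it makes one assumption worth flagging: a cell present in only one frame is counted as a difference.

```python
import pandas as pd

def diff_counts(left, right):
    """Count differing cells per column (hypothetical helper).

    Both frames are aligned on the union of their row and column labels;
    cells missing from one side become NaN and count as differences.
    """
    l, r = left.align(right, join="outer")
    differs = (l != r) & ~(l.isna() & r.isna())
    return differs.sum()

# Illustrative data (hypothetical)
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [1, 5], "c": [7, 8]})

counts = diff_counts(left, right)
print(counts)
```

Returning differs itself instead of differs.sum() would give a boolean mask of the differing cells rather than per-column counts.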

@jreback
Contributor

jreback commented May 5, 2014

a cookbook entry would be nice for this

@jreback jreback closed this as completed May 5, 2014
@jreback jreback added the Docs label May 5, 2014
5 participants