Don't make dropping missing rows a default behavior for HDF append()? #9382
jreback added the API Design and HDF5 labels on Feb 2, 2015
I'm a little new to the open-source world -- should I be doing something more than waiting for input at this point? And if none comes, should I do nothing, or make the changes? Thanks!

Well, you can go ahead and make a pull request if you would like.

OK -- do you have a position, John? I know you did the hard work of creating this, so I don't want to adjust without your input!

I think changing the default is OK; you will have to adjust some tests.

OK, great. This will be my first edit on a big project, so it will likely take a few days to figure out how to do it right, but I'm on it!

Submitted as Pull Request #9484. Where do I add notes for the API change?
nickeubank referenced this issue on Feb 13, 2015: Default values for dropna to "False" (issue 9382) #9484 (closed)
You would need to add a mini-section in the whatsnew for 0.16.0 under API changes.

Great, done! Thanks for the hand-holding!
nickeubank commented on Jan 31, 2015
Hi All,
At the moment, the default behavior for the HDF append() function ( docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.HDFStore.append.html?highlight=append#pandas.HDFStore.append ) is to silently drop all rows that are all NaN except for the index.
As I understand it from a PyData exchange with Jeff, the reason is that people working with panels often have sparse datasets, so this is a very reasonable default.
However, while I appreciate the appeal for time-series analysis, I think this is a dangerous default. It rests on the assumption that if an index has a value but the columns do not, the row contains no meaningful data. That may hold in a time-series context -- where it's easy to reconstruct the index values that are dropped -- but if indexes contain information like user IDs, sensor codes, or place names, the index itself is meaningful and not easy to reconstruct. The default behavior can therefore silently delete user data.
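To make the failure mode concrete, here is a minimal sketch of the row filter in question. It uses `DataFrame.dropna(how="all")` to stand in for the all-NaN filter that `append(..., dropna=True)` applies (the HDF5 round-trip itself is omitted so the example needs only pandas); the DataFrame and its index values are hypothetical:

```python
import numpy as np
import pandas as pd

# Index values carry meaning here (user IDs), not just positions.
df = pd.DataFrame(
    {"score": [1.0, np.nan, 3.0], "visits": [2.0, np.nan, np.nan]},
    index=["alice", "bob", "carol"],
)

# The same filter append(..., dropna=True) applies: rows whose
# columns are all NaN are dropped, and the index entry goes with them.
kept = df.dropna(how="all")
print(list(kept.index))  # ['alice', 'carol'] -- "bob" silently disappears
```

Note that "carol" survives because only *some* of her columns are NaN; "bob" is lost entirely, and nothing in the written file records that his row ever existed.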
Given the trade-off between a default that may lead to inefficient storage (dropna=False) and one that potentially erases user data (dropna=True), I think we should err on the side of data preservation.