Make pd.read_hdf('data.h5') work when pandas object stored contained categorical columns #13359

chrish42 · 2016-06-03T23:07:02Z

closes Automatic detection of HDF5 dataset identifier fails when data contains categoricals #13231
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

…ork when storing a dataframe that contains categorical data.

codecov-io · 2016-06-03T23:37:49Z

Current coverage is 84.23%

Merging #13359 into master will increase coverage by <.01%

@@             master     #13359   diff @@
==========================================
  Files           138        138          
  Lines         50724      50736    +12   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42726      42737    +11   
- Misses         7998       7999     +1   
  Partials          0          0

Powered by Codecov. Last updated by 103f7d3...e7c8313

jreback · 2016-06-04T16:14:47Z

pandas/io/tests/test_pytables.py

@@ -4877,13 +4877,26 @@ def test_read_nokey(self):
        df = DataFrame(np.random.rand(4, 5),
                       index=list('abcd'),
                       columns=list('ABCDE'))
+        # Categorical dtype not supported for "fixed" format. So no need


jreback · 2016-06-04T16:18:05Z

looks good. I think there is a failing case that needs to be tested.

chrish42 · 2016-06-04T20:09:22Z

pandas/io/pytables.py

-            key = keys[0]
+            groups = store.groups()
+            if len(groups) == 0:
+                raise ValueError('No dataset in HDF file.')


Not sure that ValueError is the right exception here, but at least it's the same type as the exception raised by 0.18.1. (Although the message for 0.18.1 in this case is "key must be provided when HDF file contains multiple datasets.", which is a bit confusing.) And by the way, the exception raised when trying to do pd.read_hdf('empty.h5', 'some_key') is (sensibly) "KeyError: 'No object named some_key in the file'". But raising KeyError for the case where key=None and we are trying to automatically figure out the (single, valid) key in the file seems wrong to me. Let me know if you can think of a better exception (or set of exceptions) for this.

I think this is ok. The user is explicitly trying to do something which doesn't work. It's the same type of error as when you have multiple keys, so it's consistent (and ok).
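For readers following the thread, the behavior being discussed can be sketched roughly as below. This is a simplified illustration, not the code merged in this PR. It works on plain path strings instead of PyTables group objects. The helper names `is_metadata_of` and `infer_single_key` are hypothetical, as is the assumption that categorical metadata lives in groups nested under `<key>/meta` and that parent groups are listed before their children. Only the two error messages are taken from the discussion above.

```python
def is_metadata_of(path, parent):
    """Return True if `path` lies inside the 'meta' subtree of `parent`.

    Assumption for this sketch: extra groups created for categorical
    columns are nested under '<parent>/meta'.
    """
    prefix = parent.rstrip('/') + '/meta'
    return path == prefix or path.startswith(prefix + '/')


def infer_single_key(group_paths):
    """Pick the only dataset key from a list of group paths, or raise.

    `group_paths` stands in for the groups found in the store, with
    parent groups listed before their children (an assumption).
    """
    if len(group_paths) == 0:
        # Empty file: same exception type as the multiple-datasets case.
        raise ValueError('No dataset in HDF file.')

    candidate = group_paths[0]
    # Every remaining group must be metadata of the candidate; otherwise
    # the file holds more than one dataset and the key is ambiguous.
    for other in group_paths[1:]:
        if not is_metadata_of(other, candidate):
            raise ValueError('key must be provided when HDF file '
                             'contains multiple datasets.')
    return candidate
```

With this shape, a file holding one frame plus its categorical metadata groups still resolves to a single key, while an empty file and a file with two unrelated groups both raise `ValueError`, each with its own message.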

jreback · 2016-06-05T14:08:29Z

thanks @chrish42

chrish42 added 5 commits June 2, 2016 16:46

Use if-expression.

02f90d5

Add test that fails for GitHub bug pandas-dev#13231

b3a5773

Tweak comment to be clearer.

2f41aef

Make logic that detects if there is only one dataset in a HDF5 file w…

df10016

…ork when storing a dataframe that contains categorical data.

Add changelog entry.

e7c8313

jreback reviewed Jun 4, 2016

jreback added Bug IO HDF5 read_hdf, HDFStore labels Jun 4, 2016

chrish42 added 2 commits June 4, 2016 12:11

Formatting fixes.

611aa28

Raise a better exception when the HDF file is empty and key=None.

e839638

chrish42 reviewed Jun 4, 2016

jreback added this to the 0.18.2 milestone Jun 5, 2016

jreback closed this in 5a9b498 Jun 5, 2016

chrish42 deleted the gh13231 branch June 5, 2016 18:41
