Make pd.read_hdf('data.h5') work when pandas object stored contained categorical columns #13359

Closed
wants to merge 7 commits into from

@chrish42 chrish42 commented Jun 3, 2016

codecov-io commented Jun 3, 2016

Current coverage is 84.23%

Merging #13359 into master will increase coverage by <.01%

@@             master     #13359   diff @@
==========================================
  Files           138        138          
  Lines         50724      50736    +12   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42726      42737    +11   
- Misses         7998       7999     +1   
  Partials          0          0          

Powered by Codecov. Last updated by 103f7d3...e7c8313

@@ -4877,13 +4877,26 @@ def test_read_nokey(self):
        df = DataFrame(np.random.rand(4, 5),
                       index=list('abcd'),
                       columns=list('ABCDE'))
        # Categorical dtype not supported for "fixed" format. So no need
Contributor

blank line

@jreback jreback added Bug IO HDF5 read_hdf, HDFStore labels Jun 4, 2016
jreback commented Jun 4, 2016

looks good. I think there is a failing case that needs to be tested.

key = keys[0]
groups = store.groups()
if len(groups) == 0:
    raise ValueError('No dataset in HDF file.')
Contributor Author

Not sure that ValueError is the right exception here, but at least it's the same type as the exception raised by 0.18.1. (Although the message for 0.18.1 in this case is "key must be provided when HDF file contains multiple datasets.", which is a bit confusing.) And by the way, the exception raised when trying to do pd.read_hdf('empty.h5', 'some_key') is (sensibly) "KeyError: 'No object named some_key in the file'". But raising KeyError for the case where key=None and we are trying to automatically figure out the (single, valid) key in the file seems wrong to me. Let me know if you can think of a better exception (or set of exceptions) for this.

Contributor

I think this is ok. The user is explicitly trying to do something which doesn't work. It's the same type of error as when you have multiple keys, so it's consistent (and ok).
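
For readers following the exception discussion, here is a minimal standalone sketch of the key-inference rule described in this thread. It is not the code from this PR: `infer_single_key`, the use of plain path strings, and the `<key>/meta` convention for recognizing metadata groups are all assumptions made for illustration. The real code works on the objects returned by `store.groups()`.

```python
def infer_single_key(group_paths):
    """Return the only dataset key in a list of HDF group paths.

    Hypothetical helper: `group_paths` stands in for the paths of the
    groups returned by `store.groups()`, with parents listed before
    their children.
    """
    if len(group_paths) == 0:
        # Same exception type and message as the snippet under review.
        raise ValueError('No dataset in HDF file.')

    candidate = group_paths[0]
    # Assumption: metadata groups (e.g. for categorical columns) live
    # under "<candidate>/meta".
    meta_root = candidate.rstrip('/') + '/meta'

    for path in group_paths[1:]:
        is_metadata = path == meta_root or path.startswith(meta_root + '/')
        if not is_metadata:
            # Any other group means a second dataset, so the key is ambiguous.
            raise ValueError('key must be provided when HDF file '
                             'contains multiple datasets.')
    return candidate
```

With this rule, an empty file and a file with several datasets both raise `ValueError`, which matches the consistency argument above, while a single dataset plus its own metadata groups still resolves to one key.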

@jreback jreback added this to the 0.18.2 milestone Jun 5, 2016
@jreback jreback closed this in 5a9b498 Jun 5, 2016
jreback commented Jun 5, 2016

thanks @chrish42

@chrish42 chrish42 deleted the gh13231 branch June 5, 2016 18:41
Successfully merging this pull request may close these issues.

Automatic detection of HDF5 dataset identifier fails when data contains categoricals
3 participants