BUG: fix categories in HDFStore not filtering correctly (#13322) #13792

shawnheide · 2016-07-25T23:21:30Z

closes #13322 (categories in HDFStore don't filter correctly)
tests passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Fixed bug in pytables.py where series with categories were not being filtered properly.

jreback · 2016-07-25T23:32:13Z

always start with the tests

shawnheide · 2016-07-25T23:34:24Z

Sorry Jeff, I'm not sure what you mean. Are you saying I should add a unit test for this?

jreback · 2016-07-25T23:38:24Z

ANY change requires a test.

jreback · 2016-07-25T23:39:15Z

at a minimum, tests that reproduce the original issue. You FIRST write the tests, make sure they replicate the failure, THEN write code to fix. You have fixed it when your tests then pass and nothing else breaks.

shawnheide · 2016-07-26T00:09:41Z

Thanks for filling me in Jeff. Sorry for having to spell it out for me, but this is the first non-documentation PR I've submitted for pandas. I couldn't find any tests related to pytables but it seems like tests/frame/test_misc_api might be a good spot. Any insight would be appreciated. Thanks.

This is the code I wrote to test the fix before I submitted the pull:

import pandas as pd

obsids = ['ESP_012345_6789', 'ESP_987654_3210']
imgids = ['APF00006np', 'APF0001imm']
data = [4.3, 9.8]

# write the frame as a queryable table with plain string columns
df = pd.DataFrame(dict(obsids=obsids, imgids=imgids, data=data))
df.to_hdf('testdf_no_cats.hdf', 'df', format='t', data_columns=True)

# write the same frame again with the string columns as categoricals
df.obsids = df.obsids.astype('category')
df.imgids = df.imgids.astype('category')
df.to_hdf('testdf_with_cats.hdf', 'df', format='t', data_columns=True)

# df without categories: no obsid equals B, so no rows should come back
df2 = pd.read_hdf('testdf_no_cats.hdf', 'df', where='obsids=B')
assert len(df2) == 0

# df with categories: the same query should also return no rows
df3 = pd.read_hdf('testdf_with_cats.hdf', 'df', where='obsids=B')
assert len(df3) == 0

jreback · 2016-07-26T00:41:09Z

https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_pytables.py

shawnheide · 2016-07-26T02:24:58Z

Thanks Jeff, I completely missed that, I incorrectly assumed everything was in the one tests folder. Just submitted a new commit after confirming the test passed with my changes.

codecov-io · 2016-07-26T07:06:11Z

Current coverage is 85.23% (diff: 100%)

Merging #13792 into master will increase coverage by <.01%

@@             master     #13792   diff @@
==========================================
  Files           140        140          
  Lines         50420      50422     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42975      42977     +2   
  Misses         7445       7445          
  Partials          0          0

Powered by Codecov. Last update cc216ad...a461208

jreback · 2016-07-27T10:32:52Z

pandas/io/tests/test_pytables.py

@@ -237,6 +237,30 @@ def roundtrip(key, obj, **kwargs):
        finally:
            safe_remove(path)

+    def test_conv_categorical(self):
+


move this right after test_categorical, and rename to test_categorical_conversion

jreback · 2016-07-27T10:33:21Z

pls add a whatsnew in bug fix section.

jreback · 2016-07-27T10:35:25Z

pandas/computation/pytables.py

@@ -197,7 +197,10 @@ def stringify(value):
            return TermValue(int(v), v, kind)
        elif meta == u('category'):
            metadata = com._values_from_object(self.metadata)
-            result = metadata.searchsorted(v, side='left')


I would do this like:

result = metadata.searchsorted(v, side='left')
if not result and v not in metadata:
    result = -1

then the 'in' is only done if the result is not found (as opposed to always)
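
To make the reasoning behind this suggestion concrete, here is a minimal sketch of the zero-result ambiguity. It is not the pandas code: `category_code` is a hypothetical helper, the category list is borrowed from the example frame earlier in this thread, and the standard library's `bisect_left` stands in for `searchsorted(v, side='left')`, which returns the same insertion point on a sorted sequence.

```python
from bisect import bisect_left

# Sorted category labels, borrowed from the example frame in this thread.
categories = ['ESP_012345_6789', 'ESP_987654_3210']


def category_code(categories, value):
    """Hypothetical helper: position of value in categories, or -1 when
    a zero result does not correspond to a real category."""
    # bisect_left stands in for metadata.searchsorted(value, side='left').
    result = bisect_left(categories, value)
    # A result of 0 is ambiguous: it is returned both when value IS the
    # first category and when value sorts before every category.
    # The membership check runs only in that ambiguous case.
    if not result and value not in categories:
        result = -1
    return result


print(category_code(categories, 'B'))                # sorts before all categories
print(category_code(categories, 'ESP_012345_6789'))  # first category
print(category_code(categories, 'ESP_987654_3210'))  # second category
```

With the guard in place, the missing value `'B'` maps to -1 instead of colliding with the code 0 of the first category. The sketch only covers the zero case discussed in this review comment.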

shawnheide · 2016-07-27T20:33:49Z

@jreback Thanks for all of the feedback. I really appreciate you showing me how to do things the right way. I made the changes you requested and rebased the commit.

jreback · 2016-07-27T21:58:47Z

doc/source/whatsnew/v0.19.0.txt

@@ -786,3 +786,4 @@ Bug Fixes
 - Bugs in ``Index.difference`` and ``DataFrame.join`` raise in Python3 when using mixed-integer indexes (:issue:`13432`, :issue:`12814`)

 - Bug in ``.to_excel()`` when DataFrame contains a MultiIndex which contains a label with a NaN value (:issue:`13511`)
+- Bug in ``pd.read_hdf()`` returns incorrect result when HDF Store contains a DataFrame with a categorical column (:issue:`13792`)


be more specific; this only fails when the query has NO values that are found in the categorical.


jreback · 2016-07-29T00:20:39Z

thanks @shawnheide

jreback added the IO HDF5 read_hdf, HDFStore label Jul 25, 2016

shawnheide force-pushed the BUG_13322 branch from 21dd48b to 61f470d Compare July 26, 2016 02:18

jreback reviewed Jul 27, 2016

jreback added the Bug label Jul 27, 2016

jreback reviewed Jul 27, 2016

shawnheide force-pushed the BUG_13322 branch from 61f470d to 5c5c6ab Compare July 27, 2016 20:31

jreback reviewed Jul 27, 2016

BUG: fix categories in HDFStore not filtering correctly (pandas-dev#1…

a461208

…3322)

shawnheide force-pushed the BUG_13322 branch from 5c5c6ab to a461208 Compare July 28, 2016 00:40

jreback closed this in e908733 Jul 29, 2016

jreback added this to the 0.19.0 milestone Jul 29, 2016

shawnheide deleted the BUG_13322 branch August 10, 2016 22:35
