topper-123
Contributor

Fixes a bug when indexing with .loc where the index is a CategoricalIndex with integer or float categories.
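A minimal sketch of the behaviour this PR targets, assuming a pandas version that already includes the fix. The series names and values are made up for illustration; the integer categories are chosen so they cannot be confused with positions.

```python
import pandas as pd

# Integer categories that deliberately differ from the positions 0, 1, 2,
# so a successful lookup can only be a label lookup.
int_series = pd.Series(
    ["foo", "bar", "baz"], index=pd.CategoricalIndex([10, 20, 30])
)
int_result = int_series.loc[20]

# The same kind of lookup with float categories.
float_series = pd.Series(
    ["foo", "bar", "baz"], index=pd.CategoricalIndex([1.5, 2.5, 3.5])
)
float_result = float_series.loc[2.5]

print(int_result, float_result)
```

With the fix in place, both lookups should resolve by label and return the middle element.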

topper-123 force-pushed the categorical_indexing_with_non_string_categories branch from 49eb424 to 5167024 on November 28, 2019 23:03

pep8speaks commented Nov 28, 2019

Hello @topper-123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-11 07:13:21 UTC

topper-123 force-pushed the categorical_indexing_with_non_string_categories branch from 5167024 to 364f61e on November 28, 2019 23:05
try:
    return self.categories._convert_scalar_indexer(key, kind=kind)
except TypeError:
    self._invalid_indexer("label", key=key)
Contributor

when is this TypeError hit?

Contributor Author

topper-123 commented Nov 30, 2019

It is hit for pd.Series([1, 2], index=pd.CategoricalIndex(["a", "b"])).loc[1], i.e. when the key has the wrong type for the index. It's the same error type as for pd.Series([1, 2], index=["a", "b"]).loc[1].

I've added a test for it.
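For illustration, a hedged sketch comparing the two lookups mentioned above. The helper loc_error_type is hypothetical, not part of pandas. The exception class has differed between pandas releases (TypeError at the time of this PR, KeyError in later versions as far as I know), so the sketch catches both and only reports which one was raised.

```python
import pandas as pd


def loc_error_type(series, key):
    """Return the name of the exception raised by series.loc[key], or None."""
    try:
        series.loc[key]
    except (KeyError, TypeError) as exc:
        return type(exc).__name__
    return None


# String labels, looked up with an integer key (wrong key type).
plain = pd.Series([1, 2], index=["a", "b"])
categorical = pd.Series([1, 2], index=pd.CategoricalIndex(["a", "b"]))

plain_error = loc_error_type(plain, 1)
categorical_error = loc_error_type(categorical, 1)
print(plain_error, categorical_error)
```

The point of the comparison is that a CategoricalIndex with string categories should fail in the same way as a plain string index, rather than falling back to positional access.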

jreback added the Categorical (Categorical Data Type) and Indexing (Related to indexing on series/frames, not to indexes themselves) labels on Nov 29, 2019
topper-123 force-pushed the categorical_indexing_with_non_string_categories branch 2 times, most recently from 4751ca7 to f2274d4 on November 30, 2019 11:38
jreback added this to the 1.0 milestone on Dec 1, 2019
if self.categories._defer_to_indexing:
    return self.categories._convert_scalar_indexer(key, kind=kind)

if kind == "loc" or self.categories._defer_to_indexing:
Contributor

we might not need the _defer_to_indexing any longer, as I think the only way to get here is kind=='loc'

@topper-123
Contributor Author

I'll be abroad until the weekend and will look at it then. I think it would cause an error with categories made from a DatetimeIndex, but I'll check.

topper-123 force-pushed the categorical_indexing_with_non_string_categories branch 3 times, most recently from e9f5832 to 746d6d4 on December 7, 2019 23:29
@topper-123
Contributor Author

I've updated the PR: I've removed _defer_to_indexing and added some tests.

result = df.loc[idx_values[0]]
expected = Series(["foo"], index=["A"], name=idx_values[0])
tm.assert_series_equal(result, expected)
# list selection
Contributor

can you put a blank line between cases

np.array([1, 2, 3], dtype=dtype)
for dtype in [
    np.int8,
    np.int16,
Contributor

any way to use the fixtures (or definitions) in pandas/conftest.py for some of this

topper-123 force-pushed the categorical_indexing_with_non_string_categories branch from 746d6d4 to 7316414 on December 11, 2019 05:29
@topper-123
Contributor Author

I've updated the PR to use the dtypes from pandas/conftest.py instead.

[1.5, 2.5, 3.5],
[-1.5, -2.5, -3.5],
# numpy int/uint
*[np.array([1, 2, 3], dtype=dtype) for dtype in conftest.ALL_INT_DTYPES],
Contributor

great! nice comprehensive test
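A rough standalone sketch of what such a dtype sweep might look like outside the pandas test suite. The INT_DTYPES list here is a local stand-in written for this example, not the conftest definition, and the frame contents are assumed from the test fragment quoted earlier in the thread.

```python
import numpy as np
import pandas as pd

# Local stand-in for the dtype list in pandas/conftest.py (an assumption,
# spelled out here so the sketch is self-contained).
INT_DTYPES = [
    np.int8, np.int16, np.int32, np.int64,
    np.uint8, np.uint16, np.uint32, np.uint64,
]

results = {}
for dtype in INT_DTYPES:
    idx_values = np.array([1, 2, 3], dtype=dtype)
    df = pd.DataFrame(
        {"A": ["foo", "bar", "baz"]}, index=pd.CategoricalIndex(idx_values)
    )
    # Scalar label selection: returns the row labelled idx_values[0].
    row = df.loc[idx_values[0]]
    results[np.dtype(dtype).name] = row["A"]

print(results)
```

Each lookup is expected to select the first row by label, whatever the integer width or signedness of the categories.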

jreback merged commit 27836e9 into pandas-dev:master on Dec 11, 2019
Contributor

jreback commented Dec 11, 2019

thanks @topper-123 very nice!

topper-123 deleted the categorical_indexing_with_non_string_categories branch on December 11, 2019 13:57
Labels
Categorical (Categorical Data Type), Indexing (Related to indexing on series/frames, not to indexes themselves)
Development

Successfully merging this pull request may close these issues.

Can't .loc[label] on a CategoricalIndex with labels being integer
3 participants