Implement CategoricalIndex #7629

Closed
jankatins opened this issue Jul 1, 2014 · 9 comments · Fixed by #9741
Labels
Categorical (Categorical Data Type), Enhancement, Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@jankatins
Contributor

When #7217 becomes available, it would be nice to also have a 'CategoricalIndex', so that the sort order defined by the levels is preserved when a categorical becomes an index:

import pandas as pd

cats = pd.Categorical([1, 2, 3, 4], levels=[4, 2, 3, 1])
strings = ["a", "b", "c", "d"]
values = [4, 2, 3, 1]
df = pd.DataFrame({"strings": strings, "values": values}, index=cats)
df.index
# This should sort by levels but does not, as there is no CategoricalIndex!
df.sort_index()
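
For readers on a current pandas release, the requested behaviour can be sketched roughly as follows. This is an illustration and not code from the issue: it assumes a pandas version in which the `levels` keyword has been renamed to `categories` and `pd.CategoricalIndex` exists, and the variable name `sorted_df` is made up for the example.

```python
import pandas as pd

# Assumption: a pandas release where `categories` replaces the `levels`
# keyword used in the snippet above.
cats = pd.Categorical([1, 2, 3, 4], categories=[4, 2, 3, 1], ordered=True)
df = pd.DataFrame(
    {"strings": ["a", "b", "c", "d"], "values": [4, 2, 3, 1]},
    index=pd.CategoricalIndex(cats),
)

# Sorting the index follows the category order (4, 2, 3, 1),
# not the numeric order of the labels.
sorted_df = df.sort_index()
print(sorted_df)
```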

CC: @shoyer

@jreback changed the title from "Implement a 'CategoricalIndex'" to "Implement CategoricalIndex" on Jul 1, 2014
@jreback added this to the 0.15.0 milestone on Jul 1, 2014
@shoyer
Member

shoyer commented Aug 20, 2014

It occurs to me now (probably already obvious to others) that the implementation of Categorical is basically the same as a single-level MultiIndex. So that would probably be a good starting point.
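
The similarity can be seen by comparing the two internal representations. The sketch below is only an illustration, assuming a pandas release where `MultiIndex` exposes `codes` (older releases called this attribute `labels`); the sample `values` are invented.

```python
import pandas as pd

values = ["b", "a", "b", "c"]

# A Categorical stores the unique values once (categories) plus integer codes.
cat = pd.Categorical(values)

# A one-level MultiIndex stores the unique values once (levels) plus integer codes.
mi = pd.MultiIndex.from_arrays([values])

print(list(cat.categories), list(cat.codes))
print(list(mi.levels[0]), list(mi.codes[0]))
```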

@jankatins
Contributor Author

I haven't looked into MultiIndex, but in my mind it makes more sense to model it after an Int/string index or the normal numpy-backed ones, as a Categorical is in my mind an np.array and so should simply get all the methods needed to make it work as the "backend" of an Index. The only differences from a string index are that it can be sorted differently ("one" < "two" < "three") and that the categorical should be accessible like in Series.cat.
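
To make the ordering point concrete, here is a small sketch (assuming a current pandas release; the sample `words` are invented) that contrasts plain string sorting with sorting by category order, and shows the `Series.cat` accessor mentioned above.

```python
import pandas as pd

words = ["two", "three", "one", "two"]

# Plain strings sort alphabetically: one, three, two, two.
plain = pd.Series(words).sort_values()

# An ordered categorical sorts by its declared category order instead.
cat = pd.Series(
    pd.Categorical(words, categories=["one", "two", "three"], ordered=True)
)
by_category = cat.sort_values()

print(list(plain))
print(list(by_category))
print(list(cat.cat.categories))  # categories are reachable through Series.cat
```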

So this might be as easy as copying the Index type for strings, making it accept only a Categorical as input, making sure that the Index constructor maps to it, and then seeing what breaks with an AttributeError.
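
In pandas releases that ship `CategoricalIndex`, the constructor mapping described here can be checked directly. This is a sketch of how later versions behave as far as I know, not the implementation proposed in this comment.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"], categories=["b", "a"])

# Assumption: a pandas release with CategoricalIndex, where the generic
# Index constructor dispatches on categorical input.
idx = pd.Index(cat)

print(type(idx).__name__)
print(list(idx.categories))
```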

@mrocklin
Contributor

I would also like to see a categorical index. This is important when trying to avoid object dtypes for efficiency's sake.
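
A rough way to see the efficiency argument, assuming a pandas release with `CategoricalIndex`: compare the memory footprint of an object-dtype index with that of a categorical index over the same repeated labels. The labels and sizes are made up, and exact byte counts vary by platform and version, so none are quoted here.

```python
import pandas as pd

# Invented sample data: three distinct labels repeated many times.
labels = ["alpha", "beta", "gamma"] * 10_000

obj_idx = pd.Index(labels, dtype=object)   # one Python string per element
cat_idx = pd.CategoricalIndex(labels)      # small integer codes + 3 categories

obj_bytes = obj_idx.memory_usage(deep=True)
cat_bytes = cat_idx.memory_usage(deep=True)

print(obj_bytes, cat_bytes)
```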

@shoyer
Member

shoyer commented Mar 17, 2015

In all seriousness, this should not be very hard to do -- much easier than my IntervalIndex PR (which I will finish eventually).

The main unresolved question is how to handle reindexing with .loc. Should the new index always be a CategoricalIndex? Even if some values do not have matching categories in the original categorical? For CategoricalIndex and IntervalIndex, I think we'll need to define a special rule that using .loc with values not found in the index is not possible.
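
For reference, this is roughly how the question plays out in current pandas releases, as far as I know; it is not a decision recorded in this thread. Selecting an existing label keeps the `CategoricalIndex`, while a label that is not in the index raises `KeyError`. The frame and the flag `missing_raises` are invented for the sketch.

```python
import pandas as pd

idx = pd.CategoricalIndex(list("aabbca"), categories=list("cab"))
df = pd.DataFrame({"A": range(6)}, index=idx)

# Selecting a label that exists keeps the categorical index.
subset = df.loc["a"]
print(type(subset.index).__name__)

# Selecting a label that is not in the index is rejected.
try:
    df.loc["z"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```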

@mrocklin
Contributor

This is fairly important to me. What is the best way to push on this?

@TomAugspurger
Contributor

I've got some time over the next couple nights. Let me give it a shot quick.

We've got an RC for 0.16.0 out now. Would we shove this in for 0.16.0? Nobody really uses RCs anyway :) so I'm not sure we'd get any bug reports.

@jreback
Contributor

jreback commented Mar 18, 2015

this might be possible for 0.16.1. 0.16.0 is coming out in 2 days.

as @shoyer mentions, this is not that difficult, but some api decisions to make

@TomAugspurger
Contributor

Fair enough. It may be best to just skip 0.16.1 and do a 0.17 soon after 0.16. This could potentially break the API (implicitly, since I don't think the behavior of pd.Index(categorical) was documented).

@jreback
Contributor

jreback commented Mar 19, 2015

we'll see how it goes
