shoyer
Member

@shoyer shoyer commented Aug 13, 2014

No description provided.

@shoyer
Member Author

shoyer commented Aug 13, 2014

closing until I make more progress

Contributor

It is used more like a _constructor(...).

@jankatins
Contributor

Maybe add the _finalize... hooks that pandas/numpy use? Each method would then need a change to call finalize on the new categoricals it returns...
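
For readers unfamiliar with the finalize pattern, here is a minimal sketch of how numpy's `__array_finalize__` hook carries metadata onto derived arrays. The `CodesArray` class and its `levels` attribute are hypothetical names made up for illustration; this is not the code from this pull request.

```python
import numpy as np


class CodesArray(np.ndarray):
    """Hypothetical ndarray subclass: integer codes plus a `levels` attribute."""

    def __new__(cls, codes, levels):
        # View the integer codes as our subclass, then attach the metadata.
        obj = np.asarray(codes, dtype=np.int8).view(cls)
        obj.levels = tuple(levels)
        return obj

    def __array_finalize__(self, obj):
        # numpy calls this whenever a CodesArray is created from another
        # array (view, slice, reshape), so metadata can be copied across.
        if obj is None:
            return
        self.levels = getattr(obj, "levels", None)


codes = CodesArray([0, 1, 1, 0], levels=["a", "b"])
sliced = codes[1:3]             # slicing goes through __array_finalize__
reshaped = codes.reshape(2, 2)  # so does reshape
```

Both `sliced` and `reshaped` keep the same `levels` without each method having to copy them by hand, which is the appeal of a finalize-style hook.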

@jreback
Contributor

jreback commented Aug 13, 2014

@shoyer So you want an n-dim codes array (rather than 1-d). How would this be used?
Conceptually this is the 'same' as a DataFrame of categorical series (that may share levels, in the 2-d case). What is your use case for this (aside from xray)?
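
To make the idea concrete, here is a rough sketch of a 2-d codes array whose columns share one set of levels. It uses plain numpy only; the variable names and example levels are assumptions for illustration, not the pandas Categorical internals.

```python
import numpy as np

# One shared set of levels for every column (illustrative values).
levels = np.array(["low", "mid", "high"])

# 2-d integer codes: 3 rows, 2 columns, each entry indexes into `levels`.
codes_2d = np.array([[0, 2],
                     [1, 1],
                     [2, 0]], dtype=np.int8)

# Decoding is a single fancy-indexing lookup over the whole array.
values_2d = levels[codes_2d]

# A single column behaves like an ordinary 1-d categorical.
col0 = levels[codes_2d[:, 0]]
```

Each column of `codes_2d` plays the role of one categorical series in the DataFrame analogy, with the levels stored once rather than per column.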

@shoyer
Member Author

shoyer commented Aug 13, 2014

@jreback I expect that allowing n-dimensional categoricals could lead to much higher performance for a DataFrame with multiple columns of the same type, in the same way that multi-dimensional arrays improve performance for other dtypes.

e.g., you could call unstack on a Categorical series essentially for free, since it's just a numpy reshape under the covers (though I don't know the pandas Block system well enough to be sure about this).
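
A small sketch of why the reshape would be cheap, using plain numpy. It assumes the long-format codes are stored row-major over a complete, sorted grid of row and column labels; the real unstack in pandas also has to handle missing and unordered index entries, which this does not cover.

```python
import numpy as np

levels = np.array(["a", "b"])  # illustrative levels

# Long format: 3 rows x 2 columns, flattened row-major.
codes_1d = np.array([0, 1, 1, 1, 0, 0], dtype=np.int8)

# "Unstacking" under the assumption above is just a reshape.
codes_2d = codes_1d.reshape(3, 2)

# The reshape returns a view: no codes are copied.
is_view = np.shares_memory(codes_1d, codes_2d)

# Decoding the wide layout with the shared levels.
wide = levels[codes_2d]
```

Because `codes_2d` shares memory with `codes_1d`, the cost does not grow with the number of elements.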

@jreback
Contributor

jreback commented Aug 13, 2014

@shoyer OK, that is reasonable. In fact, right now Categoricals are kept in completely separate blocks regardless of their internal structure. It is possible to combine them, so your solution would make sense.

3 participants