Feature Request: categorical.reset_order #9190

jseabold · 2015-01-02T22:30:24Z

I was thinking about trying to add a to_unordered method, then I thought maybe a reset_order (or reorder?) with an optional drop keyword à la reset_index makes more sense. I didn't check whether this is already possible, so this is also a question: is it possible via some other syntactic sugar? I might see if I can hack this together at some point unless someone beats me to it. My motivation is that read_stata gives me either all ordered or all unordered factors. This could be "fixed" there by accepting a list of columns in addition to the boolean that controls conversion to ordered, but I think a method like this would be generally useful, plus I never peek at data before reading it.


jreback · 2015-01-02T22:40:54Z

http://pandas.pydata.org/pandas-docs/stable/categorical.html#reordering ?
assume you saw this: http://pandas.pydata.org/pandas-docs/stable/io.html#io-stata-categorical

jseabold · 2015-01-05T15:28:44Z

Thanks, yeah, that's helpful. AFAICT, it still doesn't look like there's a way to just drop the ordering though, correct?

Re: read_stata, yeah I saw that, but it's an all or nothing proposition. No way to pass a list. Would be an easy fix. I'll look at it.
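Until read_stata accepts a list, the per-column choice can be made after loading. The sketch below is only an illustration: `drop_order` is a hypothetical helper, not a pandas function, and it relies on the `.cat.as_unordered()` accessor method that pandas gained after this comment was written. The small frame stands in for whatever read_stata would return with every labeled column imported as ordered.

```python
import pandas as pd


def drop_order(df, columns):
    """Return a copy of df in which the listed categorical columns are unordered.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    for col in columns:
        # as_unordered() returns a new Series; the categories are untouched.
        out[col] = out[col].cat.as_unordered()
    return out


# Stand-in for a frame loaded with every labeled column as an ordered categorical.
df = pd.DataFrame(
    {
        "grade": pd.Categorical(["a", "b", "a"], categories=["a", "b"], ordered=True),
        "region": pd.Categorical(["n", "s", "n"], categories=["n", "s"], ordered=True),
    }
)

# Only "region" is a nominal variable, so only its ordering is dropped.
result = drop_order(df, ["region"])
```

Returning a copy keeps the loaded frame intact, so the ordinal information stays available if a column was listed by mistake.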

Unrelated, I'm sure it was discussed ad nauseam but I was also surprised that ordered is the default for Categorical. In my experience, unordered is more common.

jreback · 2015-01-05T19:00:01Z

In [1]: import pandas as pd                                    

In [2]: s = pd.Series(list('aabcd'),dtype='category')          

In [3]: s                                                      
Out[3]:                                                        
0    a                                                         
1    a                                                         
2    b                                                         
3    c                                                         
4    d                                                         
dtype: category                                                
Categories (4, object): [a < b < c < d]                        

In [4]: s.cat.ordered                                          
Out[4]: True                                                   

In [5]: s.cat.ordered = False                                  

In [6]: s                                                      
Out[6]:                                                        
0    a                                                         
1    a                                                         
2    b                                                         
3    c                                                         
4    d                                                         
dtype: category                                                
Categories (4, object): [a, b, c, d]

just set the ordered flag to drop the ordering.

cc @JanSchulz .... do you recall the exact discussion w.r.t. ordered being True by default?

jorisvandenbossche · 2015-01-05T23:04:17Z

@jreback Your example above of resetting the order can maybe be added to the docs (I didn't directly see this now in the docs. They speak about setting the order and sorting, but I did not find this)

jseabold · 2015-01-05T23:22:04Z

Oh, nice. I still think a method for changing the state of the object would be nice too. Methods are more or less self-documenting.

bashtage · 2015-01-06T15:15:09Z

The default of ordered=True was chosen because Stata only stores numerical data, so it is always possible to order according to the numeric values, and because it is trivial to drop the ordering if needed but non-trivial to re-assign it if the data is read in as unordered.

Oops, this was only w.r.t. read_stata, not categorical creation, if that is the nature of the above question.

jankatins · 2015-01-06T21:02:33Z

In the discussion, we wanted to have an ordered categorical whenever the underlying data had an order, which is true most of the time (ints, strings, ... are all orderable). So ordered actually defaults to False, but that default is not used in most cases:

    def __init__(self, values, categories=None, ordered=None, name=None, fastpath=False,
                 levels=None):
    [...]
    # case without explicit categories
                # If the underlying data structure was sortable, and the user doesn't want to
                # "forget" this order, the categorical also is sorted/ordered
                if ordered is None:
                    ordered = True
    # case with explicit categories
            # if we got categories, we can assume that the order is intended
            # if ordered is unspecified
            if ordered is None:
                ordered = True
    [...]
    self.ordered = False if ordered is None else ordered

Regarding a drop_ordering (or remove_ordering?): I'm not so sure what the expected result would be:

just the same as ordered=False or
also remove any ordering of the categories and 'resort' to the default order, which is defined on the individual elements (e.g. categories.sort())?
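The two readings can be told apart on a small example. This is a sketch under assumptions: it uses `.cat.as_unordered()` and `.cat.reorder_categories()` from later pandas releases, and the variable names are made up for illustration. Neither call is meant as the proposed drop_ordering itself.

```python
import pandas as pd

# Categories are given in a deliberate, non-alphabetical order.
s = pd.Series(
    pd.Categorical(
        ["low", "high", "mid", "low"],
        categories=["low", "mid", "high"],
        ordered=True,
    )
)

# Reading 1: only clear the ordered flag; categories keep their positions.
flag_only = s.cat.as_unordered()

# Reading 2: clear the flag and fall back to the elements' own sort order.
resorted = s.cat.reorder_categories(sorted(s.cat.categories), ordered=False)

print(list(flag_only.cat.categories))
print(list(resorted.cat.categories))
```

In both cases the values themselves are unchanged; only the category metadata differs.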

jseabold · 2015-01-07T00:12:36Z

@bashtage Yeah, I just meant with Categorical in general.

@JanSchulz Hmm. I think the rationale for the default should be based on what's more common in the real world. My prior is that unordered factors are much more common. R defaults to unordered unless ordered=TRUE. Do people complain about this? Seems sane to me.

Re: drop ordering, it would just be the same as ordered=False. If the defaults don't change, I suspect people are going to be calling this a lot.

jankatins · 2015-01-22T12:38:19Z

@jreback, @jseabold: I've no real preference on that default (i.e. I understand both rationales), but if that should change we should do it as early as possible as that's an API change...

jankatins · 2015-01-22T12:39:15Z

If Stata has ordered==False, read_stata should build categoricals with a different default.

bashtage · 2015-01-22T14:57:19Z

Stata's data file format does not make it possible to determine whether a labeled variable is ordered or not; only the end user has this information. The primary reasons to import as an ordered categorical are:

Order can be trivially removed. As an aside, I'm not sure I even understand the issue with having an unordered categorical stored as an ordered one, aside from mental bookkeeping by the end user.
The ordinal information contained in the Stata data, if useful, is lost if imported as an unordered categorical.

jankatins · 2015-01-24T22:28:36Z

This topic is now also in #9347: should s.cat.ordered be settable or only readable? If the latter, then an explicit as_unordered() or as_ordered() (or some other method) makes sense.
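For readers arriving later: methods with these names did end up on the `.cat` accessor in subsequent pandas releases. The snippet below is a minimal sketch of how they behave there, assuming a reasonably recent pandas; it is not a record of what was available when this comment was written.

```python
import pandas as pd

s = pd.Series(list("aabcd"), dtype="category")

# Each method returns a new Series and leaves the original untouched.
ordered = s.cat.as_ordered()
unordered = ordered.cat.as_unordered()

print(ordered.cat.ordered, unordered.cat.ordered)
```

Because the methods return new objects rather than mutating in place, they chain naturally with other accessor calls.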

jorisvandenbossche added Docs Categorical Categorical Data Type labels Jan 5, 2015

jreback added this to the 0.16.0 milestone Jan 5, 2015

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

This was referenced Mar 6, 2015

Categorical: don't sort the categoricals if Categorical(..., ordered=False) #9347

Closed

API: deprecate setting of .ordered directly (GH9347, GH9190) #9611

Closed

API: deprecate setting of .ordered directly (GH9347, GH9190) #9622

Merged

jreback closed this as completed in #9622 Mar 11, 2015
