I am new to pandas; is there a large performance overhead for apply on a DataFrame? If not, then the specialized version can stay consistent with the input of is_categorical_dtype(). A similar boolean array could then be obtained with df.apply(pd.api.types.is_ordered_categorical_dtype), using the proposed function, though it would not contain pd.NA to signal non-categorical columns.
To retrieve the columns we can then simply do this: people.loc[:, people.apply(pd.api.types.is_ordered_categorical_dtype)]
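As a rough illustration of the apply-based idea, here is a minimal sketch. Note that `is_ordered_categorical_dtype` and `is_unordered_categorical_dtype` do not exist in pandas; they are defined locally here as hypothetical stand-ins for the proposed functions, and the `people` values are made up to match the column description in the issue.

```python
import pandas as pd


def is_ordered_categorical_dtype(arr_or_dtype) -> bool:
    """Hypothetical helper: True for an ordered categorical Series or dtype."""
    # Accept either an array-like with a .dtype attribute or a dtype itself.
    dtype = getattr(arr_or_dtype, "dtype", arr_or_dtype)
    return isinstance(dtype, pd.CategoricalDtype) and bool(dtype.ordered)


def is_unordered_categorical_dtype(arr_or_dtype) -> bool:
    """Hypothetical helper: True for an unordered categorical Series or dtype."""
    dtype = getattr(arr_or_dtype, "dtype", arr_or_dtype)
    return isinstance(dtype, pd.CategoricalDtype) and not dtype.ordered


# Made-up data: one unordered categorical, one numeric, one ordered categorical.
people = pd.DataFrame(
    {
        "eye_color": pd.Categorical(["blue", "brown", "green"]),
        "age": [25, 41, 67],
        "age_group": pd.Categorical(
            ["young", "middle", "old"],
            categories=["young", "middle", "old"],
            ordered=True,
        ),
    }
)

# apply() calls the helper once per column, yielding one boolean per column.
mask = people.apply(is_ordered_categorical_dtype)
ordered_cols = people.loc[:, mask]

# The same mask can be built from the dtypes alone, without touching the data.
mask_from_dtypes = people.dtypes.map(is_ordered_categorical_dtype)
```

Since the helper runs once per column rather than once per row, the cost of apply here should scale with the number of columns, not with the length of the frame.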
Is your feature request related to a problem?
I'd like to be able to easily select only ordered categorical columns, or only unordered categorical columns, from a dataframe.
Example
Here's an example dataset:
Here, eye_color is an unordered categorical column, age_group is an ordered categorical column, and age is numeric. I want just the age_group column.
My best attempt at selecting ordered categorical columns is:
This solution feels overly complicated for such a simple task.
Describe the solution you'd like
There are a few options for what nicer code might look like.
If ordered and unordered categoricals had different dtypes (as in R with factor vs. ordered), then I could just write people.select_dtypes("ordered"). Unfortunately, this would be a breaking change for all other code that assumes the dtype of ordered categoricals.
If dataframe-level .cat.* methods existed, I could write something like
    is_ordered = people.cat.ordered  # should return [False, pd.NA, True]
    people.loc[:, is_ordered & pd.notnull(is_ordered)]
A variation on this might be to have more specialized equivalents of .api.types.is_categorical_dtype(), perhaps .api.types.is_ordered_categorical_dtype() and .api.types.is_unordered_categorical_dtype().
API breaking implications
The first option mentioned above has API breaking changes; the other two options do not.
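To make the second option more concrete, the behavior of a dataframe-level `.cat.ordered` can be approximated with the existing API. This is only a sketch: `ordered_flags` is a hypothetical helper name, not a pandas function, and the `people` values are made up to match the example description.

```python
import pandas as pd

# Made-up data: one unordered categorical, one numeric, one ordered categorical.
people = pd.DataFrame(
    {
        "eye_color": pd.Categorical(["blue", "brown", "green"]),
        "age": [25, 41, 67],
        "age_group": pd.Categorical(
            ["young", "middle", "old"],
            categories=["young", "middle", "old"],
            ordered=True,
        ),
    }
)


def ordered_flags(df: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in for a dataframe-level `.cat.ordered`.

    Returns True/False for categorical columns and pd.NA for all others.
    """
    flags = [
        col.dtype.ordered if isinstance(col.dtype, pd.CategoricalDtype) else pd.NA
        for _, col in df.items()
    ]
    # The nullable "boolean" dtype can hold True, False, and pd.NA together.
    return pd.Series(flags, index=df.columns, dtype="boolean")


is_ordered = ordered_flags(people)

# Treat "not categorical" (pd.NA) as False before using the flags as a mask.
selected = people.loc[:, is_ordered.fillna(False).to_numpy(dtype=bool)]
```

Keeping pd.NA in the intermediate result distinguishes "unordered categorical" from "not categorical at all", which a plain boolean mask cannot do.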
Additional context
I asked the internet for better solutions; no response so far.