WIP/DO NOT MERGE: Categorical improvements #7444

Closed
wants to merge 10 commits into from
2 changes: 2 additions & 0 deletions .gitignore
@@ -88,3 +88,5 @@ doc/source/vbench
doc/source/vbench.rst
doc/source/index.rst
doc/build/html/index.html
# Windows specific leftover:
doc/tmp.sv
56 changes: 55 additions & 1 deletion doc/source/api.rst
@@ -429,7 +429,7 @@ Time series-related
Series.tz_localize

String handling
~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~
``Series.str`` can be used to access the values of the series as
strings and apply several methods to it. Due to implementation
details the methods show up here as methods of the
@@ -468,6 +468,60 @@ details the methods show up here as methods of the
StringMethods.upper
StringMethods.get_dummies

.. _api.categorical:

Categorical
~~~~~~~~~~~

.. currentmodule:: pandas.core.categorical

If a Series has dtype ``category``, ``Series.cat`` can be used to access the underlying
``Categorical``. The ``Categorical`` takes the place of the numpy array that otherwise backs a
Series, and has the following usable methods and properties (all available as
``Series.cat.<method_or_property>``):


.. autosummary::
   :toctree: generated/

   Categorical
   Categorical.from_codes
   Categorical.levels
   Categorical.ordered
   Categorical.reorder_levels
   Categorical.remove_unused_levels
   Categorical.min
   Categorical.max
   Categorical.mode
   Categorical.describe
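
The methods above presuppose the codes/levels layout of a ``Categorical``: each value is stored as an integer code that indexes into a list of distinct levels. The following is a minimal pure-Python sketch of that layout, written only to illustrate the idea. ``ToyCategorical`` and its methods are hypothetical and are not the pandas implementation; using ``-1`` as the code for a missing value is an assumption of the sketch.

```python
# Hypothetical, simplified model of how a Categorical stores data:
# integer codes that index into a list of levels. This is NOT the
# pandas implementation; ToyCategorical and its methods are illustrative.


class ToyCategorical:
    def __init__(self, codes, levels, ordered=False):
        self.codes = list(codes)     # positions into self.levels; -1 = missing (assumed)
        self.levels = list(levels)   # each distinct value stored once
        self.ordered = ordered

    @classmethod
    def from_codes(cls, codes, levels, ordered=False):
        # Build directly from precomputed codes, validating their range.
        if any(c < -1 or c >= len(levels) for c in codes):
            raise ValueError("codes must be in range [-1, len(levels))")
        return cls(codes, levels, ordered)

    def values(self):
        # Decode back to the original values (None for missing).
        return [self.levels[c] if c != -1 else None for c in self.codes]

    def remove_unused_levels(self):
        # Keep only levels that some code refers to, preserving their order,
        # and renumber the codes accordingly.
        used = sorted({c for c in self.codes if c != -1})
        remap = {old: new for new, old in enumerate(used)}
        new_codes = [remap[c] if c != -1 else -1 for c in self.codes]
        return ToyCategorical(new_codes, [self.levels[i] for i in used], self.ordered)


cat = ToyCategorical.from_codes([0, 2, 2, -1], ["a", "b", "c"])
print(cat.values())                   # ['a', 'c', 'c', None]
trimmed = cat.remove_unused_levels()
print(trimmed.levels, trimmed.codes)  # ['a', 'c'] [0, 1, 1, -1]
```

Storing each distinct value once and keeping only small integer codes per row is what makes operations like dropping unused levels a cheap remapping of codes rather than a rewrite of the values.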

``np.asarray(categorical)`` works because ``Categorical`` implements the numpy array interface. Be
aware that this converts the ``Categorical`` back to a plain numpy array, so the levels and ordering
information are not preserved!

.. autosummary::
   :toctree: generated/

   Categorical.__array__
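
To see why ``np.asarray`` drops the levels, here is a hypothetical minimal class (not the pandas code) that implements ``__array__`` by decoding its codes into plain values. It assumes there are no missing (``-1``) codes. The result is an ordinary ``numpy.ndarray`` with no trace of the levels or their order.

```python
import numpy as np


# Hypothetical minimal class implementing the numpy array interface via
# __array__; illustrative only, not pandas' Categorical.
class ToyCategorical:
    def __init__(self, codes, levels):
        self.codes = np.asarray(codes)
        self.levels = np.asarray(levels)

    def __array__(self, dtype=None, copy=None):
        # Materialize plain values by indexing levels with codes.
        # Assumes no -1 (missing) codes: take() would wrap -1 to the last level.
        values = self.levels.take(self.codes)
        return values if dtype is None else values.astype(dtype)


cat = ToyCategorical([1, 0, 1], ["x", "y"])
arr = np.asarray(cat)
print(arr)                      # ['y' 'x' 'y']
print(type(arr))                # <class 'numpy.ndarray'>
print(hasattr(arr, "levels"))   # False: the levels did not survive the conversion
```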

For compatibility with ``pandas.Series`` and ``numpy`` arrays, the following (non-API) methods
are also provided.

.. autosummary::
   :toctree: generated/

   Categorical.from_array
   Categorical.get_values
   Categorical.copy
   Categorical.dtype
   Categorical.ndim
   Categorical.sort
   Categorical.equals
   Categorical.unique
   Categorical.order
   Categorical.argsort
   Categorical.fillna
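
Several of these compatibility methods, such as ``sort``, ``order`` and ``argsort``, have to decide what ordering means for categorical data. A plausible reading, assumed here rather than taken from this PR, is that ordering follows the position of each level (its integer code) rather than the lexical order of the values. A minimal pure-Python sketch of that idea, using the hypothetical helper ``categorical_argsort``:

```python
# Hypothetical sketch: argsort on a codes-based categorical.
# Assumes ordering follows each level's position in `levels`
# (i.e. its integer code), not the lexical order of the values.


def categorical_argsort(codes):
    # Indices that would sort the codes ascending; Python's sort is stable,
    # so equal codes keep their original relative order.
    return sorted(range(len(codes)), key=lambda i: codes[i])


levels = ["low", "medium", "high"]        # user-defined level order
codes = [2, 0, 1, 0]                      # high, low, medium, low
order = categorical_argsort(codes)
print(order)                              # [1, 3, 2, 0]
print([levels[codes[i]] for i in order])  # ['low', 'low', 'medium', 'high']
```

A plain lexical sort of the same values would give ``['high', 'low', 'low', 'medium']``, which is why sorting on codes matters when the levels carry a meaningful order.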


Plotting
~~~~~~~~
.. currentmodule:: pandas
8 changes: 7 additions & 1 deletion doc/source/basics.rst
@@ -1574,7 +1574,8 @@ dtypes:
'float64': np.arange(4.0, 7.0),
'bool1': [True, False, True],
'bool2': [False, True, False],
'dates': pd.date_range('now', periods=3).values})
'dates': pd.date_range('now', periods=3).values,
'category': pd.Categorical(list("ABC"))})
df['tdeltas'] = df.dates.diff()
df['uint64'] = np.arange(3, 6).astype('u8')
df['other_dates'] = pd.date_range('20130101', periods=3).values
@@ -1630,6 +1631,11 @@ All numpy dtypes are subclasses of ``numpy.generic``:

subdtypes(np.generic)

.. note::

   Pandas also defines an additional ``category`` dtype, which is not integrated into the normal
   numpy type hierarchy and won't show up with the above function.

.. note::

   The ``include`` and ``exclude`` parameters must be non-string sequences.