Backport PR #52763 on branch 2.0.x (BUG: interchange categorical_column_to_series() should not accept only PandasColumn) (#52793)

Backport PR #52763: BUG: interchange categorical_column_to_series() should not accept only PandasColumn

Co-authored-by: Marco Edward Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
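The commit title names the problem. `categorical_column_to_series()` asserted that the categories column was a `PandasColumn`, so a categories column produced by another library's interchange implementation (such as pyarrow's) was rejected even though the function only needs its `_col` attribute. The following is a minimal sketch of the difference between that `isinstance` assertion and the `hasattr` duck-typing check the fix switches to. `PandasColumn`, `ForeignColumn`, `OpaqueColumn`, `categories_before` and `categories_after` are hypothetical stand-ins for illustration, not the real pandas classes or functions.

```python
from typing import Any, List


class PandasColumn:
    """Hypothetical stand-in for pandas' own interchange column class."""

    def __init__(self, values: List[Any]) -> None:
        self._col = values


class ForeignColumn:
    """Hypothetical column from another producer that also exposes `_col`."""

    def __init__(self, values: List[Any]) -> None:
        self._col = values


class OpaqueColumn:
    """Hypothetical column that exposes no `_col` attribute at all."""


def categories_before(cat_column: Any) -> List[Any]:
    # Old check: only the consumer's own column class is accepted.
    assert isinstance(cat_column, PandasColumn), "categories must be a PandasColumn"
    return list(cat_column._col)


def categories_after(cat_column: Any) -> List[Any]:
    # New check: any object exposing `_col` is accepted; anything else
    # raises NotImplementedError instead of tripping an assertion.
    if hasattr(cat_column, "_col"):
        return list(cat_column._col)
    raise NotImplementedError("no `_col` attribute to fall back on")


days = ["Mon", "Tue", "Wed"]
print(categories_after(ForeignColumn(days)))  # ['Mon', 'Tue', 'Wed']
try:
    categories_before(ForeignColumn(days))
except AssertionError as exc:
    print("rejected:", exc)  # rejected: categories must be a PandasColumn
```

The duck-typed version still fails loudly for columns that provide no usable fallback. It does so with `NotImplementedError`, which, unlike an `assert`, is not stripped when Python runs with `-O`.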
meeseeksmachine and MarcoGorelli committed Apr 20, 2023
1 parent dd8533e commit b49d6de
Showing 3 changed files with 25 additions and 4 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.1.rst
@@ -27,6 +27,7 @@ Bug fixes
 - Bug in :attr:`Series.dt.days` that would overflow ``int32`` number of days (:issue:`52391`)
 - Bug in :class:`arrays.DatetimeArray` constructor returning an incorrect unit when passed a non-nanosecond numpy datetime array (:issue:`52555`)
 - Bug in :func:`Series.median` with :class:`ArrowDtype` returning an approximate median (:issue:`52679`)
+- Bug in :func:`api.interchange.from_dataframe` was unnecessarily raising on categorical dtypes (:issue:`49889`)
 - Bug in :func:`pandas.testing.assert_series_equal` where ``check_dtype=False`` would still raise for datetime or timedelta types with different resolutions (:issue:`52449`)
 - Bug in :func:`read_csv` casting PyArrow datetimes to NumPy when ``dtype_backend="pyarrow"`` and ``parse_dates`` is set causing a performance bottleneck in the process (:issue:`52546`)
 - Bug in :func:`to_datetime` and :func:`to_timedelta` when trying to convert numeric data with a :class:`ArrowDtype` (:issue:`52425`)
13 changes: 9 additions & 4 deletions pandas/core/interchange/from_dataframe.py
@@ -7,7 +7,6 @@
 import numpy as np
 
 import pandas as pd
-from pandas.core.interchange.column import PandasColumn
 from pandas.core.interchange.dataframe_protocol import (
     Buffer,
     Column,
@@ -181,9 +180,15 @@ def categorical_column_to_series(col: Column) -> tuple[pd.Series, Any]:
         raise NotImplementedError("Non-dictionary categoricals not supported yet")
 
     cat_column = categorical["categories"]
-    # for mypy/pyright
-    assert isinstance(cat_column, PandasColumn), "categories must be a PandasColumn"
-    categories = np.array(cat_column._col)
+    if hasattr(cat_column, "_col"):
+        # Item "Column" of "Optional[Column]" has no attribute "_col"
+        # Item "None" of "Optional[Column]" has no attribute "_col"
+        categories = np.array(cat_column._col)  # type: ignore[union-attr]
+    else:
+        raise NotImplementedError(
+            "Interchanging categorical columns isn't supported yet, and our "
+            "fallback of using the `col._col` attribute (a ndarray) failed."
+        )
     buffers = col.get_buffers()
 
     codes_buff, codes_dtype = buffers["data"]
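The hunk ends where the function starts reading the integer codes from `buffers["data"]`. In a dictionary-encoded categorical, each code is an index into the categories array recovered above. The following is a minimal, standard-library-only sketch of that decoding step. `decode_dictionary` is a hypothetical helper, not the pandas implementation. Treating a code of `-1` as missing is an assumption made for illustration; it matches how pandas' own `Categorical` codes mark missing values, but a real producer signals nulls through the column's null description, which this sketch does not model.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

MISSING_SENTINEL = -1  # assumption: a code of -1 marks a missing value


def decode_dictionary(codes: Sequence[int], categories: Sequence[T]) -> List[Optional[T]]:
    """Map each integer code to its category; the sentinel becomes None."""
    values: List[Optional[T]] = []
    for code in codes:
        if code == MISSING_SENTINEL:
            values.append(None)
        elif 0 <= code < len(categories):
            values.append(categories[code])
        else:
            raise IndexError(f"code {code} is out of range for {len(categories)} categories")
    return values


days = ["Mon", "Tue", "Wed"]
print(decode_dictionary([0, 1, 0, -1, 2], days))  # ['Mon', 'Tue', 'Mon', None, 'Wed']
```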
15 changes: 15 additions & 0 deletions pandas/tests/interchange/test_impl.py
@@ -74,6 +74,21 @@ def test_categorical_dtype(data):
     tm.assert_frame_equal(df, from_dataframe(df.__dataframe__()))
 
 
+def test_categorical_pyarrow():
+    # GH 49889
+    pa = pytest.importorskip("pyarrow", "11.0.0")
+
+    arr = ["Mon", "Tue", "Mon", "Wed", "Mon", "Thu", "Fri", "Sat", "Sun"]
+    table = pa.table({"weekday": pa.array(arr).dictionary_encode()})
+    exchange_df = table.__dataframe__()
+    result = from_dataframe(exchange_df)
+    weekday = pd.Categorical(
+        arr, categories=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
+    )
+    expected = pd.DataFrame({"weekday": weekday})
+    tm.assert_frame_equal(result, expected)
+
+
 @pytest.mark.parametrize(
     "data", [int_data, uint_data, float_data, bool_data, datetime_data]
 )
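The new test builds its input with pyarrow's `dictionary_encode()` and expects the categories in the order in which the weekdays first appear in `arr`. The following is a standard-library sketch of that encoding rule (categories assigned in first-appearance order), which shows why `expected` lists the categories as `Mon` through `Sun` and what codes accompany them. `dictionary_encode` here is a hypothetical pure-Python helper written for illustration, not pyarrow's implementation.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Return (codes, categories), with categories in order of first appearance."""
    index: Dict[Hashable, int] = {}
    categories: List[Hashable] = []
    codes: List[int] = []
    for value in values:
        if value not in index:
            # First time this value is seen: it becomes the next category.
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return codes, categories


arr = ["Mon", "Tue", "Mon", "Wed", "Mon", "Thu", "Fri", "Sat", "Sun"]
codes, categories = dictionary_encode(arr)
print(categories)  # ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(codes)       # [0, 1, 0, 2, 0, 3, 4, 5, 6]
```

Decoding the codes against the categories (`[categories[c] for c in codes]`) gives back `arr`. This is the round trip the test checks through the interchange protocol.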
