Skip to content

Commit

Permalink
Backport PR #54913 on branch 2.1.x (REGR: drop_duplicates raising for…
Browse files Browse the repository at this point in the history
… arrow strings) (#54955)

Backport PR #54913: REGR: drop_duplicates raising for arrow strings

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
  • Loading branch information
meeseeksmachine and phofl committed Sep 2, 2023
1 parent eac5483 commit 06a1581
Show file tree
Hide file tree
Showing 3 changed files with 11 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ Fixed regressions
- Fixed regression in :func:`merge` when merging over a PyArrow string index (:issue:`54894`)
- Fixed regression in :func:`read_csv` when ``usecols`` is given and ``dtypes`` is a dict for ``engine="python"`` (:issue:`54868`)
- Fixed regression in :meth:`DataFrame.__setitem__` raising ``AssertionError`` when setting a :class:`Series` with a partial :class:`MultiIndex` (:issue:`54875`)
- Fixed regression in :meth:`Series.drop_duplicates` for PyArrow strings (:issue:`54904`)
- Fixed regression in :meth:`Series.value_counts` raising for numeric data if ``bins`` was specified (:issue:`54857`)
- Fixed regression when comparing a :class:`Series` with ``datetime64`` dtype with ``None`` (:issue:`54870`)

Expand Down
2 changes: 1 addition & 1 deletion pandas/core/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -1000,7 +1000,7 @@ def duplicated(
duplicated : ndarray[bool]
"""
if hasattr(values, "dtype"):
if isinstance(values.dtype, ArrowDtype):
if isinstance(values.dtype, ArrowDtype) and values.dtype.kind in "ifub":
values = values._to_masked() # type: ignore[union-attr]

if isinstance(values.dtype, BaseMaskedDtype):
Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/series/methods/test_drop_duplicates.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import numpy as np
import pytest

import pandas as pd
from pandas import (
Categorical,
Series,
Expand Down Expand Up @@ -256,3 +257,11 @@ def test_duplicated_arrow_dtype(self):
result = ser.drop_duplicates()
expected = Series([True, False, None], dtype="bool[pyarrow]")
tm.assert_series_equal(result, expected)

def test_drop_duplicates_arrow_strings(self):
# GH#54904
pa = pytest.importorskip("pyarrow")
ser = Series(["a", "a"], dtype=pd.ArrowDtype(pa.string()))
result = ser.drop_duplicates()
expecetd = Series(["a"], dtype=pd.ArrowDtype(pa.string()))
tm.assert_series_equal(result, expecetd)

0 comments on commit 06a1581

Please sign in to comment.