Commit

Backport PR #54895 on branch 2.1.x (REGR: Merge raising when left merging on arrow string index) (#54933)

Backport PR #54895: REGR: Merge raising when left merging on arrow string index

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
meeseeksmachine and phofl committed Sep 1, 2023
1 parent 6fcd7bd commit eac5483
Showing 3 changed files with 19 additions and 2 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.1.rst
@@ -13,6 +13,7 @@ including other versions of pandas.
 
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Fixed regression in :func:`merge` when merging over a PyArrow string index (:issue:`54894`)
 - Fixed regression in :func:`read_csv` when ``usecols`` is given and ``dtypes`` is a dict for ``engine="python"`` (:issue:`54868`)
 - Fixed regression in :meth:`DataFrame.__setitem__` raising ``AssertionError`` when setting a :class:`Series` with a partial :class:`MultiIndex` (:issue:`54875`)
 - Fixed regression in :meth:`Series.value_counts` raising for numeric data if ``bins`` was specified (:issue:`54857`)
8 changes: 6 additions & 2 deletions pandas/core/reshape/merge.py
@@ -2433,8 +2433,12 @@ def _factorize_keys(
             length = len(dc.dictionary)
 
             llab, rlab, count = (
-                pc.fill_null(dc.indices[slice(len_lk)], length).to_numpy(),
-                pc.fill_null(dc.indices[slice(len_lk, None)], length).to_numpy(),
+                pc.fill_null(dc.indices[slice(len_lk)], length)
+                .to_numpy()
+                .astype(np.intp, copy=False),
+                pc.fill_null(dc.indices[slice(len_lk, None)], length)
+                .to_numpy()
+                .astype(np.intp, copy=False),
                 len(dc.dictionary),
             )
             if how == "right":
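
The change above appends `.astype(np.intp, copy=False)` to both label arrays. A minimal NumPy-only sketch of what that cast does follows. The `int32` input dtype is an assumption made for illustration (the commit does not state the dtype of the uncast labels), and the variable names are hypothetical.

```python
import numpy as np

# Stand-in for the integer codes produced by dictionary encoding.
# int32 is an assumed dtype for illustration, not taken from the commit.
labels = np.array([0, 1, 2, 1], dtype=np.int32)

# The cast applied in the patch: convert to the platform index type.
casted = labels.astype(np.intp, copy=False)

# With copy=False, an array that already has the requested dtype is
# returned as-is instead of being copied.
already_intp = np.array([0, 1, 2], dtype=np.intp)
same = already_intp.astype(np.intp, copy=False)

print(casted.dtype)          # platform-dependent integer type
print(same is already_intp)  # True
```

Passing `copy=False` keeps the cast cheap when the labels already have the target dtype, while still guaranteeing a uniform dtype for whatever consumes them.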
12 changes: 12 additions & 0 deletions pandas/tests/reshape/merge/test_merge.py
@@ -2870,3 +2870,15 @@ def test_merge_ea_int_and_float_numpy():
 
     result = df2.merge(df1)
     tm.assert_frame_equal(result, expected.astype("float64"))
+
+
+def test_merge_arrow_string_index():
+    # GH#54894
+    pytest.importorskip("pyarrow")
+    left = DataFrame({"a": ["a", "b"]}, dtype="string[pyarrow]")
+    right = DataFrame({"b": 1}, index=Index(["a", "c"], dtype="string[pyarrow]"))
+    result = left.merge(right, left_on="a", right_index=True, how="left")
+    expected = DataFrame(
+        {"a": Series(["a", "b"], dtype="string[pyarrow]"), "b": [1, np.nan]}
+    )
+    tm.assert_frame_equal(result, expected)
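
For readers without PyArrow installed, the merge shape exercised by the new test can be sketched with pandas' default string handling. This is a deliberate substitution: it illustrates the expected left-merge result only and does not exercise the Arrow code path that the commit fixes.

```python
import numpy as np
import pandas as pd

# Same data as the test, but without dtype="string[pyarrow]" so that
# pyarrow is not required (an intentional simplification).
left = pd.DataFrame({"a": ["a", "b"]})
right = pd.DataFrame({"b": 1}, index=pd.Index(["a", "c"]))

# Left merge: match the left column "a" against the right frame's index.
result = left.merge(right, left_on="a", right_index=True, how="left")

# "a" finds a match in right's index; "b" does not, so its value is NaN.
print(result)
```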
