
BUG: Series.str.split(expand=True) for ArrowDtype(pa.string()) #53532

Merged: 5 commits merged into pandas-dev:main on Jun 7, 2023.


lukemanley (Member):

Behavior on main:

In [1]: import pandas as pd

In [2]: import pyarrow as pa

In [3]: ser = pd.Series(["a", "a|b", "a|b|c"], dtype=pd.ArrowDtype(pa.string()))

In [4]: ser.str.split("|", expand=True)
Out[4]: 
   0
0  a
1  a
2  a

Behavior with this PR:

Out[4]: 
   0     1     2
0  a  <NA>  <NA>
1  a     b  <NA>
2  a     b     c
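The expected output above follows a simple rule: split every string, then pad the shorter rows with missing values up to the widest row. As a minimal pure-Python model of that rule (not pandas' implementation; the helper name split_expand is hypothetical, and None stands in for <NA>):

```python
from typing import List, Optional


def split_expand(values: List[Optional[str]], sep: str) -> List[List[Optional[str]]]:
    """Hypothetical helper modelling str.split(sep, expand=True).

    Each string is split on the literal separator; every row is then padded
    with None so all rows share the width of the longest split. A missing
    input (None) becomes an all-None row.
    """
    rows = [v.split(sep) if v is not None else [] for v in values]
    width = max((len(r) for r in rows), default=0)
    # Pad short rows with None, mirroring the <NA> padding in the PR output.
    return [r + [None] * (width - len(r)) for r in rows]


for i, row in enumerate(split_expand(["a", "a|b", "a|b|c"], "|")):
    print(i, row)
```

Each padded row corresponds to one row of the expanded DataFrame, and position k in every row corresponds to column k.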

lukemanley added the Bug, Strings (String extension data type and string data) and Arrow (pyarrow functionality) labels on Jun 6, 2023.
lukemanley added this to the 2.0.3 milestone on Jun 6, 2023.
Review thread on the diff:

    row = np.append(row, nulls)
    new_values.append(row)
    pa_type = result._pa_array.type
    result._pa_array = pa.array(new_values, type=pa_type)

mroeschke (Member) commented on Jun 6, 2023:


While we're here, could you change the result._pa_array = assignment (and the instance above) to result = ArrowExtensionArray(...)? That way we can ensure result._pa_array is always a pa.chunked_array.

lukemanley (Member, PR author):

yep, good catch. updated

mroeschke merged commit 66468ce into pandas-dev:main on Jun 7, 2023.
mroeschke (Member):

Thanks @lukemanley

lumberbot-app (bot) commented on Jun 7, 2023:

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
       git checkout 2.0.x
       git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
       git cherry-pick -x -m1 66468cecba6a1b17d2e12ff79da0b87f16118726
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
       git commit -am 'Backport PR #53532: BUG: Series.str.split(expand=True) for ArrowDtype(pa.string())'
  4. Push to a named branch:
       git push YOURFORK 2.0.x:auto-backport-of-pr-53532-on-2.0.x
  5. Create a PR against branch 2.0.x. I would have named this PR:

"Backport PR #53532 on branch 2.0.x (BUG: Series.str.split(expand=True) for ArrowDtype(pa.string()))"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

lukemanley added a commit to lukemanley/pandas that referenced this pull request Jun 7, 2023
mroeschke pushed a commit that referenced this pull request Jun 7, 2023
…) for ArrowDtype(pa.string())) (#53549)

* Backport PR #53532: BUG: Series.str.split(expand=True) for ArrowDtype(pa.string())

* _pa_array -> _data
lukemanley deleted the series-str-split-expand-arrow branch on June 8, 2023 02:31
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…s-dev#53532)

* BUG: Series.str.split(expand=True) for ArrowDtype(pa.string())

* whatsnew

* min versions

* ensure ArrowExtensionArray