BUG: Replace on Series/DataFrame stops replacing after first NA #57865

Merged: 6 commits, Mar 20, 2024

asishm (Member) commented Mar 16, 2024

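The issue was that the line a = a[mask] ran only when a was an ndarray, so it was skipped when dtype='string', whereas the np.place logic ran whenever the result was an ndarray. The unfiltered result was therefore placed into the masked positions, so every value after an NA was taken from the wrong index.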

The old behavior produced results like the following (depending on the number and location of NAs):

In [2]: s = pd.Series(['m', 'm', pd.NA, 'm', 'm', 'm'], dtype='string')

In [3]: s.replace({'m': 't'}, regex=True)
Out[3]:
0       t
1       t
2    <NA>
3       m
4       t
5       t
dtype: string

In [4]: s = pd.Series(['m', 'm', pd.NA, pd.NA, 'm', 'm', 'm'], dtype='string')

In [5]: s.replace({'m': 't'}, regex=True)
Out[5]:
0       t
1       t
2    <NA>
3    <NA>
4       m
5       m
6       t
dtype: string

In [6]: s = pd.Series(['m', 'm', pd.NA, 'm', 'm', pd.NA, 'm', 'm'], dtype='string')

In [7]: s.replace({'m': 't'}, regex=True)
Out[7]:
0       t
1       t
2    <NA>
3       m
4       t
5    <NA>
6       t
7       m
dtype: string
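
The misalignment in the first example can be reproduced with NumPy alone. The following is a minimal sketch, not the pandas source: place_matches is a hypothetical helper, and the equality check stands in for the regex match. It relies on the documented np.place behavior of consuming only the first N entries of the values array, where N is the number of True entries in the mask.

```python
import numpy as np


def place_matches(values, mask, filter_first):
    """Hypothetical helper: compute match flags and scatter them to full length."""
    # Stand-in for the regex check; NA entries (None here) never match.
    matches = np.array([v == "m" for v in values], dtype=bool)
    if filter_first:
        # Keep only the flags for non-NA positions before placing them.
        matches = matches[mask]
    out = np.zeros(mask.shape, dtype=bool)
    # np.place writes the first N entries of `matches` into the True slots of `mask`.
    np.place(out, mask, matches)
    return out


values = np.array(["m", "m", None, "m", "m", "m"], dtype=object)
mask = np.array([v is not None for v in values])  # True where the value is not NA

misaligned = place_matches(values, mask, filter_first=False)
aligned = place_matches(values, mask, filter_first=True)

print(misaligned.tolist())  # [True, True, False, False, True, True]
print(aligned.tolist())     # [True, True, False, True, True, True]
```

Without filtering, the False flag computed for the NA at index 2 lands on index 3, which matches the unreplaced 'm' at index 3 in Out[3].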

asishm changed the title from "Replace mask bug regex" to "BUG: Replace on Series/DataFrame stops replacing after first NA" on Mar 16, 2024
asishm (Member, Author) commented Mar 16, 2024

pre-commit.ci autofix

mroeschke added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and replace (replace method) labels on Mar 20, 2024
mroeschke added this to the 3.0 milestone on Mar 20, 2024
mroeschke merged commit 0f7ded2 into pandas-dev:main on Mar 20, 2024
@mroeschke
Member

Thanks @asishm

@asishm asishm deleted the replace_mask_bug_regex branch March 24, 2024 07:39
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…as-dev#57865)

* update test for GH#56599

* bug: ser/df.replace only replaces first occurence with NAs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add whatsnew

* fmt fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@rhshadrach
Member

Along with other string type fixes, I think it'd be good to backport this to 2.3.x. Any objection @mroeschke @jorisvandenbossche?

@mroeschke
Member

No objection from me

@rhshadrach
Member

@meeseeksdev backport to 2.3.x

lumberbot-app bot commented Jul 25, 2025

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
git checkout 2.3.x
git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
git cherry-pick -x -m1 0f7ded2a3a637b312f6ad454fd6c0b89d3d3e7aa
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #57865: BUG: Replace on Series/DataFrame stops replacing after first NA'
  4. Push to a named branch:
git push YOURFORK 2.3.x:auto-backport-of-pr-57865-on-2.3.x
  5. Create a PR against branch 2.3.x. I would have named this PR:

"Backport PR #57865 on branch 2.3.x (BUG: Replace on Series/DataFrame stops replacing after first NA)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request Jul 25, 2025
Labels: Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), replace (replace method), Still Needs Manual Backport
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.replace with regex on StringDtype column with NA values stops replacing after first NA
3 participants