BUG: replacement works for object but not string dtype #35977

Closed
2 of 3 tasks
yunkypunky opened this issue Aug 29, 2020 · 6 comments · Fixed by #41343
Labels: Bug; NA - MaskedArrays (related to pd.NA and nullable extension arrays); replace (replace method); Strings (string extension data type and string data)
Milestone: 1.3

Comments

yunkypunky commented Aug 29, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

import pandas as pd
import numpy as np

a = pd.DataFrame({'a': ['a', 'b', 'c'], 'b': ['d', '', '']}, dtype='object')
b = pd.DataFrame({'a': ['a', 'b', 'c'], 'b': ['d', '', '']}, dtype='string')

print(a)
a.replace(r'^\s*$', pd.NA, regex=True, inplace=True)
print(a)

print(b)
b.replace(r'^\s*$', pd.NA, regex=True, inplace=True)
print(b)

Problem description

replace(r'^\s*$', pd.NA, regex=True, inplace=True) works on object dtype, but not on string dtype.

Expected Output

The same result for both replacements.
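
To make the expected result concrete, here is a minimal sketch that computes it for the string-dtype frame by routing through object dtype, where the report says the regex replacement works. The round trip through `astype(object)` and back to `'string'` is an assumed workaround for illustration, not something confirmed in this thread, and the name `expected` is hypothetical.

```python
import pandas as pd

b = pd.DataFrame({'a': ['a', 'b', 'c'], 'b': ['d', '', '']}, dtype='string')

# Assumed workaround: do the regex replacement on an object-dtype copy,
# then cast the result back to the string extension dtype.
expected = (
    b.astype(object)
     .replace(r'^\s*$', pd.NA, regex=True)
     .astype('string')
)

print(expected)
print(expected['b'].isna().tolist())
```

This avoids `inplace=True` on purpose, so `b` itself is left untouched and can be compared against `expected`.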

Output of pd.show_versions()

[paste the output of pd.show_versions() here leaving a blank line after the details tag]

yunkypunky added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Aug 29, 2020
dsaxton added the Needs Info (clarification about behavior needed to assess issue) label and removed the Bug and Needs Triage labels Aug 29, 2020
MarcoGorelli mentioned this issue Aug 29, 2020
3 tasks
dsaxton (Member) commented Aug 29, 2020

@yunkypunky Thanks for the report. This could be a real bug (haven't looked too closely yet) so I think it's worth leaving open for now.

dsaxton reopened this Aug 29, 2020
dsaxton changed the title from "BUG:" to "BUG: inplace replacement works for object but not string dtype" Aug 29, 2020
dsaxton added the Bug, Strings (string extension data type and string data), and NA - MaskedArrays (related to pd.NA and nullable extension arrays) labels and removed the Needs Info label Aug 29, 2020
yunkypunky (Author) commented

I'm using Python 3.8.5 with Pandas 1.1.1

cgangwar11 (Contributor) commented

This bug occurs because ExtensionBlock doesn't support replacement by regex:

In [4]: is_extension_array_dtype(pd.Series(["d", "d", "f"], dtype='string').dtype)
Out[4]: True

In [5]: is_extension_array_dtype(pd.Series(["d", "d", "f"], dtype='object').dtype)
Out[5]: False

Variable a calls the replace method of ObjectBlock, whereas variable b calls the replace method of ExtensionBlock. The outputs therefore differ, because ExtensionBlock doesn't support replacement by regex. A possible solution is to override the replace method inherited from the superclass Block with the correct logic.
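
The dispatch described above can be illustrated with a toy model in plain Python. Every class here (`Block`, `ObjectBlock`, `ExtensionBlock`, `FixedExtensionBlock`) is a hypothetical stand-in that only borrows the names from the comment; none of this is the pandas internals code, and the way the toy `ExtensionBlock` ignores the `regex` flag is an assumption made to mimic the reported behaviour.

```python
import re


class Block:
    """Toy base class (hypothetical, not pandas): holds a list of values."""

    def __init__(self, values):
        self.values = list(values)

    def replace(self, to_replace, value, regex=False):
        if regex:
            rx = re.compile(to_replace)
            # Replace every string element that the pattern matches.
            return [value if isinstance(v, str) and rx.search(v) else v
                    for v in self.values]
        # Literal replacement: compare elements for equality.
        return [value if v == to_replace else v for v in self.values]


class ObjectBlock(Block):
    """Inherits the regex-aware replace unchanged."""


class ExtensionBlock(Block):
    def replace(self, to_replace, value, regex=False):
        # Assumption for illustration: the regex flag is silently ignored,
        # so the pattern is compared literally and nothing matches.
        return [value if v == to_replace else v for v in self.values]


class FixedExtensionBlock(ExtensionBlock):
    def replace(self, to_replace, value, regex=False):
        # The suggested direction: override replace so the regex case is handled.
        return Block.replace(self, to_replace, value, regex=regex)


data = ['d', '', '']
object_result = ObjectBlock(data).replace(r'^\s*$', None, regex=True)
extension_result = ExtensionBlock(data).replace(r'^\s*$', None, regex=True)
fixed_result = FixedExtensionBlock(data).replace(r'^\s*$', None, regex=True)

print(object_result, extension_result, fixed_result)
```

The point of the sketch is only that the same call can take different code paths depending on which block type holds the column, which is why the two frames in the report behave differently.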

dsaxton (Member) commented Aug 30, 2020

@cgangwar11 Agreed that extension dtypes should also support replacement with a regex. A PR would be welcome.

dsaxton changed the title from "BUG: inplace replacement works for object but not string dtype" to "BUG: replacement works for object but not string dtype" Aug 30, 2020
simonjayhawkins added the replace (replace method) label Sep 10, 2020
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue May 9, 2021
simonjayhawkins (Member) commented

Since [09e2036] DEPR: CategoricalBlock; combine Block.replace methods (#40527), this no longer fails silently. It now raises a TypeError:

>>> b.replace(r'^\s*$', pd.NA, regex=True, inplace=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/simon/pandas/pandas/core/frame.py", line 5202, in replace
    return super().replace(
  File "/home/simon/pandas/pandas/core/generic.py", line 6686, in replace
    new_data = self._mgr.replace(
  File "/home/simon/pandas/pandas/core/internals/managers.py", line 414, in replace
    return self.apply(
  File "/home/simon/pandas/pandas/core/internals/managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/simon/pandas/pandas/core/internals/blocks.py", line 684, in replace
    return self._replace_regex(to_replace, value, inplace=inplace)
  File "/home/simon/pandas/pandas/core/internals/blocks.py", line 757, in _replace_regex
    replace_regex(new_values, rx, value, mask)
  File "/home/simon/pandas/pandas/core/array_algos/replace.py", line 152, in replace_regex
    f = np.vectorize(re_replacer, otypes=[values.dtype])
  File "/home/simon/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2070, in __init__
    otypes = ''.join([_nx.dtype(x).char for x in otypes])
  File "/home/simon/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2070, in <listcomp>
    otypes = ''.join([_nx.dtype(x).char for x in otypes])
TypeError: Cannot interpret 'StringDtype' as a data type
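
The last frames of the traceback show `np.vectorize` being given `otypes=[values.dtype]`, where the dtype is a pandas extension dtype that NumPy cannot interpret. The sketch below reproduces that step in isolation and then shows one possible alternative that asks NumPy for object output and casts back afterwards. This is an illustration only and is not the change made in #41343; the helper `blank_to_na` and the variable names are hypothetical.

```python
import re

import numpy as np
import pandas as pd

values = pd.array(['d', '', ' '], dtype='string')

# Step 1: np.vectorize converts each otypes entry with np.dtype(...),
# which rejects the pandas StringDtype instance.
try:
    np.vectorize(str.upper, otypes=[values.dtype])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Step 2 (assumed alternative): vectorize with object output instead.
rx = re.compile(r'^\s*$')


def blank_to_na(s):
    # Hypothetical helper: whitespace-only strings become pd.NA.
    return pd.NA if isinstance(s, str) and rx.search(s) else s


f = np.vectorize(blank_to_na, otypes=[object])
replaced = pd.array(f(values.to_numpy(dtype=object)), dtype='string')

print(replaced)
```

Requesting `object` as the output type sidesteps the dtype conversion that fails in the traceback, at the cost of an extra cast back to the string dtype.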

simonjayhawkins added this to the 1.3 milestone May 9, 2021
4 participants