BUG: Series.str.replace/translate resulting dtype is different from pandas if an invalid pattern was passed #5970

Open
3 tasks done
dchigarev opened this issue Apr 11, 2023 · 0 comments
Labels
bug 🦗 Something isn't working P2 Minor bugs or low-priority feature requests pandas concordance 🐼 Functionality that does not match pandas

dchigarev commented Apr 11, 2023

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import pandas
import modin.pandas as pd

pd_ser = pandas.Series(["a", "b", "ab"])
md_ser = pd.Series(["a", "b", "ab"])

md_res = md_ser.str.replace(pat=None, repl="some", regex=False)
pd_res = pd_ser.str.replace(pat=None, repl="some", regex=False)

print(f"modin dtypes:\n{md_res.dtypes}\n") # object
print(f"pandas dtypes:\n{pd_res.dtypes}\n") # float64

print(f"modin internal dtypes:\n{md_res._to_pandas().dtypes}") # float64 (doesn't match modin's cache)

Issue Description

Here None is passed as the pattern. pandas does not raise; it returns a Series full of NaN values with dtype float64 instead of the expected Series of strings:

>>> pd.Series(["a", "b"]).str.replace(None, "b", regex=False)
0   NaN
1   NaN
dtype: float64

Modin can't infer the new dtype dynamically, so it copies the input dtype (object) into its dtype cache, even though the materialized result is float64:

str_replace = Map.register(_str_map("replace"), dtypes="copy", shape_hint="column")

(Everything works correctly when the pattern is not None.)
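
The assumption behind dtypes="copy" is that the operation returns the same dtype as its input. A minimal sketch of how that assumption could be checked, using pandas only; the helper name copied_dtype_matches is hypothetical and is not part of Modin or pandas:

```python
import pandas


def copied_dtype_matches(ser, func):
    """Return True if func(ser) has the same dtype as ser.

    Hypothetical helper: it mimics the assumption made by dtypes="copy",
    namely that the result dtype can be copied from the input.
    """
    result = func(ser)
    return result.dtype == ser.dtype


ser = pandas.Series(["a", "b", "ab"])

# str.upper returns strings, so copying the input dtype is safe.
upper_ok = copied_dtype_matches(ser, lambda s: s.str.upper())

# str.len returns integers, so a copied dtype would be stale.
len_ok = copied_dtype_matches(ser, lambda s: s.str.len())

print(f"str.upper keeps the input dtype: {upper_ok}")
print(f"str.len keeps the input dtype: {len_ok}")
```

Applied to the call from this report (pat=None, regex=False), the same check would presumably report a mismatch on pandas versions that return the all-NaN float64 Series shown above.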

Expected Behavior

Modin should match pandas behavior: the dtype reported by Modin for the result should be float64, the same as in pandas.

--
Note: this issue is part of an epic #3804

@dchigarev dchigarev added bug 🦗 Something isn't working pandas concordance 🐼 Functionality that does not match pandas P2 Minor bugs or low-priority feature requests labels Apr 11, 2023
@dchigarev dchigarev changed the title BUG: Series.str.replace resulting dtype is different from pandas if an invalid pattern was passed BUG: Series.str.replace/translate resulting dtype is different from pandas if an invalid pattern was passed Apr 11, 2023