BUG: __eq__ raises NotImplementedError for new arrow string dtype #56008

phofl · 2023-11-16T22:56:46Z

Reproducible Example

import pandas as pd

ser = pd.Series([False, True])
ser2 = pd.Series(["a", "b"], dtype="string[pyarrow_numpy]")
ser == ser2  # raises NotImplementedError

Issue Description

Comparing a boolean Series with a string[pyarrow_numpy] Series raises NotImplementedError. This shouldn't raise if we want to emulate object dtype behavior.
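For reference, a minimal sketch of the object dtype behavior being emulated. It assumes pandas is installed and passes dtype=object explicitly so that the result does not depend on which string dtype a given pandas version infers by default; the variable names are illustrative only.

```python
import pandas as pd

# Boolean Series compared against an object-dtype Series of strings.
bools = pd.Series([False, True])
strings = pd.Series(["a", "b"], dtype=object)

# Equality between the mismatched elements does not raise;
# each pair simply compares unequal.
result = bools == strings
print(result.tolist())  # [False, False]
```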

Expected Behavior

The comparison should return a Series of all False values.
cc @jorisvandenbossche thoughts?


jorisvandenbossche · 2023-11-17T08:19:22Z

Regardless of exactly emulating object dtype, we should maybe ask ourselves the more general question: how does pandas handle equality comparisons of incompatible dtypes in general? (or at least with the non-arrow dtypes)

At the moment, it seems we just never raise an error, with one exception for comparing two categoricals (based on a quick experiment, didn't include every possible dtype):

import pandas as pd

serieses = [
    pd.Series(["a", "b"]),
    pd.Series(["a", "b"], dtype="string"),
    pd.Series([True, False]),
    pd.Series([True, False], dtype="boolean"),
    pd.Series([1, 2]),
    pd.Series([1, 2], dtype="Int64"),
    pd.Series([0.1, 0.2]),
    pd.Series([0.1, 0.2], dtype="Float64"),
    pd.Series(pd.date_range("2012-01-01", periods=2)),
    pd.Series(pd.timedelta_range("1 days", periods=2)),
    pd.Series(pd.period_range("2012", periods=2)),
    pd.Series(['a', 'b'], dtype="category"),
    pd.Series([1, 2], dtype="category"),
]

for left in serieses:
    for right in serieses:
        try:
            left == right
        except Exception:
            print(f"Exception for {left.dtype} and {right.dtype}")

gives

Exception for category and category
Exception for category and category

So if we want to keep that "rule" consistent, then I think the new default string dtype should also never raise in equality comparisons, but return all False instead.

jbrockmendel · 2023-11-18T05:56:55Z

In general we follow Python semantics for non-comparable dtypes: == returns all-False, != returns all-True, and inequalities (<, <=, >, >=) raise. A boilerplate version of this logic is in ops.invalid_comparison.
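The rule can be sketched in plain Python. The helper below, invalid_comparison_sketch, is a hypothetical stand-in written for illustration on lists; it is not pandas' actual ops.invalid_comparison and only mirrors the three cases described above.

```python
import operator


def invalid_comparison_sketch(left, right, op):
    """Hypothetical elementwise rule for non-comparable operands."""
    if op is operator.eq:
        # Nothing is equal across incompatible types.
        return [False] * len(left)
    if op is operator.ne:
        # Everything is unequal across incompatible types.
        return [True] * len(left)
    # Ordering comparisons have no sensible answer, so raise.
    raise TypeError(
        f"Invalid comparison between {type(left[0]).__name__} "
        f"and {type(right[0]).__name__}"
    )


flags = [False, True]
strings = ["a", "b"]

eq_result = invalid_comparison_sketch(flags, strings, operator.eq)
ne_result = invalid_comparison_sketch(flags, strings, operator.ne)

try:
    invalid_comparison_sketch(flags, strings, operator.lt)
    lt_raised = False
except TypeError:
    lt_raised = True

print(eq_result, ne_result, lt_raised)
```

Plain Python scalars behave the same way: 1 == "a" is False, 1 != "a" is True, and 1 < "a" raises TypeError.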

phofl added the Bug, Strings (String extension data type and string data), and Arrow (pyarrow functionality) labels Nov 16, 2023

phofl added this to the 3.0 milestone Nov 16, 2023

jorisvandenbossche modified the milestones: 3.0, 2.2 Nov 18, 2023

phofl mentioned this issue Nov 29, 2023

BUG: __eq__ raising for new arrow string dtype for incompatible objects #56245

Merged

5 tasks

phofl closed this as completed in #56245 Dec 21, 2023
