REGR: Series.nlargest with masked arrays #42838

jbrockmendel · 2021-07-31T23:31:25Z

closes BUG: nlargest raises TypeError "No matching signature found" on Float64Dtype Series, versions >1.3.0 #42816
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry
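A minimal way to check for the regression described in the linked issue, using only public pandas calls. The sample values are invented for illustration; on affected versions the call is reported to raise a TypeError, and on fixed versions it returns the two largest non-NA values.

```python
import pandas as pd

# Illustrative data only; not taken from the issue or the PR's tests.
ser = pd.Series([0.5, pd.NA, 2.5, 1.5], dtype="Float64")

try:
    top = ser.nlargest(2)
except TypeError as err:
    # Behaviour reported in the issue for affected versions.
    top = None
    print(f"regression present: {err}")
else:
    print(top)
```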

doc/source/whatsnew/v1.3.2.rst

pandas/tests/series/methods/test_nlargest.py

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>

…-42816

jreback · 2021-08-03T19:31:48Z

pandas/core/algorithms.py

@@ -1255,6 +1259,18 @@ def compute(self, method: str) -> Series:

        dropped = self.obj.dropna()

+        if is_extension_array_dtype(dropped.dtype):


why is this preferable to handling on L1281 with the other dtypes?

bc ensure_data does the wrong thing with MaskedArrays, np.asarray(obj) defaults to object dtype.

we could kludge _ensure_data to work in cases where we don't have any pd.NAs, but doing it here lets us handle cases with NAs too.
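A small sketch of the conversion being discussed, using only public pandas/NumPy calls. The sample values are made up, and the dtype produced by the implicit conversion depends on the pandas version (object dtype according to the comment above), so no output is claimed for it.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the PR's tests.
ser = pd.Series([1.5, 3.0, 2.0], dtype="Float64")

# Implicit conversion of the masked array. The comment above reports
# object dtype here, but the result is version-dependent.
implicit = np.asarray(ser.array)
print(implicit.dtype)

# Asking for a concrete dtype avoids relying on the default.
explicit = ser.to_numpy(dtype="float64")
print(explicit.dtype)
```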

bc ensure_data does the wrong thing with MaskedArrays, np.asarray(obj) defaults to object dtype.

this is very unfortunate. I don't really like this approach here. i suppose it's ok for a backport though; this is an experimental type and so i don't consider this regression to be a big deal.

the more i work on it, the more i want to nuke NA from space

ok what is involved with changing this to a non-recursive formulation though? e.g. ensure_data should be able to handle NA (if not we will have other issues)

Three options:

1. have ensure_data special-case MaskedArray cases with no NAs, in which case they can just use the ndarray. This fixes the regression (cases with NAs didn't use to work IIUC). kluuuuudge

2. make algos.kth_smallest support object dtype (and not choke on pd.NA)

3. make the EA implement its own nlargest

I decided the approach here was less kludgy than those in part bc this function uses obj.dropna(), so the MaskedArray case actually is much simpler to implement than an arbitrary EA
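The reasoning in the last paragraph can be sketched with public API only. This is an illustration under stated assumptions, not the code merged in the PR: `nlargest_masked` is a hypothetical helper, and it assumes a Series with Float64 dtype.

```python
import pandas as pd


def nlargest_masked(ser: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: nlargest for a Float64 Series via plain float64."""
    # After dropna() no pd.NA remains, so a plain ndarray loses nothing.
    dropped = ser.dropna()
    values = dropped.to_numpy(dtype="float64")
    plain = pd.Series(values, index=dropped.index, name=dropped.name)
    # Run the ordinary numpy-backed path, then restore the nullable dtype.
    return plain.nlargest(n).astype(ser.dtype)


# Illustrative data only.
ser = pd.Series([1.0, pd.NA, 3.0, 2.0], dtype="Float64")
result = nlargest_masked(ser, 2)
print(result)
```

Because the NAs are dropped before any conversion, the sketch never has to decide how pd.NA should compare against real values, which is what makes the masked case simpler than an arbitrary EA.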

why is 1 a kludge?

bc it's special-casing for MaskedArray and special-casing for "not values.isna().any()"

worse than that, the check isn't "not values.isna().any()" but "not values._mask.any()"

jreback · 2021-08-10T21:45:24Z

not super thrilled with this, but does fix.

jreback · 2021-08-10T21:45:37Z

@meeseeksdev backport 1.3.x

lumberbot-app · 2021-08-10T21:45:50Z

Something went wrong ... Please have a look at my logs.

simonjayhawkins · 2021-08-11T10:14:51Z

doc/source/whatsnew/v1.3.2.rst

@@ -23,6 +23,7 @@ Fixed regressions
 - Fixed regression where :meth:`pandas.read_csv` raised a ``ValueError`` when parameters ``names`` and ``prefix`` were both set to None (:issue:`42387`)
 - Fixed regression in comparisons between :class:`Timestamp` object and ``datetime64`` objects outside the implementation bounds for nanosecond ``datetime64`` (:issue:`42794`)
 - Fixed regression in :meth:`.Styler.highlight_min` and :meth:`.Styler.highlight_max` where ``pandas.NA`` was not successfully ignored (:issue:`42650`)
+- Regression in :meth:`Series.nlargest` and :meth:`Series.nsmallest` with nullable integer or float dtype (:issue:`41816`)


will need to change to #42816. will do that after backport is merged.

done in #42983

… arrays) (#42975) * Backport PR #42838: REGR: Series.nlargest with masked arrays * fix final import Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>

REGR: Series.nlargest with masked arrays

0c1ab8f

jbrockmendel added the "NA - MaskedArrays" (Related to pd.NA and nullable extension arrays) and "Regression" (Functionality that used to work in a prior pandas version) labels Aug 1, 2021

simonjayhawkins added this to the 1.3.2 milestone Aug 2, 2021

simonjayhawkins reviewed Aug 2, 2021

doc/source/whatsnew/v1.3.2.rst (outdated)

simonjayhawkins reviewed Aug 2, 2021

pandas/tests/series/methods/test_nlargest.py (outdated)

jbrockmendel and others added 4 commits August 2, 2021 06:54

Update doc/source/whatsnew/v1.3.2.rst

183dacc

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>

Merge branch 'master' into regr-42816

8a62160

update test with case including NA

7340098

Merge branch 'regr-42816' of github.com:jbrockmendel/pandas into regr…

6c64ba0

…-42816

jreback requested changes Aug 3, 2021

jbrockmendel added 2 commits August 3, 2021 16:35

Merge branch 'master' into regr-42816

1975cb2

Merge remote-tracking branch 'upstream/master' into regr-42816

8180d39

jbrockmendel mentioned this pull request Aug 9, 2021

BUG: Groupby min/max with nullable dtypes #42567

Merged

4 tasks

Merge branch 'master' into regr-42816

03dea8d

jreback approved these changes Aug 10, 2021

jreback merged commit 61cbb73 into pandas-dev:master Aug 10, 2021

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 10, 2021

Backport PR pandas-dev#42838: REGR: Series.nlargest with masked arrays

f1b60ad

meeseeksmachine mentioned this pull request Aug 10, 2021

Backport PR #42838 on branch 1.3.x (REGR: Series.nlargest with masked arrays) #42975

Merged

jbrockmendel deleted the regr-42816 branch August 11, 2021 05:18

simonjayhawkins reviewed Aug 11, 2021

feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021

REGR: Series.nlargest with masked arrays (pandas-dev#42838)

9fb581d

jreback Aug 3, 2021

jbrockmendel Aug 3, 2021

jreback Aug 4, 2021

jbrockmendel Aug 4, 2021

jreback Aug 5, 2021

jbrockmendel Aug 5, 2021

jreback Aug 5, 2021

jbrockmendel Aug 5, 2021

jbrockmendel Aug 5, 2021
