BUG: IntegerArray/FloatingArray constructors mismatched NAs #44514

jbrockmendel · 2021-11-18T16:51:09Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Same yak-shaving as #44495 (which turned out to be a dead end for this particular yak, but still a perf bump)

pandas/_libs/missing.pyx

jorisvandenbossche

The setitem change / bug fix is unrelated to the constructor fix? Or it's because you are testing that through setitem as well?

jorisvandenbossche · 2021-11-21T21:10:53Z

pandas/core/arrays/floating.py

+        mask2 = isna(values)
+        if not (mask == mask2).all():
+            # e.g. if we have a timedelta64("NaT")
+            raise TypeError(f"{values.dtype} cannot be converted to a FloatingDtype")


Alternatively, could it be libmissing.is_numeric_na that already raises on encountering a "non-numeric NA"? (is there a use case for is_numeric_na to not be strict about this, i.e. to get a "partial" mask?)

Were you planning to address this one?

planning to address in a follow-up
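
The stricter behaviour suggested in this thread (raise on a non-numeric NA instead of returning a partial mask) can be sketched in plain Python. This is only an illustration, not the Cython code in pandas/_libs/missing.pyx: the helper name strict_numeric_na_mask is hypothetical, and the set of values treated as numeric NAs (None, float NaN, pd.NA) is an assumption.

```python
import numpy as np
import pandas as pd


def strict_numeric_na_mask(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mask of numeric NAs in a 1-D object array.

    Raises TypeError on NA-like values that are not numeric NAs,
    instead of silently leaving them unmasked.
    """
    mask = np.zeros(len(values), dtype=bool)
    for i, val in enumerate(values):
        is_float_nan = isinstance(val, float) and np.isnan(val)
        if val is None or val is pd.NA or is_float_nan:
            # Assumed set of "numeric" NAs for this sketch
            mask[i] = True
        elif pd.isna(val):
            # NA-like but not numeric, e.g. pd.NaT or np.timedelta64("NaT")
            raise TypeError(f"{type(val).__name__} is not a numeric NA")
    return mask


good = np.array([1, None, np.nan, pd.NA], dtype=object)
print(strict_numeric_na_mask(good))

bad = np.array([1, np.timedelta64("NaT")], dtype=object)
try:
    strict_numeric_na_mask(bad)
except TypeError as err:
    print("rejected:", err)
```

With a check like this inside the masking function, the constructor would not need to compute a second mask with isna and compare the two.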

pandas/core/arrays/integer.py

jbrockmendel · 2021-11-21T21:19:33Z

The setitem change / bug fix is unrelated to the constructor fix? Or it's because you are testing that through setitem as well?

The setitem bug was identified first and the cause tracked back to the constructor.

jreback · 2021-11-21T23:40:17Z

ok to merge; @jorisvandenbossche's comments could be a follow-up (or here is ok too)

jbrockmendel · 2021-11-21T23:53:18Z

Let’s do it as a follow-up; these constructors will need plenty more work.


jorisvandenbossche · 2021-11-22T07:19:10Z

pandas/tests/extension/base/setitem.py

@@ -357,6 +364,31 @@ def test_setitem_series(self, data, full_indexer):
        )
        self.assert_series_equal(result, expected)

+    def test_setitem_frame_2d_values(self, data, using_array_manager, request):


Can you move this test out of the extension base tests, or remove the need to use using_array_manager? (this is not defined by external users of those tests, and would be a bit annoying to replicate)

pandas/tests/frame/indexing/test_indexing.py

jbrockmendel · 2021-11-24T18:43:24Z

gentle ping; this is a blocker for fixing a bug in Series.where, which in turn should allow us to share some more Block methods.

jorisvandenbossche · 2021-11-25T21:46:00Z

pandas/tests/extension/base/setitem.py

+        df = pd.DataFrame({"A": data})
+
+        # Avoiding using_array_manager fixture
+        #  https://github.com/pandas-dev/pandas/pull/44514#discussion_r754002410


Thanks for changing to not use the fixture. But generally, is this actually needed to have as base extension test? (since the fix was inside the Blocks code, it's not really testing a specific behaviour that the EA needs to have?)

is this actually needed to have as base extension test?

This seems like the best way to systematically test it for all EA dtypes.

it's not really testing a specific behaviour that the EA needs to have?

That seems like it applies to most tests that aren't directly testing EA methods.

jorisvandenbossche · 2021-11-25T21:47:09Z

Two questions, but feel free to merge as well

jbrockmendel · 2021-11-27T22:04:01Z

updated to raise inside is_numeric_na

jreback · 2021-11-28T01:26:13Z

@jorisvandenbossche if you want to look or can merge

jbrockmendel · 2021-11-30T00:00:05Z

i think comments have all been addressed here? bugfix follow-up is ready.

jorisvandenbossche · 2021-12-06T07:46:27Z

This seems to give a big slowdown in some benchmarks (eg 10x in https://pandas.pydata.org/speed/pandas/#groupby.Cumulative.time_frame_transform?python=3.8&Cython=0.29.24&p-dtype='Float64'&p-method='cumsum'&commits=f5107e41-12afff15). Can you take a look?

jbrockmendel · 2021-12-06T20:11:28Z

Yah, looks like a lot of time is being taken in is_numeric_na.

from asv_bench.benchmarks.groupby import *
self = Cumulative()
self.setup("Float64", "cumsum")

%timeit self.time_frame_transform("Float64", "cumsum")

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.791    1.791 {built-in method builtins.exec}
        1    0.000    0.000    1.791    1.791 <string>:1(<module>)
       10    0.014    0.001    1.791    0.179 groupby.py:569(time_frame_transform)
       10    0.000    0.000    1.773    0.177 generic.py:1179(transform)
       10    0.000    0.000    1.773    0.177 groupby.py:1608(_transform)
       10    0.000    0.000    1.773    0.177 groupby.py:3117(cumsum)
       10    0.000    0.000    1.773    0.177 generic.py:1103(_cython_transform)
       10    0.000    0.000    1.762    0.176 managers.py:1266(grouped_reduce)
       50    0.000    0.000    1.762    0.035 blocks.py:381(apply)
       50    0.000    0.000    1.758    0.035 generic.py:1118(arr_func)
       50    0.000    0.000    1.758    0.035 ops.py:919(_cython_operation)
       50    0.000    0.000    1.699    0.034 ops.py:587(cython_operation)
       50    0.001    0.000    1.697    0.034 ops.py:319(_ea_wrap_cython_operation)
       50    0.000    0.000    1.559    0.031 ops.py:376(_reconstruct_ea_result)
       50    0.000    0.000    1.558    0.031 floating.py:263(_from_sequence)
       50    0.001    0.000    1.557    0.031 floating.py:85(coerce_to_array)
       50    1.551    0.031    1.551    0.031 {pandas._libs.missing.is_numeric_na}
       50    0.000    0.000    0.134    0.003 ops.py:434(_cython_op_ndim_compat)

Looks like we are calling is_numeric_na in cases where we have float64 dtype, so we can just use np.isnan there. Should be an easy patch.
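
The fast path described here can be sketched as a dtype check in front of the per-element routine. This is a sketch under assumptions, not the actual patch: na_mask_with_fast_path and elementwise_mask are hypothetical names, and elementwise_mask is only a stand-in for a slower per-element check such as is_numeric_na (it does not handle pd.NA).

```python
import numpy as np


def elementwise_mask(values: np.ndarray) -> np.ndarray:
    # Stand-in for a per-element check; NaN is the only value with v != v
    return np.array([v is None or v != v for v in values], dtype=bool)


def na_mask_with_fast_path(values: np.ndarray, slow_path=elementwise_mask) -> np.ndarray:
    """Hypothetical dispatch: vectorized np.isnan for float dtypes."""
    if values.dtype.kind == "f":
        # Float arrays can only hold NaN as a missing value,
        # so no Python-level loop over the elements is needed
        return np.isnan(values)
    return slow_path(values)


floats = np.array([1.0, np.nan, 3.0])
objects = np.array([1, None, float("nan")], dtype=object)
print(na_mask_with_fast_path(floats))
print(na_mask_with_fast_path(objects))
```

The float branch never reaches the per-element function, which is where the profile above shows nearly all of the time going.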

jbrockmendel added 3 commits November 18, 2021 08:42

BUG: IntegerArray/FloatingArray constructors mismatched NAs

ac09146

Whatsnew, GH ref

1166725

mypy fixup

21b6977

jbrockmendel mentioned this pull request Nov 18, 2021

REF: implement _maybe_squeeze_arg #44520

Merged

4 tasks

Merge branch 'master' into bug-nullable-construction

742d321

jreback requested changes Nov 20, 2021

jreback added the ExtensionArray, Missing-data, and NA - MaskedArrays labels Nov 20, 2021

jbrockmendel added 6 commits November 20, 2021 11:14

Merge branch 'master' into bug-nullable-construction

020c4e2

xfail on old numpy

a4d89ce

xfail ArrayManager

d322af3

update tested expception message for py310

67d615d

Merge branch 'master' into bug-nullable-construction

6350d8e

xfail on later numpy

117aef7

jorisvandenbossche requested changes Nov 21, 2021

jreback added this to the 1.4 milestone Nov 21, 2021

jreback approved these changes Nov 21, 2021

jorisvandenbossche requested changes Nov 22, 2021

jbrockmendel added 3 commits November 22, 2021 07:19

Merge branch 'master' into bug-nullable-construction

0f17b88

use decorator

2a2f8d2

Merge branch 'master' into bug-nullable-construction

4be676d

Merge branch 'master' into bug-nullable-construction

9cd5047

jorisvandenbossche reviewed Nov 25, 2021

Merge branch 'master' into bug-nullable-construction

0d77cea

jbrockmendel added 2 commits November 26, 2021 19:06

raise in is_numeric_na

48a4531

fixup unused import

745d24f

This was referenced Nov 30, 2021

REF: hold PeriodArray in NDArrayBackedExtensionBlock #44681

Merged

BUG: Series.where with incompatible NA value #44697

Merged

BUG: Can't use iloc to set a subset of a dataframe to a two-dimensional categorical data array. #44703

Closed

jreback merged commit 6b84ee7 into pandas-dev:master Dec 1, 2021

jbrockmendel deleted the bug-nullable-construction branch December 2, 2021 00:26
