CI/TST: Don't require length for construct_1d_arraylike_from_scalar cast to float64 #47393

mroeschke · 2022-06-16T21:54:07Z

closes CI: nighlty numpy broke ci #47391
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.


phofl · 2022-06-17T06:22:43Z

Most of the remaining ones look like things we want to change anyway? Saw one test that was not supposed to raise a FutureWarning

mroeschke · 2022-06-17T17:19:07Z

> Most of the remaining ones look like things we want to change anyway? Saw one test that was not supposed to raise a FutureWarning

Correct, the numpy RuntimeWarnings align with our 1.4 deprecation of converting np.nan to i8 dtype: #45136

The additional length change should also be backwards compatible, so I think these changes can be backported to 1.4.3 and be compatible with numpy 1.24
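For readers following along, here is a minimal sketch of the errstate approach that the "Add errstate" commits in this PR refer to. The helper name cast_floats_to_int64_quietly is hypothetical and is not pandas code. It assumes that newer numpy reports a nan -> int cast through the floating-point "invalid" error state, which np.errstate can silence. The integer that nan turns into is not something to rely on.

```python
import numpy as np


def cast_floats_to_int64_quietly(values):
    """Cast floats to int64 without surfacing numpy's invalid-cast warning.

    Hypothetical helper for illustration only; not part of pandas.
    """
    arr = np.asarray(values, dtype=np.float64)
    # Silence the "invalid value" floating-point error state for the cast.
    # Positions that held nan contain an unspecified integer afterwards.
    with np.errstate(invalid="ignore"):
        return arr.astype(np.int64)


result = cast_floats_to_int64_quietly([np.nan, 3.0])
print(result.dtype, result[1])
```

Only the positions that did not hold nan carry a meaningful value after the cast.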

jbrockmendel · 2022-06-17T17:52:53Z

pandas/core/dtypes/cast.py

+        if is_integer_dtype(dtype) and isna(value):
+            if not length:
+                # GH 47391: numpy > 1.24 will raise filling np.nan into int dtypes
+                return np.array([], dtype=dtype)


this seems like a weird case. how does it happen? is it clear that we'd want to prioritize the dtype as being "right" instead of the value?

I posted an example of the state that's reached in numpy/numpy#21784

Namely, before this change, length=0 would pass all the if checks and reach np.empty(0, dtype=integer).fill(np.nan), which in numpy < 1.24 would just leave np.array([], dtype=integer) but in numpy >= 1.24 will raise.

I think the question (also for me from NumPy!) is which pandas code runs into this path. For pandas, the question is what the actual result should be (in the future). For me the question is how bad it will be if that code path breaks. Because especially if it is bad, we may want to make sure it doesn't break yet (from within NumPy).

(At this point I suspect that at least the NaN case may need a work-around in NumPy as well.)

So an example test where this is hit is

    s = Series([], index=pd.date_range(start="2018-01-01", periods=0), dtype=int)
    result = s.apply(lambda x: x)
    tm.assert_series_equal(result, s)

So I think that, when operating on these empty Series/DataFrames, the value is represented as np.nan by the time construct_1d_arraylike_from_scalar(np.nan, length=0, dtype=dtype) is called.

construct_1d_arraylike_from_scalar has an if length and is_integer_dtype(dtype) and isna(value) condition, so for these empty Series/DataFrames integer dtypes were not coerced to float64 (because we relied on np.empty(0, dtype=integer).fill(np.nan) leaving np.array([], dtype=integer)).

So for these empty-like cases in pandas, it appears pandas was relying on np.empty(0, dtype=integer).fill(np.nan) to preserve integer dtypes. Having pandas explicitly preserve the dtype for these empty cases, e.g. np.array([], dtype=integer), is an okay change to make on our end IMO.

sgtm. (but why not keep this bit unchanged and just put the subarr.fill(value) on L1715 inside an if length: to skip for all zero-length arrays?)

Ah sure I can make the change there instead
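The suggestion above (skip the fill for all zero-length arrays) can be sketched roughly as follows. This is a simplified stand-in rather than the actual pandas implementation: the function name scalar_to_1d is hypothetical, and the real construct_1d_arraylike_from_scalar handles many more dtypes.

```python
import numpy as np


def scalar_to_1d(value, length, dtype):
    """Build a 1-D array of `length` copies of `value` with the given dtype.

    Hypothetical, simplified illustration of the `if length:` guard.
    """
    subarr = np.empty(length, dtype=dtype)
    if length:
        # Only fill when there is something to fill, so an empty integer
        # array is never asked to store np.nan.
        subarr.fill(value)
    return subarr


empty_ints = scalar_to_1d(np.nan, 0, np.int64)
print(empty_ints.dtype, empty_ints.shape)
```

With the guard in place, the zero-length case does not depend on how a given numpy version treats filling nan into an integer array.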

jreback · 2022-06-18T04:15:12Z

 =================================== FAILURES ===================================
______________ TestMergeDtypes.test_merge_on_ints_floats_warning _______________
[gw1] linux -- Python 3.10.4 /usr/share/miniconda/envs/test/bin/python
self = <pandas.tests.reshape.merge.test_merge.TestMergeDtypes object at 0x7f2644979420>
    def test_merge_on_ints_floats_warning(self):
        # GH 16572
        # merge will produce a warning when merging on int and
        # float columns where the float values are not exactly
        # equal to their int representation
        A = DataFrame({"X": [1, 2, 3]})
        B = DataFrame({"Y": [1.1, 2.5, 3.0]})
        expected = DataFrame({"X": [3], "Y": [3.0]})
        with tm.assert_produces_warning(UserWarning):
            result = A.merge(B, left_on="X", right_on="Y")
            tm.assert_frame_equal(result, expected)
        with tm.assert_produces_warning(UserWarning):
            result = B.merge(A, left_on="Y", right_on="X")
            tm.assert_frame_equal(result, expected[["Y", "X"]])
        # test no warning if float has NaNs
        B = DataFrame({"Y": [np.nan, np.nan, 3.0]})
>       with tm.assert_produces_warning(None):

this is failing

jreback · 2022-06-19T00:38:27Z

pandas/core/reshape/merge.py

-                            "are not equal to their int representation.",
-                            UserWarning,
-                        )
+                    # GH 47391 numpy > 1.24 will raise a RuntimeError for nan -> int


for 1.5 we ought to actually remove the nans first

So in 1.5, add a deprecation noting that nans will be dropped?

no i mean i think u can remove the nans before comparing to avoid the warning (this is all internal anyhow)

Ah gotcha. Yeah can clean this for 1.5 in a separate PR
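A rough sketch of the "remove the nans before comparing" idea, written with plain numpy. The helper name floats_match_int_repr is hypothetical and this is not the merge.py code. It only illustrates that dropping nan first means the float -> int cast never sees a nan.

```python
import numpy as np


def floats_match_int_repr(values):
    """Return True if every non-nan float equals its integer representation.

    Hypothetical helper for illustration only; not part of pandas.
    """
    arr = np.asarray(values, dtype=np.float64)
    # Drop nan up front so the cast below never has to convert nan to int.
    non_nan = arr[~np.isnan(arr)]
    return bool((non_nan == non_nan.astype(np.int64)).all())


print(floats_match_int_repr([np.nan, np.nan, 3.0]))
print(floats_match_int_repr([1.1, 2.5, 3.0]))
```

The first call corresponds to the "no warning if float has NaNs" case in the failing test, and the second to the case where a UserWarning is expected.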

simonjayhawkins · 2022-06-21T18:04:42Z

pandas/core/dtypes/cast.py

@@ -1696,7 +1696,7 @@ def construct_1d_arraylike_from_scalar(

    else:

-        if length and is_integer_dtype(dtype) and isna(value):
+        if is_integer_dtype(dtype) and isna(value):


I think we still need the length check here, as this part of the code is the logic that determines the dtype

i.e. revert to original.
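To illustrate why the length check matters for dtype selection, here is a simplified sketch of the branch under discussion. The name choose_dtype and the nan check are stand-ins for illustration; the real code uses is_integer_dtype and isna.

```python
import numpy as np


def choose_dtype(value, length, dtype):
    """Pick the result dtype for broadcasting `value` to `length` elements.

    Hypothetical, simplified illustration of the original condition.
    """
    dtype = np.dtype(dtype)
    is_missing = isinstance(value, float) and np.isnan(value)
    if length and dtype.kind in "iu" and is_missing:
        # A non-empty integer array cannot hold nan, so fall back to float64.
        return np.dtype("float64")
    # Empty arrays (length == 0) keep the requested dtype.
    return dtype


print(choose_dtype(np.nan, 3, np.int64))
print(choose_dtype(np.nan, 0, np.int64))
```

Dropping the length part of the condition would make the empty case come back as float64 as well, which is why the review asks to revert to the original.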

simonjayhawkins · 2022-06-22T09:49:37Z

will merge later today if no objections

jreback

lgtm


simonjayhawkins · 2022-06-22T12:07:29Z

Thanks @mroeschke

…construct_1d_arraylike_from_scalar cast to float64) (#47460) * Backport PR #47393: CI/TST: Don't require length for construct_1d_arraylike_from_scalar cast to float64 Co-authored-by: Matthew Roeschke <emailformattr@gmail.com> Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>


mroeschke added 3 commits June 16, 2022 14:53

CI/TST: Don't require length for construct_1d_arraylike_from_scalar c…

ceffe6d

…ast to float64

Just short circuit

d324332

Add errstate

cdcf033

Move errstate

1a82a00

mroeschke added this to the 1.4.3 milestone Jun 17, 2022

mroeschke added the Compat pandas objects compatability with Numpy or Python functions label Jun 17, 2022

jbrockmendel reviewed Jun 17, 2022


mroeschke marked this pull request as ready for review June 17, 2022 20:18

mroeschke added 4 commits June 17, 2022 22:34

Merge remote-tracking branch 'upstream/main' into ci/fix/numpy-dev

e63d61e

Add errstate to merge

536786f

Fix typo

106449d

Merge remote-tracking branch 'upstream/main' into ci/fix/numpy-dev

2131095

jreback reviewed Jun 19, 2022


mroeschke added 2 commits June 21, 2022 10:31

Merge remote-tracking branch 'upstream/main' into ci/fix/numpy-dev

df267f9

Move length check

ffd1171

simonjayhawkins reviewed Jun 21, 2022


Add back length check

57d43c2

jreback approved these changes Jun 22, 2022


simonjayhawkins merged commit 2f3ac16 into pandas-dev:main Jun 22, 2022

meeseeksmachine mentioned this pull request Jun 22, 2022

Backport PR #47393 on branch 1.4.x (CI/TST: Don't require length for construct_1d_arraylike_from_scalar cast to float64) #47460

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jun 22, 2022

Backport PR pandas-dev#47393: CI/TST: Don't require length for constr…

a86fc33

…uct_1d_arraylike_from_scalar cast to float64

mroeschke deleted the ci/fix/numpy-dev branch June 22, 2022 16:47

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022

CI/TST: Don't require length for construct_1d_arraylike_from_scalar c…

09b676c

…ast to float64 (pandas-dev#47393)


