REGR: Series.__mod__ behaves different with numexpr #36552

simonjayhawkins · 2020-09-22T19:17:36Z

closes BUG: Series.__mod__ behaves different with >1e4 rows #36047
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

simonjayhawkins · 2020-09-22T19:18:54Z

Will add tests if reverting the change in #30147 causes no new test failures; see #30147 (comment)

simonjayhawkins · 2020-09-22T19:27:02Z

pandas/core/ops/methods.py

@@ -171,8 +171,6 @@ def _create_methods(cls, arith_method, comp_method, bool_method, special):
        mul=arith_method(cls, operator.mul, special),
        truediv=arith_method(cls, operator.truediv, special),
        floordiv=arith_method(cls, operator.floordiv, special),
-        # Causes a floating point exception in the tests when numexpr enabled,
-        # so for now no speedup


This message should have been moved in #19649

I think Ok if we have not had reports of floating point exceptions since enabled.

pandas/core/computation/expressions.py

jorisvandenbossche · 2020-09-23T12:14:03Z

@simonjayhawkins example with rmod:

In [41]: pd.Series([-5] * 10001) % 24  
Out[41]: 
0       -5
1       -5
2       -5
3       -5
4       -5
        ..
9996    -5
9997    -5
9998    -5
9999    -5
10000   -5
Length: 10001, dtype: int64

In [42]: -5 % pd.Series([24] * 10001) 
Out[42]: 
0       -5
1       -5
2       -5
3       -5
4       -5
        ..
9996    -5
9997    -5
9998    -5
9999    -5
10000   -5
Length: 10001, dtype: int64

But I'm not sure how this also regressed compared to 1.0, as we already had rmod: "%" before as well (maybe rmod ends up calling mod in the end?)
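For reference, the two outputs above show -5 where Python's own `%` gives 19. A minimal sketch of the difference between a floored remainder (Python semantics) and a truncated, C-style remainder, using only the standard library. The helper names `floored_mod` and `truncated_mod` are hypothetical, and this only illustrates the two conventions; it does not claim to show how numexpr computes `%` internally.

```python
import math


def floored_mod(a, b):
    """Remainder with the sign of the divisor (what Python's % does)."""
    return a - b * math.floor(a / b)


def truncated_mod(a, b):
    """Remainder with the sign of the dividend (C-style)."""
    return a - b * math.trunc(a / b)


for a, b in [(-5, 24), (5, 24)]:
    # The built-in % agrees with floored_mod; math.fmod agrees with truncated_mod.
    print(a, b, a % b, floored_mod(a, b), truncated_mod(a, b), math.fmod(a, b))
```

For positive operands the two conventions agree, which is why the wrong result only shows up with negative values.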

jorisvandenbossche · 2020-09-23T12:14:54Z

(sorry, see now you only asked about (r)floordiv)

jorisvandenbossche

Can you add some tests for those cases?

simonjayhawkins · 2020-09-23T12:45:02Z

But I'm not sure how this also regressed compared to 1.0, as we already had rmod: "%" before as well (maybe rmod ends up calling mod in the end?)

rmod was already failing, so not a regression. I couldn't find any issues, so it is probably not used much, considering we have had 2 reports on the mod issue in a short space of time.

jorisvandenbossche · 2020-09-23T12:54:02Z

I don't see any failure here:

In [4]: pd.__version__
Out[4]: '1.0.3'

In [5]: pd.Series([-5] * 10001) % 24
Out[5]: 
0        19
1        19
2        19
3        19
4        19
         ..
9996     19
9997     19
9998     19
9999     19
10000    19
Length: 10001, dtype: int64

In [6]: -5 % pd.Series([24] * 10001)
Out[6]: 
0        19
1        19
2        19
3        19
4        19
         ..
9996     19
9997     19
9998     19
9999     19
10000    19
Length: 10001, dtype: int64

and both are failing on master

simonjayhawkins · 2020-09-23T13:37:12Z

for regressions, i check against 1.0.5.

>>> pd.__version__
'1.0.5'
>>> -5 % pd.Series([24] * 10001)
0       -5
1       -5
2       -5
3       -5
4       -5
        ..
9996    -5
9997    -5
9998    -5
9999    -5
10000   -5
Length: 10001, dtype: int64
>>>

i'll try to find the cause of that regression. it was indeed working on 0.25.3 also.

jorisvandenbossche · 2020-09-23T13:55:14Z

Whoops, I simply did not have numexpr installed in my 1.0 environment ..

simonjayhawkins · 2020-09-23T14:26:18Z

Whoops, I simply did not have numexpr installed in my 1.0 environment ..

And I didn't in my 0.25.3! 😄

jorisvandenbossche · 2020-09-23T14:37:59Z

Optional dependencies used in core operations ... :-)

Anyway, rmod is less used, so most important is fixing mod itself, which is undoubtedly a regression
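Since the results in this thread depended on whether the optional numexpr dependency was present in each environment, a quick check before comparing versions can avoid the confusion. A minimal sketch using only the standard library; `has_numexpr` is a hypothetical helper name, not a pandas function.

```python
import importlib.util


def has_numexpr():
    """Return True if the optional numexpr package can be imported here."""
    return importlib.util.find_spec("numexpr") is not None


print("numexpr installed:", has_numexpr())
```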

simonjayhawkins · 2020-09-23T14:46:58Z

Anyway, rmod is less used, so most important is fixing mod itself, which is undoubtedly a regression

my plan for this PR was simply to revert the change that caused the regression.

I have not yet looked, but there may be special handling, which is why floordiv and rfloordiv are not affected.

I will add tests for all 4 ops and xfail the rmod one and can then look to a proper fix (which could also be backportable of course)

jorisvandenbossche · 2020-09-23T14:49:18Z

Sounds good!

simonjayhawkins · 2020-09-23T18:54:41Z

unrelated failure (also on #34918)

jorisvandenbossche · 2020-09-23T19:06:34Z

pandas/tests/test_expressions.py

+    @pytest.mark.parametrize("scalar", [-5, 5])
+    def test_python_semantics_with_numexpr_installed(self, op, box, tester, scalar):
+        # https://github.com/pandas-dev/pandas/issues/36047
+        expr._MIN_ELEMENTS = 0


I see it is done like that in other tests as well (so doesn't need to be fixed here), but that doesn't seem a very clean way to patch this, as it will influence other tests as well

this is a class with a teardown_method that resets _MIN_ELEMENTS

but yes we should replace with a fixture.

Ah, didn't see the teardown/setup, only the import at the top ..
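One way the "replace with a fixture" idea could look is a context manager that overrides the attribute and always restores it, so the change cannot leak into other tests. A minimal sketch under stated assumptions: `temporary_attr` is a hypothetical helper, and `expr` here is a stand-in namespace rather than the real `pandas.core.computation.expressions` module; the value 10000 is only a placeholder for whatever the module's default is.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the block and restore it afterwards."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the body raises, so later tests see the original value.
        setattr(obj, name, original)


# Stand-in for the expressions module (assumption, placeholder default).
expr = SimpleNamespace(_MIN_ELEMENTS=10000)

with temporary_attr(expr, "_MIN_ELEMENTS", 0):
    inside = expr._MIN_ELEMENTS  # 0 while the block is active

after = expr._MIN_ELEMENTS  # back to the placeholder default
print(inside, after)
```

In a pytest suite, the built-in `monkeypatch` fixture offers the same set-and-restore behaviour via `monkeypatch.setattr`.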

jorisvandenbossche · 2020-09-23T19:07:36Z

pandas/tests/test_expressions.py

+        expr.set_use_numexpr(False)
+        expected = method(scalar)
+        expr.set_use_numexpr(True)
+        tester(result, expected)


Can you also test against a manually constructed expected result?
(or do you know if we have elsewhere a test that covers this behaviour with negative numbers for modulo ?)

i'll look through the tests again to see what we already have.

the method i used here is inspired by run_arithmetic, run_binary and also run_frame

I kept the object sizes small here so I could do an elementwise comparison similar to the code in #36552 (comment). wdyt?

It's certainly good to test the equivalency of numpy vs numexpr behaviour. But here we discovered a specific case where they are not equal, and I think it is useful to also (in addition, can also be a separate test) test this corner case of negative values with modulo.
We might change the implementation in the future (eg assume we would take on numexpr as required dependency and always use it), and then we wouldn't necessarily catch this case.

The object size shouldn't matter since you did expr._MIN_ELEMENTS = 0? (so it's fine to use a small example for which it is easy to manually create the expected result?)
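A sketch of what a manually constructed expected result could look like for the four ops discussed here, written against plain Python integers so it does not depend on which engine evaluates the expression. The `CASES` table, `OPS` mapping and `check_cases` helper are hypothetical names for illustration; in the pandas test suite the element would be wrapped in a Series or DataFrame, which is not shown.

```python
# (op name, element, scalar, hand-written expected value)
CASES = [
    ("mod", -5, 24, 19),       # Series([-5]) % 24
    ("rmod", 24, -5, 19),      # -5 % Series([24])
    ("floordiv", -5, 24, -1),  # Series([-5]) // 24
    ("rfloordiv", 24, -5, -1), # -5 // Series([24])
    ("mod", 5, 24, 5),
    ("rmod", 24, 5, 5),
    ("floordiv", 5, 24, 0),
    ("rfloordiv", 24, 5, 0),
]

# Each entry mimics Series.<name>(scalar) for a single element.
OPS = {
    "mod": lambda element, scalar: element % scalar,
    "rmod": lambda element, scalar: scalar % element,
    "floordiv": lambda element, scalar: element // scalar,
    "rfloordiv": lambda element, scalar: scalar // element,
}


def check_cases(cases=CASES):
    """Compare each op against its hand-written expectation."""
    for name, element, scalar, expected in cases:
        result = OPS[name](element, scalar)
        assert result == expected, (name, element, scalar, result)
    return len(cases)


print(check_cases())
```

Hard-coding the expected values means the check would still catch the negative-modulo case even if the reference implementation used for comparison changed.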

The object size shouldn't matter

I meant iterating through in Python was not a resource issue.

I have added an element-wise comparison but tests are failing at the moment

>>> pd.__version__
'1.2.0.dev0+499.g3b12293cea'
>>>
>>> 5 // 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>>
>>> 5 // np.array([0, 0])[0]
<stdin>:1: RuntimeWarning: divide by zero encountered in long_scalars
0
>>>
>>> 5 // np.array(0)
<stdin>:1: RuntimeWarning: divide by zero encountered in floor_divide
0
>>>
>>> 5 // np.array([-1, 0, 1])
array([-5, 0, 5], dtype=int32)
>>>
>>> 5 // pd.Series([-1, 0, 1])
0   -5.0
1    inf
2    5.0
dtype: float64
>>>
>>> -5 // np.array([-1, 0, 1])
array([ 5, 0, -5], dtype=int32)
>>>
>>> -5 // pd.Series([-1, 0, 1])
0    5.0
1   -inf
2   -5.0
dtype: float64
>>>

fixed test. this discrepancy that has surfaced is not related to the regression issue, will investigate further.

I think pandas is doing the right thing here? other than returning a non-integer result for floordiv. maybe should return IntegerArray in future

>>> 5 / np.array([-1, 0, 1])
<stdin>:1: RuntimeWarning: divide by zero encountered in true_divide
array([-5., inf, 5.])
>>>
>>> 5 / pd.Series([-1, 0, 1])
0   -5.0
1    inf
2    5.0
dtype: float64
>>>
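To make the three behaviours above easier to compare, here is a minimal pure-Python sketch that reproduces the pandas floordiv outputs shown in this comment (signed infinity when dividing a non-zero value by zero). `floordiv_inf_on_zero` is a hypothetical helper, not pandas code, and the NaN result for 0 // 0 is an assumption that is not shown in this thread.

```python
import math


def floordiv_inf_on_zero(numerator, divisor):
    """Floor-divide, returning a signed inf instead of raising on zero."""
    if divisor != 0:
        return float(numerator // divisor)
    if numerator == 0:
        return math.nan  # assumption: 0 // 0 is treated as undefined
    # The sign of the result follows the sign of the numerator.
    return math.copysign(math.inf, numerator)


print([floordiv_inf_on_zero(5, d) for d in (-1, 0, 1)])
print([floordiv_inf_on_zero(-5, d) for d in (-1, 0, 1)])
```

This matches the true-division outputs as well, which is why an element-wise comparison against plain Python `//` fails on the zero divisor even though it is unrelated to the modulo regression.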

Can you also test against a manually constructed expected result?

still outstanding

jreback · 2020-09-26T01:28:23Z

this looks ready @simonjayhawkins (notwithstanding your comment above, which looks separate). not worried about the single failure, which is old on travis.

simonjayhawkins · 2020-09-28T10:12:18Z

Not quite ready. review request #36552 (comment) remains outstanding.

I can't block by requesting changes on my own PR, moving to 1.1.4 so release note needs to be moved. closing for now instead. Will re-open after 1.1.3 release.

I've started the final release readiness checks. merging now would require going back to square one.

jorisvandenbossche · 2020-09-28T10:17:23Z

@simonjayhawkins since this is basically ready, and fixing a serious regression (silently wrong result), I would still include it if possible.

I think the only outstanding request was to have a more manual test checking correctness? That can certainly be left for a follow-up PR and shouldn't block 1.1.3 (we also didn't have such test now, so I mainly see this as a nice-to-have)

simonjayhawkins · 2020-09-28T10:25:54Z

Yeah, there have been a couple of reports about this, so getting this in would be good. (I closed this since we often never seem to get round to nice-to-haves!) Reopening to check CI status.

jreback

no objection to merging for 1.1.3 (minor comment but nbd)

jreback · 2020-09-30T12:04:53Z

pandas/tests/test_expressions.py

+        "box, tester",
+        [
+            (DataFrame, tm.assert_frame_equal),
+            (Series, tm.assert_series_equal),


we don't need to usually add the tester as assert_equal handles this

good point. a bit belt and braces. will change so we re-run ci b4 merge.

jreback · 2020-09-30T18:38:05Z

lgtm merge when ready (ignore the arm failure)

simonjayhawkins · 2020-09-30T20:28:22Z

@meeseeksdev backport 1.1.x

REGR: Series.__mod__ behaves different with numexpr

41bb091

simonjayhawkins added Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version labels Sep 22, 2020

simonjayhawkins added this to the 1.1.3 milestone Sep 22, 2020

jbrockmendel reviewed Sep 22, 2020

pandas/core/computation/expressions.py

jorisvandenbossche reviewed Sep 23, 2020

simonjayhawkins added 3 commits September 23, 2020 15:59

Merge remote-tracking branch 'upstream/master' into numexpr-regr

14e8b92

add test

1e2a086

release note

4527d23

simonjayhawkins marked this pull request as ready for review September 23, 2020 18:54

jorisvandenbossche reviewed Sep 23, 2020

simonjayhawkins added 3 commits September 25, 2020 13:00

Merge remote-tracking branch 'upstream/master' into numexpr-regr

ccc7eb6

compare result element-wise with Python

85db057

fix test

ff9e85b

simonjayhawkins closed this Sep 28, 2020

simonjayhawkins reopened this Sep 28, 2020

simonjayhawkins mentioned this pull request Sep 28, 2020

RLS: 1.1.3 #36186

Closed

jreback approved these changes Sep 30, 2020

simonjayhawkins added 2 commits September 30, 2020 18:54

Merge remote-tracking branch 'upstream/master' into numexpr-regr

b904d44

use tm.assert_equal

4f9b91d

jreback approved these changes Sep 30, 2020

simonjayhawkins merged commit e659627 into pandas-dev:master Sep 30, 2020

meeseeksmachine mentioned this pull request Sep 30, 2020

Backport PR #36552 on branch 1.1.x (REGR: Series.__mod__ behaves different with numexpr) #36750

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Sep 30, 2020

Backport PR pandas-dev#36552: REGR: Series.__mod__ behaves different …

33263f4

…with numexpr

simonjayhawkins deleted the numexpr-regr branch September 30, 2020 20:29

simonjayhawkins added a commit that referenced this pull request Oct 1, 2020

Backport PR #36552: REGR: Series.__mod__ behaves different with numex…

00ae553

…pr (#36750) Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

REGR: Series.__mod__ behaves different with numexpr (pandas-dev#36552)

50229e9
