REGR: Series.__mod__ behaves different with numexpr #36552

Merged (9 commits) on Sep 30, 2020

Conversation

@simonjayhawkins
Member Author

simonjayhawkins commented Sep 22, 2020

Will add tests if reverting the change in #30147 produces no new test failures; see #30147 (comment).

@@ -171,8 +171,6 @@ def _create_methods(cls, arith_method, comp_method, bool_method, special):
mul=arith_method(cls, operator.mul, special),
truediv=arith_method(cls, operator.truediv, special),
floordiv=arith_method(cls, operator.floordiv, special),
# Causes a floating point exception in the tests when numexpr enabled,
# so for now no speedup
Member Author

This message should have been moved in #19649

I think this is OK if we have not had reports of floating point exceptions since it was enabled.

@simonjayhawkins simonjayhawkins added Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version labels Sep 22, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.3 milestone Sep 22, 2020
@jorisvandenbossche
Member

@simonjayhawkins example with rmod:

In [41]: pd.Series([-5] * 10001) % 24  
Out[41]: 
0       -5
1       -5
2       -5
3       -5
4       -5
        ..
9996    -5
9997    -5
9998    -5
9999    -5
10000   -5
Length: 10001, dtype: int64

In [42]: -5 % pd.Series([24] * 10001) 
Out[42]: 
0       -5
1       -5
2       -5
3       -5
4       -5
        ..
9996    -5
9997    -5
9998    -5
9999    -5
10000   -5
Length: 10001, dtype: int64

But I'm not sure how this also regressed compared to 1.0, since we already had rmod: "%" before as well (maybe rmod ends up calling mod in the end?)

@jorisvandenbossche
Member

(Sorry, I see now that you only asked about (r)floordiv.)

Member

@jorisvandenbossche jorisvandenbossche left a comment

Can you add some tests for those cases?

@simonjayhawkins
Member Author

But I'm not sure how this also regressed compared to 1.0, since we already had rmod: "%" before as well (maybe rmod ends up calling mod in the end?)

rmod was already failing, so it is not a regression. I couldn't find any issues about it, so it is probably not used much, considering we have had 2 reports on the mod issue in a short space of time.

@jorisvandenbossche
Member

I don't see any failure here:

In [4]: pd.__version__                                                                                                                                                                                             
Out[4]: '1.0.3'

In [5]: pd.Series([-5] * 10001) % 24                                                                                                                                                                               
Out[5]: 
0        19
1        19
2        19
3        19
4        19
         ..
9996     19
9997     19
9998     19
9999     19
10000    19
Length: 10001, dtype: int64

In [6]: -5 % pd.Series([24] * 10001)                                                                                                                                                                               
Out[6]: 
0        19
1        19
2        19
3        19
4        19
         ..
9996     19
9997     19
9998     19
9999     19
10000    19
Length: 10001, dtype: int64

and both are failing on master
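
For reference, the two sets of outputs in this thread (19 on 1.0.3 without numexpr, -5 on master) are what floored and truncated remainders give for these operands. Below is a minimal pure-Python sketch of the two conventions. The helper names are hypothetical, and the idea that the numexpr code path follows the truncated convention is only an inference from the outputs shown here, not something this thread confirms.

```python
import math


def floored_remainder(a: int, b: int) -> int:
    """Python-style remainder: the result takes the sign of the divisor."""
    return a % b


def truncated_remainder(a: int, b: int) -> int:
    """C-style remainder: the result takes the sign of the dividend."""
    return int(math.fmod(a, b))


print(floored_remainder(-5, 24))    # 19
print(truncated_remainder(-5, 24))  # -5
# For non-negative operands the two conventions agree.
print(floored_remainder(5, 24), truncated_remainder(5, 24))  # 5 5
```

The conventions only diverge when the operands have different signs, which would explain why a test using only positive values would not catch the discrepancy.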

@simonjayhawkins
Member Author

For regressions, I check against 1.0.5.

>>> pd.__version__
'1.0.5'
>>> -5 % pd.Series([24] * 10001)
0       -5
1       -5
2       -5
3       -5
4       -5
        ..
9996    -5
9997    -5
9998    -5
9999    -5
10000   -5
Length: 10001, dtype: int64
>>>

I'll try to find the cause of that regression. It was indeed working on 0.25.3 as well.

@jorisvandenbossche
Member

Whoops, I simply did not have numexpr installed in my 1.0 environment ..

@simonjayhawkins
Member Author

Whoops, I simply did not have numexpr installed in my 1.0 environment ..

And I didn't in my 0.25.3! 😄
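
Since both reproductions above hinged on whether numexpr was present in the environment, it can help to report that alongside the pandas version. A minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not a pandas function.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report the optional dependency next to the version being compared.
print("numexpr installed:", is_installed("numexpr"))
```

`pd.show_versions()` should give the same information as part of its longer listing of optional dependencies.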

@jorisvandenbossche
Member

Optional dependencies used in core operations ... :-)

Anyway, rmod is less used, so most important is fixing mod itself, which is undoubtedly a regression

@simonjayhawkins
Member Author

Anyway, rmod is less used, so most important is fixing mod itself, which is undoubtedly a regression

My plan for this PR was simply to revert the change that caused the regression.

I have not yet looked, but there may be special handling, which is why floordiv and rfloordiv are not affected.

I will add tests for all 4 ops and xfail the rmod one, and can then look at a proper fix (which could also be backportable, of course).

@jorisvandenbossche
Member

Sounds good!

@simonjayhawkins
Member Author

unrelated failure (also on #34918)

@simonjayhawkins simonjayhawkins marked this pull request as ready for review September 23, 2020 18:54
@pytest.mark.parametrize("scalar", [-5, 5])
def test_python_semantics_with_numexpr_installed(self, op, box, tester, scalar):
    # https://github.com/pandas-dev/pandas/issues/36047
    expr._MIN_ELEMENTS = 0
Member

I see it is done like that in other tests as well (so doesn't need to be fixed here), but that doesn't seem a very clean way to patch this, as it will influence other tests as well

Member Author

this is a class with a teardown_method that resets _MIN_ELEMENTS

Member Author

but yes we should replace with a fixture.

Member

Ah, didn't see the teardown/setup, only the import at the top ..
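
The concern in this thread is that assigning `expr._MIN_ELEMENTS = 0` directly leaks into other tests unless something restores it. One way to scope such a patch is a context manager that always restores the old value. This is a minimal stdlib-only sketch under stated assumptions: `temporary_attr` is a hypothetical helper, `fake_expr` is a stand-in object rather than the real pandas module, and the value 10000 is illustrative (suggested by the ">1e4 rows" issue title).

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value for the duration of the block, then restore it."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the body raises, so other tests see the original value.
        setattr(obj, name, old)


# Stand-in for the module being patched; NOT the real pandas module.
fake_expr = SimpleNamespace(_MIN_ELEMENTS=10000)

with temporary_attr(fake_expr, "_MIN_ELEMENTS", 0):
    print(fake_expr._MIN_ELEMENTS)  # 0

print(fake_expr._MIN_ELEMENTS)  # 10000
```

In a pytest suite, the built-in `monkeypatch` fixture's `setattr` provides the same restore-on-exit behaviour, which is one way the fixture mentioned above could be written.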

expr.set_use_numexpr(False)
expected = method(scalar)
expr.set_use_numexpr(True)
tester(result, expected)
Member

Can you also test against a manually constructed expected result?
(or do you know if we have elsewhere a test that covers this behaviour with negative numbers for modulo ?)

Member Author

I'll look through the tests again to see what we already have.

The method I used here is inspired by run_arithmetic, run_binary and also run_frame.

I kept the object sizes small here so I could do an elementwise comparison similar to the code in #36552 (comment). wdyt?

Member

It's certainly good to test the equivalency of numpy vs numexpr behaviour. But here we discovered a specific case where they are not equal, and I think it is useful to also (in addition, can also be a separate test) test this corner case of negative values with modulo.
We might change the implementation in the future (eg assume we would take on numexpr as required dependency and always use it), and then we wouldn't necessarily catch this case.

The object size shouldn't matter since you did expr._MIN_ELEMENTS = 0? (so it's fine to use a small example for which it is easy to manually create the expected result?)
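
A manually constructed expectation for this corner case could look like the following. This is only a sketch of the reference values under Python semantics, in plain Python so that it does not depend on which backend pandas uses; the variable names and the particular inputs are illustrative choices, not taken from the PR.

```python
# Hand-written expectations under Python (floored) semantics.
values = [-5, -1, 0, 1, 5]
divisor = 24
expected_mod = [19, 23, 0, 1, 5]
expected_floordiv = [-1, -1, 0, 0, 0]

assert [v % divisor for v in values] == expected_mod
assert [v // divisor for v in values] == expected_floordiv

# Reflected case: a negative scalar on the left-hand side.
scalar = -5
divisors = [24, 7, -7]
expected_rmod = [19, 2, -5]

assert [scalar % d for d in divisors] == expected_rmod
```

In a pandas test the left-hand sides would be replaced by the Series or DataFrame operation, and the hard-coded lists would be wrapped in the same container before comparing.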

Member Author

The object size shouldn't matter

I meant iterating through in Python was not a resource issue.

I have added an element-wise comparison but tests are failing at the moment

>>> pd.__version__
'1.2.0.dev0+499.g3b12293cea'
>>>
>>> 5 // 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>>
>>> 5 // np.array([0, 0])[0]
<stdin>:1: RuntimeWarning: divide by zero encountered in long_scalars
0
>>>
>>> 5 // np.array(0)
<stdin>:1: RuntimeWarning: divide by zero encountered in floor_divide
0
>>>
>>> 5 // np.array([-1, 0, 1])
array([-5,  0,  5], dtype=int32)
>>>
>>> 5 // pd.Series([-1, 0, 1])
0   -5.0
1    inf
2    5.0
dtype: float64
>>>
>>> -5 // np.array([-1, 0, 1])
array([ 5,  0, -5], dtype=int32)
>>>
>>> -5 // pd.Series([-1, 0, 1])
0    5.0
1   -inf
2   -5.0
dtype: float64
>>>
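
To build an expected result for the element-wise comparison, the pandas outputs above can be mimicked in plain Python. This is a sketch that reproduces only the behaviour visible in this transcript (a float result, with a signed infinity where the divisor is zero). `floordiv_scalar_by_ints` is a hypothetical helper, and a zero numerator is deliberately rejected because the transcript does not show that case.

```python
import math


def floordiv_scalar_by_ints(scalar: int, divisors: list) -> list:
    """Mimic the `scalar // Series` outputs shown above for a non-zero scalar."""
    if scalar == 0:
        raise ValueError("0 // 0 is not covered by the outputs in this thread")
    result = []
    for d in divisors:
        if d == 0:
            # Division by zero yields an infinity carrying the numerator's sign.
            result.append(math.copysign(math.inf, scalar))
        else:
            result.append(float(scalar // d))
    return result


print(floordiv_scalar_by_ints(5, [-1, 0, 1]))   # [-5.0, inf, 5.0]
print(floordiv_scalar_by_ints(-5, [-1, 0, 1]))  # [5.0, -inf, -5.0]
```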

Member Author

Fixed the test. The discrepancy that has surfaced is not related to the regression issue; I will investigate further.

Member Author

I think pandas is doing the right thing here? other than returning a non-integer result for floordiv. maybe should return IntegerArray in future

>>> 5 / np.array([-1, 0, 1])
<stdin>:1: RuntimeWarning: divide by zero encountered in true_divide
array([-5., inf,  5.])
>>>
>>> 5 / pd.Series([-1, 0, 1])
0   -5.0
1    inf
2    5.0
dtype: float64
>>>

Can you also test against a manually constructed expected result?

still outstanding

@jreback
Contributor

jreback commented Sep 26, 2020

this looks ready @simonjayhawkins (notwithstanding your comment above, which looks separate). not worried about the single failure, which is an old one on travis.

@simonjayhawkins
Member Author

Not quite ready. review request #36552 (comment) remains outstanding.

I can't block by requesting changes on my own PR, moving to 1.1.4 so release note needs to be moved. closing for now instead. Will re-open after 1.1.3 release.

I've started the final release readiness checks. merging now would require going back to square one.

@jorisvandenbossche
Member

@simonjayhawkins since this is basically ready, and fixing a serious regression (silently wrong result), I would still include it if possible.

I think the only outstanding request was to have a more manual test checking correctness? That can certainly be left for a follow-up PR and shouldn't block 1.1.3 (we also didn't have such test now, so I mainly see this as a nice-to-have)

@simonjayhawkins
Member Author

Yeah, there have been a couple of reports about this, so getting this in would be good. (I closed this since we often never seem to get round to nice-to-haves!) Reopening to check CI status.

@simonjayhawkins simonjayhawkins mentioned this pull request Sep 28, 2020
Contributor

@jreback jreback left a comment

no objection to merging for 1.1.3 (minor comment but nbd)

"box, tester",
[
(DataFrame, tm.assert_frame_equal),
(Series, tm.assert_series_equal),
Contributor

we don't usually need to add the tester, as assert_equal handles this

Member Author

good point. a bit belt and braces. will change so we re-run ci b4 merge.

@jreback
Contributor

jreback commented Sep 30, 2020

lgtm merge when ready (ignore the arm failure)

@simonjayhawkins simonjayhawkins merged commit e659627 into pandas-dev:master Sep 30, 2020
@simonjayhawkins
Member Author

@meeseeksdev backport 1.1.x

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Sep 30, 2020
@simonjayhawkins simonjayhawkins deleted the numexpr-regr branch September 30, 2020 20:29
simonjayhawkins added a commit that referenced this pull request Oct 1, 2020
…pr (#36750)

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Development

Successfully merging this pull request may close these issues.

BUG: Series.__mod__ behaves different with >1e4 rows
4 participants