New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix and test index division by zero #19347

Closed
wants to merge 24 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@jbrockmendel
Member

jbrockmendel commented Jan 22, 2018

Related: #19336, this implements the most important parts of the parametrization requested there.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 22, 2018

Member

Side-note: This is still not great; left may no longer be self, so this may raise an AttributeError. The existing method specifically catches AttributeError so I guess this is intentional.

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

use getattr

This comment has been minimized.

@jbrockmendel

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

getattr(left, '_step', left) and remove the AttributeError catching

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

getattr(left, '_stop', left)

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

and remove the AttributeError catching

This comment has been minimized.

@jreback

jreback Jan 31, 2018

Contributor

can you do this

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 31, 2018

Member

Out of scope. Until I look at this closely, I'm assuming that the author had a reason for doing it this way.

This comment has been minimized.

@jreback

jreback Feb 1, 2018

Contributor

ok fair enough

result = op(idx, zero)
tm.assert_index_equal(result, expected)
ser_compat = op(Series(idx).astype('i8'), np.array(zero).astype('i8'))
tm.assert_series_equal(ser_compat, Series(result))

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 22, 2018

Member

This is a "check that this matches the Series behavior" check. Note that because the Series implementation does not yet get this right for all dtypes, we are casting the Series version to int64 and checking that it matches the one version that does consistently work. Once the Series methods are fixed, this casting can be removed.

jbrockmendel added some commits Jan 22, 2018

Parameters
----------
op : function (operator.add, operator.div, ...)
left : object, usually Index

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

usually Index means?

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 24, 2018

Member

That the input left is going to be an Index object most of the time, but could be an arbitrary object.

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

I know what it means, I am asking to have you elaborate in the note on these cases, IOW why would be it be an arbitrary object, eg.. when the op is reversed?

Returns
-------
result : ndarray

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

I think this function is better in missing

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 24, 2018

Member

Sounds good.

@@ -645,6 +645,44 @@ def fill_zeros(result, x, y, name, fill):
return result
def mask_zero_div_zero(x, y, result):

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

this is not dissimilar from fill_zeros which is not used very much at all. maybe can consolidate.

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 24, 2018

Member

First attempt was to edit fill_zeros to make it work, but I ended up changing it more than I thought appropriate. I'll take another look.

Returns
-------
filled_result : ndarray

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

the semantics you have here are not explained well. you are mutating result in-place, then returning it. is that intentional?

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 24, 2018

Member

I'll flesh it out. The semantics were meant to follow fill_zeros as closely as possible.

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

ok some more expl would be helpful. filling result is ok, you just have to make this really clear.

@@ -17,6 +17,11 @@
from pandas.tests.indexes.common import Base
# For testing division by (or of) zero for Series with length 5, this

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

these should be fixtures

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d
def gen_zeros(arr_len):

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

move this to conftest as its just a fixture

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 24, 2018

Member

This is going to be re-used in Series tests a few PRs from now. Which conftest should it go into?

This comment has been minimized.

@jreback

jreback Jan 24, 2018

Contributor

you can put it in the top-level one pandas/conftest.py I would really like to move more things there

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 24, 2018

Member

How do we pass the arr_len arg to it?

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

you don't you move this as a function, then create as a fixture.

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 25, 2018

Member

So I move the function to conftest, then from test_numeric create a fixture like (?):

@pytest.fixture(params=[x for x in gen_zeros(5)
                                          if not isinstance(x, Series)])
def zero(request):
    # For testing division by (or of) zero for Series with length 5, this
    # gives several scalar-zeros and length-5 vector-zeros
    return request.param

Do we import conftest? I haven't seen that elsewhere. What's the upside of this over just having zeros in the module namespace?

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

yes. you don't need to import conftest. you can put this in the module only if its only used there. the point of conftest is to share.

This comment has been minimized.

@jreback

jreback Jan 31, 2018

Contributor

since this is only used in 1 place, just move it to the test file itself and make it into a fixture

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 24, 2018

this needs a sub-section in whatsnew to show what has changed (e.g. Int and Range Index with / 0 and such)

@codecov

This comment has been minimized.

codecov bot commented Jan 24, 2018

Codecov Report

Merging #19347 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19347      +/-   ##
==========================================
- Coverage   91.69%   91.67%   -0.03%     
==========================================
  Files         148      148              
  Lines       48561    48586      +25     
==========================================
+ Hits        44530    44543      +13     
- Misses       4031     4043      +12
Flag Coverage Δ
#multiple 90.04% <100%> (-0.02%) ⬇️
#single 41.71% <22.5%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/range.py 95.65% <100%> (-0.06%) ⬇️
pandas/core/indexes/base.py 96.45% <100%> (ø) ⬆️
pandas/core/missing.py 85.9% <100%> (+1.11%) ⬆️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69cd5fb...be1e2e1. Read the comment docs.

Index Division By Zero Fills Correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Division operations on ``Index`` and subclasses will now fill positive / 0 with ``np.inf``, negative / 0 with ``-np.inf``, and 0 / 0 with ``np.nan``. This matches existing Series behavior.

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

double backticks on Series. Add the issue number, can be multiple (e.g. this PR number and the xref issue)

In [6]: index = pd.UInt64Index([0, 1])
In [7]: index / np.array([0, 0], dtype=np.uint64)
Out[7]: UInt64Index([0, 0], dtype='uint64')

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

switch the order of these blocks, we always write previous, then current.

Current Behavior:
.. code-block:: ipython

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

current should always be ipython, so the code executes.

Previous Behavior:
.. code-block:: ipython

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

code blocks here are fine, but you need to actually show the executed code as in an ipython session.

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 25, 2018

Member

Not totally clear on the difference. I'll be pushing shortly with fixes to most other comments and a best-guess as to these.

Returns
-------
filled_result : ndarray

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

ok some more expl would be helpful. filling result is ok, you just have to make this really clear.

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d
def gen_zeros(arr_len):

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

you don't you move this as a function, then create as a fixture.

.. code-block:: ipython
index = pd.Index([-1, 0, 1])

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

a code-block needs to have a complete section copy pasted. refer to the sphinx docs

.. ipython:: python
index = pd.Index([-1, 0, 1])

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

break this into 2 blocks and give some comments

results = op(self, other)
return Index(results, **attrs)
except (ValueError, TypeError, AttributeError,
ZeroDivisionError):

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

when does this raise ZeroDivisionError? isn't he point of the errstate to catch that?

This comment has been minimized.

@jbrockmendel

jbrockmendel Jan 25, 2018

Member

when does this raise ZeroDivisionError?

ATM pd.RangeIndex(0, 5) / 0 raises ZeroDivisionError.

isn't he point of the errstate to catch that?

I think errstate is to suppress warnings, but not really sure.

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

hmm, that seems wrong, but I guess can cover later

try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

use getattr

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d
def gen_zeros(arr_len):

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

yes. you don't need to import conftest. you can put this in the module only if its only used there. the point of conftest is to share.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 25, 2018

the point of conftest is to share.

... isn't that also the point of pandas.util.testing?

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 25, 2018

use getattr

Huh?

@jbrockmendel jbrockmendel reopened this Jan 25, 2018

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 25, 2018

fastparquet errors on appveyor

In [10]: pd.RangeIndex(1, 5) / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-10-4c5e91d516f3> in <module>()

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

you don't need the while traceback with line-number, just the error line (213) itself)

.. code-block:: ipython
In [6]: index = pd.Index([-1, 0, 1])

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

create this an an Int64Index

index = pd.Index([-1, 0, 1])
index / 0
# The result of division by zero should not depend on whether the zero is int or float

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

remove this case (or add above as well). the point of the previous/current display is to make it easy to map up what is changing.

try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

getattr(left, '_stop', left)

try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

and remove the AttributeError catching

nan_mask = (zmask & (x == 0)).ravel()
neginf_mask = (zmask & (x < 0)).ravel()
posinf_mask = (zmask & (x > 0)).ravel()

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

add some comments here

@@ -17,6 +17,11 @@
from pandas.tests.indexes.common import Base
# For testing division by (or of) zero for Series with length 5, this
# gives several scalar-zeros and length-5 vector-zeros
zeros = tm.gen_zeros(5)

This comment has been minimized.

@jreback

jreback Jan 26, 2018

Contributor

just make this a fixture, then you don't need to add a parameterize on each function

This comment has been minimized.

@jreback

jreback Jan 31, 2018

Contributor

can you do this

Index Division By Zero Fills Correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Division operations on ``Index`` and subclasses will now fill positive / 0 with ``np.inf``, negative / 0 with ``-np.inf``, and 0 / 0 with ``np.nan``. This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`)

This comment has been minimized.

@jreback

jreback Jan 31, 2018

Contributor

this is a confusing sentence, its not obvious that 'postive / 0' mean a positive number divided by zero. maybe use double-backticks around positive / 0

try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)

This comment has been minimized.

@jreback

jreback Jan 31, 2018

Contributor

can you do this

@@ -17,6 +17,11 @@
from pandas.tests.indexes.common import Base
# For testing division by (or of) zero for Series with length 5, this
# gives several scalar-zeros and length-5 vector-zeros
zeros = tm.gen_zeros(5)

This comment has been minimized.

@jreback

jreback Jan 31, 2018

Contributor

can you do this

@@ -1964,6 +1964,32 @@ def add_nans_panel4d(panel4d):
return panel4d
def gen_zeros(arr_len):

This comment has been minimized.

@jreback

jreback Jan 31, 2018

Contributor

since this is only used in 1 place, just move it to the test file itself and make it into a fixture

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Feb 1, 2018

Made zeros a fixture in test_numeric as requested, but that broke tests in test_range b/c of pytest magic. Just pushed a commit that reverted to having zeros be a list in test_numeric namespace.

@@ -18,6 +18,16 @@
from pandas.tests.indexes.common import Base
# For testing division by (or of) zero for Series with length 5, this

This comment has been minimized.

@jreback

jreback Feb 1, 2018

Contributor

maybe you misunderstand what I want here

# gives several scalar-zeros and length-5 vector-zeros
 zeros = [box([0] * 5, dtype=dtype)
          for box in [pd.Index, np.array]
          for dtype in [np.int64, np.uint64, np.float64]]
 zeros.extend([np.array(0, dtype=dtype)
              for dtype in [np.int64, np.uint64, np.float64]])
 zeros.extend([0, 0.0, long(0)])

@pytest.mark.fixture(params=zeros)
def zero(request):
    return request.param

then remove the @pytest.mark.parmetrize on all the test functions

This comment has been minimized.

@jbrockmendel

jbrockmendel Feb 1, 2018

Member

See previous comment:

Made zeros a fixture in test_numeric as requested, but that broke tests in test_range b/c of pytest magic

AFAICT the problem is caused by the fixture being in the namespace for test_numeric but not for test_range, which has a class that inherits from the test_numeric.Numeric (which defines the relevant tests). I guess we could get around that by putting this in a shared conftest file, but all that accomplishes is making it harder for a reader to find out where things are defined.

This comment has been minimized.

This comment has been minimized.

@jreback

jreback Feb 2, 2018

Contributor

so move it to the conftest, this is quite standard practice and avoids code duplication.

@@ -18,6 +18,16 @@
from pandas.tests.indexes.common import Base
# For testing division by (or of) zero for Series with length 5, this

This comment has been minimized.

@jreback

jreback Feb 2, 2018

Contributor

so move it to the conftest, this is quite standard practice and avoids code duplication.

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Feb 5, 2018

@jreback I think the fixtures are all as requested. LMK.

In [7]: index / 0
Out[7]: Int64Index([0, 0, 0], dtype='int64')

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

check this after its built to make sure it looks ok.

>>> result # raw numpy result does not fill division by zero
array([0, 0, 0])
>>> mask_zero_div_zero(x, y, result)
array([ inf, nan, -inf])

This comment has been minimized.

@jreback

jreback Feb 6, 2018

Contributor

pls add to the list to consolidate this module (e.g. the existing fill_zeros)

@jreback jreback added this to the 0.23.0 milestone Feb 6, 2018

@jreback

jreback approved these changes Feb 6, 2018

@jreback jreback closed this in ed10bf6 Feb 6, 2018

@jreback

This comment has been minimized.

Contributor

jreback commented Feb 6, 2018

thanks. a couple of comments for the future.

@jbrockmendel jbrockmendel deleted the jbrockmendel:div_zero2 branch Feb 6, 2018

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

TST: fix and test index division by zero
Related: pandas-dev#19336

Author: Brock Mendel <jbrockmendel@gmail.com>

Closes pandas-dev#19347 from jbrockmendel/div_zero2 and squashes the following commits:

be1e2e1 [Brock Mendel] move fixture to conftest
64b0c08 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
aa969f8 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
000aefd [Brock Mendel] fix long again
9de356a [Brock Mendel] revert fixture to fix test_range failures
b8cf21d [Brock Mendel] flake8 remove unused import
afedba9 [Brock Mendel] whatsnew clarification
b51c2e1 [Brock Mendel] fixturize
37efd51 [Brock Mendel] make zero a fixture
965f721 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
d648ef6 [Brock Mendel] requested edits
1ef3a6c [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
78de1a4 [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
0277d9f [Brock Mendel] add ipython output to whatsnew
5d7e3ea [Brock Mendel] Merge branch 'master' of https://github.com/pandas-dev/pandas into div_zero2
ea75c3c [Brock Mendel] ipython block
6fc61bd [Brock Mendel] elaborate docstring
ca3bf42 [Brock Mendel] Whatsnew section
cd54349 [Brock Mendel] move dispatch_missing to core.missing
06df02a [Brock Mendel] py3 fix
84c74c5 [Brock Mendel] remove operator.div for py3
6acc2f7 [Brock Mendel] fix missing import
e0e89b9 [Brock Mendel] fix and and tests for divmod
969f342 [Brock Mendel] fix and test index division by zero
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment