Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix and test index division by zero #19347

Closed
wants to merge 24 commits into from
Closed
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
969f342
fix and test index division by zero
jbrockmendel Jan 22, 2018
e0e89b9
fix and and tests for divmod
jbrockmendel Jan 22, 2018
6acc2f7
fix missing import
jbrockmendel Jan 22, 2018
84c74c5
remove operator.div for py3
jbrockmendel Jan 23, 2018
06df02a
py3 fix
jbrockmendel Jan 23, 2018
cd54349
move dispatch_missing to core.missing
jbrockmendel Jan 24, 2018
ca3bf42
Whatsnew section
jbrockmendel Jan 24, 2018
6fc61bd
elaborate docstring
jbrockmendel Jan 25, 2018
ea75c3c
ipython block
jbrockmendel Jan 25, 2018
5d7e3ea
Merge branch 'master' of https://github.com/pandas-dev/pandas into di…
jbrockmendel Jan 25, 2018
0277d9f
add ipython output to whatsnew
jbrockmendel Jan 25, 2018
78de1a4
Merge branch 'master' of https://github.com/pandas-dev/pandas into di…
jbrockmendel Jan 28, 2018
1ef3a6c
Merge branch 'master' of https://github.com/pandas-dev/pandas into di…
jbrockmendel Jan 29, 2018
d648ef6
requested edits
jbrockmendel Jan 29, 2018
965f721
Merge branch 'master' of https://github.com/pandas-dev/pandas into di…
jbrockmendel Jan 31, 2018
37efd51
make zero a fixture
jbrockmendel Jan 31, 2018
b51c2e1
fixturize
jbrockmendel Jan 31, 2018
afedba9
whatsnew clarification
jbrockmendel Jan 31, 2018
b8cf21d
flake8 remove unused import
jbrockmendel Jan 31, 2018
9de356a
revert fixture to fix test_range failures
jbrockmendel Feb 1, 2018
000aefd
fix long again
jbrockmendel Feb 1, 2018
aa969f8
Merge branch 'master' of https://github.com/pandas-dev/pandas into di…
jbrockmendel Feb 1, 2018
64b0c08
Merge branch 'master' of https://github.com/pandas-dev/pandas into di…
jbrockmendel Feb 2, 2018
be1e2e1
move fixture to conftest
jbrockmendel Feb 2, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
44 changes: 44 additions & 0 deletions doc/source/whatsnew/v0.23.0.txt
Expand Up @@ -176,6 +176,50 @@ Please note that the string `index` is not supported with the round trip format,
new_df
print(new_df.index.name)

.. _whatsnew_0230.enhancements.index_division_by_zero

Index Division By Zero Fills Correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Division operations on ``Index`` and subclasses will now fill division of positive numbers by zero with ``np.inf``, division of negative numbers by zero with ``-np.inf`` and `0 / 0` with ``np.nan``. This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`)

Previous Behavior:

.. code-block:: ipython
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current should always be ipython, so the code executes.


In [6]: index = pd.Int64Index([-1, 0, 1])

In [7]: index / 0
Out[7]: Int64Index([0, 0, 0], dtype='int64')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch the order of these blocks, we always write previous, then current.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check this after its built to make sure it looks ok.

# Previous behavior yielded different results depending on the type of zero in the divisor
In [8]: index / 0.0
Out[8]: Float64Index([-inf, nan, inf], dtype='float64')

In [9]: index = pd.UInt64Index([0, 1])

In [10]: index / np.array([0, 0], dtype=np.uint64)
Out[10]: UInt64Index([0, 0], dtype='uint64')

In [11]: pd.RangeIndex(1, 5) / 0
ZeroDivisionError: integer division or modulo by zero

Current Behavior:

.. ipython:: python

index = pd.Int64Index([-1, 0, 1])
# division by zero gives -infinity where negative, +infinity where positive, and NaN for 0 / 0
index / 0

# The result of division by zero should not depend on whether the zero is int or float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this case (or add above as well). the point of the previous/current display is to make it easy to map up what is changing.

index / 0.0

index = pd.UInt64Index([0, 1])
index / np.array([0, 0], dtype=np.uint64)

pd.RangeIndex(1, 5) / 0

.. _whatsnew_0230.enhancements.other:

Other Enhancements
Expand Down
2 changes: 2 additions & 0 deletions pandas/core/indexes/base.py
Expand Up @@ -4040,6 +4040,8 @@ def _evaluate_numeric_binop(self, other):
attrs = self._maybe_update_attributes(attrs)
with np.errstate(all='ignore'):
result = op(values, other)

result = missing.dispatch_missing(op, values, other, result)
return constructor(result, **attrs)

return _evaluate_numeric_binop
Expand Down
31 changes: 13 additions & 18 deletions pandas/core/indexes/range.py
Expand Up @@ -550,7 +550,7 @@ def __getitem__(self, key):
return super_getitem(key)

def __floordiv__(self, other):
if is_integer(other):
if is_integer(other) and other != 0:
if (len(self) == 0 or
self._start % other == 0 and
self._step % other == 0):
Expand Down Expand Up @@ -592,26 +592,27 @@ def _evaluate_numeric_binop(self, other):
attrs = self._get_attributes_dict()
attrs = self._maybe_update_attributes(attrs)

left, right = self, other
if reversed:
self, other = other, self
left, right = right, left

try:
# apply if we have an override
if step:
with np.errstate(all='ignore'):
rstep = step(self._step, other)
rstep = step(left._step, right)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side-note: This is still not great; left may no longer be self, so this may raise an AttributeError. The existing method specifically catches AttributeError so I guess this is intentional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use getattr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr(left, '_step', left) and remove the AttributeError catching

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr(left, '_stop', left)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and remove the AttributeError catching

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope. Until I look at this closely, I'm assuming that the author had a reason for doing it this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fair enough


# we don't have a representable op
# so return a base index
if not is_integer(rstep) or not rstep:
raise ValueError

else:
rstep = self._step
rstep = left._step

with np.errstate(all='ignore'):
rstart = op(self._start, other)
rstop = op(self._stop, other)
rstart = op(left._start, right)
rstop = op(left._stop, right)

result = RangeIndex(rstart,
rstop,
Expand All @@ -627,18 +628,12 @@ def _evaluate_numeric_binop(self, other):

return result

except (ValueError, TypeError, AttributeError):
pass

# convert to Int64Index ops
if isinstance(self, RangeIndex):
self = self.values
if isinstance(other, RangeIndex):
other = other.values

with np.errstate(all='ignore'):
results = op(self, other)
return Index(results, **attrs)
except (ValueError, TypeError, AttributeError,
ZeroDivisionError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this raise ZeroDivisionError? isn't he point of the errstate to catch that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this raise ZeroDivisionError?

ATM pd.RangeIndex(0, 5) / 0 raises ZeroDivisionError.

isn't he point of the errstate to catch that?

I think errstate is to suppress warnings, but not really sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, that seems wrong, but I guess can cover later

# Defer to Int64Index implementation
if reversed:
return op(other, self._int64index)
return op(self._int64index, other)

return _evaluate_numeric_binop

Expand Down
82 changes: 82 additions & 0 deletions pandas/core/missing.py
@@ -1,6 +1,7 @@
"""
Routines for filling missing data
"""
import operator

import numpy as np
from distutils.version import LooseVersion
Expand Down Expand Up @@ -645,6 +646,87 @@ def fill_zeros(result, x, y, name, fill):
return result


def mask_zero_div_zero(x, y, result, copy=False):
"""
Set results of 0 / 0 or 0 // 0 to np.nan, regardless of the dtypes
of the numerator or the denominator.

Parameters
----------
x : ndarray
y : ndarray
result : ndarray
copy : bool (default False)
Whether to always create a new array or try to fill in the existing
array if possible.

Returns
-------
filled_result : ndarray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the semantics you have here are not explained well. you are mutating result in-place, then returning it. is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll flesh it out. The semantics were meant to follow fill_zeros as closely as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok some more expl would be helpful. filling result is ok, you just have to make this really clear.


Examples
--------
>>> x = np.array([1, 0, -1], dtype=np.int64)
>>> y = 0 # int 0; numpy behavior is different with float
>>> result = x / y
>>> result # raw numpy result does not fill division by zero
array([0, 0, 0])
>>> mask_zero_div_zero(x, y, result)
array([ inf, nan, -inf])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add to the list to consolidate this module (e.g. the existing fill_zeros)

"""
if is_scalar(y):
y = np.array(y)

zmask = y == 0
if zmask.any():
shape = result.shape

nan_mask = (zmask & (x == 0)).ravel()
neginf_mask = (zmask & (x < 0)).ravel()
posinf_mask = (zmask & (x > 0)).ravel()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comments here

if nan_mask.any() or neginf_mask.any() or posinf_mask.any():
# Fill negative/0 with -inf, positive/0 with +inf, 0/0 with NaN
result = result.astype('float64', copy=copy).ravel()

np.putmask(result, nan_mask, np.nan)
np.putmask(result, posinf_mask, np.inf)
np.putmask(result, neginf_mask, -np.inf)

result = result.reshape(shape)

return result


def dispatch_missing(op, left, right, result):
"""
Fill nulls caused by division by zero, casting to a diffferent dtype
if necessary.

Parameters
----------
op : function (operator.add, operator.div, ...)
left : object (Index for non-reversed ops)
right : object (Index fof reversed ops)
result : ndarray

Returns
-------
result : ndarray
"""
opstr = '__{opname}__'.format(opname=op.__name__).replace('____', '__')
if op in [operator.truediv, operator.floordiv,
getattr(operator, 'div', None)]:
result = mask_zero_div_zero(left, right, result)
elif op is operator.mod:
result = fill_zeros(result, left, right, opstr, np.nan)
elif op is divmod:
res0 = mask_zero_div_zero(left, right, result[0])
res1 = fill_zeros(result[1], left, right, opstr, np.nan)
result = (res0, res1)
return result


def _interp_limit(invalid, fw_limit, bw_limit):
"""Get idx of values that won't be filled b/c they exceed the limits.

Expand Down
58 changes: 57 additions & 1 deletion pandas/tests/indexes/test_numeric.py
Expand Up @@ -3,7 +3,7 @@
import pytest

from datetime import datetime
from pandas.compat import range, PY3
from pandas.compat import range, PY3, long

import numpy as np

Expand All @@ -18,6 +18,16 @@
from pandas.tests.indexes.common import Base


# For testing division by (or of) zero for Series with length 5, this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be fixtures

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you misunderstand what I want here

# gives several scalar-zeros and length-5 vector-zeros
 zeros = [box([0] * 5, dtype=dtype)
          for box in [pd.Index, np.array]
          for dtype in [np.int64, np.uint64, np.float64]]
 zeros.extend([np.array(0, dtype=dtype)
              for dtype in [np.int64, np.uint64, np.float64]])
 zeros.extend([0, 0.0, long(0)])

@pytest.mark.fixture(params=zeros)
def zero(request):
    return request.param

then remove the @pytest.mark.parmetrize on all the test functions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment:

Made zeros a fixture in test_numeric as requested, but that broke tests in test_range b/c of pytest magic

AFAICT the problem is caused by the fixture being in the namespace for test_numeric but not for test_range, which has a class that inherits from the test_numeric.Numeric (which defines the relevant tests). I guess we could get around that by putting this in a shared conftest file, but all that accomplishes is making it harder for a reader to find out where things are defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so move it to the conftest, this is quite standard practice and avoids code duplication.

# gives several scalar-zeros and length-5 vector-zeros
zeros = [box([0] * 5, dtype=dtype)
for box in [pd.Index, np.array]
for dtype in [np.int64, np.uint64, np.float64]]
zeros.extend([np.array(0, dtype=dtype)
for dtype in [np.int64, np.uint64, np.float64]])
zeros.extend([0, 0.0, long(0)])


def full_like(array, value):
"""Compatibility for numpy<1.8.0
"""
Expand Down Expand Up @@ -157,6 +167,52 @@ def test_divmod_series(self):
for r, e in zip(result, expected):
tm.assert_series_equal(r, e)

@pytest.mark.parametrize('zero', zeros)
def test_div_zero(self, zero):
idx = self.create_index()

expected = Index([np.nan, np.inf, np.inf, np.inf, np.inf],
dtype=np.float64)
result = idx / zero
tm.assert_index_equal(result, expected)
ser_compat = Series(idx).astype('i8') / np.array(zero).astype('i8')
tm.assert_series_equal(ser_compat, Series(result))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a "check that this matches the Series behavior" check. Note that because the Series implementation does not yet get this right for all dtypes, we are casting the Series version to int64 and checking that it matches the one version that does consistently work. Once the Series methods are fixed, this casting can be removed.


@pytest.mark.parametrize('zero', zeros)
def test_floordiv_zero(self, zero):
idx = self.create_index()
expected = Index([np.nan, np.inf, np.inf, np.inf, np.inf],
dtype=np.float64)

result = idx // zero
tm.assert_index_equal(result, expected)
ser_compat = Series(idx).astype('i8') // np.array(zero).astype('i8')
tm.assert_series_equal(ser_compat, Series(result))

@pytest.mark.parametrize('zero', zeros)
def test_mod_zero(self, zero):
idx = self.create_index()

expected = Index([np.nan, np.nan, np.nan, np.nan, np.nan],
dtype=np.float64)
result = idx % zero
tm.assert_index_equal(result, expected)
ser_compat = Series(idx).astype('i8') % np.array(zero).astype('i8')
tm.assert_series_equal(ser_compat, Series(result))

@pytest.mark.parametrize('zero', zeros)
def test_divmod_zero(self, zero):
idx = self.create_index()

exleft = Index([np.nan, np.inf, np.inf, np.inf, np.inf],
dtype=np.float64)
exright = Index([np.nan, np.nan, np.nan, np.nan, np.nan],
dtype=np.float64)

result = divmod(idx, zero)
tm.assert_index_equal(result[0], exleft)
tm.assert_index_equal(result[1], exright)

def test_explicit_conversions(self):

# GH 8608
Expand Down