New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index division by zero not filled #19322

Open
jbrockmendel opened this Issue Jan 20, 2018 · 7 comments

Comments

Projects
None yet
2 participants
@jbrockmendel
Member

jbrockmendel commented Jan 20, 2018


idx = pd.Index(range(5))
>>> idx
Int64Index([0, 1, 2, 3, 4], dtype='int64')

>>> idx / 0
Int64Index([0, 0, 0, 0, 0], dtype='int64')

>>> idx // 0
Int64Index([0, 0, 0, 0, 0], dtype='int64')

>>> idx % 0
Int64Index([0, 0, 0, 0, 0], dtype='int64')

>>> divmod(idx, 0)
(Int64Index([0, 0, 0, 0, 0], dtype='int64'), Int64Index([0, 0, 0, 0, 0], dtype='int64'))

>>> idx.astype('uint64') / 0
UInt64Index([0, 0, 0, 0, 0], dtype='uint64')

Others are OK:

>>> idx.__truediv__(0)
Float64Index([nan, inf, inf, inf, inf], dtype='float64')

>>> idx.astype('float64') / 0
Float64Index([nan, inf, inf, inf, inf], dtype='float64')
@jreback

This comment has been minimized.

Contributor

jreback commented Jan 20, 2018

this is what numpy does though i think we changed for Series a long time ago

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 20, 2018

The test from #9308 for Series division is now in tests.series.test_operators. Splitting it into separate tests for Series.__div__, Series.__rdiv__, Series.__floordiv__ and parameterizing it with different dtypes and different zero-like other args, the behavior is inconsistent.

# For testing division by (or of) zero for Series with length 3, this
# gives several scalar-zeros and length-3 vector-zeros
zeros = [box([0, 0, 0], dtype=dtype)
         for box in [pd.Series, pd.Index, np.array]
         for dtype in [np.int64, np.uint64, np.float64]]
zeros.extend([np.array(0, dtype=dtype)
             for dtype in [np.int64, np.uint64, np.float64]])
zeros.extend([0, 0.0, long(0)])

class TestSeriesArithmetic(object):
    @pytest.mark.parametrize('dtype', [np.int64, np.uint64, np.float64])
    @pytest.mark.parametrize('zero', zeros)
    def test_div_zero(self, dtype, zero):
        # GH#9144
        # dtype stability GH#19322
        ser = Series([-1, 0, 1]).astype(dtype)

        result = ser / zero
        expected = Series([-np.inf, np.nan, np.inf])
        assert_series_equal(result, expected)

15 of these fail, 38 for the analogous __floordiv__ test. Before I try to fix this: is there a convention we're following where 0 / 0, 1 / 0, and -1 / 0 may vary depending on the exact dtypes of the 1s and 0s? Or should these always be np.nan, np.inf, and -np.inf, respectively?

Update The tests with dtype==np.uint64 are wrong b/c of casting.

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 20, 2018

there is an old issue and whatsnew about this see if u can find it
maybe version .11 or around there

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 20, 2018

0.12.0 has

  - Fix modulo and integer division on Series,DataFrames to act similarly to ``float`` dtypes to return
    ``np.nan`` or ``np.inf`` as appropriate (:issue:`3590`). This correct a numpy bug that treats ``integer``
    and ``float`` dtypes differently.

It looks like #3600 and #9308 are both about this, but unsigned ints appear to be missing form the tests, along with a bunch of the reverse ops. I can patch this and implement a thorough set of test cases, but need to confirm that all ints and all zeros are interchangeable for this purpose.

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 20, 2018

this is tricky as numpy does different things for uint and int
they don’t convert to float

i think we should though - see what breaks

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Jan 21, 2018

Making some progress on this. It's liable to be a big diff, so my thought is to first write all the tests and xfail the ones that are currently wrong (allowing a reviewer to double-check my understanding of the desired behavior), then make reasonably-sized PRs to un-xfail those tests.

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 21, 2018

yeah I think these should all match what we do in series now.

In [1]: idx = pd.Index(range(5))

In [2]: s = Series(idx)

In [3]: s / 0
Out[3]: 
0    NaN
1    inf
2    inf
3    inf
4    inf
dtype: float64

In [4]: idx / s
Out[4]: 
0    NaN
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64

In [5]: idx / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-5-99f2096171d8> in <module>()
----> 1 idx / 0

~/pandas/pandas/core/indexes/range.py in _evaluate_numeric_binop(self, other)
    600                     if step:
    601                         with np.errstate(all='ignore'):
--> 602                             rstep = step(self._step, other)
    603 
    604                         # we don't have a representable op

ZeroDivisionError: division by zero

In [6]: idx // 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-6-1648d8a4476e> in <module>()
----> 1 idx // 0

~/pandas/pandas/core/indexes/range.py in __floordiv__(self, other)
    553         if is_integer(other):
    554             if (len(self) == 0 or
--> 555                     self._start % other == 0 and
    556                     self._step % other == 0):
    557                 start = self._start // other

ZeroDivisionError: integer division or modulo by zero

In [7]: s // 0
Out[7]: 
0    NaN
1    inf
2    inf
3    inf
4    inf
dtype: float64

yes xfailing things would be ok. Note since these are in ops, It think it would make sense to split out the giant tests_ops to sub-modules first :> (I know that's for Series), but same idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment