Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement truediv, rtruediv directly in TimedeltaArray; tests #23829

Merged
merged 23 commits into from Nov 29, 2018

Conversation

Projects
None yet
6 participants
@jbrockmendel
Copy link
Member

commented Nov 21, 2018

Follow-up to #23642, implements __rdiv__ in TimedeltaArray, plus missing tests.

After this will be a PR that does the same for floordiv+rfloordiv.

  • closes #22631
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Fixes:
tdi / np.timedelta64('NaT')
tdi / tdi (#22631)
tdi / object_dtyped
object_dtyped / tdi

@pep8speaks

This comment has been minimized.

Copy link

commented Nov 21, 2018

Hello @jbrockmendel! Thanks for updating the PR.

Comment last updated on November 29, 2018 at 02:49 Hours UTC
@codecov

This comment has been minimized.

Copy link

commented Nov 21, 2018

Codecov Report

Merging #23829 into master will decrease coverage by <.01%.
The diff coverage is 92.1%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23829      +/-   ##
==========================================
- Coverage   92.31%   92.31%   -0.01%     
==========================================
  Files         161      161              
  Lines       51513    51578      +65     
==========================================
+ Hits        47554    47613      +59     
- Misses       3959     3965       +6
Flag Coverage Δ
#multiple 90.71% <92.1%> (ø) ⬆️
#single 42.37% <13.15%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.32% <66.66%> (-0.17%) ⬇️
pandas/core/indexes/timedeltas.py 89.6% <86.66%> (+0.07%) ⬆️
pandas/core/arrays/timedeltas.py 95.95% <96.36%> (-0.31%) ⬇️
pandas/io/formats/printing.py 93.61% <0%> (+0.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 982c169...d72bf90. Read the comment docs.

rng = tm.box_expected(rng, box_with_array)
for obj in [mismatched, mismatched[:2]]:
# one shorter, one longer
for other in [obj, np.array(obj), pd.Index(obj)]:

This comment has been minimized.

Copy link
@gfyoung

gfyoung Nov 21, 2018

Member

Is it possible to parameterize this? Potentially via flags to indicate what transformation to perform on obj before testing for the error?

Definitely the outer-loop can be parameterized (just use [1, 2, 3, 4] and [1, 2]), unless there's a really strong reason for using mismatched and mismatched[:2].

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 21, 2018

Author Member

It's possible, but I'm trying to push back a little bit against over-parametrization. The pytest setup cost is non-trivial


elif is_timedelta64_dtype(other):
# let numpy handle it
return self._data / other

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 21, 2018

Member

Do we need to ensure other is a np.ndarray and not TimedeltaArray here? (meaning, extract the numpy array out of the TimedeltaArray)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 21, 2018

Author Member

No. self._data.__div__(other) would return NotImplemented if other were a TimedeltaArray. This PR includes a test that covers TDA/TDA.

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 21, 2018

Member

Yes, but that is another level of redirection, while we know this will happen and can directly do the correct thing here?
(I suppose is_timedelta64_dtype only passes through those two cases of ndarray or TimedeltaArray?)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 21, 2018

Author Member

I suppose is_timedelta64_dtype only passes through those two cases of ndarray or TimedeltaArray?

Yes, since we exclude Series and Index at the start.

Yes, but that is another level of redirection, while we know this will happen and can directly do the correct thing here?

I guess we could replace other with getattr(other, "_data", other) (or if/when TimedeltaArray gets an __array__ method, just np.array(other), which would be prettier)

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Nov 21, 2018

Contributor

In the (hopefully not too distant future), TimedeltaArray will no longer be an Index. In this case we would want to explicitly grab the ._data out of it and proceed?

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Nov 21, 2018

Contributor

And what's the return type here? Does this need to be wrapped in a a type(self) so that we return a TimedeltaArray?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 21, 2018

Author Member

And what's the return type here?

float-dtyped ndarray

# let numpy handle it
return self._data / other

elif is_object_dtype(other):

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 21, 2018

Member

Do we need to allow this?
I would be fine with raising a TypeError here.

(I first wanted to say: can't we dispatch that to numpy, thinking that numpy object dtype would handle that, but they raise a TypeError)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 21, 2018

Author Member

I can see why this would be first on the chopping block if we had to support fewer cases. Is there a compelling reason not to handle this case?

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 21, 2018

Member

You can also turn around the question :) Is there a compelling reason to do handle this case?

It's just an extra case to support. And eg, we could discuss whether this should return object dtype data or timedelta, as you are inferring now? Looking at Series behaviour with int64 and object integers, it actually returns object. For datetimes it now raises. So at least, our support is at the moment not very consistent.

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 21, 2018

Author Member

Is there a compelling reason to do handle this case?

Because a selling point of pandas is that things Just Work? Because the code and tests are already written, so the marginal cost is \approx zero?

our support is at the moment not very consistent

Fair enough. If a goal is to make things more consistent (which I'm +1 on BTW) then we're probably not going to go around and start breaking the places where it currently is supported.

This comment has been minimized.

Copy link
@jreback

jreback Nov 21, 2018

Contributor

i agree with @jbrockmendel here

Show resolved Hide resolved pandas/core/arrays/timedeltas.py Outdated
if isinstance(other, Index):
# TimedeltaArray defers, so we need to unwrap
oth = other._values
result = TimedeltaArray.__truediv__(self, oth)

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 21, 2018

Member

Can we do here something like result = self._values / oth instead of calling the dunder method? (I don't really know the difference in practice, but it looks a bit cleaner)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 21, 2018

Author Member

I'll double-check. Usually my reasoning for using the dunder methods is to preserve NotImplemented semantics.

@jreback jreback added this to the 0.24.0 milestone Nov 21, 2018

Show resolved Hide resolved pandas/core/arrays/timedeltas.py
Show resolved Hide resolved pandas/core/indexes/timedeltas.py Outdated

@jreback jreback removed this from the 0.24.0 milestone Nov 21, 2018


elif is_object_dtype(other):
result = [self[n] / other[n] for n in range(len(self))]
result = np.array(result)

This comment has been minimized.

Copy link
@jreback

jreback Nov 21, 2018

Contributor

actually this is really close to what soft_convert_objects does.

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 22, 2018

Author Member

That isn't clear to me. soft_convert_objects doesn't call lib.infer_dtype or any analogue.

This comment has been minimized.

Copy link
@jreback

jreback Nov 23, 2018

Contributor

you are essentially re-implementing it. i would rather not do that.

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 23, 2018

Author Member

I'm not clear on what you have in mind. Something like:

if lib.infer_dtype(result) == 'timedelta':
    result = soft_convert_objects(result, timedelta=True, coerce=False)
    return type(self)(result)
return result

?

This comment has been minimized.

Copy link
@jreback

jreback Nov 23, 2018

Contributor

yes

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 27, 2018

Author Member

Fair enough, will change

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 27, 2018

Author Member

changed. this ends up changing the behavior of the DataFrame test case, but that's largely driven by the fact that DataFrame([NaT]) gets inferred as datetime64[ns]

This comment has been minimized.

Copy link
@jreback

jreback Nov 28, 2018

Contributor

what is changed here? shouldn't is_object_type result in a TypeError or a NotImplemented?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 28, 2018

Author Member

There are two independent questions that have been asked about the object-dtype case:

  1. should we just raise TypeError instead or should we handle it so it Just Works (the latter being what this PR does)
  2. Given that we handle this case, do we try to infer the output dtpye or just return object dtype? This PR originally did the former, then changed to do the latter following discussion.

This comment has been minimized.

Copy link
@jreback

jreback Nov 28, 2018

Contributor

ok this is fine, you are returning object dtype (which is consistent with how we do for Series now)

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 22, 2018

Next round implementing floordiv, mod, divmod, and reversed ops is almost ready. Any preference between follow-up vs adding to this PR?

@gfyoung

This comment has been minimized.

Copy link
Member

commented Nov 25, 2018

Any preference between follow-up vs adding to this PR?

How many more things do you need to follow-up on? And how straightforward are they to handle? Probably would lean towards merging this and follow-up later, but not sure whether some of the remaining comments are blockers...


elif is_object_dtype(other):
result = [self[n] / other[n] for n in range(len(self))]
result = np.array(result)

This comment has been minimized.

Copy link
@jreback

jreback Nov 27, 2018

Contributor

ok reversing here. i agree with @jorisvandenbossche we don't infer object dtypes in ops (only in the constructor), so this seems reasonable

In [2]: pd.Series([1,2, 3]) / pd.Series([1,1,1], dtype=object)
Out[2]: 
0    1
1    2
2    3
dtype: object
Show resolved Hide resolved pandas/core/arrays/timedeltas.py
# let numpy handle it
return other / self._data

elif is_object_dtype(other):

This comment has been minimized.

Copy link
@jreback

jreback Nov 27, 2018

Contributor

or can raise NotImplemented here? does that work?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 27, 2018

Author Member

I think that might be fragile; might depend on having __array__ implemented. Either way, better to make it explicit than rely on numpy imlpementation

This comment has been minimized.

Copy link
@jreback

jreback Nov 28, 2018

Contributor

same comment as above

This comment has been minimized.

Copy link
@jreback

jreback Nov 28, 2018

Contributor

ok, can you add a comment here (and above), were we do the operation but do not infer the output type (just for posterity), otherwise this PR lgtm. ping on green.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 27, 2018

I think all comments have been addressed

@jorisvandenbossche
Copy link
Member

left a comment

One remaining question/comment about the testing of the last change for object dtype.

For the rest looks good to me!

result = tdser / vector.astype(object)
expected = [tdser[n] / vector[n] for n in range(len(tdser))]
expected = tm.box_expected(expected, xbox)

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 27, 2018

Member

Wouldn't this converted the expected list to an DatetimeArray, while the expected result is on object ndarray? (so it's not really testing that?)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 27, 2018

Author Member

No. In the case where xbox is tm.to_array (the only case that could conceivably give a DatetimeArray), tm.to_array(any_list) returning np.array(that_list)

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 27, 2018

Member

Ah, OK, all a bit opaque .. (I checked what get_upcast_box and box_expected do, but not box_with_array :-))

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 27, 2018

Author Member

Yah, I'm hoping to simplify some of it, and ideally even get rid of box_expected, but it'll be a while before thats feasible.

@jreback jreback added this to the 0.24.0 milestone Nov 28, 2018

@@ -1227,7 +1227,7 @@ Timedelta
- Bug in :class:`TimedeltaIndex` where adding a timezone-aware datetime scalar incorrectly returned a timezone-naive :class:`DatetimeIndex` (:issue:`23215`)
- Bug in :class:`TimedeltaIndex` where adding ``np.timedelta64('NaT')`` incorrectly returned an all-`NaT` :class:`DatetimeIndex` instead of an all-`NaT` :class:`TimedeltaIndex` (:issue:`23215`)
- Bug in :class:`Timedelta` and :func:`to_timedelta()` have inconsistencies in supported unit string (:issue:`21762`)

- Bug in :class:`TimedeltaIndex` division where dividing by another :class:`TimedeltaIndex` raised ``TypeError`` instead of returning a :class:`Float64Index` (:issue:`23829`)

This comment has been minimized.

Copy link
@jreback

jreback Nov 28, 2018

Contributor

i think you need the refernces issue number here. do you need to mention the other issue that is refernces in the top or PR?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 28, 2018

Author Member

will do


elif is_object_dtype(other):
result = [self[n] / other[n] for n in range(len(self))]
result = np.array(result)

This comment has been minimized.

Copy link
@jreback

jreback Nov 28, 2018

Contributor

what is changed here? shouldn't is_object_type result in a TypeError or a NotImplemented?

# let numpy handle it
return other / self._data

elif is_object_dtype(other):

This comment has been minimized.

Copy link
@jreback

jreback Nov 28, 2018

Contributor

same comment as above

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 28, 2018

Travis fail is unrelated flake8-rst problem

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 29, 2018

ping

@jreback jreback merged commit d887927 into pandas-dev:master Nov 29, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181129.12 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2018

thanks!

@jbrockmendel jbrockmendel deleted the jbrockmendel:gtests branch Nov 29, 2018

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 11, 2018

Is the difference in behavior between TimedeltaIndex / timedelta64('NaT') and TimedeltaIndex / pd.NaT deliberate?

In [14]: rng = pd.timedelta_range(start='1D', periods=10, freq='D')

In [15]: rng / np.timedelta64('NaT')
Out[15]: Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], dtype='float64')

In [16]: rng / pd.NaT
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-0bf699fa1965> in <module>
----> 1 rng / pd.NaT

~/sandbox/pandas-alt/pandas/core/indexes/timedeltas.py in __truediv__(self, other)
    259             # TimedeltaArray defers, so we need to unwrap
    260             oth = other._values
--> 261         result = TimedeltaArray.__truediv__(self, oth)
    262         return wrap_arithmetic_op(self, other, result)
    263

~/sandbox/pandas-alt/pandas/core/arrays/timedeltas.py in __truediv__(self, other)
    364         elif lib.is_scalar(other):
    365             # assume it is numeric
--> 366             result = self._data / other
    367             freq = None
    368             if self.freq is not None:

TypeError: ufunc true_divide cannot use operands with types dtype('<m8[ns]') and dtype('O')
@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Dec 11, 2018

Is the difference in behavior between TimedeltaIndex / timedelta64('NaT') and TimedeltaIndex / pd.NaT deliberate?

Yes. timedelta64("NaT") is unambiguously timedelta-like, whereas pd.NaT is both datetime-like and timedelta-like (and defaults to datetime-like).

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.