BUG: fix raise of TypeError when subtracting timedelta array #22054

Merged
merged 1 commit into from
Sep 18, 2018

illegalnumbers
Contributor

closes #21980

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

# raise rathering than letting numpy return wrong answer
return NotImplemented
return op(self.to_timedelta64(), other)
try:
    converted_other = other.astype('datetime64[ns]')

Member

Converting is wrong here, since it could be mixed, eg [Timestamp, Timedelta].

Contributor Author

The mixed test I added passes but I'm guessing this case is why I'm getting problems in other tests? Is there another approach I should take?

Contributor Author

Ah there was a bug in my test. Still looking around but definitely curious on another approach. I was thinking I could iterate over the entire array and do piecemeal conversions but that also seems wrong.

@@ -929,7 +933,7 @@ cdef class _Timedelta(timedelta):
    def nanoseconds(self):
        """
        Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.

Member

Nice. Want to add a line to lint.sh that checks for trailing white space in cython files?

Contributor Author

Sure!

@illegalnumbers
Contributor Author

Actually I'm a little confused now, @jbrockmendel. One of my failing tests is test_ops_series_object (GH #13043). Given the functionality I'm changing, should the expectation there change? Should the dtype now actually be <M8[ns] when you do an addition, rather than a O since we do the conversion properly now?

@jbrockmendel
Member

rather than a O since we do the conversion properly now?

The thing is that the line converted_other = other.astype('datetime64[ns]') doesn't belong at all. Among other things the case to test is other = np.array([pd.Timestamp.now(), pd.Timedelta('1D')]). That has an object-dtype, __radd__ and __rsub__ should both be valid, and the astype call would fail.

@illegalnumbers
Contributor Author

Yeah, adding the test for other = np.array([pd.Timestamp.now(), pd.Timedelta('1D')]) exposes the failure you mentioned. I can remove that line, but moving forward I'm not sure how to implement that since the O type object fails to apply the lambda since Numpy just errors on the application of the lambda.

@jbrockmendel
Member

not sure how to implement that since the O type object fails to apply the lambda since Numpy just errors on the application of the lambda.

I'm not sure what lambda you're referring to, but I imagine this will end up looking something like:

if other.dtype.kind in ['m', 'M']:
    [do what it does now]
elif other.dtype.kind == 'O':
    return np.array([op(self, x) for x in other])
raise TypeError(...)
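
The branching suggested above can be tried outside pandas. The following is only a minimal sketch, not the code that was merged: `apply_timedelta_op` is a hypothetical helper, and a standard-library `datetime.timedelta` stands in for `pd.Timedelta` so the example depends on NumPy alone.

```python
import operator
from datetime import datetime, timedelta

import numpy as np


def apply_timedelta_op(td, other, op):
    """Hypothetical stand-in for the array branch of a Timedelta binary op.

    td is a datetime.timedelta (standing in for pd.Timedelta) and
    other is the NumPy array on the other side of the operator.
    """
    if other.dtype.kind in ("m", "M"):
        # timedelta64 / datetime64 arrays: let NumPy do the vectorised work.
        return op(np.timedelta64(td), other)
    elif other.dtype.kind == "O":
        # Object arrays may mix types, so apply the op element by element.
        return np.array([op(td, x) for x in other], dtype=object)
    raise TypeError(f"unsupported array dtype: {other.dtype}")


one_day = timedelta(days=1)
mixed = np.array([datetime(2013, 1, 1, 9, 1), timedelta(days=2)], dtype=object)
dates = np.array(["2013-01-01"], dtype="datetime64[ns]")

mixed_result = apply_timedelta_op(one_day, mixed, operator.add)
dates_result = apply_timedelta_op(one_day, dates, operator.add)
```

The object branch is slower than the vectorised one, but it is the only branch that can handle an array holding both datetime-like and timedelta-like elements.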

@illegalnumbers
Contributor Author

illegalnumbers commented Jul 25, 2018

Oh dang nice! I was just working through something similar in a REPL. I'll push up a revision in the next few minutes. Thanks for the help! EDIT: The lambda I was referring to was the op function that gets passed in.

@illegalnumbers
Contributor Author

illegalnumbers commented Jul 26, 2018

Ok so I'm not sure why my build wouldn't output anything for the tests that did fail on TravisCI. @jbrockmendel is there a way I can retrigger Travis? Is that known to be flaky? Or should I look into something else? I appreciate the help.

@jbrockmendel
Member

Travis error looks unrelated. When this happens I usually find some typo somewhere to fix and make a dummy commit to force it to re-run.

ci/lint.sh Outdated
@@ -49,6 +49,11 @@ if [ "$LINT" ]; then
if [ $? -ne "0" ]; then
    RET=1
fi

if [[ -n $(find **/*.pyx -type f -exec egrep -l " +$" {} \;) ]]

Member

Will this print the list of offending files? Can you also include .pxd, .pxi, and .pxi.in? Thanks for stepping up on this, linting these files is really tough.

Contributor Author

Yeah, no worries! I don't think this test alone will display them as output, but I can include the same find / grep dance under it or save it as a variable and echo it.

Member

but I can include the same find / grep dance under it or save it as a variable and echo it.

Great. If something breaks, we definitely want to know where to look for it.
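
For readers who want to try the idea without the shell one-liner, here is a rough Python sketch of the same check: find Cython-family files with trailing spaces and report which ones offend. The function name `find_trailing_whitespace` and the demo file names are made up for illustration; this is not what ci/lint.sh does.

```python
import re
import tempfile
from pathlib import Path

# Suffixes mentioned in the review; adjust as needed.
CYTHON_SUFFIXES = (".pyx", ".pxd", ".pxi", ".pxi.in")
# Same pattern as the egrep call in the diff: one or more spaces at line end.
TRAILING_WS = re.compile(r" +$")


def find_trailing_whitespace(root, suffixes=CYTHON_SUFFIXES):
    """Return sorted relative paths of files that contain trailing spaces."""
    root = Path(root)
    offenders = []
    for path in root.rglob("*"):
        if not path.is_file() or not path.name.endswith(suffixes):
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        if any(TRAILING_WS.search(line) for line in lines):
            offenders.append(path.relative_to(root).as_posix())
    return sorted(offenders)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "clean.pyx").write_text("cdef int x = 1\n", encoding="utf-8")
    Path(tmp, "dirty.pxd").write_text("cdef int y   \n", encoding="utf-8")
    Path(tmp, "ignored.py").write_text("z = 1   \n", encoding="utf-8")
    offenders = find_trailing_whitespace(tmp)

for name in offenders:
    print(name)
```

Returning the list of offending paths, rather than just a pass/fail flag, addresses the point raised above: when the lint step fails, the output says where to look.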

@@ -441,7 +441,7 @@ Datetimelike
Timedelta
^^^^^^^^^

-
- Fixed bug where array of timestamp and deltas raised a TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'Timedelta' (:issue:`21980`)

Member

Take a look at how quotation marks and backticks are used elsewhere in this file (iOS keyboard doesn’t make it easy to give an example directly)

Contributor Author

I'll take a look!

@@ -1012,3 +1012,22 @@ def test_get_level_values_box(self):
index = MultiIndex(levels=levels, labels=labels)

assert isinstance(index.get_level_values(0)[0], Timestamp)

def test_diff_sub_timedelta(self):

Member

These tests go in tests/scalar/timedelta/test_arithmetic.py

Contributor Author

Will move!

res = arr - pd.Timedelta('1D')
tm.assert_numpy_array_equal(res, exp)

def test_diff_sub_timedelta_mixed(self):

Member

This PR probably also fixes addition right? If so, pls include a test. Possibly also for reversed ops?

Contributor Author

It should and I was thinking about reversed this morning actually so I'll include both today.

@illegalnumbers
Contributor Author

illegalnumbers commented Jul 26, 2018

@jbrockmendel hmm upon further investigation this AM it seems that when doing the reverse subtract ie

        arr = np.array([Timestamp('20130101 9:01'),
                        Timestamp('20121230 9:02')])
        exp = np.array([Timestamp('20121231 9:01'),
                        Timestamp('20121229 9:02')])
        res = pd.Timedelta('1D') - arr
        tm.assert_numpy_array_equal(res, exp)

I get TypeError: descriptor '__sub__' requires a 'datetime.datetime' object but received a 'Timedelta' in pandas/_libs/tslibs/timestamps.pyx:330. Still investigating but it seems like I might have to do a bigger change than I thought.

@jbrockmendel
Member

I get TypeError: descriptor '__sub__' requires a 'datetime.datetime' object but received a 'Timedelta' in pandas/_libs/tslibs/timestamps.pyx:330

The operation being run is Timedelta - Timestamp right? That should raise a TypeError, you're OK.

@illegalnumbers
Contributor Author

Hmm actually, it appears that rsub doesn't work originally with Timedelta and Timestamps...maybe this is an existing issue?

>>> pd.Timedelta('1d') - Timestamp('20130101 9:01')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 332, in pandas._libs.tslibs.timestamps._Timestamp.__sub__
TypeError: descriptor '__sub__' requires a 'datetime.datetime' object but received a 'Timedelta'
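
The standard library shows the same asymmetry, which makes a quick sanity check for the expectation discussed here. The sketch below uses plain `datetime` objects rather than pandas types, so it illustrates the semantics only and does not reproduce the pandas traceback.

```python
from datetime import datetime, timedelta

stamp = datetime(2013, 1, 1, 9, 1)
one_day = timedelta(days=1)

# datetime - timedelta is defined and yields an earlier datetime.
earlier = stamp - one_day

# timedelta - datetime has no sensible meaning, so Python raises TypeError.
try:
    one_day - stamp
except TypeError as exc:
    reversed_error = exc
else:
    reversed_error = None
```

A test for the reversed operation therefore only needs to assert that a `TypeError` is raised, not to compare against an expected array.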

@illegalnumbers
Contributor Author

illegalnumbers commented Jul 26, 2018

Ok so in that case I'll just check that these raise appropriately then.

Thanks for all the help!

@pep8speaks

pep8speaks commented Jul 26, 2018

Hello @illegalnumbers! Thanks for updating the PR.

Comment last updated on September 06, 2018 at 04:04 Hours UTC

@codecov

codecov bot commented Jul 26, 2018

Codecov Report

Merging #22054 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #22054      +/-   ##
==========================================
+ Coverage   92.04%   92.05%   +<.01%     
==========================================
  Files         169      169              
  Lines       50782    50782              
==========================================
+ Hits        46744    46745       +1     
+ Misses       4038     4037       -1
Flag        Coverage Δ
#multiple   90.46% <ø> (ø) ⬆️
#single     42.26% <ø> (ø) ⬆️

Impacted Files       Coverage Δ
pandas/core/ops.py   97.04% <0%> (+0.14%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 46abe18...20c93d2. Read the comment docs.

@illegalnumbers illegalnumbers force-pushed the GH-21980 branch 2 times, most recently from 0208713 to bbf1a06 Compare July 26, 2018 20:42
@illegalnumbers
Contributor Author

@jbrockmendel lemme know if there's anything else I need to do to get this guy merged! :)

@@ -441,7 +441,7 @@ Datetimelike
Timedelta
^^^^^^^^^

-
- Fixed bug where array of timestamp and deltas raised a TypeError: unsupported operand type(s) for -: ``numpy.ndarray`` and ``Timedelta`` (:issue:`21980`)

Member

timestamp --> :class:`Timestamp`
double back ticks around TypeError
don't need the full error message

Can you clarify what "array of timestamp and deltas" means? e.g.

Bug where subtracting :class:`Timedelta` from an object-dtyped array would raise ``TypeError``

Contributor Author

Done.

ci/lint.sh Outdated
then
    RET=1
    echo $trailing_space_pxd
fi

Member

This looks great, thanks. Any other ideas you have for linting cython files will be a big hit with the maintainers (separate PR(s))

Contributor Author

Sounds great! I am not sure if I'll have much time in the next little bit after this gets merged but I'll do my best.

Contributor Author

Gonna move this out, see other comments.

@@ -616,3 +690,35 @@ def test_rdivmod_invalid(self):

with pytest.raises(TypeError):
    divmod(np.array([22, 24]), td)

def test_td_div_timedelta_timedeltalike_array(self):

Member

Would a valid case here be object-dtyped array containing all Timedelta objects (or some mix of timedelta, np.timedelta64)?

Contributor Author

Probably a good call to do a mix.

Contributor Author

illegalnumbers commented Aug 17, 2018

Done. (see below this test)

with pytest.raises(TypeError):
    pd.Timedelta('1D') * arr

def test_td_rmult_timedelta_mixed_timedeltalike_array(self):

Member

Some of these you can probably de-duplicate using pytest.mark.parametrize. Not a deal-breaker.

If there's a graceful way to work "object_dtype" into the test names that'd be ideal.

Contributor Author

It might be a lot more effort to get this de-duped than I initially thought, would it be ok to submit as-is? I added in object_dtype to the method titles.

Contributor

pls parameterize in this PR.

Member

@illegalnumbers the idea here is to notice that all four of these tests are special cases of a single test function. In particular, you can write arr * pd.Timedelta('1D') as operator.mul(arr, pd.Timedelta('1D')) and similarly for the others with operator.truediv, ops.rdiv, ops.rmul. Then these can be parametrized as:

@pytest.mark.parametrize('op', [operator.mul, ops.rmul, operator.truediv, ops.rdiv])
def test_td_mul_object_array(self, op):
    arr = np.array([pd.Timestamp.now(), pd.Timedelta('1D')])

    with pytest.raises(TypeError):
        op(arr, pd.Timedelta('1D'))
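
The same "one test body, many operators" idea can be seen without pytest or pandas. In the sketch below, `rmul`, `rtruediv`, and `raises_type_error` are hypothetical helpers defined on the spot (they are not the `ops.rmul` / `ops.rdiv` functions named above), and standard-library `datetime` objects stand in for the pandas scalars.

```python
import operator
from datetime import datetime, timedelta


def rmul(left, right):
    # Reversed multiplication: right * left.
    return right * left


def rtruediv(left, right):
    # Reversed true division: right / left.
    return right / left


OPS = [operator.mul, rmul, operator.truediv, rtruediv]


def raises_type_error(op, left, right):
    """Return True if op(left, right) raises TypeError."""
    try:
        op(left, right)
    except TypeError:
        return True
    return False


stamp = datetime(2013, 1, 1, 9, 1)
one_day = timedelta(days=1)

# One loop replaces four near-identical test bodies.
results = {op.__name__: raises_type_error(op, stamp, one_day) for op in OPS}
```

With pytest, the loop becomes the `parametrize` decorator and each operator is reported as its own test case, which makes a failure easier to locate.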

@illegalnumbers
Contributor Author

Gonna see if I can finish this up today!

@illegalnumbers
Contributor Author

Took a little while extra but it should be ready for review again. Barring any lint failures of course.

@illegalnumbers
Contributor Author

illegalnumbers commented Sep 5, 2018

I'm not sure if these are related to my ps? Kind of hard to read the build output.

EDIT: Seems like most are in the excel writer?

@jbrockmendel
Member

EDIT: Seems like most are in the excel writer?

Yes. I think this is fixed now. If you rebase/push it should go through alright. I'll go over this one more time today, after which jreback will be called in for the final OK.

@illegalnumbers
Contributor Author

Done! Like I said on the parameterize, I did the best I could without getting stuck. There were a few tests that seemed really similar but I kept getting exceptions when I tried to parameterize them so it seemed like more work than it was worth considering my experience.

@illegalnumbers
Contributor Author

@jbrockmendel I think this should be good again!

@jorisvandenbossche jorisvandenbossche changed the title fix raise of TypeError when subtracting timedelta array BUG: fix raise of TypeError when subtracting timedelta array Sep 7, 2018
if other.dtype.kind in ['m', 'M']:
    return op(self.to_timedelta64(), other)
elif other.dtype.kind == 'O':
    return np.array([op(self, x) for x in other])

Member

@jbrockmendel I am a bit confused here: why is simply returning NotImplemented not sufficient?
(I tested it, and that doesn't seem to work. With a datetime.timedelta it does work, though, and that one does return NotImplemented.)

Member

The pd.Timedelta version fails because both arr.__add__(td) and td.__radd__(arr) return NotImplemented. arr.__add__(td.to_pytimedelta()) returns OK, so presumably it is something on the numpy implementation.

Member

Was playing a bit with it, and I think Timedelta behaves differently from datetime.timedelta because of the __array_priority__ we add to the _Timedelta class (with it commented out, the simple example works), which I suppose is needed to get other behaviors working.
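
The effect described here can be reproduced with a toy class. This is only a sketch of the mechanism: `PlainOffset` and `PriorityOffset` are hypothetical classes, not pandas code. Both return `NotImplemented` when handed a whole array, and they differ only in whether `__array_priority__` is set.

```python
import numpy as np


class PlainOffset:
    """No __array_priority__: NumPy handles `arr + obj` itself."""

    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        if isinstance(other, np.ndarray):
            return NotImplemented
        return other + self.value


class PriorityOffset(PlainOffset):
    """Same behaviour, but asks NumPy to defer to its reflected methods."""

    __array_priority__ = 100


arr = np.array([1, 2], dtype=object)

# NumPy does not defer, so it applies the add element by element and each
# element falls back to PlainOffset.__radd__ with a scalar.
plain_result = arr + PlainOffset(10)

# NumPy defers, Python calls PriorityOffset.__radd__(arr), which also
# returns NotImplemented, so the whole operation raises TypeError.
try:
    arr + PriorityOffset(10)
except TypeError as exc:
    priority_error = exc
else:
    priority_error = None
```

Once a class asks NumPy to defer, its reflected method has to handle arrays itself. That is consistent with the explicit object-dtype branch added in this PR, as opposed to simply returning `NotImplemented`.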

Contributor

@jorisvandenbossche ok otherwise this PR looks ok, are you suggesting that we remove that?

Contributor Author

I'm also confused.

@jbrockmendel
Member

The tests could be parametrized a bit further, but at some point that just becomes equivalent to re-writing the method this PR is implementing.

@jreback LGTM.

@illegalnumbers
Contributor Author

I'm super happy to have contributed! This was fun. Sorry it took so long.

@jreback jreback added this to the 0.24.0 milestone Sep 8, 2018
@jreback jreback merged commit dbb767c into pandas-dev:master Sep 18, 2018
@jreback
Contributor

jreback commented Sep 18, 2018

thanks!

aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
st-bender added a commit to st-bender/pynrlmsise00 that referenced this pull request Jul 3, 2020
The old pandas versions available for Py34 cannot subtract timedeltas
from ndarrays. Subtracting them individually works and was used in the
fix for later pandas versions:
pandas-dev/pandas#22054
Labels
Bug Timedelta Timedelta data type
Development

Successfully merging this pull request may close these issues.

np.ndarray[object] - Timedelta raises
6 participants