Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Timedelta * Series -> TypeError #8813

Closed
shoyer opened this issue Nov 14, 2014 · 17 comments · Fixed by #8884
Closed

BUG: Timedelta * Series -> TypeError #8813

shoyer opened this issue Nov 14, 2014 · 17 comments · Fixed by #8884
Labels
Bug Timedelta Timedelta data type
Milestone

Comments

@shoyer
Copy link
Member

shoyer commented Nov 14, 2014

I encountered this today when rerunning a script that worked prior to the 0.15 arrival of Timedelta:

In [2]: pd.to_timedelta(10, unit='D') * pd.Series(range(10))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-a5e3bc3be480> in <module>()
----> 1 pd.to_timedelta(10, unit='D') * pd.Series(range(10))

/Users/shoyer/dev/pandas/pandas/tslib.so in pandas.tslib.Timedelta.__mul__ (pandas/tslib.c:31355)()

TypeError: cannot multiply a Timedelta with <class 'pandas.core.series.Series'>

@jreback I think there might be another issue for this already?

@aimboden
Copy link

Note that reversing the order does work:

pd.Series(range(10))*pd.to_timedelta(10, unit='D')
Out[158]: 
0    0 days
1   10 days
2   20 days
3   30 days
4   40 days
5   50 days
6   60 days
7   70 days
8   80 days
9   90 days
dtype: timedelta64[ns]

@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

I recall that their was no easy way to fix this
(though will look again)
because TimeDelta is a scalar and Python thinks for some reason doesn't call rmul

@shoyer
Copy link
Member Author

shoyer commented Nov 14, 2014

@jreback I can also take a look, I've written a lot of classes that do arithmetic at this point. I'm pretty sure this is fixable.

@jorisvandenbossche jorisvandenbossche added this to the 0.15.2 milestone Nov 14, 2014
@jorisvandenbossche jorisvandenbossche added the Timedelta Timedelta data type label Nov 14, 2014
@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

oh, this is a buggie. It is intercepted in cython. I don't allow any multiplication of a scalar timedelta with other objects (nothing makes sense). Except for the other way around (e.g. Series/Timedelta) if its an integer Series/Index it makes sense to allow multiple and does (so this is the bug).

I think division is right, but maybe check that too. E.g. you can only divide a Timedelta (object/series/index) by a Timedelta object (though I think allow integer division too).

@jreback jreback added the Bug label Nov 14, 2014
@immerrr
Copy link
Contributor

immerrr commented Nov 14, 2014

I don't allow any multiplication of a scalar timedelta with other objects (nothing makes sense)

This doesn't feel right.

I mean, set of possible multiplicand types is inherently open, you cannot realistically say that nothing in the infinite set of possibilities makes sense. Which is why python requires NotImplemented to be returned from binary operators so that the second operand has a shot at performing a reflected operation.

@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

sorry, misspoke, numpy array/index/series of integers works. (other return a TypeError). could certainly be NotImplemetedError though).

@immerrr not much actually makes sense. Maybe you have something that does that was missed?

@immerrr
Copy link
Contributor

immerrr commented Nov 14, 2014

First, it's not raise NotImplementedError, it's return NotImplemented: https://docs.python.org/2/reference/datamodel.html#object.__radd__. NotImplemented itself is documented poorly, but hopefully this will be fixed soon there's an issue that has landed recently that changed the doc to unambiguously state the following :

If one of those methods does not support the operation with the supplied arguments, it should return NotImplemented.

Restricting the set of operand types in python's world of duck typing doesn't make sense to me. For the very least think of all the possible types of ndarray-like containers that don't pass the isinstance(ndarray) check, e.g. scipy sparse matrices or xray.DataArray.

@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

@immerrr
of course restricting makes sense. If only to preserve user sanity.
You don't want to constantly gave to check for NotImplemented types. Its just counter-productive.

xray can feel free to handle any cases it wants. but 99% of users want to be informed that
pd.Timedelta(....)*pd.Timedelta(..) just does not make sense

@immerrr
Copy link
Contributor

immerrr commented Nov 14, 2014

You don't want to constantly gave to check for NotImplemented types. Its just counter-productive.

Could you rephrase that? I suppose you mean that you'll have to check for NotImplemented in user-level code, it's not true: python interpreter will raise a TypeError on its own if both a.__mul__(b) and b.__rmul__(a) return NotImplemented.

xray can feel free to handle any cases it wants. but 99% of users want to be informed that
pd.Timedelta(....)*pd.Timedelta(..) just does not make sense

It is exactly my point that it won't be able to do that. timedelta.__mul__ raising a TypeError preemptively won't give dataarray.__rmul__ a chance to handle the operation.

@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

ok, i'll rephrase. What I mean is for 99% of the users a TypeError is the expectation for an operation that is not 'right' and NOT NotImplementedError returned / raise, which means that it just has not been done. TypeError means this does not make sense (in pandas).

If xray wants to implement then they can do it (though I suspect since its much easier to simply push this upstream, e.g. to get @shoyer to convince us that its a valid operation and support it mainstream in pandas). E.g. Interval

this is much easier IMHO that e.g. pushing upstream to numpy (TL;DR) the other discussions that we have had.

@immerrr
Copy link
Contributor

immerrr commented Nov 14, 2014

ok, i'll rephrase. What I mean is for 99% of the users a TypeError is the expectation for an operation that is not 'right' and NOT NotImplementedError returned / raise, which means that it just has not been done. TypeError means this does not make sense (in pandas).

  1. There may be (probably, are already) many projects that implement their own ndarray-like containers.
  2. There's a non-formalized "convincing" operation which may or may not succeed or in which a certain project maintainer may or may not be interested.
  3. It's not NotImplementedError, it's a singleton object called NotImplemented, and it doesn't mean "it just has not been done", it means, according to the doc, that it "does not support the operation with the supplied arguments", for whatever reason. E.g. ints/datetimes do that:
In [32]: d = datetime.datetime.now()

In [33]: x = 1

In [34]: d.__add__(x)
Out[34]: NotImplemented

In [35]: x.__add__(d)
Out[35]: NotImplemented

In [36]: d + x
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-aa9ad8fe26a5> in <module>()
----> 1 d + x

TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'int'

Surprisingly, though, strings do raise a TypeError. BUT they also call __radd__ if it's present on the other object:

In [37]: 'foobar'.__add__(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-6c317551e6f0> in <module>()
----> 1 'foobar'.__add__(x)

TypeError: cannot concatenate 'str' and 'int' objects

In [38]: class Foo(object):
   ....:     def __radd__(self, other):
   ....:         return 'xyzzy'
   ....:     

In [39]: 'foobar'.__add__(Foo())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-d607980902de> in <module>()
----> 1 'foobar'.__add__(Foo())

TypeError: cannot concatenate 'str' and 'Foo' objects

In [40]: 'foobar' + Foo()
Out[40]: 'xyzzy'

@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

I guess i disagree with this spec a bit. pandas just tries to do the right thing and not raise too many TypeErrors/NotImplemented.

@immerrr
Copy link
Contributor

immerrr commented Nov 14, 2014

I'm trying to point out that python allows the other object to have a say in what's allowed or not, too. Raising TypeErrors doesn't.

You don't agree to play by that rules? Fine, I rest my case.

@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

@immerrr are we really disagreeing with whether to raise TypeError or return NotImplemented in an obvious case?

@shoyer
Copy link
Member Author

shoyer commented Nov 14, 2014

I can dive into this more later, but in brief, I agree with @immerrr that the appropriate thing to do with an invalid operation is always to always return NotImplemented rather than to explicitly raise TypeError. That said, Timedelta should use duck typing for dtype == int rather than isinstance checks.

On Fri, Nov 14, 2014 at 8:24 AM, jreback notifications@github.com wrote:

@immerrr are we really disagreeing with whether to raise TypeError or return NotImplemented in an obvious case?

Reply to this email directly or view it on GitHub:
#8813 (comment)

@jreback
Copy link
Contributor

jreback commented Nov 14, 2014

ok, will leave to @shoyer @immerrr for a pull-request then. I prefer more explicit checking here, othewise what's the point of a TypeError.

@immerrr
Copy link
Contributor

immerrr commented Nov 15, 2014

are we really disagreeing with whether to raise TypeError or return NotImplemented in an obvious case?

Kind of, yes :)

I'm trying to show that the obvious solution (allow series/index only) violates open-closed principle and may force you (us) to get back to this issue in future (to add support for potentially infinite number of types). This particular case is, of course, trivial to fix in either way. However the idea of allowing the second operand to handle the operation really applies to all of the binary operators and may be beneficial for other places in the code, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants