Dividing None Series with Timedelta fails with pandas 1.0.1 #31869

Closed
pquentin opened this issue Feb 11, 2020 · 10 comments
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version

Comments

@pquentin

Code Sample

import datetime
import pandas as pd

s1 = pd.Series([datetime.date(2020, 1, 1)])
s2 = pd.Series([None])
print((s1 - s2) / pd.Timedelta(days=1))

Problem description

With pandas 1.0.0, this was working fine and putting NaN in the Series.

With pandas 1.0.1, I get the following traceback:

Traceback (most recent call last):
  File "t.py", line 6, in <module>
    print((s1 - s2) / pd.Timedelta(days=1))
  File ".../lib/python3.6/site-packages/pandas/core/ops/common.py", line 64, in new_method
    return method(self, other)
  File ".../lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 500, in wrapper
    result = arithmetic_op(lvalues, rvalues, op, str_rep)
  File ".../lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 193, in arithmetic_op
    res_values = dispatch_to_extension_op(op, lvalues, rvalues)
  File ".../lib/python3.6/site-packages/pandas/core/ops/dispatch.py", line 125, in dispatch_to_extension_op
    res_values = op(left, right)
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1397, in pandas._libs.tslibs.timedeltas.Timedelta.__rtruediv__
numpy.core._exceptions.UFuncTypeError: ufunc 'true_divide' cannot use operands with types dtype('O') and dtype('<m8[ns]')

I ran git bisect, and the new behavior was introduced in a8aff6c.

What is the correct behavior?

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Darwin
OS-release : 19.3.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 40.6.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Feb 11, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.0.2 milestone Feb 11, 2020
@jorisvandenbossche
Member

@pquentin Thanks for the report!

cc @jbrockmendel

@jbrockmendel
Member

Thanks, I'll try to patch this in the next couple days.

@jbrockmendel jbrockmendel self-assigned this Feb 11, 2020
@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Feb 11, 2020
@jbrockmendel
Member

So there are a couple of tricky issues here.

  1. what we're getting for s1 - s2 is weird. datetime.date(2020, 1, 1) - None raises, so it seems sketchy that we're giving back Series([np.nan], dtype=object)
  2. s2 / pd.Timedelta(days=1) is wrong and should be fixed
  3. but np.nan / pd.Timedelta(days=1) should raise.

@pquentin is the problem with (s1 - s2) / Timedelta or s2 / Timedelta? The title suggests the latter.
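
To illustrate the first point in the list above, here is a minimal sketch using only the standard library. The helper name `raises_type_error` is hypothetical and exists only for this illustration; it shows that the scalar operation has no defined result in plain Python, which is why an elementwise result of NaN looks questionable.

```python
import datetime


def raises_type_error(op):
    """Return True if calling op() raises TypeError, else False."""
    try:
        op()
    except TypeError:
        return True
    return False


# Plain Python defines no subtraction between a date and None.
date_minus_none = raises_type_error(lambda: datetime.date(2020, 1, 1) - None)
print(date_minus_none)  # True
```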

@pquentin
Author

@jbrockmendel Our code is doing (s1 - s2) / Timedelta just like in my sample code, but it's the division that appears to fail, hence the title. Does this answer your question?

We'd be happy to update our code if what we're doing should never have been supported in the first place, however.

@jbrockmendel
Member

> We'd be happy to update our code if what we're doing should never have been supported in the first place, however.

I'm working on a patch that will fix s2 / td, but it won't fix (s1 - s2) / td, which AFAICT is correct. In terms of updating on your end, the question is what you're trying to do with s1 - s2, since it isn't obvious what that means.

@pquentin
Author

Ah yes I understand your question now. So s1 contains start dates, s2 contains end dates, and we encode "no end date" using None. We can add a condition to use different code in that case.

Thanks!

@jbrockmendel
Member

> encode "no end date" using None

pandas support for datetime.date is limited; is using datetime.datetime viable?

@pquentin
Author

Sure, but how would that help? Ultimately we want to compute the duration in days, here's the actual code:

df["duration_days"] = (df.end_date - df.start_date) / pd.Timedelta(days=1)

When end_date is None, duration_days should be None or NaN. I think we now agree that it does not make a lot of sense to support this in pandas, so we'll add a workaround on our side (not sure what it's going to be yet). Closing, thanks a lot.

@jbrockmendel
Member

> When end_date is None

If you use datetime objects, then pandas will cast None to pd.NaT (Not-A-Time), i.e. a datetime analogue to NaN. From there, everything should Just Work.
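
A minimal sketch of that suggestion, assuming the dates are stored as datetime.datetime objects with None marking a missing end date. The variable names and sample values are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical sample data: the second row has no end date.
start = pd.Series([datetime.datetime(2020, 1, 1), datetime.datetime(2020, 1, 1)])
end = pd.Series([datetime.datetime(2020, 1, 11), None])

# With datetime.datetime values, pandas infers a datetime64 dtype and
# stores the None entry as pd.NaT.
print(end.isna().tolist())  # [False, True]

# Subtraction yields timedeltas (NaT where the end date is missing), and
# dividing by a one-day Timedelta yields floats (NaN where missing).
duration_days = (end - start) / pd.Timedelta(days=1)
print(duration_days.tolist())  # [10.0, nan]
```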

@pquentin
Author

pquentin commented Mar 6, 2020

Thanks! I ended up using pd.Timestamp, it works well 👍

duration = df.end_date.apply(pd.Timestamp) - df.start_date.apply(pd.Timestamp)
df["duration_days"] = duration / pd.Timedelta(days=1)
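
As a possible variation on the snippet above, the conversion can be done per column with pd.to_datetime instead of applying pd.Timestamp to each element. This is only a sketch: the DataFrame contents are invented, and it assumes the columns hold datetime.date objects with None for a missing end date.

```python
import datetime

import pandas as pd

# Hypothetical data: datetime.date values, None meaning "no end date".
df = pd.DataFrame(
    {
        "start_date": [datetime.date(2020, 1, 1), datetime.date(2020, 1, 1)],
        "end_date": [datetime.date(2020, 1, 4), None],
    }
)

# Convert each column to a datetime64 dtype; None becomes pd.NaT.
start = pd.to_datetime(df["start_date"])
end = pd.to_datetime(df["end_date"])

# Duration in days as floats; rows without an end date come out as NaN.
df["duration_days"] = (end - start) / pd.Timedelta(days=1)
print(df["duration_days"].tolist())  # [3.0, nan]
```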
