Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Truncated "Y" conversion with to_timedelta #14024

Closed
agartland opened this issue Aug 17, 2016 · 10 comments
Closed

BUG: Truncated "Y" conversion with to_timedelta #14024

agartland opened this issue Aug 17, 2016 · 10 comments
Labels
API Design Bug Deprecate Functionality to remove in pandas Timedelta Timedelta data type

Comments

@agartland
Copy link
Contributor

When converting a fractional year to a Timedelta object the float is truncated:

print pd.to_timedelta(1., unit='Y')
365 days 05:49:12

print pd.to_timedelta(1.5, unit='Y')
365 days 05:49:12

print pd.to_timedelta(0.5, unit='Y')
0 days 00:00:00

print pd.to_timedelta(0.9, unit='Y')
0 days 00:00:00

Using a float with the hour unit is OK:

print pd.to_timedelta(0.9, unit='H')
0 days 00:54:00

print pd.to_timedelta(1., unit='H')
0 days 01:00:00

Pandas version:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 21.2.1
Cython: 0.20.1
numpy: 1.11.0
scipy: 0.16.0
statsmodels: 0.7.0.dev-51faa1a
xarray: None
IPython: 4.2.0
sphinx: 1.2.2
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.28.0
pandas_datareader: None
@jreback
Copy link
Contributor

jreback commented Aug 17, 2016

So we pass to numpy for this conversion (then pandas converts to the proper units). It doesn't allow floats, so its converted to integer. So should either:

  • raise that this conversion is not allowed
  • do the conversion entirely in pandas (e.g. Y == 365 1/4 days I think), though numpy is more exact. This is a fishy calculation anyhow, as a Timedelta is a fixed unit (its not w/respect to a calendar). Someone would have to look at exactly what they are doing
  • remove support for these non-fixed conversions (e.g. Y, M)
In [1]: np.timedelta64(1,unit='Y')
Out[1]: numpy.timedelta64(1)

In [2]: np.timedelta64(1.5,unit='Y')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-9d5b43137138> in <module>()
----> 1 np.timedelta64(1.5,unit='Y')

ValueError: Could not convert object to NumPy timedelta

@jreback
Copy link
Contributor

jreback commented Aug 17, 2016

see around here. I think look into the 2nd point, see docs here and/or source code and see what we can do.

@jreback jreback added this to the Next Major Release milestone Aug 17, 2016
@jreback
Copy link
Contributor

jreback commented Aug 17, 2016

cc @jorisvandenbossche

@agartland
Copy link
Contributor Author

Thanks @jreback
I can see why conversion of a non-fixed unit is a little weird.
I would propose that pandas should maintain consistency with numpy which means raising a ValueError for non int arguments with units Y or M.
I could see other arguments though.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Aug 17, 2016

I actually was of the impression we disallowed this, but apparantly, it's in to_datetime that such units are not allowed, but in to_timedelta they are.

In [10]: pd.to_datetime(1, unit='M')
...
ValueError: cannot cast unit M

I personally would remove support for those units (above 'D'), as it is rather garbage that is produced (if people expect that it gives exactly eg the month they want, but not any month is 30 days and 10 hours ...).

Not sure how much that would break code (how many people are relying on this behaviour?)

@jreback
Copy link
Contributor

jreback commented Aug 17, 2016

hmm ok let's deprecate it then?
and maybe raise on floats

@jorisvandenbossche
Copy link
Member

Yes, deprecating first is a good idea.
If we deprecate, I would also just deprecate the floats, it will be removed later on anyhow (and for other units floats work correctly)

@agartland
Copy link
Contributor Author

I think raising on floats for Y and M is important since the current version is quietly truncating values, which is potentially dangerous behavior.

@jreback jreback added the Deprecate Functionality to remove in pandas label Aug 18, 2016
@jreback
Copy link
Contributor

jreback commented Aug 18, 2016

ok let's deprecate Y, M. W (even though weeks are 7 days, easier to just have D as the highest values). For both floats & integers.

@jorisvandenbossche
Copy link
Member

Closing this in favor of #16344

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Bug Deprecate Functionality to remove in pandas Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

3 participants