
ENH: Timedelta cannot span more than 293 years => implementation limitation #35687

Closed
Tracked by #46587
xuancong84 opened this issue Aug 12, 2020 · 10 comments
Labels
Closing Candidate (May be closeable, needs more eyeballs), Enhancement, ExtensionArray (Extending pandas with custom dtypes or arrays.), Non-Nano (datetime64/timedelta64 with non-nanosecond resolution), Timedelta (Timedelta data type)

Comments

@xuancong84

xuancong84 commented Aug 12, 2020

The pandas datetime arithmetic module is very useful, and it can be used for archeological research as well. Unfortunately, the limitation that a Timedelta object cannot span more than about 292 years is a real shame for this otherwise wonderful data science library. With this limitation, it is not possible to go back more than about 300 years in history, let alone thousands of years.

>>> pd.to_timedelta('1Y')*5
Timedelta('1826 days 05:06:00')
>>> pd.to_timedelta('1Y')*292
Timedelta('106650 days 19:26:24')
>>> pd.to_timedelta('1Y')*293
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1347, in pandas._libs.tslibs.timedeltas.Timedelta.__mul__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1230, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 180, in pandas._libs.tslibs.timedeltas.convert_to_timedelta64
  File "pandas/_libs/tslibs/timedeltas.pyx", line 308, in pandas._libs.tslibs.timedeltas.cast_from_unit
OverflowError: Python int too large to convert to C long

In practice, time resolution and time span are always in tension. Quantum physics often deals with time at the scale of nanoseconds (10**-9 s), picoseconds (10**-12 s), or even femtoseconds (10**-15 s), while archeology often deals with spans of thousands, millions, or even billions of years. If I remember correctly, pandas uses nanoseconds as the base unit of its time counter, so the span is short. The solution that caters for both high resolution and large span is to use a floating-point number, rather than a large integer, as the time counter. Floating point will be slightly slower, but on modern CPUs (e.g. the Intel architecture), floating-point arithmetic is almost as fast as integer arithmetic, especially when SIMD is used.
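The boundary in the session above can be reproduced with exact integer arithmetic. This is a minimal stdlib sketch, assuming (as the `* 5` output implies) that `'1Y'` is parsed as 365.2425 days and that the duration is stored as a signed 64-bit count of nanoseconds. `format_ns` is a hypothetical helper that mimics the `Timedelta` repr only for the non-negative values used here.

```python
INT64_MAX = 2**63 - 1
NS_PER_S = 10**9
# 365.2425 days * 86400 s/day = 31_556_952 s exactly (implied by the outputs above)
YEAR_NS = 31_556_952 * NS_PER_S


def format_ns(ns: int) -> str:
    """Hypothetical helper: render a non-negative ns count as 'D days HH:MM:SS[.fffffffff]'."""
    secs, frac = divmod(ns, NS_PER_S)
    days, secs = divmod(secs, 86_400)
    hours, secs = divmod(secs, 3_600)
    minutes, secs = divmod(secs, 60)
    text = f"{days} days {hours:02d}:{minutes:02d}:{secs:02d}"
    # Append the nanosecond fraction only when it is non-zero
    return text + (f".{frac:09d}" if frac else "")


for n in (5, 292, 293):
    total = n * YEAR_NS
    print(n, format_ns(total) if total <= INT64_MAX else "overflows int64")

# The largest representable ns count, i.e. the ~292-year ceiling
print("max:", format_ns(INT64_MAX))
```

With these assumptions, 5 and 292 years match the session output, 293 years exceeds 2**63 - 1 ns, and the maximum comes out as `106751 days 23:47:16.854775807`.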

@xuancong84 xuancong84 added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Aug 12, 2020
@xuancong84
Author

Criticism is good, but the last sentence goes beyond fair criticism, IMHO... So may I ask you in return why you didn't think of that earlier, considering how far-sighted you are?

I have removed criticism and proposed a solution. Please take a look!

@jreback
Contributor

jreback commented Aug 12, 2020

Timedeltas are backed by an int64, which gives a reasonable tradeoff between resolution (ns) and time range. The same issue applies to Timestamp ranges, and there have been many discussions about it (search the repo).

If someone wants to implement a timedelta64[ms] extension array (and/or [s]) then this limitation can be avoided. This is a non-trivial project and would require some effort.

But to reiterate, I find this limitation for timedeltas really no big deal. What exactly is the use case?
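To put numbers on the resolution-versus-range tradeoff described in this comment, here is a small stdlib calculation of the range a signed int64 tick count covers at each resolution, using the same 365.2425-day year as the session above. It is plain arithmetic and does not call any pandas API.

```python
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 31_556_952  # 365.2425 days

# Ticks per second for each candidate resolution
TICKS_PER_SECOND = {"ns": 10**9, "us": 10**6, "ms": 10**3, "s": 1}


def max_span_years(unit: str) -> float:
    """Largest duration (in years) an int64 tick count can hold at `unit`."""
    return INT64_MAX / (TICKS_PER_SECOND[unit] * SECONDS_PER_YEAR)


for unit in TICKS_PER_SECOND:
    print(f"{unit:>2}: ±{max_span_years(unit):,.0f} years")
```

Each step down in resolution multiplies the span by 1000: roughly ±292 years at ns, ±292 thousand years at us, ±292 million years at ms, and ±292 billion years at s.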

@jreback jreback added the Timedelta (Timedelta data type), Enhancement, and ExtensionArray (Extending pandas with custom dtypes or arrays.) labels and removed the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Aug 12, 2020
@jreback jreback changed the title BUG: timedelta cannot span more than 293 years, Python int too large to convert to C long!!! ENH: timedelta implement limitations Aug 12, 2020
@xuancong84
Author

xuancong84 commented Aug 13, 2020

But to reiterate, I find this limitation for timedeltas really no big deal. What exactly is the use case?

I was doing automated carbon dating at an archeological site: for every item the machine collected, the computer computes its age by measuring the Carbon-13 isotope, so the range can go from a few years to a few thousand years back from now ("now" differs between runs). Most people in the same field still use MS Excel to do the computation manually. Honestly, I do not mind that Timedelta has a limited span, but a span of only 292 years is far too small for much scientific work.

@xuancong84 xuancong84 changed the title ENH: timedelta implement limitations ENH: Timedelta cannot span more than 293 years => implementation limitation Aug 13, 2020
@bashtage
Contributor

IMO a long-term solution would be to move to a default 128-bit (or 96-bit) type, which could then have both very precise resolution and a long time span. This is likely even more challenging than what @jreback suggested. FWIW this is not unprecedented, as MATLAB has gone down this route.
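A quick back-of-the-envelope check of the wider-type idea: the span of a signed nanosecond counter at 64, 96 and 128 bits. This only does arithmetic with Python's arbitrary-precision ints (again assuming a 365.2425-day year); it says nothing about how such a type would actually be implemented in pandas or NumPy.

```python
NS_PER_YEAR = 31_556_952 * 10**9  # 365.2425-day year in nanoseconds


def ns_span_years(bits: int) -> float:
    """Max |duration| in years for a signed `bits`-wide nanosecond counter."""
    return (2 ** (bits - 1) - 1) / NS_PER_YEAR


for bits in (64, 96, 128):
    print(f"{bits:>3}-bit: {ns_span_years(bits):.3e} years")
```

Under these assumptions 96 bits already covers about 1.26 trillion years at nanosecond resolution, comfortably beyond the age of the universe (about 13.8 billion years), and 128 bits goes far further.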

@bashtage
Contributor

But to reiterate, I find this limitation for timedeltas really no big deal. What exactly is the use case?

Astronomy, archeology, and geology are three places where the natural limit will be regularly hit.

@venaturum
Contributor

venaturum commented Feb 22, 2021

I also run into this issue in my package staircase, which provides functionality for working with mathematical step functions.

An example application is representing the number of "active objects" over time, e.g. buses or website users, with a step function.
Integrating the step function (to find the area underneath) gives the total active time, e.g. total bus hours or total time a website was viewed, and the answer is a timedelta. Due to the nature of the integration, it is possible to reach large timedeltas. E.g. if we want the total active time over a year and the average value of the function (i.e. the average number of active objects) is 293 or more, then the result is at least 365 days * 293 = 106945 days, which is beyond the current timedelta limit (just under 106752 days).
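The arithmetic in that example can be checked with a tiny, hypothetical step-function integrator (not staircase's actual API). It keeps the area as an exact nanosecond count in a Python int, which cannot overflow, and then compares the result against the int64 limit a nanosecond Timedelta is stored in.

```python
INT64_MAX = 2**63 - 1
NS_PER_DAY = 86_400 * 10**9


def integrate_step(breakpoints, values):
    """Area under a step function, in value-nanoseconds.

    Hypothetical helper: values[i] holds on [breakpoints[i], breakpoints[i + 1]),
    with breakpoints given as integer nanoseconds in ascending order.
    """
    if len(values) != len(breakpoints) - 1:
        raise ValueError("need exactly one value per interval")
    return sum(v * (t1 - t0) for v, t0, t1 in zip(values, breakpoints, breakpoints[1:]))


year = [0, 365 * NS_PER_DAY]
for avg_active in (292, 293):
    area = integrate_step(year, [avg_active])
    verdict = "fits" if area <= INT64_MAX else "overflows int64"
    print(avg_active, area // NS_PER_DAY, "days", verdict)
```

An average of 292 active objects gives 106580 days, which still fits; 293 gives 106945 days, which does not.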

@jbrockmendel jbrockmendel added the Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) label Jan 17, 2022
@jbrockmendel
Member

In 2.0 we support non-nanosecond Timedeltas, though depending on how you call the constructor you may still need to explicitly cast to the lower resolution:

>>> pd.to_timedelta("365D").as_unit("s") * 10**6
Timedelta('365000000 days 00:00:00')

@xuancong84 does this handle your use case?
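Choosing the unit by hand, as in the `as_unit("s")` call above, amounts to picking a unit whose int64 range still holds the duration. A hypothetical sketch of an automatic rule that picks the finest such unit (the name `finest_unit` and the unit table are illustrative, not a pandas function):

```python
INT64_MAX = 2**63 - 1
# Ordered finest to coarsest: (unit name, ticks per second)
TICKS_PER_SECOND = [("ns", 10**9), ("us", 10**6), ("ms", 10**3), ("s", 1)]


def finest_unit(seconds: int) -> str:
    """Return the finest unit in which |seconds| fits as an int64 tick count."""
    for unit, ticks in TICKS_PER_SECOND:
        if abs(seconds) * ticks <= INT64_MAX:
            return unit
    raise OverflowError("duration does not fit in int64 even at 1 s resolution")


YEAR_S = 31_556_952  # 365.2425 days
print(finest_unit(292 * YEAR_S), finest_unit(293 * YEAR_S), finest_unit(365 * 10**6 * 86_400))
# ns us ms
```

The catch is that the chosen unit also fixes the resolution: a value stored in "ms" cannot represent a 1 ns difference, so any automatic switch trades precision for range.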

@jbrockmendel jbrockmendel added the Closing Candidate (May be closeable, needs more eyeballs) label Mar 24, 2023
@xuancong84
Author

xuancong84 commented Mar 25, 2023

@jbrockmendel Yes, this handles the case. However, it is too mechanical and manual. A more elegant implementation would automatically switch units among ns/us/ms/s depending on the time scale.

That is why I recommend a high-precision floating-point implementation rather than a large integer: the IEEE floating-point format is designed to handle exactly this kind of dynamic scaling. If you don't use floating point, you have to handle it manually, which is troublesome and bug-prone. Of course, the drawback is that when you add two durations whose magnitudes differ by more than the available precision, e.g. 1 nanosecond to 1000 years, the small term is lost to rounding and the sum does not change. But such situations are rare; can anyone think of a case where you need to add a very small duration (such as 1 ns) to a very large one (>1000 years)?

In conclusion, in my opinion the most elegant solution is to switch between a floating-point timedelta and a large-integer timedelta, instead of switching among fixed units such as ns/us/ms/s. Otherwise, what if you want to scale to trillions of years: is a unit of 1 second enough? And what if you want to work at the femtosecond level: can you even aggregate to a few minutes?
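The rounding drawback mentioned in this comment is easy to quantify with IEEE 754 doubles in plain Python (`math.ulp` needs Python 3.9+). This sketch assumes durations are stored as float64 seconds with a 365.2425-day year; it shows that the gap between adjacent representable values grows with magnitude.

```python
import math

SECONDS_PER_YEAR = 31_556_952.0  # 365.2425 days

for years in (1, 1_000, 1_000_000):
    x = years * SECONDS_PER_YEAR
    # math.ulp(x): gap between x and the next larger float64
    print(f"{years:>9,} years: spacing {math.ulp(x):.3e} s")

thousand_years = 1_000 * SECONDS_PER_YEAR
print(thousand_years + 1e-9 == thousand_years)  # True: the 1 ns is rounded away
```

The effect starts well below 1000 years: float64 seconds keep sub-nanosecond spacing only below 2**23 s (about 97 days), and at one year the spacing is already a few nanoseconds.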

@jbrockmendel
Member

You're welcome to implement a float-based datetime dtype. I don't expect pandas to implement one internally. Closing as complete.

@bashtage
Contributor

I would add that with the changes made available in 2.0, it should be relatively straightforward to write an extension type that supports ns resolution plus a wider range as a stand-alone package.

I agree that acceptance into pandas would be a long shot. That said, the only way forward for something like this would be to have it as a stand-alone package that could be shown to be very popular, and to let it mature outside pandas.
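As a rough illustration of such a stand-alone type, here is a hypothetical scalar duration that keeps nanosecond resolution by storing int64 whole seconds plus a nanosecond remainder. It is only a sketch: the class name and layout are assumptions, Python ints do the carry arithmetic, and a real pandas ExtensionArray would instead hold two NumPy arrays and vectorize these operations, none of which is shown here.

```python
from dataclasses import dataclass

NS_PER_S = 10**9
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class WideTimedelta:
    """Hypothetical duration: int64 whole seconds plus 0 <= nanos < 1e9."""

    seconds: int
    nanos: int = 0

    def __post_init__(self):
        if not 0 <= self.nanos < NS_PER_S:
            raise ValueError("nanos must be in [0, 1e9)")
        if not INT64_MIN <= self.seconds <= INT64_MAX:
            raise OverflowError("seconds out of int64 range")

    @classmethod
    def from_ns(cls, ns: int) -> "WideTimedelta":
        # Floor division keeps nanos non-negative, also for negative durations
        s, n = divmod(ns, NS_PER_S)
        return cls(s, n)

    def total_ns(self) -> int:
        return self.seconds * NS_PER_S + self.nanos

    def __add__(self, other: "WideTimedelta") -> "WideTimedelta":
        return WideTimedelta.from_ns(self.total_ns() + other.total_ns())

    def __sub__(self, other: "WideTimedelta") -> "WideTimedelta":
        return WideTimedelta.from_ns(self.total_ns() - other.total_ns())

    def __mul__(self, k: int) -> "WideTimedelta":
        return WideTimedelta.from_ns(self.total_ns() * k)


thousand_years = WideTimedelta(1_000 * 31_556_952)  # 365.2425-day years
one_ns = WideTimedelta.from_ns(1)
print((thousand_years + one_ns) - thousand_years == one_ns)  # True
```

Because the seconds field alone is an int64, this layout reaches roughly ±292 billion years while 1 ns steps survive at any magnitude, unlike the floating-point proposal above.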
