date_difference(), calendar_difference(), time_point_difference() #266

DavisVaughan opened this issue Nov 24, 2021 · 3 comments · Fixed by #271

feature a feature request or enhancement


DavisVaughan commented Nov 24, 2021

date_difference(x, ...)
date_difference.Date(x, ..., precision = y/q/m/w/d)
date_difference.POSIXt(x, ..., precision = y/q/m/w/d/H/M/S)

calendar_difference(x, ...)
calendar_difference.year_month_day(x, ..., precision = y/q/m)

time_point_difference(x, ...)
time_point_difference.year_month_day(x, ..., precision = w/d/H/M/S/subsecond)

The general idea is that these return a data frame containing columns like year, month, ... down through the most precise component. If you were to add the components back to from in order from largest to smallest (does order matter?), bypassing intermediate invalid dates and time zone issues (by working with calendars and time points), then you'd end up with to.

This would make it easy to compute someone's "age" in years, as that is just date_difference(a, b)$year.

A possible first-pass way to compute the number of months:

(date1.Year - date2.Year) * 12 + date1.Month - date2.Month + (date1.Day >= date2.Day ? 0 : -1)
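
As a rough illustration of the expression above, here is a minimal base R sketch. The helper name whole_months_between() is hypothetical (it is not part of clock), date1 is taken to be the later date (to) and date2 the earlier one (from), and the sketch assumes from <= to.

```r
# Hypothetical helper, not part of clock: whole months elapsed from `from`
# to `to`. Assumes `from <= to`; negative differences are not handled.
whole_months_between <- function(from, to) {
  from <- as.POSIXlt(from)
  to <- as.POSIXlt(to)

  # POSIXlt stores years since 1900 and months as 0-11; both offsets
  # cancel out when taking differences.
  months <- (to$year - from$year) * 12L + (to$mon - from$mon)

  # Drop the final month if its day-of-month has not been reached yet.
  months - as.integer(to$mday < from$mday)
}

full <- whole_months_between(as.Date("2019-02-02"), as.Date("2023-02-04"))
short <- whole_months_between(as.Date("2019-01-31"), as.Date("2019-02-28"))

# An "age" in whole years then falls out of integer division.
age <- full %/% 12L
```

Note that Jan 31 to Feb 28 counts as zero whole months under this rule, which is one of the edge cases a real implementation would need to take a position on.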

Need to be very careful about the precisions that are allowed for calendar vs time point. I think ideally they don't overlap? i.e. day goes with time point.

The precision is used as the "starting" precision, i.e. precision = "month" would result in a data frame where the first column is the total number of whole months between the dates (although this might be unnecessary, because you can always compute that from year * 12 + month).

Possibly the calendar and time point versions return data frames of duration columns, while the date version returns data frames of integer columns

Possibly allow from > to, resulting in negative components. But this may be hard to define. It may "fall out" if we can define the algebraic property of the result. i.e. from + (components) = to

Non-year-month-day calendars would be useful too. like determining the number of iso weeks between dates

DavisVaughan commented Nov 25, 2021

When returning a data frame with columns from year -> second, the transition from month to hour seems difficult (i.e. calendrical to chronological durations). I think the only valid way is to do what as.period(<interval>) does, which would be to:

  • convert to a calendar and compute the years and months
  • convert the shifted result straight to a naive time and compute the hours through seconds

Otherwise you'd have to take the shifted calendar, convert back to POSIXct/Date (accounting for nonexistent/ambiguous issues), then convert to sys-time to compute the hms. I think this is probably wrong.

The proposed method above works, but it means it would report 01:00:00 EST to 01:00:00 EDT as having a difference of 0.

Possibly more appealing would be for this function to always return a single numeric (integer?) vector containing the difference as specified at a particular precision.

i.e. for year/quarter/month we would compute the "whole number" of years, quarters, or months that have elapsed from the current date, taking the day of the month into account. We would need a special calendar method for this. Since it would be using calendars, it would be in an implied "naive" time.

And for hour/minute/second we would convert to sys-time and just use subtraction + rounding to compute the whole number of units between the two times. This means that 01:00:00 EST to 01:00:00 EDT would report the more intuitive time of 1 hour.
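
A small base R sketch of the sys-time approach described above, using plain POSIXct rather than clock classes. It assumes the America/New_York zone and the 2021 fall-back date, and it orders the two instants chronologically (the EDT reading of 01:00:00 comes first). Flooring is used for the "rounding" step, which is an assumption.

```r
# Two instants one hour apart that share the wall-clock time 01:00:00
# in America/New_York on the 2021 fall-back date.
from <- as.POSIXct("2021-11-07 05:00:00", tz = "UTC") # 01:00:00 EDT
to <- as.POSIXct("2021-11-07 06:00:00", tz = "UTC")   # 01:00:00 EST

# The naive (wall-clock) readings are identical ...
wall_from <- format(from, "%H:%M:%S", tz = "America/New_York")
wall_to <- format(to, "%H:%M:%S", tz = "America/New_York")

# ... but subtracting on the absolute timeline and flooring to whole
# hours reports the elapsed hour.
whole_hours <- as.numeric(difftime(to, from, units = "secs")) %/% 3600
```

A naive-time comparison of the same two values would see identical wall-clock readings and report 0, which is the trade-off discussed above.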

It could return duration objects, as an easy way to add the result back to the from time to get "close" to the to time.

I still think an equivalent to as.period(<interval>) would be useful to see a breakdown of the time difference in years/months/days, but maybe that can be a different function. date_difference_breakdown()?

DavisVaughan commented Nov 25, 2021

It is possible this is a special case of a more general problem that could be solved with clock_calendar_interval and clock_time_point_interval classes

i.e. finding the whole number of years between two dates could be:

calendar_interval(ymd(2019, 2, 2), ymd(2023, 2, 4)) %/% duration_years(1) # probably not /

This obviously also opens up %/% duration_years(2) for "whole number of 2 years"
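
To make the intended result concrete, here is a base R sketch of what such an integer division could return for the dates above. The helper whole_years_between() and its n argument are hypothetical stand-ins for the proposed interval %/% duration_years(n) syntax, and the sketch assumes from <= to.

```r
# Hypothetical helper, not part of clock: whole groups of `n` years
# elapsed from `from` to `to`. Assumes `from <= to`.
whole_years_between <- function(from, to, n = 1L) {
  from <- as.POSIXlt(from)
  to <- as.POSIXlt(to)

  years <- to$year - from$year

  # Has the (month, day) anniversary not yet been reached in the final year?
  before_anniversary <-
    to$mon < from$mon | (to$mon == from$mon & to$mday < from$mday)

  (years - as.integer(before_anniversary)) %/% n
}

one_year <- whole_years_between(as.Date("2019-02-02"), as.Date("2023-02-04"))
two_year <- whole_years_between(as.Date("2019-02-02"), as.Date("2023-02-04"), n = 2L)
not_yet <- whole_years_between(as.Date("2019-02-02"), as.Date("2023-02-01"))
```

Here one_year is 4 and two_year is 2, while stopping one day short of the anniversary (2023-02-01) drops the count to 3.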

I don't think we'd have any way to compute fractional components.

finding the whole number of hours:

time_point_interval(<nt>, <nt>) %/% duration_hours(1) # probably not /

Then you just make date_difference() and maybe calendar_difference() as a wrapper for these

A time point interval is likely just a combination of a starting time point and a duration in the corresponding units representing the difference. A calendar interval is more complicated and probably needs the start and end points.

For the time point case, we would need to define duration %/% duration, and that would handle most of the hard work for us. I have had reservations about that because it would return an integer, and that can quickly go out of bounds of the integer range for nanosecond durations, but I think most use cases would be in the hour / minute range.
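
The integer range concern can be checked with plain base R arithmetic. This sketch only uses R's built-in 32-bit integer limit, not any clock internals.

```r
# R integers are 32-bit, so the largest representable count is:
int_max <- .Machine$integer.max # 2147483647

# At nanosecond precision that covers only about 2.1 seconds.
ns_per_second <- 1e9
seconds_covered <- int_max / ns_per_second

# A 3 second span counted in nanoseconds no longer fits: the coercion
# returns NA (with a warning, suppressed here).
overflowed <- suppressWarnings(as.integer(3 * ns_per_second))

# By contrast, a full century counted in hours fits comfortably.
hours_per_century <- as.integer(100 * 366 * 24)
```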

Interval operations:

  • Make an interval from existing dates (carefully manage missing values)
  • Get / Set start and end of interval
  • No comparisons of intervals
  • Yes equality proxy
  • Maybe ability to flip an interval?
  • Arithmetic using duration objects? (i.e. calendar_interval + duration_months, tp_interval + hours)
  • Mainly useful for integer division operations
    • No division by numerics
    • No multiplication
    • No non-integer division operations
    • Time point interval int divided by duration gives an integer
    • Calendar interval int divided by duration gives an integer

After dev vctrs is on CRAN:

  • Something like the non-equi join operations of "within", "overlaps", and "between"
    • Vectorized 1:1 variants
      • Does a date occur between the start and end of an interval? int_detect_between(date, interval) or maybe date_detect_between()
      • Is one interval within another interval? (With an argument for endpoint inclusivity) int_detect_within(x, y)
      • Does one interval overlap another interval? (With an argument for endpoint inclusivity) int_detect_overlap(x, y)
    • vec_locate_matches() like variants
      • Is a date between any of a set of intervals (and which one) int_locate_between_matches(needle, haystack) + %in% variant
      • Is one interval within any of a set of intervals (and which one) int_locate_within_matches() + %in% variant
      • Is one interval overlapping with any of a set of intervals (and which one) int_locate_overlap_matches() + %in% variant

DavisVaughan commented Nov 27, 2021

On second thought, a new class (classes) is a lot of work, and it is unclear what the date/posixct interval class would look like. More importantly, it seems like there are really only two places where intervals are fairly useful:

  • Integer division to get the whole number of units between the two dates (i.e. age of a person in years between their birthday and today)
  • Finding if a date lies in an interval (both vectorized and match-like)

So we could probably do this with a handful of functions instead, rather than having a full blown class for this.

calendar_difference(from, to, precision, ..., n = 1L)
time_point_difference(from, to, precision, ..., n = 1L)
date_difference(from, to, precision, ..., n = 1L)

# Age of a person in whole years
date_difference(<birthday>, date_today(), "year")
# Relatively simple, just `lower <= x & x <= upper` I think
# (I think these are simple enough that we don't need a helper)
date_detect_between(x, lower, upper, ..., inclusive = TRUE)
calendar_detect_between(x, lower, upper, ..., inclusive = TRUE)
time_point_detect_between(x, lower, upper, ..., inclusive = TRUE)

# like data.table inrange
# (makes sense to have a helper here because constructing the vec-matches data frame is annoying)
date_locate_range_match(needles, haystack_lower, haystack_upper, ..., inclusive = TRUE) # first overlap, integer positions vector
date_detect_range_match(needles, haystack_lower, haystack_upper, ..., inclusive = TRUE) # logical vector
# and calendar/time_point variants

# use vec-matches directly if all matches are needed?

# what about range overlapping/within another range? (in a match like way, because otherwise it can be done with binary ops)
# (maybe just give an example with vec-matches)
# (makes less sense to have a helper here because you'd specify lower/upper columns either way)
date_range_locate_range_match(needles_lower, needles_upper, haystack_lower, haystack_upper, ..., inclusive = TRUE)
date_range_detect_range_match(needles_lower, needles_upper, haystack_lower, haystack_upper, ..., inclusive = TRUE)

date_count_between(start, end, precision, ..., n = 1L)
calendar_count_between(start, end, precision, ..., n = 1L)
time_point_count_between(start, end, precision, ..., n = 1L)
date_locate_match_between(needles, haystack_start, haystack_end, ..., inclusive = TRUE)
calendar_locate_match_between(needles, haystack_start, haystack_end, ..., inclusive = TRUE)
time_point_locate_match_between(needles, haystack_start, haystack_end, ..., inclusive = TRUE)

# and maybe detect variants
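
A rough base R sketch of the two behaviours listed above: a vectorized "between" check and a first-match lookup against a set of ranges. The helper names detect_between() and locate_range_match() are hypothetical, and treating inclusive as a single flag that applies to both endpoints is an assumption.

```r
# Hypothetical helpers, not part of clock.

# Vectorized check: does each `x` fall between `lower` and `upper`?
detect_between <- function(x, lower, upper, inclusive = TRUE) {
  if (inclusive) {
    lower <= x & x <= upper
  } else {
    lower < x & x < upper
  }
}

# For each needle, the position of the first haystack range containing it,
# or NA if no range matches.
locate_range_match <- function(needles, haystack_lower, haystack_upper,
                               inclusive = TRUE) {
  vapply(seq_along(needles), function(i) {
    hits <- which(
      detect_between(needles[i], haystack_lower, haystack_upper, inclusive)
    )
    if (length(hits) == 0L) NA_integer_ else hits[1L]
  }, integer(1))
}

lower <- as.Date(c("2021-01-01", "2021-06-01"))
upper <- as.Date(c("2021-03-31", "2021-08-31"))
needles <- as.Date(c("2021-02-15", "2021-05-01", "2021-08-31"))

loc_inclusive <- locate_range_match(needles, lower, upper)
loc_exclusive <- locate_range_match(needles, lower, upper, inclusive = FALSE)
```

The third needle sits exactly on an upper bound, so it matches the second range only when the endpoints are inclusive. This loop is quadratic in the worst case, which is part of why a vec-matches based helper is attractive for larger inputs.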
