Allow date_group(bound = "upper") or similar #232

Closed
pnacht opened this issue May 19, 2021 · 4 comments · Fixed by #242
Labels
feature a feature request or enhancement

Comments

@pnacht

pnacht commented May 19, 2021

Is there a fundamental reason why date_round() and its siblings forbid precision = "month"?

This is likely my lack of imagination, but I can't see when rounding a date to the month would be ambiguous. This is especially true for Date types, which don't have to worry about entering/leaving summer time or what have you.

I'd absolutely understand if it's a matter of implementation, given months' variable durations, leap years, etc.

@DavisVaughan
Member

This is a good question. It has to do with how the underlying types in clock work.

The TLDR is that date_round() and friends are good for rounding to a multiple of a specified precision, up to day. What you are asking for is something more like date_group(), which is for grouping by a single component, such as "month of the year".

date_round() and friends use time points under the hood, and these are implemented as a number of ticks since a certain epoch/start date. The floor/ceiling/round functions operate directly on these ticks to generate the intervals used to bucket x.

As you mentioned, you can't meaningfully generate equally spaced buckets for rounding to month or year, since months and years vary in length.

library(clock)

x <- date_build(1970, 1:3, 15:17)
x
#> [1] "1970-01-15" "1970-02-16" "1970-03-17"

# A type of time point
nt <- as_naive_time(x)
nt
#> <time_point<naive><day>[3]>
#> [1] "1970-01-15" "1970-02-16" "1970-03-17"

# Stored as an integer number of days since 1970-01-01
unclass(nt)
#> $ticks
#> [1] 14 46 75
#> 
#> attr(,"precision")
#> [1] 4
#> attr(,"clock")
#> [1] 1

# Mathematical floor on integer ticks that generates buckets of
# [0, 5), [5, 10), [10, 15), ...
# and chooses the lower bound
time_point_floor(nt, "day", n = 5)
#> <time_point<naive><day>[3]>
#> [1] "1970-01-11" "1970-02-15" "1970-03-17"
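
For readers who want to see the tick arithmetic without clock, here is a minimal base R sketch. It assumes only that a Date is a count of days since 1970-01-01, which matches the ticks shown above. It illustrates the idea and is not clock's actual implementation.

```r
# Base R Dates are stored as days since 1970-01-01, like the ticks above
x <- as.Date(c("1970-01-15", "1970-02-16", "1970-03-17"))
ticks <- as.integer(x)
ticks
#> [1] 14 46 75

# Floor each tick to the nearest lower multiple of n = 5 days
n <- 5L
floored_ticks <- (ticks %/% n) * n
floored_ticks
#> [1] 10 45 75

# Convert the floored ticks back to dates
floored_dates <- as.Date(floored_ticks, origin = "1970-01-01")
format(floored_dates)
#> [1] "1970-01-11" "1970-02-15" "1970-03-17"
```

Because every bucket is exactly n ticks wide, this only works for units of fixed length, which is why day is the coarsest precision that fits this scheme.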

The other option is date_group(). This operates on calendars under the hood, specifically the year_month_day calendar type. Calendars are implemented as field types; in other words, they hold the year, month, and day components in separate fields. When you "group" by month of the year, this first drops all information about any component more precise than month, and then operates only on the month component to generate the buckets.

When working with just calendars, the returned type is at month precision, since the day information was dropped. With the Date version, we have to return something for the day, and since the left-hand side of the interval is chosen, we return the first day of the month.

library(clock)

x <- date_build(1970, 1:3, 15:17)
x
#> [1] "1970-01-15" "1970-02-16" "1970-03-17"

# Group by 2 months
date_group(x, "month", n = 2)
#> [1] "1970-01-01" "1970-01-01" "1970-03-01"

ymd <- as_year_month_day(x)
ymd
#> <year_month_day<day>[3]>
#> [1] "1970-01-15" "1970-02-16" "1970-03-17"

# Notice that this stores the components separately
unclass(ymd)
#> $year
#> [1] 1970 1970 1970
#> 
#> $month
#> [1] 1 2 3
#> 
#> $day
#> [1] 15 16 17
#> 
#> attr(,"precision")
#> [1] 4

# Drops day completely, then looks only at the month component
# and generates buckets of:
# [1970-1, 1970-2], [1970-3, 1970-4], ... [1970-11, 1970-12]
# and chooses LHS
calendar_group(ymd, "month", n = 2L)
#> <year_month_day<month>[3]>
#> [1] "1970-01" "1970-01" "1970-03"

# Since we have to convert back to Date, this widens to day precision,
# using the first day of the month
calendar_widen(calendar_group(ymd, "month", n = 2L), "day")
#> <year_month_day<day>[3]>
#> [1] "1970-01-01" "1970-01-01" "1970-03-01"
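
The field-based grouping can also be sketched in base R on plain integer vectors. The bucket formula below is an assumption about how groups of n months could be computed; it reproduces the outputs above but is not taken from clock's source.

```r
# Year and month stored as separate fields, with the day already dropped
year  <- c(1970L, 1970L, 1970L)
month <- c(1L, 2L, 3L)

# Map each month to the first month of its bucket of n months
n <- 2L
group_month <- ((month - 1L) %/% n) * n + 1L
group_month
#> [1] 1 1 3

# Month-precision result, analogous to the calendar_group() output
grouped <- sprintf("%04d-%02d", year, group_month)
grouped
#> [1] "1970-01" "1970-01" "1970-03"

# "Widen" back to day precision by choosing the first day of the month
widened <- as.Date(paste0(grouped, "-01"))
format(widened)
#> [1] "1970-01-01" "1970-01-01" "1970-03-01"
```

Since the arithmetic only touches the month field, the variable length of months never enters the computation.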

@pnacht
Author

pnacht commented May 20, 2021

I see. In this case it would seem useful to add an additional parameter to calendar_group.Date and calendar_widen similar to zoo::as.Date's frac argument.

zoo has types such as yearmon which, as the name suggests, store just the year and month of a date.

zoo::as.yearmon("2020-5")
#> [1] "May 2020"

These types then have their own definition of as.Date, which includes an additional argument frac, which can be in the [0, 1] range and describes where along the period the Date should be generated:

zoo::as.Date(zoo::as.yearmon("2020-5"))  # default frac = 0
#> [1] "2020-05-01"
zoo::as.Date(zoo::as.yearmon("2020-5"), frac = 1)
#> [1] "2020-05-31"
zoo::as.Date(zoo::as.yearmon("2020-5"), frac = 0.5)
#> [1] "2020-05-16"
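
As a rough illustration of what frac does, the base R sketch below assumes that frac interpolates linearly between the first and last day of the month. That assumption reproduces the three results above for May 2020, but it is a sketch of the idea rather than zoo's documented algorithm.

```r
# First and last day of May 2020
month_start <- as.Date("2020-05-01")
month_end <- seq(month_start, by = "month", length.out = 2)[2] - 1

# Number of days between the two ends of the month
span <- as.numeric(month_end - month_start)
span
#> [1] 30

# Place a date at a given fraction of the way through the month
frac <- c(0, 0.5, 1)
frac_dates <- month_start + frac * span
format(frac_dates)
#> [1] "2020-05-01" "2020-05-16" "2020-05-31"
```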

It just seems unintuitive (if understandable) that the user can't easily get the date at the end of the month (or year, for that matter). As described, there is effectively a date_floor(precision = "month/year") in the form of date_group(), which just makes the lack of an equivalent for the opposite end a bit odd.

@DavisVaughan
Member

DavisVaughan commented May 20, 2021

As I was writing out the second example, I did think that it might be possible for date_group() to return the right-hand side of the intervals. A hypothetical call might look like this:

x <- date_build(1970, 1:3, 15:17)
x
#> [1] "1970-01-15" "1970-02-16" "1970-03-17"

# Group by 2 months (`bound` is the proposed argument, not an existing one)
date_group(x, "month", n = 2, bound = "upper")
#> [1] "1970-02-28" "1970-02-28" "1970-04-30"

Also keep in mind that with something like n = 5, the 12 months of the year get divided into unequal buckets. Grouping never crosses the boundary of the next component up; in this case, it never crosses the year boundary. Right now the lower bounds within a year are always spaced out equally, but if the upper bound were returned they might not be. This isn't necessarily a reason not to do this.

library(clock)

x <- date_parse(c("2019-02-04", "2019-08-02", "2019-11-01"))

# [1, 5], [6, 10], [11, 12]
date_group(x, "month", n = 5)
#> [1] "2019-01-01" "2019-06-01" "2019-11-01"

# Proposed behavior; `bound` is not an existing argument
date_group(x, "month", n = 5, bound = "upper")
#> [1] "2019-05-31" "2019-10-31" "2019-12-31"

I'm not really sure when this would be all that useful though.


It would require calendar_group() to be able to return the RHS of the interval it creates (this would be a little tricky for grouping day of the month). It would also require calendar_widen() to be able to widen using the end of each component rather than the start. I don't think any of this is inconsistent with the design of calendar types, I just need a compelling example of where this would be useful (maybe arguing that it is for "completeness" is enough).
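
To make the two requirements concrete, here is a base R sketch of returning both ends of each month group. The helper month_group_bounds() is hypothetical and exists only for this illustration; it is not part of clock, and the bucket formula is an assumption chosen to reproduce the outputs shown above.

```r
# Hypothetical helper (not a clock function): lower and upper bound of the
# n-month group containing each date, never crossing the year boundary
month_group_bounds <- function(x, n) {
  n <- as.integer(n)
  lt <- as.POSIXlt(x)
  year <- lt$year + 1900L
  month <- lt$mon + 1L

  # First and last month of the bucket, with the last month capped at December
  lower_month <- ((month - 1L) %/% n) * n + 1L
  upper_month <- pmin(lower_month + n - 1L, 12L)

  # Lower bound: first day of the first month in the bucket
  lower <- as.Date(sprintf("%04d-%02d-01", year, lower_month))

  # Upper bound: the day before the first day of the following month,
  # rolling over into January of the next year after December
  next_year <- year + (upper_month == 12L)
  next_month <- upper_month %% 12L + 1L
  upper <- as.Date(sprintf("%04d-%02d-01", next_year, next_month)) - 1

  data.frame(lower = lower, upper = upper)
}

x <- as.Date(c("2019-02-04", "2019-08-02", "2019-11-01"))
bounds <- month_group_bounds(x, n = 5)
format(bounds$lower)
#> [1] "2019-01-01" "2019-06-01" "2019-11-01"
format(bounds$upper)
#> [1] "2019-05-31" "2019-10-31" "2019-12-31"
```

Computing the upper bound as "first day of the next month minus one day" sidesteps month lengths and leap years, at the cost of special handling for December.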

@pnacht
Author

pnacht commented May 20, 2021

> I'm not really sure when this would be all that useful though.

Well, it'd be useful in the same circumstances that date_floor/ceiling are used for smaller intervals. In my line of work, I often need to get data at the start and end of a series of months. Therefore I need a means of generating such dates. In fact, that's why I posted this issue in the first place. I've been using either the zoo::as.Date(frac = 1) method described above or lubridate::ceiling_date() - 1 for this, but I quite appreciate the clock package and it would be nice to have it as my sole "date-management" package.

> (this would be a little tricky for grouping day of the month)

I agree. Generating end-of-month dates and then grouping by them would be messy. But that seems more an issue with the design choice of leaving this to date_group() (whose name does indeed indicate it's meant for grouping, not date generation) and not to date_floor/ceiling().

DavisVaughan changed the title on May 20, 2021
  from: Why don't date_round/floor/ceiling allow "month" precision?
  to: Allow date_group(bound = "upper") or similar
DavisVaughan added the label "feature" (a feature request or enhancement) on May 26, 2021