Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: The implied time zone of Date #203

Closed
DavisVaughan opened this issue Apr 6, 2021 · 1 comment · Fixed by #206
Closed

Discussion: The implied time zone of Date #203

DavisVaughan opened this issue Apr 6, 2021 · 1 comment · Fixed by #206
Labels
breaking change ☠️ API change likely to affect existing code

Comments

@DavisVaughan
Copy link
Member

DavisVaughan commented Apr 6, 2021

Currently, clock assumes that Date is UTC, because it seemed like this is also what base R assumed. I have now learned it is not quite that simple, and base R is actually inconsistent, treating Date as UTC or naive depending on the conversion being done

  • Date -> POSIXct/lt, Date is treated as UTC (i.e. sys-time)
  • POSIXct/lt -> Date, Date is treated as naive-time
  • Sys.Date(), Date is treated as naive-time
  • lubridate makes all of the same assumptions as base R, just with more intuitive defaults on a case-by-case basis. I still think it could be improved.

References:

This begs the question, what is most useful for clock? A few places this comes up:

  • as.Date.clock_zoned_time and as_zoned_time.Date
  • Potential date_today(zone = ) (Computed the current zoned-time, then convert to a Date, must assume naive or UTC) (Implement date_now() and date_today() #197)
  • Potential date_cast(x) and date_time_cast(x, zone, ..., nonexistent = , ambiguous =) to be more consistent replacements for as.Date() and as.POSIXct()

It seems like it may be more intuitive if Date was treated as naive-time at all times.

# Date -> POSIXct
# - Date is treated as UTC
# - Resulting POSIXct uses the local time zone
# - No `tz` argument

x <- as.Date("2019-01-01")
x
#> [1] "2019-01-01"

# Date was treated as UTC, result shown in local time
as.POSIXct(x)
#> [1] "2018-12-31 19:00:00 EST"
clock::date_set_zone(as.POSIXct(x), "UTC")
#> [1] "2019-01-01 UTC"

as.POSIXct.Date
#> function (x, ...) 
#> .POSIXct(unclass(x) * 86400)
#> <bytecode: 0x7f8d54999538>
#> <environment: namespace:base>

# Date was treated as UTC, defaults to showing UTC time
# (same assumptions as R, just smarter defaults)
lubridate::as_datetime(x)
#> [1] "2019-01-01 UTC"
lubridate::as_datetime(x, tz = "America/New_York")
#> [1] "2018-12-31 19:00:00 EST"

# I'd argue that the above is probably still confusing,
# the user probably wanted to assume `x` was a naive-time
lubridate::force_tz(x, "America/New_York")
#> [1] "2019-01-01 EST"
# Date -> POSIXlt
# - Date is treated as UTC
# - Resulting POSIXlt uses UTC
# - No `tz` argument

x <- as.Date("2019-01-01")
x
#> [1] "2019-01-01"

as.POSIXlt(x)
#> [1] "2019-01-01 UTC"

as.POSIXlt.Date
#> function (x, ...) 
#> .Internal(Date2POSIXlt(x))
#> <bytecode: 0x7f8d14cbf2f8>
#> <environment: namespace:base>
# POSIXct -> Date
# - Date is treated as naive
# - The current printed time of the POSIXct is converted to match `tz` without
#   changing the underlying duration (defaulting `tz` to UTC)
# - The new printed time is converted to a Date using `as.Date.POSIXlt`
# - `tz = "UTC"` by default

x <- as.POSIXct("2019-01-01 23:00:00", tz = "America/Los_Angeles")
x
#> [1] "2019-01-01 23:00:00 PST"

# - `tz = "UTC"` by default
# - Printed time converted from America/Los_Angeles->UTC, then that is converted
#   to a Date, making the assumption that Date is a naive time
as.Date(x)
#> [1] "2019-01-02"

# it is like doing:
clock::date_set_zone(x, "UTC")
#> [1] "2019-01-02 07:00:00 UTC"
as.Date(clock::date_set_zone(x, "UTC"))
#> [1] "2019-01-02"

# This is generally what the user probably was expecting. 
# - Keep the printed time of the original POSIXct
# - Just drop the time part and return a Date
# - Let Date be a naive type
as.Date(x, tz = clock::date_zone(x))
#> [1] "2019-01-01"

as.Date.POSIXct
#> function (x, tz = "UTC", ...) 
#> {
#>     if (tz == "UTC") {
#>         z <- floor(unclass(x)/86400)
#>         attr(z, "tzone") <- NULL
#>         .Date(z)
#>     }
#>     else as.Date(as.POSIXlt(x, tz = tz))
#> }
#> <bytecode: 0x7f8d1497c248>
#> <environment: namespace:base>

# Uses `as.Date()` from above, but defaults to the time zone of `x`, as the
# user probably expected
# (same assumptions as R, just smarter defaults)
lubridate::as_date(x)
#> [1] "2019-01-01"
lubridate::as_date(x, tz = "UTC")
#> [1] "2019-01-02"

# I think that lubridate `as_date()` is right here, it:
# - Treats Date as naive-time
# - Uses the time zone of `x` by default when converting

# However, I don't think the `tz` arg should be here,
# as it is probably more confusing than helpful
# POSIXlt -> Date
# - Date is treated as naive
# - The printed time of `x` is retained, with the time being dropped
# - No `tz` argument

x <- as.POSIXlt("2019-01-01 23:00:00", tz = "America/Los_Angeles")
x
#> [1] "2019-01-01 23:00:00 PST"

as.Date(x)
#> [1] "2019-01-01"

as.Date.POSIXlt
#> function (x, ...) 
#> .Internal(POSIXlt2Date(x))
#> <bytecode: 0x7f8d14cbc4b0>
#> <environment: namespace:base>

Created on 2021-04-06 by the reprex package (v1.0.0)

@DavisVaughan
Copy link
Member Author

DavisVaughan commented Apr 6, 2021

Implementation:

library(clock)

date_standardize <- function(x) {
  if (identical(typeof(x), "double")) {
    return(x)
  }
  
  # Convert somewhat rare integer Date to double.
  # Preserves names.
  storage.mode(x) <- "double"
  
  x
}

date_cast <- function(x) {
  UseMethod("date_cast")
}

date_cast.clock_time_point <- function(x) {
  as.Date(x)
}

date_cast.clock_zoned_time <- function(x) {
  # as.Date(x) if it worked with naive-time
  as.Date(as_naive_time(x))
}

date_cast.clock_calendar <- function(x) {
  as.Date(x)
}

# What as.Date.Date() should be
date_cast.Date <- function(x) {
  date_standardize(x)
}

# What as.Date.POSIXct() and as.Date.POSIXlt() should be
date_cast.POSIXt <- function(x) {
  date_cast(as_naive_time(x))
}

date_time_cast <- function(x, ...) {
  UseMethod("date_time_cast")
}

date_time_cast.clock_sys_time <- function(x, zone, ...) {
  ellipsis::check_dots_empty()
  as.POSIXct(x, tz = zone)
}

date_time_cast.clock_naive_time <- function(x, zone, ..., nonexistent = NULL, ambiguous = NULL) {
  ellipsis::check_dots_empty()
  as.POSIXct(x, tz = zone, nonexistent = nonexistent, ambiguous = ambiguous)
}

date_time_cast.clock_zoned_time <- function(x, ...) {
  ellipsis::check_dots_empty()
  as.POSIXct(x)
}

date_time_cast.clock_calendar <- function(x, zone, ..., nonexistent = NULL, ambiguous = NULL) {
  ellipsis::check_dots_empty()
  as.POSIXct(x, tz = zone, nonexistent = nonexistent, ambiguous = ambiguous)
}

# What as.POSIXct.Date() should be
date_time_cast.Date <- function(x, zone, ..., nonexistent = NULL, ambiguous = NULL) {
  date_time_cast(as_naive_time(x), zone, ..., nonexistent = nonexistent, ambiguous = ambiguous)
}

# What as.POSIXct.POSIXct() and as.POSIXct.POSIXlt() should be
date_time_cast.POSIXt <- function(x, ...) {
  ellipsis::check_dots_empty()
  x <- clock:::to_posixct(x)
  x
}

date_today <- function(zone) {
  # as.Date(zoned_time_now(zone)) if as.Date() works right
  date_cast(zoned_time_now(zone))
}
date_time_now <- function(zone) {
  as.POSIXct(zoned_time_now(zone))
}

Examples:

date_today("UTC")
#> [1] "2021-04-06"
date_today("America/New_York")
#> [1] "2021-04-06"

date_time_now("UTC")
#> [1] "2021-04-06 14:37:05 UTC"
date_time_now("America/New_York")
#> [1] "2021-04-06 10:37:05 EDT"

x <- as.POSIXct("2019-01-01 23:00:00", tz = "America/Los_Angeles")
date_cast(x)
#> [1] "2019-01-01"
x <- as.POSIXct("2019-01-01 23:00:00", tz = "UTC")
date_cast(x)
#> [1] "2019-01-01"

x <- as.Date("2019-01-01")
date_time_cast(x, "UTC")
#> [1] "2019-01-01 UTC"
date_time_cast(x, "America/New_York")
#> [1] "2019-01-01 EST"

# In Asia/Beirut, there was a DST gap from 23:59:59->01:00:00,
# skipping the 0 hour entirely
x <- as.Date("2021-03-28")
try(date_time_cast(x, "Asia/Beirut"))
#> Error : Nonexistent time due to daylight saving time at location 1. Resolve nonexistent time issues by specifying the `nonexistent` argument.
date_time_cast(x, "Asia/Beirut", nonexistent = "roll-forward")
#> [1] "2021-03-28 01:00:00 EEST"

# ^ same idea as
x <- as_naive_time(year_month_day(2021, 3, 28))
try(as_zoned_time(x, "Asia/Beirut"))
#> Error : Nonexistent time due to daylight saving time at location 1. Resolve nonexistent time issues by specifying the `nonexistent` argument.

Path forward:

  • Convert existing tooling to assume Date is naive
  • Add date_cast() and date_time_cast()
  • Add date_today() and date_time_now()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ☠️ API change likely to affect existing code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant