Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to reliably convert git_time to date? #407

Closed
jdblischak opened this issue Nov 8, 2019 · 11 comments
Closed

How to reliably convert git_time to date? #407

jdblischak opened this issue Nov 8, 2019 · 11 comments

Comments

@jdblischak
Copy link
Contributor

@jdblischak jdblischak commented Nov 8, 2019

Given a git_commit object, I would like to be able to return the date of the commit as a character string. Here is my current approach:

> x <- commits(n = 1)[[1]]
> x$author$when
2019-11-08 15:39:58
> as.character(as.Date(as.POSIXct(x$author$when)))
[1] "2019-11-08"

However this occasionally sporadically fails, and the date is off by a whole day. I haven't been able to track down exactly when it happens, but I think it is related to the current time in the given timezone.

As an example, the recent release 1.5.0 of my workflowr package failed the CRAN check on r-devel-windows after it was submitted:

> test_check("workflowr")
  -- 1. Failure: Conversion of git_time to character string of date is correct (@t
  as.character(Sys.Date()) not identical to as.character(as.Date(as.POSIXct(c1$author$when))).
  1/1 mismatches
  x[1]: "2019-11-06"
  y[1]: "2019-11-05"

I also record the time and timezone during each check, so I was expecting it to return "2019-11-06".

< > Sys.time()
< [1] "2019-11-06 00:56:50 CET"
< > Sys.timezone()
< [1] "Europe/Berlin"

This is hard to reproduce. I used devtools::check_win_devel() to test the package with r-devel on a CRAN Windows machine. This time the test passed (link to log - note: this will expire in a few days). The timezone is the same, but the time of day changed.

< > Sys.time()
< [1] "2019-11-07 23:33:16 CET"
< > Sys.timezone()
< [1] "Europe/Berlin"

How can I obtain the current date for a Git commit that isn't affected by the time of day?

Potentially related Issues/PRs: #276, #384, #392, #393

@jdblischak
Copy link
Contributor Author

@jdblischak jdblischak commented Nov 13, 2019

Another data point: My workflowr package was checked again on the CRAN server r-devel-windows-ix86+x86_64, and this time it passed.

< > Sys.time()
< [1] "2019-11-12 04:49:51 CET"
< > Sys.timezone()
< [1] "Europe/Berlin"

https://www.r-project.org/nosvn/R.check/r-devel-windows-ix86+x86_64/workflowr-00check.html

To summarize so far:

  • Passed: 04:49:51 CET, 23:33:16 CET
  • Failed: 00:56:50 CET
@stewid
Copy link
Member

@stewid stewid commented Nov 13, 2019

I'm not sure why it fails. Below is some code I've used to explore the timestamp in the commit and conversion to a character to try to understand why it fails.

This is the description of the timestamp format from https://git-scm.com/docs/git-log

--date=raw shows the date as seconds since the epoch (1970-01-01 00:00:00 UTC), followed by a space, and then the timezone as an offset from UTC (a + or - with four digits; the first two are hours, and the second two are minutes). I.e., as if the timestamp were formatted with strftime("%s %z")). Note that the -local option does not affect the seconds-since-epoch value (which is always measured in UTC), but does switch the accompanying timezone value.

library(git2r)

## Read a commit without using git2r
read_git_commit <- function(filename) {
  # Read compressed data
  n <- file.info(filename)$size
  data <- readBin(filename, what = "raw", n = n)
  data <- memDecompress(data, "gzip")

  # Find "\0" in data; separates header from content
  null_byte <- which(data == 0)

  # Determine type of git object
  header <- rawToChar(data[1:(null_byte-1)])

  # Read content
  i <- seq(from = null_byte + 1, to = length(data))
  content <- readLines(textConnection(rawToChar(data[i])))

  c(header, content)
}

## Create a directory in tempdir
path <- tempfile(pattern = "git2r-")
dir.create(path)

## Initialize a repository
repo <- init(path)
config(repo, user.name = "Alice", user.email = "alice@example.org")

## Current time and timezone.
Sys.time()
#> [1] "2019-11-13 23:04:04 CET"
Sys.timezone()
#> [1] "Europe/Stockholm"

## Create a file and commit it.
writeLines("Hello world!", file.path(path, "test.txt"))
add(repo, "test.txt")
commit(repo, "Commit message")
#> [4c7d388] 2019-11-13: Commit message

## Read the content of the commit.
sha1 <- sha(last_commit(path))
filename <- file.path(path, ".git/objects", substr(sha1, 1, 2), substr(sha1, 3, 40))
read_git_commit(filename)
#> [1] "commit 164"                                          
#> [2] "tree a0b0b9e615e9e433eb5f11859e9feac4564c58c5"       
#> [3] "author Alice <alice@example.org> 1573682644 +0100"   
#> [4] "committer Alice <alice@example.org> 1573682644 +0100"
#> [5] ""                                                    
#> [6] "Commit message"

## Inspect time using git2r
(time <- last_commit(path)$author$when$time)
#> [1] 1573682644
(offset <- last_commit(path)$author$when$offset)
#> [1] 60
as.POSIXct(time, origin = "1970-01-01", tz = "UTC")
#> [1] "2019-11-13 22:04:04 UTC"

Created on 2019-11-13 by the reprex package (v0.3.0)

Internally in git2r, tz = GMT is used instead of tz = UTC. Could this be the cause of this problem?

@jdblischak
Copy link
Contributor Author

@jdblischak jdblischak commented Nov 14, 2019

@stewid Thanks for investigating.

Internally in git2r, tz = GMT is used instead of tz = UTC. Could this be the cause of this problem?

My understanding is that GMT and UTC are synonymous. From ?timezones:

Names "UTC" and its synonym "GMT" are accepted on all platforms.

@jdblischak
Copy link
Contributor Author

@jdblischak jdblischak commented Nov 14, 2019

I've set up a CI build to run some tests every hour to see if I can reproducibly trigger the error.

@jdblischak
Copy link
Contributor Author

@jdblischak jdblischak commented Nov 15, 2019

My 24 hour test completed. Using the timezone "America/New_York", the results were:

  • 12:00 - 18:00: git2r returned correct date
  • 19:00 - 23:00: git2r returned wrong date
  • 00:00 - 12:00: git2r returned correct date
@stewid
Copy link
Member

@stewid stewid commented Nov 15, 2019

Does it work if you change line 19 in your test script from
date_git2r <- as.character(as.Date(as.POSIXct(c1$author$when))
to
date_git2r <- as.character(as.Date(as.POSIXct(c1$author$when$time, origin = "1970-01-01", tz = "UTC"))
?

@jdblischak
Copy link
Contributor Author

@jdblischak jdblischak commented Nov 15, 2019

OK. I think I figured it out. The date begins failing the last 5 hours of the day in the timezone America/New_York because GMT is 5 hours ahead.

> Sys.timezone()
[1] "America/New_York"
> Sys.time()
[1] "2019-11-15 12:47:49 EST"
> as.POSIXct(c1$author$when)
[1] "2019-11-15 17:47:39 GMT"

git2r:::as.POSIXct.git_time hard codes "GMT":

> git2r:::as.POSIXct.git_time
function (x, ...) 
{
    as.POSIXct(x$time, origin = "1970-01-01", tz = "GMT")
}
<bytecode: 0x561c0a0aab80>
<environment: namespace:git2r>

And the default for as.Date.POSIXct is "UTC" (ie the same):

> as.Date.POSIXct
function (x, tz = "UTC", ...) 
{
    if (tz == "UTC") {
        z <- floor(unclass(x)/86400)
        attr(z, "tzone") <- NULL
        .Date(z)
    }
    else as.Date(as.POSIXlt(x, tz = tz))
}
<bytecode: 0x561c0a0d5a10>
<environment: namespace:base>

Thus if I want the day in the local timezone as is returned by Git, I need to set tz in the call to as.Date():

> x <- commits(n = 1)[[1]]
> x$author$when
2019-11-08 15:39:58
> as.character(as.Date(as.POSIXct(x$author$when), tz = Sys.timezone()))
[1] "2019-11-08"
@stewid
Copy link
Member

@stewid stewid commented Nov 15, 2019

Great, looking forward to see the results from your new test run 👍

@jdblischak
Copy link
Contributor Author

@jdblischak jdblischak commented Nov 15, 2019

date_git2r <- as.character(as.Date(as.POSIXct(c1$author$when$time, origin = "1970-01-01", tz = "UTC"))

Note that this wouldn't work. It still returns the time 5 hours ahead:

> Sys.time()
[1] "2019-11-15 12:57:32 EST"
> as.POSIXct(c1$author$when$time, origin = "1970-01-01", tz = "UTC")
[1] "2019-11-15 17:55:36 UTC"
jdblischak added a commit to jdblischak/workflowr that referenced this issue Nov 15, 2019
@jdblischak
Copy link
Contributor Author

@jdblischak jdblischak commented Nov 18, 2019

My solution of setting the local timezone in the call to as.Date() fixed the issue I was having. My hourly test passed for all 24 hours. Furthermore, the fix worked with rhub::check_for_cran() and devtools::check_win_devel() (I was worried that the timezone might not be set on the build machines).

jdblischak added a commit to jdblischak/workflowr that referenced this issue Nov 18, 2019
@stewid
Copy link
Member

@stewid stewid commented Nov 18, 2019

Thanks, your script that was run every hour was a very elegant way to address this issue.

@stewid stewid closed this Nov 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.