Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery datetime prior to 1970-01-01 #285

Open
skydavis435 opened this issue Jan 16, 2019 · 4 comments

Comments

@skydavis435
Copy link

commented Jan 16, 2019

BigQuery tables containing DATETIME column with dates prior to 1970-01-01 do not get stored in data frames correctly. Datetimes before the UNIX epoch default to 1969-12-31 23:59:59.

Filtering based on this column still behaves as expected, but the wrong datetime is displayed/stored. Mutating a column specifying that a string is desired reveals the expected datetime string.

library(tidyverse)
library(bigrquery)

con <- DBI::dbConnect(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "moon_phases",
  billing = "your-billing"
  )

tbl(con, "moon_phases") %>%
  filter(peak_datetime < "1970-01-01") %>% 
  mutate(peak_datetime_string = sql("CAST(peak_datetime as STRING)"))
Complete
Billed: 10.49 MB
Downloading 11 rows in 1 pages.
# Source:   lazy query [?? x 4]                                                                                                                                    
# Database: BigQueryConnection
   phase    phase_emoji peak_datetime       peak_datetime_string  
   <chr>    <chr>       <dttm>              <chr>              
 1 New Moon 🌑          1969-12-31 23:59:59 1889-07-28 00:00:00
 2 New Moon 🌑          1969-12-31 23:59:59 1900-12-22 00:01:00
 3 New Moon 🌑          1969-12-31 23:59:59 1825-05-18 00:02:00
 4 New Moon 🌑          1969-12-31 23:59:59 1914-02-25 00:02:00
 5 New Moon 🌑          1969-12-31 23:59:59 1888-11-04 00:02:00
 6 New Moon 🌑          1969-12-31 23:59:59 1941-11-19 00:04:00
 7 New Moon 🌑          1969-12-31 23:59:59 1796-04-08 00:05:00
 8 New Moon 🌑          1969-12-31 23:59:59 1938-11-22 00:05:00
 9 New Moon 🌑          1969-12-31 23:59:59 1840-05-02 00:06:00
10 New Moon 🌑          1969-12-31 23:59:59 1853-08-05 00:06:00
@hadley

This comment has been minimized.

Copy link
Member

commented Jan 22, 2019

Minimal reprex:

library(bigrquery)

sql <- "
  SELECT `peak_datetime`, SAFE_CAST(`peak_datetime` AS STRING) AS `string`
  FROM `bigquery-public-data.moon_phases.moon_phases`
  WHERE (`peak_datetime` < '1970-01-01')
  LIMIT 10
"

job <- bq_project_query(bq_test_project(), sql)
bq_table_download(job)
@hadley

This comment has been minimized.

Copy link
Member

commented Jan 22, 2019

Simpler reprex:

 dt <- as.POSIXct("1901-01-01")
 bq_parse_single(as.character(dt, "%Y-%m-%dT%H:%M:%S"), "datetime")

hadley added a commit that referenced this issue Jan 23, 2019

@hadley

This comment has been minimized.

Copy link
Member

commented Jan 23, 2019

The root cause of this is the C function gmtime() which only returns values for date times after the start of the unix epoch. I've at least made this return NA values, but a proper fix will require a gmtime() replacement.

@hadley

This comment has been minimized.

Copy link
Member

commented Jan 23, 2019

Argh, that appears to be true on OS X, but not linux 😬

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.