
POSIXt import to TIMESTAMP field fails #251

Closed
byapparov opened this issue May 31, 2018 · 7 comments

@byapparov (Contributor) commented May 31, 2018

tbl <- bq_table(
  project = Sys.getenv("BIGQUERY_PROJECT"),
  dataset = Sys.getenv("BIGQUERY_DATASET"),
  table = "test_table"
)

bq_table_create(
  tbl, 
  fields = bq_fields(
    list(bq_field("datetime", "TIMESTAMP"))
  )
)

dt <- data.frame(datetime = Sys.time())

job <- bigrquery::bq_perform_upload(
  x = tbl,
  values = dt,
  write_disposition = "WRITE_APPEND",
  create_disposition = "CREATE_NEVER"
)
bq_job_wait(job)

Running job [] 1s
Error: Invalid schema update. Field datetime has changed type from TIMESTAMP to FLOAT [invalid]

With the CREATE_IF_NEEDED create disposition, the table will be created with a FLOAT field instead of TIMESTAMP.
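Editor's note: the FLOAT type comes from bigrquery converting POSIXt columns to numeric Unix epochs before streaming JSON, so BigQuery's schema autodetection sees a bare number. A minimal base-R illustration of that conversion step (not the thread's code):

```r
# Illustration: the as.numeric() step in export_json() turns a POSIXct
# into a plain double, so the streamed JSON carries a bare number and
# BigQuery infers FLOAT rather than TIMESTAMP.
x <- as.POSIXct("2018-05-31 12:00:00", tz = "UTC")
as.numeric(x)
#> [1] 1527768000
```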

@rasmusab (Contributor) commented Sep 28, 2018

Hi @byapparov,

It seems that this is because of the two lines in export_json() (line 81 in old-job-upload.R) that "Convert times to unix timestamps".

export_json <- function(values) {
  # Eliminate row names
  rownames(values) <- NULL

  # Convert times to unix timestamps
  is_time <- vapply(values, function(x) inherits(x, "POSIXt"), logical(1))
  values[is_time] <- lapply(values[is_time], as.numeric)

  con <- rawConnection(raw(0), "r+")
  on.exit(close(con))
  jsonlite::stream_out(values, con, verbose = FALSE, na = "null")

  rawToChar(rawConnectionValue(con))
}

If I comment these two lines out I can round trip a TIMESTAMP or a DATETIME like this:

library(bigrquery)
library(DBI)
con <- dbConnect(bigrquery::bigquery(), project = "my-special-project")

db_datetime  <- dbGetQuery(con, "SELECT CURRENT_DATETIME()")
db_timestamp <- dbGetQuery(con, "SELECT CURRENT_TIMESTAMP()")

test_dataset <- bq_dataset("my-special-project", "temp_test_123_qwerty")
test_timestamp_table <- bq_table(test_dataset, "test_timestamp")
test_datetime_table <- bq_table(test_dataset, "test_datetime")

bq_table_create(test_timestamp_table, db_timestamp)
bq_table_create(test_datetime_table, db_datetime)

bq_table_upload(test_timestamp_table, db_timestamp, write_disposition = "WRITE_APPEND")
bq_table_upload(test_datetime_table, db_datetime, write_disposition = "WRITE_APPEND")

I could make a pull request out of this, but I assume that there is a good reason why these two lines are there, so I'll hold off for now.

@byapparov (Contributor, Author) commented Sep 29, 2018

@rasmusab if you make a PR I can check that existing tests pass and add new tests.

@hadley (Member) commented Jan 22, 2019

I think a PR would be useful. If it works OK now, it's possible that BigQuery behaviour has changed since I wrote the code (or I made a mistake and it was never necessary).

@rasmusab (Contributor) commented Jan 27, 2019

> I assume that there is a good reason why these two lines are there

I think I've found the good reason: if I remove the two lines mentioned above from export_json, the tests in test-upload.r stop working.

I haven't been able to figure out why exactly, yet, but it seems to be something with the string representation of POSIXct values. A weird thing is that the original test-upload.r won't pass after my proposed fix above, and it tries to round-trip the following data frame:

>   df1 <- data.frame(
+     x = as.Date("2018-01-01"),
+     y = as.POSIXct(as.Date("2018-01-01")) 
+   )
> df1
           x                   y
1 2018-01-01 2018-01-01 01:00:00

This gives an error, but if one adds a second to y then it passes:

>   df1 <- data.frame(
+     x = as.Date("2018-01-01"),
+     y = as.POSIXct(as.Date("2018-01-01"))  + 1
+   )
> df1
           x                   y
1 2018-01-01 2018-01-01 01:00:00

However, the above (adding one second) does not pass when the "Convert times to unix timestamps" lines are in place, as they currently are.

So, no PR from me currently, as I'm not sure what's going on yet.

@hadley (Member) commented Jan 28, 2019

@rasmusab I bet that's because date-times at midnight don't get the time printed — maybe it would be better to call format() rather than as.numeric()
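Editor's note: the midnight behaviour described above is easy to confirm in plain R. The explicit-format call below is a sketch of the suggested direction, not the fix that actually landed:

```r
# Sketch: why midnight values break, and a format()-based workaround.
x_noon <- as.POSIXct("2018-01-01 12:00:00", tz = "UTC")
x_mid  <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")

format(x_noon)  # "2018-01-01 12:00:00"
format(x_mid)   # "2018-01-01" -- default format drops the time at midnight

# Forcing an explicit format keeps a BigQuery-parseable timestamp string:
format(x_mid, "%Y-%m-%d %H:%M:%OS", tz = "UTC")
#> [1] "2018-01-01 00:00:00"
```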

@hadley (Member) commented Jan 28, 2019

Reprex:

library(bigrquery)

ds <- bq_test_dataset()
df <- data.frame(datetime = as.POSIXct("2020-01-01 09:00", tz = "UTC"))

tb1 <- bq_table_create(
  bq_table(ds, "timestamp"), 
  bq_fields(list(bq_field("datetime", "TIMESTAMP")))
)
bq_table_upload(tb1, df)
#> Error: Provided Schema does not match Table bigrquery-examples:TESTING_cpmmhddfef.timestamp. Field datetime has changed type from TIMESTAMP to INTEGER [invalid]

tb2 <- bq_table_create(
  bq_table(ds, "datetime"), 
  bq_fields(list(bq_field("datetime", "DATETIME")))
)
bq_table_upload(tb2, df)
#> Error: Provided Schema does not match Table bigrquery-examples:TESTING_cpmmhddfef.datetime. Field datetime has changed type from DATETIME to INTEGER [invalid]

Created on 2019-01-28 by the reprex package (v0.2.1.9000)

@hadley (Member) commented Jan 28, 2019

I've got this — there's another issue around autodetecting the schema when the table already exists.

@hadley closed this in f7eb39d on Jan 28, 2019
