Think about return types for date, time and timestamp #42

krlmlr · 2015-10-11T08:06:28Z

Current situation:

Dates are returned as "Date" class
Times are returned character, or integer (number of seconds, MySQL only)
Timestamps are returned character, or POSIXct (MySQL only)

It is unclear if, and how much, the DBI driver should abstract the DB server's understanding of these types. The current MySQL convention looks good for dates and timestamps, and should be adopted by the other backends. The "time" type is tricky: It doesn't seem to have a time zone, and probably represents a duration in most cases, but this cannot be relied on. Still, a lubridate::duration() might be a good bet here.

For now, the test tests for the "character" data type.

The text was updated successfully, but these errors were encountered:

hadley · 2015-10-11T14:39:29Z

How are times stored in the database? Seconds from midnight?

Might be useful to create (or find) lightweight time package. Both haven and readr currently define their own time classes.

krlmlr · 2015-10-30T18:55:10Z

I'm not sure about the internal storage, but probably it's something like seconds from midnight, or a fraction of a day.

I like this idea of a data type package. ???hms didn't return anything useful, so I'd vote for extracting the code from haven. This class should define useful converters to/from the most common date/time types (POSIX, lubridate, ...).

Same for 64-bit and larger, and perhaps also currency: A new package that uses a lossless storage type (probably character) but defines useful conversions to/from types the user actually can work with (numeric, bit64, int64, raw, ...).

BLOBs should probably be returned as lists of raw.

imanuelcostigan · 2015-10-30T19:02:08Z

I think times / date storage is DB dependent.

hannes · 2015-11-09T17:06:25Z

Seems reasonable to me, but please keep DBI free from dependencies on other packages. I do think numbers should stay numbers though if at all possible. And MonetDB stores times as timestamps since 1970-01-01 with millisecond precision in long values internally.

krlmlr · 2015-11-09T17:21:22Z

Why is this important? I think lossless data types can be of more general use than just for DBI, and should therefore live in a separate package.

krlmlr · 2015-11-10T08:48:35Z

@hannesmuehleisen: Unfortunately, numbers can't always stay numbers in R without sacrificing precision. For timestamps stored as milliseconds since the epoch, using numeric might just work (at lest for the next few millennia) -- but not necessarily for generic 64-bit or larger values.

If there's a new "data types" package, I agree that it shouldn't include mandatory dependencies itself. But I still don't see why DBI shouldn't import this "data types" package. Conversion to and from R's internal data types (with mutable warnings if precision is lost) always will be possible.

hannes · 2015-11-10T08:50:14Z

I would argue as long as base R does not have proper int64 support, DBI does not have to either.

krlmlr · 2015-11-10T09:05:34Z

There are R packages that do offer some support for 64-bit values. They just might not be suitable for every use case.

I think DBI's job is to be able to get the data in and out of the database in a safe and unambiguous way without additional loss of precision, using R's base data types. To me, this means either character or list(raw(...)) for 64-bit values, and I think character is the lesser of the two evils.

> bigint <- structure("1234567890123456789", class = c("DBI_bigint", "character"))
> bigint
[1] "1234567890123456789"
attr(,"class")
[1] "DBI_bigint" "character" 
> bit64::as.integer64(bigint)
integer64
[1] 1234567890123456789
> type.convert(bigint, numerals = "warn.loss")
[1] 1.234568e+18
Warning message:
In type.convert(bigint, numerals = "warn.loss") :
  accuracy loss in conversion from "1234567890123456789" to numeric
> as.numeric(bigint)
[1] 1.234568e+18

hannes · 2015-11-10T09:18:37Z

Throwing a warning if accuracy is lost is a good idea. But still think its a bad idea to convert some numerical SQL types to R's numeric or integer and some to characters. Nobody expects that. The clean way of doing that would be to follow the JDBC approach, where the caller MUST specify the type of data to be read. But I think nobody wants that either.

krlmlr · 2015-11-10T09:29:23Z

Agreed to both. Another way would be to include the original data as an attribute in cases where conversion losses can occur:

> bigint <- structure(1234567890123456789, original = "1234567890123456789")
> bigint
[1] 1.234568e+18
attr(,"original")
[1] "1234567890123456789"

The attribute won't survive data frame manipulations, but at least the data is there when DBI returns it, and the unsuspecting user isn't surprised.

What about time-only data types? I think those should be returned as seconds from midnight, to allow painless interoperability with POSIXct.

hadley · 2015-11-11T21:47:59Z

Let's use lightweight time class - number of seconds since midnight (as double) + class attribute.

For data types which have a loss conversion to closest R type, I like the idea to store the "original" value (as a character vector) as an attribute.

hannes · 2015-11-12T08:12:59Z

Like the attr solution as well.

imanuelcostigan · 2015-11-12T21:52:32Z

RJDBC currently only supports character and numeric return and write values. So dates for example are returned as characters. I've worked around this in my RSQLServer fetch method and expanded RJDBC's dbDataType method to improve writing out R types to SQLServer. Similarly, I have also expanded RJDBC's .fillStatementParameters which is used by dbSendUpdate.

My approach is obviously mostly hackery, and I would love to see better way of mapping DB types to R types (and vice-versa)

krlmlr · 2015-11-12T21:57:00Z

@imanuelcostigan: Data types not supported by the SQL engine are (not) covered in #61. This issue is about SQL data types for which R has no (lossless) equivalent.

imanuelcostigan · 2015-11-13T00:10:11Z

@krlmlr I'm not sure I follow. For example, dates and date-times are supported by SQL Server.

krlmlr · 2015-11-13T08:47:27Z

I'm just saying that the conversion DB -> R and R -> DB are two separate issues. The former is handled here, but I'm not sure how your case fits in. You seem to be able somehow to determine the true underlying data type, which is fine: I think it's the DBI driver's job to return proper data types for the DB -> R path. The data types "bigint" (64-bit integer) and "time" (number of seconds, without date) are a bit tricky, and this is what this issuer here is mostly about.

The conversion R -> DB has been discussed in #61. DBI will support specifying field types when creating tables via dbWriteTable (#18), but probably not much more.

Is there any further functionality you'd like to see in DBI?

yasminlucero · 2015-12-01T23:44:10Z

+1 to the idea of storing the 'original' value as a character vector. I just want to reiterate the importance of returning lossless information.

In my current application, I need to be able to return correct BIGINT counts often. I have to do the cast to char query side, which is tricky because the query generation is already very complex. A feature like this would side-step a lot of complexity for me.

krlmlr · 2016-01-13T21:33:50Z

"hms" package for time values, code pulled out of readr/haven
- Check that data type is "double"
Lossy values: "lossy" S3 class
- int64
- decimal
- character with embedded NUL (Think about strings with embedded NULs #94)
- ...

krlmlr · 2016-03-31T10:53:27Z

@hadley: Could you please take a look at the new hms package (early draft)? I'd like to put it under the rstats-db namespace, and eventually release to CRAN.

krlmlr mentioned this issue Oct 11, 2015

Data type for time r-dbi/RMySQL#95

Closed

krlmlr mentioned this issue Nov 6, 2015

Add roundtrip test for time values r-dbi/DBItest#14

Closed

krlmlr mentioned this issue Nov 11, 2015

Think about custom serializers/deserializers #61

Closed

krlmlr added the action:design label Nov 11, 2015

This was referenced Feb 2, 2016

Int64 should be saved as character r-dbi/RPostgres#14

Closed

64-bit integers r-dbi/bigrquery#94

Closed

krlmlr mentioned this issue Mar 31, 2016

Default implementation for dbDataType() #70

Closed

krlmlr mentioned this issue Apr 20, 2016

dbDataType() improvements #97

Merged

krlmlr closed this as completed in #97 Apr 30, 2016

krlmlr mentioned this issue May 26, 2016

Test for 64-bit integers r-dbi/DBItest#74

Closed

krlmlr mentioned this issue Jun 16, 2016

Big int is not properly handled r-dbi/RSQLite#65

Closed

krlmlr mentioned this issue Aug 16, 2016

Explicitly cast data tests to precise types r-dbi/DBItest#90

Closed

imanuelcostigan mentioned this issue Sep 20, 2016

datetime/date problems imanuelcostigan/RSQLServer#110

Closed

krlmlr mentioned this issue Jul 5, 2017

Data type for time r-dbi/RMariaDB#16

Closed

github-actions bot locked and limited conversation to collaborators Oct 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Think about return types for date, time and timestamp #42

Think about return types for date, time and timestamp #42

krlmlr commented Oct 11, 2015

hadley commented Oct 11, 2015

krlmlr commented Oct 30, 2015

imanuelcostigan commented Oct 30, 2015

hannes commented Nov 9, 2015

krlmlr commented Nov 9, 2015

krlmlr commented Nov 10, 2015

hannes commented Nov 10, 2015

krlmlr commented Nov 10, 2015

hannes commented Nov 10, 2015

krlmlr commented Nov 10, 2015

hadley commented Nov 11, 2015

hannes commented Nov 12, 2015

imanuelcostigan commented Nov 12, 2015

krlmlr commented Nov 12, 2015

imanuelcostigan commented Nov 13, 2015

krlmlr commented Nov 13, 2015

yasminlucero commented Dec 1, 2015

krlmlr commented Jan 13, 2016

krlmlr commented Mar 31, 2016

Think about return types for date, time and timestamp #42

Think about return types for date, time and timestamp #42

Comments

krlmlr commented Oct 11, 2015

hadley commented Oct 11, 2015

krlmlr commented Oct 30, 2015

imanuelcostigan commented Oct 30, 2015

hannes commented Nov 9, 2015

krlmlr commented Nov 9, 2015

krlmlr commented Nov 10, 2015

hannes commented Nov 10, 2015

krlmlr commented Nov 10, 2015

hannes commented Nov 10, 2015

krlmlr commented Nov 10, 2015

hadley commented Nov 11, 2015

hannes commented Nov 12, 2015

imanuelcostigan commented Nov 12, 2015

krlmlr commented Nov 12, 2015

imanuelcostigan commented Nov 13, 2015

krlmlr commented Nov 13, 2015

yasminlucero commented Dec 1, 2015

krlmlr commented Jan 13, 2016

krlmlr commented Mar 31, 2016