Think about return types for date, time and timestamp #42
How are times stored in the database? Seconds from midnight? It might be useful to create (or find) a lightweight time package. Both haven and readr currently define their own time classes. |
I'm not sure about the internal storage, but probably it's something like seconds from midnight, or a fraction of a day. I like this idea of a data type package. Same for 64-bit and larger, and perhaps also currency: A new package that uses a lossless storage type (probably character) but defines useful conversions to/from types the user actually can work with (numeric, bit64, int64, raw, ...). BLOBs should probably be returned as lists of |
I think times / date storage is DB dependent. |
Seems reasonable to me, but please keep DBI free from dependencies on other packages. I do think numbers should stay numbers though if at all possible. And MonetDB stores times as timestamps since 1970-01-01 with millisecond precision in long values internally. |
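For illustration, here is a minimal sketch of how a millisecond count like the one described above could be turned into a base R timestamp. It assumes the driver hands back the raw count as a double measured in UTC; the value is made up and not taken from a real database.

```r
# Assumption: the driver returns milliseconds since 1970-01-01 UTC as a double.
ms <- 1420070400123  # illustrative value only

# POSIXct counts seconds since the epoch, so divide by 1000.
ts <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

# Print down to whole seconds; the sub-second part is kept in `ts` itself.
format(ts, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

The fractional part survives as a double, so any precision questions only start once the counts get much larger than current timestamps.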
Why is this important? I think lossless data types can be of more general use than just for DBI, and should therefore live in a separate package. |
@hannesmuehleisen: Unfortunately, numbers can't always stay numbers in R without sacrificing precision. For timestamps stored as milliseconds since the epoch, using If there's a new "data types" package, I agree that it shouldn't include mandatory dependencies itself. But I still don't see why DBI shouldn't import this "data types" package. Conversion to and from R's internal data types (with mutable warnings if precision is lost) always will be possible. |
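To make the precision concern concrete, here is a small base R sketch. The values are illustrative; the only facts it relies on are that R's integer type is 32-bit and that doubles represent whole numbers exactly only up to 2^53.

```r
# A millisecond timestamp does not fit into R's 32-bit integer type.
ms <- 1420070400000
int_ms <- suppressWarnings(as.integer(ms))  # coercion yields NA (with a warning)
is.na(int_ms)

# As a double, a value of this size is still exact: adding 1 changes it.
ms + 1 != ms

# A 64-bit value above 2^53 cannot be stored exactly in a double,
# so a round trip through numeric does not give back the same digits.
big <- "9007199254740993"  # 2^53 + 1, as it might arrive from a BIGINT column
round_trip <- sprintf("%.0f", as.numeric(big))
round_trip != big
```

So millisecond timestamps are safe as doubles for now, but general BIGINT values are not.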
I would argue as long as base R does not have proper int64 support, DBI does not have to either. |
There are R packages that do offer some support for 64-bit values. They just might not be suitable for every use case. I think DBI's job is to be able to get the data in and out of the database in a safe and unambiguous way without additional loss of precision, using R's base data types. To me, this means either
|
Throwing a warning if accuracy is lost is a good idea. But I still think it's a bad idea to convert some numerical SQL types to R's numeric or integer and some to characters. Nobody expects that. The clean way of doing that would be to follow the JDBC approach, where the caller MUST specify the type of data to be read. But I think nobody wants that either. |
Agreed to both. Another way would be to include the original data as an attribute in cases where conversion losses can occur:
The attribute won't survive data frame manipulations, but at least the data is there when DBI returns it, and the unsuspecting user isn't surprised. What about time-only data types? I think those should be returned as seconds from midnight, to allow painless interoperability with POSIXct. |
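A rough sketch of the attribute idea, in case it helps the discussion. The helper name `with_original()` and the attribute name `"original"` are hypothetical and not part of DBI. It assumes the driver receives values as canonical integer strings (no leading zeros, no decimals).

```r
# Hypothetical helper, not part of DBI: convert character to numeric and
# keep the untouched strings as an attribute if any value lost precision.
with_original <- function(chr) {
  num <- as.numeric(chr)
  # A value is lossy if printing the double does not reproduce the input.
  lossy <- !is.na(chr) & sprintf("%.0f", num) != chr
  if (any(lossy)) {
    warning(sum(lossy), " value(s) lost precision during conversion")
    attr(num, "original") <- chr
  }
  num
}

# The call warns; the warning is suppressed here only to keep the demo quiet.
x <- suppressWarnings(with_original(c("1", "9007199254740993")))
attr(x, "original")     # the lossless strings are still available

# Subsetting a plain vector drops the attribute.
attr(x[1], "original")  # NULL
```

The last line shows the caveat mentioned above: the original data is only guaranteed to be there right after the driver returns the result.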
Let's use a lightweight time class: number of seconds since midnight (as double) plus a class attribute. For data types that have a lossy conversion to the closest R type, I like the idea of storing the "original" value (as a character vector) as an attribute. |
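A minimal sketch of what such a class could look like, using only base R. The class name `db_time` and the constructor are hypothetical placeholders, not an existing API; fractional seconds are truncated when formatting, purely to keep the example short.

```r
# Hypothetical lightweight time class: seconds since midnight as a double.
new_db_time <- function(seconds) {
  structure(as.double(seconds), class = "db_time")
}

# Format as HH:MM:SS (fractional seconds are truncated in this sketch).
format.db_time <- function(x, ...) {
  s <- unclass(x)
  sprintf(
    "%02d:%02d:%02d",
    as.integer(s %/% 3600),
    as.integer((s %% 3600) %/% 60),
    as.integer(s %% 60)
  )
}

print.db_time <- function(x, ...) {
  print(format(x), quote = FALSE)
  invisible(x)
}

tm <- new_db_time(13 * 3600 + 45 * 60 + 30)
format(tm)

# Because the payload is plain seconds, it can be added to a POSIXct date.
stamp <- as.POSIXct("2015-01-01", tz = "UTC") + unclass(tm)
format(stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Keeping the payload as plain seconds is what makes the interoperability with POSIXct straightforward.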
Like the attr solution as well. |
RJDBC currently only supports character and numeric return and write values. So dates for example are returned as characters. I've worked around this in my My approach is obviously mostly hackery, and I would love to see better way of mapping DB types to R types (and vice-versa) |
@imanuelcostigan: Data types not supported by the SQL engine are (not) covered in #61. This issue is about SQL data types for which R has no (lossless) equivalent. |
@krlmlr I'm not sure I follow. For example, dates and date-times are supported by SQL Server. |
I'm just saying that the conversion DB -> R and R -> DB are two separate issues. The former is handled here, but I'm not sure how your case fits in. You seem to be able somehow to determine the true underlying data type, which is fine: I think it's the DBI driver's job to return proper data types for the DB -> R path. The data types "bigint" (64-bit integer) and "time" (number of seconds, without date) are a bit tricky, and this is what this issue is mostly about. The conversion R -> DB has been discussed in #61. DBI will support specifying field types when creating tables via Is there any further functionality you'd like to see in DBI? |
+1 to the idea of storing the 'original' value as a character vector. I just want to reiterate the importance of returning lossless information. In my current application, I need to be able to return correct BIGINT counts often. I have to do the cast to char query side, which is tricky because the query generation is already very complex. A feature like this would side-step a lot of complexity for me. |
@hadley: Could you please take a look at the new hms package (early draft)? I'd like to put it under the rstats-db namespace, and eventually release to CRAN. |
Current situation:
It is unclear if, and how much, the DBI driver should abstract the DB server's understanding of these types. The current MySQL convention looks good for dates and timestamps, and should be adopted by the other backends. The "time" type is tricky: It doesn't seem to have a time zone, and probably represents a duration in most cases, but this cannot be relied on. Still, a lubridate::duration() might be a good bet here. For now, the test tests for the "character" data type.