Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSQLite keeps numeric of epoch day as date in SQLite database whereas SQLite regards numeric as Julian day (REAL) or epoch second (INTEGER) #509

Open
mickeykawai opened this issue May 24, 2024 · 1 comment

Comments

@mickeykawai
Copy link

mickeykawai commented May 24, 2024

Hi, I love shiny with RSQLite as well as light weight SQLite. Thank you for this useful package.

Here I'd comment an apparent discrepancy of data type of date between R and SQLite. This is related but different from the issue of extended_types = TRUE. Also this is different from discussion about keeping date as ISO8601 string like "2012-03-04" but not numeric (#351).

In short, present situation is that RSQLite saves epoch day in SQLite3, which is regarded on SQLite3 as julian day by default.
I wonder it would be more consistent if RSQLite converts R's Date object of epoch day to julian day (or less preferably epoch second; #) before inserting the data into the backend corresponding table of SQLite's database (# note SQLite auto does not work as expected for data of epoch second for early 1970 because a second in range of 0-5373484.499999 is regarded as julian day, and also does not work as expected for data of julian day for older than B.C. 4714-11-24 (-4713-11-24) or newer than A.D. 9999-12-31 because such day is regarded as epoch second).
Then, to see on SQLite what date it is in human readable format, we just need like

select date(mydate) from t1 where date(mydate) = "2012-03-04";

instead of present style like

select date(mydate *24*3600, "unixepoch") from t1 where date(mydate *24*3600, "unixepoch") = "2012-03-04";

or

-- For where clause, calculate based on epoch second at right-side value. 
select date(mydate *24*3600, "unixepoch") from t1 where mydate = unixepoch("2012-03-04") / 24/3600;

or

-- For where clause, calculate based on julian day at right-side value.
select date(mydate *24*3600, "unixepoch") from t1 where mydate = julianday("2012-03-04") - julianday("1970-01-01");

c.f. SQLite 2.2. Date and Time Datatype

2.2. Date and Time Datatype

SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:

TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
Applications can choose to store dates and times in any of these formats and freely convert between formats using the built-in date and time functions.


Brief explanation

RSQLite keeps epoch day as date in SQLite database, as R's Date object is epoch day.
c.f.
as.Date: Date Conversion Functions to and from Character
Connect to an SQLite database / Extended Types

SQLite handles three types as date, ISO-8601 format text, julian day, or epoch second.
The default modifier auto of date() in SQLite handles if the value is between 0.0 and 5373484.499999, then it is interpreted as a julian day number (corresponding to dates between -4713-11-24 12:00:00 and 9999-12-31 23:59:59, inclusive).
c.f.
2.2. Date and Time Datatype
Date And Time Functions

So, what happens when we create a date column via RSQLite in a table of SQLite3 database? RSQLite saves epoch day in SQLite3, which SQLite3 regards as julian day by default. To see on SQLite what date it is in human readable format, we need explicit multiplication by *24*3600 along with addition of modifier "unixepoch", like

select date(mydate *24*3600, "unixepoch") from t1 where date(mydate *24*3600, "unixepoch") = "2012-03-04";

For example, in my test data below, R's Date of "2012-03-04" keeps it as epoch day, i.e. 15403, and then SQLite regards it as julian day, so SQLite gives "-4670-01-26" when "select date(mydate) from t1;".

I posted this issue on Posit Community here:
https://forum.posit.co/t/where-can-i-find-a-document-that-tells-we-need-explicit-as-character-when-we-use-input-of-dateinput-as-parameterized-sqlite3-query-via-rsqlite-but-not-with-query-constructed-with-insecure-paste0/186415/2

Reprex

Here let us create SQLite database t.db (not on memory) via RSQLite, and then examine the data from R and from SQLite.

library(DBI)

con <- dbConnect(RSQLite::SQLite(), "t.db", extended_types = TRUE)
dbExecute(con, 'CREATE TABLE t1 (mydate DATE, id INTEGER PRIMARY KEY);')
dbExecute(con, 'INSERT OR IGNORE INTO t1 (mydate) VALUES (?), (?);', c(as.Date("2012-03-04"), as.Date("1234-05-06")))

date1 <- as.Date("2012-03-04")
dput(date1)
# structure(15403, class = "Date")

query <- "select * from t1 where t1.mydate = ?"
dbGetQuery(con, query, params = c(date1))
#      mydate id
# 1 2012-03-04  1

dbDisconnect(con)

So, on R, we can see mydate is shown as 2012-03-04 as expected.
(Note that I added extended_types = TRUE in dbConnect() at the beginning to get returned results as R's Date object which RSQLite correctly shows as human-readable format.)

But on SQLite, the data inserted by RSQlite as date is indeed a small integer of epoch day, which SQLite regards as julian day by default. So we need multiplication by *24*3600 and modifier unixepoch explicitly.

On terminal,

sqlite3 --header t.db  

Then on sqlite3 console,
# below, -- indicates comment line on sqlite.

select * from t1;
-- mydate|id
-- 15403|1
-- -268693|2
-- The date records inserted by RSQLite are kept as small integer that is epoch day. 

select date(mydate) from t1;
-- date(mydate)
-- -4670-01-26
-- 
-- The integer is regarded as julian day at SQLite side by default, 
--  which makes "2012-03-04" to "-4670-01-26". 
-- It seems SQLite fails to judge negative small integer as epoch second for the second record (originally "1234-05-06"), 
--  although it says "If the time-value is numeric (the DDDDDDDDDD format) then the 'auto' modifier causes the time-value to interpreted as either a julian day number or a unix timestamp, depending on its magnitude." https://www.sqlite.org/lang_datefunc.html#modifiers ..

-- Adding modifier "unixepoch" along with multiplication explicitly to get desired result. 
select date(mydate *24*3600, "unixepoch") from t1;
-- date(mydate *24*3600, "unixepoch")
-- 2012-03-04
-- 1234-05-06
-- Here it works as described in SQLite's document as: 
--  "The "unixepoch" modifier (20) only works if it immediately follows a time-value in the DDDDDDDDDD format. This modifier causes the DDDDDDDDDD to be interpreted not as a Julian day number as it normally would be, but as Unix Time - the number of seconds since 1970. " https://www.sqlite.org/lang_datefunc.html#modifiers

-- We need it in both select and where clauses. 
select date(mydate *24*3600, "unixepoch") from t1 where date(mydate *24*3600, "unixepoch") = "2012-03-04";
-- date(mydate *24*3600, "unixepoch")
-- 2012-03-04

-- Or if you prefer doing at right-side value, these work. 
-- Calculation based on epoch second. 
select date(mydate *24*3600, "unixepoch") from t1 where mydate = unixepoch("2012-03-04") / 24/3600;
-- date(mydate *24*3600, "unixepoch")
-- 2012-03-04

-- Calculation based on julian day.
select date(mydate *24*3600, "unixepoch") from t1 where mydate = julianday("2012-03-04") - julianday("1970-01-01");
-- date(mydate *24*3600, "unixepoch")
-- 2012-03-04

.q
Log with SQLite 3.43.2.
% sqlite3 --header t.db      
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> select * from t1;
mydate|id
15403|1
-268693|2

sqlite> -- the integer is regarded as julian day. 
sqlite> select date(mydate) from t1;
date(mydate)
-4670-01-26

sqlite> -- add modifier "unixepoch" along with multiplication explicitly to get desired result. 
sqlite> select date(mydate *24*3600, "unixepoch") from t1;
date(mydate *24*3600, "unixepoch")
2012-03-04
1234-05-06

sqlite> -- need it in both select and where clauses. 
sqlite> select date(mydate *24*3600, "unixepoch") from t1 where date(mydate *24*3600, "unixepoch") = "2012-03-04";
date(mydate *24*3600, "unixepoch")
2012-03-04

sqlite> -- Or if you prefer doing at right-side value, these work. 
sqlite> -- Calculation based on epoch second. 
sqlite> select date(mydate *24*3600, "unixepoch") from t1 where mydate = unixepoch("2012-03-04") / 24/3600;
date(mydate *24*3600, "unixepoch")
2012-03-04

sqlite> -- Calculation based on julian day.
sqlite> select date(mydate *24*3600, "unixepoch") from t1 where mydate = julianday("2012-03-04") - julianday("1970-01-01");
date(mydate *24*3600, "unixepoch")
2012-03-04

sqlite> .q
%
sessionInfo
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] DBI_1.2.2

loaded via a namespace (and not attached):
 [1] bit_4.0.5         compiler_4.3.2    fastmap_1.1.1    
 [4] cli_3.6.2         hms_1.1.3         tools_4.3.2      
 [7] rstudioapi_0.16.0 memoise_2.0.1     bit64_4.0.5      
[10] vctrs_0.6.5       RSQLite_2.3.6     cachem_1.0.8     
[13] blob_1.2.4        lifecycle_1.0.4   pkgconfig_2.0.3  
[16] rlang_1.1.3   
@krlmlr
Copy link
Member

krlmlr commented May 24, 2024

Thanks, @mickeykawai, for the very detailed write-up.

@ablack3: As the contributor to extended_types, can you please comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants