Problems with IDate data type after reading a CSV with import #293
Comments
I can't reproduce this issue:
library(rio)
download.file('https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7',
destfile = 'covid.csv')
df <-import('covid.csv')
df$jour
sapply(df,class)
export(df, format = 'RDS')
import('df.rds')
df <- import('df.rds')
df$jour
All works fine for me.
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 12.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rio_0.5.29
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 fansi_1.0.3 utf8_1.2.2 crayon_1.5.1
[5] cellranger_1.1.0 lifecycle_1.0.1 magrittr_2.0.2 zip_2.2.0
[9] pillar_1.7.0 stringi_1.7.6 rlang_1.0.2 cli_3.2.0
[13] readxl_1.3.1 curl_4.3.2 data.table_1.14.2 vctrs_0.3.8
[17] ellipsis_0.3.2 openxlsx_4.2.5 tools_4.1.0 forcats_0.5.1
[21] foreign_0.8-81 glue_1.6.2 hms_1.1.1 compiler_4.1.0
[25] pkgconfig_2.0.3 haven_2.4.1 tibble_3.1.6
The problem is reproducible, but I think it is a problem in dplyr. Something similar in data.table was discussed here: Rdatatable/data.table#2008
library(rio)
download.file('https://www.data.gouv.fr/fr/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7',
destfile = 'covid.csv')
df <-import('covid.csv')
export(df, "df.RDS")
df <- import('df.RDS')
df$b <- df$jour-365
df2 <- dplyr::filter(df,sexe==0)
df2$b <- df2$jour-365
#> Error in `-.IDate`(df2$jour, 365): Internal error: storage mode of IDate is somehow no longer integer
storage.mode(df$jour)
#> [1] "integer"
storage.mode(df2$jour)
#> [1] "double"
Created on 2023-09-11 with reprex v2.0.2
Here is a minimal example that is independent of rio:
library(data.table)
library(dplyr)
df <- data.table(a=Sys.Date(),b=14)
df$a-365
#> [1] "2022-09-11"
storage.mode(df$a)
#> [1] "double"
tb <- as_tibble(df)
tb$a-365
#> [1] "2022-09-11"
storage.mode(tb$a)
#> [1] "double"
df$a <- as.IDate(df$a)
df$a-365
#> [1] "2022-09-11"
storage.mode(df$a)
#> [1] "integer"
tb <- as_tibble(df) |> dplyr::filter(a>=Sys.Date())
tb$a-365
#> Error in `-.IDate`(tb$a, 365): Internal error: storage mode of IDate is somehow no longer integer
storage.mode(tb$a)
#> [1] "double"
Created on 2023-09-11 with reprex v2.0.2
@chainsawriot Don't think we need to do anything in rio, but is this something to escalate to the dplyr team?
Edit:
Ok, this appears to be an open issue in vctrs: r-lib/vctrs#1781
@schochastics Thank you very much for the investigation. I tried
Oh fascinating!
Hello,
I am experiencing non-deterministic problems with a data frame containing an IDate column. In the following example I kept only two columns of the original file (donnees-hospitalieres-covid19-2021-12-08-19h05.csv, or the same file for a different date, from https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/). The file was read using import without any additional options.
Replacing the call to dplyr with
df2 <- df[df$sexe==0,]
makes things work again. So I don't really know where the problem is: in rio, in dplyr, or in the IDate type from data.table.
If the problem is related to the IDate type, wouldn't it be possible to use the standard Date type instead, overriding what fread did?
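As a possible user-side workaround along the lines of that last question, the IDate column can be converted to a plain Date before the data frame is passed to dplyr. The sketch below is only an illustration: the two-row table is a hypothetical stand-in for the imported CSV (only the column names jour and sexe come from this thread), and it assumes data.table and dplyr are installed. It does not change what rio or fread return.

```r
library(data.table)

# Hypothetical stand-in for the data returned by rio::import();
# only the column names `jour` and `sexe` are taken from the thread.
df <- data.table(
  jour = as.IDate(c("2021-12-07", "2021-12-08")),
  sexe = c(0L, 1L)
)

# Convert the integer-backed IDate column to a standard Date
# (stored as double) before dplyr touches it.
df$jour <- as.Date(df$jour)

# Filtering with dplyr and doing date arithmetic afterwards now goes
# through the ordinary Date methods rather than `-.IDate`.
df2 <- dplyr::filter(df, sexe == 0)
df2$b <- df2$jour - 365
class(df2$b)
```

The trade-off is that the column loses the compact integer storage that IDate provides, which is usually negligible for a file of this size.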
From a fresh session under R 4.1.1: