Feature request: Support Tibbles #12
Thanks for the feature request. Would you like to be able to write tibbles to an fst file as well as read them back? Or just get the data returned as a tibble? |
I think that both writing and reading would be great. Would there be a way to store these additional classes (or data.table when appropriate) in the fst file so that, when reading, the correct class is returned? Perhaps when checking whether the |
Returning the data as a
some_list <- as.list(1:10000)
# Serialize
some_file <- file("test.bin", "wb")
lapply(some_list, serialize, some_file)
close(some_file)
# Unserialize
some_file <- file("test.bin", "rb")
res <- lapply(1:10000, function(x) { unserialize(some_file)})
close(some_file)
In this case each list element is serialized by R's native serialization mechanism. In that way, data could still be accessed randomly and could even be compressed by the LZ4 or ZSTD compressors that |
Data.frames and data.tables also allow complex columns, so there is nothing specific to "tibbles". Tibble is just a class attribute which allows pretty printing and a few other minor things. |
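To make the point above concrete, here is a minimal base-R sketch. The column names and the `tbl_like` object are made up for illustration, and setting the class vector by hand is only a demonstration (in practice one would call `tibble::as_tibble()`). It shows that a plain data.frame can hold a list column, and that the tibble classes sit on top of the same underlying data.

```r
# A plain data.frame can carry a list ("complex") column.
df <- data.frame(id = 1:3)
df$payload <- list(1:3, letters[1:2], list(a = 1))

is.list(df$payload)   # the column is a list with one element per row

# A tibble has the class vector c("tbl_df", "tbl", "data.frame").
# Assigning it by hand (illustration only) changes method dispatch,
# for example printing when tibble is loaded, but not the stored data.
tbl_like <- df
class(tbl_like) <- c("tbl_df", "tbl", "data.frame")

inherits(tbl_like, "data.frame")            # still a data.frame
identical(unclass(tbl_like), unclass(df))   # same columns and attributes
```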
Indeed @dselivanov, there is no intrinsic difference between a |
Support for complex columns would definitely be a very nice feature (but I realize that speed will suffer a lot). Actually, I asked for similar functionality in feather here. |
Yes, nice. I can use R's internal serialize method to serialize each list element to a raw vector and compress with LZ4 and ZSTD from there (and then write to a |
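The idea described above could look roughly like the following sketch. It is not how fst implements anything: base R's `memCompress()` with gzip is used here purely as a stand-in for the LZ4 and ZSTD compressors, and the variable names are illustrative. Each list element is serialized to a raw vector and compressed on its own, so a single element can be restored without touching the others.

```r
some_list <- as.list(1:100)

# Serialize each element to a raw vector (connection = NULL returns raw).
raw_elems <- lapply(some_list, serialize, connection = NULL)

# Compress every raw vector independently so elements stay individually
# addressable. gzip via memCompress() is an assumed stand-in for LZ4/ZSTD.
packed <- lapply(raw_elems, memCompress, type = "gzip")

# Random access: restore only element 42.
elem_42 <- unserialize(memDecompress(packed[[42]], type = "gzip"))
```

Compressing per element trades some compression ratio for random access, since small blocks give the compressor less context to work with.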
I must confess that I'm not in a position to properly comment on this. Perhaps that @hadley, the author of |
Hi @jeroenjanssens, these last months, the core For that reason, I can't honor your request for storing the specific table type inside the
library(pryr)
library(tibble)
mem_used()
#> 34.2 MB
df <- data.frame(x = 1:100000000) # 400 MB vector
mem_used()
#> 434 MB
df_tibble <- tibble::as.tibble(df)
mem_used()
#> 435 MB
you can see that the cast to
address(df)
#> [1] "0x17ad2850"
address(df_tibble)
#> [1] "0x105706e0"
x_vec <- df$x
x_vec2 <- df_tibble$x
address(x_vec)
#> [1] "0x7ff5e7cb0010"
address(x_vec2)
#> [1] "0x7ff5e7cb0010"
Also, in terms of speed, the cast is very effective:
library(microbenchmark)
median(microbenchmark(
df <- as.tibble(df)
)$time)
#> [1] 3285
That's just over 3 microseconds for the cast (microbenchmark reports times in nanoseconds), which is very fast. To make a long story short, you can effectively get a
library(fst)
write_fst(df, "df.fst")
df_tibble <- as.tibble(read_fst("df.fst"))
Hope that will be sufficient for your purposes, thanks a lot for filing your feature request! |
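The suggestion above can be wrapped in a small helper. This is only a sketch: `read_fst_tibble` is a hypothetical name and not part of fst, and it assumes the fst and tibble packages are installed. It uses `as_tibble()`, the current spelling of `as.tibble()`, and passes any extra arguments through to `read_fst()`.

```r
library(fst)
library(tibble)

# Hypothetical convenience wrapper (not part of fst): read an fst file and
# return the result as a tibble. Extra arguments such as `columns`, `from`
# and `to` are forwarded to fst::read_fst().
read_fst_tibble <- function(path, ...) {
  as_tibble(read_fst(path, ...))
}

# Round trip through a temporary file.
path <- tempfile(fileext = ".fst")
write_fst(data.frame(x = 1:10, y = letters[1:10]), path)

# Read only column x, rows 2 to 5, as a tibble.
tbl <- read_fst_tibble(path, columns = "x", from = 2, to = 5)
class(tbl)
```

For data.table users, `read_fst()` already has an `as.data.table = TRUE` argument, so no wrapper is needed in that case.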
Thanks for getting back to this. This is a very reasonable solution. Thanks! |
Excellent package. I read that it supports data.tables. Would it be possible to also add support for reading FST files as tibbles?