write_fstrds and read_fstrds functions #210

Closed
kendonB opened this issue Aug 22, 2019 · 2 comments

kendonB commented Aug 22, 2019

I followed this page: https://www.r-bloggers.com/multi-threaded-lz4-and-zstd-compression-from-r/

It would be amazing to have a drop-in replacement for saveRDS and readRDS that uses fst compression. I wrote the functions below, which seem to work, though they may benefit from some optimization. Could the fst package incorporate something like this?

library(fst)

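# Serialize any R object, compress the bytes with fst, and write them to `file`.
write_fstrds <- function(object, file, compressor = "ZSTD", 
                         compression = 0, hash = FALSE){
  # serialize() with connection = NULL returns the object as a raw vector
  raw_vector = serialize(object, NULL)
  raw_vector_compressed = fst::compress_fst(raw_vector, compressor = compressor, 
                                            compression = compression, hash = hash)
  file_connection = file(file, "wb")
  # Close the connection even if writeBin() fails
  on.exit(close.connection(file_connection))
  writeBin(raw_vector_compressed, file_connection)
}

# Read a file written by write_fstrds() and restore the original object.
read_fstrds <- function(file){
  file_connection = file(file, "rb")
  on.exit(close.connection(file_connection))
  # Read the whole file into memory as a raw vector
  raw_vector_compressed = readBin(con = file_connection, 
                                  what = "raw", n = file.size(file))
  unserialize(fst::decompress_fst(raw_vector_compressed))
}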

write_fstrds(iris, "iris.fstrds")
identical(iris, read_fstrds("iris.fstrds"))
#> [1] TRUE
file.remove("iris.fstrds")
#> [1] TRUE

Created on 2019-08-23 by the reprex package (v0.3.0)

kendonB commented Aug 23, 2019

Looks like this already exists: https://github.com/traversc/qs

kendonB closed this as completed Aug 23, 2019

MarcusKlik (Collaborator) commented

Hi @kendonB, thanks for reading my post and sharing your code!

Yes, the qs package is probably what you are looking for. It uses LZ4 and ZSTD compression for in-memory and on-disk serialization of general R objects.
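For reference, a minimal sketch of the qs equivalent of the round trip in the original post, assuming the qs package is installed. It uses `qsave`/`qread` for on-disk serialization and `qserialize`/`qdeserialize` for the in-memory case; the temporary file path is only for illustration.

```r
library(qs)

# On-disk: write the object to a temporary file and read it back
path <- tempfile(fileext = ".qs")
qsave(iris, path)
restored <- qread(path)
identical(iris, restored)

# In-memory: serialize to a raw vector and restore it without touching disk
raw_bytes <- qserialize(iris)
identical(iris, qdeserialize(raw_bytes))

# Clean up the temporary file
file.remove(path)
```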

Your code also shows the steps required to implement list columns in fst. Each list element first needs to be serialized (on the master thread), after which the result can be compressed and written to disk (on background threads). As with character columns, the master thread requirement will slow the serialization of list columns, but it would certainly be a nice feature to have (and the list columns would have full random access like the other types).
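The per-element steps can be sketched in plain R with the existing `compress_fst` and `decompress_fst` functions. This only illustrates the idea and is not how fst stores columns internally: everything here runs sequentially in the R session, and `list_column` and `read_element` are hypothetical names made up for the example.

```r
library(fst)

# A stand-in for a list column: elements of different types and sizes
list_column <- list(1:10, letters, list(a = 1, b = "x"), head(iris))

# Step 1: serialize each element separately into its own raw vector
serialized <- lapply(list_column, serialize, connection = NULL)

# Step 2: compress each serialized element independently of the others
compressed <- lapply(serialized, compress_fst,
                     compressor = "ZSTD", compression = 0)

# Random access: restore a single element without decompressing the rest
read_element <- function(blocks, i) {
  unserialize(decompress_fst(blocks[[i]]))
}

third <- read_element(compressed, 3)
identical(third, list_column[[3]])
```

Because each element is compressed on its own, any one of them can be restored without reading the others, which is the random-access property mentioned above.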

(see also #174 and #20)

Thanks!
