I am currently trying to assess the performance of the cyphr package on large datasets. For data that does not compress well (random strings), cyphr::encrypt() quickly seems to hit some memory limit (10 M rows, with one column holding 500-character strings). The limit appears to be independent of available system RAM and OS: I tested with 8 GB, 16 GB, and 32 GB on Windows 10 and with 170 GB on a Linux cluster. In every case saveRDS() ran without problems, but cyphr::encrypt(saveRDS()) failed with an error.
In the reprex, about 3.5 GB of RAM is used according to the RStudio memory usage report, and the unencrypted compressed RDS file takes about 3.3 GB of storage.
This reprex takes about 3 minutes to run on a normal PC.
# packages
library(cyphr)
library(stringi)
# creating a data.frame with long random strings
rows <- 1E7
str_len <- 500  # length of strings
str_n <- 1000   # number of different strings
rand_strings <- stringi::stri_rand_strings(str_n, str_len)
large_data <- data.frame(
  id = 1:rows,
  year = sample(1980:2020, size = rows, replace = TRUE),
  long_str = sample(rand_strings, size = rows, replace = TRUE)
)

# To do anything we first need a key:
key <- cyphr::key_sodium(sodium::keygen())

# Save large file unencrypted to figure out compressed size
# saveRDS(large_data, "myfile.rds")
# fs::file_size("myfile.rds")
# this file is about 3.3 GB when written unencrypted to disk (standard compression of rds)

# be careful, running this command will take about 3-10 minutes, before error is thrown
# Save large data with encryption
cyphr::encrypt(saveRDS(large_data, "myfile_encr.rds"), key)
#> Error in encrypt(msg, key()): lange Vektoren noch nicht unterstützt: memory.c:3887
# --> Error: Error in encrypt(msg, key()) : long vectors not supported yet: memory.c:3887
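A rough check of the sizes involved hints at where the error may come from. R treats any vector with more than 2^31 - 1 elements as a "long vector", and C-level code has to opt in to handling them. Assuming cyphr reads the serialized file into a single raw vector before handing it to sodium (an assumption about the internals, not something confirmed in this thread), a 3.3 GB file would cross that threshold no matter how much RAM is installed. A minimal sketch of the arithmetic:

```r
# Largest length a regular (non-long) R vector can have: 2^31 - 1
limit <- .Machine$integer.max

# Approximate size of the compressed RDS file reported above (about 3.3 GB),
# taken as a byte count; one byte corresponds to one element of a raw vector
file_bytes <- 3.3e9

# TRUE means the raw vector holding the file would be a long vector
exceeds_limit <- file_bytes > limit
exceeds_limit
```

If this reading is right, the failure depends on the size of the serialized object rather than on available memory, which would match the behaviour seen across the different machines.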
Thanks - I've made a PR into sodium (r-lib/sodium#22) that should fix this issue, I hope. Worth noting that running this may just run your machine out of memory though!
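Until a fixed sodium is available, one possible workaround is to write the data in row chunks so that each serialized piece stays well below 2^31 - 1 bytes. The sketch below is only an illustration: `split_rows()` and `write_chunks()` are hypothetical helpers, not part of cyphr, and the chunk size has to be chosen by the user. The `writer` argument is any function taking an object and a path, so the chunking logic can be tried with plain saveRDS() first.

```r
# Hypothetical helper: start/end row indices for consecutive chunks
split_rows <- function(n_rows, chunk_rows) {
  stopifnot(n_rows >= 1, chunk_rows >= 1)
  starts <- seq(1, n_rows, by = chunk_rows)
  ends <- pmin(starts + chunk_rows - 1, n_rows)
  data.frame(start = starts, end = ends)
}

# Hypothetical helper: write each chunk of `data` with `writer(object, path)`
# and return the file paths in chunk order
write_chunks <- function(data, chunk_rows, prefix, writer) {
  ranges <- split_rows(nrow(data), chunk_rows)
  paths <- sprintf("%s_%03d.rds", prefix, seq_len(nrow(ranges)))
  for (i in seq_len(nrow(ranges))) {
    rows_i <- ranges$start[i]:ranges$end[i]
    writer(data[rows_i, , drop = FALSE], paths[i])
  }
  paths
}

# Small demonstration with an unencrypted writer
demo <- data.frame(id = 1:10, value = letters[1:10])
demo_paths <- write_chunks(
  demo, chunk_rows = 4,
  prefix = file.path(tempdir(), "demo_chunk"),
  writer = function(x, path) saveRDS(x, path)
)
restored <- do.call(rbind, lapply(demo_paths, readRDS))

# With cyphr, the writer could wrap saveRDS() in encrypt() as in the reprex
# (untested assumption that this works from inside a function):
# writer <- function(x, path) cyphr::encrypt(saveRDS(x, path), key)
```

Reading the data back would then mean decrypting each file and combining the pieces with rbind(), at the cost of managing several files instead of one.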
Created on 2022-06-10 by the reprex package (v2.0.1)