Memory limit error for encrypt() #51

Closed
marianschmidt opened this issue Jun 10, 2022 · 3 comments

marianschmidt commented Jun 10, 2022

I am currently evaluating the performance of the cyphr package on large datasets. For data that compresses poorly (random strings), cyphr::encrypt() seems to hit a memory limit fairly quickly (10 M rows, 2 columns, one of which is a long string of 500 characters). The limit appears to be independent of available system RAM and OS: I tested with 8 GB, 16 GB, and 32 GB on Windows 10 and with 170 GB on a Linux cluster. In every case saveRDS() ran without problems, but cyphr::encrypt(saveRDS()) failed with an error.

In the reprex, about 3.5 GB of RAM is used according to the RStudio memory usage report, and the unencrypted compressed RDS file takes about 3.3 GB of storage.

This reprex takes about 3 minutes to run on a normal PC.

# packages
library(cyphr)
library(stringi)

# creating a data.frame with long random strings
rows <- 1E7
str_len <- 500 #length of strings
str_n <- 1000  #number of different strings
rand_strings <- stringi::stri_rand_strings(str_n, str_len)

large_data <- data.frame(
  id = 1:rows,
  year = sample(1980:2020, size = rows, replace = TRUE),
  long_str = sample(rand_strings, size = rows, replace = TRUE)
)

# To do anything we first need a key:
key <- cyphr::key_sodium(sodium::keygen())

# Save large file unencrypted to figure out compressed size
# saveRDS(large_data, "myfile.rds")
# fs::file_size("myfile.rds")
# this file is about 3.3 GB when written unencrypted to disk (standard compression of rds)


# Be careful: this command runs for about 3-10 minutes before the error is thrown
# Save large data with encryption
cyphr::encrypt(saveRDS(large_data, "myfile_encr.rds"), key)
#> Error in encrypt(msg, key()): lange Vektoren noch nicht unterstützt: memory.c:3887

# English translation of the error above (the session runs in a German locale):
# Error in encrypt(msg, key()) : long vectors not supported yet: memory.c:3887

Created on 2022-06-10 by the reprex package (v2.0.1)
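
For context, the error text points at R's long-vector limit rather than at the amount of RAM, which would fit the observation that the failure does not depend on the machine. A vector with more than 2^31 - 1 elements is a "long vector" in R, and C code that has not been adapted for long vectors rejects it. The sketch below uses only base R and assumes that the bytes handed to the encryption step are roughly as many as the 3.3 GB file on disk; that is an assumption, not something measured here.

```r
# Largest length a regular (non-long) R vector can have: 2^31 - 1 elements.
max_short_len <- .Machine$integer.max

# Approximate size in bytes of the unencrypted RDS file from the reprex.
# Assumption: the raw payload passed to encryption is about this large.
approx_bytes <- 3.3 * 1e9

# TRUE means a raw vector holding the payload would be a long vector.
is_long_vector <- approx_bytes > max_short_len
is_long_vector
#> [1] TRUE
```

Under that assumption the payload is about 1.5 times the limit, so adding RAM would not be expected to help.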

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  German_Germany.utf8
#>  ctype    German_Germany.utf8
#>  tz       Europe/Berlin
#>  date     2022-06-10
#>  pandoc   2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
#>  cyphr       * 1.1.2   2021-05-17 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.2.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.2.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  sodium        1.2.0   2021-10-21 [1] CRAN (R 4.2.0)
#>  stringi     * 1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble        3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/ga27jar/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@richfitz
Member

richfitz commented Jun 10, 2022

Thanks - I've made a PR into sodium (r-lib/sodium#22) that should fix this issue, I hope. Worth noting that running this may just run your machine out of memory though!

@marianschmidt
Author

Thanks a lot for fixing this so quickly. I really appreciate your help. Bumping the dependency to sodium >= 1.2.1 would fix this for cyphr.
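
A quick way to see whether a given sodium version meets that threshold is to compare version strings in base R. This is only a sketch: `has_fixed_sodium` is a hypothetical helper name, not part of cyphr or sodium, and it assumes that 1.2.1 is the first release containing the fix, as suggested above.

```r
# Hypothetical helper: TRUE if the given version string is at least 1.2.1.
has_fixed_sodium <- function(version) {
  package_version(version) >= package_version("1.2.1")
}

has_fixed_sodium("1.2.0")  # the sodium version listed in the session info
has_fixed_sodium("1.2.1")

# With sodium installed, the same check against the local library would be:
# has_fixed_sodium(as.character(utils::packageVersion("sodium")))
```

The first call should return FALSE and the second TRUE, since `package_version()` compares versions component by component rather than as plain strings.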

@richfitz
Member

This is on CRAN now (#52)
