Is it possible to encrypt arrow files using the cyphr package?
#50
Comments
Hi @marianschmidt; the assumption made by Options here are:
Hi @richfitz; Thanks a lot for your prompt reply and sharing possible solutions.
# packages
library(cyphr)
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
# To do anything we first need a key:
key <- cyphr::key_sodium(sodium::keygen())
# Register new method for arrow::write_dataset()
cyphr::rewrite_register("arrow", "write_dataset", "path")
ls(cyphr:::db)
#> [1] "arrow::write_dataset" "base::load" "base::readLines"
#> [4] "base::readRDS" "base::save" "base::saveRDS"
#> [7] "base::writeLines" "readxl::read_excel" "readxl::read_xls"
#> [10] "readxl::read_xlsx" "utils::read.csv" "utils::read.csv2"
#> [13] "utils::read.delim" "utils::read.delim2" "utils::read.table"
#> [16] "utils::write.csv" "utils::write.csv2" "utils::write.table"
#> [19] "writexl::write_xlsx"
# arrow::write_dataset() without encryption is working
# both for partitioned and unpartitioned parquet files
arrow::write_dataset(iris, "myfile_arrow_part", partitioning = c("Species"))
list.files("myfile_arrow_part", recursive = TRUE)
#> [1] "Species=setosa/part-0.parquet" "Species=versicolor/part-0.parquet"
#> [3] "Species=virginica/part-0.parquet"
arrow::write_dataset(iris, "myfile_arrow")
list.files("myfile_arrow")
#> [1] "part-0.parquet"
# Trying to encrypt with cyphr fails with a "Permission denied" error
cyphr::encrypt(write_dataset(iris, "myfile_encrypt_part", partitioning = c("Species")),
key)
#> Warning in file(con, "rb"): cannot open file 'C:
#> \Users\ga27jar\AppData\Local\Temp\RtmpyED7sT\myfile_encrypt_part20f83dde1b0f':
#> Permission denied
#> Error in file(con, "rb"): cannot open the connection
#> Warning in file.remove(paths[ok]): cannot remove file 'C:
#> \Users\ga27jar\AppData\Local\Temp\RtmpyED7sT\myfile_encrypt_part20f83dde1b0f',
#> reason 'Permission denied'
# The problem persists when writing small data without partitioning
cyphr::encrypt(write_dataset(iris, "myfile_encrypt"),
key)
#> Warning in file(con, "rb"): cannot open file 'C:
#> \Users\ga27jar\AppData\Local\Temp\RtmpyED7sT\myfile_encrypt20f844c072c':
#> Permission denied
#> Error in file(con, "rb"): cannot open the connection
#> Warning in file.remove(paths[ok]): cannot remove file 'C:
#> \Users\ga27jar\AppData\Local\Temp\RtmpyED7sT\myfile_encrypt20f844c072c',
#> reason 'Permission denied'
Created on 2022-06-10 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.3 (2022-03-10)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2022-06-10
#> pandoc 2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> arrow * 8.0.0 2022-05-09 [1] CRAN (R 4.1.3)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.2)
#> bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.2)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.2)
#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.1.3)
#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.1.3)
#> cyphr * 1.1.2 2021-05-17 [1] CRAN (R 4.1.2)
#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.2)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> dplyr 1.0.9 2022-04-28 [1] CRAN (R 4.1.3)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.2)
#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.1.3)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.2)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.2)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.2)
#> knitr 1.39 2022-04-26 [1] CRAN (R 4.1.3)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.2)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.1.3)
#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.2)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.2)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.2)
#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.3)
#> rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.1.3)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.2)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> sodium 1.2.0 2021-10-21 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.2)
#> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.1.3)
#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.1.3)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.2)
#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.1.3)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2)
#> xfun 0.31 2022-05-10 [1] CRAN (R 4.1.3)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
#>
#> [1] C:/Users/ga27jar/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.3/library
#>
#> ------------------------------------------------------------------------------
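The errors in the reprex above are consistent with cyphr::encrypt() expecting the wrapped function to write a single file, whereas arrow::write_dataset() creates a directory of files. That reading is an assumption and is not confirmed in this thread. Under that assumption, one possible workaround is to write the dataset unencrypted to a temporary directory and then encrypt each resulting file with cyphr::encrypt_file(). The sketch below is only an illustration: the helpers write_dataset_encrypted() and decrypt_dataset_dir() are hypothetical names, not part of cyphr or arrow, and the approach leaves an unencrypted copy on disk until the temporary directory is removed.

```r
library(cyphr)
library(arrow)

key <- cyphr::key_sodium(sodium::keygen())

# Hypothetical helper (not part of cyphr or arrow): write the dataset
# unencrypted to a temporary directory, then encrypt every file into `dest`,
# preserving the partition sub-directories.
write_dataset_encrypted <- function(data, dest, key, ...) {
  tmp <- tempfile("dataset_plain_")
  # Remove the unencrypted copy when the function exits
  on.exit(unlink(tmp, recursive = TRUE), add = TRUE)
  arrow::write_dataset(data, tmp, ...)
  files <- list.files(tmp, recursive = TRUE)
  for (f in files) {
    out <- file.path(dest, f)
    dir.create(dirname(out), recursive = TRUE, showWarnings = FALSE)
    cyphr::encrypt_file(file.path(tmp, f), key, dest = out)
  }
  invisible(files)
}

# Hypothetical helper: decrypt every file under `src` into a new directory
# and return that directory so it can be opened with arrow::open_dataset().
decrypt_dataset_dir <- function(src, key, dest = tempfile("dataset_dec_")) {
  files <- list.files(src, recursive = TRUE)
  for (f in files) {
    out <- file.path(dest, f)
    dir.create(dirname(out), recursive = TRUE, showWarnings = FALSE)
    cyphr::decrypt_file(file.path(src, f), key, dest = out)
  }
  dest
}

# Usage: encrypt a partitioned copy of iris, then read it back
dest <- file.path(tempdir(), "myfile_encrypt_part")
write_dataset_encrypted(iris, dest, key, partitioning = "Species")
ds <- arrow::open_dataset(decrypt_dataset_dir(dest, key))
```

The encrypted files keep the .parquet names but are no longer readable as parquet until decrypted. Because the decrypted directory is again plain text on disk, it should be removed with unlink() once the data has been read.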
Hi, I have been experimenting with the cyphr package and have hit the memory limit with large .RData files. As an alternative, the arrow package offers partitioning of large data when writing files. I tried to register a new method for arrow::write_dataset(), but using it with cyphr::encrypt() results in a "Permission denied" error (cyphr's other built-in write functions work, however). A reprex with iris is below.
Created on 2022-06-09 by the reprex package (v2.0.1)
Session info