From 113541ca37e0c02a1b31cc8db19861e8d6118af2 Mon Sep 17 00:00:00 2001
From: Will Landau <1580860+wlandau@users.noreply.github.com>
Date: Tue, 24 Jun 2025 09:58:08 -0400
Subject: [PATCH 1/5] Update reproducibility.qmd

---
 install/reproducibility.qmd | 69 +++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/install/reproducibility.qmd b/install/reproducibility.qmd
index ab98d8e..fc936b0 100644
--- a/install/reproducibility.qmd
+++ b/install/reproducibility.qmd
@@ -54,3 +54,72 @@ prefix <- ifelse (.Platform$OS.type == "windows", "file:///", "file://")
 repos <- paste0(prefix, normalizePath(snapshot, "/"))
 install.packages(c("V8", "mongolite"), repos = repos)
 ```
+
+## Mirroring a universe {#mirror}
+
+As an alternative to snapshots, one can mirror a universe with [Rclone](https://rclone.org/).
+
+### Configuration
+
+[Rclone](https://rclone.org/) can bypass the R-universe zip archive API and incrementally download the individual files from a universe.
+After [installing Rclone](https://rclone.org/install/), use a terminal command to configure [Rclone](https://rclone.org/) to use the R-universe [S3](https://rclone.org/s3/) API:
+
+```bash
+rclone config create r-universe s3 \
+  list_version=2 force_path_style=false \
+  endpoint=https://r-universe.dev provider=Other
+```
+
+Then, register an individual universe as an [Rclone remote](https://rclone.org/remote_setup/).
+For example, let's configure .
+We run an `rclone config` command that chooses `maelle` as the universe and `maelle-universe` as the alias that future [Rclone](https://rclone.org/) commands will use:
+
+```bash
+rclone config create maelle-universe alias remote=r-universe:maelle
+```
+
+`rclone config show` should now show the following contents:^[Rclone configuration is stored in an `rclone.conf` text file located at the path returned by `rclone config file`.]
+
+```
+[r-universe]
+type = s3
+list_version = 2
+force_path_style = false
+endpoint = https://r-universe.dev
+provider = Other
+
+[maelle-universe]
+type = alias
+remote = r-universe:maelle
+```
+
+### Local downloads
+
+After configuration, [Rclone](https://rclone.org/) can download from the universe you configured.
+The following [`rclone copy`](https://rclone.org/commands/rclone_copy/) command downloads all the package files from to a local folder called `local_foldder_name`:
+
+```bash
+rclone copy maelle-universe: local_folder_name \
+  --ignore-size --progress \
+  --checkers 8 --transfers 8
+```
+
+The full contents are available:
+
+```r
+fs::dir_tree("local_folder_name", recurse = FALSE)
+#> local_folder_name
+#> ├── bin
+#> └── src
+```
+
+```r
+fs::dir_tree("local_folder_name/src", recurse = TRUE)
+#> local_folder_name/src
+#> └── contrib
+#>     ├── PACKAGES
+#>     ├── PACKAGES.gz
+#>     ├── cransays_0.0.0.9000.tar.gz
+#>     ├── glitter_0.2.999.tar.gz
+#>     └── roblog_0.1.0.tar.gz
+```
From ab730acff96293d61dea1b3577b20555233d081b Mon Sep 17 00:00:00 2001
From: Will Landau <1580860+wlandau@users.noreply.github.com>
Date: Tue, 24 Jun 2025 10:17:41 -0400
Subject: [PATCH 2/5] Update reproducibility.qmd

---
 install/reproducibility.qmd | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/install/reproducibility.qmd b/install/reproducibility.qmd
index fc936b0..5d4d0c3 100644
--- a/install/reproducibility.qmd
+++ b/install/reproducibility.qmd
@@ -100,8 +100,7 @@ The following [`rclone copy`](https://rclone.org/commands/rclone_copy/) command
 
 ```bash
 rclone copy maelle-universe: local_folder_name \
-  --ignore-size --progress \
-  --checkers 8 --transfers 8
+  --ignore-size --progress --checkers 8 --transfers 8
 ```
 
 The full contents are available:
@@ -123,3 +122,32 @@ fs::dir_tree("local_folder_name/src", recurse = TRUE)
 #>     ├── glitter_0.2.999.tar.gz
 #>     └── roblog_0.1.0.tar.gz
 ```
+
+### Remote mirroring
+
+You may wish to mirror a universe remotely on, say, an [Amazon S3](https://aws.amazon.com/s3) bucket or a [CloudFlare R2](https://www.cloudflare.com/developer-platform/products/r2/)^[Cloudflare has its own Rclone documentation at .] bucket.
+For [CloudFlare R2](https://www.cloudflare.com/developer-platform/products/r2/), you will need to give [Rclone](https://rclone.org/) the credentials of the bucket.
+
+```bash
+rclone config create cloudflare-remote s3 \
+  provider=Cloudflare \
+  access_key_id=YOUR_CLOUDFLARE_ACCESS_KEY_ID \
+  secret_access_key=YOUR_CLOUDFLARE_SECRET_ACCESS_KEY \
+  endpoint=https://YOUR_CLOUDFLARE_ACCOUNT_ID.r2.cloudflarestorage.com \
+  acl=private \
+  no_check_bucket=true
+```
+
+Then, you can copy files directly from the universe to a bucket:^[To upload to a specific prefix inside a bucket, you can replace `cloudflare-remote:YOUR_BUCKET_NAME` with `cloudflare-remote:YOUR_BUCKET_NAME/YOUR_PREFIX`]
+
+```bash
+rclone copy maelle-universe: cloudflare-remote:YOUR_BUCKET_NAME \
+  --ignore-size --progress --checkers 8 --transfers 8
+```
+
+### Partial uploads
+
+To only upload part of a universe, you can supply [Rclone filtering](https://rclone.org/filtering/) commands.
+If you do, it is recommended to also manually edit the `PACKAGES` and `PACKAGES.gz` files in `bin/` and `src/contrib`.
+`PACKAGES` is written in [Debian Control Format](https://www.debian.org/doc/debian-policy/ch-controlfields.html) (DCF), and `PACKAGES.gz` is a [`gzip`](https://www.gzip.org/) archive of `PACKAGES`.
+The [`read.dcf()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/dcf.html) and [`write.dcf()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/dcf.html) functions in base R read and write DCF files, and [`R.utils::gzip()`](https://henrikbengtsson.github.io/R.utils/reference/compressFile.html) creates [`gzip`](https://www.gzip.org/) archives.
From 30c8735db313fc2abafb78df089fc61d4e443035 Mon Sep 17 00:00:00 2001
From: Will Landau <1580860+wlandau@users.noreply.github.com>
Date: Tue, 24 Jun 2025 10:26:05 -0400
Subject: [PATCH 3/5] Update reproducibility.qmd

---
 install/reproducibility.qmd | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/install/reproducibility.qmd b/install/reproducibility.qmd
index 5d4d0c3..8cb916d 100644
--- a/install/reproducibility.qmd
+++ b/install/reproducibility.qmd
@@ -57,7 +57,7 @@ install.packages(c("V8", "mongolite"), repos = repos)
 
 ## Mirroring a universe {#mirror}
 
-As an alternative to snapshots, one can mirror a universe with [Rclone](https://rclone.org/).
+As an alternative to snapshots, you can use [Rclone](https://rclone.org/) to mirror a universe.
 
 ### Configuration
 
@@ -96,7 +96,7 @@ remote = r-universe:maelle
 ### Local downloads
 
 After configuration, [Rclone](https://rclone.org/) can download from the universe you configured.
-The following [`rclone copy`](https://rclone.org/commands/rclone_copy/) command downloads all the package files from to a local folder called `local_foldder_name`:
+The following [`rclone copy`](https://rclone.org/commands/rclone_copy/) command downloads all the package files from to a local folder called `local_folder_name`, accelerating the process with up to 8 parallel checkers and 8 parallel file transfers:^[See and for documentation on the command line arguments.]
 
 ```bash
 rclone copy maelle-universe: local_folder_name \
@@ -145,6 +145,10 @@ rclone copy maelle-universe: cloudflare-remote:YOUR_BUCKET_NAME \
   --ignore-size --progress --checkers 8 --transfers 8
 ```
 
+This command downloads each package file locally from and uploads it to the bucket.
+But although packages go through your local computer in transit, at no point are all packages stored locally on disk.
+This makes it feasible to mirror large universes, which is why [R-multiverse](https://r-multiverse.org) uses this pattern to [create production snapshots](https://github.com/r-multiverse/staging/blob/main/.github/workflows/snapshot.yaml).
+
 ### Partial uploads
 
 To only upload part of a universe, you can supply [Rclone filtering](https://rclone.org/filtering/) commands.
From e5b9dd4f013508b5c23228d52227aa6808df300b Mon Sep 17 00:00:00 2001
From: Noam Ross
Date: Wed, 12 Nov 2025 19:18:08 -0500
Subject: [PATCH 4/5] Add basic info on S3 API to mirroring example

---
 install/reproducibility.qmd | 46 ++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/install/reproducibility.qmd b/install/reproducibility.qmd
index 8cb916d..319d1d7 100644
--- a/install/reproducibility.qmd
+++ b/install/reproducibility.qmd
@@ -55,14 +55,44 @@ repos <- paste0(prefix, normalizePath(snapshot, "/"))
 install.packages(c("V8", "mongolite"), repos = repos)
 ```
 
-## Mirroring a universe {#mirror}
+### Using the S3 API {#s3}
 
-As an alternative to snapshots, you can use [Rclone](https://rclone.org/) to mirror a universe.
+R-universe also exposes a partial [S3-compatible API](https://docs.aws.amazon.com/AmazonS3/latest/API/Type_API_Reference.html) that you can use to list, download, or mirror package files.
 
-### Configuration
+In R, you can use the [{paws}](https://paws-r-sdk.github.io/) package to access the S3 API. Note that it requires setting the
+`s3_virtual_address` configuration option to `TRUE` and anonymous credentials:
 
-[Rclone](https://rclone.org/) can bypass the R-universe zip archive API and incrementally download the individual files from a universe.
-After [installing Rclone](https://rclone.org/install/), use a terminal command to configure [Rclone](https://rclone.org/) to use the R-universe [S3](https://rclone.org/s3/) API:
+```r
+library(paws)
+client <- paws::s3(
+  config = list(
+    endpoint = "https://r-universe.dev",
+    s3_virtual_address = TRUE
+  ),
+  credentials = list(anonymous = TRUE),
+  region = "any_value_here_works"
+)
+all_files <- client$list_objects_v2(Bucket = "jeroen")
+sapply(all_files$Contents, \(x) x$Key) |>
+  head()
+client$download_file(
+  Bucket = "jeroen",
+  Key = "src/contrib/RAppArmor_3.2.5.tar.gz",
+  Filename = "RAppArmor_3.2.5.tar.gz"
+)
+
+```
+
+Other tools for accessing S3-compatible APIs can also be used, such as {s3fs} or {aws.s3} in R. In the command line, the AWS CLI or [Rclone](https://rclone.org/) (see below) can be used.
+
+
+### Example: Mirroring a universe with Rclone {#mirror}
+
+[R-Multiverse](https://r-multiverse.org/) uses [Rclone](https://rclone.org/) to efficiently mirror a universe, incrementally downloading only the files that have changed since the last mirror.
+
+#### Configuration
+
+After [installing Rclone](https://rclone.org/install/), use a terminal command to configure Rclone for R-Universe:
 
 ```bash
 rclone config create r-universe s3 \
@@ -93,9 +123,9 @@ type = alias
 remote = r-universe:maelle
 ```
 
-### Local downloads
+#### Local downloads
 
-After configuration, [Rclone](https://rclone.org/) can download from the universe you configured.
+After configuration, Rclone can download from the universe you configured.
 The following [`rclone copy`](https://rclone.org/commands/rclone_copy/) command downloads all the package files from to a local folder called `local_folder_name`, accelerating the process with up to 8 parallel checkers and 8 parallel file transfers:^[See and for documentation on the command line arguments.]
 
 ```bash
@@ -149,7 +179,7 @@ This command downloads each package file locally from
Date: Thu, 13 Nov 2025 14:31:21 -0500
Subject: [PATCH 5/5] Address @maelle review comments

---
 install/reproducibility.qmd | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/install/reproducibility.qmd b/install/reproducibility.qmd
index 319d1d7..77640b2 100644
--- a/install/reproducibility.qmd
+++ b/install/reproducibility.qmd
@@ -4,7 +4,9 @@ title: "Reproducibility"
 
 ## Does R-universe archive old versions of packages? How does it work with renv?
 
-R-universe does not archive old versions of packages, but it **tracks the upstream git URL and commit ID** in the R package description. This allows tools like `renv` to restore packages in environments that were installed from R-universe. For more details, see this tech note: [How renv restores packages from r-universe for reproducibility or production](https://ropensci.org/blog/2022/01/06/runiverse-renv/).
+R-universe does not archive old versions of packages, but it **tracks the upstream git URL and commit ID** in the R package description.
+This allows tools like `renv` to restore packages in environments that were installed from R-universe.
+For more details, see this tech note: [How renv restores packages from r-universe for reproducibility or production](https://ropensci.org/blog/2022/01/06/runiverse-renv/).
 
 You can also **archive fixed versions of a universe** for production or reproducibility, using what we call [repository snapshots](#snapshots).
 
@@ -59,8 +61,8 @@ install.packages(c("V8", "mongolite"), repos = repos)
 
 R-universe also exposes a partial [S3-compatible API](https://docs.aws.amazon.com/AmazonS3/latest/API/Type_API_Reference.html) that you can use to list, download, or mirror package files.
 
-In R, you can use the [{paws}](https://paws-r-sdk.github.io/) package to access the S3 API. Note that it requires setting the
-`s3_virtual_address` configuration option to `TRUE` and anonymous credentials:
+In R, you can use the [{paws}](https://paws-r-sdk.github.io/) package to access the S3 API.
+Note that this requires using the _virtual addressing_ scheme, where `r-universe.dev` is the endpoint and the universe name is the bucket.
 
 ```r
 library(paws)
@@ -70,20 +72,23 @@ client <- paws::s3(
     s3_virtual_address = TRUE
   ),
   credentials = list(anonymous = TRUE),
-  region = "any_value_here_works"
+  # A region is required for API compatibility, but is not used
+  region = "us-east-1"
 )
 all_files <- client$list_objects_v2(Bucket = "jeroen")
 sapply(all_files$Contents, \(x) x$Key) |>
   head()
 client$download_file(
-  Bucket = "jeroen",
-  Key = "src/contrib/RAppArmor_3.2.5.tar.gz",
+  # Bucket is the universe name
+  Bucket = "jeroen",
+  # Key is the path to the file
+  Key = "src/contrib/RAppArmor_3.2.5.tar.gz",
   Filename = "RAppArmor_3.2.5.tar.gz"
 )
 
 ```
 
-Other tools for accessing S3-compatible APIs can also be used, such as {s3fs} or {aws.s3} in R. In the command line, the AWS CLI or [Rclone](https://rclone.org/) (see below) can be used.
+Outside of R, tools such as the AWS CLI or [Rclone](https://rclone.org/) (see below) can be used to access the S3 API.
 
 
 ### Example: Mirroring a universe with Rclone {#mirror}
@@ -155,7 +160,7 @@ fs::dir_tree("local_folder_name/src", recurse = TRUE)
 
 ### Remote mirroring
 
-You may wish to mirror a universe remotely on, say, an [Amazon S3](https://aws.amazon.com/s3) bucket or a [CloudFlare R2](https://www.cloudflare.com/developer-platform/products/r2/)^[Cloudflare has its own Rclone documentation at .] bucket.
+You may wish to mirror a universe remotely on, say, an [Amazon S3](https://aws.amazon.com/s3) bucket or a [CloudFlare R2](https://www.cloudflare.com/developer-platform/products/r2/)^[Cloudflare has [its own Rclone documentation](https://developers.cloudflare.com/r2/examples/rclone/).] bucket.
 For [CloudFlare R2](https://www.cloudflare.com/developer-platform/products/r2/), you will need to give [Rclone](https://rclone.org/) the credentials of the bucket.
 
 ```bash