diff --git a/install/reproducibility.qmd b/install/reproducibility.qmd
index ab98d8e..8cb916d 100644
--- a/install/reproducibility.qmd
+++ b/install/reproducibility.qmd
@@ -54,3 +54,104 @@ prefix <- ifelse (.Platform$OS.type == "windows", "file:///", "file://")
 repos <- paste0(prefix, normalizePath(snapshot, "/"))
 install.packages(c("V8", "mongolite"), repos = repos)
 ```
+
+## Mirroring a universe {#mirror}
+
+As an alternative to snapshots, you can use [Rclone](https://rclone.org/) to mirror a universe.
+
+### Configuration
+
+[Rclone](https://rclone.org/) can bypass the R-universe zip archive API and incrementally download the individual files from a universe.
+After [installing Rclone](https://rclone.org/install/), use a terminal command to configure [Rclone](https://rclone.org/) to use the R-universe [S3](https://rclone.org/s3/) API:
+
+```bash
+rclone config create r-universe s3 \
+  list_version=2 force_path_style=false \
+  endpoint=https://r-universe.dev provider=Other
+```
+
+Then, register an individual universe as an [Rclone remote](https://rclone.org/remote_setup/).
+For example, let's configure the `maelle` universe.
+We run an `rclone config` command that chooses `maelle` as the universe and `maelle-universe` as the alias that future [Rclone](https://rclone.org/) commands will use:
+
+```bash
+rclone config create maelle-universe alias remote=r-universe:maelle
+```
+
+`rclone config show` should now show the following contents:^[Rclone configuration is stored in an `rclone.conf` text file located at the path returned by `rclone config file`.]
+
+```
+[r-universe]
+type = s3
+list_version = 2
+force_path_style = false
+endpoint = https://r-universe.dev
+provider = Other
+
+[maelle-universe]
+type = alias
+remote = r-universe:maelle
+```
+
+### Local downloads
+
+After configuration, [Rclone](https://rclone.org/) can download from the universe you configured.
+The following [`rclone copy`](https://rclone.org/commands/rclone_copy/) command downloads all the package files from the `maelle` universe to a local folder called `local_folder_name`, accelerating the process with up to 8 parallel checkers and 8 parallel file transfers:^[See the Rclone documentation for details on the command line arguments.]
+
+```bash
+rclone copy maelle-universe: local_folder_name \
+  --ignore-size --progress --checkers 8 --transfers 8
+```
+
+The full contents of the universe are now available locally:
+
+```r
+fs::dir_tree("local_folder_name", recurse = FALSE)
+#> local_folder_name
+#> ├── bin
+#> └── src
+```
+
+```r
+fs::dir_tree("local_folder_name/src", recurse = TRUE)
+#> local_folder_name/src
+#> └── contrib
+#>     ├── PACKAGES
+#>     ├── PACKAGES.gz
+#>     ├── cransays_0.0.0.9000.tar.gz
+#>     ├── glitter_0.2.999.tar.gz
+#>     └── roblog_0.1.0.tar.gz
+```
+
+### Remote mirroring
+
+You may wish to mirror a universe remotely on, say, an [Amazon S3](https://aws.amazon.com/s3) bucket or a [Cloudflare R2](https://www.cloudflare.com/developer-platform/products/r2/)^[Cloudflare has its own Rclone documentation.] bucket.
+For [Cloudflare R2](https://www.cloudflare.com/developer-platform/products/r2/), you will need to give [Rclone](https://rclone.org/) the credentials of the bucket.
+
+```bash
+rclone config create cloudflare-remote s3 \
+  provider=Cloudflare \
+  access_key_id=YOUR_CLOUDFLARE_ACCESS_KEY_ID \
+  secret_access_key=YOUR_CLOUDFLARE_SECRET_ACCESS_KEY \
+  endpoint=https://YOUR_CLOUDFLARE_ACCOUNT_ID.r2.cloudflarestorage.com \
+  acl=private \
+  no_check_bucket=true
+```
+
+Then, you can copy files directly from the universe to a bucket:^[To upload to a specific prefix inside a bucket, you can replace `cloudflare-remote:YOUR_BUCKET_NAME` with `cloudflare-remote:YOUR_BUCKET_NAME/YOUR_PREFIX`.]
+
+```bash
+rclone copy maelle-universe: cloudflare-remote:YOUR_BUCKET_NAME \
+  --ignore-size --progress --checkers 8 --transfers 8
+```
+
+This command downloads each package file locally from the universe and uploads it to the bucket.
+Although each file passes through your local computer in transit, the packages are never all stored on local disk at once.
+This makes it feasible to mirror large universes, which is why [R-multiverse](https://r-multiverse.org) uses this pattern to [create production snapshots](https://github.com/r-multiverse/staging/blob/main/.github/workflows/snapshot.yaml).
+
+### Partial uploads
+
+To upload only part of a universe, you can supply [Rclone filtering](https://rclone.org/filtering/) options.
+If you do, you should also manually edit the `PACKAGES` and `PACKAGES.gz` files in `bin/` and `src/contrib` so that they list only the packages you uploaded.
+`PACKAGES` is written in [Debian Control Format](https://www.debian.org/doc/debian-policy/ch-controlfields.html) (DCF), and `PACKAGES.gz` is a [`gzip`](https://www.gzip.org/) archive of `PACKAGES`.
+The [`read.dcf()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/dcf.html) and [`write.dcf()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/dcf.html) functions in base R read and write DCF files, and [`R.utils::gzip()`](https://henrikbengtsson.github.io/R.utils/reference/compressFile.html) creates [`gzip`](https://www.gzip.org/) archives.
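
The index edit described under "Partial uploads" could be sketched in shell roughly as follows. This is a minimal illustration, not the method the PR prescribes: it swaps in `awk` and the `gzip` command line tool for `read.dcf()`, `write.dcf()`, and `R.utils::gzip()`. The helper name `filter_packages_index`, the temporary demo directory, and the pared-down index entries are hypothetical, and the sketch assumes each package record is separated by a blank line and carries a `Package:` field.

```shell
#!/bin/sh
# Sketch only: keep selected packages in a PACKAGES index and rebuild PACKAGES.gz.
# filter_packages_index is a hypothetical helper, not part of Rclone or R-universe.

# Usage: filter_packages_index DIR "pkg1 pkg2 ..."
filter_packages_index() {
  dir=$1
  keep=$2
  # RS="" switches awk to paragraph mode, so each blank-line-separated DCF
  # record is one awk record; FS="\n" makes each line of the record a field.
  awk -v keep="$keep" '
    BEGIN {
      RS = ""; FS = "\n"
      n = split(keep, names, " ")
      for (i = 1; i <= n; i++) wanted[names[i]] = 1
    }
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^Package:/) {
          name = $i
          sub(/^Package:[ \t]*/, "", name)
          sub(/[ \t]+$/, "", name)
          # Print the whole record, separated from the previous one by a blank line.
          if (name in wanted) { printf "%s%s", sep, $0; sep = "\n\n" }
          break
        }
      }
    }
    END { if (sep != "") printf "\n" }
  ' "$dir/PACKAGES" > "$dir/PACKAGES.tmp"
  mv "$dir/PACKAGES.tmp" "$dir/PACKAGES"
  # Recompress so PACKAGES.gz matches the edited PACKAGES.
  gzip -c "$dir/PACKAGES" > "$dir/PACKAGES.gz"
}

# Demo on a throwaway directory; real index entries carry many more fields.
demo_dir=$(mktemp -d)
cat > "$demo_dir/PACKAGES" <<'EOF'
Package: cransays
Version: 0.0.0.9000

Package: glitter
Version: 0.2.999

Package: roblog
Version: 0.1.0
EOF
filter_packages_index "$demo_dir" "glitter roblog"
cat "$demo_dir/PACKAGES"
```

In practice the helper would presumably be run once per directory that holds a `PACKAGES` file (for example `local_folder_name/src/contrib`), using the same package list that drives the Rclone filter.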