# Rucio Playground Tutorial

This notebook walks you through setting up a complete **Rucio** test environment using Docker Compose. The playground includes 17 containers with XRootD servers, MinIO S3 storage, FTS (File Transfer Service), and a full Rucio stack for testing data replication and multi-hop transfers.

Rucio is a scientific data management framework used to organize, manage, and access large volumes of scientific data across distributed storage systems. It was originally developed for the ATLAS experiment at CERN and is now used across many scientific communities.

## What you will learn

1. **Environment setup** — generate certificates, start 17 Docker containers
2. **Storage registration** — configure XRootD and MinIO S3 as Rucio Storage Elements (RSEs)
3. **Credential management** — wire up S3 credentials for Rucio, FTS, and GFAL2
4. **Data upload** — create files, upload to RSEs, organize into datasets
5. **Replication policies** — declare where data should live using Rucio rules
6. **Automated transfers** — run the judge/conveyor pipeline to execute multi-hop transfers

## Prerequisites

- **Docker** (with Docker Compose support) installed locally
- The **Rucio dev environment** cloned (provides `etc/docker/dev/docker-compose-qt.yml`)
- This notebook should be run **from the root of the repository** so that relative paths resolve correctly

> **Note:** All Rucio CLI commands run inside Docker containers, so no local Rucio installation is required. The `%%bash` cells only need a Jupyter kernel and a Bash shell on the host.

## References

- [Rucio Documentation](https://rucio.cern.ch/documentation)
- [S3 RSE Configuration](https://rucio.github.io/documentation/operator/s3_rse_config/)
- [FTS3 S3 Support](https://fts3-docs.web.cern.ch/fts3-docs/docs/s3_support.html)
- [EGI Data Transfer Tutorial](https://docs.egi.eu/users/tutorials/adhoc/data-transfer-object-storage/)
- [Rucio K8s Tutorial](https://github.com/rucio/k8s-tutorial)
- [Rucio Docker Dev Environment](https://github.com/rucio/rucio/tree/master/etc/docker/dev)

---
## Environment Overview

The Docker Compose stack spins up 17 containers. The key ones you will interact with:

| Container | Service | Role |
|-----------|---------|------|
| `dev-rucio-1` | Rucio server | CLI commands, daemons, uploads |
| `dev-minio-1` | MinIO (port 9001) | S3 storage instance 1 |
| `dev-minio-2` | MinIO (port 9002) | S3 storage instance 2 |
| `dev-fts-1` | FTS3 | File transfer service |
| XRD1, XRD2, XRD3 | XRootD servers | Grid storage (XRD3 is the multihop intermediary) |

**Credentials** used throughout (demo only): `admin` / `password`

---
## Step 0 — Start the Environment

Generate TLS certificates for the MinIO instances and start the full Docker Compose stack (Rucio server, MinIO storage nodes, FTS3 transfer service, XRootD servers).

In [None]:
%%bash
etc/certs/generate_minio12.sh

In [None]:
%%bash
docker compose --file etc/docker/dev/docker-compose-qt.yml --profile storage up -d
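
Step 1 calls `docker exec` right after this cell, which can fail while the containers are still starting. The following is a minimal sketch of a polling helper for that gap. `retry` is a hypothetical name introduced here (it is not part of Docker or Rucio), and the attempt count and delay in the usage line are arbitrary.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (not run here): wait for the Rucio container to accept commands.
# retry 30 2 docker exec dev-rucio-1 true
```

The helper only confirms that a command can be executed in the container; it does not prove that every service inside the stack has finished initializing.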

---
## Step 1 — Initialize Rucio

Run Rucio's built-in initialization and test suite inside the main container. This bootstraps the database schema, creates the default account (`root`), and registers the initial set of RSEs (Rucio Storage Elements) and scopes.

> **Note:** This may take a few minutes as it runs the full init + test harness (`tools/run_tests.sh -ir`).

In [None]:
%%bash
docker exec -i dev-rucio-1 /bin/bash tools/run_tests.sh -ir

---
## Step 2 — Add HTTPS Protocol to XRD3 RSE

XRD3 is an XRootD storage element that is enabled for **multihop transfers**. We add an HTTPS protocol endpoint so it can act as an intermediary when transferring data to/from S3 storage (which speaks HTTPS, not XRootD native).

The `gfal.Default` implementation is used, with both LAN and WAN domains configured for read, write, delete, and third-party-copy operations.

In [None]:
%%bash
docker exec -i dev-rucio-1 /bin/bash <<END
rucio rse protocol add XRD3 --host xrd3 --scheme https --prefix //rucio --port 1096 --impl rucio.rse.protocols.gfal.Default --domain-json '{"wan": {"read": 2, "write": 2, "delete": 2, "third_party_copy_read": 2, "third_party_copy_write": 2}, "lan": {"read": 2, "write": 2, "delete": 2}}'
END
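
The `--domain-json` value is easy to mistype by hand. The numbers in it are priorities: 0 disables an operation, and among positive values a lower number is preferred. As a sketch, the hypothetical helper `domain_json` below (not a Rucio command) prints the same structure used in the cell above for a given priority.

```shell
# domain_json PRIORITY: print a --domain-json value that enables every
# operation at the same priority in both the WAN and LAN domains.
domain_json() {
  local p="$1"
  printf '{"wan": {"read": %d, "write": %d, "delete": %d, "third_party_copy_read": %d, "third_party_copy_write": %d}, "lan": {"read": %d, "write": %d, "delete": %d}}\n' \
    "$p" "$p" "$p" "$p" "$p" "$p" "$p" "$p"
}

# Usage (not run here): domain_json 2 reproduces the value passed for XRD3.
# rucio rse protocol add XRD3 ... --domain-json "$(domain_json 2)"
```

Running `rucio rse show XRD3` inside the container afterwards should list the new `https` protocol next to the existing ones.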

---
## Step 3 — Create MinIO S3 Buckets

Create a `rucio` bucket on each MinIO instance. MinIO provides S3-compatible object storage.

- **dev-minio-1** listens on port 9001
- **dev-minio-2** listens on port 9002

We use the MinIO Client (`mc`) inside each container to set up the alias and create the bucket. Credentials: `admin` / `password` (demo only).

In [None]:
%%bash
docker exec -i dev-minio-1 /bin/bash <<END
export MC_INSECURE=true
mc alias set local https://localhost:9001 admin password
mc admin info local
mc mb local/rucio
mc ls local/
END

In [None]:
%%bash
docker exec -i dev-minio-2 /bin/bash <<END
export MC_INSECURE=true
mc alias set local https://localhost:9002 admin password
mc admin info local
mc mb local/rucio
mc ls local/
END
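
The two cells above differ only in the container name and port. A possible way to avoid the duplication is to generate the script once and pipe it into each container. In this sketch, `bucket_script` is a hypothetical helper name. It also uses `mc mb --ignore-existing`, which is a change from the cells above so that re-running the step does not fail when the bucket already exists.

```shell
# bucket_script PORT: print the mc commands that create the `rucio` bucket
# on the MinIO instance listening on PORT.
bucket_script() {
  local port="$1"
  cat <<EOF
export MC_INSECURE=true
mc alias set local https://localhost:${port} admin password
mc admin info local
mc mb --ignore-existing local/rucio
mc ls local/
EOF
}

# Usage (not run here):
# bucket_script 9001 | docker exec -i dev-minio-1 /bin/bash
# bucket_script 9002 | docker exec -i dev-minio-2 /bin/bash
```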

---
## Step 4 — Register MinIO RSEs and Configure S3 Attributes

Register **MINIO1** and **MINIO2** as RSEs (Rucio Storage Elements) with:
- S3-compatible protocol (`gfal.NoRename` — S3 does not support rename operations)
- Signed URL support (`sign_url = s3`)
- Path-style S3 URLs (`s3_url_style = path`)
- FTS endpoint for third-party copy transfers
- Infinite storage quota for the `root` account

We also define **distances** between XRD3 and the MinIO RSEs (bidirectional, distance=1) to enable the transfer mesh.

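> **Understanding RSE Distance:** Rucio only considers a transfer between two RSEs when a distance is defined for that source → destination pair, and it prefers lower values when ranking candidates. Distances are directional, which is why each link is added twice. Defining them between the MinIO RSEs and XRD3 lets Rucio route multi-hop transfers through XRD3.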

Finally, the RSE S3 credentials (access key / secret key) are written to `/opt/rucio/etc/rse-accounts.cfg` keyed by RSE ID.

> **Ref:** [S3 RSE Configuration Guide](https://rucio.github.io/documentation/operator/s3_rse_config/)

In [None]:
%%bash
# Register RSEs, set attributes, define distances
docker exec -i dev-rucio-1 /bin/bash <<END
rucio rse add MINIO1
rucio rse protocol add MINIO1 --host minio1 --port 9001 --scheme https --prefix /rucio/ --impl rucio.rse.protocols.gfal.NoRename --domain-json '{"lan": {"read": 1, "write": 1, "delete": 1}, "wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}}'
rucio rse attribute add MINIO1 --key sign_url --value s3
rucio rse attribute add MINIO1 --key s3_url_style --value path
rucio rse attribute add MINIO1 --key verify_checksum --value False
rucio rse attribute add MINIO1 --key skip_upload_stat --value True
rucio rse attribute add MINIO1 --key strict_copy --value True
rucio rse attribute add MINIO1 --key fts --value https://fts:8446
rucio account limit add root --rse MINIO1 --bytes infinity

rucio rse add MINIO2
rucio rse protocol add MINIO2 --host minio2 --port 9002 --scheme https --prefix /rucio/ --impl rucio.rse.protocols.gfal.NoRename --domain-json '{"lan": {"read": 1, "write": 1, "delete": 1}, "wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}}'
rucio rse attribute add MINIO2 --key sign_url --value s3
rucio rse attribute add MINIO2 --key s3_url_style --value path
rucio rse attribute add MINIO2 --key verify_checksum --value False
rucio rse attribute add MINIO2 --key skip_upload_stat --value True
rucio rse attribute add MINIO2 --key strict_copy --value True
rucio rse attribute add MINIO2 --key fts --value https://fts:8446
rucio account limit add root --rse MINIO2 --bytes infinity

# XRD3 has HTTPS enabled (Step 2), so link it to the MinIO RSEs in both directions
rucio rse distance add XRD3 MINIO1 --distance 1
rucio rse distance add XRD3 MINIO2 --distance 1

rucio rse distance add MINIO1 XRD3 --distance 1
rucio rse distance add MINIO2 XRD3 --distance 1
END
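
Because each link needs a distance in both directions, the four `rucio rse distance add` lines follow a simple hub-and-spoke pattern. The sketch below generates them; `hub_distances` is a hypothetical helper name, and the fixed distance of 1 mirrors the cell above.

```shell
# hub_distances HUB SPOKE...: print the `rucio rse distance add` commands
# that link HUB to every SPOKE in both directions with distance 1.
hub_distances() {
  local hub="$1" spoke
  shift
  for spoke in "$@"; do
    printf 'rucio rse distance add %s %s --distance 1\n' "$hub" "$spoke"
    printf 'rucio rse distance add %s %s --distance 1\n' "$spoke" "$hub"
  done
}

# Usage (not run here):
# hub_distances XRD3 MINIO1 MINIO2 | docker exec -i dev-rucio-1 /bin/bash
```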

In [None]:
%%bash
# Write S3 credentials keyed by RSE ID
docker exec -i dev-rucio-1 /bin/bash <<'END'
ID1=$(rucio rse show MINIO1 | grep '^  id:' | awk '{print$2}')
ID2=$(rucio rse show MINIO2 | grep '^  id:' | awk '{print$2}')
cat >/opt/rucio/etc/rse-accounts.cfg <<JSON
{
  "$ID1": {
    "access_key": "admin",
    "secret_key": "password",
    "signature_version": "s3v4",
    "region": "us-east-1"
  },
  "$ID2": {
    "access_key": "admin",
    "secret_key": "password",
    "signature_version": "s3v4",
    "region": "us-east-1"
  }
}
JSON
END
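
If the ID lookup in the cell above returns nothing, the credentials file is still written, but with an empty key, and the problem only surfaces later during uploads. A small guard can make that failure visible early. This is a sketch: `rse_id_from_show` is a hypothetical helper, and it assumes the same `  id: <value>` line format that the `grep` above relies on.

```shell
# rse_id_from_show: read `rucio rse show` output on stdin and print the RSE ID.
# Fails with a message if no "  id: <value>" line is found.
rse_id_from_show() {
  local id
  id=$(awk '/^  id:/ {print $2; exit}')
  if [ -z "$id" ]; then
    echo "could not find an RSE id in the input" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Usage inside the Rucio container (not run here):
# ID1=$(rucio rse show MINIO1 | rse_id_from_show) || exit 1
# ID2=$(rucio rse show MINIO2 | rse_id_from_show) || exit 1
```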

---
## Step 5 — Configure FTS3 Cloud Storage Credentials

Register the MinIO S3 endpoints as cloud storage in the **FTS3** transfer service and supply the access credentials. This allows FTS3 to perform third-party-copy transfers between S3 endpoints.

> **What is FTS?** FTS (File Transfer Service) is the middleware responsible for executing actual data transfers between storage endpoints. It handles authentication, retries, checksums, and provides monitoring. Rucio delegates the physical movement of data to FTS.

We also write a `gfal2` S3 configuration file so the transfer agent knows how to authenticate.

> **Ref:** [FTS3 S3 Support](https://fts3-docs.web.cern.ch/fts3-docs/docs/s3_support.html) · [EGI Data Transfer Tutorial](https://docs.egi.eu/users/tutorials/adhoc/data-transfer-object-storage/)

In [None]:
%%bash
docker exec -i dev-fts-1 /bin/bash <<'END'
# Register cloud storage endpoints in FTS3
curl \
  --cert /etc/grid-security/hostcert.pem \
  --key /etc/grid-security/hostkey.pem \
  --capath /etc/grid-security/certificates \
  https://fts:8446/config/cloud_storage \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"storage_name":"S3:minio1"}'

curl \
  --cert /etc/grid-security/hostcert.pem \
  --key /etc/grid-security/hostkey.pem \
  --capath /etc/grid-security/certificates \
  https://fts:8446/config/cloud_storage \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"user_dn":"/CN=Rucio User","storage_name":"S3:minio1","access_token":"admin","access_token_secret":"password"}'


curl \
  --cert /etc/grid-security/hostcert.pem \
  --key /etc/grid-security/hostkey.pem \
  --capath /etc/grid-security/certificates \
  https://fts:8446/config/cloud_storage \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"storage_name":"S3:minio2"}'

curl \
  --cert /etc/grid-security/hostcert.pem \
  --key /etc/grid-security/hostkey.pem \
  --capath /etc/grid-security/certificates \
  https://fts:8446/config/cloud_storage \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{"user_dn":"/CN=Rucio User","storage_name":"S3:minio2","access_token":"admin","access_token_secret":"password"}'

# Write gfal2 S3 configuration
cat >/etc/gfal2.d/s3.conf <<INI
[S3:MINIO1]
ACCESS_KEY=admin
SECRET_KEY=password
REGION=us-east-1
ALTERNATE=true

[S3:MINIO2]
ACCESS_KEY=admin
SECRET_KEY=password
REGION=us-east-1
ALTERNATE=true
INI
END
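
Each endpoint needs two POST requests: one that registers the storage name and one that attaches credentials for a user DN. The sketch below factors that pattern out of the four `curl` calls above. The function names (`fts_payloads`, `fts_post`, `register_s3_endpoint`) are hypothetical, while the URL, certificate paths, and JSON fields are copied from the cell above.

```shell
# fts_payloads STORAGE KEY SECRET: print the two JSON bodies for one endpoint,
# one per line (registration first, then credentials).
fts_payloads() {
  local storage="$1" key="$2" secret="$3"
  printf '{"storage_name":"%s"}\n' "$storage"
  printf '{"user_dn":"/CN=Rucio User","storage_name":"%s","access_token":"%s","access_token_secret":"%s"}\n' \
    "$storage" "$key" "$secret"
}

# fts_post BODY: send one JSON body to the FTS3 cloud storage endpoint.
fts_post() {
  curl \
    --cert /etc/grid-security/hostcert.pem \
    --key /etc/grid-security/hostkey.pem \
    --capath /etc/grid-security/certificates \
    https://fts:8446/config/cloud_storage \
    -H "Content-Type: application/json" \
    -X POST \
    -d "$1"
}

# register_s3_endpoint STORAGE KEY SECRET: post both bodies in order and
# stop at the first request that curl itself reports as failed.
register_s3_endpoint() {
  local body
  fts_payloads "$@" | while IFS= read -r body; do
    fts_post "$body" || exit 1
  done
}

# Usage inside the FTS container (not run here):
# register_s3_endpoint S3:minio1 admin password
# register_s3_endpoint S3:minio2 admin password
```

Without further options, `curl` exits with 0 on an HTTP error response, so the check above only catches connection-level failures.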

---
## Step 6 — Create Test Files and Upload to RSEs

Generate two 10 MB random files and upload them to different RSEs:
- `file5` → **MINIO1**
- `file6` → **MINIO2**

Then create a Rucio **dataset** (`test:dataset9`) and attach both files to it. A dataset is a named collection of files; files and datasets alike are addressed by a data identifier (DID) of the form `scope:name`.

> **Ref:** [K8s Tutorial — Create transfer testing data](https://github.com/rucio/k8s-tutorial#create-initial-transfer-testing-data)

In [None]:
%%bash
docker exec -i dev-rucio-1 /bin/bash <<END
dd if=/dev/urandom of=file5 bs=10M count=1
dd if=/dev/urandom of=file6 bs=10M count=1

rucio upload --rse MINIO1 --scope test file5
rucio upload --rse MINIO2 --scope test file6

rucio did add --type dataset test:dataset9
rucio did content add -to test:dataset9 test:file5 test:file6
END
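
With `bs=10M count=1`, `dd` issues a single read, so each file should be exactly 10 MiB (10485760 bytes). A quick size check guards against uploading a truncated test file. The helper below is a sketch with a hypothetical name (`make_random_file`); it uses `head -c` instead of `dd`, which is a substitution made here for portability.

```shell
# make_random_file PATH MIB: write MIB mebibytes of random data to PATH
# and fail if the resulting file does not have the expected size.
make_random_file() {
  local path="$1" mib="$2" want got
  want=$((mib * 1024 * 1024))
  head -c "$want" /dev/urandom > "$path"
  got=$(wc -c < "$path")
  if [ "$got" -ne "$want" ]; then
    echo "$path: expected $want bytes, got $got" >&2
    return 1
  fi
}

# Usage inside the Rucio container (not run here):
# make_random_file file5 10 && rucio upload --rse MINIO1 --scope test file5
```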

---
## Step 7 — Create Replication Rules

Replication rules tell Rucio **where** data should exist. Rucio's rule engine will then figure out *how* to get it there.

We create two rules:
- `test:dataset9` → 1 copy on **XRD1** (will trigger S3→XRootD transfers via multihop through XRD3)
- `test:dataset2` → 1 copy on **MINIO2** (this dataset is not created in this notebook; it is assumed to exist from the Step 1 initialization)

In [None]:
%%bash
docker exec -i dev-rucio-1 /bin/bash <<END
rucio rule add test:dataset9 --copies 1 --rses XRD1
rucio rule add test:dataset2 --copies 1 --rses MINIO2
END

---
## Step 8 — Execute the Transfer Pipeline

Run the Rucio **judge and conveyor pipeline** to carry out the transfers that the rules above require. In production, these run as long-lived daemons. Here we run each stage once:

1. **`rucio-judge-evaluator`** — Evaluates replication rules and creates transfer requests
2. **`rucio-conveyor-submitter`** — Submits transfer requests to FTS3
3. **`rucio-conveyor-poller`** — Polls FTS3 for transfer completion status
4. **`rucio-conveyor-finisher`** — Finalizes completed transfers and updates the replica catalog

We list the rules before and after to observe the state change.

> **Tip:** You may need to **run this cell multiple times**. Transfers take time and the poller/finisher need to catch up with FTS3. Re-run until you see all rule states change to `OK`.

In [None]:
%%bash
docker exec -i dev-rucio-1 /bin/bash <<END
rucio rule list --account root
rucio-judge-evaluator --run-once
rucio-conveyor-submitter --run-once
rucio-conveyor-poller --run-once --older-than 0
rucio-conveyor-finisher --run-once

rucio rule list --account root
END
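
Instead of re-running the cell by hand, the four stages can be repeated in a loop until no rule is still replicating. The following is a sketch under stated assumptions: the function names are hypothetical, the pass count and delay are arbitrary, and `rules_pending` assumes that `rucio rule list` prints the state name `REPLICATING` for unfinished rules.

```shell
# rules_pending: succeed if the rule listing on stdin still contains a rule
# in the REPLICATING state.
rules_pending() {
  grep -q 'REPLICATING'
}

# run_pipeline_once: the same four daemon invocations as the cell above.
run_pipeline_once() {
  docker exec -i dev-rucio-1 /bin/bash <<'END'
rucio-judge-evaluator --run-once
rucio-conveyor-submitter --run-once
rucio-conveyor-poller --run-once --older-than 0
rucio-conveyor-finisher --run-once
END
}

# wait_for_rules MAX DELAY: run the pipeline up to MAX times, DELAY seconds
# apart, and stop as soon as no rule is listed as REPLICATING.
wait_for_rules() {
  local max="$1" delay="$2" i out
  for i in $(seq 1 "$max"); do
    run_pipeline_once
    out=$(docker exec dev-rucio-1 rucio rule list --account root) || return 1
    if ! printf '%s\n' "$out" | rules_pending; then
      echo "no rule in REPLICATING state after $i pass(es)"
      return 0
    fi
    sleep "$delay"
  done
  echo "rules still replicating after $max passes" >&2
  return 1
}

# Usage (not run here): wait_for_rules 10 30
```

The loop only tells you that nothing is replicating any more. Check the final `rucio rule list` output to confirm that the rules actually reached `OK`.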

---
## Inspect & Debug

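Before tearing down, you can run a few read-only commands inside the Rucio container to inspect the state of your rules, replicas, and RSEs. For ad hoc exploration, open an interactive shell from a terminal with `docker exec -it dev-rucio-1 /bin/bash`.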

In [None]:
%%bash
# Check rule states, replica locations, and RSE configuration
docker exec -i dev-rucio-1 /bin/bash <<END
echo "=== Rules ==="
rucio rule list --account root

echo ""
echo "=== Replicas for dataset9 ==="
rucio list-file-replicas test:dataset9

echo ""
echo "=== Replicas for dataset2 ==="
rucio list-file-replicas test:dataset2

echo ""
echo "=== RSE info ==="
rucio rse show MINIO1
rucio rse show MINIO2
END

---
## Teardown

Stop all containers, then prune stopped containers and unused volumes to free disk space.

> **Warning:** This destroys all state. You will need to re-run from Step 0 to start again.
>
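> **Volume Persistence:** `docker volume prune -f` only removes *unused* volumes, and on Docker Engine 23.0 and later it skips named volumes unless `--all` is given. The Rucio and FTS database volumes may therefore persist between runs. To fully clean up: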
> ```
> docker volume rm dev_vol-ruciodb-data1 2>/dev/null
> docker volume rm dev_vol-ftsdb-mysql1 2>/dev/null
> ```

In [None]:
%%bash
docker compose --file etc/docker/dev/docker-compose-qt.yml --profile storage down
docker container prune -f
docker volume prune -f

---
## Appendix: Multi-Hop Transfer Flow

The playground demonstrates multi-hop transfers where data flows from MinIO S3 through XRD3 to XRootD destinations:

```
User creates replication rule
        │
        ▼
Judge Evaluator processes rule
        │
        ▼
Conveyor submits transfer to FTS
        │
        ▼
FTS orchestrates multi-hop transfer:
   MINIO1/2 ──(S3→HTTPS)──▶ XRD3 ──(XRootD)──▶ XRD1/2
```

XRD3 acts as the **multihop intermediary** because it speaks both HTTPS (needed for S3) and native XRootD protocol. Without the HTTPS protocol added in Step 2, Rucio would have no path between the S3 and XRootD worlds.
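
To see why the route goes through XRD3, it helps to treat the RSE distances as a directed graph and look for the cheapest path. The sketch below is an illustration only, not Rucio's routing code. `shortest_path` is a hypothetical helper, and the XRD3 ↔ XRD1 distances in the example are assumed to have been set up by the Step 1 initialization.

```shell
# shortest_path SRC DST: read "FROM TO DISTANCE" lines on stdin and print
# the cheapest route from SRC to DST with its total distance.
shortest_path() {
  awk -v src="$1" -v dst="$2" '
    { from[NR] = $1; to[NR] = $2; w[NR] = $3; nodes[$1]; nodes[$2] }
    END {
      dist[src] = 0; path[src] = src
      n = 0; for (v in nodes) n++
      # Bellman-Ford: relax every edge up to n-1 times.
      for (i = 1; i < n; i++)
        for (e = 1; e <= NR; e++)
          if ((from[e] in dist) && (!(to[e] in dist) || dist[from[e]] + w[e] < dist[to[e]])) {
            dist[to[e]] = dist[from[e]] + w[e]
            path[to[e]] = path[from[e]] " -> " to[e]
          }
      if (dst in dist) print path[dst] " (total distance " dist[dst] ")"
      else { print "no route from " src " to " dst; exit 1 }
    }'
}

# Distances from Step 4 plus the assumed XRD3 <-> XRD1 links.
shortest_path MINIO1 XRD1 <<'EOF'
MINIO1 XRD3 1
XRD3 MINIO1 1
XRD3 XRD1 1
XRD1 XRD3 1
EOF
# prints: MINIO1 -> XRD3 -> XRD1 (total distance 2)
```

Removing the two MINIO1 ↔ XRD3 lines from the input makes the function report that no route exists, which mirrors what happens when the Step 4 distances are missing.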