# 04-d1 Generative computer-vision pattern with Ray Train
This notebook builds a **mini diffusion pipeline** on the **Food-101-Lite** dataset and runs it end-to-end on an Anyscale cluster with **Ray Train V2**.

### What you learn and take away  
* How to use **Ray Data** to decode and preprocess large image datasets in parallel  
* How to split and shard datasets for **distributed training** across multiple Ray workers  
* How to wrap a custom `LightningModule` with Ray Train to scale out **PyTorch code without boilerplate**  
* How to **enable fault tolerance** by saving and restoring model checkpoints with `ray.train.report()`  
* How to run training and evaluation with **no changes to your core model code** as Ray handles multi-node orchestration  
* How to generate images post-training using the same Ray-hosted environment  

## What problem are you solving? (Diffusion as image de-noising)

You’re training a **generative model** that learns to produce realistic Red-Green-Blue (RGB) images from pure noise  
by learning how to *reverse* a noising process.

This approach builds on **de-noising diffusion models**: instead of modeling the full image distribution $p(x)$ directly,  
teach the model to reverse a *known* corruption process that gradually adds noise to clean images.

---

## Input: Images as tensors

Each training example is a 3-channel RGB image:

$$
x_0 \in [-1, 1]^{3 \times H \times W}
$$

Normalize pixel values to \[-1, 1\] and train on **Food-101-Lite**, a small subset of Food-101 (the first 10 % of the training split).
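
As a minimal sketch of that normalization (NumPy only; the helper name `to_model_range` is hypothetical and not part of this notebook), the mapping from 8-bit pixels to $[-1, 1]$ could look like this:

```python
import numpy as np

def to_model_range(pixels_uint8: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values in [0, 255] to float32 values in [-1, 1]."""
    scaled = pixels_uint8.astype(np.float32) / 255.0   # [0, 1]
    return (scaled - 0.5) / 0.5                        # [-1, 1]

# A 2x2 RGB image in H x W x 3 layout, covering the extreme and middle values.
example = np.array(
    [[[0, 0, 0], [255, 255, 255]],
     [[128, 128, 128], [64, 64, 64]]],
    dtype=np.uint8,
)
normalized = to_model_range(example)
```

Black pixels map to -1 and white pixels map to 1, which matches the range of the Gaussian noise more closely than raw \[0, 255\] values would.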

---

## Forward process: adding noise

During training, sample a timestep $t \in \{0, \dots, T{-}1\}$  
and inject Gaussian noise into the image:

$$\varepsilon \sim \mathcal{N}(0, 1), \quad x_{t} = x_0 + \varepsilon$$

The model sees $x_{t}$ and must learn to recover the corrupting noise $\varepsilon$.
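
The forward step can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: the 8 × 8 image is random stand-in data, and, as in the formula above, the noise magnitude doesn't depend on $t$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for a clean image x_0 in [-1, 1], with shape 3 x H x W.
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8)).astype(np.float32)

# Forward step used in this notebook: add unit-variance Gaussian noise.
eps = rng.standard_normal(size=x0.shape).astype(np.float32)
x_t = x0 + eps

# The regression target is eps itself, which equals x_t - x0.
recovered = x_t - x0
```

Because the corruption is a plain addition, knowing $\varepsilon$ is equivalent to knowing the clean image, which is why predicting the noise is a useful training target.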

---

## Training objective

Train a convolutional network $f_\theta$ to predict the noise:

$$\mathcal{L} = \mathbb{E}_{x_0, \varepsilon, t}\ \big\|f_\theta(x_{t}, t) - \varepsilon\big\|_2^2$$

This is a **Mean Squared Error (MSE) loss**, and it encourages the model to de-noise corrupted images.
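
To get a feel for the scale of this loss, the NumPy sketch below compares two hypothetical predictors: one that recovers the noise exactly and one that always outputs zero. The helper `noise_mse` is illustrative and averages over all elements, as `nn.MSELoss` does by default.

```python
import numpy as np

def noise_mse(pred_noise: np.ndarray, true_noise: np.ndarray) -> float:
    """Mean squared error between predicted and true noise."""
    return float(np.mean((pred_noise - true_noise) ** 2))

rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(size=(16, 3, 32, 32))

perfect_loss = noise_mse(eps, eps)                # predictor that recovers eps exactly
zero_loss = noise_mse(np.zeros_like(eps), eps)    # predictor that always outputs 0
```

The all-zero predictor scores close to 1.0, the variance of unit Gaussian noise, so a loss near 1.0 suggests the network hasn't learned anything useful yet.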

---

## Reverse diffusion: sampling new images

At generation time, start from pure noise $x_T \sim \mathcal{N}(0, 1)$ and step backward:

$$x_{t} \leftarrow x_{t} - \eta \cdot f_\theta(x_{t}, t), \quad t = T{-}1, \dots, 0$$

After $T$ steps, $x_0$ is a fully generated image — a sample from the learned data distribution.
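
The update rule can be exercised without a trained network. The sketch below assumes a hypothetical oracle predictor that knows the clean image and reports `x - target` as the noise; a real model only approximates this.

```python
import numpy as np

def reverse_diffusion(predict_noise, shape, steps=50, eta=0.1, seed=0):
    """Start from Gaussian noise and repeatedly subtract eta * predicted noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(size=shape)
    for t in reversed(range(steps)):          # t = steps-1, ..., 0
        x = x - eta * predict_noise(x, t)
    return x

# Hypothetical oracle: treats everything that differs from the target as noise.
target = np.full((3, 4, 4), 0.25)

def oracle(x, t):
    return x - target

sample = reverse_diffusion(oracle, shape=target.shape)
max_error = float(np.abs(sample - target).max())
```

With this oracle, each step shrinks the remaining error by a factor of $(1 - \eta)$, so after 50 steps with $\eta = 0.1$ only about 0.5 % of the initial error is left.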

---

## Why this works

- Diffusion models sidestep unstable Generative Adversarial Network (GAN) training and can model complex, multimodal image distributions  
- The forward process stays fixed and simple (just add noise), which makes the learning problem tractable  
- At inference time, sampling becomes iterative de-noising — easy to debug, modify, and extend

### How to migrate this diffusion-policy workload to a distributed setup using Ray on Anyscale

This tutorial walks through the end-to-end process of **migrating a local image-based diffusion policy to a distributed Ray cluster running on Anyscale**.

Here’s how you make that transition:

1. **Local Joint Photographic Experts Group (JPEG) → Distributed Ray Dataset**  
   Preprocess and store Food-101 images as Parquet, then use **Ray Data** to load and decode the dataset in parallel across the cluster. Each worker gets its own shard, streamed efficiently for GPU training.

2. **Single-GPU PyTorch → Multi-node Distributed Training**  
   Wrap your Lightning model in a Ray Train `train_loop`, then launch distributed training using **TorchTrainer** with 8 GPU workers—each operating on its own data partition with no manual coordination.

3. **Manual Checkpoints → Automatic Fault Tolerance**  
   Save a checkpoint after every epoch using `ray.train.report(checkpoint=...)`, and configure Ray to **auto-resume from the most recent checkpoint** if a job fails or you relaunch it.

4. **Manual Data Management → Declarative Scaling with Ray**  
   Instead of slicing data or managing worker processes yourself, declare your intent with `ScalingConfig`, `CheckpointConfig`, and `FailureConfig`, and let **Ray + Anyscale handle the orchestration**.

5. **Single-node Sampling → Remote Inference Tasks**  
   After training, run **reverse diffusion sampling** from the best checkpoint. You can wrap the same sampler in Ray tasks on GPU nodes to scale post-training inference or build a lightweight visual demo.

This pattern transforms a simple single-node PyTorch loop into a **scalable, fault-tolerant, multi-node training pipeline** with just a few lines of Ray-specific code, and it runs seamlessly on any cluster provisioned with Anyscale.
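
To make "each worker gets its own shard" concrete, here is a plain-Python illustration of splitting rows across 8 workers. It's a conceptual sketch only: `shard_rows` and the row count are hypothetical, and Ray Data streams blocks to workers rather than assigning fixed index ranges like this.

```python
def shard_rows(num_rows: int, num_workers: int) -> list:
    """Split row indices into contiguous, near-equal shards, one per worker."""
    base, extra = divmod(num_rows, num_workers)
    shards, start = [], 0
    for rank in range(num_workers):
        size = base + (1 if rank < extra else 0)   # first `extra` workers get one more row
        shards.append(range(start, start + size))
        start += size
    return shards

# Hypothetical training set of 6,060 rows spread over 8 workers.
shards = shard_rows(num_rows=6060, num_workers=8)
sizes = [len(s) for s in shards]
```

Every row lands in exactly one shard and shard sizes differ by at most one row, which keeps the per-worker workload balanced.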


## 1. Imports and setup  
Pull in standard Python utilities, Ray (core, Data, Train, Lightning), and PyTorch Lightning.  
Make sure the Anyscale cluster runs Ray ≥ 2.48. The setup cell below also sets `RAY_TRAIN_V2_ENABLED=1` so that Ray Train V2 semantics are enabled.

In [1]:
# 00. Runtime setup — install same deps and set env vars
import os, sys, subprocess

# Non-secret env var (safe to set here)
os.environ["RAY_TRAIN_V2_ENABLED"] = "1"

# Install Python dependencies (same pinned versions as build.sh)
subprocess.check_call([
    sys.executable, "-m", "pip", "install", "--no-cache-dir",
    "torch==2.8.0",
    "torchvision==0.23.0",
    "matplotlib==3.10.6",
    "pyarrow==14.0.2",
    "datasets==2.19.2",
    "lightning==2.5.5",
])



0

In [2]:
# 01. Imports

# Standard libraries
import os, io, json, shutil
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image

# Ray
import ray, ray.data
from ray.train import ScalingConfig, get_context, RunConfig, FailureConfig, CheckpointConfig, Checkpoint, get_checkpoint
from ray.train.torch import TorchTrainer
from ray.train.lightning import RayLightningEnvironment

# PyTorch / Lightning
import lightning.pytorch as pl
import torch
from torch import nn

# Dataset
from datasets import load_dataset
import pyarrow as pa
import pyarrow.parquet as pq
from tqdm import tqdm  
from torchvision.transforms import Compose, Resize, CenterCrop
import random

## 2. Load 10 % of Food-101  
Next, grab the first 10 % of the Food-101 training split (roughly 7 500 images) with a single call to `load_dataset`. This trimmed subset trains quickly while still being large enough to demonstrate Ray’s scaling behavior.

NOTE: skip cells 02-05 if the dataset is already downloaded, as this is the same dataset as in tutorial 04a.

In [None]:
# 02. Load 10% of food101 (~7,500 images)
ds = load_dataset("food101", split="train[:10%]") 

## 3. Resize and encode images  
Preprocess each image: resize so the shorter side is 256 pixels, center-crop to 224 × 224 pixels (the size expected by most ImageNet models), and then convert the result to raw JPEG bytes. By storing bytes instead of full Python Imaging Library (PIL) objects, you keep the dataset compact and Parquet-friendly.
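
The geometry of that resize-then-crop step can be sketched in plain Python. The helper below is hypothetical and only approximates what `Resize(256)` followed by `CenterCrop(224)` does; exact rounding in torchvision may differ by a pixel.

```python
def resize_then_center_crop(width: int, height: int, resize_to: int = 256, crop: int = 224):
    """Return the resized (width, height) and the crop box (left, top, right, bottom)."""
    # Scale so the shorter side equals `resize_to`, keeping the aspect ratio.
    if width <= height:
        new_w, new_h = resize_to, int(resize_to * height / width)
    else:
        new_w, new_h = int(resize_to * width / height), resize_to
    # Take a crop x crop window from the middle of the resized image.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)

# Example: a 512 x 384 landscape photo.
resized, box = resize_then_center_crop(width=512, height=384)
```

For the 512 × 384 example, the image is first scaled to 341 × 256, and the crop then keeps the central 224 × 224 window, discarding the left and right margins.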

In [None]:
# 03. Resize + encode as JPEG bytes
transform = Compose([Resize(256), CenterCrop(224)])
records = []

for example in tqdm(ds, desc="Preprocessing images", unit="img"):
    try:
        img = transform(example["image"])
        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        records.append({
            "image_bytes": buf.getvalue(),
            "label": example["label"]
        })
    except Exception as e:
        # Skip images that fail to transform or encode, but report them.
        print(f"Skipping one image: {e}")
        continue

## 4. Visual sanity check  
Before committing to hours of training, take nine random samples and plot them with their class names. This quick inspection lets you confirm that images are correctly resized and preprocessed.

In [None]:
# 04. Visualize the dataset

label_names = ds.features["label"].names  # maps int → string

samples = random.sample(records, 9)

fig, axs = plt.subplots(3, 3, figsize=(8, 8))
fig.suptitle("Sample Resized Images from food101-lite", fontsize=16)

for ax, rec in zip(axs.flatten(), samples):
    img = Image.open(io.BytesIO(rec["image_bytes"]))
    label_name = label_names[rec["label"]]
    ax.imshow(img)
    ax.set_title(label_name)
    ax.axis("off")

plt.tight_layout()
plt.show()

## 5. Persist to Parquet  
Now, write the images and labels to a Parquet file. Because Parquet is columnar, you can read just the columns you need during training, which speeds up I/O, especially when multiple workers are reading in parallel under Ray.

In [None]:
# 05. Write Dataset to Parquet

output_dir = "/mnt/cluster_storage/food101_lite/parquet_256"
os.makedirs(output_dir, exist_ok=True)

table = pa.Table.from_pydict({
    "image_bytes": [r["image_bytes"] for r in records],
    "label": [r["label"] for r in records]
})
pq.write_table(table, os.path.join(output_dir, "shard_0.parquet"))

print(f"Wrote {len(records)} records to {output_dir}")

## 6. Load and decode with Ray Data  
Read the Parquet shard into a **Ray Dataset**, decode the JPEG bytes to **Channel-Height-Width (CHW) float32 tensors**, scale to \[-1, 1\], and keep only the decoded `image` column (unconditional training doesn't need the JPEG bytes or the labels).  
Because `decode_and_normalize` is stateless, the default **task-based** execution is sufficient.

In [None]:
# 06. Load & Decode Food-101-Lite

# Path to Parquet shards written earlier
PARQUET_PATH = "/mnt/cluster_storage/food101_lite/parquet_256"

# Read the Parquet files (≈7 500 rows with JPEG bytes + label)
ds = ray.data.read_parquet(PARQUET_PATH)
print("Raw rows:", ds.count())

# Decode JPEG → CHW float32 in [‑1, 1]

def decode_and_normalize(batch_df):
    """Decode JPEG bytes and scale to [-1, 1]."""
    images = []
    for b in batch_df["image_bytes"]:
        img = Image.open(io.BytesIO(b)).convert("RGB")
        arr = np.asarray(img, dtype=np.float32) / 255.0       # H × W × 3, 0‑1
        arr = (arr - 0.5) / 0.5                               # ‑1 … 1
        arr = arr.transpose(2, 0, 1)                          # 3 × H × W (CHW)
        images.append(arr)
    return {"image": images}

# Apply in parallel
#   batch_format="pandas" → batch_df is a DataFrame, return dict of lists.
#   default task‑based compute is sufficient for a stateless function.

ds = ds.map_batches(
    decode_and_normalize,
    batch_format="pandas",
    # Use the default (task‑based) compute strategy since `decode_and_normalize` is a plain function.
    num_cpus=1,
)

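# `decode_and_normalize` returns only the `image` column, so the JPEG bytes and
# labels are normally gone already; this guard drops them if they're still present.
if "image_bytes" in ds.schema().names:
    ds = ds.drop_columns(["image_bytes", "label"])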

print("Decoded rows:", ds.count())

## 7. Shuffle and Train/Val split  
Perform a reproducible shuffle, then split 80 % / 20 % into `train_ds` and `val_ds`.  
Each split remains a first-class Ray Dataset, enabling distributed, sharded DataLoaders later on.
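
A shuffle is reproducible only if its seed is fixed. The plain-Python sketch below illustrates the idea on row indices; `shuffle_and_split` is a hypothetical helper, the seed value 42 is arbitrary, and Ray Data does the equivalent work in a distributed way.

```python
import random

def shuffle_and_split(num_rows: int, train_fraction: float = 0.8, seed: int = 42):
    """Shuffle row indices with a fixed seed, then cut once at the train boundary."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)          # same seed -> same order
    cut = int(num_rows * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, val_idx = shuffle_and_split(num_rows=1000)
train_again, _ = shuffle_and_split(num_rows=1000)
```

Running the helper twice with the same seed yields the same training indices, so no row moves between the training and validation sets across runs.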

In [None]:
# 07. Shuffle & Train/Val Split

# Typical 80 / 20 split
TOTAL = ds.count()
train_count = int(TOTAL * 0.8)
ds = ds.random_shuffle(seed=42)  # fixed seed keeps the shuffle reproducible
train_ds, val_ds = ds.split_at_indices([train_count])
print("Train rows:", train_ds.count())
print("Val rows:",   val_ds.count())

## 8. Pixel diffusion LightningModule  
A minimal **de-noising diffusion** model:  
* Input = noisy image + scalar timestep (packed as a 4-channel tensor)  
* Output = predicted noise ϵ  

The rank-0 worker logs its per-epoch losses and appends them to a JSON file on shared storage so you can plot the curves later.
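
The "packed as a 4-channel tensor" step is easy to check in isolation. This NumPy sketch mirrors the idea with a hypothetical helper, `pack_timestep`; the model itself does the same thing with PyTorch tensors.

```python
import numpy as np

def pack_timestep(noisy_img: np.ndarray, t: np.ndarray, max_t: int = 1000) -> np.ndarray:
    """Append t / max_t as a constant fourth channel: B x 3 x H x W -> B x 4 x H x W."""
    b, _, h, w = noisy_img.shape
    t_scaled = (t / max_t).astype(noisy_img.dtype).reshape(b, 1, 1, 1)
    t_img = np.broadcast_to(t_scaled, (b, 1, h, w))   # one constant plane per sample
    return np.concatenate([noisy_img, t_img], axis=1)

batch = np.zeros((2, 3, 4, 4), dtype=np.float32)
packed = pack_timestep(batch, t=np.array([0, 500]))
```

Each sample gains one extra plane filled with its own normalized timestep (0.0 and 0.5 here), so the convolutions can see how far along the process each image is.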

In [None]:
# 08. Pixel De-noising Diffusion Model

class PixelDiffusion(pl.LightningModule):
    """Tiny CNN that predicts noise ϵ given noisy image + timestep."""

    def __init__(self, max_t=1000, log_path=None):
        super().__init__()
        self.max_t = max_t
        self.log_path = log_path or "/mnt/cluster_storage/generative_cv/epoch_metrics.json"

        # Network: (3 + 1)‑channel input → 3‑channel noise prediction
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
        self.loss_fn = nn.MSELoss()
        self._train_losses, self._val_losses = [], []

    # ---------- forward ----------
    def forward(self, noisy_img, t):
        """noisy_img: Bx3xHxW,  t: B (int) or Bx1 scalar"""
        b, _, h, w = noisy_img.shape
        t_scaled = (t / self.max_t).view(-1, 1, 1, 1).float().to(noisy_img.device)
        t_img = t_scaled.expand(-1, 1, h, w)
        x = torch.cat([noisy_img, t_img], dim=1)  # 4 channels
        return self.net(x)
    
    # ---------- training / validation steps ----------
    def _shared_step(self, batch):
        clean = batch["image"].to(self.device)             # Bx3xHxW, ‑1…1
        noise = torch.randn_like(clean)                    # ϵ ~ N(0, 1)
        t = torch.randint(0, self.max_t, (clean.size(0),), device=self.device)
        noisy = clean + noise                              # x_t = x_0 + ϵ
        pred_noise = self(noisy, t)
        return self.loss_fn(pred_noise, noise)

    def training_step(self, batch, batch_idx):
        loss = self._shared_step(batch)
        self._train_losses.append(loss.item())
        return loss

    def validation_step(self, batch, batch_idx):
        loss = self._shared_step(batch)
        self._val_losses.append(loss.item())
        return loss

    # ---------- epoch end logging ----------
    def on_train_epoch_end(self):
        rank = get_context().get_world_rank()
        if rank == 0:
            train_avg = np.mean(self._train_losses)
            val_avg   = np.mean(self._val_losses) if self._val_losses else None
            if val_avg is not None:
                print(f"[Epoch {self.current_epoch}] train={train_avg:.4f}  val={val_avg:.4f}")
            else:
                print(f"[Epoch {self.current_epoch}] train={train_avg:.4f}  val=N/A")

            # Append to shared JSON so you can plot later
            if os.path.exists(self.log_path):
                with open(self.log_path, "r") as f: logs = json.load(f)
            else:
                logs = []
            logs.append({"epoch": self.current_epoch+1, "train_loss": train_avg, "val_loss": val_avg})
            with open(self.log_path, "w") as f: json.dump(logs, f)

        # Clear per‑epoch trackers
        self._train_losses.clear(); self._val_losses.clear()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=2e-4)

## 9. Ray Train `train_loop` (checkpoint and resume)  
Core training logic run **once per Ray worker**:  
1. Shard-aware DataLoaders with `get_dataset_shard`.  
2. Auto-resume from the latest Ray Checkpoint (if present).  
3. Manual per-epoch checkpointing: save `model.pt` and `meta.pt`, then call `report(metrics, checkpoint=…)`.  
This makes the run fully **fault-tolerant**. If a worker crashes, Ray restarts the group and re-enters the loop with the latest checkpoint.
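
The resume logic boils down to "read the last finished epoch, then continue from the next one." The standard-library sketch below simulates that with a JSON file in a temporary directory; `run_epochs`, `meta.json`, and the simulated crash are illustrative stand-ins, not Ray APIs.

```python
import json
import os
import tempfile
from typing import List, Optional

def run_epochs(ckpt_dir: str, total_epochs: int, crash_after: Optional[int] = None) -> List[int]:
    """Run epochs, writing meta.json after each one; resume from it if present."""
    meta_path = os.path.join(ckpt_dir, "meta.json")
    start_epoch = 0
    if os.path.exists(meta_path):                       # resume path
        with open(meta_path) as f:
            start_epoch = json.load(f)["epoch"] + 1     # continue after the last finished epoch
    completed = []
    for epoch in range(start_epoch, total_epochs):
        completed.append(epoch)                         # placeholder for one epoch of training
        with open(meta_path, "w") as f:
            json.dump({"epoch": epoch}, f)              # stand-in for a per-epoch checkpoint
        if crash_after is not None and epoch == crash_after:
            break                                       # simulate a worker failure
    return completed

with tempfile.TemporaryDirectory() as d:
    first_run = run_epochs(d, total_epochs=5, crash_after=2)   # stops after epoch 2
    second_run = run_epochs(d, total_epochs=5)                 # resumes at epoch 3
    third_run = run_epochs(d, total_epochs=5)                  # nothing left to do
```

The third call returns an empty list, which is the same behavior you see when you rerun a training job that has already finished all of its epochs.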

In [None]:
# 09. Train loop for Ray TorchTrainer

def train_loop(config):
    """Ray Train per-worker function with checkpointing and resume support."""
    import os, torch, uuid, json
    from ray.train import get_checkpoint, get_context, report, Checkpoint

    # Paths
    LOG_PATH = "/mnt/cluster_storage/generative_cv/epoch_metrics.json"
    CKPT_ROOT = "/mnt/cluster_storage/generative_cv/food101_diffusion_ckpts"

    rank = get_context().get_world_rank()
    if rank == 0:
        os.makedirs(CKPT_ROOT, exist_ok=True)
        if not get_checkpoint() and os.path.exists(LOG_PATH):
            os.remove(LOG_PATH)

    # Data
    train_ds = ray.train.get_dataset_shard("train")
    val_ds   = ray.train.get_dataset_shard("val")
    train_loader = train_ds.iter_torch_batches(batch_size=32)
    val_loader   = val_ds.iter_torch_batches(batch_size=32)

    # Model
    model = PixelDiffusion()
    start_epoch = 0

    # Resume from checkpoint if present
    ckpt = get_checkpoint()
    if ckpt:
        with ckpt.as_directory() as d:
            model.load_state_dict(torch.load(os.path.join(d, "model.pt"), map_location="cpu"))
            start_epoch = torch.load(os.path.join(d, "meta.pt")).get("epoch", 0) + 1
        if rank == 0:
            print(f"[Rank {rank}] Resumed from checkpoint at epoch {start_epoch}")

    # Trainer
    trainer = pl.Trainer(
        max_epochs=config.get("epochs", 10),
        accelerator="gpu" if torch.cuda.is_available() else "cpu",
        devices=1,
        plugins=[RayLightningEnvironment()],
        enable_progress_bar=False,
        check_val_every_n_epoch=1,
    )

    # Train loop: run each epoch, checkpoint manually
    for epoch in range(start_epoch, config.get("epochs", 10)):
        trainer.fit_loop.max_epochs = epoch + 1
        trainer.fit_loop.current_epoch = epoch
        trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)

        if rank == 0:
            # Save model checkpoint
            out_dir = os.path.join(CKPT_ROOT, f"epoch_{epoch}_{uuid.uuid4().hex}")
            os.makedirs(out_dir, exist_ok=True)
            torch.save(model.state_dict(), os.path.join(out_dir, "model.pt"))
            torch.save({"epoch": epoch}, os.path.join(out_dir, "meta.pt"))
            ckpt_out = Checkpoint.from_directory(out_dir)
        else:
            ckpt_out = None

        # Report with checkpoint so Ray saves it
        report({"epoch": epoch}, checkpoint=ckpt_out)

## 10. Launch distributed Training with TorchTrainer  
Ask for **8 GPU workers**, keep the five most-recent checkpoints, and allow up to three automatic retries.  
`result.checkpoint` captures the checkpoint from the highest epoch, because `epoch` is the score attribute. You can change this to another metric, such as validation loss or training loss.
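
The retention rule itself is simple to reason about. This plain-Python sketch shows which reports survive a "keep the best N by score" policy; `keep_top_checkpoints` is a hypothetical helper for illustration and isn't how Ray implements checkpoint management.

```python
def keep_top_checkpoints(reports, num_to_keep, score_attribute, order="max"):
    """Return the reports that survive a keep-best-N retention rule."""
    ranked = sorted(reports, key=lambda r: r[score_attribute], reverse=(order == "max"))
    return ranked[:num_to_keep]

# One report per epoch, scored by the epoch number itself.
reports = [{"epoch": e, "path": f"epoch_{e}"} for e in range(10)]
kept = keep_top_checkpoints(reports, num_to_keep=5, score_attribute="epoch")
kept_epochs = sorted(r["epoch"] for r in kept)
```

Scoring by `epoch` with `order="max"` keeps epochs 5 through 9, that is, the five most recent checkpoints. Scoring by a loss with `order="min"` would instead keep the five lowest-loss checkpoints.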

In [None]:
# 10. Launch distributed training

trainer = TorchTrainer(
    train_loop,
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),
    datasets={"train": train_ds, "val": val_ds},
    run_config=RunConfig(
        name="food101_diffusion_ft",
        storage_path="/mnt/cluster_storage/generative_cv/food101_diffusion_results",
        checkpoint_config=CheckpointConfig(
            checkpoint_frequency=1,
            num_to_keep=5,
            checkpoint_score_attribute="epoch",
            checkpoint_score_order="max",
        ),
        failure_config=FailureConfig(max_failures=3),
    ),
)

result = trainer.fit()
print("Training complete →", result.metrics)
best_ckpt = result.checkpoint  # checkpoint from highest reported epoch (you can change score attr)

## 11. Plot loss curves  
Parse the JSON written by `PixelDiffusion.on_train_epoch_end`, convert to a DataFrame, and render train versus val MSE loss.  
This is a good practice for quick health checks without external tooling.

**Why is validation loss lower than training loss?**  
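Training loss is averaged over all steps in an epoch, including the early steps taken before the weights improved, while validation loss is measured after those updates. Both use fresh random noise on every batch, and this network has no dropout or batch-norm layers, so `eval()` mode itself doesn't change the predictions. As a result, validation loss is often slightly lower, especially early in training.  
This is normal in this scenario and usually means the model is generalizing rather than over-fitting.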
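
A toy calculation shows the averaging effect. The per-step losses below are invented for illustration, and the sketch assumes validation loss roughly equals the loss at the last training step.

```python
# Hypothetical per-step losses within one epoch for a model that keeps improving.
step_losses = [1.00, 0.90, 0.80, 0.70, 0.60]

# Training loss as logged per epoch: the mean over all steps in the epoch.
train_epoch_loss = sum(step_losses) / len(step_losses)

# Validation runs after the last step, so it sees roughly the final weights.
# Assumption for illustration: validation loss equals the last step's loss.
val_epoch_loss = step_losses[-1]
```

Here the epoch-averaged training loss is about 0.8 while the end-of-epoch validation loss is 0.6, even though both come from the same improving model.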


In [None]:
# 11. Plot train/val loss curves

LOG_PATH = "/mnt/cluster_storage/generative_cv/epoch_metrics.json"
with open(LOG_PATH, "r") as f:
    logs = json.load(f)

df = pd.DataFrame(logs)
df["val_loss"] = pd.to_numeric(df["val_loss"], errors="coerce")

plt.figure(figsize=(7,4))
plt.plot(df["epoch"], df["train_loss"], marker="o", label="Train")
plt.plot(df["epoch"], df["val_loss"],   marker="o", label="Val")
plt.xlabel("Epoch"); plt.ylabel("MSE Loss"); plt.title("Pixel Diffusion - Loss per Epoch")
plt.grid(True); plt.legend(); plt.tight_layout(); plt.show()

## 12. Resume from latest checkpoint  
Calling `trainer.fit()` again detects the run snapshot, loads the latest checkpoint, and exits immediately because all 10 epochs have already run, which shows that the resume path works.

In [None]:
# 12. Run the trainer again to demonstrate resuming from latest checkpoint  

result = trainer.fit()
print("Training complete →", result.metrics)

## 13. Reverse diffusion sampler  
A simple Euler-style loop that starts from Gaussian noise and iteratively subtracts the model’s predicted noise.  
This isn't production-grade sampling, but it's suitable for illustrating inference after training.
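
The last step of such a sampler converts the model-range output back into something you can plot. A minimal NumPy sketch of that conversion, using a hypothetical helper `to_display_image`, might look like this:

```python
import numpy as np

def to_display_image(chw: np.ndarray) -> np.ndarray:
    """Convert a 3 x H x W array in [-1, 1] to an H x W x 3 array in [0, 1]."""
    rescaled = np.clip(chw * 0.5 + 0.5, 0.0, 1.0)   # out-of-range values are clamped
    return rescaled.transpose(1, 2, 0)              # CHW -> HWC for plotting

# Each channel holds the values [[-2, -1], [0, 1]]; -2 is deliberately out of range.
raw = np.array([-2.0, -1.0, 0.0, 1.0] * 3, dtype=np.float32).reshape(3, 2, 2)
display = to_display_image(raw)
```

The clamp matters because an approximate sampler can overshoot the \[-1, 1\] range, and `imshow` expects float images in \[0, 1\].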

In [None]:
# 13. Reverse diffusion sampling

def sample_image(model, steps=50, device="cpu"):
    """Generate an image by iteratively de-noising random noise."""
    model.eval()
    with torch.no_grad():
        img = torch.randn(1, 3, 224, 224, device=device)
        for step in reversed(range(steps)):
            t = torch.tensor([step], device=device)
            pred_noise = model(img, t)
            img = img - pred_noise * 0.1                      # simple Euler update
        # Rescale back to [0,1]
        img = torch.clamp((img * 0.5 + 0.5), 0.0, 1.0)
        return img.squeeze(0).cpu().permute(1,2,0).numpy()

## 14. Generate and display samples from the **best checkpoint**  
Load the model weights from `best_ckpt`, move to GPU if available, generate three images, and show them side-by-side.  
Remember: with a tiny CNN and only 10 epochs, these samples look noise-like. If you replace the backbone or train longer, you can expect better quality.


In [None]:
# 14. Generate and display samples

# Load model from Ray Train checkpoint
from ray.train import Checkpoint

assert best_ckpt is not None, "Checkpoint is missing. Did training run and complete?"

with best_ckpt.as_directory() as ckpt_dir:
    model = PixelDiffusion()
    model.load_state_dict(torch.load(os.path.join(ckpt_dir, "model.pt"), map_location="cpu"))

model = model.to("cuda" if torch.cuda.is_available() else "cpu")

# Generate three images
samples = [sample_image(model, steps=50, device=model.device) for _ in range(3)]

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
for ax, img in zip(axs, samples):
    ax.imshow(img)
    ax.axis("off")
plt.suptitle("Food‑101 Diffusion Samples (unconditional)")
plt.tight_layout()
plt.show()

## 15. Clean up shared storage  
Reclaim cluster disk space by deleting the entire tutorial output directory.  
Run this only when you’re **sure** you don’t need the checkpoints or metrics anymore.

In [None]:
# 15. Cleanup -- delete checkpoints and metrics from model training

TARGET_PATH = "/mnt/cluster_storage/generative_cv"

if os.path.exists(TARGET_PATH):
    shutil.rmtree(TARGET_PATH)
    print(f"✅ Deleted everything under {TARGET_PATH}")
else:
    print(f"⚠️ Path does not exist: {TARGET_PATH}")

## Wrap up and next steps

In this tutorial, you used **Ray Train and Ray Data on Anyscale** to scale a compact diffusion-policy workload, from raw JPEG bytes to distributed training and sampling, without changing the core PyTorch logic. You should now feel confident:

* Using **Ray Data** to decode, normalize, and shard large image datasets in parallel  
* Scaling training across multiple GPUs using **TorchTrainer** and a Ray-native `train_loop`  
* Managing distributed training state with **Ray Checkpoints** and automatic resume  
* Running fault-tolerant multi-node jobs on Anyscale without orchestration scripts  
* Performing post-training sampling from a restored Ray checkpoint, ready to wrap in **Ray tasks** on GPU workers


---

### Where can you take this next?

Below are a few directions you might explore to adapt or extend the pattern:

1. **Backbones and architecture upgrades**  
   * Swap in a larger ResNet or another vision model for much better generative performance.  
   * Try pre-trained encoders and fine-tune only the diffusion-specific layers.

2. **Conditional diffusion**  
   * Use the `label` column to condition the model (for example, class-conditioning).  
   * Compare unconditional versus conditional generation side by side.

3. **Sampling improvements**  
   * Replace naive reverse diffusion with De-noising Diffusion Implicit Models (DDIM), Pseudo Numerical Methods for Diffusion Models (PNDM), or learned de-noisers.  
   * Add timestep embeddings or noise schedules to increase model expressiveness.

4. **Longer training and mixed precision**  
   * Increase the `max_epochs` and enable Automatic Mixed Precision (AMP) for faster training with less memory.  
   * Visualize convergence and training stability across longer runs.

5. **Hyperparameter sweeps**  
   * Use **Ray Tune** to search over learning rates, model size, or sampling steps.  
   * Leverage Tune’s reporting to schedule early stopping or checkpoint pruning.

6. **Data handling and scaling**  
   * Shard the dataset into multiple Parquet files and distribute across more workers.  
   * Store and load datasets from S3 or other cloud storage.

7. **Image quality evaluation**  
   * Log Fréchet Inception Distance (FID) scores, perceptual similarity, or diffusion-specific metrics.  
   * Compare generated samples from different checkpoints or backbones.

8. **Model serving**  
   * Package the reverse sampler into a Ray task or **Ray Serve** endpoint.  
   * Run a demo app that generates images on demand from a class name or random seed.

9. **End-to-end MLOps**  
   * Register the best checkpoint with MLflow or Weights & Biases.  
   * Wrap the training loop in a Ray Job and run it on a schedule with Anyscale.