## 🖼️ 04-d1 · Generative Computer-Vision Pattern with Ray Train
In this notebook you build a **mini diffusion pipeline** on the **Food-101-Lite** dataset and run it end-to-end on an Anyscale cluster with **Ray Train V2**.

### What you’ll learn & take away  
* How to use **Ray Data** to decode and preprocess large image datasets in parallel  
* How to split and shard datasets for **distributed training** across multiple Ray workers  
* How to wrap a custom `LightningModule` with Ray Train to scale out **PyTorch code without boilerplate**  
* How to **enable fault tolerance** by saving and restoring model checkpoints with `ray.train.report()`  
* How to run training and evaluation with **no changes to your core model code**, while Ray handles multi-node orchestration  
* How to generate images post-training using the same Ray-hosted environment  

### 🔢 What problem are you solving? (Diffusion as image de-noising)

You’re training a **generative model** that learns to produce realistic Red-Green-Blue (RGB) images from pure noise  
by learning how to *reverse* a noising process.

This approach builds on **de-noising diffusion models**: instead of modeling the full image distribution $p(x)$ directly,  
you teach the model to reverse a *known* corruption process that gradually adds noise to clean images.

---

### Input: Images as tensors

Each training example is a 3-channel RGB image:

$$
x_0 \in [-1, 1]^{3 \times H \times W}
$$

Normalize pixel values to $[-1, 1]$ and train on **Food-101-Lite**, a small 10-class subset of Food-101.
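
The normalization is a linear map from 8-bit pixel values in $[0, 255]$ to $[-1, 1]$, and its inverse is what turns generated samples back into viewable pixels. The sketch below is a minimal illustration that assumes integer uint8 inputs stored in plain Python lists; `to_model_range` and `from_model_range` are hypothetical helper names, and the notebook's own preprocessing may operate on tensors or NumPy arrays instead.

```python
def to_model_range(pixels):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1].

    Hypothetical helper for illustration; assumes integer uint8 inputs.
    """
    return [p / 127.5 - 1.0 for p in pixels]


def from_model_range(values):
    """Invert to_model_range: clamp to [-1, 1], then round back to ints in [0, 255]."""
    out = []
    for v in values:
        v = max(-1.0, min(1.0, v))  # generated samples can drift outside [-1, 1]
        out.append(int(round((v + 1.0) * 127.5)))
    return out


# Black, mid-gray, and white pixels mapped into the model's input range
normalized = to_model_range([0, 128, 255])
```

Clamping in the inverse matters mostly at generation time, because nothing in the sampling update keeps values inside $[-1, 1]$.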

---

### Forward process: adding noise

During training, sample a timestep $t \in \{0, \dots, T{-}1\}$  
and inject Gaussian noise into the image:

$$\varepsilon \sim \mathcal{N}(0, 1), \quad x_{t} = x_0 + \varepsilon$$

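The model sees $x_{t}$ together with $t$ and must learn to recover the corrupting noise $\varepsilon$. In this formulation the noise magnitude does not depend on $t$; the timestep enters only as an extra input to the network.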
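
One training-time corruption step can be sketched in plain Python as below. This is a minimal illustration, not the notebook's code: images are flattened lists of floats, and `forward_noise` is a hypothetical name. It follows the rule $x_t = x_0 + \varepsilon$ exactly as written, so the sampled $t$ is returned for conditioning but does not change the noise level.

```python
import random


def forward_noise(x0, num_timesteps, rng):
    """Corrupt a flattened image x0 following x_t = x_0 + eps.

    Returns the sampled timestep t, the noisy input x_t, and the noise eps
    that the network is trained to predict.
    """
    t = rng.randrange(num_timesteps)            # t in {0, ..., T-1}
    eps = [rng.gauss(0.0, 1.0) for _ in x0]     # eps ~ N(0, 1), one draw per pixel
    x_t = [xi + ei for xi, ei in zip(x0, eps)]  # x_t = x_0 + eps
    return t, x_t, eps


rng = random.Random(0)
t, x_t, eps = forward_noise([0.0] * 12, num_timesteps=1000, rng=rng)
```

As a design note, standard DDPM-style training instead scales both terms with a noise schedule, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$, so the corruption grows with $t$; the sketch keeps the simpler rule stated above.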

---

### Training objective

Train a convolutional network $f_\theta$ to predict the noise:

$$\mathcal{L} = \mathbb{E}_{x_0, \varepsilon, t}\ \big\|f_\theta(x_{t}, t) - \varepsilon\big\|_2^2$$

This is a **Mean Squared Error (MSE) loss**, and it encourages the model to de-noise corrupted images.
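
The objective reduces to a squared error between predicted and true noise. The sketch below uses plain Python lists in place of tensors, and `noise_prediction_mse` is a hypothetical name. It averages over elements, as PyTorch's `torch.nn.functional.mse_loss` does with its default `reduction="mean"`; that differs from the summed squared norm in the formula only by the constant factor $1/(3HW)$, so both have the same minimizer.

```python
def noise_prediction_mse(predicted_eps, true_eps):
    """Mean squared error between predicted and true noise, averaged over elements."""
    if len(predicted_eps) != len(true_eps):
        raise ValueError("predicted and true noise must have the same length")
    if not true_eps:
        raise ValueError("noise vectors must be non-empty")
    total = sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps))
    return total / len(true_eps)
```

For intuition: a network that always predicts zero noise scores about $\mathbb{E}[\varepsilon^2] = 1$ on this loss, so values well below 1 indicate that the model is actually recovering the noise.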

---

### Reverse diffusion: sampling new images

At generation time, start from pure noise $x_T \sim \mathcal{N}(0, 1)$ and step backward, subtracting the predicted noise scaled by a step size $\eta$:

$$x_{t} \leftarrow x_{t} - \eta \cdot f_\theta(x_{t}, t), \quad t = T{-}1, \dots, 0$$

After $T$ steps, the result $x_0$ is a generated image that approximates a sample from the learned data distribution.
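
The sampling loop can be sketched without any framework. In this minimal illustration, `reverse_diffusion` and the `model(x, t)` callable are hypothetical stand-ins for the trained network $f_\theta$, images are flattened lists of floats, and `eta` is held fixed across steps. The loop visits $t = T-1, \dots, 0$, matching the timestep range used in training.

```python
import random


def reverse_diffusion(model, num_pixels, num_timesteps, eta, rng):
    """Generate a flattened image by iterative de-noising.

    Applies x <- x - eta * model(x, t) for t = T-1, ..., 0, starting from
    standard Gaussian noise. `model` must return a list the same length as x.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # x_T ~ N(0, 1)
    for t in reversed(range(num_timesteps)):               # t = T-1, ..., 0
        eps_hat = model(x, t)                              # predicted noise
        x = [xi - eta * ei for xi, ei in zip(x, eps_hat)]
    return x  # x_0


# A placeholder "model" that predicts zero noise leaves the sample unchanged.
zero_model = lambda x, t: [0.0] * len(x)
sample = reverse_diffusion(zero_model, num_pixels=4, num_timesteps=10, eta=0.1,
                           rng=random.Random(42))
```

Because each call needs only the trained weights and a random seed, independent samples can be generated in parallel, for example as separate Ray tasks.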

---

### Why this works

- Diffusion models sidestep unstable Generative Adversarial Network (GAN) training and can model complex, multimodal image distributions  
- The forward process stays fixed and simple (just add noise), which makes the learning problem tractable  
- At inference time, sampling becomes iterative de-noising, which is easy to debug, modify, and extend

### 🧭 How you’ll migrate this diffusion workload to a distributed setup using Ray on Anyscale

This tutorial walks through the end-to-end process of **migrating a local image diffusion model to a distributed Ray cluster running on Anyscale**.

Here’s how you make that transition:

1. **Local Joint Photographic Experts Group (JPEG) files → Distributed Ray Dataset**  
   Preprocess and store Food-101 images as Parquet, then use **Ray Data** to load and decode the dataset in parallel across the cluster. Each worker gets its own shard, streamed efficiently for GPU training.

2. **Single-GPU PyTorch → Multi-node Distributed Training**  
   Wrap your Lightning model in a Ray Train `train_loop`, then launch distributed training using **TorchTrainer** with 8 GPU workers, each operating on its own data partition with no manual coordination.

3. **Manual Checkpoints → Automatic Fault Tolerance**  
   Save a checkpoint after every epoch using `ray.train.report(checkpoint=...)`, and configure Ray to **auto-resume from the most recent checkpoint** if a job fails or you relaunch it.

4. **Manual Data Management → Declarative Scaling with Ray**  
   Instead of slicing data or managing worker processes yourself, declare your intent with `ScalingConfig`, `CheckpointConfig`, and `FailureConfig`, and let **Ray + Anyscale handle the orchestration**.

5. **Single-node Sampling → Remote Inference Tasks**  
   After training, run **reverse diffusion sampling** as Ray tasks on GPU nodes, making it easy to scale post-training inference or build a lightweight visual demo.
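
The fault-tolerance contract in step 3 can be illustrated without Ray. The sketch below simulates the control flow only: a training function that saves state after every epoch, looks for the latest checkpoint when it starts, and a driver that relaunches it after a failure. `CheckpointStore`, `train_loop`, and `run_with_retries` are hypothetical names, and the in-memory store stands in for persistent storage. In Ray Train, the corresponding roles are played by `ray.train.report(metrics, checkpoint=...)` for saving, `ray.train.get_checkpoint()` inside the training function for finding the latest checkpoint on restart, and `FailureConfig(max_failures=...)` for the retry budget; this is not Ray's implementation.

```python
import copy


class CheckpointStore:
    """Hypothetical in-memory stand-in for persistent checkpoint storage."""

    def __init__(self):
        self.latest = None  # most recent checkpoint, or None before the first save

    def save(self, state):
        self.latest = copy.deepcopy(state)

    def load(self):
        return copy.deepcopy(self.latest)


def train_loop(store, num_epochs, fail_at_epoch=None):
    """Run epochs until num_epochs, resuming from the latest checkpoint if any."""
    state = store.load()
    if state is None:
        state = {"epoch": 0, "weights": 0.0}  # fresh start
    epochs_run = []
    while state["epoch"] < num_epochs:
        if state["epoch"] == fail_at_epoch:
            raise RuntimeError(f"simulated failure after epoch {state['epoch']}")
        state["weights"] += 1.0  # stand-in for one epoch of gradient updates
        state["epoch"] += 1
        epochs_run.append(state["epoch"])
        store.save(state)  # checkpoint after every epoch
    return state, epochs_run


def run_with_retries(store, num_epochs, max_failures, fail_at_epoch=None):
    """Relaunch train_loop after a failure, up to max_failures times."""
    failures = 0
    while True:
        # Inject the simulated failure only on the first attempt.
        inject = fail_at_epoch if failures == 0 else None
        try:
            return train_loop(store, num_epochs, inject)
        except RuntimeError:
            failures += 1
            if failures > max_failures:
                raise


store = CheckpointStore()
final_state, resumed_epochs = run_with_retries(
    store, num_epochs=5, max_failures=1, fail_at_epoch=3
)
```

After the simulated failure, the retry runs only epochs 4 and 5, because it picks up the epoch-3 checkpoint instead of starting over.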

This pattern transforms a simple single-node PyTorch loop into a **scalable, fault-tolerant, multi-node training pipeline** with just a few lines of Ray-specific code, and it runs seamlessly on any cluster provisioned with Anyscale.
