### 01. Imports & Setup

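Standard scientific-Python stack, plus **Gymnasium** for the Pendulum environment, **Ray** for distributed data and training,
and **Lightning** for ergonomic model training.
Make sure your Anyscale cluster runs Ray ≥ 2.48 and sets `RAY_TRAIN_V2_ENABLED=1` under its environment variables, so you get Ray Train V2 semantics.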

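If you want the notebook to fail fast on a misconfigured cluster, a check along these lines can run before the imports cell. This is a minimal sketch: the helper names `parse_major_minor` and `check_train_v2_setup` are hypothetical, the check compares only the major and minor version numbers, and it assumes `RAY_TRAIN_V2_ENABLED` is visible in the driver's environment.

```python
import os
from typing import Mapping

# Minimum Ray version stated in the setup notes above.
MIN_RAY_VERSION = (2, 48)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '2.48.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_train_v2_setup(ray_version: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Collect readable problems; an empty list means the setup looks right."""
    problems = []
    if parse_major_minor(ray_version) < MIN_RAY_VERSION:
        problems.append(f"Ray {ray_version} is older than 2.48")
    if env.get("RAY_TRAIN_V2_ENABLED") != "1":
        problems.append("RAY_TRAIN_V2_ENABLED is not set to '1'")
    return problems


# Example with a deliberately outdated, unflagged setup (illustrative values only).
print(check_train_v2_setup("2.47.1", {}))
```

In the notebook you would pass `ray.__version__` after `import ray`. Assuming Ray Train reads the flag when `ray.train` is first imported (worth verifying for your Ray version), the variable should be set before the imports cell runs.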
```python
# 01. Imports

# Standard Python packages for math, plotting, and data handling
import os
import shutil
import json
import uuid
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Gymnasium supplies the Pendulum-v1 environment used to generate data
import gymnasium as gym

# Ray libraries for distributed data and training
import ray
import ray.data
from ray.train.lightning import RayLightningEnvironment  # Make sure RAY_TRAIN_V2_ENABLED=1 in "Environment variables"
from ray.train import (
    ScalingConfig,
    RunConfig,
    FailureConfig,
    CheckpointConfig,
    Checkpoint,
    get_context,
    get_checkpoint,
    report,
)
from ray.train.torch import TorchTrainer

# PyTorch Lightning and base PyTorch for model definition and training
import lightning.pytorch as pl
import torch
from torch.utils.data import DataLoader
from torch import nn
```

### 02. Generate a Real Pendulum Dataset

Roll out a random policy for **10 000 steps**, logging:

| field | shape | description |
|-------|-------|-------------|
| `obs`          | `(3,)`  | `[cos θ, sin θ, θ̇]` |
| `noisy_action` | `(1,)`  | ground-truth action + Gaussian noise |
| `noise`        | `(1,)`  | the injected noise (supervision target) |
| `timestep`     | `()`    | random diffusion step ∈ [0, 999] |

`ray.data.from_items` wraps the list of dicts in a **Ray Dataset**, which Ray Train can later shard across training workers automatically.

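The table above can double as a schema. As a minimal sketch, the hypothetical helper `check_record` below verifies one logged record against those shapes with NumPy; it assumes the same field names and the `[0, 999]` timestep range from the table.

```python
import numpy as np

# Expected shape of each field, copied from the table above.
EXPECTED_SHAPES = {
    "obs": (3,),
    "noisy_action": (1,),
    "noise": (1,),
    "timestep": (),
}


def check_record(record: dict) -> None:
    """Raise ValueError if a record has missing fields, wrong shapes, or a bad timestep."""
    missing = EXPECTED_SHAPES.keys() - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, shape in EXPECTED_SHAPES.items():
        actual = np.shape(record[name])
        if actual != shape:
            raise ValueError(f"{name}: expected shape {shape}, got {actual}")
    if not 0 <= int(record["timestep"]) <= 999:
        raise ValueError(f"timestep out of range: {record['timestep']}")


# Hand-built record; the values are illustrative, not real rollout data.
example = {
    "obs": np.array([1.0, 0.0, 0.0], dtype=np.float32),
    "noisy_action": np.array([0.3], dtype=np.float32),
    "noise": np.array([0.1], dtype=np.float32),
    "timestep": np.int64(42),
}
check_record(example)  # passes silently
```

Once `ds` exists, something like `check_record(ds.take(1)[0])` should apply the same check to a row returned by Ray Data.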
```python
# 02. Generate Pendulum offline dataset

def make_pendulum_dataset(n_steps: int = 10_000):
    """
    Roll out a random policy in Pendulum-v1 and log (obs, noisy_action, noise, timestep).
    Returns a Ray Dataset ready for sharding.
    """
    env = gym.make("Pendulum-v1")
    obs, _ = env.reset(seed=0)
    data = []

    for _ in range(n_steps):
        action = env.action_space.sample().astype(np.float32)  # shape (1,)
        noise = np.random.randn(*action.shape).astype(np.float32)
        noisy_action = action + noise                          # add Gaussian noise
        timestep = np.random.randint(0, 1000, dtype=np.int64)  # diffusion step in [0, 999]

        data.append(
            {
                "obs": obs.astype(np.float32),  # shape (3,)
                "noisy_action": noisy_action,   # shape (1,)
                "noise": noise,                 # shape (1,)
                "timestep": timestep,           # scalar
            }
        )

        # Step the environment with the clean action
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()

    env.close()
    return ray.data.from_items(data)

ds = make_pendulum_dataset()
```

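The generator above adds unit-variance noise to every action, so `timestep` is recorded but does not change how noisy a sample is. For comparison, the standard DDPM forward process (Ho et al., 2020) ties the noise level to the step: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The sketch below is an alternative for reference, not this tutorial's method; it assumes a linear β schedule from 1e-4 to 0.02 over 1000 steps, and the helpers `linear_alpha_bar` and `ddpm_noisy_action` are hypothetical.

```python
import numpy as np


def linear_alpha_bar(n_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.cumprod(1.0 - betas)


def ddpm_noisy_action(action: np.ndarray, noise: np.ndarray, timestep: int,
                      alpha_bar: np.ndarray) -> np.ndarray:
    """Mix a clean action with noise according to the schedule at `timestep`."""
    a_bar = alpha_bar[timestep]
    return (np.sqrt(a_bar) * action + np.sqrt(1.0 - a_bar) * noise).astype(np.float32)


alpha_bar = linear_alpha_bar()
rng = np.random.default_rng(0)
action = np.array([1.5], dtype=np.float32)
noise = rng.standard_normal(1).astype(np.float32)

early = ddpm_noisy_action(action, noise, 0, alpha_bar)   # close to the clean action
late = ddpm_noisy_action(action, noise, 999, alpha_bar)  # close to pure noise
print(early, late, noise)
```

Under this formulation the network still predicts `noise`, but the `timestep` input tells it how much of the sample is noise, which is what lets one model denoise at every level.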
### 03. Normalize & Split

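Pendulum observations are `[cos θ, sin θ, θ̇]`: the first two components already lie in **[–1, 1]**, while the angular velocity θ̇ is bounded by ±8 (the environment's `max_speed`).  
Divide each feature by its bound to scale everything to **[–1, 1]**, then **shuffle** and split 80 / 20 into train/val sets.
Ray Data executes these transformations in parallel across the cluster.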

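Gymnasium's `Pendulum-v1` declares symmetric observation bounds of ±1, ±1, and ±8 for `[cos θ, sin θ, θ̇]`. A minimal sketch of per-feature scaling with those bounds, using hypothetical names `OBS_HIGH`, `scale_obs`, and `unscale_obs`, shows why a single divisor such as π does not land every feature in [–1, 1]:

```python
import numpy as np

# Symmetric Pendulum-v1 observation bounds for [cos θ, sin θ, θ̇] (max_speed = 8).
OBS_HIGH = np.array([1.0, 1.0, 8.0], dtype=np.float32)


def scale_obs(obs: np.ndarray) -> np.ndarray:
    """Map each feature from [-high, high] to [-1, 1]; works on shape (3,) or (B, 3)."""
    return obs / OBS_HIGH


def unscale_obs(obs_scaled: np.ndarray) -> np.ndarray:
    """Invert scale_obs, e.g. to plot predictions in physical units."""
    return obs_scaled * OBS_HIGH


# Extreme observations: upright and hanging down, each at full angular speed.
extremes = np.array([[1.0, 0.0, 8.0], [-1.0, 0.0, -8.0]], dtype=np.float32)

print("per-feature scaling:", scale_obs(extremes))
print("divide by pi:       ", extremes / np.pi)  # θ̇ column reaches about ±2.55
```

Dividing by π also shrinks cos θ and sin θ to roughly ±0.32, so the features end up on mismatched scales in both directions.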
```python
# 03. Normalize & split (obs = [cos θ, sin θ, θ̇])

# Pendulum-v1 bounds: |cos θ| ≤ 1, |sin θ| ≤ 1, |θ̇| ≤ 8 (the environment's max_speed)
OBS_SCALE = np.array([1.0, 1.0, 8.0], dtype=np.float32)

def normalize(batch):
    # Divide each feature by its bound so all three land in [-1, 1]
    batch["obs"] = batch["obs"] / OBS_SCALE
    return batch

# Apply normalization in parallel using Ray Data
ds = ds.map_batches(normalize, batch_format="numpy")

# Count total number of items (Ray Data is lazy, so this triggers execution)
total = ds.count()
print("Total dataset size:", total)

# Shuffle and split dataset into 80% training and 20% validation
split_idx = int(total * 0.8)
ds = ds.random_shuffle()
train_ds, val_ds = ds.split_at_indices([split_idx])

print("Train size:", train_ds.count())
print("Val size:", val_ds.count())
```
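
With the default 10 000 rollout steps, `split_idx = int(10_000 * 0.8) = 8000`, so the split yields 8 000 training rows and 2 000 validation rows. The same shuffle-then-cut-at-an-index logic can be sketched on a single machine with NumPy; this is only a local illustration using a hypothetical `shuffle_split` helper, not how Ray Data executes the operation across the cluster.

```python
import numpy as np


def shuffle_split(items: np.ndarray, train_frac: float = 0.8, seed: int = 0):
    """Shuffle rows, then cut at int(n * train_frac), mirroring shuffle + split_at_indices."""
    rng = np.random.default_rng(seed)
    shuffled = items[rng.permutation(len(items))]
    split_idx = int(len(items) * train_frac)
    return shuffled[:split_idx], shuffled[split_idx:]


row_ids = np.arange(10_000)
train_ids, val_ids = shuffle_split(row_ids)
print("Train size:", len(train_ids))  # 8000
print("Val size:", len(val_ids))      # 2000
```

Because `int()` truncates, a dataset size that is not a multiple of 5 puts the rounding remainder in the validation set.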