# Quantum Machine Learning with PennyLane + devqubit

This notebook is a **small, self-contained demo** of how devqubit can track *quantum machine learning* experiments built with PennyLane.

We'll train a tiny **3-qubit variational classifier** (a "quantum neural network" style model) and use devqubit to:

- log **epoch-by-epoch** metrics (loss plus train, validation, and test accuracy),
- capture **hyperparameters** and tags,
- run an **architecture sweep** (grouped runs),
- store **trained parameters** as an artifact,
- and compare two runs to see what **changed**.

> Why `default.qubit`?  
> It's PennyLane's standard **state-vector simulator** on CPU. With `shots=None` (the default), it returns **analytic** expectation values, which keeps this demo fast and deterministic.

---
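
To make the analytic-versus-sampled distinction concrete, here is a minimal pure-Python sketch that does not need PennyLane. It assumes a single qubit prepared as RY(θ)|0⟩, for which ⟨Z⟩ = cos θ exactly. The helper names `analytic_expval_z` and `sampled_expval_z` are illustrative and not part of any library.

```python
import math
import random


def analytic_expval_z(theta: float) -> float:
    """Exact <Z> for the state RY(theta)|0>, which equals cos(theta)."""
    return math.cos(theta)


def sampled_expval_z(theta: float, shots: int, seed: int = 0) -> float:
    """Estimate <Z> from a finite number of simulated measurements."""
    rng = random.Random(seed)
    p0 = math.cos(theta / 2) ** 2  # probability of outcome |0> (eigenvalue +1)
    outcomes = (1 if rng.random() < p0 else -1 for _ in range(shots))
    return sum(outcomes) / shots


theta = 0.7
exact = analytic_expval_z(theta)
estimate = sampled_expval_z(theta, shots=1000)
print(f"analytic: {exact:.4f}  sampled (1000 shots): {estimate:.4f}")
```

The sampled estimate fluctuates around the analytic value with a spread that shrinks roughly as one over the square root of the shot count. Analytic mode removes that fluctuation entirely, which is why the demo uses it.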


In [None]:
from __future__ import annotations

from importlib.metadata import entry_points


def has_adapter(name: str) -> bool:
    eps = entry_points().select(group="devqubit.adapters")
    return any(ep.name == name for ep in eps)


if not has_adapter("pennylane"):
    raise ImportError(
        "devqubit PennyLane adapter is not installed.\n"
        "Install with: pip install 'devqubit[pennylane]'"
    )

print("PennyLane adapter available!")

In [None]:
from pathlib import Path
import shutil

import numpy as np
from numpy.typing import ArrayLike
import matplotlib.pyplot as plt

import pennylane as qml
from pennylane import numpy as pnp

from devqubit import (
    create_registry,
    create_store,
    track,
    wrap_backend,
    diff,
)

In [None]:
"""Setup workspace and configuration.

We use a local folder as a tiny "experiment workspace":
- devqubit *store* holds artifacts (e.g., trained parameter arrays)
- devqubit *registry* holds run metadata (params, metrics, tags, group IDs)
"""

# ---------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------

SEED: int = 42

# Data
N_SAMPLES: int = 500
NOISE_STD: float = 0.05
TEST_FRACTION: float = 0.20
VAL_FRACTION: float = 0.20

# Quantum model / simulator
DEVICE_NAME: str = "default.qubit"
N_QUBITS: int = 3
SHOTS = None  # None = analytic expectation values (no sampling noise)

# Training
N_EPOCHS: int = 50
LEARNING_RATE: float = 0.3

# Numerics / evaluation
EPS: float = 1e-7
EVAL_CHUNK_SIZE: int = 512  # chunk evaluation to avoid huge tapes

# ---------------------------------------------------------------------
# Reproducibility
# ---------------------------------------------------------------------
np.random.seed(SEED)
pnp.random.seed(SEED)

# ---------------------------------------------------------------------
# Fresh workspace (safe to re-run)
# ---------------------------------------------------------------------
WORKSPACE = Path(".devqubit_qml_demo")
if WORKSPACE.exists():
    shutil.rmtree(WORKSPACE)
WORKSPACE.mkdir(parents=True, exist_ok=True)

store = create_store(f"file://{WORKSPACE}/objects")
registry = create_registry(f"file://{WORKSPACE}")

print(f"Workspace: {WORKSPACE.resolve()}")

## What gets tracked?

devqubit focuses on the pieces you typically want when iterating on QML:

- **Run metadata**: parameters (hyperparameters), tags, and a run ID
- **Metrics over time**: loss and accuracy logged every epoch
- **Backend activity**: when a QNode executes on a device, devqubit can record that execution (via `wrap_backend`)
- **Artifacts**: any JSON/arrays you store (e.g., trained weights)

In this notebook we keep it intentionally simple: one small dataset, two tiny circuits, and a straightforward training loop.


## 1. Generate a tiny toy dataset (train / validation / test)

We create a balanced 2D dataset:

- **Class 1**: points *inside* a circle  
- **Class 0**: points *outside* the circle

Why this dataset?
- It is visually intuitive
- It is small enough to train quickly
- It makes architecture differences easy to notice


In [None]:
def generate_circle_data(n_samples: int, noise_std: float, seed: int):
    """Generate 2D binary data: points inside vs. outside a circle."""

    rng = np.random.default_rng(seed)

    # Balanced classes
    n_per_class = n_samples // 2

    # Class 1: inside the r = 0.5 boundary (radii sampled in [0, 0.45))
    angles_in = rng.uniform(0, 2 * np.pi, n_per_class)
    radii_in = rng.uniform(0.0, 0.45, n_per_class)
    X_in = np.column_stack(
        [
            radii_in * np.cos(angles_in),
            radii_in * np.sin(angles_in),
        ]
    )

    # Class 0: outside the r = 0.5 boundary (radii sampled in [0.6, 1.0))
    angles_out = rng.uniform(0, 2 * np.pi, n_per_class)
    radii_out = rng.uniform(0.6, 1.0, n_per_class)
    X_out = np.column_stack(
        [
            radii_out * np.cos(angles_out),
            radii_out * np.sin(angles_out),
        ]
    )

    X = np.vstack([X_out, X_in])
    y = np.hstack(
        [
            np.zeros(n_per_class, dtype=int),
            np.ones(n_per_class, dtype=int),
        ]
    )

    # Shuffle and add noise
    perm = rng.permutation(len(y))
    X, y = X[perm], y[perm]
    X = X + rng.normal(
        loc=0.0,
        scale=noise_std,
        size=X.shape,
    )

    return X, y


def train_val_test_split(
    X: ArrayLike,
    y: ArrayLike,
    val_frac: float,
    test_frac: float,
    seed: int = 0,
) -> tuple:
    """Randomly split X, y into train/val/test using NumPy."""

    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))

    n_test = int(round(len(y) * test_frac))
    n_val = int(round(len(y) * val_frac))

    test_idx = idx[:n_test]
    val_idx = idx[n_test : n_test + n_val]
    train_idx = idx[n_test + n_val :]

    return (
        X[train_idx],
        X[val_idx],
        X[test_idx],
        y[train_idx],
        y[val_idx],
        y[test_idx],
    )

In [None]:
# --- Create a single dataset, then split cleanly into train/val/test ---
X, y = generate_circle_data(
    n_samples=N_SAMPLES,
    noise_std=NOISE_STD,
    seed=SEED,
)

# Split data into train, validation, and test sets
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(
    X=X,
    y=y,
    val_frac=VAL_FRACTION,
    test_frac=TEST_FRACTION,
    seed=SEED,
)

rows = [
    ("Train", X_train, y_train),
    ("Val", X_val, y_val),
    ("Test", X_test, y_test),
]

print(f"{'Split':<5} {'N':>5} {'Class0':>7} {'Class1':>7}")
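# Use distinct loop names so the full dataset variables X and y are not overwritten
for name, X_split, y_split in rows:
    y_split = np.asarray(y_split)
    c0 = int((y_split == 0).sum())
    c1 = int((y_split == 1).sum())
    print(f"{name:<5} {len(X_split):>5d} {c0:>7d} {c1:>7d}")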

In [None]:
# --- Visualize the dataset ---
fig, ax = plt.subplots(figsize=(5, 5))

ax.scatter(
    X_train[y_train == 0, 0],
    X_train[y_train == 0, 1],
    c="tab:blue",
    alpha=0.5,
    s=10,
    label="Class 0 (outside)",
)

ax.scatter(
    X_train[y_train == 1, 0],
    X_train[y_train == 1, 1],
    c="tab:orange",
    alpha=0.5,
    s=10,
    label="Class 1 (inside)",
)

circle = plt.Circle(
    (0, 0),
    0.5,
    fill=False,
    linestyle="--",
    color="gray",
    linewidth=1.5,
)

ax.add_patch(circle)
ax.set_xlim(-1.2, 1.2)
ax.set_ylim(-1.2, 1.2)
ax.set_aspect("equal")
ax.set_xlabel("x0")
ax.set_ylabel("x1")
ax.set_title("Training Data")
ax.legend(loc="upper right", fontsize=8)
plt.tight_layout()
plt.show()

## 2. Define two tiny quantum classifier architectures

Both models run on a **3-qubit** device and output a single number:

- we measure ⟨Z⟩ on wire 0 (range `[-1, +1]`)
- later we map it into a "probability-like" value in `[0, 1]`

We compare two feature encodings:

1. **Angle encoding (with data re-uploading)**  
   Encode the features as rotation angles and repeat the encoding in every layer. This circuit acts only on wire 0; the other two wires stay idle.

2. **IQP-style encoding**  
   Use an entangling feature map on all three wires (Hadamards, single-qubit RZ terms, and pairwise ZZ interaction terms).

> Note: these circuits are intentionally small so the focus stays on **devqubit tracking** rather than model performance.
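
As a rough illustration of what the single-qubit re-uploading circuit computes, the sketch below simulates it with plain 2x2 matrices instead of PennyLane. It mirrors the gate order used in this notebook (RY, RZ, RY for the features, then a trainable `Rot`) and relies on the decomposition Rot(φ, θ, ω) = RZ(ω) RY(θ) RZ(φ). The function names are hypothetical, and this is a teaching aid rather than a replacement for the QNode.

```python
import cmath
import math


def ry(theta):
    """2x2 matrix of RY(theta) = exp(-i * theta/2 * Y)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def rz(phi):
    """2x2 matrix of RZ(phi) = exp(-i * phi/2 * Z)."""
    return [[cmath.exp(-0.5j * phi), 0], [0, cmath.exp(0.5j * phi)]]


def apply(gate, state):
    """Apply a 2x2 gate to a single-qubit state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def reupload_expval_z(x, params):
    """<Z> after re-uploading x = (x0, x1) once per layer on one qubit."""
    x0, x1 = x
    r2 = x0**2 + x1**2
    state = [1 + 0j, 0j]  # start in |0>
    for phi, theta, omega in params:
        # Feature encoding, in the same gate order as the notebook's circuit
        for gate in (ry(math.pi * x0), rz(math.pi * x1), ry(math.pi * r2)):
            state = apply(gate, state)
        # Trainable rotation: Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)
        for gate in (rz(phi), ry(theta), rz(omega)):
            state = apply(gate, state)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


z = reupload_expval_z((0.5, 0.0), [(0.0, 0.0, 0.0)])
prob_like = (z + 1) / 2  # map [-1, 1] to [0, 1]
print(f"<Z> = {z:.4f}, probability-like value = {prob_like:.4f}")
```

With all trainable angles set to zero and `x1 = 0`, the two RY rotations simply add, so the result reduces to cos(π·x0 + π·r²). Training adjusts the `Rot` angles so that this output separates the two classes.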


In [None]:
def create_reuploading_radial_classifier(n_layers: int, dev):
    """Data re-uploading classifier for the circle task."""

    @qml.qnode(dev, diff_method="backprop")
    def circuit(x, params):
        # -------------------------
        # 0) Unpack features
        # -------------------------
        x0 = x[..., 0]
        x1 = x[..., 1]
        r2 = x0**2 + x1**2  # radial feature for circle data

        for layer in range(n_layers):
            # -------------------------
            # 1) Re-upload the features
            # -------------------------
            qml.RY(pnp.pi * x0, wires=0)
            qml.RZ(pnp.pi * x1, wires=0)
            qml.RY(pnp.pi * r2, wires=0)

            # -------------------------
            # 2) Trainable "processing" unitary
            # -------------------------
            qml.Rot(
                params[layer, 0],
                params[layer, 1],
                params[layer, 2],
                wires=0,
            )

        # -------------------------
        # 3) Readout
        # -------------------------
        return qml.expval(qml.PauliZ(0))

    # Parameter tensor shape: three Rot angles per layer (all applied on wire 0)
    return circuit, (n_layers, 3)


def create_iqp_encoding_classifier(n_layers: int, dev):
    """IQP-style feature map with StronglyEntanglingLayers variational classifier."""

    @qml.qnode(dev, diff_method="backprop")
    def circuit(x, params):
        # -------------------------
        # 0) Unpack features
        # -------------------------
        x0 = x[..., 0]
        x1 = x[..., 1]
        r2 = x0**2 + x1**2

        # -------------------------
        # 1) IQP-style encoding
        # -------------------------
        for w in [0, 1, 2]:
            qml.Hadamard(wires=w)

        # Single-qubit feature terms
        qml.RZ(pnp.pi * x0, wires=0)
        qml.RZ(pnp.pi * x1, wires=1)
        qml.RZ(pnp.pi * r2, wires=2)

        # Feature interaction terms (ZZ encodes products)
        qml.IsingZZ(pnp.pi * (x0 * x1), wires=[0, 1])
        qml.IsingZZ(pnp.pi * (x1 * r2), wires=[1, 2])
        qml.IsingZZ(pnp.pi * (x0 * r2), wires=[0, 2])

        # -------------------------
        # 2) Trainable variational layers (richer ansatz)
        # -------------------------
        qml.StronglyEntanglingLayers(params, wires=[0, 1, 2])

        # -------------------------
        # 3) Readout
        # -------------------------
        return qml.expval(qml.PauliZ(0))

    # Parameter tensor shape: (n_layers, n_wires, 3), i.e. three rotation angles per qubit per layer
    return circuit, (n_layers, 3, 3)

## 3. Train the classifier and log every epoch

We use a standard "hybrid" loop:

1. Run the quantum circuit to get a scalar output
2. Convert output into a probability-like value in `[0, 1]`
3. Compute **binary cross-entropy** loss
4. Update parameters with **Adam**
5. Log **loss**, **train accuracy**, **validation accuracy**, and **test accuracy** to devqubit each epoch

The training loop below is deliberately plain Python so the tracking is easy to follow.
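
Steps 2 and 3 above can be sketched in plain Python. The expectation values here are made-up inputs chosen only to exercise the formulas, and the helper names `to_probability` and `binary_cross_entropy` are illustrative. The clipping constant plays the same role as `EPS` in this notebook.

```python
import math


def to_probability(expval_z: float) -> float:
    """Map an expectation value in [-1, 1] to a probability-like value in [0, 1]."""
    return (expval_z + 1) / 2


def binary_cross_entropy(probs, labels, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


expvals = [0.8, -0.6, 0.0]  # made-up circuit outputs, for illustration only
labels = [1, 0, 1]
probs = [to_probability(z) for z in expvals]
loss = binary_cross_entropy(probs, labels)
print(f"probs={probs}, loss={loss:.4f}")
```

An output of exactly 0 maps to a probability of 0.5 and contributes ln 2 to the loss, which is the value an uninformative classifier would score on every sample.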


In [None]:
def train_classifier(
    run,
    circuit,
    param_shape,
    X_train,
    y_train,
    X_val,
    y_val,
    X_test,
    y_test,
    *,
    n_epochs: int,
    learning_rate: float,
    eps: float,
    eval_chunk_size: int,
    seed: int,
):
    """Train a quantum classifier and track metrics epoch-by-epoch."""

    # Convert to PennyLane NumPy arrays to keep math + autograd consistent.
    X_train = pnp.asarray(X_train)
    y_train = pnp.asarray(y_train)
    X_val = pnp.asarray(X_val)
    y_val = pnp.asarray(y_val)
    X_test = pnp.asarray(X_test)
    y_test = pnp.asarray(y_test)

    # Seeded generator so every run starts from reproducible initial parameters
    rng = np.random.default_rng(seed)

    # Initialize trainable parameters
    params = pnp.array(rng.uniform(-np.pi, np.pi, param_shape), requires_grad=True)

    # Adam is a good default optimizer for small demos
    opt = qml.AdamOptimizer(stepsize=learning_rate)

    def predict_batch(X, p):
        """Return predicted probabilities for a batch of inputs."""
        return (circuit(X, p) + 1) / 2  # <Z> in [-1,1] -> [0,1]

    def loss_fn(p, Xb, yb):
        """Binary cross-entropy on a batch (vectorized)."""
        preds = pnp.clip(predict_batch(Xb, p), eps, 1 - eps)
        return -pnp.mean(yb * pnp.log(preds) + (1 - yb) * pnp.log(1 - preds))

    def accuracy(p, X, y):
        """Chunked accuracy to keep the quantum tape size reasonable."""
        correct = 0
        n = len(X)
        for i in range(0, n, eval_chunk_size):
            Xc = X[i : i + eval_chunk_size]
            yc = y[i : i + eval_chunk_size]
            preds = np.asarray(predict_batch(Xc, p))  # detach for fast thresholding
            correct += int(np.sum((preds > 0.5) == np.asarray(yc)))
        return correct / n

    best_val_acc = -1.0
    best_test_acc_at_best_val = 0.0
    best_params = params

    # Store history for plotting
    history = {"loss": [], "train_acc": [], "val_acc": [], "test_acc": []}

    for epoch in range(n_epochs):
        params, epoch_loss = opt.step_and_cost(
            lambda pp: loss_fn(pp, X_train, y_train), params
        )
        epoch_loss = float(epoch_loss)

        # Evaluate
        train_acc = accuracy(params, X_train, y_train)
        val_acc = accuracy(params, X_val, y_val)
        test_acc = accuracy(params, X_test, y_test)

        # Store for plotting
        history["loss"].append(epoch_loss)
        history["train_acc"].append(train_acc)
        history["val_acc"].append(val_acc)
        history["test_acc"].append(test_acc)

        # Log epoch metrics
        run.log_metric("loss", epoch_loss, step=epoch)
        run.log_metric("train_accuracy", train_acc, step=epoch)
        run.log_metric("val_accuracy", val_acc, step=epoch)
        run.log_metric("test_accuracy", test_acc, step=epoch)

        # Track best model *by validation accuracy*
        improved = val_acc > best_val_acc
        if improved:
            best_val_acc = val_acc
            best_test_acc_at_best_val = test_acc
            best_params = params

        print(
            f"  Epoch {epoch:2d}: loss={epoch_loss:.4f}, "
            f"train={train_acc:.2%}, val={val_acc:.2%}, test={test_acc:.2%}"
        )

    return best_params, best_val_acc, best_test_acc_at_best_val, n_epochs, history

In [None]:
"""
Train a single baseline model with full tracking.

devqubit pattern:
1) create a run with `track(...)`
2) wrap the PennyLane device with `wrap_backend(...)`
3) log params/tags
4) train while logging metrics (train/val/test)
5) store artifacts (e.g., best parameters by validation accuracy)
"""

with track(project="qml_classifier", store=store, registry=registry) as run:
    # Device (analytic expectation values by default when shots=None)
    base_dev = qml.device(DEVICE_NAME, wires=N_QUBITS, shots=SHOTS)
    tracked_dev = wrap_backend(run, base_dev)

    # Build the circuit/QNode
    BASELINE_ARCH = "angle_encoding"
    BASELINE_LAYERS = 2
    circuit, param_shape = create_reuploading_radial_classifier(
        n_layers=BASELINE_LAYERS,
        dev=tracked_dev,
    )

    # Log metadata
    run.log_params(
        {
            "architecture": BASELINE_ARCH,
            "n_layers": BASELINE_LAYERS,
            "n_qubits": N_QUBITS,
            "n_params": int(np.prod(param_shape)),
            "device": DEVICE_NAME,
            "shots": SHOTS,
            "seed": SEED,
            "learning_rate": LEARNING_RATE,
            "n_epochs": N_EPOCHS,
        }
    )
    run.set_tags({"task": "binary_classification", "status": "baseline"})

    print(f"Training baseline {BASELINE_ARCH} classifier ({BASELINE_LAYERS} layers):")
    best_params, best_val_acc, best_test_acc, epochs_trained, baseline_history = (
        train_classifier(
            run,
            circuit,
            param_shape,
            X_train,
            y_train,
            X_val,
            y_val,
            X_test,
            y_test,
            n_epochs=N_EPOCHS,
            learning_rate=LEARNING_RATE,
            eps=EPS,
            eval_chunk_size=EVAL_CHUNK_SIZE,
            seed=SEED,
        )
    )

    # Summary metrics (easy to query later)
    run.log_metrics(
        {
            "best_val_accuracy": float(best_val_acc),
            "best_test_accuracy": float(best_test_acc),
            "epochs_trained": int(epochs_trained),
        }
    )

    # Store best parameters (chosen by validation accuracy)
    run.log_json(
        name="best_params",
        obj={"params": np.asarray(best_params).tolist()},
        role="model",
    )

    baseline_id = run.run_id

print(f"\nBaseline run ID:  {baseline_id}")
print(f"Best val accuracy:  {best_val_acc:.2%}")
print(f"Test @ best val:    {best_test_acc:.2%}")

In [None]:
# --- Plot training curves ---
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

epochs = range(len(baseline_history["loss"]))

axes[0].plot(epochs, baseline_history["loss"], "o-", markersize=4)
axes[0].set_xlabel("Epoch")
axes[0].set_ylabel("Loss")
axes[0].set_title("Training Loss")
axes[0].grid(True, alpha=0.3)

axes[1].plot(epochs, baseline_history["train_acc"], "o-", markersize=4, label="Train")
axes[1].plot(epochs, baseline_history["val_acc"], "s-", markersize=4, label="Val")
axes[1].plot(epochs, baseline_history["test_acc"], "^-", markersize=4, label="Test")
axes[1].set_xlabel("Epoch")
axes[1].set_ylabel("Accuracy")
axes[1].set_title("Accuracy")
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### What you should see

During training you'll see a small per-epoch printout. In devqubit, those same values are stored as **time series metrics**.

Typical behavior on this toy dataset:

- loss decreases over epochs,
- training accuracy rises quickly,
- validation and test accuracy usually improve and then stabilize.

Because we're using `default.qubit` in analytic mode (`shots=None`), results are usually stable across runs with the same seed.


## 4. Architecture sweep (grouped runs)

Now we run a small sweep across:
- two architectures (`angle_encoding`, `iqp_encoding`)
- two depths (2 vs 3 layers)

devqubit groups these runs using a shared `group_id`. This makes it easy to:
- retrieve all runs from a sweep,
- compare their metrics,
- and pick the best-performing configuration.
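
The sweep grid itself is just the Cartesian product of the two choices. A minimal sketch using `itertools.product` follows; the dictionary layout is an illustrative assumption, while the architecture names and depths are the ones used in this notebook.

```python
from itertools import product

architectures = ["angle_encoding", "iqp_encoding"]
depths = [2, 3]

# One configuration per (architecture, depth) pair
sweep_configs = [
    {"architecture": arch, "n_layers": n_layers}
    for arch, n_layers in product(architectures, depths)
]

for cfg in sweep_configs:
    print(cfg)
```

Each configuration then becomes one tracked run, and all of them share the same `group_id`.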


In [None]:
"""
Sweep over architectures (grouped runs).

Each loop iteration creates a *new* devqubit run.
All runs share the same `group_id`, making it easy to query/compare them later.
"""

sweep_id = "arch_sweep_example"

architectures = [
    ("angle_encoding", create_reuploading_radial_classifier, 2),
    ("angle_encoding", create_reuploading_radial_classifier, 3),
    ("iqp_encoding", create_iqp_encoding_classifier, 2),
    ("iqp_encoding", create_iqp_encoding_classifier, 3),
]

sweep_results = []
sweep_best_params = {}  # Store best params for decision boundary plotting

print(f"Sweep group_id: {sweep_id}")
print("=" * 60)

for arch_name, builder_fn, n_layers in architectures:
    with track(
        project="qml_classifier",
        store=store,
        registry=registry,
        group_id=sweep_id,
    ) as run:
        base_dev = qml.device(
            DEVICE_NAME,
            wires=N_QUBITS,
            shots=SHOTS,
        )
        tracked_dev = wrap_backend(run, base_dev)

        circuit, param_shape = builder_fn(n_layers=n_layers, dev=tracked_dev)

        run.log_params(
            {
                "architecture": arch_name,
                "n_layers": n_layers,
                "n_qubits": N_QUBITS,
                "n_params": int(np.prod(param_shape)),
                "device": DEVICE_NAME,
                "shots": SHOTS,
                "seed": SEED,
                "learning_rate": LEARNING_RATE,
                "n_epochs": N_EPOCHS,
            }
        )
        run.set_tags({"task": "binary_classification", "status": "sweep"})

        print(f"Training {arch_name} (layers={n_layers})")

        best_params, best_val_acc, best_test_acc, epochs_trained, _ = train_classifier(
            run,
            circuit,
            param_shape,
            X_train,
            y_train,
            X_val,
            y_val,
            X_test,
            y_test,
            n_epochs=N_EPOCHS,
            learning_rate=LEARNING_RATE,
            eps=EPS,
            eval_chunk_size=EVAL_CHUNK_SIZE,
            seed=SEED,
        )

        run.log_metrics(
            {
                "best_val_accuracy": float(best_val_acc),
                "best_test_accuracy": float(best_test_acc),
                "epochs_trained": int(epochs_trained),
            }
        )

        run.log_json(
            name="best_params",
            obj={"params": np.asarray(best_params).tolist()},
            role="model",
        )

        sweep_results.append(
            (
                arch_name,
                n_layers,
                float(best_val_acc),
                float(best_test_acc),
                run.run_id,
            )
        )
        sweep_best_params[(arch_name, n_layers)] = (best_params, builder_fn)
        print(f"  -> best val={best_val_acc:.2%}, test@bestval={best_test_acc:.2%}\n")

# Pick the best run by validation accuracy (common best practice)
best = max(sweep_results, key=lambda x: x[2])

print("=" * 60)
print(f"Best by val accuracy: {best[0]} (layers={best[1]})")
print(f"  best val: {best[2]:.2%}")
print(f"  test@bestval: {best[3]:.2%}")
print(f"  run_id: {best[4]}")

In [None]:
# --- Plot sweep results comparison ---
labels = [f"{r[0]}\nL={r[1]}" for r in sweep_results]
val_accs = [r[2] for r in sweep_results]
test_accs = [r[3] for r in sweep_results]

x = np.arange(len(labels))
width = 0.35

fig, ax = plt.subplots(figsize=(8, 4))
bars1 = ax.bar(x - width / 2, val_accs, width, label="Val Accuracy")
bars2 = ax.bar(x + width / 2, test_accs, width, label="Test Accuracy")

ax.set_ylabel("Accuracy")
ax.set_title("Architecture Sweep Results")
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=9)
ax.legend()
ax.set_ylim(0, 1.05)
ax.grid(True, axis="y", alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# --- Plot decision boundary of the best model ---
best_arch, best_layers = best[0], best[1]
best_params_final, best_builder_fn = sweep_best_params[(best_arch, best_layers)]

# Create a fresh circuit for prediction (untracked)
eval_dev = qml.device(
    DEVICE_NAME,
    wires=N_QUBITS,
    shots=SHOTS,
)
eval_circuit, _ = best_builder_fn(
    n_layers=best_layers,
    dev=eval_dev,
)

# Create grid for decision boundary
grid_res = 50
xx, yy = np.meshgrid(
    np.linspace(-1.2, 1.2, grid_res),
    np.linspace(-1.2, 1.2, grid_res),
)
grid_points = np.column_stack([xx.ravel(), yy.ravel()])

# Predict on grid
grid_preds = (np.array(eval_circuit(grid_points, best_params_final)) + 1) / 2
grid_preds = grid_preds.reshape(xx.shape)

# Plot
fig, ax = plt.subplots(figsize=(6, 5))
contour = ax.contourf(
    xx,
    yy,
    grid_preds,
    levels=20,
    cmap="RdYlBu",
    alpha=0.7,
)
ax.contour(
    xx,
    yy,
    grid_preds,
    levels=[0.5],
    colors="k",
    linewidths=2,
)
ax.scatter(
    X_test[y_test == 0, 0],
    X_test[y_test == 0, 1],
    c="tab:blue",
    s=15,
    edgecolors="k",
    linewidths=0.5,
    label="Class 0",
)
ax.scatter(
    X_test[y_test == 1, 0],
    X_test[y_test == 1, 1],
    c="tab:orange",
    s=15,
    edgecolors="k",
    linewidths=0.5,
    label="Class 1",
)
circle = plt.Circle(
    (0, 0),
    0.5,
    fill=False,
    linestyle="--",
    color="gray",
    linewidth=1.5,
)
ax.add_patch(circle)
ax.set_xlim(-1.2, 1.2)
ax.set_ylim(-1.2, 1.2)
ax.set_aspect("equal")
ax.set_xlabel("x0")
ax.set_ylabel("x1")
ax.set_title(f"Decision Boundary: {best_arch} (L={best_layers})")
ax.legend(loc="upper right", fontsize=8)
plt.colorbar(contour, ax=ax, label="P(class=1)")
plt.tight_layout()
plt.show()

**How to interpret sweep results**

When comparing architectures on small QML problems, focus on:

- **best validation accuracy** (used for model selection), with test accuracy at that point as a generalization check,
- how sensitive results are to **depth** (2 vs 3 layers),
- and whether an encoding seems to learn the geometry of the data faster.

Because this is a toy dataset and tiny circuits, the "best" model can vary a bit.
The key takeaway is how devqubit keeps the sweep organized and comparable.


## 5. Query the registry and compare runs

devqubit stores run metadata in the registry, so you can later:

- list experiment groups (sweeps),
- list runs in a group,
- load individual run records,
- and compare runs side-by-side.

Below we do a tiny "report" that prints architecture + layers + best accuracy for each run in every group.


In [None]:
"""
Query and analyze QML runs.

We print a compact summary: for each group, list runs with their key metrics.
"""

groups = registry.list_groups()

print("QML Experiment Groups")
print("=" * 70)

for group in groups:
    print(f"\n{group['group_name']}  (group_id={group['group_id']})")
    runs_in_group = registry.list_runs_in_group(group["group_id"])

    for run_info in runs_in_group:
        rec = registry.load(run_info["run_id"])
        arch = rec.params.get("architecture", "N/A")
        layers = rec.params.get("n_layers", "N/A")

        best_val = rec.metrics.get("best_val_accuracy", None)
        best_test = rec.metrics.get("best_test_accuracy", None)

        if best_val is None:
            # older/baseline runs might have only test accuracy logged
            best_test = rec.metrics.get("best_test_accuracy", 0.0)
            print(f"  {arch} L={layers}: test(best)={best_test:.2%}")
        else:
            print(
                f"  {arch} L={layers}: val(best)={best_val:.2%} | test@bestval={best_test:.2%}"
            )

**Why grouping matters**

Grouped runs make sweeps easy to work with:

- you can retrieve *all* runs from a sweep without remembering run IDs,
- compare best metrics across candidates,
- and keep the baseline run separate from sweep runs.

This is a simple organizational feature, but it becomes critical once you run dozens (or hundreds) of experiments.


In [None]:
"""
Compare baseline to the best sweep result.

devqubit can generate a compact "diff" report:
- parameters and metrics that differ,
- and (when available) a summary of program/circuit differences.

This helps answer: *what changed between two experiments?*
"""

best_run_id = best[4]

comparison = diff(
    baseline_id,
    best_run_id,
    registry=registry,
    store=store,
)

print(comparison)

**Reading the comparison output**

You'll typically see differences like:

- **params**: architecture name, number of layers, number of parameters
- **metrics**: best validation accuracy and the corresponding test accuracy
- **program/circuit**: gate sequence changes due to different encodings or depths

The goal isn't to "prove" one model is best here - it's to show a clean, repeatable way
to track and compare QML experiments.
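
To show the kind of information a parameter comparison carries, here is a small sketch of a dictionary diff. It is not devqubit's implementation: `diff_dicts` is a hypothetical helper, and the two dictionaries only mimic the shape of the parameters logged in this notebook.

```python
def diff_dicts(left: dict, right: dict) -> dict:
    """Return {key: (left_value, right_value)} for keys whose values differ.

    Keys missing on one side are reported with None on that side.
    """
    changed = {}
    for key in sorted(set(left) | set(right)):
        a, b = left.get(key), right.get(key)
        if a != b:
            changed[key] = (a, b)
    return changed


# Parameter dictionaries shaped like the ones logged in this notebook
baseline = {"architecture": "angle_encoding", "n_layers": 2, "n_params": 6, "seed": 42}
candidate = {"architecture": "iqp_encoding", "n_layers": 3, "n_params": 27, "seed": 42}

for key, (old, new) in diff_dicts(baseline, candidate).items():
    print(f"{key}: {old} -> {new}")
```

Unchanged entries such as `seed` drop out, so the report stays focused on what actually differs between the two runs.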


## Summary

In ~20 cells we demonstrated a simple QML workflow with devqubit + PennyLane:

| devqubit feature | What we used it for |
|---|---|
| **Run params + tags** | record architecture, depth, hyperparameters |
| **Epoch-wise metrics** | log loss and train/val/test accuracy each epoch |
| **Device wrapping** | track circuit executions via `wrap_backend` |
| **Artifacts** | store trained parameters in a JSON-friendly form |
| **Grouped runs** | keep an architecture sweep organized under one `group_id` |
| **Comparison report** | quickly see what changed between two runs |

Clean-up is optional: the `WORKSPACE` folder is where the registry and artifacts live.

Model selection uses the **validation split**; test accuracy is reported only as a final check on the selected model.


In [None]:
# Optional cleanup (keeps reruns tidy).
# If you'd like to inspect the registry/artifacts on disk, comment this out.
shutil.rmtree(WORKSPACE)
print("Workspace cleaned up.")