In [None]:
# probly Tutorial: DropConnect Transformation

This notebook is meant as a practical introduction to the **DropConnect transformation** in `probly`.
The goal is not to be mathematically perfect, but to give you an intuition.

We will slowly build up from the basic ideas of *normal* Dropout and DropConnect to the slightly more advanced idea of
a **DropConnect transformation that makes a model uncertainty-aware**. After that, we look at a small PyTorch
example and inspect how the transformation changes the model.

---
# Introduction to DropConnect and the DropConnect Transformation
---


## 1. Concept: What is DropConnect (normal) vs. the DropConnect Transformation?

To understand the DropConnect transformation, it's helpful to first compare it to the more common Dropout.
### 1.1 Normal Dropout (Recap)

Dropout is a regularization technique that works on activations. During training, it randomly sets the outputs of some neurons to zero.
This prevents the network from relying too heavily on any single neuron.
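
As a quick illustration of the recap above, the following minimal sketch uses PyTorch's built-in `nn.Dropout` on a vector of ones. The variable names are chosen for this example only. In training mode the surviving entries are scaled by `1 / (1 - p)`; in evaluation mode the layer passes its input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()         # training mode: activations are dropped at random
train_out = drop(x)  # each entry is either 0 or 1 / (1 - p) = 2

drop.eval()          # evaluation mode: Dropout acts as the identity
eval_out = drop(x)

print("train:", train_out)
print("eval: ", eval_out)
```
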
### 1.2 Normal DropConnect
DropConnect is a similar regularization technique, but it works on weights. Instead of setting a neuron's entire output to zero,
DropConnect randomly sets a fraction `p` of the individual weights within a layer to zero for each training step.
You can imagine this as temporarily deleting connections between neurons.

DropConnect is considered a generalization of Dropout. Like Dropout, its main purpose during normal training is to prevent overfitting
and improve the model's robustness. At inference time (`model.eval()`), this randomness is disabled, and the model becomes deterministic.
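
To make the "deleted connections" picture concrete, here is a minimal sketch of a single DropConnect forward pass. The helper `dropconnect_linear` is a hypothetical name introduced for this illustration and is not part of `probly`. The sketch draws one random keep/drop decision per weight and, as a simplifying assumption, does not rescale the surviving weights.

```python
import torch
import torch.nn.functional as F


def dropconnect_linear(x, weight, bias, p):
    """Linear layer in which each weight is dropped independently with probability p."""
    # One coin flip per connection: the mask has the same shape as the weight matrix
    keep_mask = (torch.rand_like(weight) >= p).to(weight.dtype)
    return F.linear(x, weight * keep_mask, bias)


torch.manual_seed(0)
weight = torch.randn(3, 4)  # 3 output neurons, 4 inputs
bias = torch.zeros(3)
x = torch.randn(2, 4)

out_a = dropconnect_linear(x, weight, bias, p=0.5)
out_b = dropconnect_linear(x, weight, bias, p=0.5)  # a fresh mask is drawn here
print(out_a)
print(out_b)
```

With `p=0.0` the function reduces to an ordinary linear layer, and with `p=1.0` every connection is removed so that only the bias remains.
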

### 1.3 DropConnect Transformation (probly)

The DropConnect transformation in `probly` takes this idea and uses it to make a model **uncertainty-aware** at prediction time.

The transformation does the following:
 
- It walks through your PyTorch model and finds the relevant linear layers (`nn.Linear`).
- It programmatically replaces these `nn.Linear` layers with a custom `DropConnectLinear` layer. In the quickstart output in Section 2, the first (input) layer is left as a plain `nn.Linear`.
- Crucially, this custom layer keeps the DropConnect mechanism **active during inference**.

If we now feed the same input through the transformed model multiple times, we get a cloud of slightly different predictions. The spread of this cloud (for example, its variance) serves as an estimate of the model's uncertainty.
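
The steps above can be sketched in plain PyTorch. This is not `probly`'s implementation: `SketchDropConnectLinear` and `replace_linear_layers` are hypothetical names introduced here, the sketch replaces every `nn.Linear` (a simplification), and it assumes that the mask is sampled on every call regardless of train/eval mode.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn


class SketchDropConnectLinear(nn.Module):
    """Hypothetical stand-in for a DropConnect layer; not probly's class."""

    def __init__(self, base: nn.Linear, p: float) -> None:
        super().__init__()
        self.p = p
        # Reuse the parameters of the layer that is being replaced
        self.weight = base.weight
        self.bias = base.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: a new mask is sampled on every call, also in eval mode
        keep_mask = (torch.rand_like(self.weight) >= self.p).to(self.weight.dtype)
        return F.linear(x, self.weight * keep_mask, self.bias)


def _swap(module: nn.Module, p: float) -> None:
    # Walk the module tree and swap each nn.Linear in place
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            setattr(module, name, SketchDropConnectLinear(child, p))
        else:
            _swap(child, p)


def replace_linear_layers(model: nn.Module, p: float) -> nn.Module:
    """Return a copy of `model` with every nn.Linear swapped for the sketch layer."""
    model = copy.deepcopy(model)  # leave the original model untouched
    _swap(model, p)
    return model


torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
net_dc = replace_linear_layers(net, p=0.3)
net_dc.eval()

x = torch.randn(5, 4)
with torch.no_grad():
    samples = torch.stack([net_dc(x) for _ in range(20)])  # [20, 5, 1]

print(net_dc)
print("largest std across 20 passes:", samples.std(dim=0).max().item())
```

Copying the model before swapping layers keeps the original network available for comparison. Whether `probly` copies or modifies the model in place is not shown in this notebook.
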

### 1.4 A short side-by-side comparison

| Aspect                       | DropConnect Transformation (probly)                    | Dropout Transformation (probly)                          |
|------------------------------|--------------------------------------------------------|----------------------------------------------------------|
| What is dropped?             | Individual weights inside a layer                      | Entire activations (neuron outputs)                      |
| How it modifies the model    | Replaces `nn.Linear` with `DropConnectLinear`          | Inserts `nn.Dropout` layers before `nn.Linear`           |
| When it's active             | Intentionally in `model.eval()`                        | Intentionally in `model.eval()`                          |
| Main purpose                 | Make predictions uncertainty‑aware                     | Make predictions uncertainty‑aware                       |
| Output behaviour in eval     | Stochastic (same input → slightly different outputs)   | Stochastic (same input → slightly different outputs)     |

The rest of this notebook assumes this picture: **“normal” DropConnect is a training regulariser; the
DropConnect transformation turns the same mechanism into a tool for estimating uncertainty.**





## 2. Quickstart (PyTorch)

Below: build a small MLP, apply `dropconnect(model, p)`, and inspect the modified architecture.

In [1]:
import torch
from torch import nn

from probly.transformation import dropconnect


def build_mlp(in_dim: int = 10, hidden: int = 32, out_dim: int = 1) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


p = 0.2  # dropconnect probability

model = build_mlp()
print("Original model:\n", model)

model_dc = dropconnect(model, p)
print(f"\nWith DropConnect transformation (p={p:.2f}):\n", model_dc)

Original model:
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=32, bias=True)
  (3): ReLU()
  (4): Linear(in_features=32, out_features=1, bias=True)
)

With DropConnect transformation (p=0.20):
 Sequential(
  (0): Linear(in_features=10, out_features=32, bias=True)
  (1): ReLU()
  (2): DropConnectLinear(in_features=32, out_features=32, bias=True)
  (3): ReLU()
  (4): DropConnectLinear(in_features=32, out_features=1, bias=True)
)


### Notes on the structure

Notice that the second and third `Linear` layers have been replaced by `DropConnectLinear` layers, while the first (input) layer is still a plain `Linear`.

The overall architecture (`Sequential`, `ReLU`) remains the same, but the core linear modules are now uncertainty-aware.

## 3. Uncertainty via DropConnect

To obtain predictive uncertainty, we run multiple stochastic forward passes (with DropConnect active) and compute the mean and variance of the predictions. The process is identical to MC-Dropout.
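
Written out, with $T$ stochastic forward passes and $f_{\theta_t}$ denoting the network under the $t$-th sampled weight mask (notation introduced here for illustration), the two estimates are:

$$
\hat{\mu}(x) = \frac{1}{T}\sum_{t=1}^{T} f_{\theta_t}(x),
\qquad
\hat{\sigma}^2(x) = \frac{1}{T}\sum_{t=1}^{T}\bigl(f_{\theta_t}(x) - \hat{\mu}(x)\bigr)^2
$$

The $1/T$ normalisation of the variance corresponds to `unbiased=False` in the code cell that follows.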

In [None]:
# Toy regression data
torch.manual_seed(0)
n = 128
X = torch.randn(n, 10)
true_w = torch.randn(10, 1)
y = X @ true_w + 0.1 * torch.randn(n, 1)

# Build and transform the model
model = build_mlp(in_dim=10, hidden=64, out_dim=1)
model_dc = dropconnect(model, p=0.2)

# Simple training loop
opt = torch.optim.Adam(model_dc.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

model_dc.train()  # Activate DropConnect for training
for _step in range(200):
    opt.zero_grad()
    pred = model_dc(X)
    loss = loss_fn(pred, y)
    loss.backward()
    opt.step()


# MC prediction function
@torch.no_grad()
def mc_predict(
    model_with_dropconnect: nn.Module,
    inputs: torch.Tensor,
    n_samples: int = 50,
) -> tuple[torch.Tensor, torch.Tensor]:
    # Activate training mode to enable the stochasticity of DropConnect.
    # Note: the model is left in training mode after this call.
    model_with_dropconnect.train()
    preds = []
    for _ in range(n_samples):
        preds.append(model_with_dropconnect(inputs).detach())
    stacked = torch.stack(preds, dim=0)  # [n_samples, N, out_dim]
    mean = stacked.mean(dim=0)
    var = stacked.var(dim=0, unbiased=False)
    return mean, var


mean_pred, var_pred = mc_predict(model_dc, X[:5], n_samples=100)
print("Predictive mean (first 5):\n", mean_pred.squeeze())
print("\nPredictive variance (first 5):\n", var_pred.squeeze())