# probly Tutorial: DropConnect Transformation

This notebook is a practical introduction to the **DropConnect transformation** in `probly`.
The goal is to build intuition rather than to be mathematically rigorous.

We build up from the basic ideas of *normal* Dropout and DropConnect to the slightly more advanced idea of
a **DropConnect transformation that makes a model uncertainty‑aware**. After that, we look at a small PyTorch
example and inspect how the transformation changes the model.

---
# Introduction to DropConnect and the DropConnect Transformation
---


## 1. Concept: Normal DropConnect vs. the DropConnect Transformation

To understand the DropConnect transformation, it helps to first compare it to the more common Dropout.

### 1.1 Normal Dropout (Recap)

Dropout is a regularization technique that works on activations. During training, it randomly sets the outputs of some neurons to zero.
This prevents the network from relying too heavily on any single neuron.
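
To make the mechanism concrete, here is a minimal plain-Python sketch of Dropout on a list of activations. The function name `dropout_sketch` and its `training` flag are illustrative assumptions, not part of `probly` or PyTorch. The rescaling of surviving activations by `1 / (1 - p)` mirrors what `torch.nn.Dropout` does during training.

```python
import random


def dropout_sketch(activations, p, training, rng):
    """Zero each activation with probability p (0 <= p < 1) and rescale the survivors."""
    if not training or p == 0.0:
        # Outside of training, Dropout is the identity function.
        return list(activations)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * scale for a in activations]


rng = random.Random(0)
activations = [0.5, -1.2, 2.0, 0.3]

train_out = dropout_sketch(activations, p=0.5, training=True, rng=rng)
eval_out = dropout_sketch(activations, p=0.5, training=False, rng=rng)

print("training:", train_out)  # some entries are zero, the rest are doubled
print("eval:    ", eval_out)   # identical to the input
```

Every entry of the training output is either zero or the rescaled input, while the eval output is untouched.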

### 1.2 Normal DropConnect

DropConnect is a similar regularization technique, but it works on weights. Instead of setting a neuron's entire output to zero,
DropConnect randomly sets individual weights within a layer to zero, each with probability `p`, at every training step.
You can imagine this as temporarily deleting connections between neurons.

DropConnect can be seen as a generalization of Dropout. Like Dropout, its main purpose during normal training is to prevent overfitting
and improve the model's robustness. At inference time (`model.eval()`), this randomness is disabled and the model becomes deterministic.
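
The difference to Dropout is easiest to see on a single linear layer. The sketch below is plain Python with a hypothetical helper `dropconnect_linear_sketch`; it is not how `probly` or PyTorch implement the layer. It also leaves out any rescaling of the surviving weights, which real implementations may handle differently.

```python
import random


def dropconnect_linear_sketch(x, weight, bias, p, training, rng):
    """Compute y = W x + b, dropping each weight independently with probability p."""
    outputs = []
    for row, b in zip(weight, bias):
        total = b
        for w, x_i in zip(row, x):
            if training and rng.random() < p:
                continue  # this single connection is dropped for this forward pass
            total += w * x_i
        outputs.append(total)
    return outputs


rng = random.Random(0)
weight = [[1.0, 2.0, 3.0], [-1.0, 0.5, 0.0]]  # 2 outputs, 3 inputs
bias = [0.1, -0.1]
x = [1.0, 1.0, 1.0]

print("training:", dropconnect_linear_sketch(x, weight, bias, p=0.5, training=True, rng=rng))
print("eval:    ", dropconnect_linear_sketch(x, weight, bias, p=0.5, training=False, rng=rng))
```

Note that a neuron whose connections are partly dropped still produces an output; only some of its inputs are missing. With `training=False` the mask is never sampled, so repeated calls give the same result.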

### 1.3 DropConnect Transformation (probly)

The DropConnect transformation in `probly` takes this idea and uses it to make a model **uncertainty‑aware** at prediction time.

The transformation does the following:
 
- It walks through your PyTorch model and finds the relevant linear layers (e.g., `nn.Linear`).
- It programmatically replaces each `nn.Linear` layer with a custom `DropConnectLinear` layer.
- Crucially, this custom layer keeps the DropConnect mechanism **active during inference**.
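
The last point is the important one, so here is a minimal sketch of what such a layer could look like. `DropConnectLinearSketch` is a hypothetical stand-in written for this illustration; it is not the source of probly's `DropConnectLinear`, and details such as rescaling are left out. The key assumption is that the weight mask is sampled in `forward` without checking `self.training`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropConnectLinearSketch(nn.Module):
    """Hypothetical stand-in for a DropConnect linear layer (not probly's code)."""

    def __init__(self, base: nn.Linear, p: float = 0.25) -> None:
        super().__init__()
        self.p = p
        # Reuse the parameters of the layer that is being replaced.
        self.weight = base.weight
        self.bias = base.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The mask is sampled on every call, independent of self.training,
        # so the layer stays stochastic after model.eval().
        keep = (torch.rand_like(self.weight) >= self.p).to(self.weight.dtype)
        return F.linear(x, self.weight * keep, self.bias)


torch.manual_seed(0)
layer = DropConnectLinearSketch(nn.Linear(8, 4), p=0.5).eval()
x = torch.randn(1, 8)

with torch.no_grad():
    y1, y2 = layer(x), layer(x)

print("same output twice?", torch.allclose(y1, y2))
```

Even though the layer is in eval mode, two passes over the same input use different masks and therefore give different outputs.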

If we now feed the same input through the transformed model multiple times, we get a cloud of slightly different predictions. The spread of this cloud serves as an estimate of the model's uncertainty: the more the predictions disagree, the less certain the model is about that input.
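
The "cloud of predictions" idea can be sketched without any library: call a stochastic model repeatedly on the same input and summarize the outputs. The toy model `stochastic_predict`, its weights, and the choice of mean and standard deviation as summaries are assumptions made for this illustration, not probly's API.

```python
import random
import statistics

WEIGHTS = [0.8, -0.5, 1.5]
BIAS = 0.2


def stochastic_predict(x, p, rng):
    """One forward pass of a toy linear model; each weight is dropped with probability p."""
    return BIAS + sum(w * x_i for w, x_i in zip(WEIGHTS, x) if rng.random() >= p)


def sample_predictions(x, p, n_samples, rng):
    """Repeat the stochastic forward pass on the same input."""
    return [stochastic_predict(x, p, rng) for _ in range(n_samples)]


rng = random.Random(0)
x = [1.0, 2.0, 3.0]
samples = sample_predictions(x, p=0.25, n_samples=200, rng=rng)

mean = statistics.mean(samples)     # the "center" of the cloud
spread = statistics.stdev(samples)  # how much the predictions disagree
print(f"mean prediction: {mean:.3f}, spread (std): {spread:.3f}")
```

With `p=0.0` every sample is identical and the spread collapses to zero, which is the deterministic behaviour of an untransformed model.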

### 1.4 A short side‑by‑side comparison

| Aspect                    | DropConnect Transformation (probly)                  | Dropout Transformation (probly)                      |
|---------------------------|------------------------------------------------------|------------------------------------------------------|
| What is dropped?          | Individual weights inside a layer                    | Entire activations (neuron outputs)                  |
| How it modifies the model | Replaces `nn.Linear` with `DropConnectLinear`        | Inserts `nn.Dropout` layers before `nn.Linear`       |
| When it's active          | Intentionally in `model.eval()`                      | Intentionally in `model.eval()`                      |
| Main purpose              | Make predictions uncertainty‑aware                   | Make predictions uncertainty‑aware                   |
| Output behaviour in eval  | Stochastic (same input → slightly different outputs) | Stochastic (same input → slightly different outputs) |

The rest of this notebook assumes this picture: **“normal” DropConnect is a training regulariser; the
DropConnect transformation turns the same mechanism into a tool for estimating uncertainty.**





## 2. Quickstart (PyTorch)

In this section we build a small MLP, apply `dropconnect(model, p)`, and inspect the modified architecture.
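
A minimal sketch of these three steps follows. The import path of probly's `dropconnect` is not shown in this section, so the sketch uses a hypothetical stand-in, `dropconnect_sketch`, together with a hypothetical `DropConnectLinearSketch` layer. Both are assumptions made for illustration; copying the model before modifying it is a design choice of this sketch, not a statement about what `probly` does.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class DropConnectLinearSketch(nn.Module):
    """Hypothetical stand-in layer: masks weights on every forward pass."""

    def __init__(self, base: nn.Linear, p: float) -> None:
        super().__init__()
        self.p = p
        self.weight = base.weight
        self.bias = base.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        keep = (torch.rand_like(self.weight) >= self.p).to(self.weight.dtype)
        return F.linear(x, self.weight * keep, self.bias)

    def extra_repr(self) -> str:
        out_features, in_features = self.weight.shape
        return f"in_features={in_features}, out_features={out_features}, p={self.p}"


def _swap_linear_layers(module: nn.Module, p: float) -> None:
    """Recursively replace every nn.Linear child with the stand-in layer."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            setattr(module, name, DropConnectLinearSketch(child, p))
        else:
            _swap_linear_layers(child, p)


def dropconnect_sketch(model: nn.Module, p: float) -> nn.Module:
    """Stand-in for `dropconnect(model, p)`: returns a modified copy of the model."""
    transformed = copy.deepcopy(model)  # leave the original model untouched
    _swap_linear_layers(transformed, p)
    return transformed


model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
transformed = dropconnect_sketch(model, p=0.25)

print(model)        # two nn.Linear layers
print(transformed)  # both replaced, nn.ReLU unchanged
```

Printing both models side by side shows that only the linear layers changed, and that the replaced layers start from the same weights as the originals.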

## 3. Uncertainty via DropConnect

