In [1]:
import torch
import torch.nn as nn

### Expected Output Behavior

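* `nn.Dropout(p=0.5)` zeroes each element of the tensor independently with probability 0.5 during training, so about half of the elements become **zero** (rarely exactly half).
* The remaining elements are **scaled up** by `1 / (1 - p) = 2.0` so that the expected value of each element stays the same.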


Here's **why** the remaining values after dropout are scaled by `1 / (1 - p)`:

Let’s say:

* Your input is a tensor of ones: `x = [1, 1, 1, 1]`
* Dropout probability `p = 0.5` (50% chance to zero out any element)

**Without scaling:**

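* Suppose dropout randomly keeps the first and third elements, giving `[1, 0, 1, 0]`
* The mean of the tensor falls from 1 to 0.5 → the next layer gets a weaker signal than it will at evaluation time, when nothing is dropped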

**To fix that**, PyTorch **scales the remaining elements** by `1 / (1 - p)` = `1 / 0.5` = `2.0`

So the new tensor becomes:

```
[2, 0, 2, 0]
```

The mean of this tensor is 1 again. More generally, the **expected value of each element stays at 1**, which matches the original input.

---

### 🧠 Intuition

For each element:

* Probability of keeping it: `1 - p`
* If we don’t scale: expected value = `(1 - p) * 1 + p * 0 = 1 - p`
* To make the expected value = 1 again, we scale by `1 / (1 - p)`
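
The expectation argument above can be checked numerically. The following is a minimal, framework-free sketch, not PyTorch's implementation: the helper name `inverted_dropout`, the seed, and the sample size are illustrative assumptions, and the helper only accepts `p` in `[0, 1)` for simplicity.

```python
import random

def inverted_dropout(values, p, rng):
    """Hypothetical helper: zero each value independently with probability p
    and scale the survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so an element is kept with probability 1 - p.
    return [v * scale if rng.random() >= p else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.5
x = [1.0] * 100_000     # a large input of ones

dropped = inverted_dropout(x, p, rng)
mean_scaled = sum(dropped) / len(dropped)
# Undo the scaling to see what the mean would have been without it.
mean_unscaled = mean_scaled * (1.0 - p)

print(f"mean with scaling:    {mean_scaled:.3f}")    # close to 1.0
print(f"mean without scaling: {mean_unscaled:.3f}")  # close to 1 - p = 0.5
```

With many elements, the scaled mean settles near 1 and the unscaled mean near `1 - p`, which is the gap that the `1 / (1 - p)` factor closes.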

---

### Note
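Dropout is only active in training mode (`model.train()`), which is the default for a newly created module. That is why the next cell drops elements without calling `train()` explicitly. In evaluation mode (`model.eval()`), dropout returns its input unchanged.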
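
To see the difference between the two modes, here is a short sketch that applies the same `nn.Dropout` layer in each one. The tensor shape and variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(2, 4)

dropout.train()          # training mode (also the default for a new module)
train_out = dropout(x)   # each entry is either 0.0 or 2.0, chosen at random

dropout.eval()           # evaluation mode: dropout passes the input through
eval_out = dropout(x)

print("training output:\n", train_out)
print("eval output equals input:", torch.equal(eval_out, x))
```

Calling `.train()` or `.eval()` on a parent model switches every dropout layer inside it in the same way.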

In [2]:
dropout = nn.Dropout(p=0.5)

example = torch.ones(6,6)

print("Input Tensor:")
print(example)

output = dropout(example)
print("\nOutput Tensor after applying Dropout:")
print(output)

Input Tensor:
tensor([[1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.]])

Output Tensor after applying Dropout:
tensor([[2., 2., 0., 0., 2., 2.],
        [0., 0., 2., 2., 2., 2.],
        [0., 0., 2., 2., 2., 2.],
        [0., 0., 0., 0., 2., 0.],
        [2., 2., 2., 0., 2., 2.],
        [2., 2., 2., 2., 0., 0.]])
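
The sample output above contains 14 zeros out of 36 elements rather than exactly 18, because each element is dropped independently. As a rough check, the sketch below compares a small tensor with a much larger one. The seed and the shapes are arbitrary assumptions, and the exact numbers printed will vary with the seed and PyTorch version.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, used only for repeatability
dropout = nn.Dropout(p=0.5)

for shape in [(6, 6), (1000, 1000)]:
    out = dropout(torch.ones(shape))
    # Fraction of entries that were zeroed, and the mean of the scaled output.
    zero_fraction = (out == 0).float().mean().item()
    mean_value = out.mean().item()
    print(shape, f"zeros: {zero_fraction:.3f}", f"mean: {mean_value:.3f}")
```

For the larger tensor, the fraction of zeros should sit close to `p` and the mean close to 1, while the 6×6 case can deviate noticeably from both.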
