# Layer Normalization (from scratch)
This notebook explains LayerNorm and how `Layer_LayerNormalization` works.

LayerNorm normalizes **within each sample** (across features), so, unlike BatchNorm, it does NOT need running statistics at inference time.


## 1) Formula
For each sample `i` (row vector):
- `mu_i = mean(x_i)` across features
- `var_i = var(x_i)` across features
- `x_hat_i = (x_i - mu_i) / sqrt(var_i + eps)`
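- `y_i = gamma * x_hat_i + beta`, where `gamma` and `beta` are learnable per-feature scale and shift vectors and `eps` is a small constant for numerical stability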
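The four steps above can be sketched directly in NumPy. This is a minimal illustration, not the code in `layernorm.py`: the function name `layernorm_forward`, the default `eps=1e-5`, and the use of the biased variance (`np.var` with its default `ddof=0`) are assumptions made for the example.

```python
import numpy as np

def layernorm_forward(x, gamma, beta, eps=1e-5):
    """Hypothetical helper: normalize each row of x across features, then scale and shift."""
    mu = x.mean(axis=1, keepdims=True)       # per-sample mean, shape (N, 1)
    var = x.var(axis=1, keepdims=True)       # per-sample biased variance, shape (N, 1)
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations, shape (N, D)
    return gamma * x_hat + beta              # gamma and beta broadcast over the batch

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=4.0, size=(4, 6))
gamma = np.ones(6)    # identity scale (assumed initialization)
beta = np.zeros(6)    # zero shift (assumed initialization)

y = layernorm_forward(x, gamma, beta)
print('per-sample mean:', y.mean(axis=1))
print('per-sample var :', y.var(axis=1))
```

With `gamma = 1` and `beta = 0`, each row of `y` should have a mean close to 0 and a variance slightly below 1 (the small gap comes from `eps`).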

Unlike BatchNorm, LayerNorm computes the same function in train and eval mode, because its statistics come from the current sample rather than from the batch.
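One way to see this is to check that a sample's output does not depend on which other samples share its batch. The sketch below uses a hypothetical standalone `layernorm` function (without `gamma` and `beta`), not the class from this repo.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    """Hypothetical helper: per-sample normalization across features."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
batch = rng.normal(size=(8, 5))

full = layernorm(batch)        # first sample normalized inside a batch of 8
single = layernorm(batch[:1])  # the same sample normalized on its own

same = np.allclose(full[:1], single)
print('output independent of batch:', same)
```

The same comparison would fail for BatchNorm in training mode, where the mean and variance are taken over the batch axis.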


In [None]:
import numpy as np
from LayerNorm.layernorm import Layer_LayerNormalization

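np.random.seed(0)
# 4 samples with 6 features each, deliberately shifted and scaled away from N(0, 1)
x = np.random.randn(4, 6) * 4 + 10
ln = Layer_LayerNormalization(n_features=6)
y = ln.forward(x)

# Statistics are taken along axis=1 (features), giving one value per sample.
# Assuming gamma starts at 1 and beta at 0, expect means near 0 and variances near 1.
print('per-sample mean:', y.mean(axis=1))
print('per-sample var :', y.var(axis=1))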

## 2) Backward pass intuition
LayerNorm's backward pass is similar in spirit to BatchNorm's, but the reductions for the input gradient run over the features of each sample rather than over the batch.
The provided `layernorm.py` implements a closed-form gradient.
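For reference, a closed-form backward pass can be sketched as follows. This is the standard textbook derivation written as a hypothetical `layernorm_backward` function; it is not necessarily identical to what `layernorm.py` does, and the names and `eps` default are assumptions.

```python
import numpy as np

def layernorm_backward(x, gamma, dy, eps=1e-5):
    """Hypothetical helper: gradients of LayerNorm given upstream gradient dy."""
    d = x.shape[1]
    mu = x.mean(axis=1, keepdims=True)
    inv_std = 1.0 / np.sqrt(x.var(axis=1, keepdims=True) + eps)
    x_hat = (x - mu) * inv_std

    # Parameter gradients accumulate over the batch axis.
    dgamma = (dy * x_hat).sum(axis=0)
    dbeta = dy.sum(axis=0)

    # Input gradient: every reduction is per sample, over the feature axis.
    dx_hat = dy * gamma
    dx = (inv_std / d) * (
        d * dx_hat
        - dx_hat.sum(axis=1, keepdims=True)
        - x_hat * (dx_hat * x_hat).sum(axis=1, keepdims=True)
    )
    return dx, dgamma, dbeta

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 6))
gamma = rng.normal(size=6)
dy = rng.normal(size=(4, 6))

dx, dgamma, dbeta = layernorm_backward(x, gamma, dy)
print('dx shape:', dx.shape)
print('row sums of dx:', dx.sum(axis=1))
```

A useful sanity property: each row of `dx` sums to approximately zero, because adding a constant to every feature of a sample leaves the LayerNorm output unchanged.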


## 3) Quick check: run the gradient test
The repo includes a finite-difference gradient check, which compares the analytic gradients against numerical estimates.
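The idea behind such a check can be sketched in a few lines. The code below is a self-contained illustration, not the repo's `test_layernorm`: the helper names, the step size `h`, and the scalar loss `sum(y * dy)` are all assumptions chosen for the example.

```python
import numpy as np

def forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def analytic_dx(x, gamma, dy, eps=1e-5):
    d = x.shape[1]
    mu = x.mean(axis=1, keepdims=True)
    inv_std = 1.0 / np.sqrt(x.var(axis=1, keepdims=True) + eps)
    x_hat = (x - mu) * inv_std
    dx_hat = dy * gamma
    return (inv_std / d) * (
        d * dx_hat
        - dx_hat.sum(axis=1, keepdims=True)
        - x_hat * (dx_hat * x_hat).sum(axis=1, keepdims=True)
    )

def numeric_dx(x, gamma, beta, dy, h=1e-5):
    """Central differences on the scalar loss L = sum(forward(x) * dy)."""
    x = x.copy()  # perturb a private copy so the caller's array is untouched
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        plus = np.sum(forward(x, gamma, beta) * dy)
        x[idx] = old - h
        minus = np.sum(forward(x, gamma, beta) * dy)
        x[idx] = old  # restore before moving to the next entry
        grad[idx] = (plus - minus) / (2 * h)
    return grad

rng = np.random.default_rng(3)
x = rng.normal(size=(3, 5))
gamma = rng.normal(size=5)
beta = rng.normal(size=5)
dy = rng.normal(size=(3, 5))

max_err = np.max(np.abs(analytic_dx(x, gamma, dy) - numeric_dx(x, gamma, beta, dy)))
print('max abs difference:', max_err)
```

In float64 the two gradients should agree to many decimal places; a large difference usually points to a bug in the analytic backward pass.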


In [None]:
# In a terminal, from the repository root, run:
# python -m LayerNorm.test_layernorm
