# Neural Networks: Data Preprocessing, Initialization, and Regularization
This notebook covers the following topics:
- Data Preprocessing
- Weight Initialization
- Batch Normalization
- Regularization


## Data Preprocessing
### Mean Subtraction
Mean subtraction subtracts each feature's mean, computed across the whole dataset, from that feature. Every feature then has a mean of zero, so the data cloud is centered on the origin.
```python
import numpy as np
X -= np.mean(X, axis=0)  # X: float array of shape (N, D); each column is one feature
```

### Normalization
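Normalization divides each zero-centered feature by its standard deviation, so that all features have a standard deviation of one and comparable scale. A constant feature has a standard deviation of zero and would cause a division by zero, so such features should be dropped or the divisor padded with a small epsilon.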
```python
X /= np.std(X, axis=0)
```
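In practice the mean and standard deviation are usually computed on the training split only and then reused for validation and test data. A minimal sketch of that pattern, assuming synthetic data and two hypothetical helpers (`fit_standardizer`, `apply_standardizer`) that are not part of this notebook:

```python
import numpy as np

def fit_standardizer(X_train, eps=1e-8):
    """Compute per-feature mean and std from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps  # eps guards against constant features
    return mean, std

def apply_standardizer(X, mean, std):
    """Center and scale any split using the training statistics."""
    return (X - mean) / std

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(50, 4))

mean, std = fit_standardizer(X_train)
X_train_std = apply_standardizer(X_train, mean, std)
X_test_std = apply_standardizer(X_test, mean, std)

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

The test split will not be exactly zero-mean or unit-variance after this transform, which is expected: it is scaled with the same statistics the model saw during training.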

### PCA and Whitening
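PCA (Principal Component Analysis) rotates zero-centered data into the eigenbasis of its covariance matrix, which decorrelates the features. Keeping only the leading components then projects the data to a lower-dimensional space. Whitening goes one step further and divides each rotated dimension by the square root of its eigenvalue, so that every dimension has unit variance.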


```python
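X -= np.mean(X, axis=0)             # zero-center the data (required before PCA)
cov = np.dot(X.T, X) / X.shape[0]   # (D, D) covariance matrix
U, S, V = np.linalg.svd(cov)        # columns of U: eigenvectors; S: eigenvalues, descending
Xrot = np.dot(X, U)                 # decorrelate: rotate into the eigenbasis
Xwhite = Xrot / np.sqrt(S + 1e-5)   # whiten: unit variance; 1e-5 avoids division by zero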
```
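The snippet above keeps all dimensions. As a sketch of the dimensionality-reduction step, the example below keeps only the first `k` columns of `U`, and also checks that the whitened data has a covariance close to the identity. The synthetic data, the mixing matrix `A`, and the choice `k = 2` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: 500 samples, 3 features (illustrative only).
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.5]])
X = rng.standard_normal((500, 3)) @ A

X = X - X.mean(axis=0)
cov = X.T @ X / X.shape[0]
U, S, _ = np.linalg.svd(cov)

# Dimensionality reduction: project onto the k directions of largest variance.
k = 2
X_reduced = X @ U[:, :k]            # shape (500, k)

# Whitening: rotate, then scale each dimension by 1 / sqrt(eigenvalue).
X_white = (X @ U) / np.sqrt(S + 1e-5)
cov_white = X_white.T @ X_white / X_white.shape[0]

print(X_reduced.shape)
print(np.round(cov_white, 3))
```

One caveat worth keeping in mind: whitening also scales up dimensions with very small eigenvalues, which can amplify noise. A larger smoothing constant than `1e-5` is one way to dampen that.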


## Weight Initialization
### Small Random Numbers
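If all weights start at the same value, every hidden unit computes the same output and receives the same gradient, so the units never become different from one another. Initializing the weights with small random values breaks this symmetry and allows the network to learn.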
```python
W = 0.01 * np.random.randn(D, H)  # D input units, H hidden units
```
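To see why the randomness matters, the sketch below compares a constant initialization with the random one. The toy input, the layer sizes, and the `tanh` nonlinearity are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 3
X = rng.standard_normal((8, D))    # 8 toy samples

W_const = np.full((D, H), 0.01)    # every weight identical
W_rand = 0.01 * rng.standard_normal((D, H))

h_const = np.tanh(X @ W_const)
h_rand = np.tanh(X @ W_rand)

# True when every hidden unit (column) produces the same activations.
units_identical_const = np.allclose(h_const, h_const[:, :1])
units_identical_rand = np.allclose(h_rand, h_rand[:, :1])

print(units_identical_const, units_identical_rand)
```

With identical weights the hidden units also receive identical gradients, so gradient descent alone cannot make them diverge.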

### Variance Scaling
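The variance of a unit's output grows with its number of inputs. Scaling the weights by the inverse square root of the number of input units `D` keeps the output variance roughly equal to the input variance, regardless of how wide the layer is.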
```python
W = np.random.randn(D, H) / np.sqrt(D)
```
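The effect of the scaling shows up when activations pass through many layers. The sketch below pushes random data through a stack of `tanh` layers and reports the standard deviation of the final activations under both initializations. The depth, the width, and the helper `final_activation_std` are assumptions made for illustration.

```python
import numpy as np

def final_activation_std(scale_fn, n_layers=10, width=500, seed=0):
    """Std of the last layer's activations in a random tanh network.

    scale_fn maps the fan-in to the factor applied to randn weights.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((256, width))
    for _ in range(n_layers):
        W = rng.standard_normal((width, width)) * scale_fn(width)
        h = np.tanh(h @ W)
    return h.std()

std_small = final_activation_std(lambda fan_in: 0.01)
std_scaled = final_activation_std(lambda fan_in: 1.0 / np.sqrt(fan_in))

print(f"0.01 * randn:       {std_small:.2e}")
print(f"randn / sqrt(D):    {std_scaled:.2e}")
```

With the fixed `0.01` factor the activations shrink towards zero layer by layer, and the gradients flowing back through them shrink too. The fan-in scaling keeps them in a usable range.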


## Batch Normalization
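Batch normalization normalizes each feature of a layer's input over the current mini-batch to zero mean and unit variance. It then applies a learnable scale `gamma` and shift `beta`, so the network can still recover the unnormalized activations if that is what training favors. The function below is the training-time forward pass.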
```python
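def batchnorm_forward(x, gamma, beta, eps=1e-5):
    N, D = x.shape                                  # N samples in the mini-batch, D features
    mu = np.mean(x, axis=0)                         # per-feature batch mean, shape (D,)
    var = np.var(x, axis=0)                         # per-feature batch variance, shape (D,)
    x_normalized = (x - mu) / np.sqrt(var + eps)    # eps avoids division by zero
    out = gamma * x_normalized + beta               # learnable scale and shift
    cache = (x, mu, var, x_normalized, gamma, beta, eps)  # saved for the backward pass
    return out, cache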
```
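The function above uses the statistics of the current mini-batch, which suits training. At inference time a common approach is to normalize with running averages collected during training instead. A minimal sketch of that idea follows; the function name `batchnorm_forward_running`, the `momentum` parameter, and the synthetic data are assumptions, not part of this notebook.

```python
import numpy as np

def batchnorm_forward_running(x, gamma, beta, running_mean, running_var,
                              train=True, momentum=0.9, eps=1e-5):
    """Batch norm forward pass that also tracks running statistics (sketch)."""
    if train:
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        # Exponential moving averages, used later at inference time.
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        # Inference: do not depend on the contents of the current batch.
        mu, var = running_mean, running_var
    out = gamma * (x - mu) / np.sqrt(var + eps) + beta
    return out, running_mean, running_var

rng = np.random.default_rng(0)
D = 3
gamma, beta = np.ones(D), np.zeros(D)
running_mean, running_var = np.zeros(D), np.ones(D)

# Simulated training: 200 mini-batches with mean 2 and std 3.
for _ in range(200):
    batch = rng.normal(loc=2.0, scale=3.0, size=(64, D))
    out_train, running_mean, running_var = batchnorm_forward_running(
        batch, gamma, beta, running_mean, running_var, train=True)

# Inference on a single sample, where batch statistics would be meaningless.
x_single = rng.normal(loc=2.0, scale=3.0, size=(1, D))
out_test, _, _ = batchnorm_forward_running(
    x_single, gamma, beta, running_mean, running_var, train=False)

print(running_mean, running_var)
```

Returning the updated running statistics, rather than mutating them in place, keeps the sketch free of side effects. A real layer would typically store them as state.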


## Regularization
### L2 Regularization
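L2 regularization adds a penalty proportional to the sum of the squared weights to the loss function, with `reg` setting the regularization strength. The factor of 0.5 is a convenience: it makes the gradient of the penalty with respect to `W` simply `reg * W`.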
```python
loss += 0.5 * reg * np.sum(W * W)
```
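In a training loop the penalty also contributes to the weight gradient. A minimal sketch, assuming a hypothetical helper `l2_penalty` and a random `W`, that returns both terms and checks one gradient entry numerically:

```python
import numpy as np

def l2_penalty(W, reg):
    """Return the L2 penalty and its gradient with respect to W."""
    penalty = 0.5 * reg * np.sum(W * W)
    dW = reg * W
    return penalty, dW

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
reg = 1e-2

penalty, dW = l2_penalty(W, reg)

# Numerical check of one gradient entry with a centered difference.
h = 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += h
W_minus[0, 0] -= h
numeric = (l2_penalty(W_plus, reg)[0] - l2_penalty(W_minus, reg)[0]) / (2 * h)

print(abs(numeric - dW[0, 0]))
```

The returned `dW` would be added to the data-loss gradient of `W` before the parameter update.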

### L1 Regularization
L1 regularization adds a penalty proportional to the sum of the absolute values of the weights to the loss function. It tends to drive many weights to exactly zero, which yields sparse weight matrices.
```python
loss += reg * np.sum(np.abs(W))
```
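The matching gradient term can be sketched as follows. Because the absolute value is not differentiable at zero, the code uses a subgradient; the helper name `l1_penalty` and the example weights are assumptions made for illustration.

```python
import numpy as np

def l1_penalty(W, reg):
    """Return the L1 penalty and a subgradient with respect to W."""
    penalty = reg * np.sum(np.abs(W))
    dW = reg * np.sign(W)  # np.sign(0) is 0, a valid subgradient at zero
    return penalty, dW

W = np.array([[0.001, -2.0],
              [0.0,    5.0]])
reg = 0.1

penalty, dW = l1_penalty(W, reg)
print(penalty)
print(dW)
```

The size of each nonzero entry of `dW` is `reg` no matter how large or small the weight is. An L2 gradient shrinks as a weight approaches zero, whereas this one keeps pushing with constant force, which is one way to see why L1 tends to produce sparse weights.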

### Dropout
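Dropout randomly sets a fraction of the input units to zero at each update during training, which helps prevent overfitting. In the code below, `p` is the probability of *keeping* a unit, not of dropping it. The kept activations are divided by `p` (inverted dropout), so their expected value is unchanged and the test-time pass needs no rescaling.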
```python
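def dropout_forward(x, p=0.5, train=True):
    # p is the probability of keeping a unit
    if train:
        mask = (np.random.rand(*x.shape) < p) / p  # inverted dropout: rescale kept units by 1/p
        out = x * mask
    else:
        out = x  # identity at test time; scaling was already applied during training
    return out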
```
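A quick check of the scaling: with inverted dropout, the mean activation during training should match the test-time mean. The sketch below uses a variant, `dropout_forward_rng`, that takes an explicit NumPy `Generator` for reproducibility; that name and the `rng` parameter are assumptions, not part of this notebook.

```python
import numpy as np

def dropout_forward_rng(x, p=0.5, train=True, rng=None):
    """Inverted dropout; p is the keep probability, as in dropout_forward."""
    if not train:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) < p) / p  # kept units are scaled by 1/p
    return x * mask

rng = np.random.default_rng(0)
x = np.ones((200, 500))

out_train = dropout_forward_rng(x, p=0.5, train=True, rng=rng)
out_test = dropout_forward_rng(x, p=0.5, train=False)

print("train mean:   ", out_train.mean())
print("fraction zero:", (out_train == 0).mean())
print("test mean:    ", out_test.mean())
```

Roughly a fraction `1 - p` of the units is zeroed, while the surviving units are scaled up so that the two means agree up to sampling noise.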
