## Normalizing Inputs in Neural Networks

### Introduction
When training a neural network, one effective technique to speed up training is normalizing your inputs. This process involves two key steps: subtracting the mean so that each feature is centered at zero, and scaling each feature so that it has unit variance.

### Step-by-Step Guide to Normalizing Inputs

1. **Zero-Center the Mean**:
   - Compute the mean \(\mu\) of each feature in the training set.
     \[
     \mu = \frac{1}{m} \sum_{i=1}^{m} x_i
     \]
   - Subtract the mean \(\mu\) from each training example to center the data around zero.
     \[
     x_i = x_i - \mu
     \]

2. **Normalize the Variance**:
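   - Compute the variance \(\sigma^2\) of each feature over the training set. Written in terms of the original (uncentered) values \(x_i\), this is
     \[
     \sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2
     \]
     Once the mean has been subtracted, this is simply the average of the squared centered values.
   - Normalize each feature by dividing the centered values by the standard deviation \(\sigma\). In terms of the original values, the full transformation is
     \[
     x_i = \frac{x_i - \mu}{\sigma}
     \]
     The mean is subtracted only once: if \(x_i\) has already been centered in step 1, only the division by \(\sigma\) remains.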


### Example Code in Python

```python
import numpy as np

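# Example input features: 4 training examples (rows), 2 features (columns)
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)

# Zero-center the mean: axis=0 averages over examples, giving one mean per feature
mu = np.mean(X, axis=0)
X_centered = X - mu

# Normalize the variance: divide each feature by its standard deviation
sigma = np.std(X_centered, axis=0)
X_normalized = X_centered / sigma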
```
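
One edge case the snippet above does not handle is a feature that is constant across the training set: its standard deviation is zero, so the division yields NaN. A minimal sketch of one common workaround follows, assuming a small constant `eps` added to the denominator. Both the name `eps` and its value are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# The second feature is constant, so its standard deviation is 0
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]])

mu = np.mean(X, axis=0)
sigma = np.std(X, axis=0)

eps = 1e-8  # assumed small constant; the exact value is a judgment call
X_normalized = (X - mu) / (sigma + eps)  # constant feature maps to 0, not NaN
```

Dropping constant features before training is another reasonable option, since they carry no information.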

### Important Tips

- **Consistent Normalization**: Use the same \(\mu\) and \(\sigma\) calculated from the training set to normalize the test set, rather than recomputing them on the test data. This ensures that the training and test data go through exactly the same transformation.
- **Why Normalize?**: Normalizing features makes the cost function easier to optimize. Without normalization, features on very different scales produce elongated cost function contours, which makes gradient descent slow and inefficient.
- **Effect on Cost Function**: Normalized features make the cost function more symmetric and easier to optimize, allowing for larger gradient descent steps and faster convergence.
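
The consistent-normalization tip can be sketched as two small helpers. The function names `fit_normalizer` and `apply_normalizer` and the sample arrays are hypothetical, chosen only for illustration; the point is that \(\mu\) and \(\sigma\) are computed once, on the training set, and then reused.

```python
import numpy as np

def fit_normalizer(X_train):
    """Return the per-feature mean and standard deviation of the training set."""
    mu = np.mean(X_train, axis=0)
    sigma = np.std(X_train, axis=0)
    return mu, sigma

def apply_normalizer(X, mu, sigma):
    """Normalize X using statistics that were computed on the training set."""
    return (X - mu) / sigma

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
X_test = np.array([[2.5, 3.5], [5.0, 6.0]])

mu, sigma = fit_normalizer(X_train)
X_train_norm = apply_normalizer(X_train, mu, sigma)
X_test_norm = apply_normalizer(X_test, mu, sigma)  # reuses the training mu and sigma
```

Note that the normalized test set will generally not have exactly zero mean and unit variance. That is expected, because its statistics differ slightly from those of the training set.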


![image.png](attachment:image.png)


### Intuition Behind Normalization

- **Non-Normalized Features**: If features have vastly different ranges (e.g., one feature ranges from 1-1000 while another ranges from 0-1), the cost function contours can be very elongated. This requires smaller learning rates and many steps for gradient descent to converge.
- **Normalized Features**: When features are normalized to have zero mean and unit variance, the cost function contours become more spherical. This allows for larger learning rates and faster convergence in gradient descent.
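
One way to make "elongated" versus "spherical" concrete, for the special case of a least-squares cost with the bias term ignored, is the condition number of the Hessian \(X^T X / m\): the ratio of its largest to its smallest curvature. The sketch below uses synthetic data with the ranges mentioned above. The data, the seed, and the choice of a least-squares cost are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
m = 500

# Synthetic features: x1 ranges over [1, 1000], x2 over [0, 1]
X = np.column_stack([rng.uniform(1, 1000, m), rng.uniform(0, 1, m)])

def hessian_condition_number(X):
    """Condition number of X^T X / m, the Hessian of a least-squares cost."""
    H = X.T @ X / X.shape[0]
    return np.linalg.cond(H)

# Zero mean and unit variance per feature
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

cond_raw = hessian_condition_number(X)
cond_norm = hessian_condition_number(X_norm)
print(f"raw: {cond_raw:.1f}, normalized: {cond_norm:.2f}")
```

A condition number close to 1 corresponds to nearly circular contours, while a very large one corresponds to long, narrow contours that force a small learning rate.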

![image.png](attachment:image.png)



### Practical Example

- **Different Scales**: If your input features have drastically different scales, normalizing is crucial. For instance, if \( x_1 \) ranges from 1-1000 and \( x_2 \) ranges from 0-1, normalizing will significantly improve optimization efficiency.
- **Similar Scales**: If the input features already have similar scales (e.g., \( x_1 \) ranges from 0-1, \( x_2 \) from -1 to 1), normalization is less critical, but it is still beneficial and generally doesn't harm performance.
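
To check which of the two cases a dataset falls into, a quick look at the per-feature spread is usually enough. The helper `scale_ratio` below is a hypothetical heuristic, not a standard metric, and the synthetic data simply mirrors the ranges in the two bullets above.

```python
import numpy as np

def scale_ratio(X):
    """Ratio of the largest to the smallest per-feature standard deviation."""
    sigma = np.std(X, axis=0)
    return sigma.max() / sigma.min()

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
m = 1000

# x1 in [1, 1000], x2 in [0, 1]: drastically different scales
different = np.column_stack([rng.uniform(1, 1000, m), rng.uniform(0, 1, m)])
# x1 in [0, 1], x2 in [-1, 1]: similar scales
similar = np.column_stack([rng.uniform(0, 1, m), rng.uniform(-1, 1, m)])

ratio_different = scale_ratio(different)  # on the order of 1000
ratio_similar = scale_ratio(similar)      # on the order of 2
```

A ratio that spans several orders of magnitude is a strong hint that normalization will help. A ratio near 1 suggests it matters less, though applying it anyway is harmless.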

### Conclusion

Normalizing input features by zero-centering the mean and normalizing the variance can significantly speed up the training process of neural networks. This ensures that all features are on a similar scale, making the cost function easier to optimize and allowing for faster convergence of gradient descent.