# Introduction to Neural Networks

---

## 1. Neural Networks Overview

A **Neural Network (NN)** is a computational model loosely inspired by the human brain, designed to recognize patterns and solve complex tasks such as classification, regression, and function approximation. Neural networks consist of layers of interconnected nodes (neurons), where each node performs a simple computation: it computes a weighted sum of its inputs and passes the result through an activation function.

**Key Components:**
- **Input layer:** Receives the features of the data.  
- **Hidden layers:** Intermediate layers that perform transformations on inputs.  
- **Output layer:** Produces the final prediction or classification.
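
To make the layer structure concrete, here is a minimal sketch in plain Python of one forward pass through a tiny network with 2 inputs, 3 hidden neurons, and 1 output. The helper name `dense_layer`, the weights and biases, and the choice of ReLU for the hidden layer and sigmoid for the output (both covered in Section 3) are illustrative assumptions, not values from a trained model.

```python
import math

def dense_layer(inputs, weights, biases, activation):
    """Hypothetical helper: one fully connected layer. Each neuron computes a
    weighted sum of all inputs plus its bias, then applies the activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up parameters: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.3, 0.8], [-0.7, 0.1]]
hidden_biases = [0.0, 0.1, 0.2]
output_weights = [[1.0, -1.5, 0.5]]
output_biases = [0.05]

x = [1.0, 2.0]                                               # input layer: raw features
h = dense_layer(x, hidden_weights, hidden_biases, relu)      # hidden layer
y = dense_layer(h, output_weights, output_biases, sigmoid)   # output layer
print("hidden:", h, "output:", y)
```

Each layer's outputs become the next layer's inputs; a deeper network simply chains more `dense_layer` calls.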

---

## 2. Perceptrons

A **Perceptron** is the simplest type of neural network, consisting of a single neuron. It computes a **weighted sum of its inputs** plus a bias and passes the result through an **activation function**.

Mathematical representation:

\[
y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)
\]

Where:  
- \(x_i\) = input features  
- \(w_i\) = weights  
- \(b\) = bias  
- \(f\) = activation function  
- \(y\) = output  
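
The formula maps directly to code. This sketch assumes the step activation from Section 3 and uses hand-picked weights \(w = (1, 1)\) and bias \(b = -1.5\), chosen for illustration so that the perceptron computes logical AND on binary inputs.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x, w, b, f=step):
    """Compute y = f(sum_i w_i * x_i + b) for a single neuron."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return f(z)

# Illustrative parameters: the weighted sum only reaches 0 when both inputs are 1.
w = [1.0, 1.0]
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, b))
```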

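**Perceptrons** are mainly used for **binary classification tasks**. A single perceptron with a step activation can only separate classes that are **linearly separable**; it cannot, for example, learn the XOR function.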
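
The text does not say how a perceptron's weights are found. One classic option, assumed here, is the perceptron learning rule: for each misclassified example, update \(w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i\) and \(b \leftarrow b + \eta\,(y - \hat{y})\). A minimal sketch, with `predict` and `train_perceptron` as hypothetical helper names and logical OR as toy data:

```python
def step(z):
    return 1 if z >= 0 else 0

def predict(x, w, b):
    return step(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def train_perceptron(data, n_features, lr=1.0, epochs=20):
    """Perceptron learning rule; lr=1.0 and epochs=20 are illustrative defaults."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            error = target - predict(x, w, b)   # -1, 0, or +1
            if error != 0:
                errors += 1
                # Move the decision boundary toward classifying x correctly.
                w = [w_i + lr * error * x_i for w_i, x_i in zip(w, x)]
                b += lr * error
        if errors == 0:   # every example classified correctly: stop early
            break
    return w, b

or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(or_data, n_features=2)
print(w, b, [predict(x, w, b) for x, _ in or_data])
```

Because OR is linearly separable, the rule reaches zero training errors after a few epochs; on XOR it would keep misclassifying at least one example every epoch.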

---

## 3. Activation Functions

Activation functions introduce **non-linearity** into the network, allowing it to learn complex patterns.
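
Without a non-linear activation, stacking layers adds no expressive power. Using weight matrices \(W_1, W_2\) and bias vectors \(\mathbf{b}_1, \mathbf{b}_2\) (notation introduced here for illustration), two purely linear layers compose into a single linear layer:

\[
W_2\left(W_1 \mathbf{x} + \mathbf{b}_1\right) + \mathbf{b}_2 = \underbrace{W_2 W_1}_{W}\,\mathbf{x} + \underbrace{W_2 \mathbf{b}_1 + \mathbf{b}_2}_{\mathbf{b}} = W\mathbf{x} + \mathbf{b}
\]

Inserting a non-linear \(f\) between the layers, as in \(W_2\, f(W_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2\), breaks this collapse.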

| Function | Formula | Range | Use Case |
|----------|---------|-------|----------|
| **Step** | \(f(x) = 1 \text{ if } x \ge 0 \text{ else } 0\) | 0 or 1 | Early perceptrons, binary classification |
| **Sigmoid** | \(f(x) = \frac{1}{1+e^{-x}}\) | 0 to 1 | Probabilities, output layer for binary classification |
| **Tanh** | \(f(x) = \tanh(x)\) | -1 to 1 | Hidden layers, zero-centered output |
| **ReLU** | \(f(x) = \max(0, x)\) | 0 to ∞ | Hidden layers, efficient training |
| **Leaky ReLU** | \(f(x) = x \text{ if } x>0 \text{ else } 0.01x\) | -∞ to ∞ | Avoids the dying ReLU problem (units stuck outputting 0 with zero gradient) |
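
A minimal plain-Python sketch of the functions in the table, each operating on a single scalar. The two-branch sigmoid is an implementation detail assumed here to keep `math.exp` from overflowing for large negative inputs; the `alpha=0.01` default mirrors the table's Leaky ReLU slope.

```python
import math

def step(x):
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is a tunable hyperparameter; 0.01 matches the table.
    return x if x > 0 else alpha * x

for v in [-2.0, 0.0, 2.0]:
    print(v, step(v), sigmoid(v), tanh(v), relu(v), leaky_relu(v))
```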

---

## 4. Loss Functions

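Loss functions measure how well a neural network's predictions match the actual target values. Training minimizes the loss, which guides the network toward weights that fit the data well. In the formulas below, \(y_i\) is the target for sample \(i\), \(\hat{y}_i\) is the network's prediction, and \(n\) is the number of samples.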

**Common Loss Functions:**

| Loss Function | Formula | Use Case |
|---------------|--------|----------|
| **Mean Squared Error (MSE)** | \( \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \) | Regression tasks |
| **Mean Absolute Error (MAE)** | \( \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \) | Regression; less sensitive to outliers than MSE |
| **Binary Cross-Entropy** | \( -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)] \) | Binary classification |
| **Categorical Cross-Entropy** | \( -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) \) | Multi-class classification (one-hot targets over \(C\) classes) |
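
These formulas translate almost line for line into plain Python. Two assumptions in this sketch: predictions are clipped by a small `eps` before taking logarithms so that \(\log(0)\) never occurs, and the categorical version averages the per-sample cross-entropy over a batch of \(n\) one-hot targets.

```python
import math

def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)   # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot vectors; y_pred: probability vectors (e.g. softmax outputs).
    total = 0.0
    for y_vec, p_vec in zip(y_true, y_pred):
        total += sum(y * math.log(max(p, eps)) for y, p in zip(y_vec, p_vec))
    return -total / len(y_true)

# One large error (10) among small ones (1, 1): squaring lets it dominate MSE.
print(mse([0.0, 0.0, 0.0], [1.0, 1.0, 10.0]))
print(mae([0.0, 0.0, 0.0], [1.0, 1.0, 10.0]))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(categorical_cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))
```

With per-sample errors of 1, 1, and 10, MSE is \((1 + 1 + 100)/3 = 34\) while MAE is \((1 + 1 + 10)/3 = 4\), which is why MAE is considered more robust to outliers.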

---

## 5. Backpropagation

**Backpropagation** is the algorithm used to **train neural networks**: it computes how much each weight contributes to the loss, so that the weights can be updated to minimize it. Each training iteration involves two main steps:

1. **Forward Pass:** Compute outputs of the network layer by layer and calculate the loss.  
2. **Backward Pass:** Propagate the error backward through the network using the **chain rule of calculus** to compute gradients of the loss with respect to each weight.  
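
For a single sigmoid neuron the two passes fit in a few lines. This sketch assumes a squared-error loss \(L = (\hat{y} - y)^2\) on one training example and made-up input values; `forward`, `backward`, and `numeric_grad_w` are hypothetical helper names. It applies the chain rule by hand and then checks each gradient against a finite-difference estimate, a common way to test a backpropagation implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b, y):
    """Forward pass: z = sum_i w_i * x_i + b, y_hat = sigmoid(z), L = (y_hat - y)^2."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    y_hat = sigmoid(z)
    return y_hat, (y_hat - y) ** 2

def backward(x, w, b, y):
    """Backward pass: chain rule dL/dw_i = dL/dy_hat * dy_hat/dz * dz/dw_i."""
    y_hat, loss = forward(x, w, b, y)
    dL_dyhat = 2.0 * (y_hat - y)
    dyhat_dz = y_hat * (1.0 - y_hat)      # derivative of the sigmoid at z
    dL_dz = dL_dyhat * dyhat_dz
    grad_w = [dL_dz * x_i for x_i in x]   # dz/dw_i = x_i
    grad_b = dL_dz                         # dz/db = 1
    return loss, grad_w, grad_b

def numeric_grad_w(x, w, b, y, i, h=1e-6):
    """Central finite-difference estimate of dL/dw_i, used only as a check."""
    w_plus = list(w)
    w_plus[i] += h
    w_minus = list(w)
    w_minus[i] -= h
    return (forward(x, w_plus, b, y)[1] - forward(x, w_minus, b, y)[1]) / (2 * h)

# Made-up example: two inputs, target y = 1.
x, w, b, y = [0.5, -1.0], [0.8, 0.2], 0.1, 1.0
loss, grad_w, grad_b = backward(x, w, b, y)
for i in range(len(w)):
    print(f"dL/dw{i}: backprop={grad_w[i]:.6f}  finite-diff={numeric_grad_w(x, w, b, y, i):.6f}")
```

In a multi-layer network the same idea repeats: each layer receives \(\partial L / \partial(\text{its output})\) from the layer above and multiplies by its own local derivatives.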

Weights are updated using **gradient descent**:

\[
w \leftarrow w - \eta \frac{\partial L}{\partial w}
\]

Where:  
- \(w\) = weight  
- \(\eta\) = learning rate  
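- \(L\) = loss function  
- \(\frac{\partial L}{\partial w}\) = gradient of the loss with respect to \(w\), computed by the backward pass  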
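
The update rule can be tried on a one-parameter toy loss. Here \(L(w) = (w - 3)^2\), the starting point, and both learning rates are illustrative assumptions. Since \(\frac{\partial L}{\partial w} = 2(w - 3)\), each step multiplies the distance to the minimum by \(1 - 2\eta\): with \(\eta = 0.1\) that factor is 0.8 and \(w\) converges to 3, while with \(\eta = 1.1\) it is \(-1.2\), so \(w\) overshoots by more on every step and diverges.

```python
def gradient_descent(grad, w0, lr, steps):
    """Repeatedly apply w <- w - lr * dL/dw."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Toy loss chosen for illustration: L(w) = (w - 3)^2, minimized at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # dL/dw

w_small = gradient_descent(grad, w0=0.0, lr=0.1, steps=50)
w_large = gradient_descent(grad, w0=0.0, lr=1.1, steps=50)
print(w_small, loss(w_small))   # learning rate small enough: converges toward 3
print(w_large, loss(w_large))   # learning rate too large: oscillates and diverges
```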

---

## Summary

Neural networks are powerful models that can approximate complex functions.  
- **Perceptrons** form the building blocks of neural networks.  
- **Activation functions** introduce non-linearity.  
- **Loss functions** quantify errors.  
- **Backpropagation** with gradient descent allows the network to learn from data.  
