```{contents}
```

## Neural Network

A **neural network (NN)** is a computational model inspired by the human brain.

* It consists of **nodes (neurons)** connected by **edges**, each carrying a learnable **weight**.
* Neural networks are used to **learn patterns** from data and make predictions or decisions.

**Key idea:** A neural network approximates a function $f(x)$ that maps inputs $x$ to outputs $y$.

---

### Structure of a Neural Network

Neural networks are organized in **layers**:

1. **Input Layer**

   * Receives raw features from the dataset.
   * Example: for house price prediction, the features might be size, number of bedrooms, zip code, and neighborhood wealth.

2. **Hidden Layers**

   * Intermediate layers that process the inputs.
   * Extract higher-level features and patterns.
   * Can be **one or many layers** (deep networks = many hidden layers).

3. **Output Layer**

   * Produces the final result: a number (regression) or class (classification).

**Inside each layer:**

* Each layer contains **neurons**.
* Neurons compute a weighted sum of inputs, apply an **activation function**, and pass the output to the next layer.

![Diagram of a single neuron](../images/single_neuron.png)

---

### Neuron Function

Each neuron performs:


$$z = \sum_{i=1}^{n} w_i x_i + b$$

$$
a = \text{activation}(z)
$$

Where:

* $x_i$ = input feature $i$
* $w_i$ = weight of that input
* $b$ = bias term
* $z$ = weighted sum
* $a$ = output after activation
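
As a minimal sketch of the two equations above, the snippet below computes $z$ and $a$ for one neuron in plain Python. The function name `neuron`, the choice of sigmoid as the activation, and the input values are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z):
    """Logistic activation, used here only as an example choice."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation):
    """Return (z, a) where z = sum_i w_i * x_i + b and a = activation(z)."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # weighted sum plus bias
    a = activation(z)                                  # output after activation
    return z, a

# Illustrative values, not taken from any dataset
x = [1.0, 2.0]     # input features
w = [0.5, -0.25]   # one weight per input
b = 0.1            # bias term

z, a = neuron(x, w, b, sigmoid)
print(z, a)
```

With these values the weighted sum is 0.5 - 0.5 + 0.1 = 0.1, and the sigmoid maps it to a value slightly above 0.5.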

---

### Activation Function

Activation functions introduce **non-linearity**, allowing networks to model complex relationships. Common functions:

| Function    | Formula                               | Use Case                                 |
| ----------- | ------------------------------------- | ---------------------------------------- |
| **ReLU**    | $\max(0, x)$                        | Hidden layers; mitigates vanishing gradients |
| **Sigmoid** | $\frac{1}{1 + e^{-x}}$              | Binary classification output; range (0, 1) |
| **Tanh**    | $\frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Zero-centered output; range (-1, 1) |
| **Softmax** | $\frac{e^{x_i}}{\sum_j e^{x_j}}$    | Multi-class classification; outputs sum to 1 |
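
The formulas in the table can be written directly in Python. This is a sketch using only the standard library; subtracting the maximum inside `softmax` is a common numerical-stability trick added here as an assumption, and it does not change the result.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Equivalent to (e^x - e^-x) / (e^x + e^-x)
    return math.tanh(x)

def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(xs)                           # shift scores to avoid overflow in exp
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))
print(sigmoid(0.0), tanh(0.0))
print(softmax([1.0, 2.0, 3.0]))
```

ReLU, sigmoid, and tanh act on one value at a time, while softmax needs the whole vector of scores because each output depends on the sum over all of them.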

---

### Forward Propagation

* Process of computing outputs from inputs.
* Each neuron calculates its weighted sum, applies activation, and passes it forward.
* Output of one layer becomes input to the next layer.
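
A forward pass through a small fully connected network might look like the sketch below. The helper names `dense_forward` and `forward`, the layer sizes, and the hand-picked weights are all assumptions chosen so the arithmetic is easy to follow by hand.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense_forward(inputs, weights, biases, activation):
    """weights[j] holds the weights of neuron j; returns one output per neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Feed x through each layer; the output of one layer is the input to the next."""
    a = x
    for weights, biases, activation in layers:
        a = dense_forward(a, weights, biases, activation)
    return a

# (weights, biases, activation) per layer; illustrative values only
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer, 2 neurons
    ([[1.0, 2.0]], [0.5], identity),                # output layer, 1 neuron
]

output = forward([2.0, 1.0], layers)
print(output)  # [4.5]
```

The hidden layer produces `[1.0, 1.5]`, and the output neuron computes 1.0 * 1.0 + 2.0 * 1.5 + 0.5 = 4.5.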

---

### Loss Function

* Measures how far the predicted output is from the actual target.
* Common loss functions:

  * **Mean Squared Error (MSE):** Regression
  * **Cross-Entropy Loss:** Classification

**Goal:** Minimize the loss during training.
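
Both losses are short to write out. The sketch below assumes that `cross_entropy` receives the index of the correct class and a list of predicted probabilities (for example, a softmax output); the small `eps` floor is an added safeguard against `log(0)`.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over paired targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_class, probs, eps=1e-12):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(max(probs[true_class], eps))

print(mse([1.0, 2.0], [1.0, 4.0]))      # (0 + 4) / 2 = 2.0
print(cross_entropy(1, [0.5, 0.5]))     # -log(0.5), about 0.693
```

A confident correct prediction gives a cross-entropy near 0, while a confident wrong prediction gives a large value.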

---

### Backpropagation

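* Algorithm that computes the **gradients** of the loss function with respect to every weight by applying the chain rule backward from the output layer. These gradients are then used to **update network weights**.
* Steps:

  1. Compute loss
  2. Calculate gradient of loss w.r.t. each weight
  3. Update weights in the **opposite direction of gradient**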

$$
w_{\text{new}} = w_{\text{old}} - \eta \frac{\partial \text{Loss}}{\partial w}
$$

* $\eta$ = learning rate
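
To make the three steps concrete, here is a sketch for a single sigmoid neuron with a squared-error loss. The chain rule gives the gradient as a product of local derivatives, and one update step follows the rule above. The specific loss, the starting values, and the learning rate are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, y):
    """Return (z, a, loss) for one neuron with loss = (a - y)^2."""
    z = w * x + b
    a = sigmoid(z)
    return z, a, (a - y) ** 2

def backward(w, b, x, y):
    """Chain rule: dLoss/dw = dLoss/da * da/dz * dz/dw."""
    _, a, _ = forward(w, b, x, y)
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)            # derivative of the sigmoid
    dloss_dw = dloss_da * da_dz * x  # dz/dw = x
    dloss_db = dloss_da * da_dz      # dz/db = 1
    return dloss_dw, dloss_db

w, b, x, y, eta = 0.5, 0.0, 1.5, 1.0, 0.5   # illustrative values

_, _, loss_before = forward(w, b, x, y)      # step 1: compute loss
dw, db = backward(w, b, x, y)                # step 2: gradients
w_new = w - eta * dw                         # step 3: move against the gradient
b_new = b - eta * db
_, _, loss_after = forward(w_new, b_new, x, y)

print(loss_before, loss_after)
```

In a multi-layer network the same product of local derivatives is extended layer by layer, reusing the intermediate results from the layers closer to the output.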

---

### Optimizers

Algorithms that adjust weights to minimize loss efficiently. Examples:

* **SGD (Stochastic Gradient Descent)**
* **Adam** (Adaptive Moment Estimation)
* **RMSProp**
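
The difference between optimizers lies in how they turn a gradient into a weight update. The sketch below contrasts a plain gradient descent step with an Adam step on the toy function $(w - 3)^2$. The toy function, the `state` dictionary, and the hyperparameter values are assumptions for illustration; with a single fixed objective there is no mini-batch sampling, so the "SGD" step here is simply gradient descent.

```python
import math

def grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd_step(w, g, lr=0.1):
    return w - lr * g

def adam_step(w, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; state keeps the step count and both moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g        # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g    # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])           # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

w_sgd, w_adam = 0.0, 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(500):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_adam = adam_step(w_adam, grad(w_adam), state)

print(w_sgd, w_adam)
```

Both runs should end close to the minimum at 3. Adam scales each step by a running estimate of the gradient magnitude, which is why it is often less sensitive to the choice of learning rate.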

---

### Training Process

1. Initialize weights randomly.
2. Feed inputs through the network (**forward propagation**).
3. Calculate loss.
4. Propagate error backward (**backpropagation**).
5. Update weights using optimizer.
6. Repeat for many epochs until convergence.
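
The six steps can be sketched as a complete loop. To keep it short, the "network" below is a single linear neuron fitted to noise-free data from $y = 2x + 1$. The data, the fixed seed, the learning rate, and the epoch count are assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy dataset for y = 2x + 1, with x from -1.0 to 1.0
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]

# 1. Initialize weights randomly
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)
eta = 0.1  # learning rate

for epoch in range(200):                 # 6. repeat for many epochs
    random.shuffle(data)
    for x, y in data:
        y_hat = w * x + b                # 2. forward propagation
        loss = (y_hat - y) ** 2          # 3. calculate loss
        dw = 2.0 * (y_hat - y) * x       # 4. gradients via the chain rule
        db = 2.0 * (y_hat - y)
        w -= eta * dw                    # 5. update weights (plain SGD)
        b -= eta * db

print(w, b)
```

After training, `w` and `b` should be close to 2 and 1. A real network replaces the two hand-written gradient lines with backpropagation through all layers, and the update lines with the chosen optimizer.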

---

### Key Concepts

* **Overfitting:** Network memorizes training data but fails on new data.
* **Regularization Techniques:** Dropout, L1/L2 regularization, early stopping.
* **Deep Networks:** Many hidden layers can learn very complex patterns.
* **Densely Connected Layers:** Every neuron in one layer connects to every neuron in the next layer.
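
Two of the regularization techniques named above are small enough to sketch. The version of dropout shown is the common "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at inference time. The function names and parameters are assumptions for illustration.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)         # no-op at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l2_penalty(weights, lam):
    """L2 regularization term lam * sum(w^2), added to the loss."""
    return lam * sum(w * w for w in weights)

rng = random.Random(0)
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng, training=False))
print(l2_penalty([1.0, -2.0], lam=0.1))
```

Dropout discourages neurons from relying on any single input, and the L2 penalty discourages large weights. Both are ways to reduce overfitting.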

---

### Applications of Neural Networks

* Image recognition (CNNs)
* Speech recognition (RNNs, LSTMs)
* Natural language processing (Transformers, GPT models)
* Autonomous vehicles
* Recommendation systems

---

**Summary:**
A neural network is a system of interconnected neurons that learns to map inputs to outputs. It uses **activation functions**, **forward propagation**, **loss functions**, and **backpropagation** to iteratively improve predictions. Deep networks with multiple hidden layers can model highly complex functions, which is the essence of deep learning.
