```{contents}
```

# Neural Network

A **neural network (NN)** is a computational model inspired by the human brain.

* It consists of **nodes (neurons)** connected by **edges (weights)**.
* Neural networks are used to **learn patterns** from data and make predictions or decisions.

**Key idea:** A neural network approximates a function $f(x)$ that maps inputs $x$ to outputs $y$.

---

## Structure of a Neural Network

Neural networks are organized in **layers**:

1. **Input Layer**

   * Receives raw features from the dataset.
   * Example: In house price prediction: size, bedrooms, zip code, neighborhood wealth.

2. **Hidden Layers**

   * Intermediate layers that process the inputs.
   * Extract higher-level features and patterns.
   * Can be **one or many layers** (deep networks = many hidden layers).

3. **Output Layer**

   * Produces the final result: a number (regression) or class (classification).

**Notation:**

* Each layer contains **neurons**.
* Neurons compute a weighted sum of inputs, apply an **activation function**, and pass the output to the next layer.

![Diagram of a single neuron](../images/single_neuron.png)

---

## Neuron Function

Each neuron performs:


$$z = \sum_{i=1}^{n} w_i x_i + b$$

$$
a = \text{activation}(z)
$$

Where:

* $x_i$ = input feature
* $w_i$ = weight of that input
* $b$ = bias term
* $z$ = weighted sum
* $a$ = output after activation
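
The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `neuron` and the input values are made up for the example, and ReLU is just one possible choice of activation.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0.0, z)

def neuron(x, w, b, activation):
    """Compute a = activation(z) with z = sum_i w_i * x_i + b."""
    z = np.dot(w, x) + b  # weighted sum of the inputs plus bias
    return activation(z)

# Illustrative values only (not taken from any dataset)
x = np.array([1.0, 2.0, 3.0])     # input features
w = np.array([0.5, -0.25, 0.1])   # one weight per input
b = 0.1                           # bias term

a = neuron(x, w, b, relu)
print(a)  # z is approximately 0.4, which is positive, so ReLU passes it through
```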

---

## Activation Function

Activation functions introduce **non-linearity**, allowing networks to model complex relationships. Common functions:

| Function    | Formula                               | Use Case                                 |
| ----------- | ------------------------------------- | ---------------------------------------- |
| **ReLU**    | $\max(0, x)$                        | Hidden layers; mitigates vanishing gradients |
| **Sigmoid** | $\frac{1}{1 + e^{-x}}$              | Outputs probability (0–1)                |
| **Tanh**    | $\frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Outputs range -1 to 1                    |
| **Softmax** | $\frac{e^{x_i}}{\sum_j e^{x_j}}$    | Multi-class classification               |
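
The four functions in the table can be written directly in NumPy. This is a sketch for illustration; the max-subtraction inside `softmax` is a common numerical-stability trick that is not part of the formula in the table, and it does not change the result.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Equivalent to (e^x - e^-x) / (e^x + e^-x)
    return np.tanh(x)

def softmax(x):
    # Subtracting the max avoids overflow in exp and leaves the output unchanged
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))     # negative inputs are clipped to 0
print(sigmoid(x))  # every value lies strictly between 0 and 1
print(tanh(x))     # every value lies strictly between -1 and 1
print(softmax(x))  # non-negative values that sum to 1
```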

---

## Forward Propagation

* Process of computing outputs from inputs.
* Each neuron calculates its weighted sum, applies activation, and passes it forward.
* Output of one layer becomes input to the next layer.
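
A forward pass through a small fully connected network might look like the sketch below. The layer sizes, the random weights, and the choice of ReLU for hidden layers with an identity output are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Run x through a list of (W, b) pairs.

    ReLU is applied after every layer except the last,
    which is left linear (as for a regression output).
    """
    a = x
    for i, (W, b) in enumerate(layers):
        z = W @ a + b                     # weighted sum for the whole layer
        is_last = i == len(layers) - 1
        a = z if is_last else relu(z)     # output becomes the next layer's input
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # 3 inputs -> 4 hidden neurons
    (rng.normal(size=(1, 4)), np.zeros(1)),  # 4 hidden neurons -> 1 output
]

x = np.array([0.5, -1.0, 2.0])
y_hat = forward(x, layers)
print(y_hat.shape)  # (1,)
```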

---

## Loss Function

* Measures how far the predicted output is from the actual target.
* Common loss functions:

  * **Mean Squared Error (MSE):** Regression
  * **Cross-Entropy Loss:** Classification

**Goal:** Minimize the loss during training.
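
Both losses are short to write out. The sketch below assumes one-hot encoded class labels and predicted probabilities for cross-entropy; the clipping constant `eps` is an added safeguard against `log(0)`, and the sample values are invented.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Average cross-entropy over a batch of one-hot labels."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

# Regression: each prediction is off by 0.5, so MSE = 0.25
y_true = np.array([3.0, 5.0])
y_pred = np.array([2.5, 5.5])
reg_loss = mse(y_true, y_pred)

# Classification: two samples, three classes
labels = np.array([[1, 0, 0], [0, 1, 0]])
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
clf_loss = cross_entropy(labels, probs)  # -(ln 0.7 + ln 0.8) / 2

print(reg_loss, clf_loss)
```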

---

## Backpropagation

* Method to compute the **gradients** of the loss function with respect to every weight; these gradients are then used to **update network weights**.
* Steps:

  1. Compute loss
  2. Calculate gradient of loss w.r.t each weight
  3. Update weights in the **opposite direction of gradient**

$$
w_{\text{new}} = w_{\text{old}} - \eta \frac{\partial \text{Loss}}{\partial w}
$$

* $\eta$ = learning rate
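
To see the update rule in action, consider a toy model with a single weight and the loss $(w x - y)^2$. The data point, starting weight, and learning rate below are assumptions chosen so that the optimum is $w = 3$.

```python
def loss(w, x, y):
    return (w * x - y) ** 2

def grad(w, x, y):
    # d/dw (w*x - y)^2 = 2 * (w*x - y) * x
    return 2.0 * (w * x - y) * x

x, y = 2.0, 6.0   # one training example; the loss is zero at w = 3
w = 0.0           # initial weight
eta = 0.1         # learning rate

history = []
for step in range(20):
    w = w - eta * grad(w, x, y)   # w_new = w_old - eta * dLoss/dw
    history.append(loss(w, x, y))

print(w)  # approaches 3 as the loss shrinks
```

With these numbers each step reduces the distance to the optimum by a factor of five. A much larger learning rate would overshoot and diverge, which is why $\eta$ needs tuning.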

---

## Optimizers

Algorithms that adjust weights to minimize loss efficiently. Examples:

* **SGD (Stochastic Gradient Descent)**
* **Adam** (Adaptive Moment Estimation)
* **RMSProp**

---

## Training Process

1. Initialize weights randomly.
2. Feed inputs through the network (**forward propagation**).
3. Calculate loss.
4. Propagate error backward (**backpropagation**).
5. Update weights using optimizer.
6. Repeat for many epochs until convergence.
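
The six steps can be put together in a small NumPy training loop. This is a sketch under stated assumptions: the XOR toy dataset, the network size (8 tanh hidden units, one sigmoid output), binary cross-entropy as the loss, plain gradient descent as the optimizer, and the hyperparameters are all choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset (XOR): not linearly separable, so a hidden layer is needed
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 1. Initialize weights randomly
n_hidden = 8
W1 = rng.normal(size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

eta = 0.1      # learning rate
losses = []

for epoch in range(5000):            # 6. repeat for many epochs
    # 2. Forward propagation
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # 3. Loss (binary cross-entropy; small constant avoids log(0))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    losses.append(loss)

    # 4. Backpropagation (chain rule, layer by layer)
    n = X.shape[0]
    dz2 = (p - y) / n                # gradient at the output pre-activation
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T
    dz1 = dh * (1.0 - h ** 2)        # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # 5. Update weights (plain gradient descent)
    W1 -= eta * dW1
    b1 -= eta * db1
    W2 -= eta * dW2
    b2 -= eta * db2

print(losses[0], losses[-1])  # the loss at the end should be lower than at the start
```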

---

## Key Concepts

* **Overfitting:** Network memorizes training data but fails on new data.
* **Regularization Techniques:** Dropout, L1/L2 regularization, early stopping.
* **Deep Networks:** Many hidden layers can learn very complex patterns.
* **Densely Connected Layers:** Every neuron in one layer connects to every neuron in the next layer.
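
As one concrete example of regularization, an L2 penalty adds a term proportional to the squared weights to the loss. The sketch below uses the convention `lam * sum(W**2)`; some libraries use `lam / 2` instead, and the helper names and values are illustrative.

```python
import numpy as np

def l2_penalty(weights, lam):
    """Extra loss term: lam * sum of squared weights over all layers."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def l2_gradient(W, lam):
    """Gradient of the penalty with respect to W; added to the loss gradient."""
    return 2.0 * lam * W

W = np.array([[1.0, -2.0], [0.5, 0.0]])
lam = 0.01

penalty = l2_penalty([W], lam)   # 0.01 * (1 + 4 + 0.25) = 0.0525
print(penalty)
print(l2_gradient(W, lam))       # pulls every weight toward zero
```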

---

## Applications of Neural Networks

* Image recognition (CNNs)
* Speech recognition (RNNs, LSTMs)
* Natural language processing (Transformers, GPT models)
* Autonomous vehicles
* Recommendation systems

---
**Summary:**
A neural network is a system of interconnected neurons that learns to map inputs to outputs. It uses **activation functions**, **forward propagation**, **loss functions**, and **backpropagation** to iteratively improve predictions. Deep networks with multiple hidden layers can model highly complex functions, which is the essence of deep learning.


## Artificial Neural Network (ANN)

An ANN is a **computational model inspired by the human brain**.

* It consists of **neurons (nodes)** arranged in layers.
* These neurons are **connected by weights**, which adjust during learning.
* ANNs are designed to **learn complex relationships** from data, both linear and nonlinear.

**Analogy:**

* Input features → sensory neurons in the brain.
* Hidden layers → processing neurons in the brain that extract patterns.
* Output → the brain’s decision or response.

---

### Intuition 

The **core idea**:

* A neural network **combines inputs in weighted ways** to compute an output.
* The network **learns the best weights** to approximate a target function.

**Example:** Predicting house prices.

* Inputs: size, location, age.
* Neurons combine these inputs in different ways (weighted sum + bias).
* Activation function transforms them → captures nonlinear patterns (e.g., big house in a bad neighborhood might cost less).
* Output: predicted price.

**Key intuition points:**

1. Each neuron is a **function approximator**.
2. Multiple neurons → can model **complex functions**.
3. Layers allow **hierarchical feature extraction**:

   * Early layers → basic features (edges in images).
   * Deeper layers → complex features (objects in images).

---

### Components of an ANN

#### Neurons

* Input: $x_1, x_2, \dots, x_n$
* Weights: $w_1, w_2, \dots, w_n$
* Bias: $b$
* Output: $y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$

**Intuition:**

* Weights = importance of each input.
* Bias = shifts the decision boundary.
* Activation function = introduces **non-linearity**.

---

#### **Layers**

* **Input layer**: raw features.
* **Hidden layers**: learn **intermediate patterns**.
* **Output layer**: final prediction.

**Intuition:**

* Without hidden layers → linear models.
* Hidden layers → network can model **nonlinear relationships**.

---

### **Activation Functions**

* Transform neuron output.
* **Why needed?** Without them, multiple layers collapse into a single linear layer.

Common functions:

1. **Sigmoid** → maps output to (0,1), used in probabilities.
2. **Tanh** → maps output to (-1,1), zero-centered.
3. **ReLU** → outputs max(0, x), mitigates the vanishing gradient problem.
4. **Softmax** → converts outputs into probabilities for multi-class classification.

**Intuition:**

* Activation decides **which neurons “fire”**.
* Like brain neurons: only active neurons contribute.
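
The claim that layers without activations collapse into a single linear layer can be checked numerically: $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$. The shapes and random values in this sketch are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked layers with no activation in between
two_layers = W2 @ (W1 @ x + b1) + b2

# One equivalent linear layer
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
one_layer = W_eq @ x + b_eq

same_without_activation = np.allclose(two_layers, one_layer)
print(same_without_activation)  # True

# With a ReLU between the layers, the single-matrix shortcut no longer applies in general
with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print(with_relu, one_layer)
```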

---

### **Forward Propagation**

* **Goal:** compute output from inputs.
* Each neuron → weighted sum → activation → next layer.

**Intuition:**

* Signals flow like **electrical signals in brain**.
* Each layer extracts increasingly complex features.

---

### **Loss Function**

* Measures **error between predicted and true output**.
* Examples:

  * Regression → MSE
  * Classification → Cross-Entropy

**Intuition:**

* Loss tells the network **how wrong it is**.
* Guides learning through gradients.

---

### **Backpropagation**

* Computes **gradients of loss w.r.t weights** using chain rule.
* Updates weights to **reduce loss** (learning).

**Intuition:**

* Like adjusting knobs to minimize error.
* Every layer, from the output back to the first hidden layer, receives **gradient feedback** to improve pattern detection.
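
A common way to build trust in a backpropagation implementation is to compare it with a finite-difference estimate. The sketch below assumes a tiny network (one tanh hidden layer, a linear scalar output) and the loss $\tfrac{1}{2}(\hat{y} - y)^2$ on a single sample; all names and values are illustrative.

```python
import numpy as np

def forward(params, x):
    W1, b1, w2, b2 = params
    h = np.tanh(W1 @ x + b1)   # hidden activations
    y_hat = w2 @ h + b2        # scalar output
    return h, y_hat

def loss_fn(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2

def backprop(params, x, y):
    """Apply the chain rule from the output back to the first layer."""
    W1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    dy = y_hat - y                 # dL/dy_hat
    dw2 = dy * h                   # dL/dw2
    db2 = dy                       # dL/db2
    dh = dy * w2                   # dL/dh
    dz1 = dh * (1.0 - h ** 2)      # through tanh: tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(dz1, x)         # dL/dW1
    db1 = dz1                      # dL/db1
    return dW1, db1, dw2, db2

rng = np.random.default_rng(1)
params = [rng.normal(size=(3, 2)), rng.normal(size=3), rng.normal(size=3), 0.5]
x, y = np.array([0.3, -0.7]), 1.2

dW1, db1, dw2, db2 = backprop(params, x, y)

# Central finite difference on a single weight, W1[0, 0]
eps = 1e-6
plus = [params[0].copy(), params[1], params[2], params[3]]
minus = [params[0].copy(), params[1], params[2], params[3]]
plus[0][0, 0] += eps
minus[0][0, 0] -= eps
numeric = (loss_fn(plus, x, y) - loss_fn(minus, x, y)) / (2 * eps)

print(dW1[0, 0], numeric)  # the two values should agree closely
```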

---

### **Optimizers**

* Algorithms to adjust weights efficiently.
* Examples: SGD, Momentum, RMSProp, Adam.

**Intuition:**

* Optimizer = **strategy to climb down the error hill** toward minimum.
* Adam → combines momentum (smoother path) + adaptive learning rate.
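
The "momentum + adaptive learning rate" description of Adam corresponds to two running averages: one of the gradient and one of the squared gradient. The sketch below follows the commonly published update with its usual default constants; the function name `adam_step` and the quadratic test problem are assumptions for the example.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w, m, v

# Minimize f(w) = sum(w^2), whose gradient is 2w
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

for t in range(1, 501):
    grad = 2.0 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(w)  # moves toward the minimum at [0, 0]
```

On the very first step the update is roughly `lr * sign(grad)` for each parameter, which shows how Adam normalizes away the raw gradient scale.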

---

### **Regularization**

* Prevents overfitting.
* Techniques: dropout, L1/L2 penalties, early stopping.

**Intuition:**

* Forces network to **generalize**, not memorize training data.
* Dropout = randomly deactivate neurons → network learns **robust features**.
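
Dropout can be sketched as a random mask applied to a layer's activations. The version below is the "inverted" variant, an assumption for this example: surviving activations are scaled up during training so that nothing needs to change at inference time.

```python
import numpy as np

def dropout(a, p_drop, rng, training=True):
    """Inverted dropout: zero each activation with probability p_drop."""
    if not training or p_drop == 0.0:
        return a                        # no-op at inference time
    keep = 1.0 - p_drop
    mask = rng.random(a.shape) < keep   # True where the neuron stays active
    return a * mask / keep              # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
a = np.ones(10000)                      # illustrative activations

out = dropout(a, 0.5, rng)
print(np.unique(out))   # only 0.0 (dropped) and 2.0 (kept and rescaled)
print(out.mean())       # close to 1.0 on average
```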

---

### **Why ANNs are powerful**

* Can approximate **any continuous function** on a bounded input domain to arbitrary accuracy, given enough hidden units (Universal Approximation Theorem).
* Handle **high-dimensional data** (images, text, speech).
* Learn **hierarchical features** automatically.

**Intuition:**

* Instead of manually engineering features, ANN **learns features by itself**.

---

### **Training Process Summary**

1. Initialize weights & biases.
2. Forward pass → compute outputs.
3. Compute loss.
4. Backpropagation → compute gradients.
5. Update weights via optimizer.
6. Repeat for multiple epochs until convergence.

---

### **Key Problems in ANN**

* **Vanishing gradients** → small gradients → slow learning (sigmoid/tanh).
* **Exploding gradients** → large gradients → unstable learning.
* **Overfitting** → network memorizes training data → poor generalization.

**Intuition:**

* Proper weight initialization, activation choice, and regularization help mitigate these problems.
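
The vanishing-gradient problem with sigmoid can be made concrete with a little arithmetic. The derivative of the sigmoid is at most 0.25, so the chain-rule product across many layers shrinks quickly. This sketch deliberately ignores the weight matrices, which also enter the product, so the numbers are an upper bound on the activation factor only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # peaks at z = 0 with value 0.25

# Largest possible product of sigmoid derivatives across `depth` layers
depths = [1, 5, 10, 20]
upper_bounds = {d: 0.25 ** d for d in depths}

for d, bound in upper_bounds.items():
    print(d, bound)   # shrinks geometrically as depth grows
```

By contrast, the derivative of ReLU is exactly 1 for positive inputs, so this particular shrinking factor does not appear on active paths.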

