```{contents}
```

## Artificial Neural Network (ANN)

An ANN is a **computational model inspired by the human brain**.

* It consists of **neurons (nodes)** arranged in layers.
* These neurons are **connected by weights**, which adjust during learning.
* ANNs are designed to **learn complex relationships** from data, both linear and nonlinear.

**Analogy:**

* Input features → sensory neurons in brain.
* Hidden layers → processing neurons in brain that extract patterns.
* Output → brain’s decision or response.

---

### Intuition 

The **core idea**:

* A neural network **combines inputs in weighted ways** to compute an output.
* The network **learns the best weights** to approximate a target function.

**Example:** Predicting house prices.

* Inputs: size, location, age.
* Neurons combine these inputs in different ways (weighted sum + bias).
* Activation function transforms them → captures nonlinear patterns (e.g., big house in a bad neighborhood might cost less).
* Output: predicted price.

**Key intuition points:**

1. Each neuron computes a **simple function** of its inputs (weighted sum + activation).
2. Multiple neurons → can model **complex functions**.
3. Layers allow **hierarchical feature extraction**:

   * Early layers → basic features (edges in images).
   * Deeper layers → complex features (objects in images).

---

### Components of an ANN

#### Neurons

* Input: $x_1, x_2, \dots, x_n$
* Weights: $w_1, w_2, \dots, w_n$
* Bias: $b$
* Output: $y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$, where $f$ is the activation function

**Intuition:**

* Weights = importance of each input.
* Bias = shifts the decision boundary.
* Activation function = introduces **non-linearity**.
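
The neuron computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library API: the function names `sigmoid` and `neuron` and the input values are assumptions chosen so the weighted sum comes out to zero.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = np.dot(w, x) + b
    return activation(z)

# Illustrative values (not from the text)
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.0])
b = 0.0

y = neuron(x, w, b)  # z = 0.5 - 0.5 + 0.0 = 0, so y = sigmoid(0) = 0.5
```

Here the third weight is zero, so the third input has no influence on the output, which matches the "weights = importance" intuition.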

---

#### **Layers**

* **Input layer**: raw features.
* **Hidden layers**: learn **intermediate patterns**.
* **Output layer**: final prediction.

**Intuition:**

* Without hidden layers → linear models.
* Hidden layers → network can model **nonlinear relationships**.
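
One caveat worth checking numerically: hidden layers only add expressive power when a nonlinear activation sits between them. The sketch below, with randomly chosen illustrative weights, shows that two stacked layers without an activation are equivalent to one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with arbitrary (illustrative) weights and biases
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Stack the two layers with no activation in between
two_layer = W2 @ (W1 @ x + b1) + b2

# Fold them into a single linear layer: W = W2 W1, b = W2 b1 + b2
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

same = np.allclose(two_layer, one_layer)  # the two computations agree
```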

---

### **Activation Functions**

* Transform neuron output.
* **Why needed?** Without them, multiple layers collapse into a single linear layer.

Common functions:

1. **Sigmoid** → maps output to (0,1), used in probabilities.
2. **Tanh** → maps output to (-1,1), zero-centered.
3. **ReLU** → outputs max(0, x); mitigates (but does not eliminate) the vanishing gradient problem because its gradient is 1 for positive inputs.
4. **Softmax** → converts outputs into probabilities for multi-class classification.

**Intuition:**

* Activation decides **which neurons “fire”**.
* Like brain neurons: only active neurons contribute.
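
The four functions listed above are short enough to write out directly. The following is a minimal NumPy sketch; the max-subtraction in `softmax` is a common numerical-stability trick added here, not something the text prescribes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtracting the max does not change the result but avoids overflow
    shifted = z - np.max(z)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

z = np.array([-2.0, 0.0, 3.0])  # illustrative pre-activations

s = sigmoid(z)   # every value lies strictly between 0 and 1
t = tanh(z)      # every value lies strictly between -1 and 1
r = relu(z)      # negative entries become 0: [0, 0, 3]
p = softmax(z)   # non-negative values that sum to 1
```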

---

### **Forward Propagation**

* **Goal:** compute output from inputs.
* Each neuron → weighted sum → activation → next layer.

**Intuition:**

* Signals flow like **electrical signals in brain**.
* Each layer extracts increasingly complex features.
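
Forward propagation can be sketched as a loop over layers. The representation of a layer as a `(W, b, activation)` tuple and the specific weights are assumptions made for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def identity(z):
    return z

def forward(x, layers):
    """Pass x through each layer: weighted sum, then activation."""
    a = x
    for W, b, activation in layers:
        a = activation(W @ a + b)
    return a

# A tiny 2 -> 2 -> 1 network with hand-picked illustrative weights
layers = [
    (np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, -1.0]), relu),
    (np.array([[2.0, 1.0]]), np.array([0.5]), identity),
]

x = np.array([3.0, 1.0])
y = forward(x, layers)
# Hidden layer: relu([3 - 1, 1.5 + 0.5 - 1]) = [2, 1]
# Output layer: 2*2 + 1*1 + 0.5 = 5.5
```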

---

### **Loss Function**

* Measures **error between predicted and true output**.
* Examples:

  * Regression → MSE
  * Classification → Cross-Entropy

**Intuition:**

* Loss tells the network **how wrong it is**.
* Guides learning through gradients.
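
Both example losses fit in a few lines. This is a sketch assuming one-hot labels and already-normalized class probabilities for cross-entropy; the `eps` clipping is an added safeguard against `log(0)`.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Average negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

# Regression: one prediction is off by 2, so MSE = (0 + 0 + 4) / 3
mse_val = mse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))

# Classification: two samples, two classes
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
uniform = np.array([[0.5, 0.5], [0.5, 0.5]])    # a pure guess
confident = np.array([[0.9, 0.1], [0.2, 0.8]])  # mostly right

ce_uniform = cross_entropy(labels, uniform)      # equals ln(2)
ce_confident = cross_entropy(labels, confident)  # lower, since predictions are better
```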

---

### **Backpropagation**

* Computes **gradients of the loss w.r.t. the weights** using the chain rule, working backward from the output.
* The optimizer then uses these gradients to update the weights and **reduce loss** (learning).

**Intuition:**

* Like adjusting knobs to minimize error.
* Every layer, from the output back to the earliest hidden layer, receives **gradient feedback** to improve pattern detection.
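
To make the chain rule concrete, here is a sketch of a manual backward pass for a tiny network (one tanh hidden layer, scalar output, squared-error loss). The architecture, variable names, and random values are assumptions for illustration; the finite-difference check at the end is a common way to validate hand-written gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # one input sample
y = 1.0                          # its target
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)
w2 = rng.normal(size=4);      b2 = 0.0

def loss_fn(W1, b1, w2, b2):
    h = np.tanh(W1 @ x + b1)
    y_hat = w2 @ h + b2
    return 0.5 * (y_hat - y) ** 2

# Forward pass, keeping intermediates for the backward pass
z1 = W1 @ x + b1
h = np.tanh(z1)
y_hat = w2 @ h + b2

# Backward pass: apply the chain rule from the loss back to W1
d_yhat = y_hat - y               # dL/dy_hat
d_w2 = d_yhat * h                # dL/dw2
d_b2 = d_yhat                    # dL/db2
d_h = d_yhat * w2                # dL/dh
d_z1 = d_h * (1.0 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
d_W1 = np.outer(d_z1, x)         # dL/dW1
d_b1 = d_z1                      # dL/db1

# Central finite difference on a single weight as a sanity check
eps = 1e-6
W1_plus = W1.copy();  W1_plus[0, 0] += eps
W1_minus = W1.copy(); W1_minus[0, 0] -= eps
numeric = (loss_fn(W1_plus, b1, w2, b2) - loss_fn(W1_minus, b1, w2, b2)) / (2 * eps)
```

If the analytic and numeric values disagree noticeably, the backward pass has a bug.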

---

### **Optimizers**

* Algorithms to adjust weights efficiently.
* Examples: SGD, Momentum, RMSProp, Adam.

**Intuition:**

* Optimizer = **strategy for descending the error hill** toward a minimum.
* Adam → combines momentum (smoother path) + adaptive learning rate.
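
The update rules differ only in how they turn a gradient into a step. The sketch below applies SGD, momentum, and Adam to a toy one-dimensional loss $f(w) = (w - 3)^2$. The toy loss and the hyperparameters are illustrative assumptions; the Adam defaults for `beta1`, `beta2`, and `eps` follow commonly used values.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)

def sgd(w, steps=500, lr=0.1):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def momentum(w, steps=500, lr=0.1, beta=0.9):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)   # running accumulation of past gradients
        w = w - lr * v
    return w

def adam(w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # momentum-like first moment
        v = beta2 * v + (1 - beta2) * g * g    # second moment for adaptive scaling
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_sgd, w_mom, w_adam = sgd(0.0), momentum(0.0), adam(0.0)
```

All three should end near `w = 3` on this problem; how quickly and smoothly they get there depends on the hyperparameters.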

---

### **Regularization**

* Prevents overfitting.
* Techniques: dropout, L1/L2 penalties, early stopping.

**Intuition:**

* Forces network to **generalize**, not memorize training data.
* Dropout = randomly deactivate neurons → network learns **robust features**.
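
Two of these techniques are easy to sketch. The version of dropout below is the "inverted" variant, which rescales the surviving activations during training so nothing needs to change at inference time; that choice, and the function names, are assumptions for this example.

```python
import numpy as np

def dropout(a, p_drop, rng, training=True):
    """Zero out a random fraction p_drop of activations, rescale the rest."""
    if not training or p_drop == 0.0:
        return a
    mask = rng.random(a.shape) >= p_drop   # True where the unit is kept
    return a * mask / (1.0 - p_drop)

def l2_penalty(weights, lam):
    """Term added to the loss: lam times the sum of squared weights."""
    return lam * sum(np.sum(W ** 2) for W in weights)

rng = np.random.default_rng(0)
a = np.ones(10_000)

dropped = dropout(a, p_drop=0.5, rng=rng)                      # values are 0 or 2
at_inference = dropout(a, p_drop=0.5, rng=rng, training=False)  # unchanged

penalty = l2_penalty([np.array([[1.0, 2.0]])], lam=0.1)  # 0.1 * (1 + 4) = 0.5
```

Because of the rescaling, the mean activation stays close to its original value even though about half the units are zeroed.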

---

###  Why ANNs are powerful

* Can approximate **any continuous function** on a compact (closed, bounded) domain to arbitrary accuracy, given enough hidden units (Universal Approximation Theorem).
* Handle **high-dimensional data** (images, text, speech).
* Learn **hierarchical features** automatically.

**Intuition:**

* Instead of manually engineering features, ANN **learns features by itself**.

---

### Training Process Summary

1. Initialize weights & biases.
2. Forward pass → compute outputs.
3. Compute loss.
4. Backpropagation → compute gradients.
5. Update weights via optimizer.
6. Repeat for multiple epochs until convergence.
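
The six steps map directly onto a short training loop. This is a sketch under stated assumptions: a 2-4-1 network with tanh hidden units, mean squared error, plain full-batch gradient descent, and the XOR truth table as toy data. None of these choices come from the text, and a fixed epoch count stands in for a real convergence check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR truth table
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# 1. Initialize weights and biases
W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)
lr = 0.05
losses = []

for epoch in range(3000):            # 6. Repeat for multiple epochs
    # 2. Forward pass
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2

    # 3. Compute loss (mean squared error)
    err = y_hat - y
    losses.append(np.mean(err ** 2))

    # 4. Backpropagation
    d_yhat = 2.0 * err / len(X)
    d_W2 = h.T @ d_yhat
    d_b2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T
    d_z1 = d_h * (1.0 - h ** 2)      # derivative of tanh
    d_W1 = X.T @ d_z1
    d_b1 = d_z1.sum(axis=0)

    # 5. Update weights (plain gradient descent)
    W1 -= lr * d_W1; b1 -= lr * d_b1
    W2 -= lr * d_W2; b2 -= lr * d_b2
```

Plotting `losses` against the epoch index is a quick way to see whether training is making progress.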

---

### Key Problems in ANNs

* **Vanishing gradients** → gradients shrink as they propagate backward through many layers → early layers learn slowly (common with sigmoid/tanh).
* **Exploding gradients** → large gradients → unstable learning.
* **Overfitting** → network memorizes training data → poor generalization.

**Intuition:**

* Proper weight initialization, activation choice, gradient clipping, and regularization mitigate these problems.
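
A rough numerical picture of the first two problems: the sigmoid derivative never exceeds 0.25, so a chain of sigmoid layers multiplies the gradient by at most 0.25 per layer (ignoring the weight terms, which is a simplifying assumption here). For exploding gradients, clipping by norm is a common remedy; it is included as an illustration and is not the only option.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# The derivative peaks at z = 0, where it equals 0.25
peak = sigmoid_grad(0.0)

# Upper bound on the activation-derivative product across d sigmoid layers
bound = {d: 0.25 ** d for d in (1, 5, 10, 20)}

def clip_by_norm(g, max_norm):
    """Rescale g so its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(g)
    if norm <= max_norm:
        return g
    return g * (max_norm / norm)

clipped = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)  # norm 5 -> norm 1
```

At depth 10 the bound is already below one in a million, which is why early layers in deep sigmoid networks can learn very slowly.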


### Types of Neural Network

#### Feedforward Neural Network (FNN)

* **Structure:** Data moves only forward — input → hidden → output layers.
* **No feedback or memory.**
* **Use:** Basic tasks like image or text classification, regression, tabular data.
* **Example:** Predicting house prices, digit recognition (MNIST).

---

#### Convolutional Neural Network (CNN)

* **Structure:** Uses convolution layers to extract spatial patterns from data.
* **Key idea:** Detects features like edges → shapes → objects.
* **Use:** Image recognition, video analysis, medical imaging.
* **Example:** Face detection, self-driving car vision.
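
The edge-detection idea can be sketched with a hand-written 2D convolution. The kernel here is fixed by hand for illustration; in a real CNN the kernel values are learned. Like most deep learning libraries, this sketch computes cross-correlation (no kernel flip) with no padding.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Illustrative image: dark left half, bright right half
image = np.zeros((4, 6))
image[:, 3:] = 1.0

# Horizontal difference kernel: responds where brightness changes left to right
kernel = np.array([[-1.0, 1.0]])

edges = conv2d(image, kernel)  # nonzero only in the column where dark meets bright
```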

---

#### Recurrent Neural Network (RNN)

* **Structure:** Loops within layers, allowing information from previous steps to influence the current output.
* **Purpose:** Handles sequential data.
* **Use:** Time-series forecasting, language modeling, speech recognition.
* **Variants:**

  * **LSTM (Long Short-Term Memory):** Handles long dependencies.
  * **GRU (Gated Recurrent Unit):** Simplified version of LSTM.
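
The "loop" in a plain RNN is a hidden state that is fed back in at every time step. Below is a minimal sketch of that recurrence with a tanh activation; the weight names (`W_xh`, `W_hh`, `b_h`) and sizes are assumptions, and the gating used by LSTM and GRU is not shown.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a plain RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state
    states = []
    for x_t in xs:
        # New state depends on the current input AND the previous state
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))             # 5 time steps, 3 features each
W_xh = rng.normal(scale=0.5, size=(4, 3))
W_hh = rng.normal(scale=0.5, size=(4, 4))
b_h = np.zeros(4)

states = rnn_forward(xs, W_xh, W_hh, b_h)  # shape (5, 4)
```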

---

#### Autoencoder

* **Structure:** Encoder compresses input → Decoder reconstructs it.
* **Goal:** Learn efficient data representation (latent space).
* **Use:** Data compression, denoising, anomaly detection, feature extraction.

---

#### Generative Adversarial Network (GAN)

* **Structure:** Two networks —

  * **Generator:** creates fake data.
  * **Discriminator:** detects if data is real or fake.
* **Use:** Image generation, data augmentation, deepfakes, art creation.

---

#### Transformer

* **Structure:** Based on *self-attention* — each token (word, image patch, etc.) attends to others.
* **No recurrence.**
* **Use:** NLP (translation, summarization; e.g., GPT, BERT), vision (ViT), multimodal models (CLIP).
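
Self-attention can be sketched as three projections followed by a softmax-weighted average. This is a single-head version of scaled dot-product attention with random illustrative weights; multi-head attention, masking, and positional encodings are left out.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - np.max(z, axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Each token builds its output as a weighted mix of all tokens' values."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # token-to-token similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 tokens, embedding size 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
```

Row `i` of `weights` shows how much token `i` attends to every token in the sequence, including itself.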

---

#### Radial Basis Function Network (RBFN)

* **Structure:** Hidden layer uses radial basis (Gaussian) functions.
* **Use:** Function approximation, classification with smooth decision boundaries.
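
The hidden layer of an RBF network can be sketched as a Gaussian of the distance between each input and a set of centers. The centers and the width parameter `gamma` below are illustrative assumptions; in practice they are chosen or learned from data.

```python
import numpy as np

def rbf_layer(X, centers, gamma):
    """Gaussian activation exp(-gamma * ||x - c||^2) for every (x, c) pair."""
    diff = X[:, None, :] - centers[None, :, :]   # (n_samples, n_centers, n_features)
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

phi = rbf_layer(X, centers, gamma=1.0)
# A point sitting on a center activates it fully (1.0);
# the far-away point [5, 5] activates both centers almost not at all.
```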

---

#### Graph Neural Network (GNN)

* **Structure:** Operates on graph data (nodes + edges).
* **Goal:** Aggregate information from connected nodes.
* **Use:** Social network analysis, molecular prediction, recommender systems.
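
Neighbor aggregation can be sketched with an adjacency matrix. The example below uses mean aggregation with self-loops followed by a linear map and ReLU, which is one simple message-passing scheme among many; the graph, features, and weights are illustrative.

```python
import numpy as np

# Path graph: node 0 - node 1 - node 2
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.array([[1.0], [2.0], [3.0]])   # one feature per node

A_hat = A + np.eye(3)                 # add self-loops so a node keeps its own feature
deg = A_hat.sum(axis=1, keepdims=True)
H_agg = (A_hat @ H) / deg             # mean over each node and its neighbors
# node 0: (1 + 2) / 2 = 1.5, node 1: (1 + 2 + 3) / 3 = 2.0, node 2: (2 + 3) / 2 = 2.5

W = np.array([[1.0, -1.0]])           # illustrative layer weights (1 -> 2 features)
H_next = np.maximum(0.0, H_agg @ W)   # linear transform + ReLU
```

Stacking several such layers lets information travel further than one hop across the graph.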

---

#### Modular / Hybrid Networks

* Combine different types for complex tasks.
* **Examples:**

  * CNN + LSTM → video activity recognition.
  * Transformer + CNN → vision-language models.

| **Network Type**                         | **Input Type**                       | **Core Structure / Idea**                             | **Memory / Feedback**        | **Main Applications**                                   |
| ---------------------------------------- | ------------------------------------ | ----------------------------------------------------- | ---------------------------- | ------------------------------------------------------- |
| **Feedforward Neural Network (FNN)**     | Fixed-size vectors (tabular, images) | Sequential layers, data flows forward only            | ❌ No                         | Classification, regression, pattern recognition         |
| **Convolutional Neural Network (CNN)**   | Images, videos, spatial data         | Convolution + pooling layers extract spatial features | ❌ No                         | Image recognition, object detection, medical imaging    |
| **Recurrent Neural Network (RNN)**       | Sequential/time-series data          | Loops connect previous outputs to current inputs      | ✅ Yes                        | Text, speech, time-series forecasting                   |
| **LSTM (Long Short-Term Memory)**        | Sequential data                      | Specialized RNN with gates for long-term memory       | ✅ Yes                        | NLP, speech-to-text, stock prediction                   |
| **GRU (Gated Recurrent Unit)**           | Sequential data                      | Simplified LSTM with fewer parameters                 | ✅ Yes                        | Similar to LSTM but faster                              |
| **Autoencoder**                          | Any numeric data                     | Encoder compresses, decoder reconstructs input        | ❌ No                         | Denoising, anomaly detection, dimensionality reduction  |
| **GAN (Generative Adversarial Network)** | Any structured data                  | Generator vs. discriminator in adversarial setup      | ❌ No                         | Image synthesis, data augmentation, creative generation |
| **Transformer**                          | Sequential or tokenized data         | Self-attention layers learn relationships globally    | ✅ Contextual (not recurrent) | NLP (GPT, BERT), multimodal AI, vision transformers     |
| **Radial Basis Function Network (RBFN)** | Numeric data                         | Uses Gaussian basis functions for hidden activations  | ❌ No                         | Function approximation, simple classification           |
| **Graph Neural Network (GNN)**           | Graphs (nodes & edges)               | Aggregates features from connected nodes              | ✅ Yes                        | Social networks, molecule analysis, recommender systems |
| **Hybrid / Modular Networks**            | Mixed data                           | Combines multiple architectures (e.g., CNN + RNN)     | ⚙️ Mixed                     | Complex multimodal tasks (video, audio-text fusion)     |



```{dropdown} Click here for Sections
```{tableofcontents}