Great — here’s a **clear and simple explanation** for each topic in **3. Training & Optimization**, with **interview-focused points and examples**.

---

### **3.1 Weight Initialization**

**What:** How we set the initial weights of a neural network before training starts.

**Why it matters:**
Bad initialization can cause vanishing or exploding gradients. Good initialization makes training faster and more stable.

**Common techniques:**

* **Zero Initialization:**
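  Set all weights to 0 → Bad! Every neuron in a layer computes the same output and receives the same gradient, so the neurons never become different (the "symmetry" is never broken).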

* **Random Initialization:**
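  Random small numbers (e.g., a fixed std like 0.01) → Breaks the symmetry, but if the scale isn't matched to the layer size, activations and gradients can shrink or blow up as the network gets deeper.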

* **Xavier/Glorot Initialization (for sigmoid/tanh):**
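  Keeps the variance of activations (forward pass) and gradients (backward pass) roughly the same across layers, where $n_{\text{in}}$ and $n_{\text{out}}$ are the layer's number of inputs and outputs.
  $\mathrm{Var}(W) = \frac{2}{n_{\text{in}} + n_{\text{out}}}$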

* **He Initialization (for ReLU):**
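  Designed for ReLU activations: ReLU zeroes out roughly half of its inputs, halving the signal variance, so the factor 2 compensates.
  $\mathrm{Var}(W) = \frac{2}{n_{\text{in}}}$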
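
A minimal NumPy sketch of both schemes, drawing weights from a zero-mean normal with the variances above (uniform variants scale the same way). The `forward_std` helper and the 20-layer, 512-unit setup are illustrative assumptions: it pushes random inputs through a stack of ReLU layers to show how the activation scale behaves under each initialization.

```python
import numpy as np

def xavier_normal(n_in, n_out, rng):
    # Var(W) = 2 / (n_in + n_out): balances forward and backward signal variance
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))

def he_normal(n_in, n_out, rng):
    # Var(W) = 2 / n_in: the factor 2 compensates for ReLU zeroing about half its inputs
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

def small_random(n_in, n_out, rng, scale=0.01):
    # Naive "small random numbers" init: fixed scale, independent of layer width
    return rng.normal(0.0, scale, size=(n_in, n_out))

def forward_std(init_fn, depth=20, width=512, seed=0):
    """Std of activations after `depth` linear+ReLU layers (hypothetical demo helper)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(256, width))  # batch of 256 random inputs
    for _ in range(depth):
        h = np.maximum(0.0, h @ init_fn(width, width, rng))  # linear layer + ReLU
    return h.std()

if __name__ == "__main__":
    for name, fn in [("small random", small_random), ("xavier", xavier_normal), ("he", he_normal)]:
        print(f"{name:>12}: activation std after 20 ReLU layers = {forward_std(fn):.3g}")
```

With ReLU, He init should keep the activation scale roughly constant, Xavier should shrink it by about half in variance per layer (it was derived for tanh/sigmoid, not ReLU), and the fixed 0.01 scale should collapse it toward zero.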

✅ **Interview tip:** Be ready to explain why Xavier is better for sigmoid/tanh and He for ReLU.

---

### **3.2 Batch Normalization & Layer Normalization**

#### **Batch Normalization (BN)**

**What:** Normalize each feature (or channel) of a layer's inputs using the mean & std computed over the current mini-**batch**.

**Why:**

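* Originally motivated as reducing "internal covariate shift" (this explanation is debated; the benefit may come more from a smoother optimization landscape).
* Allows higher learning rates, which speeds up training.
* Acts as a mild regularizer, since batch statistics add noise (can reduce overfitting).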

**How:**

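1. Normalize: subtract the batch mean and divide by the batch std (plus a small ε for numerical stability).
2. Scale & shift with learnable parameters (γ, β), so the layer can still undo the normalization if that works better.
3. At inference, use running averages of the mean and variance collected during training instead of batch statistics.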
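
A minimal NumPy sketch of the batch norm forward pass for a (batch, features) input. The class name `BatchNorm1d` and the `momentum`/`eps` values are assumptions chosen for illustration, and backpropagation is omitted. It follows the normalize and scale-and-shift steps and also keeps running averages, which is what lets a trained layer process a single example at inference time.

```python
import numpy as np

class BatchNorm1d:
    """Minimal batch norm for inputs of shape (batch, features); a sketch, not a library API."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)   # learnable scale
        self.beta = np.zeros(num_features)   # learnable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training=True):
        if training:
            # Statistics over the batch axis: one mean/variance per feature
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Exponential moving averages, used later at inference time
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Inference: stored statistics, so the output doesn't depend on the batch
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)  # normalize
        return self.gamma * x_hat + self.beta         # scale & shift

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # batch of 64, 4 features
    bn = BatchNorm1d(4)
    y = bn(x, training=True)
    print(y.mean(axis=0), y.std(axis=0))  # per-feature mean near 0, std near 1
    print(bn(x[:1], training=False))      # inference works even with batch size 1
```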

**Used in:** CNNs, MLPs.

#### **Layer Normalization**

**What:** Normalize each example across its own **features** (not across the batch), so every example uses only its own mean and std.

**When to use:**

* Works better in RNNs and Transformers, where batch sizes can be small (even 1) and sequence lengths vary, and it behaves the same at training and inference time.
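
For comparison, a NumPy sketch of layer normalization (`layer_norm` is a hypothetical helper name, and the shapes are illustrative). Statistics are computed over each example's own features, so normalizing one sequence gives the same result whether it sits in a batch of 4 or a batch of 1.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last (feature) axis; x has shape (..., features)."""
    mean = x.mean(axis=-1, keepdims=True)  # one mean per example / token
    var = x.var(axis=-1, keepdims=True)    # one variance per example / token
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta            # learnable per-feature scale and shift

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    gamma, beta = np.ones(d), np.zeros(d)
    batch = rng.normal(size=(4, 10, d))          # (batch, sequence length, features)
    full = layer_norm(batch, gamma, beta)
    single = layer_norm(batch[:1], gamma, beta)  # same first example, batch size 1
    # The first example is normalized identically with or without the rest of the batch
    print(np.allclose(full[0], single[0]))
```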

✅ **Interview Tip:** "BatchNorm depends on batch statistics, so it is sensitive to batch size. LayerNorm doesn't, which makes it useful for sequential data and small batches."

---

### **3.3 Transfer Learning & Fine-Tuning**

#### **Transfer Learning**

**What:** Use a pre-trained model (like ResNet, BERT) and reuse its knowledge for a new task.

**Example:**
Use an ImageNet-trained ResNet to classify medical images:

* **Freeze** the early layers (they learn general features such as edges and textures).
* Replace the final layer so its output size matches the number of classes in your new task.
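
A minimal PyTorch sketch of the freeze-and-replace-head recipe. The small `backbone` is a hypothetical stand-in for a real pre-trained network, kept tiny so the snippet runs without downloading weights; with a torchvision ResNet the equivalent head swap is `model.fc = nn.Linear(model.fc.in_features, num_classes)`. The input size, class count, and learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained backbone (in practice, e.g. an ImageNet ResNet)
backbone = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)

# Freeze: pre-trained weights get no gradients and are not updated
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 3                    # hypothetical number of classes in the new task
head = nn.Linear(64, num_classes)  # new final layer, trained from scratch
model = nn.Sequential(backbone, head)

# Only the trainable (head) parameters are handed to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)

# One training step on a dummy batch
x = torch.randn(16, 32)
y = torch.randint(0, num_classes, (16,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```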

#### **Fine-Tuning**

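**What:** Continue training some or all of the pre-trained layers (often the last few, sometimes the entire model) on your dataset, usually with a small learning rate so the pre-trained features are adjusted rather than overwritten.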

✅ **Interview Tip:**
Explain when you’d freeze vs fine-tune:

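* Freeze: Small dataset (updating millions of pre-trained weights on a few examples risks overfitting).
* Fine-tune: Dataset large enough to adjust the pre-trained features, especially when your data looks different from the pre-training data.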
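
One way to express the freeze vs fine-tune choice in PyTorch is through which parameters the optimizer sees and at what learning rate. `make_optimizer` and the specific learning rates are hypothetical; giving the pre-trained layers a smaller learning rate than the new head is a common convention, not a rule.

```python
import torch
import torch.nn as nn

def make_optimizer(backbone, head, mode, head_lr=1e-2, backbone_lr=1e-4):
    """Build an optimizer for 'freeze' or 'finetune' mode (hypothetical helper)."""
    if mode == "freeze":
        for p in backbone.parameters():
            p.requires_grad = False  # backbone stays fixed
        groups = [{"params": head.parameters(), "lr": head_lr}]
    elif mode == "finetune":
        for p in backbone.parameters():
            p.requires_grad = True
        # Per-parameter-group learning rates: pre-trained layers change gently
        groups = [
            {"params": backbone.parameters(), "lr": backbone_lr},
            {"params": head.parameters(), "lr": head_lr},
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return torch.optim.AdamW(groups)

if __name__ == "__main__":
    backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())  # stand-in pre-trained layers
    head = nn.Linear(16, 2)                                 # new task head
    opt = make_optimizer(backbone, head, "finetune")
    print([g["lr"] for g in opt.param_groups])
```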

---

### **3.4 Meta-Learning & Few-Shot Learning**

#### **Meta-Learning ("Learning to Learn")**

**What:** A model learns **how to learn new tasks quickly**.

**Why:** In real-world tasks, we may have **few examples** per task.

**Example:**
Train a model on many small tasks, each with only a few examples (like recognizing characters in different styles), so that it can adapt quickly to a new, unseen task.
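
To make "learning to learn" concrete, here is a minimal NumPy sketch of first-order MAML (FOMAML), a simplified variant of MAML that ignores second-order gradient terms. Everything here is an illustrative assumption: the task family (random lines y = a·x + c instead of character styles), the 5-shot support/query split, the two-parameter linear model, and the learning rates.

```python
import numpy as np

def sample_task(rng):
    # Hypothetical task family: lines y = a*x + c with a task-specific slope and intercept
    a, c = rng.uniform(1.0, 3.0), rng.uniform(-1.0, 1.0)
    return lambda x: a * x + c

def mse_and_grad(theta, x, y):
    # Linear model w*x + b with mean squared error; returns loss and [dL/dw, dL/db]
    w, b = theta
    err = w * x + b - y
    return np.mean(err ** 2), np.array([2 * np.mean(err * x), 2 * np.mean(err)])

def adapt(theta, x, y, inner_lr=0.1, steps=1):
    # Inner loop: a few gradient steps on the task's small support set
    for _ in range(steps):
        _, g = mse_and_grad(theta, x, y)
        theta = theta - inner_lr * g
    return theta

def fomaml(meta_steps=2000, tasks_per_batch=8, k_shot=5, outer_lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)  # meta-learned initialization [w, b]
    for _ in range(meta_steps):
        meta_grad = np.zeros(2)
        for _ in range(tasks_per_batch):
            f = sample_task(rng)
            xs, xq = rng.uniform(-2, 2, k_shot), rng.uniform(-2, 2, k_shot)  # support / query
            theta_task = adapt(theta, xs, f(xs))
            # First-order approximation: query-set gradient at the adapted parameters
            _, g = mse_and_grad(theta_task, xq, f(xq))
            meta_grad += g / tasks_per_batch
        theta = theta - outer_lr * meta_grad  # outer (meta) update
    return theta

def avg_adapted_loss(theta, n_tasks=200, k_shot=5, seed=1):
    # Average query loss on new tasks after one adaptation step from `theta`
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_tasks):
        f = sample_task(rng)
        xs, xq = rng.uniform(-2, 2, k_shot), rng.uniform(-2, 2, k_shot)
        loss, _ = mse_and_grad(adapt(theta, xs, f(xs)), xq, f(xq))
        total += loss
    return total / n_tasks

if __name__ == "__main__":
    theta = fomaml()
    print("meta-learned init [w, b]:", theta)
    print("loss after 1 step from meta init:", avg_adapted_loss(theta))
    print("loss after 1 step from zeros:    ", avg_adapted_loss(np.zeros(2)))
```

The meta-learned starting point should end up near the "average" task (slope about 2, intercept about 0), so a single adaptation step from it fits a new line much better than the same step from an all-zeros start.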

#### **Few-Shot Learning**

**What:** Learning a new task with very few labeled examples (e.g., 1-shot, 5-shot learning).

**Example:**
Learn to classify new animals with 1 image per class.

**Popular approaches:**

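* **Siamese Networks:** learn an embedding in which a similarity score between two inputs says whether they belong to the same class.
* **Prototypical Networks:** average the embeddings of each class's few examples into a "prototype" and assign a query to the nearest prototype.
* **MAML (Model-Agnostic Meta-Learning):** learn an initialization that reaches good performance on a new task after only a few gradient steps.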
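
The Prototypical Networks idea can be sketched in a few lines of NumPy. This assumes an embedding network has already been trained; `embed` is a hypothetical identity stand-in for it, and the 2-D points form a made-up 3-way 1-shot task. Each query is scored by negative squared Euclidean distance to every class prototype, followed by a softmax.

```python
import numpy as np

def embed(x):
    # Hypothetical stand-in for a trained embedding network f(x); identity for brevity
    return x

def prototypes(support_x, support_y, n_classes):
    # Prototype of each class = mean embedding of its support examples
    z = embed(support_x)
    return np.stack([z[support_y == k].mean(axis=0) for k in range(n_classes)])

def classify(query_x, protos):
    # Squared Euclidean distance from each query embedding to each prototype
    z = embed(query_x)
    d = ((z[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d                                         # closer prototype -> higher score
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)           # softmax over classes
    return probs.argmax(axis=1), probs

if __name__ == "__main__":
    # Toy 3-way 1-shot task: one support example per class
    support_x = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    support_y = np.array([0, 1, 2])
    query_x = np.array([[0.5, -0.2], [4.6, 5.3], [0.1, 4.2]])
    preds, _ = classify(query_x, prototypes(support_x, support_y, n_classes=3))
    print(preds)  # each query lands on its nearest prototype: [0 1 2]
```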

✅ **Interview Tip:**
If asked "How would you classify new categories with 1 example per class?", mention **few-shot learning + meta-learning** (e.g., Prototypical Networks or MAML).

