### **Q1. What is the purpose of Grid Search CV in machine learning, and how does it work?**

**Purpose:**  
Grid Search CV (Cross-Validation) is used to **tune hyperparameters** of a machine learning model to find the combination that yields the best performance.

**How it works:**
- You define a **grid of hyperparameters** and their possible values.
- The model is trained and evaluated using **cross-validation** for **each combination**.
- The combination with the best mean cross-validation score is selected.

🔍 **Example:**  
For an SVM model, you can tune `C`, `kernel`, and `gamma` using Grid Search CV.
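
The steps above can be sketched with scikit-learn's `GridSearchCV`. This is a minimal illustration rather than a recommended configuration: the iris dataset, the specific grid values, and `cv=5` are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy dataset (an assumption for illustration; any labeled dataset works).
X, y = load_iris(return_X_y=True)

# The grid: every combination of these values will be evaluated.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", 0.1],
}

# 5-fold cross-validation for each of the 3 * 2 * 2 = 12 combinations.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
print(n_candidates)          # number of combinations tried
print(search.best_params_)   # combination with the best mean CV score
print(search.best_score_)    # that mean CV score
```

With 12 combinations and 5 folds, the search fits the model 60 times, plus one final refit on all the data with the best combination.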

---

### **Q2. Describe the difference between Grid Search CV and Randomized Search CV, and when might you choose one over the other?**

| Feature                  | Grid Search CV                         | Randomized Search CV                     |
|--------------------------|----------------------------------------|------------------------------------------|
| Search Method            | Exhaustively tries all combinations    | Randomly samples combinations            |
| Computational Cost       | High; grows multiplicatively with each added value | Lower; fixed by the number of iterations you specify |
| When to Use              | Small search space                     | Large or complex search space            |

✅ **Choose Randomized Search** when:
- You have **limited computational resources**.
- The **parameter space is large** or continuous.
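
The difference in how candidates are generated can be sketched in plain Python. This is not how scikit-learn implements either search internally; the grid values, the log-uniform ranges, and the budget `n_iter = 10` are assumptions for illustration (scikit-learn's `RandomizedSearchCV` exposes a comparable `n_iter` parameter).

```python
import itertools
import random

grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "kernel": ["linear", "rbf", "poly"],
}

# Grid search: enumerate every combination (5 * 4 * 3 = 60 candidates).
names = list(grid)
grid_candidates = [
    dict(zip(names, values)) for values in itertools.product(*grid.values())
]

# Randomized search: draw a fixed number of candidates, here sampling
# C and gamma from continuous log-uniform ranges instead of fixed lists.
rng = random.Random(0)
n_iter = 10
random_candidates = [
    {
        "C": 10 ** rng.uniform(-2, 2),
        "gamma": 10 ** rng.uniform(-3, 0),
        "kernel": rng.choice(grid["kernel"]),
    }
    for _ in range(n_iter)
]

print(len(grid_candidates))    # 60
print(len(random_candidates))  # 10
```

Adding one more value to any list multiplies the grid size, while the randomized budget stays at `n_iter` no matter how wide the ranges are.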

---

### **Q3. What is data leakage, and why is it a problem in machine learning? Provide an example.**

**Data Leakage:**  
When **information from outside the training dataset** is used to create the model, leading to overly optimistic performance.

**Why it’s a problem:**  
It causes **unrealistically high scores** during training and validation, followed by poor **generalization to new data** once the model is deployed.

🔍 **Example:**  
If a feature in the dataset holds a **future value** (such as next month's sales) and it is used during training to predict current sales, that is data leakage, because the value would not be known at prediction time.

---

### **Q4. How can you prevent data leakage when building a machine learning model?**

✅ **Prevention Techniques:**
1. **Separate preprocessing**: Split into train/test **first**, then fit transformations (e.g., scaling, imputation) on the **training set only** and apply the fitted transformation to the test set.
2. **Avoid using future data** for training.
3. **Be cautious with derived features** (ensure they don’t encode the target).
4. Use **pipelines** (e.g., `sklearn.pipeline.Pipeline`) so that preprocessing is re-fit on the training portion of each cross-validation fold rather than on the full dataset.
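
A minimal sketch of techniques 1 and 4 together, assuming scikit-learn is available. The breast cancer dataset, the 75/25 split, and the choice of `StandardScaler` with `LogisticRegression` are illustrative assumptions, not part of the original notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Split first, so the test set never influences any fitted statistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The scaler lives inside the pipeline, so during cross-validation it is
# re-fit on the training part of each fold only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Final fit on the training set; the test set is only transformed and scored.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(cv_scores.mean())
print(test_accuracy)
```

The leaky alternative would be calling `StandardScaler().fit(X)` on the full dataset before splitting, which lets test-set means and variances shape the training features.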

---

### **Q5. What is a confusion matrix, and what does it tell you about the performance of a classification model?**

A **confusion matrix** is a table used to evaluate the performance of a classification algorithm.

|                      | **Predicted Positive** | **Predicted Negative** |
|----------------------|------------------------|------------------------|
| **Actual Positive**  | True Positive (TP)     | False Negative (FN)    |
| **Actual Negative**  | False Positive (FP)    | True Negative (TN)     |

It shows how many predictions were **correct** and **incorrect** for each class, detail that a single accuracy figure hides.
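
As a rough sketch, the four cells can be counted directly from true and predicted labels. The helper name `confusion_counts` and the sample labels are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN for a binary problem."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1  # actual positive, predicted negative
        elif predicted == positive:
            fp += 1  # actual negative, predicted positive
        else:
            tn += 1
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 5}
```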

---

### **Q6. Explain the difference between precision and recall in the context of a confusion matrix.**

- **Precision** = TP / (TP + FP)  
  → Of all predicted positives, how many are truly positive?

- **Recall** = TP / (TP + FN)  
  → Of all actual positives, how many did we correctly predict?

🔍 Example:  
In a cancer detection model:
- **High precision** = few false alarms.
- **High recall** = few missed actual cancer cases.
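
As a worked illustration with made-up counts (not taken from any real study), suppose a screening model produces $TP = 80$, $FP = 20$, and $FN = 40$:

$$
\text{Precision} = \frac{80}{80 + 20} = 0.80
\qquad
\text{Recall} = \frac{80}{80 + 40} \approx 0.67
$$

Under these assumed counts, 80% of the model's positive predictions are correct, yet it still misses about a third of the actual cancer cases.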

---

### **Q7. How can you interpret a confusion matrix to determine which types of errors your model is making?**

- **High FP (False Positives):** Model is over-predicting the positive class.
- **High FN (False Negatives):** Model is missing actual positive instances.
- Use this to **analyze trade-offs** and tune the model accordingly (e.g., adjusting the decision threshold or rebalancing the classes).
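
The threshold trade-off can be sketched as follows. The scores and labels are invented for illustration, and `fp_fn_at_threshold` is a hypothetical helper, not a library function.

```python
def fp_fn_at_threshold(y_true, scores, threshold):
    """Return (FP, FN) when scores >= threshold are predicted positive."""
    fp = sum(1 for a, s in zip(y_true, scores) if s >= threshold and a == 0)
    fn = sum(1 for a, s in zip(y_true, scores) if s < threshold and a == 1)
    return fp, fn


# Invented predicted probabilities for ten samples.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.20, 0.35, 0.45, 0.40, 0.60, 0.55, 0.70, 0.85, 0.95]

errors = {t: fp_fn_at_threshold(y_true, scores, t) for t in (0.3, 0.5, 0.7)}
for threshold, (fp, fn) in errors.items():
    print(threshold, fp, fn)
# 0.3 3 0
# 0.5 1 1
# 0.7 0 2
```

Raising the threshold trades false positives for false negatives, so the right setting depends on which error is more costly in the application.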

---

### **Q8. What are some common metrics that can be derived from a confusion matrix, and how are they calculated?**

1. **Accuracy:**  
   $$
   \frac{TP + TN}{TP + FP + FN + TN}
   $$

2. **Precision:**  
   $$
   \frac{TP}{TP + FP}
   $$

3. **Recall (Sensitivity):**  
   $$
   \frac{TP}{TP + FN}
   $$

4. **Specificity:**  
   $$
   \frac{TN}{TN + FP}
   $$

5. **F1 Score (harmonic mean of precision & recall):**  
   $$
   2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
   $$
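
The five formulas above can be collected into one small function. This is a sketch: the name `classification_metrics` is hypothetical, and returning `0.0` when a denominator is zero is an assumed convention rather than a universal rule.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Assumed convention: report 0.0 when the metric is undefined.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Example counts, made up for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.700
# precision: 0.800
# recall: 0.667
# specificity: 0.750
# f1: 0.727
```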

---

### **Q9. What is the relationship between the accuracy of a model and the values in its confusion matrix?**

- **Accuracy** is directly computed from the confusion matrix:
  $$
  \text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}
  $$

However, in **imbalanced datasets**, a high accuracy can be **misleading**. Example:  
If 95% of the data is negative, predicting all negatives gives 95% accuracy but fails completely at identifying positives.
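
The 95% example can be checked in a few lines of plain Python, using 100 synthetic labels that match the proportions described:

```python
# 95 negatives and 5 positives; the "model" predicts negative every time.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(a == 1 and p == 1 for a, p in zip(y_true, y_pred))
fn = sum(a == 1 and p == 0 for a, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

Accuracy looks strong, but recall on the positive class is zero, which is why the confusion matrix should be read alongside the headline accuracy.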

---

### **Q10. How can you use a confusion matrix to identify potential biases or limitations in your machine learning model?**

✅ **Use Cases:**
- **Bias toward majority class:** Many FNs or FPs for the minority class.
- **Unbalanced predictions:** The model predicts one class almost exclusively.
- **Systematic errors:** Repeated errors in a specific class can indicate **class imbalance**, poor features, or data quality issues.

➡️ Helps you fine-tune preprocessing, sampling strategies, or even the loss function to **mitigate bias and improve fairness**.
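
One way to surface these patterns is to compute per-class recall and the share of predictions each class receives. The three-class matrix below is invented for illustration (rows are actual classes, columns are predicted classes).

```python
labels = ["A", "B", "C"]

# Rows: actual class. Columns: predicted class. Counts are made up.
matrix = [
    [90, 5, 5],   # actual A
    [10, 35, 5],  # actual B
    [12, 6, 2],   # actual C (minority class)
]

total = sum(sum(row) for row in matrix)

# Recall per class: correct predictions divided by actual instances.
per_class_recall = {
    label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)
}

# Fraction of all predictions assigned to each class.
predicted_share = {
    label: sum(row[j] for row in matrix) / total
    for j, label in enumerate(labels)
}

weakest = min(per_class_recall, key=per_class_recall.get)
print(per_class_recall)
print(predicted_share)
print(weakest)  # C
```

Here class C is rarely predicted and rarely recovered, the kind of signal that would prompt a look at resampling, class weights, or the quality of the features for that class.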
