# **Assignment - Support Vector Machine (SVM) & Naive Bayes**  

---


### **1. What is a Support Vector Machine (SVM)?**  

- SVM is a **supervised learning algorithm** used for **classification and regression** tasks.  

- It finds the **optimal hyperplane** that maximally separates different classes in a dataset.  

- Works well for both **linear and non-linear** classification using the **kernel trick**.  

### **2. What is the difference between Hard Margin and Soft Margin SVM?**  

- **Hard Margin SVM**: Only works if the data is **linearly separable** (strict separation, no misclassification).  

- **Soft Margin SVM**: Allows some misclassifications, controlled by the **C parameter**, to handle noisy or overlapping data.  

- Soft Margin is **more practical** for real-world datasets where perfect separation is rare.  
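
The two formulations can be written side by side. This is the standard textbook form rather than anything specific to these notes; the labels \(y_i \in \{-1, +1\}\) and the slack variables \(\xi_i\) are notation assumed here.

\[
\text{Hard margin:}\quad \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1
\]

\[
\text{Soft margin:}\quad \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0
\]

Each \(\xi_i > 0\) measures how far a point falls short of the margin, and C sets the price of that shortfall.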


### **3. What is the mathematical intuition behind SVM?**  
- The objective is to **maximize the margin** (distance between the separating hyperplane and nearest points).  

- The decision boundary is given by:  

  \(w \cdot x + b = 0\)
   
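- The optimization problem minimizes **½‖w‖²** (equivalent to maximizing the margin width \(2 / \lVert w \rVert\)) subject to \(y_i (w \cdot x_i + b) \ge 1\) for every training point, so that each point lies on the correct side of the margin.  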
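
The geometry can be checked numerically. The following is a minimal sketch, assuming the hyperplane is already known; the helper names and the values of `w` and `b` are made up for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_width(w):
    """Margin width 2 / ||w|| for a hyperplane in canonical form."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -5.0                     # hypothetical weights and bias
print(signed_distance(w, b, [3.0, 4.0]))    # (9 + 16 - 5) / 5 = 4.0
print(margin_width(w))                      # 2 / 5 = 0.4
```

A smaller ‖w‖ gives a wider margin, which is why the objective minimizes the norm of w.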

### **4. What is the role of Lagrange Multipliers in SVM?**  

- They are used in **constrained optimization** to solve the SVM objective function.  

- Convert the problem into a **dual formulation**, making it easier to optimize with kernel functions.  

- Ensure that only **support vectors contribute** to the final decision boundary, because the multiplier of every other training point is zero.  
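
As a sketch of the standard derivation for the hard-margin case (labels \(y_i \in \{-1, +1\}\) and multipliers \(\alpha_i \ge 0\) are notation assumed here), the Lagrangian is

\[
L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]
\]

Setting the derivatives with respect to \(w\) and \(b\) to zero gives \(w = \sum_i \alpha_i y_i x_i\) and \(\sum_i \alpha_i y_i = 0\). Substituting back yields the dual problem

\[
\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
\quad \text{s.t.}\quad \alpha_i \ge 0,\quad \sum_i \alpha_i y_i = 0
\]

The data enter only through the inner products \(x_i \cdot x_j\), which is what allows a kernel function to be substituted. Complementary slackness forces \(\alpha_i = 0\) for every point with \(y_i (w \cdot x_i + b) > 1\).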


### **5. What are Support Vectors in SVM?**  

- Data points that **lie closest** to the decision boundary (margin).  

- These points **influence** the hyperplane's position and orientation.  

- Removing a support vector **can change the boundary**, whereas removing any other training point leaves it unchanged.  
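
For a hyperplane in canonical form, the support vectors are the points whose functional margin \(y_i (w \cdot x_i + b)\) equals 1. A minimal sketch, assuming the hyperplane has already been found; the toy data and function names are hypothetical.

```python
def functional_margins(w, b, X, y):
    """Return y_i * (w . x_i + b) for every training point."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for xi, yi in zip(X, y)]

def support_vector_indices(w, b, X, y, tol=1e-9):
    """Indices of the points lying exactly on the margin."""
    return [i for i, m in enumerate(functional_margins(w, b, X, y))
            if abs(m - 1.0) <= tol]

# Hypothetical toy data; the max-margin hyperplane here is x1 = 0.
X = [[1.0, 0.0], [3.0, 1.0], [-1.0, 0.0], [-4.0, -1.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0
print(support_vector_indices(w, b, X, y))   # [0, 2]
```

The two points farther from the boundary have margins of 3 and 4, so they play no part in defining it.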


---


### **6. What is a Support Vector Classifier (SVC)?**  

- **SVC is the classification version** of SVM.  


- Finds the **best hyperplane** to separate data into classes.  

- Uses **kernels** to handle both **linear and non-linear** classification problems.  


### **7. What is a Support Vector Regressor (SVR)?**  

- **SVR is the regression version** of SVM.  

- It tries to fit a hyperplane such that most data points fall within a **margin (ε-tube)** around it.  

- Controls error tolerance using the **epsilon (ε) parameter**.  
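
The ε-tube corresponds to the ε-insensitive loss, which ignores errors smaller than ε. A minimal sketch, with a made-up default of `epsilon=0.1`:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

print(epsilon_insensitive_loss(3.0, 3.05))  # inside the tube: 0.0
print(epsilon_insensitive_loss(3.0, 3.5))   # outside the tube: about 0.4
```

A larger ε widens the tube, so more points incur no loss and the fitted function becomes flatter.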



### **8. What is the Kernel Trick in SVM?**  

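- A method that lets SVM act as if the data were **mapped into a higher-dimensional space**, where it may become linearly separable, by computing inner products with a kernel function instead of transforming the data explicitly.  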

- Allows SVM to work with **complex decision boundaries**.  

- Examples of kernels: **Linear, Polynomial, RBF, Sigmoid**.  
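
The trick can be verified on a small case. The sketch below, assuming 2-D inputs and the homogeneous degree-2 polynomial kernel, checks that the kernel value equals the inner product under an explicit feature map; the names `poly2_kernel` and `phi` are illustrative.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x . z)^2, computed in the input space."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, 4.0]
print(poly2_kernel(x, z))                                      # 121.0
print(math.isclose(poly2_kernel(x, z), dot(phi(x), phi(z))))   # True
```

The kernel reaches the same number without ever building the higher-dimensional vectors, which matters when the implicit space is very large or infinite (as with RBF).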



### **9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.**  

- **Linear Kernel**: Best for **linearly separable** data, faster to compute.  

- **Polynomial Kernel**: Captures **non-linear relationships** through feature interactions up to degree \(d\).  

- **RBF Kernel**: Most commonly used, handles **highly non-linear** data.  
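
For reference, the three kernels can be written as plain functions. This is a sketch of the usual definitions; the parameter names `degree`, `coef0`, and `gamma` and their defaults are choices made here for illustration.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """K(x, z) = (x . z + coef0)^degree"""
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))                 # 11.0
print(polynomial_kernel(x, z, degree=2))   # (11 + 1)^2 = 144.0
print(rbf_kernel(x, x))                    # identical points: 1.0
```

The linear kernel is a single dot product, which is why it is the cheapest of the three.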


### **10. What is the effect of the C parameter in SVM?**  

- **Controls the trade-off between margin width and training errors**:  

  - **High C** → Narrow margin, less misclassification (risk of overfitting).  
  
  - **Low C** → Wider margin, allows some misclassification (better generalization).  
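
The effect of C is visible in the soft-margin objective, where it multiplies the total hinge loss. A minimal sketch on hypothetical 1-D data with one point on the wrong side of the boundary:

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y_i (w . x_i + b))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    hinge = sum(
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return reg + C * hinge

# Hypothetical data: the last point (0.5, label -1) is misclassified by w = 1, b = 0.
X = [[2.0], [1.0], [-2.0], [0.5]]
y = [1, 1, -1, -1]
w, b = [1.0], 0.0
for C in (0.1, 1.0, 100.0):
    print(C, soft_margin_objective(w, b, X, y, C))
```

With C = 0.1 the single violation barely affects the objective, while with C = 100 it dominates, so the optimizer would be pushed to shrink the margin to remove it.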


----


### **11. What is the role of the Gamma parameter in RBF Kernel SVM?**  

- Controls how **far a single training example’s influence reaches**.  

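- **High Gamma** → Each point's influence is confined to a **small neighborhood**, producing a highly curved boundary that can overfit.  

- **Low Gamma** → Each point influences a **larger region**, producing a smoother boundary that usually generalizes better (too low a value can underfit).  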
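
One way to see this is to evaluate the RBF kernel at a fixed distance for several gamma values. A small sketch; the function name and the chosen values are illustrative.

```python
import math

def rbf_influence(distance, gamma):
    """RBF kernel value between two points separated by `distance`."""
    return math.exp(-gamma * distance ** 2)

# Influence of a training point on a location one unit away.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_influence(1.0, gamma))
```

At gamma = 0.1 the point still has most of its influence one unit away, while at gamma = 10 the influence is close to zero.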



## **Naïve Bayes Classifier**  

### **12. What is the Naïve Bayes classifier, and why is it called "Naïve"?**  

- A **probabilistic classifier** based on **Bayes’ Theorem**.  

- Assumes that **features are conditionally independent given the class**, which is often unrealistic (hence, "Naïve").  

- Works well for **text classification, spam detection, sentiment analysis**.  
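
Under the independence assumption, the class score is the prior multiplied by one likelihood per feature, usually computed in log space. A minimal sketch; all probabilities below are made up for illustration.

```python
import math

def naive_bayes_log_scores(priors, likelihoods, features):
    """log P(c) + sum over features of log P(feature | c), for each class c."""
    scores = {}
    for c, prior in priors.items():
        scores[c] = math.log(prior) + sum(
            math.log(likelihoods[c][f]) for f in features
        )
    return scores

# Hypothetical probabilities, not estimated from any real data.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}
scores = naive_bayes_log_scores(priors, likelihoods, ["free", "free"])
print(max(scores, key=scores.get))   # spam
```

Summing logs avoids the numerical underflow that multiplying many small probabilities would cause.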



### **13. What is Bayes’ Theorem?**  
- A formula for updating probabilities based on new evidence: 

\[P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}\]


- Used in Naïve Bayes to **calculate class probabilities** given input features.  
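
A short worked example, assuming a two-outcome setting where P(B) is obtained from the law of total probability; the numbers are hypothetical.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical: A = "email is spam", B = "email contains the word 'free'".
print(posterior(0.2, 0.5, 0.05))   # 0.1 / 0.14, about 0.714
```

Seeing the word raises the probability of spam from the 20% prior to roughly 71%.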

### **14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.**  

- **Gaussian Naïve Bayes**: Assumes data follows a **normal distribution** (used for continuous data).  

- **Multinomial Naïve Bayes**: Used for **discrete count data** (e.g., text word counts).  

- **Bernoulli Naïve Bayes**: Used for **binary feature data** (e.g., word presence/absence in text).  
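
The three variants differ only in how they model \(P(x_j \mid c)\). A sketch of each per-feature likelihood, with illustrative function names and no parameter estimation:

```python
import math

def gaussian_likelihood(x, mean, var):
    """Gaussian NB: normal density of a continuous feature value."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def multinomial_log_likelihood(counts, word_probs):
    """Multinomial NB: sum of count * log P(word | class), constant term dropped."""
    return sum(n * math.log(word_probs[w]) for w, n in counts.items())

def bernoulli_likelihood(present, p):
    """Bernoulli NB: p if the feature is present, 1 - p if it is absent."""
    return p if present else 1.0 - p

print(gaussian_likelihood(0.0, 0.0, 1.0))                      # about 0.3989
print(multinomial_log_likelihood({"free": 2}, {"free": 0.5}))  # 2 * log(0.5)
print(bernoulli_likelihood(False, 0.3))                        # 0.7
```

Note that Bernoulli NB explicitly uses the absence of a feature as evidence, whereas Multinomial NB only counts occurrences.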



### **15. When should you use Gaussian Naïve Bayes over other variants?**  

- When features are **continuous and normally distributed**.  

- Works well for **medical datasets, fraud detection, and sensor data**.  

- If data is non-Gaussian, other models (like Decision Trees) may work better.  

----


### **16. What are the key assumptions made by Naïve Bayes?**  

- **Feature independence** (each feature contributes independently to the outcome).  

- **Equal importance of all features** (not always true in real-world data).  

- **All features contribute to class probability estimation**.  



### **17. What are the advantages and disadvantages of Naïve Bayes?**  

**Advantages** :  

- **Fast and efficient**, works well on **large datasets**.  

- Performs well in **text classification** and spam filtering.  


**Disadvantages** :  

- Assumes **feature independence**, which is rarely true.  

- Struggles with **correlated features** (e.g., height and weight).  


### **18. Why is Naïve Bayes a good choice for text classification?**  

- **Handles high-dimensional data** well (text datasets have many features).  

- **Fast training and prediction** (even on large datasets).  

- **Robust to irrelevant features**, reducing noise in classification.  

### **19. Compare SVM and Naïve Bayes for classification tasks.**  

- **SVM** works well for **complex relationships** and **high-dimensional data**, while **Naïve Bayes** is best for **text classification** and **fast predictions**.  

- **SVM is computationally expensive**, especially with kernels, whereas **Naïve Bayes is much faster** and works well with small datasets.  

- **SVM handles non-linearity** using kernel tricks, but **Naïve Bayes assumes feature independence** and cannot model complex relationships.  

- **SVM can be sensitive to outliers** near the margin (the C parameter controls how strongly), while **Naïve Bayes is often more robust** because it relies on aggregate probability estimates.  

- **Use SVM** for small-to-medium datasets with complex decision boundaries and **Naïve Bayes** for fast, efficient classification, especially in NLP tasks.  




### **20. How does Laplace Smoothing help in Naïve Bayes?**  
- Prevents **zero probability issues** when a word/feature is missing in training data.  
- Adds a small constant **(α, usually 1)** to all counts to ensure no probability is exactly zero.  
- Helps **generalize better**, avoiding overfitting in small datasets.  
- Improves model performance, especially in **text classification** where many words may be unseen during training.  
- Ensures **new/unseen words** still have a nonzero probability, making the model more robust.  

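**Formula with Laplace Smoothing:**  

\[P(w \mid c) = \frac{\text{count}(w, c) + \alpha}{\text{count}(c) + \alpha \cdot V}\]

Here \(\text{count}(c)\) is the total number of words in class \(c\) and \(V\) is the vocabulary size.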
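
The smoothed estimate can be sketched in a few lines. The function name, word counts, and vocabulary below are hypothetical.

```python
def smoothed_word_probs(word_counts, vocabulary, alpha=1.0):
    """P(w | c) = (count(w, c) + alpha) / (count(c) + alpha * V) for each w."""
    total = sum(word_counts.get(w, 0) for w in vocabulary)
    denom = total + alpha * len(vocabulary)
    return {w: (word_counts.get(w, 0) + alpha) / denom for w in vocabulary}

# Hypothetical counts for one class; "hello" never appears in it.
counts = {"free": 3, "win": 1}
vocab = ["free", "win", "hello"]
probs = smoothed_word_probs(counts, vocab)
print(probs["hello"])   # 1/7, about 0.143, instead of 0
```

The smoothed probabilities still sum to 1 over the vocabulary, so the result remains a valid distribution.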

----