1. What is a Support Vector Machine (SVM)?
 - SVM is a supervised machine learning algorithm used for classification and regression. It finds the optimal decision boundary (hyperplane) that separates different classes with the maximum margin.

2. What is the difference between Hard Margin and Soft Margin SVM?
 - Hard Margin:
Assumes data is perfectly linearly separable.
No misclassification allowed.
Very strict → works poorly if data has noise/outliers.

- Soft Margin:
Allows some misclassifications using slack variables (ξ).
Balances between maximizing margin and minimizing classification error.
Controlled by parameter C.
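
The soft-margin trade-off is usually written as the optimization problem below. This is the standard textbook form, not something derived in these notes; the symbols w (weight vector), b (bias), and n (number of training points) are assumed names.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i \ge 0,\quad i=1,\dots,n
```

Forcing every ξ_i to be 0 recovers the hard-margin case, where no margin violation is allowed.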

3. What is the mathematical intuition behind SVM?
- We want to maximize the margin (the distance between the separating hyperplane and the closest data points). A wider margin leaves more room for error on unseen points, which tends to improve generalization.
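
As a rough illustration of "distance to the closest points", the sketch below computes the perpendicular distance from a few points to a hyperplane w·x + b = 0. The hyperplane and the points are made-up values chosen only for the example.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

# Hypothetical hyperplane and points (illustration only)
w, b = [3.0, 4.0], -5.0
points = [[3.0, 4.0], [1.0, 3.0], [0.0, 0.0]]

distances = [distance_to_hyperplane(w, b, p) for p in points]
margin = min(distances)  # distance from the hyperplane to its closest point
print(distances, margin)
```

An SVM searches for the w and b that make this smallest distance as large as possible.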

4. What is the role of Lagrange Multipliers in SVM?
- SVM uses constrained optimization. Lagrange multipliers help transform it into a dual problem, which is easier to solve. The solution depends only on support vectors, not all training points.
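
For reference, the dual problem for the soft-margin case with parameter C is commonly written as follows. This is the standard textbook form; α_i are the Lagrange multipliers and n is an assumed name for the number of training points.

```latex
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\,\alpha_j\,y_i\,y_j\,x_i^{\top}x_j
\quad\text{subject to}\quad
0 \le \alpha_i \le C,\qquad \sum_{i=1}^{n}\alpha_i\,y_i = 0
```

Only points with α_i > 0 contribute to the solution, and those points are the support vectors. The data appear only through the inner products x_i^T x_j, which is what makes kernels possible.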

5. What are Support Vectors in SVM?
 - The critical data points closest to the decision boundary. They define the margin and decision hyperplane. Removing non-support vectors does not affect the decision boundary.

6. What is a Support Vector Classifier (SVC)?
 - It’s the classification model built using SVM. Finds the best hyperplane (linear or kernel-transformed) to separate classes.

7. What is a Support Vector Regressor (SVR)?
- Regression version of SVM. Instead of classification, it tries to fit a function within a tolerance margin (ε-insensitive zone).
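
A minimal sketch of the ε-insensitive loss behind that tolerance margin: errors smaller than ε cost nothing, larger errors are penalized linearly. The function name and the sample values are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

inside = epsilon_insensitive_loss(3.0, 3.05, epsilon=0.1)   # error 0.05 < epsilon
outside = epsilon_insensitive_loss(3.0, 3.5, epsilon=0.1)   # error 0.5 > epsilon
print(inside, outside)
```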

8. What is the Kernel Trick in SVM?
- Used when data is not linearly separable. The kernel trick computes inner products in a higher-dimensional feature space without explicitly computing the mapping into that space, so the classes become easier to separate at little extra cost.
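
A small check of this idea, assuming a degree-2 polynomial kernel on 2-D inputs: the kernel value (x·z)^2 matches the dot product of an explicit 3-D feature map, without ever building that map. The helper names `phi`, `dot`, and `poly2_kernel` are illustrative.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x.z)^2."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]
explicit = dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)    # same value, computed in the original 2-D space
print(explicit, implicit)
```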

9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel?
- Linear Kernel:
Works well for linearly separable data.
Faster, less complex.
- Polynomial Kernel:
Captures polynomial relationships.
Can model more complex decision boundaries.
- RBF (Radial Basis Function) Kernel:
Maps data into infinite dimensions.
Very flexible, handles non-linear data well.
Needs tuning of Gamma.

10. What is the effect of the C parameter in SVM?
- C = Regularization parameter. High C → less tolerance for misclassification and a narrower margin (overfitting risk). Low C → allows more margin violations and a wider margin (often better generalization, but underfitting if C is too low).

11. What is the role of the Gamma parameter in RBF Kernel SVM?
- Controls how far the influence of a single training example reaches. High Gamma → each point has narrow influence (overfitting risk). Low Gamma → points have broad influence (underfitting risk).
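
To make "reach of influence" concrete, the sketch below evaluates the RBF kernel exp(-gamma * ||x - z||^2) for one fixed pair of points at three gamma values. The points and gamma values are arbitrary choices for illustration.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [0.0, 0.0], [1.0, 1.0]   # squared distance between them is 2
for gamma in (0.01, 1.0, 100.0):
    # Larger gamma -> similarity to a point at the same distance drops toward 0
    print(gamma, rbf_kernel(x, z, gamma))
```

With a small gamma the two points still look similar to the model; with a large gamma the same pair is treated as almost unrelated.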

12. What is the Naïve Bayes classifier, and why is it called "Naïve"?
- A probabilistic classifier based on Bayes’ theorem. Assumes features are conditionally independent given the class, which is rarely true in practice → hence "Naïve".

13. What is Bayes’ Theorem?
- P(A|B) = P(B|A) · P(A) / P(B), where P(A) is the prior, P(B|A) the likelihood, P(B) the evidence, and P(A|B) the posterior.
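
A worked example may help here. The sketch assumes a diagnostic-test scenario with made-up numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) and expands P(B) with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' theorem, with P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Made-up numbers for illustration only
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(disease | positive test) = {p:.4f}")
```

Even with a fairly accurate test, the posterior stays low because the prior is small.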

14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes?
- Gaussian NB: For continuous features, assumes data follows Gaussian distribution. Multinomial NB: For count-based features (e.g., word frequencies in text). Bernoulli NB: For binary features (e.g., word present/not present).

15. When should you use Gaussian Naïve Bayes over other variants?
- When features are continuous and approximately normally distributed (e.g., height, weight, sensor data).
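
A hand-rolled sketch of what Gaussian Naïve Bayes does with one continuous feature: score each class by prior × Gaussian likelihood, then normalize. The class statistics below are hypothetical, not fitted from data.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical per-class statistics for one feature (height in cm)
classes = {
    "A": {"prior": 0.5, "mean": 160.0, "var": 25.0},
    "B": {"prior": 0.5, "mean": 180.0, "var": 25.0},
}

def predict_proba(x):
    """Posterior over classes: prior * likelihood, normalized to sum to 1."""
    scores = {c: s["prior"] * gaussian_pdf(x, s["mean"], s["var"])
              for c, s in classes.items()}
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}

post = predict_proba(165.0)
print(post)
```

With several features, the per-feature likelihoods are simply multiplied, which is where the independence assumption enters.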

16. What are the key assumptions made by Naïve Bayes?
- Features are conditionally independent given the class. Each feature contributes equally to the prediction.

17. What are the advantages and disadvantages of Naïve Bayes?
- Advantages:
Fast, simple, scalable.
Works well with high-dimensional data (like text).
Needs relatively little training data to estimate its parameters.

- Disadvantages:
Independence assumption often unrealistic.
Performs poorly if features are highly correlated.

18. Why is Naïve Bayes a good choice for text classification?
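- Words are not truly independent, but treating them as conditionally independent given the class works well in practice for bag-of-words features. Handles high-dimensional sparse data well. Efficient for large vocabularies.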

19. Compare SVM and Naïve Bayes for classification tasks?
- SVM:
Works well for small/medium datasets with clear margins.
Often more accurate, but training is computationally expensive on large datasets.

- Naïve Bayes:
Works well for large, high-dimensional datasets (e.g., text).
Faster, less accurate if independence assumption is violated.

20. How does Laplace Smoothing help in Naïve Bayes?
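- Solves the zero-probability problem when a word/class combination never occurs in training. It adds a small constant (usually 1) to every count, so no conditional probability is exactly zero and a single unseen word cannot wipe out the whole product.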
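
The effect can be seen on a toy corpus. The sketch below estimates P(word | class) with and without add-alpha smoothing; the documents, the vocabulary, and the helper name `word_likelihoods` are made up for illustration.

```python
from collections import Counter

def word_likelihoods(docs, vocabulary, alpha=1.0):
    """Estimate P(word | class) from tokenized docs with add-alpha smoothing."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts[w] for w in vocabulary)
    denom = total + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / denom for w in vocabulary}

# Toy "spam" class; "meeting" never appears in it
spam_docs = [["win", "money", "now"], ["win", "prize"]]
vocabulary = ["win", "money", "now", "prize", "meeting"]

unsmoothed = word_likelihoods(spam_docs, vocabulary, alpha=0.0)
smoothed = word_likelihoods(spam_docs, vocabulary, alpha=1.0)
print(unsmoothed["meeting"], smoothed["meeting"])
```

With alpha = 1 the unseen word gets a small non-zero probability, and the estimates still sum to 1 over the vocabulary.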

21. Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [11]:
from sklearn.datasets import load_iris

df = load_iris()
data = pd.DataFrame(df.data, columns=df.feature_names)

In [10]:
df.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [12]:
X = data
y = df.target

In [13]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=1)

In [15]:
from sklearn.svm import SVC
clf = SVC()
clf.fit(X_train,y_train)

SVC()


In [16]:
y_pred = clf.predict(X_test)

In [25]:
from sklearn.metrics import accuracy_score
acc = accuracy_score(y_test,y_pred)
print(f"The accuracy of our model is {acc:.4}")

The accuracy of our model is 0.9778


22. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then
compare their accuracies.

In [31]:
from sklearn.datasets import load_wine
wine = load_wine()
wine_data = pd.DataFrame(wine.data, columns=wine.feature_names)


In [33]:
X = wine_data
y = wine.target

In [34]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=1)

In [38]:
# Train the model with Linear Kernel
from sklearn.svm import SVC
lnr_svc = SVC(kernel="linear")

In [39]:
lnr_svc.fit(X_train, y_train)

SVC(kernel='linear')


In [41]:
y_pred_lnr = lnr_svc.predict(X_test)
lnr_acc = accuracy_score(y_test,y_pred_lnr)

In [42]:
rbf_svc = SVC(kernel="rbf")
rbf_svc.fit(X_train,y_train)

SVC()


In [43]:
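# Predict with the RBF model. The original cell called lnr_svc.predict here,
# so both reported accuracies came from the linear model.
y_pred_rbf = rbf_svc.predict(X_test)
rbf_acc = accuracy_score(y_test,y_pred_rbf)

In [44]:
print(f"the accuracy when kernel is linear is {lnr_acc:.4}, and the accuracy when kernel is RBF is {rbf_acc:.4}")

Note: the recorded run printed 0.963 for both kernels because both predictions came from lnr_svc. The linear figure (0.963) stands; re-run the cells above to obtain the RBF accuracy.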


23. Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean
Squared Error (MSE).

In [48]:
# load_diabetes is a built-in regression dataset, used here in place of a housing dataset
from sklearn.datasets import load_diabetes
dbt = load_diabetes()
X = dbt.data
y = dbt.target

In [50]:
from sklearn.model_selection import train_test_split
X_train,X_test, y_train,y_test = train_test_split(X,y,test_size=0.30,random_state=1)

In [51]:
# Use the regressor (SVR), not the classifier (SVC): the target is continuous
from sklearn.svm import SVR
svr = SVR()
svr.fit(X_train,y_train)

SVR()

In [53]:
y_pred = svr.predict(X_test)

from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
print(f"The Mean Squared Error (MSE) of this model is {mse}")

Note: the recorded MSE of 6271.248120300752 came from a run that fitted SVC instead of SVR; re-run this cell to obtain the SVR value.


24. Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision
boundary.

In [54]:
from sklearn.datasets import load_iris
irs = load_iris()
X = irs.data
y = irs.target

In [57]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y, test_size=0.25, random_state=1)

In [58]:
from sklearn.svm import SVC
clf = SVC(kernel="poly")
clf.fit(X_train,y_train)

SVC(kernel='poly')


In [61]:
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test,y_pred)
print(f"The accuracy score of this model is {acc:.4}")

The accuracy score of this model is 0.9737
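
The question also asks for a plot of the decision boundary. A boundary can only be drawn in two dimensions, so the sketch below refits a polynomial-kernel SVC on two Iris features (petal length and petal width, an assumption made for plotting) and colors a grid of predictions. The names `X2`, `y2`, `poly_clf`, and the output filename are illustrative, and the `Agg` backend is selected only so the script also runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X2 = iris.data[:, 2:4]   # petal length and petal width only (assumption for 2-D plotting)
y2 = iris.target

poly_clf = SVC(kernel="poly", degree=3)
poly_clf.fit(X2, y2)

# Predict the class at every point of a grid covering the feature range
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 0.5, X2[:, 0].max() + 0.5, 200),
    np.linspace(X2[:, 1].min() - 0.5, X2[:, 1].max() + 0.5, 200),
)
Z = poly_clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(6, 4))
ax.contourf(xx, yy, Z, alpha=0.3)                       # colored decision regions
ax.scatter(X2[:, 0], X2[:, 1], c=y2, edgecolors="k")    # training points
ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])
ax.set_title("SVC with polynomial kernel (degree 3)")
fig.savefig("poly_svc_boundary.png")  # hypothetical output filename
```

The edges between colored regions are the decision boundaries. A model trained on all four features, as in the cells above, cannot be drawn this way without first projecting the data to two dimensions.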


25. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and
evaluate accuracy.

In [63]:
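from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the breast cancer data. Without this step the model is fitted on the
# X_train/y_train left over from question 24 (Iris). The split parameters are
# an assumption, chosen to match question 24.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

In [64]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train,y_train)

GaussianNB()

In [65]:
from sklearn.metrics import accuracy_score
y_pred = model.predict(X_test)
acc = accuracy_score(y_test,y_pred)
print(f"accuracy of this model is {acc:.4}")

Note: the recorded output (0.9737) was computed on the Iris split from question 24, not on the breast cancer data; re-run these cells to obtain the breast cancer accuracy.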


26. Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20
Newsgroups dataset.

In [67]:
from sklearn.datasets import fetch_20newsgroups

In [70]:
news = fetch_20newsgroups(subset='all')

In [None]:
newsgroups_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'))
newsgroups_test = fetch_20newsgroups(subset='test', remove=('headers', 'footers', 'quotes'))


In [98]:
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
model = make_pipeline(TfidfVectorizer(), MultinomialNB())


In [99]:
model.fit(newsgroups_train.data, newsgroups_train.target)

Pipeline(steps=[('tfidfvectorizer', TfidfVectorizer()),
                ('multinomialnb', MultinomialNB())])


In [102]:
y_pred = model.predict(newsgroups_test.data)
from sklearn.metrics import classification_report, accuracy_score
print(f"Accuracy: {accuracy_score(newsgroups_test.target, y_pred):.4f}")
print(classification_report(newsgroups_test.target, y_pred, target_names=newsgroups_test.target_names))


Accuracy: 0.6062
                          precision    recall  f1-score   support

             alt.atheism       0.81      0.07      0.13       319
           comp.graphics       0.72      0.62      0.67       389
 comp.os.ms-windows.misc       0.70      0.50      0.59       394
comp.sys.ibm.pc.hardware       0.55      0.75      0.64       392
   comp.sys.mac.hardware       0.81      0.61      0.69       385
          comp.windows.x       0.83      0.74      0.78       395
            misc.forsale       0.86      0.69      0.77       390
               rec.autos       0.82      0.68      0.74       396
         rec.motorcycles       0.89      0.63      0.73       398
      rec.sport.baseball       0.95      0.69      0.80       397
        rec.sport.hockey       0.59      0.90      0.71       399
               sci.crypt       0.47      0.80      0.59       396
         sci.electronics       0.77      0.43      0.55       393
                 sci.med       0.86      0.63      0.73   

27. Write a Python program to train an SVM Classifier with different C values and compare the decision
boundaries visually.

In [105]:
from sklearn.datasets import load_iris
df = load_iris()

In [106]:
X = df.data
y=df.target

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3, random_state=1)

In [114]:
param = {'C':[1,2,3,10,20,50,100,120],
         'gamma':[1,0.1,0.2,0.02,0.03,0.001,0.003],
         'kernel':['linear','poly','rbf','sigmoid']}

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
grid = GridSearchCV(SVC(), param_grid=param, cv=5,verbose=3)

In [115]:
grid.fit(X_train,y_train)

Fitting 5 folds for each of 224 candidates, totalling 1120 fits
[CV 1/5] END .......C=1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END .......C=1, gamma=1, kernel=linear;, score=0.952 total time=   0.0s
[CV 3/5] END .......C=1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 4/5] END .......C=1, gamma=1, kernel=linear;, score=0.952 total time=   0.0s
[CV 5/5] END .......C=1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 1/5] END .........C=1, gamma=1, kernel=poly;, score=0.905 total time=   0.0s
[CV 2/5] END .........C=1, gamma=1, kernel=poly;, score=0.952 total time=   0.0s
[CV 3/5] END .........C=1, gamma=1, kernel=poly;, score=0.952 total time=   0.0s
[CV 4/5] END .........C=1, gamma=1, kernel=poly;, score=0.905 total time=   0.0s
[CV 5/5] END .........C=1, gamma=1, kernel=poly;, score=0.810 total time=   0.0s
[CV 1/5] END ..........C=1, gamma=1, kernel=rbf;, score=1.000 total time=   0.0s
[CV 2/5] END ..........C=1, gamma=1, kernel=r

GridSearchCV(cv=5, estimator=SVC(),
             param_grid={'C': [1, 2, 3, 10, 20, 50, 100, 120],
                         'gamma': [1, 0.1, 0.2, 0.02, 0.03, 0.001, 0.003],
                         'kernel': ['linear', 'poly', 'rbf', 'sigmoid']},
             verbose=3)


In [118]:
grid.best_params_

{'C': 1, 'gamma': 1, 'kernel': 'linear'}

In [119]:
grid.best_score_

np.float64(0.980952380952381)
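
The grid search above picks C by cross-validation. For the visual comparison the question asks about, one possible sketch is to fit the same kernel at a few C values on two Iris features and draw the decision regions side by side. The choice of sepal features, the RBF kernel, the C values, the names `X2`, `y2`, `n_support`, and the output filename are all assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X2 = iris.data[:, :2]    # sepal length and width (two features so the plot is 2-D)
y2 = iris.target

xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 0.5, X2[:, 0].max() + 0.5, 200),
    np.linspace(X2[:, 1].min() - 0.5, X2[:, 1].max() + 0.5, 200),
)
grid_points = np.c_[xx.ravel(), yy.ravel()]

C_values = [0.01, 1, 100]
n_support = {}           # total number of support vectors per C value
fig, axes = plt.subplots(1, len(C_values), figsize=(12, 4), sharey=True)
for ax, C in zip(axes, C_values):
    model = SVC(kernel="rbf", C=C, gamma="scale").fit(X2, y2)
    Z = model.predict(grid_points).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3)                          # decision regions
    ax.scatter(X2[:, 0], X2[:, 1], c=y2, edgecolors="k", s=15)
    n_support[C] = int(model.n_support_.sum())
    ax.set_title(f"C={C}, support vectors={n_support[C]}")
fig.savefig("svc_c_comparison.png")  # hypothetical output filename
```

Lower C tolerates more margin violations, so more points end up as support vectors and the regions look smoother; higher C bends the boundary to fit individual training points.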

28. Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with
binary features.

In [121]:
X = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1]
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

In [122]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

from sklearn.naive_bayes import BernoulliNB
model = BernoulliNB()
model.fit(X_train, y_train)

BernoulliNB()


In [125]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)


print(f"Accuracy of the Bernoulli Naive Bayes classifier: {accuracy:.4}")

Accuracy of the Bernoulli Naive Bayes classifier: 0.6667


29. Write a Python program to apply feature scaling before training an SVM model and compare results with
unscaled data.

In [126]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

svm_unscaled = SVC()
svm_unscaled.fit(X_train, y_train)

y_pred_unscaled = svm_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)
print(f"Accuracy with unscaled data: {accuracy_unscaled:.4f}")

Accuracy with unscaled data: 1.0000


In [128]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm_scaled = SVC()
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)
print(f"Accuracy with scaled data: {accuracy_scaled:.4f}")

Accuracy with scaled data: 1.0000
