# Supervised Classification


1. What is Information Gain, and how is it used in Decision Trees?

- Information Gain is a metric used in decision trees to measure how well a feature splits the data into pure subsets. It is defined as the reduction in entropy after a dataset is split based on a feature. In other words, it quantifies how much "information" a feature gives about the class labels.

Decision trees use Information Gain to select the best attribute to split on at each node. The attribute with the highest Information Gain is chosen because it produces the most homogeneous child nodes, improving classification accuracy.
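
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not how any particular library implements it; the helper names `entropy` and `information_gain` and the toy `play`/`outlook`/`windy` arrays are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    labels = np.asarray(labels)
    feature_values = np.asarray(feature_values)
    weighted_children = 0.0
    for value in np.unique(feature_values):
        mask = feature_values == value
        # mask.mean() is the fraction of samples that fall into this child
        weighted_children += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted_children


# Hypothetical toy data: 'outlook' separates the classes, 'windy' does not.
play = np.array(["yes", "yes", "no", "no"])
outlook = np.array(["sunny", "sunny", "rainy", "rainy"])
windy = np.array([True, False, True, False])

ig_outlook = information_gain(play, outlook)
ig_windy = information_gain(play, windy)
print("IG(outlook):", ig_outlook)
print("IG(windy):", ig_windy)
```

A tree built with this criterion would split on `outlook` first, since it yields the larger gain.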




2. What is the difference between Gini Impurity and Entropy?

- Gini Impurity measures how often a randomly chosen sample would be misclassified if labels were assigned according to class proportions.

Formula: $Gini = 1 - \sum_i p_i^2$, where $p_i$ is the proportion of samples belonging to class $i$.


Faster to compute (no logarithms).

Often used in CART decision trees.

Entropy measures the amount of uncertainty or randomness in the data.

Formula: $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the proportion of samples belonging to class $i$.


Slower to compute but more theoretically grounded (from information theory).

Used in ID3, C4.5 trees.

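In simple terms: Gini is slightly cheaper to compute, while Entropy comes from information theory. Both are zero for a pure node and largest when the classes are evenly mixed, and in practice they usually lead to very similar trees.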
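
As a rough numerical check, both measures can be evaluated on a few class-proportion vectors. The functions `gini` and `entropy` below are hypothetical helpers written for this sketch, and the proportions are made-up examples.

```python
import numpy as np


def gini(p):
    """Gini impurity 1 - sum(p_i^2) for a vector of class proportions."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))


def entropy(p):
    """Entropy -sum(p_i * log2(p_i)) in bits; zero proportions are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log2(0) as 0
    return float(-np.sum(p * np.log2(p)))


# Pure node, mostly pure node, evenly mixed node (illustrative values)
for proportions in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(proportions, "gini:", gini(proportions), "entropy:", entropy(proportions))
```

Both functions grow as the node becomes more mixed; they differ in scale, with a two-class maximum of 0.5 for Gini and 1 bit for entropy.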

3. What is Pre-Pruning in Decision Trees?

- Pre-pruning (also called early stopping) is a technique used to stop a decision tree from growing too deep and overfitting the training data. The tree stops splitting a node based on certain conditions, such as:

Minimum number of samples required to split

Maximum depth of the tree

Minimum improvement in impurity

Minimum leaf size

Pre-pruning helps control model complexity and improves generalization to unseen data.
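
In scikit-learn, the stopping conditions listed above map onto constructor parameters of `DecisionTreeClassifier`. The sketch below compares an unconstrained tree with a pre-pruned one on Iris; the specific threshold values are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: each parameter is one of the stopping conditions
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # minimum samples required to split a node
    min_samples_leaf=5,          # minimum leaf size
    min_impurity_decrease=0.01,  # minimum improvement in impurity
    random_state=0,
).fit(X, y)

print("Full tree:   depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pruned tree: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```

The thresholds are normally tuned with cross-validation rather than fixed by hand.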


In [5]:
#4. Write a Python program to train a Decision Tree Classifier using Gini
#Impurity as the criterion and print the feature importances

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Train Decision Tree with Gini Impurity
clf = DecisionTreeClassifier(criterion='gini')
clf.fit(X, y)

# Print feature importances
for name, importance in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {importance}")


sepal length (cm): 0.026666666666666658
sepal width (cm): 0.0
petal length (cm): 0.05072262479871173
petal width (cm): 0.9226107085346216


5. What is a Support Vector Machine (SVM)?

- A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. SVM attempts to find the separating hyperplane that maximizes the margin between classes. The training points closest to the hyperplane, which lie on or inside the margin, are called support vectors, and they alone determine the decision boundary.

SVM works well in high-dimensional spaces and can handle both linear and non-linear decision boundaries using kernels.
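
To make the hyperplane, margin, and support vectors concrete, here is a small sketch on a hypothetical, linearly separable 2-D dataset. A very large `C` is used to approximate a hard-margin SVM; for a linear kernel the margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Large C approximates a hard margin (almost no slack allowed)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("w:", w, "b:", b)
print("Margin width:", margin_width)
print("Support vectors:")
print(clf.support_vectors_)
```

Only the points returned in `support_vectors_` influence the fitted boundary; moving any other point slightly would leave `w` and `b` unchanged.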

6. What is the Kernel Trick in SVM?

- The Kernel Trick is a technique that allows SVMs to learn non-linear decision boundaries by implicitly mapping data into a higher-dimensional space without computing the transformation explicitly.

Common kernels:

Linear

Polynomial

RBF (Gaussian)

Sigmoid

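Because the algorithm only needs kernel values between pairs of points, an SVM can fit complex non-linear boundaries without ever constructing the high-dimensional features.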
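
The idea of an implicit mapping can be checked by hand for a simple case. The sketch below uses the homogeneous degree-2 polynomial kernel `k(x, z) = (x · z)^2` in two dimensions, whose explicit feature map is `phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2)`. The function names and the sample vectors are illustrative only.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2-D input vector."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return float(np.dot(x, z) ** 2)


x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in the mapped space
implicit = poly2_kernel(x, z)             # same value, no mapping needed

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

The two numbers agree, which is the kernel trick in miniature: the kernel returns the inner product in the mapped space while only ever touching the original 2-D vectors.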

In [6]:
#7. Write a Python program to train two SVM classifiers with Linear and RBF
#kernels on the Wine dataset, then compare their accuracies.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
data = load_wine()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Linear SVM
clf_linear = SVC(kernel='linear')
clf_linear.fit(X_train, y_train)
acc_linear = accuracy_score(y_test, clf_linear.predict(X_test))

# RBF SVM
clf_rbf = SVC(kernel='rbf')
clf_rbf.fit(X_train, y_train)
acc_rbf = accuracy_score(y_test, clf_rbf.predict(X_test))

print("Linear Kernel Accuracy:", acc_linear)
print("RBF Kernel Accuracy:", acc_rbf)



Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
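
A likely reason for the weaker RBF result is that the Wine features are on very different scales, and the RBF kernel depends on Euclidean distances, so large-valued features dominate. The sketch below repeats the RBF experiment with standardized features using the same split; the pipeline is an addition for illustration and was not part of the original exercise.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize each feature (fit on the training split only), then fit the RBF SVM
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)

acc_rbf_scaled = scaled_rbf.score(X_test, y_test)
print("RBF Kernel Accuracy (scaled features):", acc_rbf_scaled)
```

Putting the scaler inside the pipeline keeps test-set statistics out of the training step.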


8.  What is the Naïve Bayes classifier, and why is it called "Naïve"?

- Naive Bayes is a probabilistic classification algorithm based on Bayes' Theorem, which predicts class membership probabilities based on feature likelihoods.

It is called "Naive" because it assumes that all features are conditionally independent of one another given the class, which is rarely true in real-world data. Despite this unrealistic assumption, Naive Bayes works surprisingly well in many applications like text classification and spam detection.
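
Written out, the independence assumption lets the class posterior factor into a product of per-feature likelihoods. The notation below ($y$ for the class, $x_1, \dots, x_n$ for the features) is chosen for this sketch:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Each factor $P(x_i \mid y)$ can be estimated separately from the training data, which is why the model is cheap to train even with many features.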

9.  Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
Bayes, and Bernoulli Naïve Bayes.

- The three Naïve Bayes models differ based on the type of data they are designed for:

1. Gaussian Naïve Bayes

Used for continuous numerical features.

Assumes that features follow a normal (Gaussian) distribution.

Suitable for datasets like iris measurements, medical data, sensor readings, etc.

2. Multinomial Naïve Bayes

Used for count-based features, such as word frequencies.

Commonly applied in text classification, spam filtering, NLP tasks.

Assumes feature values represent counts (non-negative integers).

3. Bernoulli Naïve Bayes

Used for binary features (0/1).

Each feature indicates presence/absence of a word or event.

Works well for text classification when using binary bag-of-words models.

Summary:

Gaussian NB → continuous data

Multinomial NB → count data

Bernoulli NB → binary data
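
The pairing of model and data type can be sketched with scikit-learn's three estimators. The data here is synthetic and made up for illustration (normal draws for continuous features, Poisson draws for counts, and a presence/absence version of the counts), and the scores are training accuracies, so they only show that each model accepts its intended input.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)

# Continuous features: two Gaussian clusters
X_continuous = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 3)),
    rng.normal(loc=3.0, scale=1.0, size=(50, 3)),
])

# Count features: e.g. word counts with class-dependent rates
X_counts = np.vstack([
    rng.poisson(lam=[6.0, 1.0, 0.1], size=(50, 3)),
    rng.poisson(lam=[0.1, 1.0, 6.0], size=(50, 3)),
])

# Binary features: presence (1) or absence (0) of each "word"
X_binary = (X_counts > 0).astype(int)

scores = {
    "GaussianNB (continuous)": GaussianNB().fit(X_continuous, y).score(X_continuous, y),
    "MultinomialNB (counts)": MultinomialNB().fit(X_counts, y).score(X_counts, y),
    "BernoulliNB (binary)": BernoulliNB().fit(X_binary, y).score(X_binary, y),
}
for name, score in scores.items():
    print(f"{name}: training accuracy = {score:.2f}")
```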

In [7]:
"""10.Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy."""

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train Gaussian Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on Breast Cancer Dataset:", accuracy)


Accuracy on Breast Cancer Dataset: 0.9415204678362573
