Question 1: What is Information Gain, and how is it used in Decision Trees?

---

Answer:

Information Gain measures how much uncertainty (entropy) is reduced after splitting a dataset based on a particular feature.

In simple terms:

It tells us which feature gives the most “information” about the target variable.

Entropy (Measure of Impurity)

Before understanding Information Gain, we need Entropy, which measures how mixed the data is.

$$\text{Entropy}(S) = -\sum_{i} p_i \log_2 p_i$$

where $p_i$ is the proportion of samples in S that belong to class i.

Entropy = 0 → perfectly pure

Entropy = 1 → maximally impure (binary classification with an even 50/50 class split)
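The two boundary cases above can be checked with a minimal sketch. The helper name `entropy` and the toy label lists are illustrative assumptions, not part of the original answer.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the classes that actually occur
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

pure = ["yes"] * 6                  # hypothetical node with a single class
mixed = ["yes"] * 3 + ["no"] * 3    # hypothetical node with a 50/50 split

print(entropy(pure))   # perfectly pure node
print(entropy(mixed))  # evenly mixed binary node
```

With more than two classes the maximum entropy exceeds 1 (it is log2 of the number of classes), which is why the upper bound of 1 is stated for the binary case only.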

Information Gain Formula

$$IG(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \, \text{Entropy}(S_v)$$

Where:

S = original dataset
A = feature used for splitting
S_v = subset of S where feature A has value v
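As a worked illustration of the formula, the sketch below computes Information Gain from class counts. The counts (9 positive and 5 negative samples, split into three subsets) are hypothetical numbers chosen for the example, and `entropy_from_counts` is an illustrative helper name.

```python
import math

def entropy_from_counts(counts):
    """Entropy (in bits) of a node described by its per-class sample counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical parent node: 9 "yes" and 5 "no" samples
parent = [9, 5]

# Hypothetical subsets S_v after splitting on a feature with three values
subsets = {"sunny": [2, 3], "overcast": [4, 0], "rainy": [3, 2]}

n = sum(parent)
# Size-weighted average of the subset entropies
weighted = sum(sum(c) / n * entropy_from_counts(c) for c in subsets.values())
info_gain = entropy_from_counts(parent) - weighted

print(f"Entropy(S)      = {entropy_from_counts(parent):.3f}")
print(f"Weighted subset = {weighted:.3f}")
print(f"IG(S, A)        = {info_gain:.3f}")
```

Each subset's entropy is weighted by its share of the samples, so a large impure subset costs more than a small one.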

How Information Gain is Used in Decision Trees

1. Calculate the entropy of the entire dataset.

2. For each feature:

   - Split the dataset based on the feature's values.

   - Calculate the entropy of each resulting subset.

   - Compute the Information Gain of the split.

3. Select the feature with the highest Information Gain.

4. Repeat the process recursively for each child node.

This approach is used directly in the ID3 algorithm; C4.5 uses a normalized variant of Information Gain called gain ratio.

Example (Conceptual)

If splitting on “Weather” reduces uncertainty more than splitting on “Temperature”, then “Weather” will be chosen as the root node.
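The selection step can be sketched on a tiny dataset. The six rows below are made up for illustration, and the helper names `entropy` and `information_gain` are assumptions rather than functions from any library.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Parent entropy minus the size-weighted entropy of each value's subset."""
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy dataset: two candidate features and one target
features = {
    "Weather": ["sunny", "sunny", "rainy", "rainy", "overcast", "overcast"],
    "Temperature": ["hot", "mild", "hot", "mild", "hot", "mild"],
}
play = ["no", "no", "yes", "yes", "yes", "yes"]

gains = {name: information_gain(values, play) for name, values in features.items()}
best_feature = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: IG = {gain:.3f}")
print("Root split:", best_feature)
```

In this toy data every Weather value maps to a single class, so splitting on it removes all uncertainty, while each Temperature value keeps the same class mix as the parent and gains nothing.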


---



---



Question 2: What is the difference between Gini Impurity and Entropy?
Hint: Directly compares the two main impurity measures, highlighting strengths,
weaknesses, and appropriate use cases.

---

Answer:
Gini Impurity measures the probability that a randomly selected data point would be misclassified based on the node’s class distribution.

Entropy measures the amount of uncertainty or randomness in the data using concepts from information theory.

Gini Impurity is slightly cheaper to compute because it only squares and sums the class probabilities.

Entropy is slightly more expensive because it requires a logarithm for each class probability.

Gini Impurity is less sensitive to small changes in class probabilities.

Entropy is more sensitive to changes, especially when nodes are near pure.

Gini Impurity is the criterion used by the CART algorithm and is the default in scikit-learn's DecisionTreeClassifier.

Entropy is commonly used in ID3 and C4.5 decision tree algorithms.

Gini Impurity is preferred for large datasets where speed is important.

Entropy is preferred when theoretical interpretability and balanced splits are more important.
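To make the comparison concrete, the sketch below evaluates both measures on a few binary class distributions, using the standard definitions Gini = 1 - Σ p_i² and Entropy = -Σ p_i log2 p_i. The function names and the chosen probabilities are illustrative.

```python
import math

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Binary nodes ranging from evenly mixed to perfectly pure
for p in (0.5, 0.7, 0.9, 1.0):
    dist = (p, 1 - p)
    print(f"p={p:.1f}  gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")
```

Both measures peak at the 50/50 split and fall to zero for a pure node; for two classes Gini tops out at 0.5 while entropy tops out at 1.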

Practical Insight

In practice, both often produce very similar trees, and performance differences are usually negligible. Choice often depends on:

Algorithm used

Dataset size

Computational constraints
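One way to check how much the criterion matters on a given dataset is to cross-validate both settings. This is a minimal sketch on the Iris dataset; the dataset and the 5-fold setup are assumptions made for illustration, and results on other data may differ.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=42)
    # Mean accuracy over 5 cross-validation folds
    scores[criterion] = cross_val_score(tree, X, y, cv=5).mean()
    print(f"{criterion}: mean CV accuracy = {scores[criterion]:.3f}")
```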


Appropriate Use Cases

| Scenario | Preferred Measure |
|---|---|
| Large datasets / speed critical | Gini Impurity |
| Need for interpretability / theory | Entropy |
| Standard ML libraries | Gini Impurity |
| Academic or conceptual models | Entropy |



---



---



Question 3: What is Pre-Pruning in Decision Trees?

---

Answer:
Pre-pruning in Decision Trees is a technique used to stop the tree from growing too deep during training in order to prevent overfitting.

It works by setting stopping conditions in advance, so the tree does not split a node if certain criteria are not met. Instead of growing the full tree and cutting it back later, pre-pruning controls complexity while the tree is being built.

Common pre-pruning criteria include:

Limiting the maximum depth of the tree

Setting a minimum number of samples required to split a node

Setting a minimum number of samples required in a leaf node

Requiring a minimum information gain or impurity decrease for a split

The main advantage of pre-pruning is that it reduces overfitting, training time, and model complexity. However, if the stopping rules are too strict, it can lead to underfitting, where the model fails to capture important patterns in the data.
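In scikit-learn, the criteria listed above correspond to constructor parameters of DecisionTreeClassifier. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris dataset; the specific threshold values are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: stopping rules are fixed before training
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # minimum samples required to split a node
    min_samples_leaf=5,          # minimum samples required in a leaf
    min_impurity_decrease=0.01,  # minimum impurity decrease for a split
    random_state=42,
).fit(X, y)

print("Full tree:   depth =", full_tree.get_depth(), " leaves =", full_tree.get_n_leaves())
print("Pruned tree: depth =", pruned_tree.get_depth(), " leaves =", pruned_tree.get_n_leaves())
```

In practice these thresholds are usually tuned with cross-validation, since values that are too strict cause the underfitting described above.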



---



---



Question 4: Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances (practical).
Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_.
(Include your Python code and output in the code box below.)

---



In [1]:
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create and train the Decision Tree model using Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Display feature importances with feature names
feature_importance_df = pd.DataFrame({
    'Feature': iris.feature_names,
    'Importance': importances
})

print(feature_importance_df)


             Feature  Importance
0  sepal length (cm)    0.013333
1   sepal width (cm)    0.000000
2  petal length (cm)    0.564056
3   petal width (cm)    0.422611


Question 5: What is a Support Vector Machine (SVM)?

---

Answer:
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression.

It works by finding an optimal decision boundary called a hyperplane.

The goal of SVM is to maximize the margin between different classes.

The margin is the distance between the hyperplane and the closest data points.

These closest data points are known as support vectors.

Maximizing the margin helps improve model generalization.

SVM can handle linearly separable data.

It can also handle non-linearly separable data using kernel functions.

Common kernels include linear, polynomial, and radial basis function (RBF).

SVM often performs well in high-dimensional feature spaces, and maximizing the margin (together with a suitable regularization parameter C) helps limit overfitting.

Example:

Suppose we want to classify emails as Spam or Not Spam.

Each email is represented using features such as word frequency and email length.

SVM finds a line (in 2D) or a hyperplane (in higher dimensions) that best separates spam and non-spam emails.

The emails closest to the separating line become the support vectors.

By maximizing the distance from these emails, SVM creates a robust classifier.
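A minimal sketch of this idea with scikit-learn's SVC is shown below. The six 2D points stand in for two hypothetical email features and are invented for illustration; only the points nearest the boundary end up as support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2D feature vectors (e.g. two scaled email features)
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # class 0: not spam
    [6.0, 5.0], [7.0, 7.0], [6.5, 6.0],   # class 1: spam
])
y = np.array([0, 0, 0, 1, 1, 1])

# Linear kernel: the decision boundary is a straight line in 2D
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("Support vectors:\n", clf.support_vectors_)
print("Predictions:", clf.predict([[2.0, 2.0], [6.0, 6.0]]))
```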


---



---



Question 6: What is the Kernel Trick in SVM?

---
Answer:
The Kernel Trick is a technique used in Support Vector Machines to handle non-linearly separable data.

It lets SVM behave as if the data had been mapped into a higher-dimensional space where a linear separation is possible.

Instead of explicitly computing this transformation, the kernel trick computes inner products directly.

This makes computation efficient even in very high or infinite dimensions.

A kernel function measures similarity between pairs of data points.

Using kernels, SVM can create non-linear decision boundaries in the original space.

Common kernel functions include Linear, Polynomial, and Radial Basis Function (RBF).

The kernel trick enables SVM to solve complex classification problems efficiently.
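The equivalence between a kernel and an inner product in a higher-dimensional space can be verified numerically. The sketch below uses the degree-2 polynomial kernel K(x, z) = (x · z)² on 2D inputs, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The function names and sample vectors are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Same quantity computed in the original 2D space, without mapping."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # inner product in the 3D feature space
implicit = poly_kernel(x, z)        # kernel evaluated directly in 2D

print("Explicit mapping:", explicit)
print("Kernel trick:    ", implicit)
```

Both routes give the same value, but the kernel never builds the 3D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so only the kernel route is practical.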


---



---




Question 7: Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.
Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting
on the same dataset.
(Include your Python code and output in the code box below.)

---



In [2]:
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Print accuracies
print(f"Accuracy of SVM with Linear Kernel: {accuracy_linear:.2f}")
print(f"Accuracy of SVM with RBF Kernel: {accuracy_rbf:.2f}")


Accuracy of SVM with Linear Kernel: 0.98
Accuracy of SVM with RBF Kernel: 0.76
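A plausible explanation for the lower RBF score is feature scale: the RBF kernel depends on distances between samples, and the Wine features span very different numeric ranges. The sketch below tests that assumption by standardizing the features inside a pipeline; it is an optional extension of the exercise, and no specific accuracy is claimed here.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# RBF SVM on raw features, as in the program above
rbf_raw = SVC(kernel="rbf").fit(X_train, y_train)

# RBF SVM with standardization; the scaler is fit on the training split only
rbf_scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

acc_raw = rbf_raw.score(X_test, y_test)
acc_scaled = rbf_scaled.score(X_test, y_test)

print(f"RBF without scaling: {acc_raw:.2f}")
print(f"RBF with scaling:    {acc_scaled:.2f}")
```

Putting the scaler in the pipeline keeps test-set statistics out of training, which avoids leaking information into the model.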


Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?


---

Answer:
The Naïve Bayes classifier is a probabilistic machine learning algorithm used for classification tasks.

It is based on Bayes’ Theorem, which calculates the probability of a class given the features.

The formula is:

$$P(\text{Class} \mid \text{Features}) = \frac{P(\text{Features} \mid \text{Class}) \, P(\text{Class})}{P(\text{Features})}$$

It is called “Naïve” because it assumes that all features are conditionally independent of each other given the class, even though in real-world data this is rarely true.

This “naïve” assumption simplifies computation and allows the model to work efficiently on high-dimensional data.

Example:
Suppose we want to classify an email as Spam or Not Spam.

Features could include words like “offer”, “win”, or “free”.

Naïve Bayes calculates the probability of an email being spam based on the presence of these words, assuming each word contributes independently to the final decision.

Despite the independence assumption, Naïve Bayes often performs surprisingly well in text classification, email filtering, and sentiment analysis.
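The spam example can be sketched with scikit-learn by combining a word-count vectorizer with a Naïve Bayes model. The four training emails and their labels are invented for illustration, and MultinomialNB is one reasonable choice for word counts, not the only one.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training emails and labels
emails = [
    "win free offer now",
    "free prize win offer",
    "meeting schedule tomorrow",
    "project report attached for meeting",
]
labels = ["spam", "spam", "ham", "ham"]

# Word counts feed a Naive Bayes model that treats each word independently
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(emails, labels)

new_emails = ["free prize offer", "project meeting tomorrow"]
predictions = model.predict(new_emails)

for text, label in zip(new_emails, predictions):
    print(f"{text!r} -> {label}")
```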


---



---



Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
Bayes, and Bernoulli Naïve Bayes


---
Answer:
***Gaussian Naïve Bayes:***

Assumes that the features follow a normal (Gaussian) distribution.

Used for continuous numerical data.

Calculates probabilities using the mean and variance of each feature.

Example:

Predicting whether a person has diabetes based on continuous features like blood sugar or BMI.

***Multinomial Naïve Bayes:***

Designed for discrete count data, such as word counts in text classification.

Assumes features represent the frequency or occurrence of events.

Commonly used in document classification or spam detection.

Example:

Classifying emails as spam based on the number of times certain words appear.

***Bernoulli Naïve Bayes:***

Works with binary/boolean features, representing presence or absence.

Each feature is either 0 (absent) or 1 (present).

Often used in text classification with binary word occurrence.

Example:

Determining if an email is spam based on whether certain keywords are present or not, ignoring frequency.
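The three variants map to three scikit-learn classes, each matched to a feature type. In the sketch below all feature matrices are small made-up arrays chosen only to show which kind of input each model expects.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements (hypothetical values) -> GaussianNB
X_continuous = np.array([[5.1, 22.0], [4.9, 23.5], [7.8, 31.0], [8.2, 33.4]])

# Non-negative counts, e.g. word frequencies -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Presence/absence flags derived from the counts -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

gnb = GaussianNB().fit(X_continuous, y)
mnb = MultinomialNB().fit(X_counts, y)
bnb = BernoulliNB().fit(X_binary, y)

print("GaussianNB:   ", gnb.predict(X_continuous))
print("MultinomialNB:", mnb.predict(X_counts))
print("BernoulliNB:  ", bnb.predict(X_binary))
```

The same count matrix can feed either MultinomialNB (keeping frequencies) or BernoulliNB (after thresholding to 0/1), which mirrors the frequency versus presence distinction described above.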


---



---



Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.
Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from
sklearn.datasets.
(Include your Python code and output in the code box below.)

---



In [3]:
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train the Gaussian Naïve Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Gaussian Naïve Bayes classifier: {accuracy:.2f}")


Accuracy of Gaussian Naïve Bayes classifier: 0.94
