# Question 1: What is Information Gain, and how is it used in Decision Trees?

## Information Gain (IG)

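Information Gain is the metric a Decision Tree uses to decide which feature to split on at each node.
It measures how much uncertainty (impurity) is removed by splitting the dataset on a particular feature: the impurity of the parent node minus the size-weighted impurity of the resulting child nodes. At each node, the tree evaluates the candidate splits and chooses the one with the highest Information Gain.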
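
To make the definition concrete, the sketch below computes entropy-based Information Gain for a single split. The function names (`entropy`, `information_gain`) and the toy label arrays are illustrative assumptions, not part of any library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy data: 10 labels split into two groups by some feature
parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
left = np.array([1, 1, 1, 1, 0])
right = np.array([1, 0, 0, 0, 0])

print("Parent entropy:", entropy(parent))
print("Information gain:", information_gain(parent, [left, right]))
```

A split that leaves both children as mixed as the parent would give a gain of zero, while a split into perfectly pure children would give a gain equal to the parent entropy.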

# Question 2: What is the difference between Gini Impurity and Entropy?
## Hint: Directly compares the two main impurity measures, highlighting strengths, weaknesses, and appropriate use cases.

**Gini Impurity** and **Entropy** are the two most commonly used impurity measures for evaluating the quality of a split in decision tree algorithms. Both measure how mixed the classes in a node are, and both equal zero for a pure node, but they differ in interpretation and computation.

**Entropy** measures the uncertainty (randomness) of the class distribution and is computed as `-Σ p_i * log2(p_i)`. The logarithm makes it slightly more expensive to compute, and it is often described as more sensitive to changes in class probabilities.

**Gini Impurity** measures the probability that a randomly selected data point would be misclassified if it were labeled at random according to the class distribution of the node, and is computed as `1 - Σ p_i^2`. It needs only squared probabilities, so it is simpler and faster to compute. For this reason Gini Impurity is widely used in practice, especially on large datasets, and it is the default criterion in CART-based decision trees.

Entropy can produce slightly purer splits in some cases, but in most real-world scenarios the two measures yield very similar trees, so the choice usually comes down to computational cost and implementation preference.
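
As a rough illustration of how the two measures behave, the following sketch evaluates both on a few two-class distributions. The helper names `gini_from_probs` and `entropy_from_probs` are hypothetical and exist only for this example.

```python
import numpy as np

def gini_from_probs(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    probs = np.asarray(probs, dtype=float)
    return float(1.0 - np.sum(probs ** 2))

def entropy_from_probs(probs):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping zero probabilities."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

# Compare both measures as a node moves from evenly mixed to pure
for p in (0.5, 0.7, 0.9, 1.0):
    dist = [p, 1 - p]
    print(f"p={p:.1f}  gini={gini_from_probs(dist):.3f}  "
          f"entropy={entropy_from_probs(dist):.3f}")
```

Both measures peak for an evenly mixed node and fall to zero for a pure one; for two classes the maximum is 0.5 for Gini and 1.0 for entropy (in bits).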


# Question 3: What is Pre-Pruning in Decision Trees?

## Pre-Pruning in Decision Trees

Pre-Pruning (also called early stopping) is a technique used in Decision Trees to stop the growth of the tree before it becomes too complex. Instead of allowing the tree to grow fully and then trimming it later, pre-pruning halts further splitting of a node when certain conditions are met, such as when the information gain is too small, the node contains very few samples, or a maximum tree depth has been reached.
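
In scikit-learn, these stopping conditions correspond to constructor parameters of `DecisionTreeClassifier`. The sketch below compares a fully grown tree with a pre-pruned one on the Iris dataset; the specific threshold values are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree: no stopping conditions
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: each parameter is one early-stopping condition
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # stop at a maximum depth
    min_samples_split=10,        # do not split nodes with fewer than 10 samples
    min_impurity_decrease=0.01,  # require a minimum impurity reduction per split
    random_state=42,
).fit(X, y)

print("Full tree depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("Pruned tree depth:", pruned_tree.get_depth(), "leaves:", pruned_tree.get_n_leaves())
```

In practice these thresholds are usually tuned with cross-validation rather than fixed by hand.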

# Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).
## Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_.
## (Include your Python code and output in the code box below.)

Below is a practical Python example that trains a Decision Tree Classifier using Gini Impurity and prints the feature importances.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
import pandas as pd

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train Decision Tree with Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

# Create a DataFrame for feature importances
feature_importances = pd.DataFrame({
    'Feature': iris.feature_names,
    'Importance': model.feature_importances_
})

# Print feature importances
print(feature_importances)
```

```text
             Feature  Importance
0  sepal length (cm)    0.013333
1   sepal width (cm)    0.000000
2  petal length (cm)    0.564056
3   petal width (cm)    0.422611
```


# Question 5: What is a Support Vector Machine (SVM)?

### **Support Vector Machine (SVM)**

A **Support Vector Machine (SVM)** is a **supervised machine learning algorithm** used for **classification and regression** tasks. Its main objective is to find an **optimal decision boundary (called a hyperplane)** that separates data points of different classes with the **maximum possible margin**. The data points that lie closest to this boundary are known as **support vectors**, and they play a critical role in defining the position of the hyperplane.
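
The role of the support vectors and the margin can be inspected directly on a fitted model. The sketch below uses a hypothetical six-point toy dataset and a large `C` value, chosen here only to approximate a hard-margin SVM on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters in 2-D
X = np.array([[1, 1], [1, 2], [2, 1], [4, 4], [4, 5], [5, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e3).fit(X, y)

w = clf.coef_[0]                      # normal vector of the hyperplane
b = clf.intercept_[0]                 # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin boundaries

print("Support vectors:\n", clf.support_vectors_)
print("Hyperplane: w =", w, "b =", b)
print("Margin width:", margin_width)
```

Only the points listed in `support_vectors_` determine the boundary; moving any other point (without crossing the margin) would leave the hyperplane unchanged.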

# Question 6: What is the Kernel Trick in SVM?

## Kernel Trick in Support Vector Machine (SVM)

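The Kernel Trick is a technique that lets Support Vector Machines (SVMs) handle data that is not linearly separable. When the classes cannot be separated by a straight line (or hyperplane) in the original feature space, the kernel trick lets the SVM behave as if the data had been mapped into a higher-dimensional space where a linear separation becomes possible, without ever computing that mapping explicitly. This works because the SVM only needs inner products between pairs of points, and a kernel function (for example, the polynomial or RBF kernel) returns those inner products directly from the original feature vectors, which keeps training efficient even when the implicit space is very high-dimensional.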
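
A small numerical check can make the idea tangible. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D vectors, chosen for illustration because its explicit feature map is short enough to write out; the function names `phi` and `poly_kernel` are hypothetical.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (maps into 3-D)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))  # inner product computed in the mapped space
implicit = poly_kernel(a, b)       # same value computed in the original space

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both computations give the same value, but the kernel never constructs the mapped vectors. For kernels such as RBF, the implicit space is infinite-dimensional, so the explicit route is not available at all.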

# Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
## Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting on the same dataset.
## (Include your Python code and output in the code box below.)

Below is a complete Python program that trains two SVM classifiers (linear and RBF kernels) on the Wine dataset and compares their accuracies.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
linear_accuracy = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print accuracies
print("Linear Kernel Accuracy:", linear_accuracy)
print("RBF Kernel Accuracy:", rbf_accuracy)
```

```text
Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
```
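
The lower RBF score in this run is plausibly an effect of feature scale rather than of the kernel itself: the RBF kernel depends on Euclidean distances, and the Wine features span very different numeric ranges. As a hedged follow-up, not part of the original exercise, the sketch below standardizes the features inside a pipeline before fitting the same RBF classifier. No output is shown because this variant was not part of the recorded run.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardize features (statistics come from the training split only),
# then fit the RBF SVM on the standardized data
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, scaled_rbf.predict(X_test))
print("RBF Kernel Accuracy (standardized features):", scaled_accuracy)
```

Using a pipeline keeps the scaler from seeing the test split, so the comparison with the unscaled run stays fair.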


# Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

### **Naïve Bayes Classifier**

The **Naïve Bayes classifier** is a **supervised probabilistic machine learning algorithm** based on **Bayes’ Theorem**, which is used mainly for **classification tasks** such as spam detection, sentiment analysis, and document classification. It predicts the class of a data point by calculating the **posterior probability** of each class given the input features and then choosing the class with the highest probability.

It is called **“Naïve”** because it makes a **strong simplifying assumption** that **all features are conditionally independent of each other given the class label**. In real-world data, this assumption is often not true—for example, in text classification, words are usually related—but surprisingly, Naïve Bayes still performs very well in many practical applications.

Despite its simplicity, Naïve Bayes is **fast, scalable, and effective with high-dimensional data**, especially in text-based problems. However, its performance can degrade when features are highly correlated.
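
The independence assumption is what makes the posterior easy to compute: the score for each class is its prior multiplied by the per-feature likelihoods. The sketch below works this through by hand for a toy spam filter; all probabilities are made-up values for illustration, and the `posterior` function is hypothetical.

```python
# Hypothetical, hand-picked probabilities for a two-word spam filter
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}

def posterior(words):
    """Return P(class | words) under the conditional-independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            score *= likelihoods[label][word]  # multiply per-word likelihoods
        scores[label] = score
    total = sum(scores.values())               # normalize so probabilities sum to 1
    return {label: score / total for label, score in scores.items()}

result = posterior(["free"])
print(result)
print("Predicted class:", max(result, key=result.get))
```

Real implementations typically sum log-probabilities instead of multiplying raw probabilities, which avoids numerical underflow when there are many features.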


# Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

### **Differences Between Gaussian, Multinomial, and Bernoulli Naïve Bayes**

Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes are three variants of the Naïve Bayes classifier that differ mainly in the **type of data they are designed to handle** and the **probability distribution they assume for features**.

**Gaussian Naïve Bayes** assumes that continuous features follow a normal (Gaussian) distribution, making it suitable for real-valued data such as height, weight, or sensor readings.

**Multinomial Naïve Bayes** is designed for discrete count-based data and is widely used in text classification problems like spam detection, where features represent word frequencies or term counts.

**Bernoulli Naïve Bayes** works with binary features and models whether a feature is present or absent, which is especially useful for text classification using binary word occurrence rather than counts.

In summary, Gaussian Naïve Bayes is best for continuous data, Multinomial Naïve Bayes is ideal for frequency-based discrete data, and Bernoulli Naïve Bayes is suitable when features are binary; the choice depends on the nature of the dataset and feature representation.
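
The pairing between data type and variant can be sketched with the three scikit-learn classes. The tiny arrays below are hypothetical toy data, constructed only to show which input representation each variant expects.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements (e.g. height in cm, weight in kg) -> GaussianNB
X_continuous = np.array([[150.0, 50.0], [155.0, 53.0], [180.0, 80.0], [185.0, 85.0]])
gaussian = GaussianNB().fit(X_continuous, y)

# Word counts per document -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
multinomial = MultinomialNB().fit(X_counts, y)

# Word presence/absence (1 if the word occurs at all) -> BernoulliNB
X_binary = (X_counts > 0).astype(int)
bernoulli = BernoulliNB().fit(X_binary, y)

print("GaussianNB:   ", gaussian.predict([[152.0, 51.0]]))
print("MultinomialNB:", multinomial.predict([[2, 0, 1]]))
print("BernoulliNB:  ", bernoulli.predict([[1, 0, 0]]))
```

The same text corpus can feed either `MultinomialNB` (as counts) or `BernoulliNB` (as presence flags); which works better depends on the data and is usually settled empirically.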


# Question 10: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate its accuracy.
### Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from sklearn.datasets.
### (Include your Python code and output in the code box below.)

Below is a complete Python program that trains a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluates its accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("Gaussian Naïve Bayes Accuracy:", accuracy)
```

```text
Gaussian Naïve Bayes Accuracy: 0.9415204678362573
```
