# **QUESTIONS**


#1. What is Information Gain, and how is it used in Decision Trees?

**Information Gain (IG)** is a metric used to measure the reduction in entropy (or disorder) achieved by splitting a dataset according to a specific feature.

In the context of **Decision Trees:**

- **Selection Criterion:** It is used as a criterion to decide which feature to split on at each node of the tree. The algorithm calculates the Information Gain for every possible feature and selects the one that yields the highest gain.

- **Process:**

  - Calculate the Entropy of the target variable for the parent node.

  - Calculate the **Weighted Average Entropy** of the child nodes produced by splitting on a specific feature. Each child's entropy is weighted by its fraction of the parent's samples.

  - **IG = Entropy(Parent) - Weighted Average Entropy(Children).**

- **Goal:** A high Information Gain implies that the split has successfully separated the classes, resulting in more "pure" (homogeneous) child nodes.
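
The three steps above can be sketched in a few lines of NumPy. The helper names `entropy` and `information_gain` and the toy "yes"/"no" labels are illustrative assumptions, not part of any library. Split A separates the classes perfectly. Split B leaves both children mixed, so it gains almost nothing.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, children):
    """IG = Entropy(parent) - weighted average entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical parent node: 5 "yes" and 5 "no" labels (maximum disorder)
parent = np.array(["yes"] * 5 + ["no"] * 5)

# Split A: perfectly pure children. Split B: both children still mixed 3/2.
split_a = [parent[:5], parent[5:]]
split_b = [np.array(["yes"] * 3 + ["no"] * 2), np.array(["yes"] * 2 + ["no"] * 3)]

print(f"Entropy(parent) = {entropy(parent):.3f}")                # 1.000
print(f"IG(split A)     = {information_gain(parent, split_a):.3f}")  # 1.000
print(f"IG(split B)     = {information_gain(parent, split_b):.3f}")  # 0.029
```

A tree builder would compute this score for every candidate split and keep the one with the largest value (split A here).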

#2. What is the difference between Gini Impurity and Entropy?

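Both are metrics used to measure the quality of a split in Decision Trees, but they calculate "impurity" differently. In the formulas, $p_i$ is the fraction of samples in the node that belong to class $i$, and $K$ is the number of classes.

| Feature | Gini Impurity | Entropy |
| :--- | :--- | :--- |
| **Definition** | Probability of misclassifying a randomly chosen sample if it were labeled at random according to the node's class distribution. | Uncertainty (disorder) of the node's class distribution, measured in bits. |
| **Formula** | $1 - \sum_i p_i^2$ | $- \sum_i p_i \log_2(p_i)$ |
| **Computation** | **Faster** (only squares and sums). | **Slower** (requires logarithms). |
| **Range** | $0$ to $1 - 1/K$ ($0.5$ for two classes) | $0$ to $\log_2 K$ ($1.0$ for two classes) |
| **Use Case** | Default criterion in scikit-learn's decision trees; a good choice when training speed matters on large datasets. | Basis of Information Gain (used by ID3 and C4.5). In practice, both criteria usually produce very similar trees. |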
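
A quick numeric comparison makes the formulas and ranges concrete. The class distributions below are illustrative assumptions. Note that with three balanced classes both measures exceed their two-class maxima of 0.5 and 1.0.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(p):
    """Entropy in bits: sum(p_i * log2(1/p_i)), treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

cases = {
    "pure [1, 0]": [1.0, 0.0],
    "skewed [0.9, 0.1]": [0.9, 0.1],
    "balanced [0.5, 0.5]": [0.5, 0.5],
    "3 classes, uniform": [1 / 3, 1 / 3, 1 / 3],
}
for name, dist in cases.items():
    print(f"{name:20s} gini={gini(dist):.3f} entropy={entropy(dist):.3f}")

# pure [1, 0]          gini=0.000 entropy=0.000
# skewed [0.9, 0.1]    gini=0.180 entropy=0.469
# balanced [0.5, 0.5]  gini=0.500 entropy=1.000
# 3 classes, uniform   gini=0.667 entropy=1.585
```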

#3. What is Pre-Pruning in Decision Trees?
**Pre-Pruning** (also known as "Early Stopping") is a technique used to prevent a Decision Tree from growing too complex and **overfitting** the training data.

Instead of letting the tree grow until every leaf is pure (which usually models noise), we halt the growth of the tree earlier based on specific conditions (hyperparameters). Common pre-pruning parameters include:

- **Max Depth:** Limiting the maximum number of levels in the tree.

- **Min Samples Split:** Requiring a minimum number of samples in a node to justify a new split.

- **Min Samples Leaf:** Ensuring that every leaf node has at least a certain number of samples.
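
In scikit-learn, these three controls map to the `max_depth`, `min_samples_split` and `min_samples_leaf` parameters of `DecisionTreeClassifier`. The sketch below compares a fully grown tree with a pre-pruned one on Iris. The specific values (3, 10, 5) are arbitrary illustrative choices, not recommended defaults; in practice they would be tuned, for example with cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# No pre-pruning: the tree keeps splitting until every leaf is pure
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned: growth halts early according to the three hyperparameters
pruned = DecisionTreeClassifier(
    max_depth=3,           # at most 3 levels of splits
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=42,
).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pruned)]:
    print(f"{name:10s} depth={model.get_depth()} leaves={model.get_n_leaves()} "
          f"test acc={model.score(X_test, y_test):.3f}")
```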

#4. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).

The following code trains a Decision Tree on the Iris dataset (a standard dataset for this task) using `criterion='gini'` and prints the importance of each feature.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# 1. Load the dataset (Iris)
data = load_iris()
X = data.data
y = data.target
feature_names = data.feature_names

# 2. Split the data (optional but good practice)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Initialize Decision Tree with Gini Impurity
clf = DecisionTreeClassifier(criterion='gini', random_state=42)

# 4. Train the model
clf.fit(X_train, y_train)

# 5. Get and print feature importances (normalized so they sum to 1)
importances = clf.feature_importances_

print("Feature Importances:")
for name, score in zip(feature_names, importances):
    print(f"{name}: {score:.4f}")
```

```
Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0167
petal length (cm): 0.9061
petal width (cm): 0.0772
```
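
scikit-learn describes `feature_importances_` as the normalized total reduction of the criterion contributed by each feature (the "Gini importance"). As a sketch, the same numbers can be rebuilt from the fitted tree's `tree_` arrays. Each internal node adds its weighted Gini decrease to the feature it splits on, and the totals are then normalized to sum to 1. The loop is an illustrative re-derivation, not the library's own implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)

t = clf.tree_
w = t.weighted_n_node_samples
manual = np.zeros(X_train.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity decrease
        continue
    # Weighted Gini decrease achieved by the split at this node
    decrease = (w[node] * t.impurity[node]
                - w[left] * t.impurity[left]
                - w[right] * t.impurity[right])
    manual[t.feature[node]] += decrease
manual /= manual.sum()  # normalize so the importances sum to 1

print(np.round(manual, 4))
print(np.allclose(manual, clf.feature_importances_))  # True
```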


#5. What is a Support Vector Machine (SVM)?

A **Support Vector Machine (SVM)** is a powerful supervised learning algorithm used for classification and regression tasks.

- **Core Concept:** The goal of SVM is to find the optimal **hyperplane** (a decision boundary) that best separates the data points of different classes.

- **Margin:** It chooses the hyperplane that has the **maximum margin**, which is the distance between the hyperplane and the nearest data points from either class.

- **Support Vectors:** These "nearest data points" that define the margin are called **Support Vectors.** They are the most critical elements of the dataset. Removing any of the other (non-support-vector) points leaves the boundary unchanged, whereas moving or removing a support vector generally changes it.
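
The sketch below illustrates these ideas on a tiny hand-picked, linearly separable dataset (the points are an assumption chosen so the answer can be worked out by hand). Class 0's closest points are (1, 0) and (0, 1), and class 1's closest point is (3, 3). The maximum-margin boundary should therefore be roughly $0.4x_1 + 0.4x_2 - 1.4 = 0$, with a margin width $2/\lVert w \rVert \approx 3.536$. A large `C` is used so that scikit-learn's soft-margin `SVC` behaves like a hard-margin SVM on this data. The script also refits without (0, 0), which is not a support vector, to check that the boundary does not move.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable data: three points per class
X = np.array([[0, 0], [1, 0], [0, 1],   # class 0
              [3, 3], [4, 3], [3, 4]],  # class 1
             dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Large C: no point is allowed to violate the margin, so this matches the hard-margin SVM
svm = SVC(kernel="linear", C=1000).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]
print("support vector indices:", svm.support_)
print("w =", np.round(w, 3), " b =", round(b, 3))
print("margin width 2/||w|| =", round(2 / np.linalg.norm(w), 3))

# Remove (0, 0), a non-support vector, and refit: the boundary should not change
keep = ~np.all(X == [0, 0], axis=1)
svm2 = SVC(kernel="linear", C=1000).fit(X[keep], y[keep])
same = (np.allclose(svm2.coef_, svm.coef_, atol=1e-2)
        and np.allclose(svm2.intercept_, svm.intercept_, atol=1e-2))
print("boundary unchanged after removing (0, 0):", same)
```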

#6. What is the Kernel Trick in SVM?

The **Kernel Trick** is a mathematical technique that allows SVM to solve **non-linear** classification problems.

- **Problem:** Standard SVM finds a linear boundary (a straight line or flat plane). Many real-world datasets are not linearly separable (e.g., concentric circles).

- **Solution:** Conceptually, a feature mapping $\phi$ projects the original data from a lower-dimensional space (e.g., 2D) into a **higher-dimensional space** (3D or more; for the RBF kernel, the space is infinite-dimensional).

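- **Result:** In this higher dimension, the complex, non-linear relationship often becomes **linearly separable.** The "trick" is that SVM only needs inner products between data points, and a kernel function $K(x, z) = \phi(x) \cdot \phi(z)$ computes that inner product directly from the original coordinates. The data is never explicitly transformed, which saves a large amount of computation (and makes infinite-dimensional mappings usable at all).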
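
A minimal worked example of the identity $K(x, z) = \phi(x) \cdot \phi(z)$, using the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2-D inputs. Its explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The two input vectors are arbitrary. Both routes give $(1 \cdot 3 + 2 \cdot 4)^2 = 121$, but the kernel never builds the 3-D vectors.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))  # map both points to 3-D, then take the dot product
implicit = poly_kernel(x, z)               # stay in 2-D; phi is never computed
print(explicit, implicit)  # both approximately 121 (up to floating-point rounding)
```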

#7. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

The following code trains two SVMs on the **Wine dataset** and compares their performance.

```python
from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# 2. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Train SVM with Linear Kernel
svc_linear = SVC(kernel='linear', random_state=42)
svc_linear.fit(X_train, y_train)
y_pred_linear = svc_linear.predict(X_test)
acc_linear = accuracy_score(y_test, y_pred_linear)

# 4. Train SVM with RBF (Radial Basis Function) Kernel
svc_rbf = SVC(kernel='rbf', random_state=42)
svc_rbf.fit(X_train, y_train)
y_pred_rbf = svc_rbf.predict(X_test)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# 5. Compare Accuracies
# Note: the features are not scaled here; RBF often requires scaled data to perform well.
print(f"Accuracy with Linear Kernel: {acc_linear:.4f}")
print(f"Accuracy with RBF Kernel:    {acc_rbf:.4f}")
```

```
Accuracy with Linear Kernel: 0.9815
Accuracy with RBF Kernel:    0.7593
```
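
Because the RBF kernel is based on distances between points, features with large numeric ranges can dominate it. A common remedy is to standardize the features before the SVM. The sketch below wraps `StandardScaler` and the RBF `SVC` in a scikit-learn pipeline, so the scaler is fit on the training split only. It reuses the same split as above. The exact accuracy it prints is not claimed here, only that it is typically much closer to the linear kernel's score.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# StandardScaler learns mean/std on X_train inside fit(), then applies them to X_test in predict()
rbf_scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
rbf_scaled.fit(X_train, y_train)
acc_rbf_scaled = accuracy_score(y_test, rbf_scaled.predict(X_test))
print(f"Accuracy with RBF Kernel (standardized features): {acc_rbf_scaled:.4f}")
```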


#8. What is the Naïve Bayes classifier, and why is it called "Naïve"?

- **Naïve Bayes** is a probabilistic classifier based on **Bayes' Theorem.** It predicts the probability that a given data point belongs to a particular class.

- It is called "naïve" because it makes a **strong (and often unrealistic) assumption of conditional independence** between features. Given the class label, it assumes that the presence of one feature (e.g., "Red" color) tells us nothing about the presence of any other feature (e.g., "Round" shape), so the likelihood factors into a product: $P(x_1, \dots, x_n \mid y) = \prod_i P(x_i \mid y)$. Despite this simplification, it often performs surprisingly well in real-world tasks such as spam filtering.
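
The factorization can be computed by hand on a tiny example. The fruit data below is hypothetical. Each feature's probability is estimated per class on its own, and the estimates are then multiplied together with the prior. Laplace (add-one) smoothing is an added assumption here; it keeps an unseen value from zeroing the whole product.

```python
from collections import Counter

# Hypothetical toy training set: (color, shape, label)
rows = [
    ("red", "round", "apple"),
    ("red", "round", "apple"),
    ("green", "round", "apple"),
    ("yellow", "long", "banana"),
    ("yellow", "long", "banana"),
    ("green", "long", "banana"),
]
N_FEATURES = 2
class_counts = Counter(row[-1] for row in rows)
# Number of distinct values each feature takes (needed for Laplace smoothing)
n_values = [len({row[i] for row in rows}) for i in range(N_FEATURES)]

def posterior(x, alpha=1.0):
    """P(class | x) under the naive independence assumption, with Laplace smoothing."""
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class)
        for i, value in enumerate(x):
            # P(feature_i = value | class), estimated ignoring all other features
            match = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= (match + alpha) / (n_label + alpha * n_values[i])
        scores[label] = score
    total = sum(scores.values())  # normalize so the posteriors sum to 1
    return {label: s / total for label, s in scores.items()}

print({k: round(v, 3) for k, v in posterior(("green", "round")).items()})
# {'apple': 0.8, 'banana': 0.2}
```

For a green, round fruit, the apple score is $0.5 \times \frac{2}{6} \times \frac{4}{5}$ and the banana score is $0.5 \times \frac{2}{6} \times \frac{1}{5}$, which normalize to 0.8 and 0.2.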

#9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

The difference lies in the assumption they make about the distribution of the data (features):

1. **Gaussian Naïve Bayes:**

    - **Data Type:** Used when features are **continuous** values (e.g., height, weight, temperature).

    - **Assumption:** It assumes that, within each class, each feature's values follow a **Normal (Gaussian) distribution** (bell curve), whose mean and variance are estimated from the training data.

2. **Multinomial Naïve Bayes:**

    - **Data Type:** Used for **discrete counts** (e.g., word counts in text classification).

    - **Assumption:** It models the data using a Multinomial distribution. It cares about the **frequency** of a feature (e.g., how many times the word "Win" appears in an email).

3. **Bernoulli Naïve Bayes:**

    - **Data Type:** Used for **binary/boolean** features (e.g., 0 or 1, Yes or No).

    - **Assumption:** It models the data using a Bernoulli distribution. It only cares about the **presence or absence** of a feature (e.g., does the word "Win" appear at all?), ignoring the frequency.
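
The three variants correspond to scikit-learn's `GaussianNB`, `MultinomialNB` and `BernoulliNB`. The sketch below uses hypothetical data. Continuous measurements go to the Gaussian model, and word counts for the words "win", "money" and "meeting" go to the other two. `BernoulliNB(binarize=0.0)` converts every count greater than 0 into 1. As a result, an email with "win" twice gets exactly the same probabilities as one with "win" nine times, which illustrates that it ignores frequency.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Gaussian NB: hypothetical continuous measurements, two features per sample
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],
                   [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]])
y_cont = np.array([0, 0, 0, 1, 1, 1])
gnb = GaussianNB().fit(X_cont, y_cont)  # learns a mean and variance per feature per class
print("Gaussian   :", gnb.predict([[1.1, 2.0]]))  # [0]

# Hypothetical word counts for ["win", "money", "meeting"]
X_counts = np.array([[3, 2, 0], [4, 1, 0], [2, 3, 0],   # spam emails
                     [0, 0, 2], [0, 1, 3], [1, 0, 2]])  # ham emails
y_mail = np.array([1, 1, 1, 0, 0, 0])                   # 1 = spam, 0 = ham
new_email = np.array([[2, 1, 0]])

mnb = MultinomialNB().fit(X_counts, y_mail)            # uses the counts themselves
bnb = BernoulliNB(binarize=0.0).fit(X_counts, y_mail)  # uses only presence (count > 0)
print("Multinomial:", mnb.predict(new_email))  # [1]
print("Bernoulli  :", bnb.predict(new_email))  # [1]

# Bernoulli NB sees [2, 1, 0] and [9, 9, 0] as the same binary vector [1, 1, 0]
print("Bernoulli ignores frequency:",
      np.allclose(bnb.predict_proba([[2, 1, 0]]), bnb.predict_proba([[9, 9, 0]])))  # True
```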

#10. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.

The following code trains a Gaussian Naïve Bayes model on the **Breast Cancer dataset** and evaluates its accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# 2. Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Initialize Gaussian Naive Bayes
gnb = GaussianNB()

# 4. Train the model
gnb.fit(X_train, y_train)

# 5. Predict and Evaluate
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Gaussian Naive Bayes Accuracy: {accuracy:.4f}")
```

```
Gaussian Naive Bayes Accuracy: 0.9415
```
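
A single train/test split yields one accuracy number that depends on which samples happened to land in the test set. As an optional extension, k-fold cross-validation with scikit-learn's `cross_val_score` averages the accuracy over several splits. The choice of 5 folds is an arbitrary assumption, and no specific scores are claimed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# 5-fold CV: train on 4/5 of the data, test on the remaining 1/5, repeated 5 times
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores.round(4))
print(f"Mean accuracy: {scores.mean():.4f} (std {scores.std():.4f})")
```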
