# **Decision Tree**

Question 1: What is Information Gain, and how is it used in Decision Trees?

Answer


Information Gain is a metric used to train decision trees by measuring the reduction in "entropy" or randomness after a dataset is split on a specific attribute. It helps the algorithm decide which feature should be placed at a node to best separate the data into distinct classes.

The process follows these steps:

1. Calculate Entropy: Measure the impurity of the current dataset.

2. Split Data: Temporarily split the data based on a feature.

3. Calculate Weighted Entropy: Find the weighted average entropy of the resulting branches, weighting each branch by its share of the samples.

4. Subtract: Information Gain = (Original Entropy) - (Weighted Entropy of the split). The attribute with the highest Information Gain is selected as the splitting node.
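The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than how any particular library implements it; the helper names `entropy` and `information_gain` and the toy arrays are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(-p * np.log2(p)))

def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of each branch."""
    labels = np.asarray(labels)
    feature_values = np.asarray(feature_values)
    weighted = 0.0
    for value in np.unique(feature_values):
        branch = labels[feature_values == value]          # step 2: split
        weighted += len(branch) / len(labels) * entropy(branch)  # step 3
    return entropy(labels) - weighted                     # step 4

# Hypothetical toy data: 8 samples, two balanced classes.
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])
separating = np.array(["a", "a", "b", "b", "a", "a", "b", "b"])
uninformative = np.array(["a", "b", "a", "b", "a", "b", "a", "b"])

print("Gain (separating feature):   ", information_gain(y, separating))
print("Gain (uninformative feature):", information_gain(y, uninformative))
```

The feature that separates the classes perfectly recovers the full parent entropy, while the feature whose branches are as mixed as the parent yields no gain, so the first one would be chosen for the split.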

Question 2: What is the difference between Gini Impurity and Entropy?

Answer

The main differences between Gini Impurity and Entropy are:

1. **Concept**: Gini Impurity measures the probability of mislabeling a randomly chosen element, while Entropy measures the amount of information or disorder (uncertainty) in the system.

2. **Formula Base**: Gini Impurity is one minus the sum of squared class probabilities, whereas Entropy is the negative sum of each class probability multiplied by its base-2 logarithm.

3. **Computational Cost**: Gini Impurity is computationally faster because squaring numbers is easier for a processor than calculating logarithms, which Entropy requires.

4. **Value Range**: For binary classification, Gini Impurity ranges from 0 to 0.5 and Entropy (base 2) ranges from 0 to 1. With K classes, the maxima are 1 - 1/K for Gini and log2(K) for Entropy.

5. **Maximum Impurity**: In the binary case, a Gini value of 0.5 indicates maximum impurity (a 50/50 class mix), while for Entropy, a value of 1.0 indicates maximum uncertainty.

6. **Sensitivity**: Entropy is slightly more sensitive to changes in the probability distribution and tends to penalize impurities more heavily than Gini.

7. **Tree Structure**: Gini Impurity tends to isolate the most frequent class in its own branch, whereas Entropy tends to create slightly more balanced trees.

8. **Algorithms**: Gini is the default criterion for the CART algorithm (used in standard Decision Trees), while entropy-based criteria are used in ID3 (Information Gain) and C4.5 (gain ratio).

9. **Practicality**: In most real-world use cases, both yield very similar model performance, so Gini is often preferred simply for its speed.
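To make the formulas and value ranges concrete, here is a small sketch that evaluates both measures on a few binary class distributions. The function names `gini_impurity` and `shannon_entropy` are illustrative, not part of any library.

```python
import math

def gini_impurity(probs):
    """1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def shannon_entropy(probs):
    """Negative sum of p * log2(p); zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Sweep the positive-class probability from pure (0.0) to a 50/50 mix (0.5).
for p in (0.0, 0.1, 0.3, 0.5):
    dist = (p, 1.0 - p)
    print(f"p={p:.1f}  gini={gini_impurity(dist):.3f}  entropy={shannon_entropy(dist):.3f}")
```

Both measures are zero for a pure node and peak at the 50/50 mix, where Gini reaches 0.5 and Entropy reaches 1.0.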

Question 3: What is Pre-Pruning in Decision Trees?

Answer

 **Pre-pruning**, also known as "early stopping," involves halting the growth of a decision tree before it perfectly classifies the training set. This is done to prevent overfitting.

Common pre-pruning techniques include:

1. Setting a maximum depth for the tree.

2. Setting a minimum number of samples required to split a node.

3. Defining a threshold for the minimum Information Gain required to continue splitting.
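In scikit-learn, these three techniques roughly correspond to the `max_depth`, `min_samples_split`, and `min_impurity_decrease` parameters of `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris data; the specific parameter values are arbitrary choices for illustration, and `min_impurity_decrease` is a weighted impurity-decrease threshold rather than a raw Information Gain value.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No limits: the tree grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned: growth stops early when any limit is hit.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # 1. maximum depth
    min_samples_split=10,        # 2. minimum samples needed to split a node
    min_impurity_decrease=0.01,  # 3. minimum impurity reduction per split
    random_state=42,
).fit(X, y)

print("Unrestricted: depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pre-pruned:   depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```

In practice these values would usually be tuned with cross-validation rather than fixed by hand.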

In [6]:
'''
Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances.
Answer
'''

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train Classifier with Gini Impurity
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X, y)

# Print Feature Importances
feature_importances = pd.Series(clf.feature_importances_, index=iris.feature_names)
print("Feature Importances:")
print(feature_importances.sort_values(ascending=False))

Feature Importances:
petal length (cm)    0.564056
petal width (cm)     0.422611
sepal length (cm)    0.013333
sepal width (cm)     0.000000
dtype: float64


# **Support Vector Machines (SVM)**

Question 5: What is a Support Vector Machine (SVM)?

Answer

A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression. Its primary goal is to find the optimal hyperplane in an N-dimensional space that maximizes the "margin" between data points of different classes. The points closest to the hyperplane that influence its position are called Support Vectors.
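As a small illustration of the hyperplane, margin, and support vectors, the sketch below fits a linear SVM to six hand-made 2-D points. The data and the large `C` value (used to approximate a hard margin) are assumptions chosen for this example.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C leaves almost no tolerance for margin violations.
clf = SVC(kernel="linear", C=1e3).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Hyperplane: w =", w, " b =", b)
print("Margin width:", margin_width)
print("Support vectors:\n", clf.support_vectors_)
```

Only the points nearest the boundary appear in `support_vectors_`; moving any of the other points (without crossing the margin) would leave the hyperplane unchanged.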

Question 6: What is the Kernel Trick in SVM?

Answer

The Kernel Trick is a mathematical technique that allows SVMs to solve non-linear problems by implicitly mapping the data into a higher-dimensional space where a linear separator (hyperplane) can be found. Instead of computing the transformed coordinates explicitly, which can be expensive, it uses "kernel functions" (like RBF or Polynomial) to calculate the inner products of the data in that high-dimensional space directly from the original features.
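The idea can be checked numerically for a simple case. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, for which an explicit feature map is known; the names `phi` and `poly_kernel` and the sample vectors are illustrative.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (3-D output)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return float(np.dot(a, b)) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = float(np.dot(phi(a), phi(b)))  # transform first, then inner product
implicit = poly_kernel(a, b)              # kernel on the original 2-D vectors

print("Inner product in feature space:", explicit)
print("Kernel on original inputs:     ", implicit)
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel route is the only practical one.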

In [7]:
'''
Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
Answer
'''
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3, random_state=42)

# Linear Kernel
linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train, y_train)
linear_acc = accuracy_score(y_test, linear_svm.predict(X_test))

# RBF Kernel
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X_train, y_train)
rbf_acc = accuracy_score(y_test, rbf_svm.predict(X_test))

print(f"Linear Kernel Accuracy: {linear_acc:.4f}")
print(f"RBF Kernel Accuracy: {rbf_acc:.4f}")



Linear Kernel Accuracy: 0.9815
RBF Kernel Accuracy: 0.7593
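The gap between the two kernels may partly reflect feature scale rather than the kernels themselves: the RBF kernel depends on Euclidean distances, and the Wine features span very different numeric ranges. The sketch below, which is an addition and not part of the original comparison, assumes a `StandardScaler` step in a pipeline and reuses the same split settings.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42
)

# Standardize each feature (fit on the training split only), then apply RBF.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)
scaled_rbf_acc = scaled_rbf.score(X_test, y_test)

print(f"RBF Kernel Accuracy (standardized features): {scaled_rbf_acc:.4f}")
```

Putting the scaler inside the pipeline keeps test-set statistics out of training, which makes the comparison with the unscaled run fair.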


# **Naive Bayes**

Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

Answer

Naïve Bayes is a classification algorithm based on Bayes' Theorem. It is called "Naïve" because it makes the strong (and often unrealistic) assumption that all features are independent of each other given the class label. For example, in a fruit classifier, a fruit may be considered an apple if it is red, round, and 3 inches in diameter; Naïve Bayes assumes each of these features contributes independently to the probability, regardless of any correlations.
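The fruit example can be worked through by hand. In the sketch below, every probability is a made-up number chosen for illustration; the point is only that the class score is the prior multiplied by one likelihood per feature.

```python
# Hypothetical priors P(class) and per-feature likelihoods P(feature | class).
priors = {"apple": 0.5, "orange": 0.5}
likelihoods = {
    "apple":  {"red": 0.8, "round": 0.9, "about_3_inches": 0.7},
    "orange": {"red": 0.1, "round": 0.9, "about_3_inches": 0.6},
}

def posterior(observed):
    """Naive Bayes posterior: prior times the product of feature likelihoods."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed:
            # The "naive" step: each feature is multiplied in independently.
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())  # normalize so the posteriors sum to 1
    return {label: score / total for label, score in scores.items()}

result = posterior(["red", "round", "about_3_inches"])
for label, prob in result.items():
    print(f"P({label} | red, round, about 3 inches) = {prob:.3f}")
```

With these numbers the apple class dominates, driven almost entirely by the "red" likelihood, since the other two features are nearly as likely under either class.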

Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

Answer

These variants are chosen based on the distribution of the features:

- Gaussian NB: Used for continuous features that are assumed to follow a normal (Gaussian) distribution within each class (e.g., height, weight).

- Multinomial NB: Used for discrete counts (e.g., word counts in text classification).

- Bernoulli NB: Used for binary/boolean features (e.g., whether a word occurs in a document or not).
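The pairing of variant and feature type can be sketched with scikit-learn's three estimators. The tiny arrays below are hypothetical and only illustrate which kind of input each variant expects.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements (e.g., height in cm, weight in kg).
X_continuous = np.array([[150.0, 50.0], [155.0, 55.0],
                         [180.0, 80.0], [185.0, 85.0]])
# Word counts per document for a 3-word vocabulary.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Presence/absence of each word, derived from the counts.
X_binary = (X_counts > 0).astype(int)

models = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

for name, (model, X) in models.items():
    model.fit(X, y)
    print(f"{name:14s} predictions on training data: {model.predict(X)}")
```

Note that `BernoulliNB` also penalizes the absence of a feature, whereas `MultinomialNB` only accounts for features that occur, which is one reason the two can behave differently on short documents.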

In [8]:
'''
Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate its accuracy.

Answer
'''

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=42)

# Train GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Evaluate Accuracy
y_pred = gnb.predict(X_test)
print(f"Gaussian Naïve Bayes Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Gaussian Naïve Bayes Accuracy: 0.9415
