Question 1 : What is Information Gain, and how is it used in Decision Trees?

Answer.  Information Gain is a concept used in Decision Trees to select the best feature for splitting the data at each node.

- It measures how much the uncertainty, or impurity, of the dataset (usually measured by entropy) is reduced when the data is split on a particular feature. Before splitting, the classes are mixed; after splitting on a good feature, each resulting subset is purer. The size of this reduction is the Information Gain: the entropy of the parent node minus the weighted average entropy of the child nodes.

- In a Decision Tree, Information Gain is calculated for all features, and the feature with the highest Information Gain is chosen for the split. This process is repeated at each level of the tree until the data is classified or a stopping condition is reached.

In simple words:
Information Gain helps the decision tree decide which feature gives the most useful information for making correct decisions.
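The definition above can be checked on a toy node. The sketch below is a minimal illustration, assuming entropy in bits; the helper names `entropy` and `information_gain` and the "yes"/"no" labels are hypothetical, not part of any library. A split that separates the classes perfectly should gain the full 1 bit, while a split that leaves both children as mixed as the parent should gain nothing.

```python
import numpy as np

def entropy(labels):
    """Entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # sum of p * log2(1/p) is the same as -sum of p * log2(p)
    return float(np.sum(p * np.log2(1.0 / p)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 4 samples of class "yes" and 4 of class "no"
parent = np.array(["yes"] * 4 + ["no"] * 4)

# Split A separates the classes perfectly; split B leaves both children mixed
split_a = [np.array(["yes"] * 4), np.array(["no"] * 4)]
split_b = [np.array(["yes", "yes", "no", "no"]), np.array(["yes", "yes", "no", "no"])]

print("IG(split A):", information_gain(parent, split_a))  # 1.0
print("IG(split B):", information_gain(parent, split_b))  # 0.0
```

A decision tree would prefer split A here, because it has the higher Information Gain.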

Question 2 : What is the difference between Gini Impurity and Entropy?

Answer. Gini Impurity and Entropy are both measures used in Decision Trees to evaluate how “pure” or “impure” a dataset is, but they differ in calculation and behavior.

1. **Gini Impurity**

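Measures the probability that a randomly chosen data point would be misclassified if it were labeled randomly according to the class distribution in the node

Formula:

$$\text{Gini} = 1 - \sum_{i=1}^{C} p_i^2$$

where $p_i$ is the proportion of samples in the node that belong to class $i$, and $C$ is the number of classes.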


Value range:

0 → pure node (only one class)

Higher value → more impurity (the maximum is 0.5 for two equally frequent classes, and $1 - 1/C$ for $C$ equally frequent classes)

Used in: CART (Classification and Regression Trees)

Faster to compute, so commonly used in practice

2. **Entropy**

Measures the amount of uncertainty or randomness in the data

Formula:

$$\text{Entropy} = -\sum_{i=1}^{C} p_i \log_2 p_i$$

Value range:

0 → pure node

1 (for binary classification) → maximum impurity; with $C$ equally frequent classes the maximum is $\log_2 C$

Used in: ID3 and C4.5 algorithms

Slightly slower to compute because of the logarithm

3. **Key Differences**

| Aspect      | Gini Impurity                   | Entropy                              |
| ----------- | ------------------------------- | ------------------------------------ |
| Meaning     | Misclassification probability   | Degree of randomness                 |
| Formula     | $1 - \sum p_i^2$                | $-\sum p_i \log_2 p_i$               |
| Computation | Simpler, faster                 | Slower (uses log)                    |
| Used in     | CART                            | ID3, C4.5                            |
| Sensitivity | Less sensitive to class changes | More sensitive to class distribution |
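To make the comparison in the table concrete, here is a small sketch that evaluates both formulas on a few class distributions. The function names `gini` and `entropy` are illustrative helpers (not scikit-learn functions), and entropy is assumed to be measured in bits. For a two-class node, Gini peaks at 0.5 and entropy at 1.0 when the classes are split 50/50, and both are 0 for a pure node.

```python
import numpy as np

def gini(proportions):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(proportions, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(proportions):
    """Entropy in bits: -sum(p_i * log2(p_i)), treating 0 * log2(0) as 0."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0]  # drop zero probabilities to avoid log2(0)
    return float(np.sum(p * np.log2(1.0 / p)))

# Pure node, mostly pure node, and maximally mixed node (two classes)
for dist in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(dist, f"Gini={gini(dist):.3f}", f"Entropy={entropy(dist):.3f}")
```

Both measures rank these three nodes in the same order, which is one reason the two criteria often produce similar trees.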

4. **Summary**

- Both aim to find the best split in a decision tree

- Gini is computationally efficient

- Entropy gives a more theoretical measure of uncertainty

- In practice, both often produce similar trees

Question 3 : What is Pre-Pruning in Decision Trees?

Answer. **Pre-Pruning in Decision Trees** is a technique used to **stop the growth of the tree early** in order to avoid overfitting.

Instead of allowing the decision tree to grow completely, **pre-pruning sets rules in advance** to decide when to stop splitting a node. If a split does not significantly improve the model, it is not performed.

### Common Pre-Pruning Criteria:

* Maximum depth of the tree
* Minimum number of samples required to split a node
* Minimum information gain or Gini reduction
* Maximum number of leaf nodes

### Why Pre-Pruning Is Used:

* Prevents the model from becoming too complex
* Reduces overfitting
* Improves generalization on unseen data
* Saves computation time

**In simple words:**
Pre-pruning stops the decision tree from growing too deep, helping it remain simpler and more accurate on new data.
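The pre-pruning criteria listed above map onto constructor parameters of scikit-learn's `DecisionTreeClassifier` (`max_depth`, `min_samples_split`, `min_impurity_decrease`, `max_leaf_nodes`). The sketch below compares a fully grown tree with a pre-pruned one; the threshold values and the choice of the Breast Cancer dataset are arbitrary assumptions for illustration, not recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fully grown tree (no pre-pruning)
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early according to the rules below (illustrative values)
pruned_tree = DecisionTreeClassifier(
    max_depth=4,                 # maximum depth of the tree
    min_samples_split=10,        # a node needs at least 10 samples to be split
    min_impurity_decrease=0.01,  # skip splits whose weighted impurity decrease is below 0.01
    max_leaf_nodes=10,           # at most 10 leaf nodes
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("Full tree", full_tree), ("Pre-pruned tree", pruned_tree)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```

In practice these thresholds are usually tuned with cross-validation rather than fixed by hand.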



Question 4 : Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).

Answer. Here is a simple Python program that trains a Decision Tree Classifier using Gini Impurity and prints the feature importances.

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
import pandas as pd

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Create feature names
feature_names = data.feature_names

# Train Decision Tree Classifier using Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

# Print feature importances, sorted from most to least important
importances = pd.Series(model.feature_importances_, index=feature_names)
print("Feature Importances:")
print(importances.sort_values(ascending=False).round(4))


**Explanation:**

- criterion='gini' specifies the use of Gini Impurity

- fit() trains the decision tree model

- feature_importances_ gives each feature's share of the total (sample-weighted) Gini impurity reduction achieved by the splits that use it; the values sum to 1
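To see where these numbers come from, the sketch below recomputes the importances by hand from the fitted tree's low-level `tree_` attributes (`children_left`, `children_right`, `feature`, `impurity`, `weighted_n_node_samples`). It assumes the impurity-based ("mean decrease in impurity") definition that scikit-learn uses for `feature_importances_`, and checks the result against the built-in attribute.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
model = DecisionTreeClassifier(criterion="gini", random_state=42).fit(data.data, data.target)

tree = model.tree_
w = tree.weighted_n_node_samples  # (weighted) number of training samples reaching each node
importances = np.zeros(data.data.shape[1])

for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity reduction
        continue
    # Weighted Gini decrease produced by the split at this node
    decrease = (w[node] * tree.impurity[node]
                - w[left] * tree.impurity[left]
                - w[right] * tree.impurity[right])
    importances[tree.feature[node]] += decrease

importances /= importances.sum()  # normalise so the importances sum to 1

print(np.allclose(importances, model.feature_importances_))
```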

Question 5 : What is a Support Vector Machine (SVM)?

Answer. A **Support Vector Machine (SVM)** is a **supervised machine learning algorithm** used for **classification and regression** tasks.

SVM works by finding a **decision boundary (called a hyperplane)** that best separates data points of different classes. The goal is to choose the hyperplane that **maximizes the margin**, which is the distance between the hyperplane and the nearest data points from each class. These closest data points are called **support vectors**.

### Key Points:

* SVM focuses on **maximizing the margin** between classes
* Support vectors determine the position of the decision boundary
* Can handle **linear and non-linear data** using kernel functions (like linear, polynomial, RBF)
* Effective in **high-dimensional spaces**

**In simple words:**
SVM finds the boundary that separates the classes with the widest possible margin.
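A minimal sketch of the margin idea, assuming scikit-learn's `SVC` with a linear kernel and a tiny, hypothetical, linearly separable dataset. A very large `C` is used here to approximate a hard-margin SVM. For a linear SVM with weight vector $w$, the margin width is $2 / \lVert w \rVert$; in this toy data the closest points lie on the lines $x_1 + x_2 = 1$ and $x_1 + x_2 = 6$, so the margin should come out close to $5/\sqrt{2} \approx 3.54$.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: class 0 near the origin, class 1 near (3.5, 3.5)
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (almost no margin violations allowed)
svm = SVC(kernel="linear", C=1e6).fit(X, y)

w = svm.coef_[0]                # normal vector of the separating hyperplane w.x + b = 0
margin = 2 / np.linalg.norm(w)  # width of the margin between the two classes

print("Support vectors:\n", svm.support_vectors_)
print("Margin width:", round(margin, 3))
print("Predictions:", svm.predict([[0.5, 0.5], [3.5, 3.5]]))
```

Only the support vectors (the points closest to the boundary) determine `w`; moving any other point, without crossing the margin, would leave the boundary unchanged.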


Question 6 : What is the Kernel Trick in SVM?

Answer. The **Kernel Trick** in **Support Vector Machines (SVM)** is a technique that allows SVMs to **solve non-linear classification problems**.

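Normally, SVM works well when data is **linearly separable**. When the data is non-linear, the kernel trick lets SVM act as if the data had been **mapped into a higher-dimensional space** where a linear separation becomes possible, **without explicitly computing that transformation**: the kernel function returns the inner product of two points in that space directly from their original coordinates.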

### Key Points:

* Kernels compute the similarity between data points (an inner product in the higher-dimensional space)
* Avoids the cost of explicitly mapping every point into the high-dimensional space
* Makes SVM efficient for complex, non-linear patterns

### Common Kernel Functions:

* **Linear Kernel** – for linearly separable data
* **Polynomial Kernel** – for curved boundaries
* **RBF (Gaussian) Kernel** – most commonly used for non-linear data
* **Sigmoid Kernel** – based on the tanh function, similar to a neural-network activation

**In simple words:**
The kernel trick lets SVM draw **non-linear decision boundaries** as if it were working in a higher-dimensional space, without ever computing the coordinates of points in that space.
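The "without actually computing the transformation" claim can be verified numerically for a degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ in two dimensions, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below assumes this textbook feature map (the helper `phi` is illustrative) and also calls scikit-learn's `polynomial_kernel` with `gamma=1, coef0=0` so that it computes the same function.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(v):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Route 1: map both points into 3-D explicitly, then take the dot product
explicit = phi(x) @ phi(z)

# Route 2 (kernel trick): compute (x . z)^2 directly in the original 2-D space
kernel = (x @ z) ** 2

# scikit-learn's polynomial kernel (gamma * x.z + coef0)^degree with gamma=1, coef0=0, degree=2
sk_kernel = polynomial_kernel(x.reshape(1, -1), z.reshape(1, -1), degree=2, gamma=1, coef0=0)[0, 0]

print(explicit, kernel, sk_kernel)  # all three equal 121 (up to floating-point rounding)
```

Route 2 never builds the 3-D vectors; for higher degrees or the RBF kernel (whose feature space is infinite-dimensional) this saving is what makes the trick practical.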


Question 7 : Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

Answer. Here is a simple Python program that trains two SVM classifiers (Linear and RBF kernels) on the Wine dataset and compares their accuracies.


from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
linear_accuracy = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print accuracies
print("Linear Kernel SVM Accuracy:", linear_accuracy)
print("RBF Kernel SVM Accuracy:", rbf_accuracy)


**Explanation:**

- SVC(kernel='linear') trains an SVM with a Linear kernel

- SVC(kernel='rbf') trains an SVM with an RBF kernel

- accuracy_score() is used to compare model performance
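One caveat with the program above is that SVMs, and the RBF kernel in particular, are sensitive to feature scales because they rely on distances between points. The Wine features have very different ranges (for example, proline is in the hundreds to over a thousand, while hue is around 1). The sketch below is one common variation, assuming standardization with scikit-learn's `StandardScaler` inside a pipeline; it reuses the same split so the results can be compared with the unscaled versions.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for kernel in ("linear", "rbf"):
    # The scaler is fitted on the training split only, then applied to both splits
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    results[kernel] = model.score(X_test, y_test)  # accuracy on the test set
    print(f"Scaled {kernel} SVM accuracy: {results[kernel]:.4f}")
```

Putting the scaler in the pipeline keeps test-set statistics out of the training step.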

Question 8 : What is the Naïve Bayes classifier, and why is it called "Naïve"?

Answer. The **Naïve Bayes classifier** is a **supervised machine learning algorithm** based on **Bayes’ Theorem**, used mainly for **classification tasks** such as spam detection and text classification.

It calculates the probability of a class given the input features, $P(y \mid x_1, \dots, x_n) \propto P(y) \prod_i P(x_i \mid y)$, and predicts the class with the **highest probability**.

It is called **“Naïve”** because it makes a **strong assumption** that **all features are independent of each other**, given the class. In real-world data, this assumption is usually not true, but the classifier still works surprisingly well in many cases.

**In simple words:**
Naïve Bayes predicts a class using probability and is called “naïve” because it assumes the features are independent of each other once the class is known.
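The "naïve" product of per-feature probabilities can be worked through by hand. The sketch below uses a hypothetical spam filter with two binary features (does the email contain "free"? "meeting"?); every probability in it is an invented illustrative number, not an estimate from real data. For an email containing "free" but not "meeting", the unnormalized scores are $0.4 \times 0.8 \times 0.9 = 0.288$ for spam and $0.6 \times 0.1 \times 0.5 = 0.03$ for ham.

```python
# Hypothetical spam filter with two binary features; all probabilities are illustrative
priors = {"spam": 0.4, "ham": 0.6}             # P(class)
p_word = {                                      # P(word present | class)
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.1, "meeting": 0.5},
}

email = {"free": 1, "meeting": 0}  # contains "free", does not contain "meeting"

# Naive assumption: multiply per-feature likelihoods as if they were independent given the class
scores = {}
for cls, prior in priors.items():
    score = prior
    for word, present in email.items():
        p = p_word[cls][word]
        score *= p if present else (1 - p)
    scores[cls] = score

total = sum(scores.values())
posterior = {cls: s / total for cls, s in scores.items()}  # normalise so probabilities sum to 1
prediction = max(posterior, key=posterior.get)

print(posterior, "->", prediction)
```

Normalizing gives $P(\text{spam} \mid \text{email}) = 0.288 / 0.318 \approx 0.91$, so the email is classified as spam.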


Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes

Answer. Here’s a clear explanation of the differences between the three main types of Naïve Bayes classifiers:

**1. Gaussian Naïve Bayes**

Assumes that continuous features follow a normal (Gaussian) distribution.

Probability is calculated using the mean and variance of the features for each class.

Use case: Classifying data with continuous numeric features, e.g., predicting flower species from petal lengths.

Formula for feature probability:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where $\mu_y$ and $\sigma_y^2$ are the mean and variance of feature $x_i$ in class $y$.

**2. Multinomial Naïve Bayes**

Designed for discrete count data (non-negative integers).

Commonly used in text classification, where features are word counts or frequencies.

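Estimates, for each class, the probability of each feature (e.g., each word) from its counts, so a feature that occurs several times in a sample contributes several times to the likelihood.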

Use case: Spam email detection, document classification.

**3. Bernoulli Naïve Bayes**

- Designed for binary/Boolean features (0 or 1).

- Each feature is treated as present or absent.

- Useful when you care about presence/absence of a feature, not frequency.

**Use case:** Document classification using word presence/absence (binary features).
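The three variants correspond to scikit-learn's `GaussianNB`, `MultinomialNB` and `BernoulliNB`. The sketch below fits each one to the kind of data it is designed for; all the toy values (a single continuous measurement, and word counts for three unnamed words) are hypothetical. Note that `BernoulliNB` by default binarizes its input (any count above 0 becomes 1), so it only sees whether a word is present, not how often.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Gaussian NB: one continuous feature (e.g. a length measurement); toy values
X_cont = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y_cont = np.array([0, 0, 0, 1, 1, 1])
gnb = GaussianNB().fit(X_cont, y_cont)
print("Gaussian:", gnb.predict([[1.1], [4.9]]))

# Multinomial NB: word counts per document (columns are three hypothetical words)
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 2, 3], [0, 3, 2]])
y_docs = np.array([0, 0, 1, 1])
mnb = MultinomialNB().fit(X_counts, y_docs)  # alpha=1.0 (Laplace smoothing) by default
print("Multinomial:", mnb.predict([[4, 0, 0], [0, 1, 4]]))

# Bernoulli NB: only presence/absence matters; counts > 0 are binarized to 1 by default
bnb = BernoulliNB().fit(X_counts, y_docs)
print("Bernoulli:", bnb.predict([[1, 0, 0], [0, 1, 1]]))
```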

Question 10: Breast Cancer Dataset.

Answer. Here’s a complete Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate its accuracy:

# Import libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create Gaussian Naïve Bayes classifier
gnb = GaussianNB()

# Train the classifier
gnb.fit(X_train, y_train)

# Predict on the test set
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Gaussian Naïve Bayes Accuracy:", accuracy)

# Optional: Detailed classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))


**✅ Explanation:**

1. GaussianNB() is used because the features are continuous numeric values.

2. train_test_split() splits the dataset into training and testing sets.

3. fit() trains the Gaussian Naïve Bayes model on the training data.

4. predict() generates predictions on the test data.

5. accuracy_score() calculates the model’s accuracy.

**Sample Output (may vary slightly):**

Gaussian Naïve Bayes Accuracy: 0.9415

The classification report printed after it lists precision, recall, F1-score and support for class 0 (malignant) and class 1 (benign). With test_size=0.3, the test set holds 171 of the 569 samples, so the support values sum to 171.


**In short:** The Gaussian Naïve Bayes classifier performs very well on the Breast Cancer dataset, achieving ~94% accuracy.
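A single train/test split gives one accuracy figure that depends on which samples happened to land in the test set. As a possible extension (not part of the original task), the sketch below estimates accuracy with 5-fold cross-validation using scikit-learn's `cross_val_score`; the choice of 5 folds is an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# 5-fold cross-validation: every sample is used for testing exactly once
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores.round(4))
print(f"Mean accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```

The spread across folds gives a rough idea of how much the single-split figure could vary.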