Ques-1>What is Information Gain, and how is it used in Decision Trees?

Ans>Information Gain (IG) is a metric used in Decision Trees to measure how well a feature separates the training data into target classes.
It tells us how much uncertainty (impurity, measured as entropy) is reduced after splitting the dataset on a particular feature.

Information Gain Formula

$$IG(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \times \text{Entropy}(S_v)$$

Where:

S = original dataset

A = attribute (feature)

S_v = subset of S in which attribute A takes the value v


How Information Gain is Used in Decision Trees:

1.Calculate the entropy of the full dataset

2.For each feature:

   Split the data based on feature values

  Calculate entropy for each split

3.Compute Information Gain for each feature

4.Select the feature with maximum Information Gain as the splitting node

5.Repeat the process recursively for child nodes
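
The steps above can be sketched in plain Python for a single split. This is only a minimal illustration, not a full tree builder: the toy `outlook`, `windy`, and `play` lists are made-up data, and the helper names `entropy` and `information_gain` are chosen here for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """Parent entropy minus the size-weighted entropy of each subset S_v."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy dataset (assumed values, for illustration only)
outlook = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
windy = ["yes", "no", "yes", "no", "yes", "no"]
play = ["no", "no", "yes", "yes", "yes", "yes"]

# Steps 2-4: compute IG for each feature and pick the maximum
gains = {
    "outlook": information_gain(outlook, play),
    "windy": information_gain(windy, play),
}
best_feature = max(gains, key=gains.get)
print(gains)
print("Best feature to split on:", best_feature)
```

In this toy data, `outlook` separates the classes perfectly while `windy` leaves every subset as mixed as the parent, so `outlook` is selected as the splitting node.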


Ques-2> What is the difference between Gini Impurity and Entropy?

Ans>Gini Impurity:
Gini Impurity measures how often a randomly chosen data point would be incorrectly classified if it were randomly labeled according to the class distribution.

 Formula>
$$\text{Gini} = 1 - \sum_{i} p_i^2$$

where p_i is the proportion of samples that belong to class i.

Entropy:
Entropy measures the amount of uncertainty or randomness in the dataset.

$$\text{Entropy} = -\sum_{i} p_i \log_2(p_i)$$


Difference between Gini Impurity and Entropy:

| Aspect      | Gini Impurity                          | Entropy                            |
| ----------- | -------------------------------------- | ---------------------------------- |
| Concept     | Measures misclassification probability | Measures randomness or uncertainty |
| Range       | 0 to 0.5 (binary classification)       | 0 to 1 (binary classification)     |
| Best Value  | 0 (pure node)                          | 0 (pure node)                      |
| Worst Case  | 0.5 (equal class split)                | 1 (equal class split)              |
| Computation | Simpler and faster                     | Slightly more complex (logarithms) |
| Used In     | CART Decision Trees                    | ID3, C4.5 Decision Trees           |
| Sensitivity | Less sensitive to class changes        | More sensitive to small changes    |
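
The ranges in the table can be checked with a short calculation. The sketch below assumes the class proportions are already known; the function names `gini` and `entropy` are chosen here for the example.

```python
import math

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy in bits; classes with zero proportion contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Binary class distributions: pure node, skewed node, equal split
for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(probs, "Gini:", round(gini(probs), 4), "Entropy:", round(entropy(probs), 4))
```

For a pure node both measures are 0, and for an equal binary split Gini reaches 0.5 while entropy reaches 1, which matches the "Best Value" and "Worst Case" rows.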


Ques 3>What is Pre-Pruning in Decision Trees?

Ans>Pre-Pruning is a technique used in Decision Trees where the growth of the tree is stopped early before it becomes too complex.
The tree is not allowed to fully grow if further splitting does not significantly improve performance.

Purpose of Pre-Pruning:

1.To prevent overfitting

2.To improve generalization on unseen data

3.To reduce tree complexity and training time
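
In scikit-learn, pre-pruning is usually applied through constructor parameters of `DecisionTreeClassifier` such as `max_depth`, `min_samples_split`, `min_samples_leaf`, and `min_impurity_decrease`. The sketch below compares a fully grown tree with a pre-pruned one on the Iris dataset; the specific limit values are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fully grown tree: no limits on depth or node size
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early once any limit is hit
# (the values below are assumed, chosen only for this example)
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # stop after 3 levels of splits
    min_samples_split=10,        # do not split nodes with fewer than 10 samples
    min_samples_leaf=5,          # every leaf must keep at least 5 samples
    min_impurity_decrease=0.01,  # skip splits that barely reduce impurity
    random_state=42,
).fit(X_train, y_train)

print("Full tree   -> depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("Pre-pruned  -> depth:", pruned_tree.get_depth(), "leaves:", pruned_tree.get_n_leaves())
print("Full tree test accuracy:  ", full_tree.score(X_test, y_test))
print("Pre-pruned test accuracy: ", pruned_tree.score(X_test, y_test))
```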

In [None]:
# Ques 4> Write a Python program to train a Decision Tree Classifier using Gini
# Impurity as the criterion and print the feature importances (practical).

# Import required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset (Iris dataset)
data = load_iris()
X = data.data        # Features
y = data.target      # Target labels

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create Decision Tree Classifier using Gini Impurity
dt = DecisionTreeClassifier(criterion='gini', random_state=42)

# Train the model
dt.fit(X_train, y_train)

# Print feature importances
print("Feature Importances:")
for feature, importance in zip(data.feature_names, dt.feature_importances_):
    print(f"{feature}: {importance:.4f}")


Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0191
petal length (cm): 0.8933
petal width (cm): 0.0876


Ques 5>What is a Support Vector Machine (SVM)?

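Ans>A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression problems.
It works by finding the hyperplane that separates data points of different classes with the largest possible margin. The training points closest to this hyperplane are called support vectors, and they alone determine its position.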

Ques 6>What is the Kernel Trick in SVM?

Ans>The Kernel Trick is a technique used in SVM to handle non-linearly separable data by implicitly mapping data into a higher-dimensional feature space, where a linear separator can be found.

How the Kernel Trick Works:

1.Instead of computing the new higher-dimensional features explicitly, SVM uses a kernel function

2.The kernel computes the inner product of two data points in the higher-dimensional space directly from their original coordinates

3.This allows SVM to find a linear hyperplane in higher dimensions, which corresponds to a non-linear boundary in original space
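
A small numeric check can make this concrete. The sketch below uses the degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2-D points, for which the explicit feature map is known to be φ(x) = (x1², √2·x1·x2, x2²). The points `x` and `z` are made-up values, and the helper names are chosen here for the example.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (3-D output)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2, computed in the original space."""
    return dot(x, z) ** 2

# Hypothetical sample points
x = (1.0, 2.0)
z = (3.0, 0.5)

explicit = dot(phi(x), phi(z))   # map to 3-D first, then take the inner product
implicit = poly_kernel(x, z)     # kernel trick: never builds the 3-D vectors

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both routes give the same value (up to floating-point rounding), but the kernel only ever touches the original 2-D coordinates. For kernels such as RBF the implied feature space is infinite-dimensional, so the explicit route is not available at all.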

In [None]:
#Ques 7>Write a Python program to train two SVM classifiers with Linear and RBF
#kernels on the Wine dataset, then compare their accuracies.

# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate accuracies
accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Print results
print("Accuracy using Linear Kernel:", accuracy_linear)
print("Accuracy using RBF Kernel:", accuracy_rbf)


Accuracy using Linear Kernel: 0.9814814814814815
Accuracy using RBF Kernel: 0.7592592592592593
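
One plausible reason for the lower RBF accuracy above is that the Wine features are on very different scales, and the RBF kernel depends on distances between points. The sketch below is an optional follow-up, not part of the original question: it trains the same RBF classifier with and without `StandardScaler` (combined through `make_pipeline`) on the same split, so the two scores can be compared directly.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# RBF kernel on raw features (same setup as the program above)
raw_rbf = SVC(kernel="rbf").fit(X_train, y_train)

# RBF kernel with features standardized to zero mean and unit variance;
# the scaler is fitted on the training data only, inside the pipeline
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

acc_raw = raw_rbf.score(X_test, y_test)
acc_scaled = scaled_rbf.score(X_test, y_test)

print("RBF accuracy without scaling:", acc_raw)
print("RBF accuracy with scaling:   ", acc_scaled)
```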


Ques 8>What is the Naïve Bayes classifier, and why is it called "Naïve"?

Ans>The Naïve Bayes classifier is a supervised machine learning algorithm based on Bayes’ Theorem.
It is mainly used for classification problems, especially in text classification and spam detection.

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

Where:

P(C∣X) = Posterior probability of class C given features X

P(X∣C) = Likelihood of features X given class C

P(C) = Prior probability of class C

P(X) = Evidence (probability of the features X)


Why Is It Called “Naïve”?

The classifier is called naïve because it makes a strong assumption:

 All features are conditionally independent of each other given the class label.

This assumption is often not true in real-world data, but the algorithm still performs surprisingly well.
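
Under this assumption, the likelihood P(X∣C) becomes a simple product of per-feature probabilities. The sketch below works through Bayes' Theorem by hand for a tiny spam example. All probabilities in `priors` and `likelihoods` are made-up numbers for illustration, and `posterior` is a helper name chosen here.

```python
# Hypothetical prior probabilities P(C)
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical per-word likelihoods P(word | C)
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}

def posterior(words):
    """Return P(C | words) for each class using the naive independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for word in words:
            # Naive step: multiply each feature's likelihood independently
            score *= likelihoods[cls][word]
        scores[cls] = score
    evidence = sum(scores.values())  # P(X), the normalizing constant
    return {cls: score / evidence for cls, score in scores.items()}

post = posterior(["free"])
print(post)
print("Predicted class:", max(post, key=post.get))
```

With these assumed numbers, a message containing "free" scores 0.4 × 0.8 = 0.32 for spam and 0.6 × 0.1 = 0.06 for ham, so the posterior for spam is 0.32 / 0.38.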

Ques 9>Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.


Ans> 1. Gaussian Naïve Bayes

Used when features are continuous and assumed to follow a normal (Gaussian) distribution within each class

Models each feature, per class, with a bell-shaped curve defined by that class's mean and variance

Commonly used in medical data, sensor data, and numerical datasets

Example: Height, weight, temperature

2. Multinomial Naïve Bayes

Used for discrete count-based features

Commonly applied in text classification problems

Works well with word frequencies or term counts

Example: Number of times a word appears in a document

3. Bernoulli Naïve Bayes

Used for binary features (0 or 1)

Focuses on whether a feature is present or absent

Suitable for binary text features

Example: Word appears or does not appear in a document


| Aspect              | Gaussian NB               | Multinomial NB         | Bernoulli NB               |
| ------------------- | ------------------------- | ---------------------- | -------------------------- |
| Type of Features    | Continuous                | Discrete counts        | Binary                     |
| Data Distribution   | Gaussian (Normal)         | Multinomial            | Bernoulli                  |
| Typical Use Case    | Numerical datasets        | Text classification    | Binary text classification |
| Feature Values      | Real numbers              | Integers (0, 1, 2, …)  | 0 or 1                     |
| Sensitivity         | Sensitive to distribution | Sensitive to frequency | Sensitive to presence      |
| Example Application | Medical diagnosis         | Spam detection         | Sentiment analysis         |
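
The three variants are available in scikit-learn as `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The sketch below fits each one on the kind of feature it expects. The tiny arrays are made-up data meant only to show the matching input types, not a realistic benchmark.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical labels for four samples (two per class)
y = np.array([0, 0, 1, 1])

# Continuous measurements -> Gaussian NB
X_continuous = np.array([[1.2, 0.5], [0.9, 0.7], [3.1, 2.2], [2.8, 2.5]])

# Word counts per document -> Multinomial NB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Word present (1) or absent (0) -> Bernoulli NB
X_binary = (X_counts > 0).astype(int)

experiments = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

predictions = {}
for name, (model, X) in experiments.items():
    model.fit(X, y)
    # Predicting on the training rows only shows that the API call works
    predictions[name] = model.predict(X)
    print(name, "->", predictions[name])
```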


In [3]:
# Ques 10> Breast Cancer Dataset
# Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
# dataset and evaluate accuracy.

# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data      # Features
y = data.target    # Target labels

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create Gaussian Naïve Bayes classifier
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("Accuracy of Gaussian Naïve Bayes Classifier:", accuracy)


Accuracy of Gaussian Naïve Bayes Classifier: 0.9415204678362573
