1. What is Information Gain, and how is it used in Decision Trees?

   - Information Gain (IG) is a metric used in Decision Trees to decide which feature to split on at each step.
   - It measures how much uncertainty (impurity) is reduced after splitting a dataset using a particular feature.

   How Information Gain Is Used in Decision Trees:
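   - Calculate the entropy of the entire dataset (at deeper levels, of the current node)
   - For each feature:
     - Split the dataset on that feature's values
     - Calculate the entropy of each resulting subset
     - Compute Information Gain: parent entropy minus the size-weighted average entropy of the subsets
   - Choose the feature with the maximum Information Gain
   - Repeat recursively on each subset until:
     - The data in a node is pure (only one class), or
     - No features remain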
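The steps above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it: `entropy` and `information_gain` are hypothetical helper names, and the "play outside" data is made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the subsets
    produced by splitting on each distinct feature value."""
    total = len(labels)
    subsets = {}
    for label, value in zip(labels, feature_values):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: should we play outside?
play = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]
windy = ["no", "yes", "no", "yes", "yes", "no"]

print(information_gain(play, weather))  # 1.0: weather separates the classes perfectly
print(information_gain(play, windy))    # much smaller gain
```

A tree built with Information Gain would split on `weather` first, because it removes all the uncertainty in one step.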



2. What is the difference between Gini Impurity and Entropy?

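   - Entropy:
     - Measure of uncertainty: H = -Σ pᵢ log₂(pᵢ)
     - Uses logarithms
     - Slower to compute
     - Maximum value (binary, log base 2) is 1
     - More sensitive to changes in class proportions

   - Gini Impurity:
     - Probability of misclassifying a randomly drawn sample if it is labelled according to the node's class distribution: G = 1 - Σ pᵢ²
     - Uses squares
     - Faster to compute (no logarithms)
     - Maximum value (binary) is 0.5
     - Less sensitive to changes in class proportions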
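To make the comparison concrete, here is a small sketch that evaluates both measures for a two-class node whose class-1 proportion is `p`. The function names `binary_entropy` and `binary_gini` are illustrative only.

```python
import math


def binary_entropy(p):
    """Entropy (base 2) of a two-class node with class-1 proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def binary_gini(p):
    """Gini impurity of a two-class node: 1 - (p^2 + (1 - p)^2)."""
    return 1 - (p ** 2 + (1 - p) ** 2)


for p in (0.0, 0.1, 0.25, 0.5):
    print(f"p={p:.2f}  entropy={binary_entropy(p):.3f}  gini={binary_gini(p):.3f}")
```

Both measures are 0 for a pure node and peak at p = 0.5 (entropy 1.0, Gini 0.5), matching the maximum values listed for each.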

3. What is Pre-Pruning in Decision Trees?

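   - Pre-Pruning (also called Early Stopping) stops the growth of a Decision Tree before it becomes too deep or complex. A node is not split once a stopping condition is met, such as a maximum depth, a minimum number of samples per split or per leaf, or a minimum impurity decrease. This helps reduce overfitting and training time.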
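In scikit-learn, pre-pruning is expressed through constructor parameters of `DecisionTreeClassifier`. The sketch below compares a fully grown tree with a pre-pruned one on Iris; the specific limits (depth 3, 10 samples per split, 5 per leaf) are arbitrary values chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree: no stopping conditions beyond the defaults
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: growth stops as soon as any of these limits is hit
pruned_tree = DecisionTreeClassifier(
    max_depth=3,           # never grow deeper than 3 levels
    min_samples_split=10,  # only split nodes with at least 10 samples
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=42,
).fit(X, y)

print("Full tree:   depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pruned tree: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```

In practice these limits are usually tuned with cross-validation rather than fixed by hand.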

4. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).

In [1]:
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load a sample dataset (Iris)
data = load_iris()
X = data.data          # Features
y = data.target        # Target labels
feature_names = data.feature_names

# Create Decision Tree model using Gini Impurity
model = DecisionTreeClassifier(
    criterion='gini',
    random_state=42
)

# Train the model
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Display feature importances clearly
feature_importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importances
}).sort_values(by='Importance', ascending=False)

print(feature_importance_df)


             Feature  Importance
2  petal length (cm)    0.564056
3   petal width (cm)    0.422611
0  sepal length (cm)    0.013333
1   sepal width (cm)    0.000000


5. What is a Support Vector Machine (SVM)?

   - A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression. For classification, it finds the decision boundary (hyperplane) with the largest margin between the classes; the training points closest to this boundary are called the support vectors.
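A minimal sketch of a linear SVM on a hypothetical, clearly separable 2-D dataset. For a linear kernel, scikit-learn's `SVC` exposes the boundary `w · x + b = 0` through `coef_` and `intercept_`, so the margin width can be computed as `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated 2-D clusters
X = np.array([[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [3, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Linear boundary w . x + b = 0; margin width is 2 / ||w||
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))
```

Only the support vectors determine `w` and `b`; moving any other training point (without crossing the margin) leaves the boundary unchanged.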

6. What is the Kernel Trick in SVM?

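   - The Kernel Trick lets Support Vector Machines (SVMs) solve non-linearly separable problems by implicitly mapping data into a higher-dimensional space. The SVM only needs dot products between points, so each dot product is replaced with a kernel function K(x, z) = φ(x) · φ(z), which gives the same value without ever computing the transformation φ. Common kernels are the polynomial and RBF (Gaussian) kernels.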
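The identity behind the trick can be checked numerically. This sketch uses the degree-2 polynomial kernel K(x, z) = (x · z)² on 2-D inputs, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²); the vectors are arbitrary example values.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (x1, x2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x . z)^2, computed in the original 2-D space."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product after mapping to 3-D
implicit = poly_kernel(x, z)       # same value, no mapping needed

print(explicit, implicit)  # both approximately 121.0
```

For the RBF kernel the implicit feature space is infinite-dimensional, so computing φ explicitly is not even possible; the kernel function is the only practical route.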

7. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.


In [2]:
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)

# Predict and calculate accuracy
y_pred_linear = svm_linear.predict(X_test)
linear_accuracy = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)

# Predict and calculate accuracy
y_pred_rbf = svm_rbf.predict(X_test)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print accuracies
print("Linear Kernel SVM Accuracy:", linear_accuracy)
print("RBF Kernel SVM Accuracy:", rbf_accuracy)


Linear Kernel SVM Accuracy: 0.9814814814814815
RBF Kernel SVM Accuracy: 0.7592592592592593


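- Linear:
  - Achieved the higher accuracy here (about 0.98)
  - Suitable when the data is (approximately) linearly separable
- RBF:
  - Scored noticeably lower here (about 0.76), most likely because the features were not standardized: RBF relies on Euclidean distances, so large-scale Wine features such as proline dominate the kernel
  - Suitable when the decision boundary is complex or non-linear, typically after feature scaling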
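To test the scaling explanation, one option is to rerun the same comparison with `StandardScaler` inside a pipeline, so the scaler is fit on the training split only. This is a sketch, not part of the original assignment; it keeps the same split (`test_size=0.3`, `random_state=42`) and default `SVC` settings.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

results = {}
for kernel in ("linear", "rbf"):
    # Unscaled: same setup as the original program
    raw = SVC(kernel=kernel).fit(X_train, y_train)
    # Scaled: StandardScaler is fit on the training split only, inside the pipeline
    scaled = make_pipeline(StandardScaler(), SVC(kernel=kernel)).fit(X_train, y_train)
    results[kernel] = (
        accuracy_score(y_test, raw.predict(X_test)),
        accuracy_score(y_test, scaled.predict(X_test)),
    )

for kernel, (raw_acc, scaled_acc) in results.items():
    print(f"{kernel:>6}: unscaled={raw_acc:.3f}  scaled={scaled_acc:.3f}")
```

If the scaled RBF accuracy rises to the level of the linear kernel, the original gap came from preprocessing rather than from the kernel choice itself.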

8. What is the Naïve Bayes classifier, and why is it called "Naïve"?

   - Naïve Bayes is a supervised probabilistic classification algorithm based on Bayes’ Theorem.
   - It predicts the class of a data point by calculating the posterior probability of each class given the input features and choosing the class with the highest probability.

   Why Is It Called “Naïve”?
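   - Naïve Bayes is called "naïve" because it makes a strong, often unrealistic, assumption: all features are conditionally independent of each other given the class. This lets the likelihood be computed as a simple product, P(x₁, ..., xₙ | C) = P(x₁ | C) · ... · P(xₙ | C).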
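The posterior calculation can be worked by hand. In this sketch the priors and per-word likelihoods are hypothetical numbers, as if estimated from a labelled e-mail corpus; the naïve independence assumption is what allows the per-word probabilities to be multiplied.

```python
import math

# Hypothetical probabilities, as if estimated from labelled e-mails
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {  # P(word appears | class)
    "spam": {"free": 0.8, "winner": 0.5},
    "ham": {"free": 0.1, "winner": 0.3},
}

observed_words = ["free", "winner"]

# Naive assumption: P(free, winner | c) = P(free | c) * P(winner | c)
scores = {
    c: priors[c] * math.prod(likelihoods[c][w] for w in observed_words)
    for c in priors
}

# Normalize the scores so they sum to 1 to get posterior probabilities
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

prediction = max(posteriors, key=posteriors.get)
print(posteriors, "->", prediction)
```

Here spam scores 0.4 × 0.8 × 0.5 = 0.16 against 0.6 × 0.1 × 0.3 = 0.018 for ham, so the classifier predicts spam (posterior about 0.90).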

9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

  -  Gaussian Naïve Bayes (GNB):
     - Data type: continuous features.
     - Assumes each feature follows a normal (Gaussian) distribution within each class.
     - Input values: any real number.

  -  Multinomial Naïve Bayes (MNB):
     - Data type: counts (e.g., word counts in a document).
     - Multinomial distribution.
     - Input values: non-negative integers 0, 1, 2, ...

  -  Bernoulli Naïve Bayes (BNB):
     - Data type: binary features (e.g., word present or absent).
     - Bernoulli distribution.
     - Input values: 0 or 1.
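Each variant is available in scikit-learn's `sklearn.naive_bayes` module. The sketch below fits all three on tiny hypothetical datasets, one per input type: continuous measurements, word counts over a made-up 3-word vocabulary, and the same counts binarized to presence/absence.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                   [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
gnb = GaussianNB().fit(X_cont, y)

# Word counts (hypothetical 3-word vocabulary) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0],
                     [0, 3, 2], [0, 4, 1], [1, 2, 3]])
mnb = MultinomialNB().fit(X_counts, y)

# Word present (1) / absent (0) -> BernoulliNB
X_bin = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_bin, y)

print(gnb.predict([[1.1, 2.0], [5.1, 6.0]]))
print(mnb.predict([[3, 0, 0], [0, 3, 2]]))
print(bnb.predict([[1, 0, 0], [0, 1, 1]]))
```

The choice of variant should follow the feature type: feeding raw counts to `GaussianNB`, or continuous values to `BernoulliNB`, silently applies the wrong distribution.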

10. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate its accuracy.

In [3]:
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict on test data
y_pred = gnb.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("Gaussian Naïve Bayes Accuracy:", accuracy)


Gaussian Naïve Bayes Accuracy: 0.9415204678362573
