Q1. What is Information Gain, and how is it used in Decision Trees?

Ans: Information Gain is a metric used in decision trees to measure how much splitting on a feature reduces the entropy (impurity) of a dataset. It is used to select the best feature to split a node, with the goal of choosing the feature that produces the most homogeneous child nodes.
How Information Gain is used in Decision Trees
- Feature selection: At each internal node, the algorithm evaluates all candidate features to see which one provides the highest Information Gain.
- Splitting: The feature with the greatest Information Gain is chosen as the attribute to split the current node into subsets.
- Calculating Information Gain: Information Gain is the entropy of the parent node minus the weighted average entropy of the child nodes after the split, where each child is weighted by its share of the parent's samples.
- Entropy: A measure of impurity or uncertainty in a dataset. A dataset with a single class has zero entropy, while a dataset with an even mix of classes has maximum entropy.
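
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation any particular library uses; the helper names `entropy` and `information_gain` and the toy labels are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy data: 4 "yes" and 4 "no" labels at the parent node
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]                          # each child is pure
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]  # children mirror the parent

print("Gain of perfect split:", information_gain(parent, perfect_split))
print("Gain of useless split:", information_gain(parent, useless_split))
```

A split that produces pure children recovers all of the parent's entropy as gain, while a split whose children have the same class mix as the parent gains nothing.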


Q2. What is the difference between Gini Impurity and Entropy?

Ans: Gini Impurity and Entropy are both metrics used in decision trees to measure the impurity of a node, but they differ in their calculation and range. Gini Impurity is the probability of misclassifying a randomly chosen sample if it were labeled according to the node's class distribution; it avoids logarithms, so it is slightly cheaper to compute, and it ranges from 0 to 0.5 for binary classification. Entropy measures randomness or disorder, ranges from 0 to 1 for binary classification (using base-2 logarithms), and requires slightly more computation because of its logarithm.
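
To make the two ranges concrete, here is a small sketch that evaluates both measures for a binary node. The function names `gini` and `binary_entropy` are chosen for this example, and `p` is assumed to be the fraction of samples in one of the two classes.

```python
import math

def gini(p):
    """Gini impurity of a binary node where p is the fraction of class 1."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def binary_entropy(p):
    """Base-2 entropy of a binary node; the term 0 * log2(0) is treated as 0."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

# Both measures are 0 for a pure node and peak at an even 50/50 mix
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p={p:.1f}  gini={gini(p):.4f}  entropy={binary_entropy(p):.4f}")
```

Both curves have the same shape (zero at the pure ends, maximum at p = 0.5), which is why the two criteria usually lead to similar trees.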

Q3. What is Pre-Pruning in Decision Trees?

Ans: Pre-pruning, or early stopping, is a technique that halts the growth of a decision tree during construction to prevent overfitting. It uses criteria such as a maximum depth or a minimum number of samples required to split a node to stop the process before the tree becomes overly complex, resulting in a simpler and more interpretable model.
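
As a rough illustration, scikit-learn exposes these stopping criteria as constructor parameters such as `max_depth` and `min_samples_split`. The values 2 and 10 below are arbitrary choices for the sketch, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Unrestricted tree: grows until leaves are pure or cannot be split further
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: stops at depth 2 and never splits a node with fewer than 10 samples
pruned_tree = DecisionTreeClassifier(
    max_depth=2, min_samples_split=10, random_state=0
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.4f}")
```

Comparing depth and leaf count shows how much structure the stopping criteria remove; whether accuracy changes depends on the dataset and the chosen limits.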

Q4. Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances (practical).

In [2]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()
X = iris.data       # feature variables
y = iris.target     # target variable

# Split the dataset into training and testing data (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Create a Decision Tree Classifier with Gini Impurity as the criterion
clf = DecisionTreeClassifier(criterion='gini', random_state=0)

# Train (fit) the model
clf.fit(X_train, y_train)

# Print the feature importances
print("Feature Importances:")
for feature_name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{feature_name}: {importance:.4f}")


Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0215
petal length (cm): 0.3977
petal width (cm): 0.5808


Q5. What is a Support Vector Machine (SVM)?

Ans: A Support Vector Machine (SVM) is a supervised machine learning algorithm that classifies data by finding the best-fitting boundary, or hyperplane, to separate different classes. It works by identifying the hyperplane that maximizes the margin, or distance, between itself and the closest data points of each class. These closest data points are called "support vectors," as they are crucial in defining the position and orientation of the hyperplane.  
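
The idea of support vectors and margin can be inspected directly on a fitted model. The sketch below uses a hypothetical six-point dataset and assumes a large `C` is an acceptable stand-in for a hard margin on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters in 2D
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = clf.intercept_[0]     # offset of the hyperplane
print("Support vectors:\n", clf.support_vectors_)
print("Hyperplane: w =", w, ", b =", b)
print("Margin width:", 2.0 / np.linalg.norm(w))
```

Only the points nearest the boundary appear in `support_vectors_`; removing any of the other points would leave the hyperplane unchanged.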

Q6. What is the Kernel Trick in SVM?

Ans: The Kernel Trick is a mathematical technique that allows SVMs to implicitly map data into a higher-dimensional space without actually computing the transformation.

This makes it possible to find a linear decision boundary in that high-dimensional space, which corresponds to a non-linear boundary in the original space.

Why it's needed:

- In many real-world datasets, the data is not linearly separable.

- If you can’t separate data with a straight line (in 2D), maybe you can separate it with a curve.

- The kernel trick allows SVMs to find such curved boundaries without explicitly computing new features.
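
A small numeric check can make the "implicit mapping" concrete. The sketch below uses the degree-2 homogeneous polynomial kernel as an example; the feature map `phi` and the two sample points are chosen purely for illustration.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D point (x1, x2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))   # inner product computed in the 3D feature space
implicit = poly_kernel(a, b)        # same quantity computed from the 2D inputs only

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
print("Equal:", np.isclose(explicit, implicit))
```

The kernel returns the feature-space inner product without ever building `phi(a)` or `phi(b)`, which is what lets SVMs work in very high-dimensional (even infinite-dimensional, for RBF) spaces at low cost.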

Q7. Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.


In [5]:
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create two SVM classifiers: one with Linear kernel and one with RBF kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_rbf = SVC(kernel='rbf', random_state=42)

# Train both models
svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

# Make predictions
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# Compute accuracies
acc_linear = accuracy_score(y_test, y_pred_linear)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# Print results
print("Accuracy Comparison of SVM Models:\n")
print(f"Linear Kernel SVM Accuracy: {acc_linear:.4f}")
print(f"RBF Kernel SVM Accuracy:    {acc_rbf:.4f}")


Accuracy Comparison of SVM Models:

Linear Kernel SVM Accuracy: 0.9815
RBF Kernel SVM Accuracy:    0.7593
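
The gap between the two kernels above may come largely from feature scaling rather than from the kernels themselves: the Wine features span very different numeric ranges, and the RBF kernel is distance-based. The following is a minimal sketch that adds a `StandardScaler` step in a pipeline; this step is an assumption for illustration and is not part of the original program.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize each feature to zero mean and unit variance before the RBF SVM;
# the scaler is fitted on the training split only, inside the pipeline
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=42))
scaled_rbf.fit(X_train, y_train)

print(f"RBF Kernel SVM Accuracy (scaled): {scaled_rbf.score(X_test, y_test):.4f}")
```

Running this on the same split gives a fairer comparison with the linear kernel, since both models then see features on comparable scales if the linear model is scaled as well.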


Q8. What is the Naïve Bayes classifier, and why is it called "Naïve"?

Ans: The Naïve Bayes classifier is a probabilistic classification algorithm that applies Bayes' Theorem under the strong "naïve" assumption that features are conditionally independent given the class. It is called "naïve" because this assumption is often unrealistic in real-world data, yet the classifier still performs well in practice, especially for tasks like spam filtering and text classification.
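
The independence assumption means the posterior is proportional to the class prior multiplied by one likelihood per feature. A minimal sketch of that arithmetic follows; the priors and word likelihoods are hypothetical values invented for this toy spam filter.

```python
# Hypothetical probabilities for a toy spam filter (illustrative values only)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},   # P(word | spam)
    "ham":  {"free": 0.03, "meeting": 0.20},   # P(word | ham)
}

def posterior(words):
    """Posterior P(class | words) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            score *= likelihoods[label][word]  # multiply per-feature likelihoods
        scores[label] = score
    total = sum(scores.values())               # normalize so the posteriors sum to 1
    return {label: score / total for label, score in scores.items()}

print(posterior(["free"]))
print(posterior(["free", "meeting"]))
```

Each word contributes its own factor regardless of which other words are present, which is exactly the simplification the name "naïve" refers to.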

Q9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
Bayes, and Bernoulli Naïve Bayes.

Ans: The main difference is the type of data each classifier handles: Gaussian Naïve Bayes is for continuous data (like height or weight), Multinomial Naïve Bayes is for discrete counts (like word counts in a document), and Bernoulli Naïve Bayes is for binary data (the presence or absence of a feature, such as whether a word appears in a document). Each model assumes a different probability distribution for its features: Gaussian uses a normal distribution, Multinomial uses the multinomial distribution, and Bernoulli uses the Bernoulli distribution.
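
The pairing of each variant with its data type can be sketched with scikit-learn's three classes. All arrays below are tiny hypothetical datasets made up for the illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements (hypothetical heights in cm and weights in kg)
X_continuous = np.array([[150.0, 50.0], [155.0, 55.0], [180.0, 80.0], [185.0, 85.0]])
gnb = GaussianNB().fit(X_continuous, y)

# Word counts per document (hypothetical 3-word vocabulary)
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_counts, y)

# Word presence/absence derived from the same counts
X_binary = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_binary, y)

print("GaussianNB:   ", gnb.predict([[182.0, 82.0]]))
print("MultinomialNB:", mnb.predict([[0, 2, 1]]))
print("BernoulliNB:  ", bnb.predict([[0, 1, 1]]))
```

The same labels are used throughout so that the only thing changing between the three models is the representation of the features.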

Q10. Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.
Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from
sklearn.datasets.


In [6]:
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data      # Features
y = data.target    # Target labels (0 = malignant, 1 = benign)

# Split the dataset into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Gaussian Naïve Bayes classifier
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Display results
print("Gaussian Naïve Bayes Classifier Results")
print("----------------------------------------")
print(f"Accuracy: {accuracy:.4f}")


Gaussian Naïve Bayes Classifier Results
----------------------------------------
Accuracy: 0.9415
