**Question 1: What is Information Gain, and how is it used in Decision Trees?**
-
Information Gain (IG) measures the reduction in entropy or impurity after splitting a dataset based on a particular feature. It quantifies how well a feature separates the training examples according to their target classes.

Formula:

$$IG(D, A) = \text{Entropy}(D) - \sum_{v \in \text{Values}(A)} \frac{|D_v|}{|D|} \times \text{Entropy}(D_v)$$

where
 - $D$ = dataset
 - $A$ = attribute
 - $D_v$ = subset of $D$ where attribute $A = v$

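As a minimal sketch of the formula above, the following plain-Python functions compute base-2 entropy and the information gain of one categorical attribute. The function names and the tiny `windy` dataset are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    # sum of p * log2(1/p) over the classes present
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """IG(D, A): entropy of D minus the size-weighted entropy of each subset D_v."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)  # group labels by value v of A
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: one attribute "windy" and a yes/no target
rows = [{"windy": False}, {"windy": False}, {"windy": True}, {"windy": True}]
labels = ["yes", "yes", "no", "yes"]
print(f"IG(D, windy) = {information_gain(rows, labels, 'windy'):.4f}")
```

Here Entropy(D) is about 0.811, the `windy=False` subset is pure (entropy 0) and the `windy=True` subset is a 50/50 split (entropy 1), so the gain is about 0.811 - 0.5 = 0.311.
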
Use in Decision Trees:
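Decision tree algorithms such as ID3 use Information Gain to decide which attribute to split the data on at each node: the attribute with the highest information gain is chosen as the decision node, because splitting on it yields the purest (least uncertain) subsets. C4.5 uses the gain ratio, which is Information Gain normalized by the entropy of the split itself, to reduce the bias toward attributes with many distinct values.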

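scikit-learn does not implement ID3 or C4.5 (its trees use an optimized version of CART), but with `criterion="entropy"` it picks each split by the largest reduction in entropy, which is the information-gain idea described above. A short sketch, assuming scikit-learn is installed, that fits a one-level tree on the Iris data and reports the root split it selected:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# A depth-1 tree ("decision stump") makes exactly one split: the one with the
# largest entropy reduction over all candidate features and thresholds.
stump = DecisionTreeClassifier(criterion="entropy", max_depth=1, random_state=0)
stump.fit(iris.data, iris.target)

root_feature = stump.tree_.feature[0]   # index of the feature used at the root
threshold = stump.tree_.threshold[0]    # split threshold at the root
print(f"Root split: {iris.feature_names[root_feature]} <= {threshold:.2f}")
```
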
**Question 2: What is the difference between Gini Impurity and Entropy?**
-
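| Feature            | **Gini Impurity**                                                                                    | **Entropy**                                                                  |
| ------------------ | ---------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **Formula**        | $1 - \sum_i p_i^2$                                                                                   | $-\sum_i p_i \log_2 p_i$                                                     |
| **Range**          | 0 to $1 - 1/k$ (0 to 0.5 for two classes)                                                            | 0 to $\log_2 k$ (0 to 1 for two classes)                                     |
| **Interpretation** | Probability of misclassifying a randomly chosen sample labeled at random by the class distribution  | Amount of disorder or uncertainty (in bits) in the class distribution        |
| **Computation**    | Simpler and faster (no logarithm)                                                                    | Slightly more expensive (involves a logarithm)                               |
| **Used in**        | CART algorithm (default criterion in scikit-learn)                                                   | ID3, C4.5 algorithms                                                         |
| **Preference**     | Works well in most cases at lower computational cost                                                 | Grounded in information theory; in practice usually gives very similar splits |

Here $p_i$ is the proportion of samples in the node that belong to class $i$, and $k$ is the number of classes.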

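To make the formulas concrete, this sketch evaluates both measures for a few class distributions using NumPy. The helper names `gini` and `entropy` are illustrative. Note that the 0 to 0.5 and 0 to 1 ranges hold for two classes; with three equally likely classes both values go higher.

```python
import numpy as np


def gini(p):
    """Gini impurity 1 - sum(p_i^2) for a vector of class proportions."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)


def entropy(p):
    """Entropy -sum(p_i * log2 p_i), written as sum(p_i * log2(1/p_i))."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # classes with p_i = 0 contribute nothing
    return np.sum(p * np.log2(1.0 / p))


for dist in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]):
    print(dist, f"gini={gini(dist):.3f}", f"entropy={entropy(dist):.3f}")
```

A pure node scores 0 on both measures; the balanced two-class node reaches 0.5 (Gini) and 1.0 (entropy), while the balanced three-class node gives about 0.667 and 1.585.
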
**Question 3: What is Pre-Pruning in Decision Trees?**
-
Pre-Pruning (also called early stopping) prevents overfitting by halting tree growth before it perfectly classifies all training data.

 - Methods include:
   - Limiting the maximum tree depth (`max_depth`)
   - Requiring a minimum number of samples to split a node (`min_samples_split`)
   - Requiring a minimum number of samples per leaf (`min_samples_leaf`)
   - Setting a minimum information gain (impurity decrease) threshold for a split (`min_impurity_decrease` in scikit-learn)
 - Advantages:
   - Reduces overfitting
   - Improves model generalization
   - Lowers computation time

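A sketch of pre-pruning with scikit-learn, comparing an unrestricted tree with one limited by the parameters listed above. The specific values (`max_depth=3`, `min_samples_split=10`, `min_samples_leaf=5`, `min_impurity_decrease=0.01`) and the choice of the Breast Cancer dataset are illustrative assumptions, not tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Unpruned: grows until leaves are pure (or cannot be split further)
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned: growth stops early when any of these limits is hit
pruned = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5,
    min_impurity_decrease=0.01,
    random_state=42,
).fit(X_train, y_train)

for name, model in [("unpruned", full), ("pre-pruned", pruned)]:
    print(
        f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
        f"test accuracy={model.score(X_test, y_test):.3f}"
    )
```

The pre-pruned tree is much smaller; whether its test accuracy is higher or lower depends on the data and the chosen limits, which are usually tuned with cross-validation.
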
**Question 5: What is a Support Vector Machine (SVM)?**
-
Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. It finds the optimal hyperplane that separates different classes with the maximum margin.

 - Key Concepts:
   - Support Vectors: The training points closest to the hyperplane; they alone determine its position.
   - Margin: The distance between the hyperplane and the nearest points (the support vectors).
   - Objective: Maximize the margin to improve generalization.
 - Advantages:
   - Effective in high-dimensional spaces.
   - Works well when there is a clear margin of separation between classes.
   - Can use kernels for non-linear classification.

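A minimal sketch of these concepts with scikit-learn's `SVC`, on a hypothetical, linearly separable 1-D problem embedded in 2-D. A large `C` is used as an assumption to approximate a hard-margin SVM. For a linear kernel, the distance from the hyperplane to each support vector is $1/\lVert w \rVert$, so the full margin width is $2/\lVert w \rVert$.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 on the left, class 1 on the right
X = np.array([[-2.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
y = np.array([0, 0, 1, 1])

# Large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]                      # normal vector of the separating hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Support vectors:", clf.support_vectors_.tolist())
print(f"Margin width: {margin_width:.3f}")
```

Only the points at x = -1 and x = 1 should end up as support vectors here; moving the outer points at x = ±2 would not change the hyperplane.
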

**Question 6: What is the Kernel Trick in SVM?**
-
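The Kernel Trick allows SVMs to perform non-linear classification by implicitly working in a higher-dimensional feature space: a kernel function $K(x, y) = \phi(x) \cdot \phi(y)$ returns the inner product of the mapped points directly, so the mapping $\phi$ and the coordinates in that space never have to be computed explicitly.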

 - Common Kernels:
   - Linear Kernel: $K(x, y) = x \cdot y$
   - Polynomial Kernel: $K(x, y) = (x \cdot y + c)^d$, where $d$ is the degree
   - RBF (Gaussian) Kernel: $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$, where $\gamma > 0$

 - Advantages:
   - Handles complex, non-linear data efficiently.
   - Reduces computation: the kernel is evaluated directly on the original inputs, so the high-dimensional transformation is never computed explicitly.

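A small NumPy check of the kernel trick for the polynomial kernel with $c = 0$ and $d = 2$ on 2-D inputs. For this case the explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$; the helper names and the input vectors are hypothetical.

```python
import numpy as np


def poly_kernel(x, y, c=0.0, d=2):
    """Polynomial kernel (x.y + c)^d, computed in the original 2-D space."""
    return (np.dot(x, y) + c) ** d


def phi(x):
    """Explicit degree-2 feature map for 2-D input when c = 0."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

print("Kernel in input space:      ", poly_kernel(x, y))
print("Dot product in feature space:", np.dot(phi(x), phi(y)))
```

Both lines print 121 (the second up to floating-point rounding), but the kernel only needs one 2-D dot product, while the explicit route builds 3-D vectors first. For higher degrees or the RBF kernel, whose feature space is infinite-dimensional, the explicit route becomes impractical.
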
**Question 8: What is the Naïve Bayes classifier, and why is it called “Naïve”?**
-
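Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem. It assumes that all features are conditionally independent given the class label, so the class posterior is proportional to the class prior times the product of the per-feature likelihoods: $P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$.

It’s called “Naïve” because this independence assumption rarely holds exactly in real data; even so, the classifier often works surprisingly well.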

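A minimal from-scratch sketch of the prediction rule for a Bernoulli-style spam filter with two binary word features. All probabilities are made-up numbers chosen for illustration, not estimates from real data; the independence assumption is what lets the likelihood be computed as a simple product.

```python
# Hypothetical, hand-picked parameters (not learned from data)
priors = {"spam": 0.4, "ham": 0.6}
# P(word present | class) for each binary word feature
likelihoods = {
    "spam": {"free": 0.7, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}


def posterior(email):
    """Return P(class | email) for an email given as {word: 0 or 1}."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for word, present in email.items():
            p = likelihoods[cls][word]
            # Naive assumption: multiply per-feature likelihoods independently
            score *= p if present else (1.0 - p)
        scores[cls] = score
    total = sum(scores.values())  # P(email), the normalizing constant
    return {cls: s / total for cls, s in scores.items()}


post = posterior({"free": 1, "meeting": 0})
print(post)
print("Predicted class:", max(post, key=post.get))
```

The unnormalized scores are 0.4 × 0.7 × 0.9 = 0.252 for spam and 0.6 × 0.1 × 0.5 = 0.03 for ham, so the posterior for spam is about 0.89.
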
 - Advantages:
   - Simple and fast.
   - Works well for text classification and spam detection.
   - Requires relatively little training data.

**Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes**
-
| Type               | Suitable For            | Assumes                                          | Example Use Case                                      |
| ------------------ | ----------------------- | ------------------------------------------------ | ----------------------------------------------------- |
| **Gaussian NB**    | Continuous data         | Features follow a normal (Gaussian) distribution | Iris dataset, medical data                            |
| **Multinomial NB** | Discrete counts         | Feature values are counts/frequencies            | Text classification (bag of words)                    |
| **Bernoulli NB**   | Binary/boolean features | Features take values 0 or 1                      | Spam detection, sentiment (presence/absence of words) |

The choice depends on the nature of the feature values: continuous, counts, or binary.

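A sketch that fits each scikit-learn variant on the kind of feature it expects. The three tiny datasets are hypothetical and exist only to show which input type goes with which class.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y_cont = np.array([0, 0, 0, 1, 1, 1])

# Word counts per document (columns = two vocabulary words) -> MultinomialNB
X_counts = np.array([[3, 0], [2, 1], [0, 3], [1, 2]])
y_counts = np.array([0, 0, 1, 1])

# Word presence/absence (0/1) -> BernoulliNB
X_bin = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
y_bin = np.array([0, 0, 1, 1])

gnb = GaussianNB().fit(X_cont, y_cont)
mnb = MultinomialNB().fit(X_counts, y_counts)
bnb = BernoulliNB().fit(X_bin, y_bin)

print("Gaussian:", gnb.predict([[1.1], [5.1]]))
print("Multinomial:", mnb.predict([[4, 0], [0, 4]]))
print("Bernoulli:", bnb.predict([[1, 0], [0, 1]]))
```
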
```python
"""Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical)."""
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Train Decision Tree using Gini Impurity
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X, y)

# Print feature importances (normalized total Gini decrease per feature; they sum to 1)
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.4f}")
```

```
sepal length (cm): 0.0133
sepal width (cm): 0.0000
petal length (cm): 0.5641
petal width (cm): 0.4226
```

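To see where these importances come from, this sketch (assuming scikit-learn 0.21 or later for `export_text`) ranks the features and prints the top levels of the same Gini tree, fitted here on the NumPy array instead of the DataFrame.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(iris.data, iris.target)

# Rank features by their normalized total Gini impurity decrease
ranked = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.4f}")

# Text view of the first two levels of splits
report = export_text(clf, feature_names=iris.feature_names, max_depth=2)
print(report)
```

The petal measurements dominate both the ranking and the first splits, which is consistent with the importances printed above.
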

```python
"""Question 7: Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.
"""
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42
)

# Train SVM with Linear and RBF kernels.
# Note: features are not standardized here. The RBF kernel depends on Euclidean
# distances, so features with large numeric ranges (such as proline) dominate it,
# which likely explains its lower accuracy below.
svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')

svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

# Predict and evaluate
acc_linear = accuracy_score(y_test, svm_linear.predict(X_test))
acc_rbf = accuracy_score(y_test, svm_rbf.predict(X_test))

print(f"Linear Kernel Accuracy: {acc_linear:.2f}")
print(f"RBF Kernel Accuracy: {acc_rbf:.2f}")
```

```
Linear Kernel Accuracy: 0.98
RBF Kernel Accuracy: 0.76
```

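The gap between the two kernels above may be driven largely by feature scale rather than by the kernels themselves. A sketch of a fairer comparison, assuming standardization is acceptable for this exercise: `StandardScaler` is placed inside a pipeline so it is fit on the training split only. Run it to compare against the unscaled results.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42
)

results = {}
for kernel in ("linear", "rbf"):
    # The scaler learns mean/std from X_train only, then transforms X_test,
    # so no test-set information leaks into preprocessing.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    results[kernel] = accuracy_score(y_test, model.predict(X_test))
    print(f"Scaled {kernel} kernel accuracy: {results[kernel]:.2f}")
```
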

```python
"""Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.
"""
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Train model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict and evaluate
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Gaussian Naive Bayes Accuracy: {accuracy:.2f}")
```

```
Gaussian Naive Bayes Accuracy: 0.94
```

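A single train/test split gives one accuracy number that depends on which samples land in the test set. As a complementary check, this sketch estimates the same model's accuracy with 5-fold cross-validation via `cross_val_score`; the fold count is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Each sample is used for testing exactly once across the 5 folds
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", [round(float(s), 3) for s in scores])
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```
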