In [None]:
#Question 1: What is Information Gain, and how is it used in Decision Trees?
r'''Information Gain (IG) is a measure used in Decision Trees to determine which feature best splits the data into classes.
It quantifies the reduction in uncertainty (entropy) achieved by splitting the data on a particular feature; the higher the
Information Gain, the more useful the feature is for classification. It is calculated as the entropy of the parent node minus
the weighted average entropy of the child nodes:

$IG(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$,  where  $H(S) = -\sum_i p_i \log_2(p_i)$

Here S is the set of samples at the node, A is the candidate feature, S_v is the subset of S that goes to child v, and p_i is the
proportion of class i in S. In simple terms, IG tells us how much “information” a feature gives about the target variable.

In Decision Trees, Information Gain decides which feature to split on at each node. For every candidate feature, the tree computes
the Information Gain of splitting on it and chooses the feature with the highest value, because that split best separates the data
into pure subsets. This process is repeated recursively on each child node until the nodes are pure or a stopping condition (such as
a maximum depth) is met. In short, Information Gain makes the tree grow greedily, choosing the most informative split at each step.'''
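The calculation described above (parent entropy minus the size-weighted average entropy of the children) can be sketched in a few lines of plain Python. The helper names `entropy` and `information_gain` and the toy labels (10 samples, 5 "yes" and 5 "no", split into two children) are illustrative assumptions, not part of any library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical split: a balanced parent node divided into two mostly-pure children
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(round(entropy(parent), 4))                            # → 1.0
print(round(information_gain(parent, [left, right]), 4))    # → 0.2781
```

A tree builder would run `information_gain` for every candidate split at a node and keep the one with the largest value.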


In [None]:
#Question 2: What is the difference between Gini Impurity and Entropy?
#Hint: Directly compares the two main impurity measures, highlighting strengths, weaknesses, and appropriate use cases.
'''Both Gini Impurity and Entropy are measures of impurity used in Decision Trees to determine how well a feature splits the data.

| **Aspect**       | **Gini Impurity**                                                              | **Entropy**                                                     |
| ---------------- | ------------------------------------------------------------------------------ | --------------------------------------------------------------- |
| **Definition**   | Measures the probability of incorrectly classifying a randomly chosen element. | Measures the amount of disorder or randomness in the data.      |
| **Formula**      | $Gini = 1 - \sum_i p_i^2$                                                      | $Entropy = -\sum_i p_i \log_2(p_i)$                             |
| **Range**        | 0 (pure) to 0.5 for two classes (at most 1 - 1/k for k classes)                | 0 (pure) to 1 for two classes (at most log_2(k) for k classes)  |
| **Computation**  | Simpler and faster to compute.                                                 | Slightly more complex due to logarithmic calculations.          |
| **Tendency**     | Prefers larger partitions with dominant classes.                               | More sensitive to class distribution (penalizes impurity more). |
| **Common Usage** | Default measure in **CART** (Classification and Regression Trees).             | Used in **ID3** and **C4.5** Decision Tree algorithms.          |

In summary:
Both aim to create pure nodes, but Gini Impurity is computationally efficient, while Entropy gives a more information-theoretic perspective. 
In practice, they often lead to similar results.'''
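To see how the two measures behave, the sketch below evaluates both formulas from the table for a two-class node with class proportions p and 1 - p. The function names are illustrative; both measures peak at p = 0.5 (Gini 0.5, Entropy 1.0), matching the binary ranges in the table.

```python
import math

def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy in bits: -sum(p_i * log2(p_i)), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two-class node with class proportions p and 1 - p
for p in [0.1, 0.3, 0.5]:
    probs = [p, 1 - p]
    print(f"p={p:.1f}  gini={gini(probs):.3f}  entropy={entropy(probs):.3f}")
```

Both curves rise and fall together, which is one reason the two criteria usually pick the same or very similar splits.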

In [None]:
#Question 3: What is Pre-Pruning in Decision Trees?
'''Pre-pruning, also known as early stopping, is a technique used to stop the growth of a Decision Tree before it becomes too complex. 
Instead of allowing the tree to grow fully and then pruning it, pre-pruning applies certain conditions during the building process to decide 
when to stop splitting a node.

Common stopping criteria include:

* The node reaches a maximum depth.
* The number of samples in a node is below a set threshold.
* The Information Gain or Gini decrease from a split is too small.
* The node becomes pure (all samples belong to one class).

Pre-pruning helps prevent overfitting, reduces model complexity, and improves generalization by ensuring the tree doesn’t fit noise in the 
training data.'''
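The stopping criteria listed above correspond to hyperparameters of scikit-learn's `DecisionTreeClassifier` (`max_depth`, `min_samples_split`, `min_samples_leaf`, `min_impurity_decrease`). The sketch below, using the Iris dataset as in Question 4, compares a fully grown tree with a pre-pruned one; the threshold values are arbitrary illustrative choices, not recommended defaults.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No stopping criteria: the tree keeps splitting until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: growth stops as soon as any of these conditions is hit
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth
    min_samples_split=10,        # do not split nodes with fewer than 10 samples
    min_samples_leaf=5,          # every leaf must keep at least 5 samples
    min_impurity_decrease=0.01,  # ignore splits that reduce impurity by less than this
    random_state=0,
).fit(X, y)

print("Full tree:       depth =", full_tree.get_depth(), " leaves =", full_tree.get_n_leaves())
print("Pre-pruned tree: depth =", pruned_tree.get_depth(), " leaves =", pruned_tree.get_n_leaves())
```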


In [1]:
#Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).
#Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_.
# Import necessary libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Load sample dataset (Iris dataset)
data = load_iris()
X = data.data        # Features
y = data.target      # Target labels

# Create and train Decision Tree Classifier using Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=0)
model.fit(X, y)

# Print feature importances
print("Feature Importances:")
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.4f}")


Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0133
petal length (cm): 0.0641
petal width (cm): 0.9226


In [None]:
#Question 5: What is a Support Vector Machine (SVM)?
'''A Support Vector Machine (SVM) is a powerful supervised learning algorithm used for classification and regression tasks.
It works by finding the optimal hyperplane that separates the data points of different classes in the feature space.
The goal is to maximize the margin, which is the distance between the hyperplane and the nearest data points from each class; these nearest points are called support vectors.

SVM can also handle non-linear data using the kernel trick, which implicitly maps the data into a higher-dimensional space where it may become linearly separable.
It is widely used because of its strong accuracy, its resistance to overfitting when the regularization parameter C and the kernel parameters are tuned well,
and its effectiveness in high-dimensional spaces such as text classification and image recognition.

🔹 Key Concepts:

* Hyperplane: a line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions) that separates different classes of data.
  SVM looks for the hyperplane that maximizes the margin, i.e. the distance between the hyperplane and the closest data points from each class.
* Support Vectors: the data points closest to the hyperplane. They determine its position and orientation,
  which makes them the most critical points in defining the decision boundary.
* Margin: the gap between the support vectors of different classes.
  A larger margin generally means better generalization and less overfitting.'''
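A minimal sketch with scikit-learn's `SVC` on six hand-made, linearly separable 2D points (hypothetical toy data) illustrates these concepts: `support_vectors_` lists the points that define the boundary, and for a linear kernel `coef_` holds the weight vector w, from which the width of the gap between the support vectors is 2/||w||. The very large `C` is an illustrative choice that approximates a hard margin. For this data the maximum-margin boundary should be the line x1 + x2 = 5.5, so the printed width should come out close to 5/√2 ≈ 3.54.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2D
X = np.array([[1, 1], [2, 1], [1, 2],    # class 0
              [4, 4], [5, 4], [4, 5]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C makes margin violations extremely costly, approximating a hard margin
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

w = clf.coef_[0]                        # weight vector of the hyperplane w·x + b = 0
b = clf.intercept_[0]
margin_width = 2 / np.linalg.norm(w)    # distance between the two margin boundaries

print("Support vectors:\n", clf.support_vectors_)
print("w =", w, " b =", b)
print("Margin width:", round(margin_width, 4))
print("Predictions for (2, 2) and (5, 5):", clf.predict([[2, 2], [5, 5]]))
```

Moving any point that is not a support vector (for example (5, 4)) without crossing the margin would leave the boundary unchanged.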



In [None]:
#Question 6: What is the Kernel Trick in SVM?
r'''The Kernel Trick is a mathematical technique used in Support Vector Machines (SVMs) to handle data that is not linearly separable. Instead of explicitly
transforming the data into a higher-dimensional space, the kernel trick computes the inner products of data points in that higher-dimensional space
without ever performing the transformation.

This lets SVMs learn non-linear decision boundaries efficiently. A kernel function K(x, y) measures the similarity between two data points and equals
their inner product in the transformed space, which enables the SVM to find complex boundaries in the original input space.

Common kernel functions include:

* Linear kernel: $K(x, y) = x \cdot y$
* Polynomial kernel: $K(x, y) = (x \cdot y + c)^d$, where $c \ge 0$ is a constant and $d$ is the polynomial degree
* RBF (Radial Basis Function) kernel: $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$, where $\gamma > 0$ controls how quickly similarity decays with distance

In short, the kernel trick allows SVMs to efficiently solve problems where the data cannot be separated by a straight line (or flat hyperplane),
by implicitly working in a higher-dimensional space.'''
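The identity behind the trick can be checked numerically. For 2D inputs, the degree-2 polynomial kernel equals an ordinary dot product after the explicit feature map φ(x) = (x1², x2², √2·x1·x2, √(2c)·x1, √(2c)·x2, c). The sketch below compares the two, and also checks a hand-written RBF kernel against scikit-learn's `rbf_kernel`. The vectors, c = 1 and gamma = 0.5 are arbitrary illustrative values.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel computed directly in the original 2D space."""
    return (np.dot(x, y) + c) ** d

def phi(x, c=1.0):
    """Explicit degree-2 feature map for a 2D input (6 dimensions)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

implicit = poly_kernel(x, y)          # (1*3 + 2*(-1) + 1)^2 = 4, no 6-D vectors built
explicit = np.dot(phi(x), phi(y))     # the same value, computed in the 6-D space
print(implicit, explicit)

gamma = 0.5
rbf_manual = np.exp(-gamma * np.sum((x - y) ** 2))
rbf_sklearn = rbf_kernel(x.reshape(1, -1), y.reshape(1, -1), gamma=gamma)[0, 0]
print(rbf_manual, rbf_sklearn)
```

The RBF kernel corresponds to an infinite-dimensional feature map, so only the implicit form can be computed; that is exactly where the trick pays off.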

In [2]:
#Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
#Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting on the same dataset
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)

# Make predictions
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate accuracies
acc_linear = accuracy_score(y_test, y_pred_linear)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# Print the results
print(f"Accuracy with Linear Kernel: {acc_linear:.4f}")
print(f"Accuracy with RBF Kernel: {acc_rbf:.4f}")


Accuracy with Linear Kernel: 1.0000
Accuracy with RBF Kernel: 0.8056
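One caveat about this comparison: the RBF kernel depends on Euclidean distances, and the Wine features are on very different scales (proline values are in the hundreds, while hue is around 1), so the unscaled RBF result likely understates what that kernel can do. The sketch below standardizes the features inside a `Pipeline` before fitting each kernel, keeping the same split; its accuracies are not reproduced here and may differ from the numbers above.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for kernel in ["linear", "rbf"]:
    # The scaler is fitted on the training split only (inside the pipeline), so no test data leaks into it
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    results[kernel] = accuracy_score(y_test, model.predict(X_test))
    print(f"Scaled accuracy with {kernel} kernel: {results[kernel]:.4f}")
```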


In [None]:
#Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
r'''The Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes’ Theorem, used mainly for classification tasks.
It calculates the posterior probability of each class given the input features and assigns the class with the highest probability:

$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$

It is highly efficient, works well with large datasets, and is commonly used in text classification, spam detection, and sentiment analysis.
Variants include Gaussian, Multinomial, and Bernoulli Naïve Bayes, depending on the data type.

It is called “Naïve” because it makes the simplifying assumption that all features are independent of each other given the class label;
this assumption is what allows the likelihood to be written as a simple product of per-feature terms. In real-world data, features are often
correlated (for example, the words “buy” and “offer” in spam emails), but Naïve Bayes ignores these dependencies. This keeps the model
mathematically simple and computationally efficient, and despite being “naïve,” it performs surprisingly well in applications like spam
detection and text classification, where exact independence is not crucial for ranking the classes correctly.'''
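The posterior computation described above can be sketched by hand for a toy spam filter: multiply the class prior P(y) by P(word | y) for each word, as if the words were independent, then normalize. The priors and word likelihoods below are hypothetical numbers, not estimated from data, and only the words that appear are scored (a full Bernoulli model would also include terms for absent words).

```python
import math

# Hypothetical class priors and per-word likelihoods P(word | class); not estimated from real data
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"buy": 0.30, "offer": 0.20, "meeting": 0.01},
    "ham":  {"buy": 0.02, "offer": 0.03, "meeting": 0.10},
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    log_scores = {}
    for cls, prior in priors.items():
        # log P(y) + sum of log P(x_i | y): the product of likelihoods, in log space to avoid underflow
        log_scores[cls] = math.log(prior) + sum(math.log(likelihoods[cls][w]) for w in words)
    # Normalize the unnormalized scores so the posteriors sum to 1
    max_log = max(log_scores.values())
    unnorm = {cls: math.exp(s - max_log) for cls, s in log_scores.items()}
    total = sum(unnorm.values())
    return {cls: v / total for cls, v in unnorm.items()}

print(posterior(["buy", "offer"]))   # spam ≈ 0.985
print(posterior(["meeting"]))        # ham ≈ 0.94
```

Working in log space is the usual design choice: multiplying many small probabilities directly would quickly underflow to zero on real documents.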


In [None]:
#Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
'''
All three are types of Naïve Bayes classifiers, differing mainly in the type of data they are designed to handle and how they model feature 
probabilities.

| **Type**                    | **Data Type**               | **Assumption / Distribution**                                           | **Typical Use Case**                                                      |
| --------------------------- | --------------------------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| **Gaussian Naïve Bayes**    | Continuous (numerical) data | Assumes that features follow a **normal (Gaussian) distribution**       | Used for real-valued features like height, weight, or sensor readings     |
| **Multinomial Naïve Bayes** | Discrete count data         | Assumes features represent **frequency counts** (non-negative integers) | Commonly used in **text classification** (e.g., word counts in documents) |
| **Bernoulli Naïve Bayes**   | Binary data (0s and 1s)     | Assumes features are **Boolean** — presence or absence of a feature     | Used for **binary text features** (e.g., word present or not)             |

In summary:

* Gaussian NB → continuous data
* Multinomial NB → count data
* Bernoulli NB → binary data

Each variant models data differently to match the nature of the input features, improving performance and accuracy.'''
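A minimal sketch, using tiny hand-made arrays (hypothetical data), shows each scikit-learn variant fitted on the kind of input it is designed for: real values for `GaussianNB`, word counts for `MultinomialNB`, and presence/absence flags for `BernoulliNB`. With this toy data all three predict class 0 for the query rows, which resemble the class-0 training examples.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: two classes, two samples each
y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB fits a per-class mean and variance for each feature
X_continuous = np.array([[1.2, 3.4], [1.0, 3.1], [5.6, 0.2], [6.1, 0.4]])
# Word counts over a 3-word vocabulary -> MultinomialNB models word frequencies per class
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Presence/absence of each word -> BernoulliNB models binary indicators per class
X_binary = (X_counts > 0).astype(int)

g_pred = GaussianNB().fit(X_continuous, y).predict([[1.1, 3.0]])
m_pred = MultinomialNB().fit(X_counts, y).predict([[2, 0, 1]])
b_pred = BernoulliNB().fit(X_binary, y).predict([[1, 0, 1]])

print(g_pred, m_pred, b_pred)   # each is [0] for this toy data
```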

In [3]:
#Question 10: Breast Cancer Dataset
#Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.
#Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from sklearn.datasets
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data      # Features
y = data.target    # Target labels

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Gaussian Naïve Bayes classifier
model = GaussianNB()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Gaussian Naïve Bayes on Breast Cancer dataset: {accuracy:.4f}")


Accuracy of Gaussian Naïve Bayes on Breast Cancer dataset: 0.9737
