#Supervised Classification: Decision Trees, SVM, and Naive Bayes

#**Assignment**

#Question 1 : What is Information Gain, and how is it used in Decision Trees?

# Answer ->
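Information Gain is a measure used in decision tree algorithms to determine how well a feature splits the data into different classes. It is the reduction in entropy (uncertainty) obtained by splitting the dataset on a particular feature: the entropy of the parent node minus the size-weighted average entropy of the child nodes, $IG(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$.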

In decision trees, the feature with the highest Information Gain is selected as the splitting node because it provides the most useful information for classification. This process is repeated at each node to build the complete decision tree.
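
The calculation can be illustrated with a minimal sketch in plain Python. The labels and the two candidate splits below are hypothetical toy data, and the helper names `entropy` and `information_gain` are chosen for illustration only. Split A separates the classes perfectly, so it should score a higher gain than split B, which leaves both children as mixed as the parent.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 4 "yes" and 4 "no" labels
parent = ["yes"] * 4 + ["no"] * 4

# Split A separates the classes perfectly; split B leaves both children mixed
split_a = [["yes"] * 4, ["no"] * 4]
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(f"Gain of split A: {gain_a:.3f}")
print(f"Gain of split B: {gain_b:.3f}")
```

A tree builder would pick split A here, since it removes all of the parent's uncertainty while split B removes none.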

  

#Question 2: What is the difference between Gini Impurity and Entropy?

Hint: Directly compares the two main impurity measures, highlighting strengths,
weaknesses, and appropriate use cases.

**Answer** ->

  Gini Impurity and Entropy are both measures used in decision tree algorithms to evaluate how impure or mixed a dataset is. They help in selecting the best feature for splitting the data.

Gini Impurity measures the probability that a randomly chosen sample would be misclassified if it were labeled at random according to the class distribution of the node. It avoids the logarithm, so it is slightly cheaper to compute, and it is commonly used in the CART algorithm.

Entropy measures the uncertainty of the class distribution using a logarithmic formula. It comes from information theory and is the basis of Information Gain, which is used in algorithms like ID3 and C4.5.

In practice, the two measures usually lead to very similar trees. Gini Impurity is a reasonable default when efficiency and speed are important, while Entropy is the natural choice when splits are evaluated by Information Gain.

| Aspect        | Gini Impurity                    | Entropy                            |
| ------------- | -------------------------------- | ---------------------------------- |
| Concept       | Probability of misclassification | Measure of uncertainty             |
| Formula       | $1 - \sum_i p_i^2$               | $-\sum_i p_i \log_2 p_i$           |
| Speed         | Faster to compute                | Slower due to log                  |
| Used in       | CART                             | ID3, C4.5                          |
| Best use case | Large datasets                   | When information gain is important |
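
To make the comparison concrete, here is a small sketch that evaluates both formulas on a few two-class distributions. The distributions are hypothetical, and the function names `gini` and `entropy` are illustrative rather than taken from any library.

```python
import math

def gini(probabilities):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p ** 2 for p in probabilities)

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical two-class nodes, from pure to evenly mixed
distributions = [(1.0, 0.0), (0.9, 0.1), (0.7, 0.3), (0.5, 0.5)]
results = [(dist, gini(dist), entropy(dist)) for dist in distributions]

for dist, g, h in results:
    print(f"p = {dist}: gini = {g:.3f}, entropy = {h:.3f}")
```

Both measures are zero for a pure node and peak for an evenly mixed one (0.5 for Gini and 1 bit for Entropy with two classes), which is why they tend to rank candidate splits similarly.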



#Question 3: What is Pre-Pruning in Decision Trees?

Answer:->

 Pre-Pruning is a technique used in decision trees to stop the growth of the tree at an early stage in order to prevent overfitting. In this method, the decision tree is restricted from creating further splits if certain predefined conditions are met.

Common stopping criteria in pre-pruning include setting a maximum tree depth, minimum number of samples required to split a node, or a minimum information gain threshold. By limiting the complexity of the tree, pre-pruning helps improve model generalization and reduces computational cost.
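
The stopping criteria listed above map onto constructor parameters of scikit-learn's `DecisionTreeClassifier`. The following is a minimal sketch on the Iris dataset; the specific limit values (`max_depth=3`, `min_samples_split=10`, `min_impurity_decrease=0.01`) are illustrative assumptions, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Unrestricted tree: keeps splitting as long as a node can still be split
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early when any limit is reached
# (the limit values below are illustrative, not tuned)
pruned_tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=10,
    min_impurity_decrease=0.01,
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("Full", full_tree), ("Pre-pruned", pruned_tree)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```

Comparing depth, leaf count, and test accuracy side by side shows how much simpler the pre-pruned tree is and whether that simplicity costs any accuracy on this split.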

#Question 4:Write a Python program to train a Decision Tree Classifier using Gini
#Impurity as the criterion and print the feature importances (practical).
#Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_.
(Include your Python code and output in the code box below.)

Answer  ->


In [1]:
# Import required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Decision Tree Classifier using Gini Impurity
dt = DecisionTreeClassifier(criterion='gini', random_state=42)
dt.fit(X_train, y_train)

# Print feature importances
print("Feature Importances:")
for feature, importance in zip(data.feature_names, dt.feature_importances_):
    print(f"{feature}: {importance:.4f}")


Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0167
petal length (cm): 0.9061
petal width (cm): 0.0772


#Question 5: What is a Support Vector Machine (SVM)?

Answer->  

 A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It works by finding an optimal hyperplane that best separates data points of different classes in a high-dimensional space.



SVM focuses on the data points closest to the decision boundary, called support vectors, and maximizes the margin between classes. It can handle both linear and non-linear data using kernel functions such as linear, polynomial, and radial basis function (RBF).
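
A small sketch can show the support vectors and the margin on a separable problem. The six 2-D points below are hypothetical toy data, and the large `C` value is an assumption used to approximate a hard-margin SVM. For a linear kernel, the margin width is 2 divided by the norm of the weight vector.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D toy data: class 0 on the left, class 1 on the right
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.5],
              [4.0, 0.0], [4.0, 1.0], [3.0, 0.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                          # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)    # distance between the two margins

print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", round(margin_width, 3))
```

Only the points nearest the boundary should appear as support vectors; moving any of the other points slightly would leave the fitted hyperplane unchanged.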


#Question 6: What is the Kernel Trick in SVM?

Answer ->

The Kernel Trick is a technique used in Support Vector Machines (SVM) to handle data that is not linearly separable. A kernel function computes the inner product of two data points as if they had been mapped into a higher-dimensional feature space, where a linear separation may become possible, without explicitly computing the transformation.

Because the SVM optimization depends on the data only through these inner products, kernel functions such as polynomial and radial basis function (RBF) let the SVM find a separating hyperplane in that feature space without paying the cost of constructing the high-dimensional features.
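
The idea can be checked numerically with a minimal sketch in plain Python. It uses the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2-D inputs, for which an explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ is known. The function names and the two example points are chosen for illustration only.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

x, z = (1.0, 2.0), (3.0, 4.0)   # arbitrary example points

kernel_value = poly_kernel(x, z)                      # no mapping needed
explicit_value = dot(feature_map(x), feature_map(z))  # maps to 3-D first

print(kernel_value, explicit_value)
```

The two values agree up to floating-point rounding, yet the kernel version never builds the 3-D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel form is the only practical option.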

#Question 7: Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.
Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting
on the same dataset.
(Include your Python code and output in the code box below.)

Answer ->  

In [2]:
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate accuracies
linear_accuracy = accuracy_score(y_test, y_pred_linear)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print results
print("Linear Kernel Accuracy:", linear_accuracy)
print("RBF Kernel Accuracy:", rbf_accuracy)


Linear Kernel Accuracy: 1.0
RBF Kernel Accuracy: 0.8055555555555556
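
A likely reason the RBF kernel scores lower here is that the Wine features are on very different scales, and the RBF kernel depends on distances between samples. The sketch below repeats the comparison with standardized features; using `StandardScaler` inside a pipeline is an added assumption and not part of the original question.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaled_accuracies = {}
for kernel in ("linear", "rbf"):
    # The scaler is fitted on the training split only, inside the pipeline
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scaled_accuracies[kernel] = model.score(X_test, y_test)
    print(f"{kernel} kernel with scaling: {scaled_accuracies[kernel]:.4f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the scaling step, so the comparison between kernels stays fair.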


#Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

 Answer ->

 The Naïve Bayes classifier is a supervised machine learning algorithm based on Bayes’ Theorem and is mainly used for classification tasks. It calculates the probability of a data point belonging to a particular class based on the probabilities of its features.


It is called “Naïve” because it assumes that all features are conditionally independent of each other given the class, which is a strong and often unrealistic assumption. Despite this simplification, Naïve Bayes performs well in many real-world applications such as text classification, spam detection, and sentiment analysis.
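
The independence assumption can be seen in a hand-rolled sketch: the score for each class is the prior multiplied by one likelihood per feature. All probabilities below are hypothetical values for a two-word spam filter, and the `posterior` helper is illustrative only.

```python
# Hypothetical probabilities for a two-word spam filter (not from real data)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.1, "meeting": 0.5},
}

def posterior(words):
    """Posterior class probabilities under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # Independence assumption: multiply the per-word likelihoods
            score *= likelihoods[label][word]
        scores[label] = score
    total = sum(scores.values())  # normalizing constant from Bayes' Theorem
    return {label: score / total for label, score in scores.items()}

result = posterior(["free", "meeting"])
print({label: round(p, 3) for label, p in result.items()})
```

With these numbers the unnormalized scores are 0.032 for spam and 0.030 for ham, so the message is classified as spam with a posterior of about 0.516.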


#Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
#Bayes, and Bernoulli Naïve Bayes


Answer ->

 Gaussian, Multinomial, and Bernoulli Naïve Bayes are different variants of the Naïve Bayes classifier, each designed for specific types of data distributions.

Gaussian Naïve Bayes assumes that the features follow a normal (Gaussian) distribution. It is mainly used for continuous numerical data such as height, weight, or sensor values.

Multinomial Naïve Bayes is used for discrete count data. It is widely applied in text classification problems like spam detection, where features represent word frequencies or term counts.

Bernoulli Naïve Bayes works with binary features, where the presence or absence of a feature is considered. It is useful when features are represented as yes/no or 0/1 values, such as whether a word appears in a document or not.
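
The three variants share the same scikit-learn interface and differ mainly in the kind of feature matrix they expect. The sketch below fits each one on a tiny hypothetical dataset of the matching type; the arrays are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # hypothetical labels shared by all three toy sets

# Continuous measurements -> GaussianNB
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 4.2], [3.1, 3.9]])
gaussian_pred = GaussianNB().fit(X_continuous, y).predict([[1.1, 2.0]])

# Word counts per document -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
multinomial_pred = MultinomialNB().fit(X_counts, y).predict([[2, 0, 1]])

# Word present (1) or absent (0) -> BernoulliNB
X_binary = (X_counts > 0).astype(int)
bernoulli_pred = BernoulliNB().fit(X_binary, y).predict([[1, 0, 1]])

print(gaussian_pred, multinomial_pred, bernoulli_pred)
```

Each query point resembles the class 0 training rows, so all three models should assign it to class 0. The binary matrix is derived from the counts to show that the same text data can feed either the Multinomial or the Bernoulli variant, depending on whether frequency or mere presence is modeled.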

#Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.
Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from
sklearn.datasets.
(Include your Python code and output in the code box below.)



In [3]:
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("Accuracy of Gaussian Naive Bayes Classifier:", accuracy)


Accuracy of Gaussian Naive Bayes Classifier: 0.9736842105263158
