# Supervised Classification

Q1. What is information gain, and how is it used in decision trees?
 - Information gain is a metric used in decision trees to measure how much splitting on a feature reduces uncertainty ("entropy") in the data.
 - It is calculated by subtracting the weighted average of the child nodes' entropies from the entropy of the parent node, where each child is weighted by the fraction of the parent's samples it receives. At each node, the tree splits on the feature with the highest information gain, since that split separates the classes most effectively.
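
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it; the helper names `entropy` and `information_gain` and the toy labels are assumptions made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 4 "yes" and 4 "no" labels.
parent = ["yes"] * 4 + ["no"] * 4

# Split A separates the classes perfectly; split B leaves both children mixed.
split_a = [["yes"] * 4, ["no"] * 4]
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(f"Gain of split A: {gain_a:.3f}")
print(f"Gain of split B: {gain_b:.3f}")
```

A tree builder would prefer split A here, because it removes all of the parent's uncertainty while split B removes none.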

Q2. What is the difference between Gini impurity and entropy?
 - Both measure how mixed the classes in a node are, and in practice they often select similar splits, so the differences below are usually small.
 - Gini Impurity
   - Strengths:
     - Speed: It needs no logarithm, so it is slightly cheaper to compute, which can matter for large datasets and applications where training time is a priority.
     - Simplicity: The formula is easy to calculate and has a direct interpretation as the probability of misclassifying a randomly drawn sample.
   - Weaknesses:
     - Bias: It can favor splits that isolate the majority class, potentially overlooking important splits for minority classes.
     - Sensitivity to noise: It can be more sensitive to noise in the data, which might influence the quality of splits.
 - Entropy
   - Strengths:
     - Theoretical grounding: It is an information-theoretic measure of uncertainty, which can be more robust for certain complex datasets.
     - Imbalanced data: It can handle imbalanced datasets somewhat better by producing more balanced splits.
   - Weaknesses:
     - Slower computation: The logarithm makes it somewhat slower to compute, which can add up on large datasets.
     - Interpretation: "Information gain" based on entropy is more abstract than Gini's misclassification probability, making it less intuitive for beginners.

 - Use Gini Impurity when:
   - You are working with a large dataset and need to prioritize computational efficiency and speed.
   - Your dataset has a fairly balanced class distribution.
   - The interpretability of a probabilistic misclassification error is important.
 - Use Entropy when:
   - You are working with a smaller dataset where the computational overhead is not a concern.
   - Your dataset has a high degree of class imbalance and you want to ensure the tree handles minority classes effectively.
   - You require the most theoretically pure split, even at a slight expense of speed.
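
To make the comparison concrete, the sketch below evaluates both measures on a few two-class distributions. The function names and the example probabilities are illustrative assumptions; the formulas are the standard definitions (Gini: one minus the sum of squared class probabilities; entropy: the negative sum of p·log2(p)).

```python
import math

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical two-class nodes, from pure to evenly mixed.
for probs in [(1.0, 0.0), (0.9, 0.1), (0.7, 0.3), (0.5, 0.5)]:
    print(f"p={probs}: gini={gini(probs):.3f}, entropy={entropy(probs):.3f}")
```

Both measures are zero for a pure node and reach their maximum for an evenly mixed one, which is why they tend to rank candidate splits similarly.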

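Q3. What is pre-pruning in a decision tree?
 - Pre-pruning, also known as early stopping, halts the growth of the decision tree before it is fully developed, typically by setting limits such as a maximum depth or a minimum number of samples required to split a node. This keeps the tree smaller and reduces the risk of overfitting.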
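
As a hedged sketch of what early stopping can look like in scikit-learn, the example below compares an unconstrained tree with one limited by `max_depth` and `min_samples_leaf`. The specific limit values are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until leaves are pure or cannot be split further.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: growth stops early once either limit is reached.
# The limits (depth 2, at least 5 samples per leaf) are illustrative choices.
pruned_tree = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=5, random_state=42
).fit(X, y)

print("Full tree:   depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pruned tree: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```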

In [None]:
'''Write a Python program to train a decision tree classifier using Gini impurity as the criterion and print the feature importances.'''

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create and train a decision tree using Gini impurity as the split criterion
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# Evaluate accuracy on the test set
accuracy = clf.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}")

# Print the importance of each feature
print("\nFeature Importances:")
for feature, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{feature}: {importance:.4f}")

Model Accuracy: 1.00

Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0191
petal length (cm): 0.8933
petal width (cm): 0.0876


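Q5. What is an SVM?
 - A support vector machine (SVM) is a supervised machine learning algorithm that classifies data by finding the line or hyperplane in an N-dimensional feature space that separates the classes with the largest possible margin, i.e., the greatest distance to the nearest training points of each class (the support vectors).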

Q6. What is the kernel trick in SVM?
 - The kernel trick lets a support vector machine (SVM) classify data that is not linearly separable by implicitly mapping it into a higher-dimensional space where it may become linearly separable. Instead of explicitly calculating the coordinates in this new space, a kernel function directly computes the dot products between the mapped data points, which is computationally much cheaper. The SVM can then find a linear hyperplane in the higher-dimensional space that separates the data.
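
The equivalence between the explicit mapping and the kernel function can be checked numerically. The sketch below uses the degree-2 homogeneous polynomial kernel on 2-D points as an assumed example (the helper names `phi`, `dot`, and `poly_kernel` and the two points are made up for illustration).

```python
import math

def phi(v):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return dot(a, b) ** 2

# Two hypothetical 2-D points.
x = (1.0, 2.0)
z = (3.0, 0.5)

explicit = dot(phi(x), phi(z))  # dot product computed in the 3-D mapped space
implicit = poly_kernel(x, z)    # same quantity computed from the original 2-D points

print(f"explicit: {explicit:.6f}")
print(f"implicit: {implicit:.6f}")
```

The two values agree up to floating-point rounding, yet the kernel version never constructs the higher-dimensional vectors. For kernels such as RBF, the corresponding space is infinite-dimensional, so only the kernel form is practical.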

In [None]:
'''Write a Python program to train two SVM classifiers with linear and RBF kernels on the Wine dataset, then compare their accuracies.'''


from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create one SVM classifier per kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_rbf = SVC(kernel='rbf', random_state=42)

# Train both classifiers on the same training data
svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

# Make predictions on the test set
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate and print the accuracy of each kernel
acc_linear = accuracy_score(y_test, y_pred_linear)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

print(f"Accuracy with Linear Kernel: {acc_linear:.4f}")
print(f"Accuracy with RBF Kernel: {acc_rbf:.4f}")

# Report which kernel performed better
if acc_linear > acc_rbf:
    print("\nThe Linear kernel performed better.")
elif acc_rbf > acc_linear:
    print("\nThe RBF kernel performed better.")
else:
    print("\nBoth kernels performed equally well.")

Accuracy with Linear Kernel: 0.9815
Accuracy with RBF Kernel: 0.7593

The Linear kernel performed better.
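
The gap between the two kernels in this run may partly reflect feature scaling rather than the kernels themselves: the Wine features span very different numeric ranges, and RBF kernels are sensitive to that. As a hedged sketch, the example below standardizes the features before fitting the RBF SVM. The `StandardScaler` pipeline is an addition for illustration and is not part of the original exercise.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Assumption: standardize features (zero mean, unit variance) before the SVM.
# The scaler is fit on the training data only, inside the pipeline.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=42))
scaled_rbf.fit(X_train, y_train)

acc_scaled_rbf = scaled_rbf.score(X_test, y_test)
print(f"Accuracy with RBF Kernel (scaled features): {acc_scaled_rbf:.4f}")
```

Comparing this accuracy with the unscaled RBF result gives a fairer picture of how the two kernels behave on this dataset.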


Q8. What is the Naive Bayes classifier, and why is it called naive?

 - The Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes' theorem that is used for classification tasks. It is called "naïve" because it makes a strong and often unrealistic assumption: given the class, all features are independent of each other, so the value of one feature tells you nothing about the value of another.
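
The independence assumption can be seen directly in a hand-rolled calculation. The sketch below uses made-up priors and word likelihoods for a toy spam filter (all numbers and names are hypothetical) and multiplies the per-word likelihoods as if they were independent given the class.

```python
# Hypothetical, hand-picked probabilities for a two-class spam filter.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    # P(word appears | class)
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}

def posterior(words):
    """Return P(class | words) using the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # "Naive" step: multiply per-feature likelihoods as if independent.
            score *= likelihoods[label][word]
        scores[label] = score
    # Normalize so the class probabilities sum to 1 (Bayes' theorem denominator).
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

result = posterior(["free", "meeting"])
print(result)
```

The classifier predicts whichever class has the larger posterior; real implementations estimate the priors and likelihoods from training data instead of fixing them by hand.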

Q9. Explain the difference between Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes

 - Gaussian Naive Bayes
   - How it works: Assumes features follow a Gaussian distribution and estimates the mean and variance of each feature for each class.
   - Common use case: Classification problems with continuous features, such as medical diagnosis.
 - Multinomial Naive Bayes
   - How it works: Calculates the probability of each feature based on its frequency or count.
   - Common use case: Text classification, such as spam detection, based on word frequencies.
 - Bernoulli Naive Bayes
   - How it works: Calculates the probability of a feature being present or absent.
   - Common use case: Text classification, such as spam detection, based on the presence or absence of words, especially with shorter documents.
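
As a rough illustration of which input each variant expects, the sketch below fits all three scikit-learn classes on tiny made-up arrays. The data and variable names are hypothetical and far too small to say anything about real accuracy; the point is only the shape of the features.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # hypothetical class labels

# Continuous measurements -> GaussianNB
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
# Word counts per document -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Word present (1) or absent (0) -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

gaussian_pred = GaussianNB().fit(X_continuous, y).predict(X_continuous)
multinomial_pred = MultinomialNB().fit(X_counts, y).predict(X_counts)
bernoulli_pred = BernoulliNB().fit(X_binary, y).predict(X_binary)

print("GaussianNB:   ", gaussian_pred)
print("MultinomialNB:", multinomial_pred)
print("BernoulliNB:  ", bernoulli_pred)
```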

In [None]:
'''Write a Python program to train a Gaussian Naive Bayes classifier on the Breast Cancer dataset and evaluate its accuracy.'''

# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create and train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

# Optional: show a few predicted vs actual values
print("\nSample Predictions:")
print("Predicted:", y_pred[:10])
print("Actual:   ", y_test[:10])

Model Accuracy: 0.9415

Sample Predictions:
Predicted: [1 0 0 1 1 0 0 0 1 1]
Actual:    [1 0 0 1 1 0 0 0 1 1]
