#Ques/ans(1-3,5,6,8&9)

##1. What is Information Gain, and how is it used in Decision Trees?
   - Information Gain (IG) is a concept from information theory that measures how much uncertainty (entropy) is reduced when a dataset is split on a particular feature. It is computed as the entropy of the parent node minus the size-weighted average entropy of the child nodes produced by the split.
   - In decision trees, Information Gain serves as the splitting criterion: at each node the algorithm evaluates the available features and selects the one with the highest Information Gain, since that split yields the purest child nodes. Repeating this process at every level builds a tree that reduces uncertainty step by step.
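
The calculation can be sketched in a few lines of plain Python. This is only an illustration: the labels and the two candidate splits below are made-up toy data, and the helper names `entropy` and `information_gain` are hypothetical, not part of any library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 4 "yes" and 4 "no" labels (maximum uncertainty).
parent = ["yes"] * 4 + ["no"] * 4

# Split A separates the classes perfectly; split B leaves both children mixed.
split_a = [["yes"] * 4, ["no"] * 4]
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(f"Information Gain for split A: {gain_a:.3f}")
print(f"Information Gain for split B: {gain_b:.3f}")
```

A tree builder using this criterion would pick split A, because it removes all of the parent's uncertainty while split B removes none.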

##2. What is the difference between Gini Impurity and Entropy? (Hint: Directly compares the two main impurity measures, highlighting strengths, weaknesses, and appropriate use cases)
   - Gini Impurity and Entropy are the two most common measures of node impurity in decision trees; both quantify how mixed the class labels are within a node. Entropy, rooted in information theory, measures uncertainty with a logarithmic function, $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the proportion of class $i$ in the node. It is used by algorithms such as ID3 and C4.5 and can sometimes produce slightly more balanced trees, but it is somewhat more expensive to compute because of the logarithm.

   - Gini Impurity, used by the CART algorithm, is the probability of misclassifying a randomly chosen sample if it were labeled at random according to the node's class distribution: $G = 1 - \sum_i p_i^2$. It is faster to compute, works well in practice, and usually produces splits very similar to those of entropy. Because the resulting trees are generally comparable, Gini Impurity is often preferred for large datasets where efficiency matters, whereas Entropy is favored when an information-theoretic interpretation is desired.
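
To see how the two measures behave side by side, the short sketch below evaluates both on a few two-class distributions. The distributions are arbitrary illustrative values, and the helper names `gini` and `entropy` are hypothetical.

```python
import math

def gini(probs):
    """Gini impurity: one minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# From a pure node to a perfectly mixed two-class node.
for probs in [(1.0, 0.0), (0.9, 0.1), (0.7, 0.3), (0.5, 0.5)]:
    print(f"p={probs}: gini={gini(probs):.3f}, entropy={entropy(probs):.3f}")
```

Both measures are zero for a pure node and peak when the classes are evenly mixed (0.5 for Gini and 1 bit for entropy in the two-class case), which is why they tend to rank candidate splits similarly.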

##3. What is Pre-Pruning in Decision Trees?
   - Pre-pruning (also called early stopping) in decision trees is a technique used to prevent overfitting by stopping the tree from growing too deep during training. Instead of allowing the tree to fully expand until all nodes are pure, pre-pruning sets certain conditions that, when met, halt further splitting of a node.

  - In practice, pre-pruning applies rules such as limiting the maximum depth of the tree, requiring a minimum number of samples to split a node, setting a minimum information gain or impurity reduction, or stopping when further splits do not significantly improve performance. By restricting growth early, pre-pruning reduces model complexity, improves generalization to unseen data, and lowers computational cost. However, if the stopping criteria are too strict, it may lead to underfitting, as the tree may stop growing before capturing important patterns in the data.
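
As a rough sketch of these rules in scikit-learn, the stopping conditions map onto constructor parameters of `DecisionTreeClassifier`. The specific threshold values below are arbitrary assumptions chosen for illustration, not tuned recommendations, and the Iris dataset is used only because it ships with the library.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Unrestricted tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early when any condition is hit.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # limit on tree depth
    min_samples_split=10,        # a node needs at least 10 samples to be split
    min_impurity_decrease=0.01,  # a split must reduce impurity by at least this much
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

Comparing depth, leaf count, and test accuracy for the two trees gives a quick feel for the trade-off between complexity and generalization described above.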

##5. What is a Support Vector Machine (SVM)?
   - A **Support Vector Machine (SVM)** is a supervised machine learning algorithm used for classification and regression tasks. For classification, it finds the decision boundary (called a **hyperplane**) that best separates data points of different classes. The main objective of an SVM is to maximize the **margin**, the distance between the hyperplane and the nearest data points from each class; those nearest points are known as **support vectors**. A wider margin generally leads to better generalization on unseen data. SVMs can handle both linear and non-linear data using **kernel functions** (such as linear, polynomial, and radial basis function); the non-linear kernels implicitly map the data into a higher-dimensional space where a clear separation may be possible.
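
The geometry can be illustrated with a hand-picked hyperplane rather than a trained model. In the sketch below, the weights `w`, the bias `b`, and the sample points are all hypothetical values; it assumes the usual scaling in which support vectors satisfy |w·x + b| = 1, so the margin width is 2 / ||w||.

```python
import math

# Hypothetical 2D hyperplane w.x + b = 0, chosen by hand (not learned).
w = (1.0, 1.0)
b = -3.0

def decision_function(x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return w[0] * x[0] + w[1] * x[1] + b

def predict(x):
    """Class label (+1 or -1) from the sign of the decision function."""
    return 1 if decision_function(x) >= 0 else -1

# Under the scaling |w.x + b| = 1 for support vectors, the margin width is 2 / ||w||.
margin_width = 2.0 / math.hypot(*w)

points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
for x in points:
    score = decision_function(x)
    on_margin = math.isclose(abs(score), 1.0)
    print(f"x={x}: score={score:+.1f}, class={predict(x):+d}, support vector={on_margin}")
print(f"Margin width: {margin_width:.4f}")
```

Here the points (1, 1) and (2, 2) lie exactly on the two margin boundaries, so they play the role of support vectors, and the distance between them equals the margin width.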

##6. What is the Kernel Trick in SVM?
   - The **kernel trick** in Support Vector Machines (SVM) is a technique that lets SVMs handle **non-linearly separable data** efficiently by implicitly mapping the input data into a **higher-dimensional feature space** where a linear separating hyperplane can be found. Instead of explicitly computing this transformation (which can be computationally expensive or infeasible), a **kernel function** takes two points in the original space and returns the inner product their images would have in the higher-dimensional space. Common kernel functions include the **polynomial kernel** and the **radial basis function (RBF) kernel**; the **linear kernel** is the plain inner product and corresponds to no mapping at all. Because the SVM optimization depends on the data only through these inner products, the model can learn complex decision boundaries without ever constructing the high-dimensional features.
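
A small numeric check makes the idea concrete. The sketch below uses the degree-2 polynomial kernel K(x, z) = (x·z)² on 2D inputs, for which one valid explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The function names and the two sample points are illustrative assumptions.

```python
import math

def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2D space."""
    return dot(x, z) ** 2

def explicit_map(x):
    """Explicit 3D feature map whose inner product equals poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x = (1.0, 2.0)
z = (3.0, 4.0)

via_kernel = poly_kernel(x, z)
via_mapping = dot(explicit_map(x), explicit_map(z))

print(f"Kernel in original space:  {via_kernel:.4f}")
print(f"Inner product after map:   {via_mapping:.4f}")
```

Both routes give the same number, but the kernel never builds the 3D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the shortcut is the only practical option.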

##8. What is the Naïve Bayes classifier, and why is it called "Naïve"?
   - The **Naïve Bayes classifier** is a **probabilistic machine learning algorithm** used for classification tasks based on **Bayes’ Theorem**. It predicts the probability that a given data point belongs to a particular class by combining the prior probability of the class with the likelihood of the features given that class. In essence, it calculates:

$$P(\text{Class} \mid \text{Features}) = \frac{P(\text{Features} \mid \text{Class}) \cdot P(\text{Class})}{P(\text{Features})}$$

   - It is called **“Naïve”** because it **assumes that all features are independent of each other given the class**, which is often not true in real-world data. Despite this strong (and sometimes unrealistic) assumption, Naïve Bayes often performs surprisingly well, especially in applications like **spam detection, text classification, and sentiment analysis**, due to its simplicity and efficiency.

   In short: **“Naïve” = assumes feature independence**, and **“Bayes” = uses Bayes’ theorem to calculate probabilities**.
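
A tiny worked example shows how the independence assumption turns Bayes' theorem into a simple product. All probabilities below are invented for illustration (a hypothetical two-word spam filter), and `posterior` is a hypothetical helper, not a library function.

```python
# Hypothetical prior class probabilities.
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical per-class word likelihoods P(word | class).
likelihoods = {
    "spam": {"free": 0.6, "meeting": 0.1},
    "ham": {"free": 0.05, "meeting": 0.4},
}

def posterior(words):
    """Posterior P(class | words) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # Independence: multiply the per-word likelihoods together.
            score *= likelihoods[label][word]
        scores[label] = score
    evidence = sum(scores.values())  # P(Features), the normalizing constant
    return {label: score / evidence for label, score in scores.items()}

result = posterior(["free", "meeting"])
for label, prob in result.items():
    print(f"P({label} | free, meeting) = {prob:.3f}")
```

The unnormalized scores are 0.4 × 0.6 × 0.1 = 0.024 for spam and 0.6 × 0.05 × 0.4 = 0.012 for ham, so after dividing by their sum the message is classified as spam with probability 2/3.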

##9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
   - The three main types of Naïve Bayes classifiers differ primarily in the type of data they are designed to handle and the probability model they use. **Gaussian Naïve Bayes** is used for continuous numerical features and assumes that each feature follows a **normal (Gaussian) distribution**, making it suitable for data like heights, weights, or medical measurements.

  - **Multinomial Naïve Bayes** is designed for **discrete count data**, such as word frequencies in text documents, and models the likelihood of features using a **multinomial distribution**, making it common in text classification and spam detection.

   - **Bernoulli Naïve Bayes**, on the other hand, handles **binary features**, representing the presence or absence of an attribute, and uses a **Bernoulli distribution**; it is often applied in text tasks where features indicate whether a word occurs in a document or not. In short, Gaussian is for continuous data, Multinomial for count-based data, and Bernoulli for binary data.
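
The three variants differ only in how they compute P(Features | Class). The sketch below writes out each likelihood by hand under simplifying assumptions: the parameter values are invented, the function names are hypothetical, and the multinomial coefficient is omitted because it is the same for every class and does not affect which class wins.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Gaussian NB: density of a continuous value under a per-class normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def multinomial_likelihood(counts, word_probs):
    """Multinomial NB (unnormalized): product of p_i ** count_i over the vocabulary."""
    result = 1.0
    for count, p in zip(counts, word_probs):
        result *= p ** count
    return result

def bernoulli_likelihood(present, word_probs):
    """Bernoulli NB: p_i if the word is present, (1 - p_i) if it is absent."""
    result = 1.0
    for is_present, p in zip(present, word_probs):
        result *= p if is_present else (1.0 - p)
    return result

# Hypothetical per-class parameters for a three-word vocabulary.
word_probs = [0.5, 0.3, 0.2]

g = gaussian_likelihood(0.0, mean=0.0, var=1.0)           # continuous feature
m = multinomial_likelihood([2, 0, 1], word_probs)         # word counts
b = bernoulli_likelihood([True, False, True], word_probs) # presence/absence

print(f"Gaussian:    {g:.4f}")
print(f"Multinomial: {m:.4f}")
print(f"Bernoulli:   {b:.4f}")
```

Note that the Bernoulli version multiplies in a factor for the absent word as well, whereas the multinomial version ignores words with a count of zero; this is one practical difference between the two on text data.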



#practical ques /ans(4,7&10)

In [1]:
#4. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical). Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_. (Include your Python code and output in the code box below.)
# Decision Tree Classifier using Gini Impurity
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names

# Train Decision Tree with Gini criterion
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Print feature importances
print("Feature Importances (Gini Impurity):")
for feature, importance in zip(feature_names, importances):
    print(f"{feature}: {importance:.4f}")

Feature Importances (Gini Impurity):
sepal length (cm): 0.0133
sepal width (cm): 0.0000
petal length (cm): 0.5641
petal width (cm): 0.4226

In [2]:
#7. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
#Hint:Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting on the same dataset.
#(Include your Python code and output in the code box below.)

# SVM with Linear and RBF Kernels on Wine Dataset

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
linear_accuracy = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print accuracies
print("Accuracy with Linear Kernel:", linear_accuracy)
print("Accuracy with RBF Kernel:", rbf_accuracy)

Accuracy with Linear Kernel: 0.9814814814814815
Accuracy with RBF Kernel: 0.7592592592592593

In [3]:
#10. Breast Cancer Dataset
#Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.
#Hint:Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from sklearn.datasets.
#(Include your Python code and output in the code box below.)

# Gaussian Naïve Bayes on Breast Cancer Dataset

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Initialize Gaussian Naïve Bayes classifier
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of Gaussian Naïve Bayes:", accuracy)

Accuracy of Gaussian Naïve Bayes: 0.9415204678362573