In [None]:
'''Question 1: What is Information Gain, and how is it used in Decision Trees?

Answer: Information Gain (IG) is the metric a Decision Tree uses to decide which feature to split on at each node.
        It measures the reduction in uncertainty (entropy) about the target variable after the dataset is split on a particular feature:
        IG = entropy(parent) - weighted average of entropy(children), where each child is weighted by its share of the parent's samples.
        At each node, the tree evaluates the candidate splits and chooses the one with the highest Information Gain.

        In simple words, it tells us how much information a feature provides about the class label.

Question 2: What is the difference between Gini Impurity and Entropy?

Answer: Gini Impurity measures the probability of misclassifying a randomly chosen data point if it were labeled at random according to the class distribution of the node.

        Entropy measures the uncertainty or randomness in the data using information theory.

        Range (binary classification):
        Gini Impurity: 0 to 0.5
        Entropy: 0 to 1 (with log base 2)

        Gini Impurity

        Faster to compute (no logarithm)
        Focuses on misclassification probability
        Tends to isolate the most frequent class in its own branch

        Entropy

        Based on information theory
        Measures disorder/uncertainty
        More sensitive to changes in class probabilities, especially near pure nodes

Question 3: What is Pre-Pruning in Decision Trees?

Answer: Pre-Pruning (also called early stopping) is a technique that stops a decision tree from growing before it perfectly fits the training data.
        Growth is halted when a stopping condition is met, such as a maximum depth, a minimum number of samples needed to split a node or form a leaf, or a minimum impurity decrease.
        The goal is to prevent overfitting and improve the model’s ability to generalize to unseen data.'''
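
The definitions from Questions 1 and 2 can be checked numerically. The sketch below is a minimal illustration, not a library implementation: the helper names `entropy`, `gini`, and `information_gain` are chosen here, and the ten-label parent node with its two-way split is made-up toy data.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Return the relative frequency of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Shannon entropy in bits (log base 2)."""
    return sum(-p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted_child_entropy = sum(
        len(child) / total * entropy(child) for child in children
    )
    return entropy(parent) - weighted_child_entropy


# Toy node (assumed data): 5 "yes" and 5 "no" labels, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("Parent entropy:", round(entropy(parent), 4))
print("Parent Gini impurity:", round(gini(parent), 4))
print("Information gain of the split:", round(information_gain(parent, [left, right]), 4))
```

A perfectly balanced binary node sits at the top of both ranges (entropy 1, Gini 0.5), and any split that makes the children purer than the parent yields a positive Information Gain.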


In [None]:
'''Question 4: Write a Python program to train a Decision Tree Classifier using Gini'''
# Import required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset (Iris dataset)
data = load_iris()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Decision Tree Classifier using Gini Impurity
dt_model = DecisionTreeClassifier(criterion='gini', random_state=42)

# Train the model
dt_model.fit(X_train, y_train)

# Print feature importances
print("Feature Importances:")
for feature, importance in zip(data.feature_names, dt_model.feature_importances_):
    print(f"{feature}: {importance:.4f}")



Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0167
petal length (cm): 0.9061
petal width (cm): 0.0772
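
The pre-pruning idea from Question 3 can be tried on the same Iris split. This is a hedged sketch: the limits `max_depth=3`, `min_samples_split=10`, and `min_samples_leaf=5` are arbitrary values picked for illustration, not tuned settings, and the names `full_tree` and `pruned_tree` are introduced here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same dataset and split as the Gini example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Unrestricted tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
full_tree.fit(X_train, y_train)

# Pre-pruned tree: growth stops early when any limit is reached
# (the limit values are illustrative assumptions)
pruned_tree = DecisionTreeClassifier(
    criterion="gini",
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42,
)
pruned_tree.fit(X_train, y_train)

# Compare tree size and held-out accuracy
for name, tree in [("Unrestricted", full_tree), ("Pre-pruned", pruned_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

In practice these limits are usually chosen by cross-validation rather than fixed by hand.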


In [None]:
'''Question 5: What is a Support Vector Machine (SVM)?
Answer: A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks.
        Its main objective is to find the decision boundary (hyperplane) that maximizes the margin, i.e. the distance between the boundary and the closest training points of each class (the support vectors).

Question 6: What is the Kernel Trick in SVM?
Answer: The Kernel Trick is a technique that lets an SVM solve non-linearly separable problems by
        implicitly mapping the data into a higher-dimensional feature space, where a linear separation becomes possible, without explicitly computing the transformation.
        The algorithm only needs inner products between pairs of points, and a kernel function returns those inner products directly from the original inputs.'''
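
The maximum-margin idea from Question 5 can be seen on a tiny example. This is a minimal sketch under stated assumptions: the four 2-D points are made-up toy data, and `C=1e6` is used only to approximate a hard-margin SVM. For a linear SVM with weight vector `w`, the margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumed): class 0 lies on x = 0, class 1 lies on x = 2
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A very large C leaves almost no tolerance for margin violations
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = clf.intercept_[0]     # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("Margin width:", margin_width)
print("Support vectors:\n", clf.support_vectors_)
```

Because the two classes are two units apart along the first axis, the widest possible margin here is about 2, with the boundary roughly halfway at x = 1.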

In [None]:
'Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.'

# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling (important for SVM)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Linear SVM
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)

# Train RBF SVM
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate accuracy
linear_accuracy = accuracy_score(y_test, y_pred_linear)
rbf_accuracy = accuracy_score(y_test, y_pred_rbf)

# Print results
print("Linear SVM Accuracy:", linear_accuracy)
print("RBF SVM Accuracy:", rbf_accuracy)


Linear SVM Accuracy: 0.9722222222222222
RBF SVM Accuracy: 1.0
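
The kernel trick from Question 6 can be verified by hand for a kernel with a small explicit feature map. The RBF kernel used above corresponds to an infinite-dimensional feature space, so this sketch swaps in the degree-2 polynomial kernel `(x . z)^2` on 2-D inputs instead. The names `phi` and `poly_kernel` and the two sample points are assumptions made for illustration.

```python
import numpy as np


def phi(x):
    """Explicit feature map for the kernel (x . z)^2 on 2-D inputs."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return np.dot(x, z) ** 2


# Two sample points (assumed)
x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

explicit = np.dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly_kernel(x, z)        # same value without ever building phi

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
print("Equal up to rounding:", np.isclose(explicit, implicit))
```

Both computations give the same inner product, which is why an SVM can work in the mapped space while only ever evaluating the kernel on the original inputs.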


In [None]:
'''Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
Answer: The Naïve Bayes classifier is a supervised probabilistic machine learning algorithm based on Bayes’ Theorem.
It is mainly used for classification tasks, especially text classification, spam detection, and sentiment analysis.
It is called “Naïve” because it makes the strong assumption that all features are conditionally independent given the class label.
This assumption rarely holds exactly in real data, but it reduces the class likelihood to a simple product of per-feature probabilities.

Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
Bayes, and Bernoulli Naïve Bayes
Answer:

Gaussian Naïve Bayes (GNB)

Type of data: Continuous numerical data

Probability model: Gaussian distribution (mean and variance estimated per feature per class)

Examples: Height, weight, salary, temperature

Use cases: Medical diagnosis, sensor data, real-valued datasets

Multinomial Naïve Bayes (MNB)

Type of data: Discrete count data

Probability model: Multinomial distribution over feature counts (word or event frequencies per class)

Examples: Word counts in documents, number of occurrences

Use cases: Text classification, spam detection, document categorization

Bernoulli Naïve Bayes (BNB)

Type of data: Binary (0 or 1)

Probability model: Bernoulli distribution per feature (presence or absence); unlike MNB, the absence of a feature also counts as evidence

Examples: Whether or not a word appears in a document

Use cases: Text classification with binary features'''
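
The three variants from Question 9 differ mainly in the kind of feature values they expect. The sketch below fits each one on a tiny made-up dataset of the matching type; all arrays and variable names (`labels`, `X_continuous`, `X_counts`, `X_binary`) are hypothetical and chosen only to illustrate the pairing of data type and model.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Four toy samples, two per class (assumed data)
labels = np.array([0, 0, 1, 1])

# Continuous measurements -> Gaussian Naive Bayes
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 4.2], [3.1, 3.9]])

# Word counts per document -> Multinomial Naive Bayes
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Word present (1) or absent (0) -> Bernoulli Naive Bayes
X_binary = (X_counts > 0).astype(int)

gaussian_model = GaussianNB().fit(X_continuous, labels)
multinomial_model = MultinomialNB().fit(X_counts, labels)
bernoulli_model = BernoulliNB().fit(X_binary, labels)

# Each query resembles one class in the feature type its model expects
print("GaussianNB:   ", gaussian_model.predict([[1.1, 2.0]]))
print("MultinomialNB:", multinomial_model.predict([[2, 0, 1]]))
print("BernoulliNB:  ", bernoulli_model.predict([[0, 1, 1]]))
```

Passing count data to `GaussianNB` or real-valued data to `MultinomialNB` may still run, but the fitted probabilities would no longer match the model's assumptions.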

In [5]:
'''Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.'''
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Gaussian Naïve Bayes classifier
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.9736842105263158

Confusion Matrix:
 [[40  3]
 [ 0 71]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      0.93      0.96        43
           1       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

