# 1. What is Information Gain, and how is it used in Decision Trees?
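Information Gain measures the reduction in entropy (uncertainty about the class label) achieved by splitting a dataset on a feature: the parent node's entropy minus the weighted average entropy of the child nodes. When building a decision tree, the split with the highest information gain is chosen at each node because it produces the purest child nodes.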
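
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than how scikit-learn implements it; the helper names `entropy` and `information_gain` and the toy labels are assumptions made up for this example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, mask):
    """Parent entropy minus the weighted entropy of the two children (mask / ~mask)."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Hypothetical toy labels: four positives followed by four negatives
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
perfect_split = np.array([True, True, True, True, False, False, False, False])
useless_split = np.array([True, False, True, False, True, False, True, False])

gain_perfect = information_gain(y, perfect_split)  # children are pure
gain_useless = information_gain(y, useless_split)  # children mirror the parent
print("Gain (perfect split):", gain_perfect)
print("Gain (useless split):", gain_useless)
```

A split that isolates each class recovers all of the parent's entropy, while a split whose children have the same class mix as the parent gains nothing.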

# 2. What is the difference between Gini Impurity and Entropy?
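Gini Impurity measures the probability that a randomly chosen sample would be misclassified if it were labeled at random according to the node's class distribution, while Entropy measures the average uncertainty (disorder) of that distribution, a concept from information theory. Gini is slightly cheaper to compute because it needs no logarithm; in practice the two criteria often produce similar trees.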
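
To make the comparison concrete, the sketch below evaluates both measures on a few class distributions. The function names `gini` and `entropy` and the example distributions are illustrative assumptions, not part of any library.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(p):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Pure node, mostly pure node, and maximally mixed two-class node
for dist in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(dist, "gini =", round(gini(dist), 4), "entropy =", round(entropy(dist), 4))
```

Both measures are zero for a pure node and peak when the classes are evenly mixed; for two classes the peak is 0.5 for Gini and 1 bit for entropy.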

# 3. What is Pre-Pruning in Decision Trees?
Pre-pruning (early stopping) halts the growth of a decision tree while it is being built by imposing limits such as a maximum depth, a minimum number of samples per node, or a minimum impurity decrease. Keeping the tree small helps prevent overfitting.
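
In scikit-learn, these limits are constructor arguments of `DecisionTreeClassifier`. The sketch below compares an unconstrained tree with a pre-pruned one on the iris data; the specific values (`max_depth=2`, `min_samples_leaf=5`, `random_state=0`) are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until leaves are pure or cannot be split further
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: growth stops as soon as either limit would be violated
pruned_tree = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=5, random_state=0
).fit(X, y)

print("Full tree   depth/leaves:", full_tree.get_depth(), full_tree.get_n_leaves())
print("Pruned tree depth/leaves:", pruned_tree.get_depth(), pruned_tree.get_n_leaves())
```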

# 4. Decision Tree Classifier using Gini Impurity

In [1]:

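from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the iris features (X) and class labels (y)
X, y = load_iris(return_X_y=True)

# Fit a tree that scores candidate splits by Gini impurity.
# No random_state is set, so importances may differ slightly between runs.
model = DecisionTreeClassifier(criterion='gini')
model.fit(X, y)

# Importances sum to 1; larger values mean the feature reduced impurity more
print("Feature Importances:", model.feature_importances_)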


Feature Importances: [0.01333333 0.01333333 0.05072262 0.92261071]


# 5. What is a Support Vector Machine (SVM)?
SVM is a supervised learning algorithm that finds the hyperplane separating the classes with the maximum margin, i.e. the greatest distance to the nearest training points of each class (the support vectors).
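
As a small illustration of the margin, the sketch below fits a linear SVM on a hypothetical, linearly separable toy set and computes the margin width as 2 / ||w||. The data points and the large `C` (used to approximate a hard margin) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 sits at x1 = 0, class 1 at x1 = 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C penalizes margin violations heavily (close to a hard margin)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]            # normal vector of the separating hyperplane
b = clf.intercept_[0]       # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", margin_width)
```

Because the two classes are two units apart along the first axis, the computed margin width should come out close to 2.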

# 6. What is the Kernel Trick in SVM?
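The kernel trick lets an SVM behave as if the data had been mapped into a higher-dimensional space without ever computing that mapping: a kernel function returns the inner product of two points in that space directly from the original inputs. This enables classification of data that is not linearly separable in the original space.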
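
The idea can be checked numerically with a degree-2 polynomial kernel, whose explicit feature map for 2-D inputs is small enough to write out. The functions `poly_kernel` and `phi` and the two sample points are illustrative assumptions.

```python
import numpy as np

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: (x . z)^2."""
    return float(np.dot(x, z) ** 2)

def phi(x):
    """Explicit 3-D feature map for 2-D input that corresponds to poly_kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

via_kernel = poly_kernel(x, z)                 # computed in the original 2-D space
via_mapping = float(np.dot(phi(x), phi(z)))    # computed in the mapped 3-D space
print("Kernel value:       ", via_kernel)
print("Explicit inner prod:", via_mapping)
```

Both routes give the same number, but the kernel never builds the mapped vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the explicit route is not available at all.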

# 7. SVM Classifiers with Linear and RBF Kernels

In [2]:

from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the wine features (X) and class labels (y)
X, y = load_wine(return_X_y=True)

# Hold out 30% of the samples for testing; random_state fixes the split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Same data, two kernels; all other hyperparameters are left at their defaults
svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')

svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

# Evaluate both models on the held-out test set
pred_linear = svm_linear.predict(X_test)
pred_rbf = svm_rbf.predict(X_test)

print("Linear Kernel Accuracy:", accuracy_score(y_test, pred_linear))
print("RBF Kernel Accuracy:", accuracy_score(y_test, pred_rbf))


Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
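
The RBF score above is noticeably lower than the linear one. A likely cause, offered here as an assumption rather than something the original cell verifies, is that the wine features have very different scales, and the RBF kernel is sensitive to that because it relies on distances between points. The sketch below repeats the RBF fit with and without standardization, using the same split; the `StandardScaler` pipeline is an addition that is not in the original cell.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# RBF SVM on raw features, as in the cell above
raw_rbf = SVC(kernel="rbf").fit(X_train, y_train)

# Same model, but each feature is standardized using training-set statistics
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

raw_acc = raw_rbf.score(X_test, y_test)
scaled_acc = scaled_rbf.score(X_test, y_test)
print("RBF accuracy (raw features):   ", raw_acc)
print("RBF accuracy (scaled features):", scaled_acc)
```

Putting the scaler inside the pipeline ensures it is fitted on the training split only, so no information from the test set leaks into preprocessing.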


# 8. What is the Naïve Bayes classifier, and why is it called 'Naïve'?
Naïve Bayes is a probabilistic classifier based on Bayes' Theorem. It is called naïve because it assumes that the features are conditionally independent of each other given the class, an assumption that rarely holds exactly in real data.
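
Written out, the independence assumption lets the class posterior factor into per-feature terms. The notation here ($y$ for the class, $x_1, \dots, x_n$ for the features) is chosen for this illustration:

$$
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

Each factor $P(x_i \mid y)$ can be estimated separately from the training data, which is what keeps the model simple and fast to train.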

# 9. Differences between Gaussian, Multinomial, and Bernoulli Naïve Bayes
Gaussian NB models continuous features with a per-class normal distribution, Multinomial NB models count data such as word frequencies in text, and Bernoulli NB models binary (present/absent) features.
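
The sketch below pairs each variant with the kind of input it expects. All three arrays are tiny, made-up toy data used only to show the matching between data type and estimator.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0, 2.1], [0.9, 1.9], [3.0, 4.2], [3.1, 3.8]])

# Non-negative counts (e.g. word counts per document) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Binary presence/absence flags -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

preds = {
    "GaussianNB": GaussianNB().fit(X_cont, y).predict(X_cont),
    "MultinomialNB": MultinomialNB().fit(X_counts, y).predict(X_counts),
    "BernoulliNB": BernoulliNB().fit(X_binary, y).predict(X_binary),
}
for name, pred in preds.items():
    print(name, pred)
```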

# 10. Gaussian Naïve Bayes on Breast Cancer Dataset

In [3]:

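from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the breast cancer features (X) and binary labels (y)
X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the samples for testing; random_state fixes the split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a per-class Gaussian to each (continuous) feature
model = GaussianNB()
model.fit(X_train, y_train)

# Evaluate on the held-out test set
pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))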


Accuracy: 0.9415204678362573
