## Q5. Use the Naive Bayes, K-Nearest Neighbors, and Decision Tree classification algorithms to build classifiers. Divide the dataset into training and test sets. Compare the accuracy of the different classifiers in the following situations:

5.1 a) Training set = 75%, Test set = 25% b) Training set = 66.6% (two-thirds of the total), Test set = 33.3%

5.2 Training set is chosen by i) hold out method ii) Random subsampling iii) Cross-Validation. Compare the accuracy of the classifiers obtained.

5.3 Data is scaled to standard format.


In [None]:
# Import the required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

In [None]:
# Load the breast cancer dataset
data = load_breast_cancer()

#### 5.1 a)

In [None]:
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=42)


In [None]:
# Creating the classifiers
nb = GaussianNB()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()

In [None]:
# Train the classifiers
nb.fit(X_train, y_train)
knn.fit(X_train, y_train)
dt.fit(X_train, y_train)

In [None]:
# Evaluate the classifiers on the test set
nb_acc = accuracy_score(y_test, nb.predict(X_test))
knn_acc = accuracy_score(y_test, knn.predict(X_test))
dt_acc = accuracy_score(y_test, dt.predict(X_test))

print("Naive Bayes accuracy:", nb_acc)
print("K-Nearest Neighbors accuracy:", knn_acc)
print("Decision Tree accuracy:", dt_acc)

Naive Bayes accuracy: 0.958041958041958
K-Nearest Neighbors accuracy: 0.965034965034965
Decision Tree accuracy: 0.951048951048951


#### 5.1 b)

In [None]:
# Split the dataset into training and test sets
# (test_size=0.33 approximates the one-third test split; test_size=1/3 would give an exact third)
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.33, random_state=42)

In [None]:
# Create and train the classifiers
nb = GaussianNB()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()

nb.fit(X_train, y_train)
knn.fit(X_train, y_train)
dt.fit(X_train, y_train)


In [None]:

# Evaluate the classifiers on the test set
nb_acc = accuracy_score(y_test, nb.predict(X_test))
knn_acc = accuracy_score(y_test, knn.predict(X_test))
dt_acc = accuracy_score(y_test, dt.predict(X_test))


In [None]:
print("Naive Bayes accuracy:", nb_acc)
print("K-Nearest Neighbors accuracy:", knn_acc)
print("Decision Tree accuracy:", dt_acc)

Naive Bayes accuracy: 0.9414893617021277
K-Nearest Neighbors accuracy: 0.9521276595744681
Decision Tree accuracy: 0.9148936170212766
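
The two splits in 5.1 differ in how many test samples they leave, which explains the different denominators behind the accuracies above. The sketch below works out those sizes for the 569-sample breast cancer dataset, assuming that a fractional `test_size` is rounded up to a whole number of test samples (this matches the 143 and 188 test samples implied by the outputs). The helper name `split_sizes` is hypothetical and is not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test set size is rounded up, and the training set
    receives the remaining samples.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n = 569  # number of samples in the breast cancer dataset
for frac in (0.25, 0.33, 1 / 3):
    n_train, n_test = split_sizes(n, frac)
    print(f"test_size={frac:.4f}: train={n_train}, test={n_test}")
```

With `test_size=0.33` the test set has 188 samples, whereas an exact one-third split (`test_size=1/3`) would give 190, so the 5.1 b) figures are for a split that is slightly smaller than the one requested.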


#### 5.2 i) Hold-out method

In [None]:
# Split the dataset into training and test sets using the holdout method
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=42)


In [None]:
# Create and train the classifiers
nb = GaussianNB()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()

nb.fit(X_train, y_train)
knn.fit(X_train, y_train)
dt.fit(X_train, y_train)


In [None]:
# Evaluate the classifiers on the test set
nb_acc = accuracy_score(y_test, nb.predict(X_test))
knn_acc = accuracy_score(y_test, knn.predict(X_test))
dt_acc = accuracy_score(y_test, dt.predict(X_test))

In [None]:
# Report the hold-out accuracies
print("Holdout method:")
print("Naive Bayes accuracy:", nb_acc)
print("K-Nearest Neighbors accuracy:", knn_acc)
print("Decision Tree accuracy:", dt_acc)

Holdout method:
Naive Bayes accuracy: 0.958041958041958
K-Nearest Neighbors accuracy: 0.965034965034965
Decision Tree accuracy: 0.951048951048951


#### 5.2 ii) Random subsampling

In [None]:
# Lists that collect the accuracy of each random-subsampling run
nb_accs = []
knn_accs = []
dt_accs = []


In [None]:
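# Repeat the 75/25 split 10 times; no random_state is set, so every run
# (and every execution of this cell) draws a different split
for i in range(10):
    X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25)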
    
    nb = GaussianNB()
    knn = KNeighborsClassifier()
    dt = DecisionTreeClassifier()

    nb.fit(X_train, y_train)
    knn.fit(X_train, y_train)
    dt.fit(X_train, y_train)

    nb_acc = accuracy_score(y_test, nb.predict(X_test))
    knn_acc = accuracy_score(y_test, knn.predict(X_test))
    dt_acc = accuracy_score(y_test, dt.predict(X_test))
    
    nb_accs.append(nb_acc)
    knn_accs.append(knn_acc)
    dt_accs.append(dt_acc)

In [None]:

# Report the mean accuracy over the 10 random splits
print("Random subsampling:")
print("Naive Bayes accuracy:", sum(nb_accs) / len(nb_accs))
print("K-Nearest Neighbors accuracy:", sum(knn_accs) / len(knn_accs))
print("Decision Tree accuracy:", sum(dt_accs) / len(dt_accs))

Random subsampling:
Naive Bayes accuracy: 0.9314685314685315
K-Nearest Neighbors accuracy: 0.9237762237762237
Decision Tree accuracy: 0.9132867132867133
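
Because the loop above draws unseeded splits, its averages change from run to run. One possible way to make random subsampling repeatable is scikit-learn's `ShuffleSplit` combined with `cross_val_score`, sketched below. The seed value 42, the `models` dictionary, and the `subsampling_scores` name are illustrative assumptions, so the numbers this produces will not match the output above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Ten independent 75/25 splits; the seed (an assumed value) makes them repeatable
splitter = ShuffleSplit(n_splits=10, test_size=0.25, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    # Seeded so that tie-breaking inside the tree is repeatable too
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

subsampling_scores = {}
for name, model in models.items():
    # For classifiers the default score is accuracy
    scores = cross_val_score(model, X, y, cv=splitter)
    subsampling_scores[name] = scores
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

Reporting the standard deviation next to the mean gives a rough sense of how much a single hold-out estimate can vary with the choice of split.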


#### 5.2 iii) Cross-validation method

In [None]:
# Evaluate the classifiers using cross-validation
from sklearn.model_selection import cross_val_score

nb = GaussianNB()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()

In [None]:
nb_accs = cross_val_score(nb, data.data, data.target, cv=10)
knn_accs = cross_val_score(knn, data.data, data.target, cv=10)
dt_accs = cross_val_score(dt, data.data, data.target, cv=10)


In [None]:
print("Cross-validation:")
print("Naive Bayes accuracy:", nb_accs.mean())
print("K-Nearest Neighbors accuracy:", knn_accs.mean())
print("Decision Tree accuracy:", dt_accs.mean())

Cross-validation:
Naive Bayes accuracy: 0.9367794486215537
K-Nearest Neighbors accuracy: 0.9297619047619046
Decision Tree accuracy: 0.9191416040100251
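
To illustrate what `cv=10` does, here is a minimal pure-Python sketch of how k-fold cross-validation partitions sample indices so that every sample is used for testing exactly once. It shows plain, unshuffled k-fold only; for classifiers, `cross_val_score` with an integer `cv` uses stratified folds, which additionally keep the class proportions similar in each fold. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) lists for plain, unshuffled k-fold splitting."""
    # Spread the remainder over the first n_samples % k folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        # Everything outside the current test fold is used for training
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop

folds = list(k_fold_indices(569, 10))
print([len(test_idx) for _, test_idx in folds])
```

For 569 samples and 10 folds this gives nine test folds of 57 samples and one of 56, so each model is trained on roughly 90% of the data in every round.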


#### 5.3 Data scaled to standard format

In [None]:
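# Scale the features to zero mean and unit variance; the scaler is fitted on the training set only
# Note: X_train and X_test are whatever split was created last. If the notebook is run top to
# bottom, that is the final (unseeded) random-subsampling split, not the seeded hold-out split,
# so these accuracies are not directly comparable with the hold-out results.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)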

In [None]:
# Create and train the classifiers on the scaled data
nb = GaussianNB()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()

nb.fit(X_train_scaled, y_train)
knn.fit(X_train_scaled, y_train)
dt.fit(X_train_scaled, y_train)



In [None]:
# Evaluate the classifiers on the test set
nb_acc = accuracy_score(y_test, nb.predict(X_test_scaled))
knn_acc = accuracy_score(y_test, knn.predict(X_test_scaled))
dt_acc = accuracy_score(y_test, dt.predict(X_test_scaled))


In [None]:
print("Scaled data accuracy:")
print("Naive Bayes accuracy:", nb_acc)
print("K-Nearest Neighbors accuracy:", knn_acc)
print("Decision Tree accuracy:", dt_acc)

Scaled data accuracy:
Naive Bayes accuracy: 0.8951048951048951
K-Nearest Neighbors accuracy: 0.958041958041958
Decision Tree accuracy: 0.9370629370629371
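
To isolate the effect of scaling, each classifier can be evaluated with and without `StandardScaler` on the same seeded split. The sketch below is one way to do that, using `make_pipeline` so the scaler is fitted on the training data only. The seed 42 and the names `results` and `make_models` are illustrative assumptions, and the numbers it prints are not the ones shown above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# One fixed 75/25 split shared by the raw and the scaled runs (assumed seed)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

def make_models():
    """Return fresh, unfitted classifiers keyed by display name."""
    return {
        "Naive Bayes": GaussianNB(),
        "K-Nearest Neighbors": KNeighborsClassifier(),
        "Decision Tree": DecisionTreeClassifier(random_state=42),
    }

results = {}
for name, model in make_models().items():
    raw_acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    # The pipeline fits the scaler on X_tr and reuses its statistics for X_te
    scaled = make_pipeline(StandardScaler(), make_models()[name])
    scaled_acc = scaled.fit(X_tr, y_tr).score(X_te, y_te)
    results[name] = (raw_acc, scaled_acc)
    print(f"{name}: raw={raw_acc:.4f}, scaled={scaled_acc:.4f}")
```

Distance-based methods such as K-Nearest Neighbors are the ones most likely to be affected by scaling, since features with large numeric ranges otherwise dominate the distance. Decision trees split on one feature at a time and are largely insensitive to it.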
