# Classification Model Selection

## Data preprocessing

✔️ Import the necessary libraries.

✔️ Load the dataset (tic-tac-toe.csv).

❌ Our dataset doesn't have any missing data.

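✔️ Encode the nine categorical feature columns as integers with `LabelEncoder`.

✔️ We have 684 samples, so we can split them into 75% for the training set and 25% for the test set.

✔️ Feature scaling can improve distance- and margin-based models such as K Nearest Neighbor and SVC; tree-based models (Decision Tree, Random Forest) are not affected by it.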
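The split and scaling steps in the checklist can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: `split_and_scale` is a hypothetical helper, and the toy arrays stand in for the encoded features and labels. In scikit-learn the usual tools are `train_test_split(X, y, test_size=0.25)` and `StandardScaler`, with the scaler fitted on the training rows only.

```python
import numpy as np

def split_and_scale(X, y, test_size=0.25, seed=0):
    """Shuffle the rows, hold out test_size of them, then standardize the features.

    The mean and standard deviation come from the training rows only,
    so nothing about the test rows leaks into the scaling.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(np.ceil(len(X) * test_size))  # round the test size up
    test_idx, train_idx = order[:n_test], order[n_test:]
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    X_train = (X[train_idx] - mean) / std
    X_test = (X[test_idx] - mean) / std
    return X_train, X_test, y[train_idx], y[test_idx]

# Hypothetical toy data standing in for the encoded features and labels
X_demo = np.arange(16).reshape(8, 2)
y_demo = np.array([0, 1] * 4)
X_train, X_test, y_train, y_test = split_and_scale(X_demo, y_demo)
print(X_train.shape, X_test.shape)  # (6, 2) (2, 2)
```

Reusing the training mean and standard deviation on the test rows is the important design point. Scaling the whole dataset before splitting would let the test rows influence the transformation.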

```python
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Print NumPy floats with two digits after the decimal point
np.set_printoptions(precision=2)
```


```python
# Load the dataset: the first nine columns are the features, the last column is the class label
dataset = pd.read_csv("tic-tac-toe.csv")
X = dataset.iloc[:, 0:9].values
y = dataset.iloc[:, -1].values
```

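```python
from sklearn.preprocessing import LabelEncoder

# Encode each of the nine feature columns separately: LabelEncoder maps a
# column's distinct values, in sorted order, to the integers 0, 1, 2, ...
le = LabelEncoder()
for i in range(X.shape[1]):
    X[:, i] = le.fit_transform(X[:, i])
print(X)
```

```
[[2 0 0 ... 0 0 0]
 [2 0 0 ... 0 0 0]
 [2 0 0 ... 0 0 0]
 ...
 [2 1 0 ... 0 0 0]
 [2 1 0 ... 0 0 0]
 [2 1 0 ... 0 0 0]]
```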
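`LabelEncoder` gives each distinct value in a column an integer equal to its position in the sorted list of distinct values. A plain-Python sketch of the same mapping follows; `label_encode` is a hypothetical helper, and the sample values `"x"`, `"o"` and `"b"` are assumed board-square values, not read from the CSV.

```python
def label_encode(column):
    """Mimic LabelEncoder on one column: sorted distinct values -> 0, 1, 2, ..."""
    classes = sorted(set(column))
    code_of = {value: code for code, value in enumerate(classes)}
    return [code_of[value] for value in column], classes

# Hypothetical column of board-square values
codes, classes = label_encode(["x", "o", "b", "x", "o"])
print(classes)  # ['b', 'o', 'x']
print(codes)    # [2, 1, 0, 2, 1]
```

The integer codes imply an order (b < o < x) that the categories do not have, which can mislead linear and distance-based models. scikit-learn documents `LabelEncoder` for target labels; `OrdinalEncoder` (same idea for feature columns) and `OneHotEncoder` (no implied order) are the feature-side alternatives.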


## Train and evaluate the performance of Logistic Regression Classification


```python
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.linear_model import LogisticRegression

# Training: fit on the full dataset
logistic_regression_classifier = LogisticRegression()
logistic_regression_classifier.fit(X, y)

# Prediction: on the same rows used for training (no held-out test set)
y_pred = logistic_regression_classifier.predict(X)

# Confusion matrix
print(confusion_matrix(y_true=y, y_pred=y_pred))
# Accuracy score
acc_logistic_regression_classification = accuracy_score(y_true=y, y_pred=y_pred)
print("Accuracy score for Logistic Regression Classification :",
      acc_logistic_regression_classification)
```

```
[[ 0  0  0  0 13 49  2  0  0]
 [ 0  0  0  0 13 49  2  0  0]
 [ 0  0  0  0 13 49  2  0  0]
 [ 0  0  0  0 13 49  2  0  0]
 [ 0  0  0  0 15 49  0  0  0]
 [ 0  0  0  0  0 64  0  0  0]
 [ 0  0  0  0 13 49  2  0  0]
 [ 0  0  0  0 13 49  2  0  0]
 [ 0  0  0  0 13 49  2  0  0]]
Accuracy score for Logistic Regression Classification : 0.140625
```
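Logistic regression scores a sample with a weighted sum of its features plus a bias, then passes that score through the sigmoid to get a probability. A minimal binary sketch with made-up weights follows; `sigmoid` and `predict_proba_one` are illustrative helpers, not scikit-learn's API.

```python
import math

def sigmoid(z):
    """Squash any real score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_one(weights, bias, x):
    """Probability of the positive class for a single sample x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

weights, bias = [0.8, -0.5], 0.1  # made-up parameters, not fitted values
p = predict_proba_one(weights, bias, [1.0, 2.0])
label = int(p >= 0.5)  # threshold the probability at 0.5
print(round(p, 3), label)  # 0.475 0
```

With more than two classes, as in the 9×9 confusion matrices here, scikit-learn's default `lbfgs` solver fits a multinomial (softmax) model instead of a single sigmoid.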


## Train and evaluate the performance of K Nearest Neighbor Classification

```python
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Training: 5 neighbors; p=2 with the Minkowski metric is the Euclidean distance
k_nn_classifier = KNeighborsClassifier(n_neighbors=5, p=2, metric="minkowski")
k_nn_classifier.fit(X, y)

# Prediction: on the same rows used for training (no held-out test set)
y_pred = k_nn_classifier.predict(X)

# Confusion matrix
print(confusion_matrix(y_true=y, y_pred=y_pred))
# Accuracy score
acc_k_nearest_neighbor_classification = accuracy_score(y_true=y, y_pred=y_pred)
print("Accuracy score for K Nearest Neighbor Classification :",
      acc_k_nearest_neighbor_classification)
```

```
[[63  1  0  0  0  0  0  0  0]
 [48 15  1  0  0  0  0  0  0]
 [49 12  3  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [48 13  3  0  0  0  0  0  0]
 [48 13  3  0  0  0  0  0  0]
 [48 13  3  0  0  0  0  0  0]
 [48 13  3  0  0  0  0  0  0]
 [48 13  3  0  0  0  0  0  0]]
Accuracy score for K Nearest Neighbor Classification : 0.140625
```
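With `n_neighbors=5, p=2, metric="minkowski"`, K Nearest Neighbor predicts the majority label among the five training points closest to the query under Euclidean distance. A plain-Python sketch of that rule follows, on hypothetical toy points and with `k=3` because the toy set is tiny. `minkowski` and `knn_predict` are illustrative helpers, and their tie-breaking may differ from scikit-learn's.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance; p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(X_train, y_train, x, k=5, p=2):
    """Majority label among the k training points closest to x."""
    nearest = sorted(range(len(X_train)), key=lambda i: minkowski(X_train[i], x, p))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6]]
y_train = ["a", "a", "a", "b", "b", "b", "b"]
print(knn_predict(X_train, y_train, [1, 1], k=3))  # a
print(knn_predict(X_train, y_train, [5, 4], k=3))  # b
```

Because the rule is purely distance-based, features with larger numeric ranges dominate it, which is why scaling matters for this model.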


## Train and evaluate the performance of Support Vector Classification (SVC)

```python
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.svm import SVC

# Training: linear kernel, fit on the full dataset
svc_classifier = SVC(kernel="linear")
svc_classifier.fit(X, y)

# Prediction: on the same rows used for training (no held-out test set)
y_pred = svc_classifier.predict(X)

# Confusion matrix
print(confusion_matrix(y_true=y, y_pred=y_pred))
# Accuracy score
acc_svc_support_vector_classification = accuracy_score(y_true=y, y_pred=y_pred)
print("Accuracy score for (SVC) Support Vector Classification :",
      acc_svc_support_vector_classification)
```

```
[[ 0  0  0  0  0  0  2 13 49]
 [ 0  0  0  0  0  0  2 13 49]
 [ 0  0  0  0  0  0  2 13 49]
 [ 0  0  0  0  0  0  2 13 49]
 [ 0  0  0  0  0  0  2 13 49]
 [ 0  0  0  0  0  0  2 13 49]
 [ 0  0  0  0  0  0  2 13 49]
 [ 0  0  0  0  0  0  0 15 49]
 [ 0  0  0  0  0  0  0  0 64]]
Accuracy score for (SVC) Support Vector Classification : 0.140625
```
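A linear SVC separates two classes with a hyperplane w·x + b = 0 chosen to maximize the margin; training minimizes the hinge loss plus a regularization term. The sketch below only evaluates that decision rule and loss for a made-up hyperplane. It does not train anything, and `score` and `hinge_loss` are illustrative helpers, not scikit-learn calls.

```python
def score(w, b, x):
    """Signed score w·x + b; its sign is the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_loss(w, b, x, y):
    """Zero when y * score >= 1 (correct side, outside the margin); y is -1 or +1."""
    return max(0.0, 1.0 - y * score(w, b, x))

w, b = [1.0, -1.0], 0.0  # made-up hyperplane, not a fitted model
print(score(w, b, [3.0, 1.0]))           # 2.0
print(hinge_loss(w, b, [3.0, 1.0], +1))  # 0.0 (outside the margin)
print(hinge_loss(w, b, [0.5, 0.0], +1))  # 0.5 (correct side, inside the margin)
```

With more than two classes, scikit-learn's `SVC` combines one-vs-one binary classifiers of this kind.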


## Train and evaluate the performance of Kernel Support Vector Classification (Kernel SVC)

```python
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.svm import SVC

# Training: RBF (Gaussian) kernel, fit on the full dataset
kernel_svc_classifier = SVC(kernel="rbf")
kernel_svc_classifier.fit(X, y)

# Prediction: on the same rows used for training (no held-out test set)
y_pred = kernel_svc_classifier.predict(X)

# Confusion matrix
print(confusion_matrix(y_true=y, y_pred=y_pred))
# Accuracy score
acc_kernel_support_vector_classification = accuracy_score(y_true=y, y_pred=y_pred)
print("Accuracy score for Kernel (SVC) Support Vector Classification :",
      acc_kernel_support_vector_classification)
```

```
[[12  0  0  0  5  1  2 11 33]
 [ 5  7  0  0  5  1  2 11 33]
 [11  6  1  0  5  1  2 10 28]
 [11  6  1  0  0  0  2 11 33]
 [10  6  1  0  5  0  2 10 30]
 [10  6  1  0  4  1  2  9 31]
 [10  6  1  0  4  1  2  9 31]
 [10  6  1  0  4  1  0 12 30]
 [10  6  1  0  4  1  0  1 41]]
Accuracy score for Kernel (SVC) Support Vector Classification : 0.140625
```
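The RBF kernel measures similarity as K(a, b) = exp(-gamma * ||a - b||^2), so nearby points score close to 1 and distant points close to 0. `SVC(kernel="rbf")` leaves `gamma` at its default `"scale"`, which scikit-learn documents as 1 / (n_features * X.var()). A NumPy sketch of both on hypothetical toy data follows; `rbf_kernel` and `scale_gamma` are illustrative helpers.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * diff.dot(diff)))

def scale_gamma(X):
    """gamma="scale": 1 / (n_features * variance of all entries of X)."""
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())

# Hypothetical toy data
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
gamma = scale_gamma(X_toy)
print(gamma)
print(rbf_kernel(X_toy[0], X_toy[0], gamma))  # 1.0 for identical points
print(rbf_kernel(X_toy[0], X_toy[2], gamma))
```

A larger `gamma` makes the similarity fall off faster with distance, giving a more flexible (and more overfitting-prone) decision boundary.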


## Train and evaluate the performance of Naive Bayes Classification

```python
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.naive_bayes import GaussianNB

# Training: fit on the full dataset
naive_bayes_classifier = GaussianNB()
naive_bayes_classifier.fit(X, y)

# Prediction: on the same rows used for training (no held-out test set)
y_pred = naive_bayes_classifier.predict(X)

# Confusion matrix
print(confusion_matrix(y_true=y, y_pred=y_pred))
# Accuracy score
acc_naive_bayes_classification = accuracy_score(y_true=y, y_pred=y_pred)
print("Accuracy score for Naive Bayes Classification :",
      acc_naive_bayes_classification)
```

```
[[64  0  0  0  0  0  0  0  0]
 [49 15  0  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]]
Accuracy score for Naive Bayes Classification : 0.140625
```
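Gaussian Naive Bayes models each feature, within each class, as an independent normal distribution, then picks the class with the largest log prior plus summed log likelihoods. A plain-Python sketch on hypothetical toy data follows. The helper names are illustrative, and the fixed `1e-9` variance floor is a simplification of scikit-learn's `var_smoothing`, which scales with the largest feature variance.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a normal distribution N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def fit_gnb(X, y):
    """Per class: prior probability, per-feature mean and variance."""
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids zero variance
                     for col, m in zip(zip(*rows), means)]
        params[label] = (n / len(X), means, variances)
    return params

def predict_gnb(params, x):
    """Class with the largest log prior + sum of per-feature log likelihoods."""
    def log_posterior(label):
        prior, means, variances = params[label]
        return math.log(prior) + sum(
            gaussian_log_pdf(xi, m, v) for xi, m, v in zip(x, means, variances))
    return max(params, key=log_posterior)

# Hypothetical toy data: two clusters
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
y_toy = [0, 0, 0, 1, 1, 1]
params = fit_gnb(X_toy, y_toy)
print(predict_gnb(params, [1.1, 2.1]))  # 0
print(predict_gnb(params, [4.1, 4.9]))  # 1
```

Integer-coded categorical features are not really Gaussian; scikit-learn's `CategoricalNB` models such features directly and may suit this dataset better.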


## Train and evaluate the performance of Decision Tree Classification

```python
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Training: splits chosen by information gain (entropy criterion)
decision_tree_classifier = DecisionTreeClassifier(criterion="entropy")
decision_tree_classifier.fit(X, y)

# Prediction: on the same rows used for training (no held-out test set)
y_pred = decision_tree_classifier.predict(X)

# Confusion matrix
print(confusion_matrix(y_true=y, y_pred=y_pred))
# Accuracy score
acc_decision_tree_classification = accuracy_score(y_true=y, y_pred=y_pred)
print("Accuracy score for Decision Tree Classification :",
      acc_decision_tree_classification)
```

```
[[64  0  0  0  0  0  0  0  0]
 [49 15  0  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]
 [49 13  2  0  0  0  0  0  0]]
Accuracy score for Decision Tree Classification : 0.140625
```
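With `criterion="entropy"`, the tree prefers the split with the highest information gain: parent entropy minus the size-weighted entropy of the child nodes. A plain-Python sketch of both quantities follows; `entropy` and `information_gain` are illustrative helpers and the labels are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2 p) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
print(entropy(parent))                                             # 1.0
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))         # 1.0 (perfect split)
print(information_gain(parent, [["yes", "yes", "no", "no"]] * 2))  # 0.0 (useless split)
```

An unrestricted tree keeps splitting until its leaves are pure, so its accuracy on the rows it was trained on is usually close to 1.0.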


## Train and evaluate the performance of Random Forest Classification

```python
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.ensemble import RandomForestClassifier

# Training: 10 trees, entropy criterion; no random_state is set,
# so the forest (and its score) can change between runs
random_forest_classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
random_forest_classifier.fit(X, y)

# Prediction: on the same rows used for training (no held-out test set)
y_pred = random_forest_classifier.predict(X)

# Confusion matrix
print(confusion_matrix(y_true=y, y_pred=y_pred))
# Accuracy score
acc_random_forest_classification = accuracy_score(y_true=y, y_pred=y_pred)
print("Accuracy score for Random Forest Classification :",
      acc_random_forest_classification)
```

```
[[11  8  5  7  5  7  7  8  6]
 [ 8  9  7  8  1  9 11  5  6]
 [ 9  5  8  7  4  8 10  7  6]
 [ 6  5  6  9  6  8 11  6  7]
 [ 9  9  8  8  7  6  8  4  5]
 [10  5  6  7  7  9  7  7  6]
 [ 9  8  6  4  5  7 12  7  6]
 [ 8  6  4  9  7  6 10  8  6]
 [ 9  8  6  7  7  4 10  5  8]]
Accuracy score for Random Forest Classification : 0.140625
```
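`RandomForestClassifier(n_estimators=10)` trains ten trees, each on a bootstrap sample of the rows (drawn with replacement), and combines their outputs. scikit-learn combines the trees by averaging their predicted class probabilities (soft voting) rather than by a hard majority vote. The sketch below contrasts the two on made-up per-tree probabilities; all helper names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Row indices drawn with replacement, as used to train each tree."""
    return [rng.randrange(n) for _ in range(n)]

def hard_vote(tree_probas):
    """Each tree votes for its most probable class; the plurality wins."""
    votes = [p.index(max(p)) for p in tree_probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(tree_probas):
    """Average the trees' class probabilities and pick the largest."""
    n = len(tree_probas)
    avg = [sum(col) / n for col in zip(*tree_probas)]
    return avg.index(max(avg))

rng = random.Random(0)
idx = bootstrap_indices(10, rng)
print(len(idx), len(set(idx)))  # 10 draws, usually with some repeats

# Made-up class probabilities from three trees for one sample (classes 0 and 1)
tree_probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
print(hard_vote(tree_probas))  # 1 (two of three trees prefer class 1)
print(soft_vote(tree_probas))  # 0 (the averaged probability favors class 0)
```

Soft voting lets a confident tree outweigh two hesitant ones, which is why the two rules can disagree.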


## Which model is best for the given dataset?

```python
# Collect the accuracy score of every model
accuracy_scores = {
    "Logistic Regression": acc_logistic_regression_classification,
    "K Nearest Neighbor Classification": acc_k_nearest_neighbor_classification,
    "(SVC) Support Vector Classification": acc_svc_support_vector_classification,
    "Kernel (SVC) Support Vector Classification": acc_kernel_support_vector_classification,
    "Naive Bayes Classification": acc_naive_bayes_classification,
    "Decision Tree Classification": acc_decision_tree_classification,
    "Random Forest Classification": acc_random_forest_classification,
}
# Print the result of every model
for model, accuracy in accuracy_scores.items():
    print(f"{model} with accuracy score : {accuracy}")

# Find the highest accuracy score
best_accuracy = max(accuracy_scores.values())

# Print the first model (in dictionary order) that reaches it
for model, accuracy in accuracy_scores.items():
    if accuracy == best_accuracy:
        print_me = f"{model} is the best model for given dataset 🥳 with Accuracy score {accuracy}"
        print("🎉" * (len(print_me) // 2))
        print(print_me)
        print("🎉" * (len(print_me) // 2))
        break
```

```
Logistic Regression with accuracy score : 0.140625
K Nearest Neighbor Classification with accuracy score : 0.140625
(SVC) Support Vector Classification with accuracy score : 0.140625
Kernel (SVC) Support Vector Classification with accuracy score : 0.140625
Naive Bayes Classification with accuracy score : 0.140625
Decision Tree Classification with accuracy score : 0.140625
Random Forest Classification with accuracy score : 0.140625
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
Logistic Regression is the best model for given dataset 🥳 with Accuracy score 0.140625
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
```
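Because the loop stops at the first model that reaches the maximum, a tie is reported as a win for whichever model comes first in the dictionary. A minimal sketch that returns every tied model instead follows; `best_models` is a hypothetical helper and the scores are made up.

```python
def best_models(scores):
    """Return every model whose score equals the maximum, plus that maximum."""
    best = max(scores.values())
    return [name for name, score in scores.items() if score == best], best

# Made-up scores for illustration
scores = {"Logistic Regression": 0.9, "Naive Bayes": 0.9, "Decision Tree": 0.8}
names, best = best_models(scores)
print(names, best)  # ['Logistic Regression', 'Naive Bayes'] 0.9
```

Exact equality is fine here because accuracy scores from the same test set are ratios of the same integer counts; scores from different evaluations may call for `math.isclose` instead.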


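**Note:** The results above hold only for the input dataset (tic-tac-toe.csv); a different dataset will give different scores and possibly a different best model. Here all seven models score 0.140625, so Logistic Regression is reported as best only because it is the first entry in the dictionary. Every model is also scored on the same rows it was trained on, so these numbers measure fit to the training data rather than performance on unseen data.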