Bank Note Authentication System: KNN and Naive Bayes
---

This notebook tests a bank note authentication system using two classifiers: K-Nearest Neighbours (KNN) and Gaussian Naive Bayes.


The data come from the "Banknote Authentication Data Set" in the UCI Machine Learning Repository.


In [18]:
# Import necessary libraries

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.neighbors import KNeighborsClassifier

from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [19]:
# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt"  # Alternatively, download the file from the UCI Machine Learning Repository

column_names = ["Variance", "Skewness", "Curtosis", "Entropy", "Class"]

df = pd.read_csv(url, names=column_names)


In [20]:
# Display the first few rows of the dataset

df.head()

,Variance,Skewness,Curtosis,Entropy,Class
0,3.6216,8.6661,-2.8073,-0.44699,0
1,4.5459,8.1674,-2.4586,-1.4621,0
2,3.866,-2.6383,1.9242,0.10645,0
3,3.4566,9.5228,-4.0112,-3.5944,0
4,0.32924,-4.4552,4.5718,-0.9888,0


In [21]:
# Display the shape of the dataset

df.shape

(1372, 5)

In [22]:
# Display descriptive statistics of the DataFrame

df.describe()

,Variance,Skewness,Curtosis,Entropy,Class
count,1372.0,1372.0,1372.0,1372.0,1372.0
mean,0.433735,1.922353,1.397627,-1.191657,0.444606
std,2.842763,5.869047,4.31003,2.101013,0.497103
min,-7.0421,-13.7731,-5.2861,-8.5482,0.0
25%,-1.773,-1.7082,-1.574975,-2.41345,0.0
50%,0.49618,2.31965,0.61663,-0.58665,0.0
75%,2.821475,6.814625,3.17925,0.39481,1.0
max,6.8248,12.9516,17.9274,2.4495,1.0


In [23]:
# Check for missing values in the dataset

df.isnull().sum()

Variance    0
Skewness    0
Curtosis    0
Entropy     0
Class       0
dtype: int64

In [24]:
# Display information about the dataset

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1372 entries, 0 to 1371
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Variance  1372 non-null   float64
 1   Skewness  1372 non-null   float64
 2   Curtosis  1372 non-null   float64
 3   Entropy   1372 non-null   float64
 4   Class     1372 non-null   int64  
dtypes: float64(4), int64(1)
memory usage: 53.7 KB


In [25]:
# Display the count of each class in the 'Class' column

df['Class'].value_counts()

Class
0    762
1    610
Name: count, dtype: int64

In [26]:
# Confirm column by column (as booleans) that no values are missing

df.isna().any()

Variance    False
Skewness    False
Curtosis    False
Entropy     False
Class       False
dtype: bool

In [27]:
# Display the columns present in the DataFrame

df.columns

Index(['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class'], dtype='object')

In [28]:
# Separate features (X) and target variable (y)

X = df.drop("Class", axis=1)

y = df["Class"]

In [29]:
# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
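The split above is purely random. As an optional variation, a stratified split keeps the class ratio nearly the same in the training and test sets. The sketch below is illustrative only: `X_demo` and `y_demo` are hypothetical stand-ins (not the notebook's data) built to have the same class counts as the `value_counts()` output, 762 and 610.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in labels with the same class counts as the notebook (762 zeros, 610 ones).
y_demo = np.array([0] * 762 + [1] * 610)
# Placeholder feature matrix; only the labels matter for this illustration.
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo asks train_test_split to preserve the class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

overall_ratio = y_demo.mean()  # share of class 1 in the full data
train_ratio = y_tr.mean()      # share of class 1 in the training subset
test_ratio = y_te.mean()       # share of class 1 in the test subset

print(f"Class 1 share overall: {overall_ratio:.3f}")
print(f"Class 1 share in train: {train_ratio:.3f}")
print(f"Class 1 share in test:  {test_ratio:.3f}")
```

Whether stratification changes the results here is not something this sketch establishes; it only shows how the option is passed.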

In [30]:
# Standardize the features: fit the scaler on the training set only, then apply it to both sets

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

X_test_scaled = scaler.transform(X_test)
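To make the scaling step concrete, the following sketch checks that `StandardScaler` applies z = (x - mean) / std, using the mean and standard deviation learned from the training data for both subsets. The arrays `train_demo` and `test_demo` are random stand-ins introduced here for illustration, not the banknote data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random stand-in data with four features, mimicking the shape of the problem.
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
test_demo = rng.normal(loc=5.0, scale=3.0, size=(25, 4))

# Same pattern as the notebook: fit on train, reuse the fitted scaler on test.
scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train_demo)
test_scaled = scaler_demo.transform(test_demo)

# Manual equivalent: per-feature mean and population std (ddof=0) from train only.
mu = train_demo.mean(axis=0)
sigma = train_demo.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
manual_train = (train_demo - mu) / sigma
manual_test = (test_demo - mu) / sigma

matches_train = np.allclose(train_scaled, manual_train)
matches_test = np.allclose(test_scaled, manual_test)
print("Train matches manual formula:", matches_train)
print("Test matches manual formula: ", matches_test)
```

Because the test set is transformed with training statistics, its columns will generally not have exactly zero mean and unit variance, which is expected.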

Applying the K-Nearest Neighbours (KNN) Classifier

In [31]:
# K-Nearest Neighbors (KNN) Classifier

knn_classifier = KNeighborsClassifier(n_neighbors=5)

knn_classifier.fit(X_train_scaled, y_train)

y_pred_knn = knn_classifier.predict(X_test_scaled)

In [32]:
# Evaluate KNN Classifier

print("K-Nearest Neighbors (KNN) Classifier:")

print("Accuracy:", accuracy_score(y_test, y_pred_knn))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_knn))

print("Classification Report:\n", classification_report(y_test, y_pred_knn))

K-Nearest Neighbors (KNN) Classifier:
Accuracy: 1.0
Confusion Matrix:
 [[148   0]
 [  0 127]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       148
           1       1.00      1.00      1.00       127

    accuracy                           1.00       275
   macro avg       1.00      1.00      1.00       275
weighted avg       1.00      1.00      1.00       275
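The scores above come from a single train/test split with a fixed `n_neighbors=5`. One common way to check how stable such a result is, and how it depends on k, is k-fold cross-validation with the scaler wrapped in a pipeline so that it is refit inside every fold. The sketch below is a minimal illustration under assumptions: it uses synthetic data from `make_classification` as a stand-in (so its scores say nothing about the banknote data), and the candidate values of k are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with four features (assumption: not the banknote data).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=42
)

# Five stratified folds, shuffled with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

mean_scores = {}
for k in (1, 3, 5, 7, 9):
    # The pipeline refits StandardScaler on each fold's training part only.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
    mean_scores[k] = scores.mean()
    print(f"k={k}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")

# Pick the k with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print("Best k on the stand-in data:", best_k)
```

To apply the same idea to the notebook's data, `X_demo` and `y_demo` would be replaced by the unscaled `X` and `y`, since the pipeline handles scaling itself.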



Applying the Gaussian Naive Bayes Classifier

In [33]:
# Naive Bayes Classifier

nb_classifier = GaussianNB()

nb_classifier.fit(X_train_scaled, y_train)

y_pred_nb = nb_classifier.predict(X_test_scaled)

In [34]:
# Evaluate Naive Bayes Classifier

print("\nNaive Bayes (NB) Classifier:")

print("Accuracy:", accuracy_score(y_test, y_pred_nb))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_nb))

print("Classification Report:\n", classification_report(y_test, y_pred_nb))


Naive Bayes (NB) Classifier:
Accuracy: 0.8072727272727273
Confusion Matrix:
 [[133  15]
 [ 38  89]]
Classification Report:
               precision    recall  f1-score   support

           0       0.78      0.90      0.83       148
           1       0.86      0.70      0.77       127

    accuracy                           0.81       275
   macro avg       0.82      0.80      0.80       275
weighted avg       0.81      0.81      0.80       275
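The figures in the classification report follow directly from the confusion matrix. As a worked check, the sketch below recomputes accuracy, precision, recall, and F1 from the Naive Bayes matrix printed above (rows are true classes, columns are predicted classes, which is the layout `confusion_matrix` uses). The variable names are introduced here for illustration.

```python
import numpy as np

# Naive Bayes confusion matrix copied from the output above.
# Rows: true class (0, 1). Columns: predicted class (0, 1).
cm = np.array([[133, 15],
               [38, 89]])

correct = np.diag(cm)           # correctly classified samples per class
support = cm.sum(axis=1)        # true samples per class (row sums)
predicted = cm.sum(axis=0)      # predicted samples per class (column sums)

precision = correct / predicted  # of the samples predicted as a class, share that is right
recall = correct / support       # of the samples truly in a class, share that is found
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

print("Accuracy: ", round(accuracy, 4))
print("Precision:", np.round(precision, 2))
print("Recall:   ", np.round(recall, 2))
print("F1-score: ", np.round(f1, 2))
print("Support:  ", support)
```

For example, recall for class 1 is 89 / (38 + 89) = 89 / 127, about 0.70: 38 of the 127 samples whose true class is 1 were predicted as class 0.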

