<a href="https://colab.research.google.com/github/junggeyy/DeepLearning/blob/main/authenticate_banknotes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Banknote Authentication

Here, I apply logistic regression to a banknote authentication dataset to distinguish genuine banknotes from forged ones.


The dataset consists of 1372 examples and 4 features for binary classification. The features are:

1. variance of a wavelet-transformed image (continuous)
2. skewness of a wavelet-transformed image (continuous)
3. kurtosis of a wavelet-transformed image (continuous)
4. entropy of the image (continuous)

(More details about this dataset [here](https://archive.ics.uci.edu/ml/datasets/banknote+authentication))


## 1) Installing Libraries

In [None]:
# !conda install numpy pandas matplotlib --yes

In [None]:
# !pip install torch

## 2) Loading the Dataset

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv("data_banknote_authentication.txt", header=None)
df.head()

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 3.6216 | 8.6661 | -2.8073 | -0.44699 | 0 |
| 1 | 4.5459 | 8.1674 | -2.4586 | -1.4621 | 0 |
| 2 | 3.866 | -2.6383 | 1.9242 | 0.10645 | 0 |
| 3 | 3.4566 | 9.5228 | -4.0112 | -3.5944 | 0 |
| 4 | 0.32924 | -4.4552 | 4.5718 | -0.9888 | 0 |
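
The columns are numbered because the file has no header row. As a minimal sketch, the five rows shown above can be loaded from an inline string with descriptive column names. The names `variance`, `skewness`, `kurtosis`, `entropy`, and `label` are my own choice, taken from the feature list in the introduction; they are not part of the file.

```python
import io

import pandas as pd

# The five rows displayed above, inlined so the example runs without the data file.
sample_rows = """3.6216,8.6661,-2.8073,-0.44699,0
4.5459,8.1674,-2.4586,-1.4621,0
3.866,-2.6383,1.9242,0.10645,0
3.4566,9.5228,-4.0112,-3.5944,0
0.32924,-4.4552,4.5718,-0.9888,0
"""

# Hypothetical column names, chosen to match the feature list in the introduction.
column_names = ["variance", "skewness", "kurtosis", "entropy", "label"]

sample_df = pd.read_csv(io.StringIO(sample_rows), header=None, names=column_names)

# Split into a feature matrix and a label vector, as is done with the full dataset.
sample_X = sample_df[column_names[:-1]].values
sample_y = sample_df["label"].values
print(sample_X.shape, sample_y.shape)
```

Named columns make later selections such as `sample_df["label"]` easier to read than integer positions, at the cost of having to keep the name list in sync with the file layout.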


In [4]:
X_features = df[[0, 1, 2, 3]].values
y_labels = df[4].values

Number of examples and features
<br>We have 1372 examples with 4 features each

In [5]:
X_features.shape

(1372, 4)

Label distribution<br>
We have 762 examples with class label 0 and 610 examples with class label 1

In [6]:
import numpy as np

np.bincount(y_labels)

array([762, 610])
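
These counts give a useful reference point: a classifier that always predicts the majority class would be right about 55.5% of the time, so any trained model should clearly beat that. The sketch below rebuilds a label vector from the reported counts (it does not read the data file) and computes this baseline.

```python
import numpy as np

# Rebuild a label vector with the class counts reported above (762 zeros, 610 ones).
labels = np.concatenate([np.zeros(762, dtype=int), np.ones(610, dtype=int)])

counts = np.bincount(labels)
class_fractions = counts / counts.sum()

# Accuracy of a classifier that always predicts the most frequent class.
majority_baseline = class_fractions.max()
print(f"Class fractions: {class_fractions.round(3)}")
print(f"Majority-class baseline accuracy: {majority_baseline * 100:.2f}%")
```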

## 3) Defining a DataLoader

In [7]:
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self, X, y):
        self.features = torch.tensor(X, dtype=torch.float32)
        # Labels are stored as float32 because binary cross-entropy expects float targets
        self.labels = torch.tensor(y, dtype=torch.float32)

    def __getitem__(self, index):
        x = self.features[index]
        y = self.labels[index]
        return x, y

    def __len__(self):
        return self.labels.shape[0]

We will use 80% of the data for training and the remaining 20% for validation; there is no separate test set.

In [9]:
train_size = int(X_features.shape[0]*0.80)
train_size

1097

In [10]:
val_size = X_features.shape[0] - train_size
val_size

275

Using `torch.utils.data.random_split`, we generate the training and validation sets along with the respective data loaders:

In [12]:
import torch

dataset = MyDataset(X_features, y_labels)

torch.manual_seed(1)
train_set, val_set = torch.utils.data.random_split(dataset, [train_size, val_size])

train_loader = DataLoader(
    dataset=train_set,
    batch_size=10,
    shuffle=True,
)

val_loader = DataLoader(
    dataset=val_set,
    batch_size=10,
    shuffle=False,
)
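
With 1097 training examples and a batch size of 10, the training loader yields 110 batches per epoch, and the last batch holds only 7 examples because `DataLoader` keeps the final partial batch by default (`drop_last=False`). The arithmetic can be sketched in plain Python; the helper names below are hypothetical and not part of PyTorch.

```python
import math

train_size = 1097
val_size = 275
batch_size = 10


def batches_per_epoch(num_examples, batch_size):
    """Number of batches when the final, smaller batch is kept."""
    return math.ceil(num_examples / batch_size)


def last_batch_size(num_examples, batch_size):
    """Size of the final batch, which is smaller if the split is uneven."""
    remainder = num_examples % batch_size
    return remainder if remainder else batch_size


print(batches_per_epoch(train_size, batch_size), last_batch_size(train_size, batch_size))
print(batches_per_epoch(val_size, batch_size), last_batch_size(val_size, batch_size))
```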

## 4) Implementing the model

In [13]:
import torch

class LogisticRegression(torch.nn.Module):

    def __init__(self, num_features):
        super().__init__()
        self.linear = torch.nn.Linear(in_features=num_features, out_features=1)

    def forward(self, x):
        logits = self.linear(x)
        probas = torch.sigmoid(logits)
        return probas
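
To make the forward pass concrete, here is a NumPy sketch of the same computation: a weighted sum of the four features plus a bias, passed through the sigmoid. The weights and bias are made-up illustrative values, not the parameters learned by the model above.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, mapping any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, weights, bias):
    """Logistic regression forward pass: linear combination followed by sigmoid."""
    logits = X @ weights + bias
    return sigmoid(logits)


# Illustrative values only; these are not the trained model's parameters.
X_demo = np.array([[3.6216, 8.6661, -2.8073, -0.44699],
                   [0.0, 0.0, 0.0, 0.0]])
weights_demo = np.array([-1.0, -0.5, -0.5, 0.0])
bias_demo = 0.0

probas_demo = forward(X_demo, weights_demo, bias_demo)
print(probas_demo)  # the second entry is 0.5 because its logit is 0
```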

## 5) The training loop

In [18]:
import torch.nn.functional as F


torch.manual_seed(1)
model = LogisticRegression(num_features=4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

num_epochs = 5

for epoch in range(num_epochs):

    model = model.train()
    for batch_idx, (features, class_labels) in enumerate(train_loader):

        probas = model(features)

        loss = F.binary_cross_entropy(probas, class_labels.view(probas.shape))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        ### LOGGING
        if not batch_idx % 20: # log every 20th batch
            print(f'Epoch: {epoch+1:03d}/{num_epochs:03d}'
                   f' | Batch {batch_idx:03d}/{len(train_loader):03d}'
                   f' | Loss: {loss:.2f}')

Epoch: 001/005 | Batch 000/110 | Loss: 1.30
Epoch: 001/005 | Batch 020/110 | Loss: 0.27
Epoch: 001/005 | Batch 040/110 | Loss: 0.36
Epoch: 001/005 | Batch 060/110 | Loss: 0.12
Epoch: 001/005 | Batch 080/110 | Loss: 0.07
Epoch: 001/005 | Batch 100/110 | Loss: 0.09
Epoch: 002/005 | Batch 000/110 | Loss: 0.19
Epoch: 002/005 | Batch 020/110 | Loss: 0.08
Epoch: 002/005 | Batch 040/110 | Loss: 0.21
Epoch: 002/005 | Batch 060/110 | Loss: 0.09
Epoch: 002/005 | Batch 080/110 | Loss: 0.09
Epoch: 002/005 | Batch 100/110 | Loss: 0.10
Epoch: 003/005 | Batch 000/110 | Loss: 0.05
Epoch: 003/005 | Batch 020/110 | Loss: 0.11
Epoch: 003/005 | Batch 040/110 | Loss: 0.07
Epoch: 003/005 | Batch 060/110 | Loss: 0.24
Epoch: 003/005 | Batch 080/110 | Loss: 0.07
Epoch: 003/005 | Batch 100/110 | Loss: 0.03
Epoch: 004/005 | Batch 000/110 | Loss: 0.05
Epoch: 004/005 | Batch 020/110 | Loss: 0.02
Epoch: 004/005 | Batch 040/110 | Loss: 0.11
Epoch: 004/005 | Batch 060/110 | Loss: 0.07
Epoch: 004/005 | Batch 080/110 |
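
The loss values in the log are mean binary cross-entropy over each batch of predicted probabilities. As a rough NumPy sketch of that quantity (the `eps` clipping is my own safeguard against `log(0)` and is not necessarily how PyTorch handles the edge case):

```python
import numpy as np


def binary_cross_entropy(probas, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    probas = np.clip(probas, eps, 1.0 - eps)  # avoid log(0)
    losses = -(targets * np.log(probas) + (1.0 - targets) * np.log(1.0 - probas))
    return losses.mean()


targets_demo = np.array([0.0, 1.0, 1.0, 0.0])

# Predicting 0.5 everywhere gives a loss of ln(2); confident correct predictions give less.
uninformed = binary_cross_entropy(np.full(4, 0.5), targets_demo)
confident = binary_cross_entropy(np.array([0.05, 0.95, 0.9, 0.1]), targets_demo)
print(f"{uninformed:.4f} {confident:.4f}")
```

A loss near ln(2), roughly 0.69, therefore corresponds to a model that has learned nothing, which helps put the logged values in context.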

## 6) Evaluating the results

In [19]:
def compute_accuracy(model, dataloader):

    model = model.eval()

    correct = 0.0
    total_examples = 0

    for idx, (features, class_labels) in enumerate(dataloader):

        with torch.no_grad():
            probas = model(features)

        pred = torch.where(probas > 0.5, 1, 0)
        lab = class_labels.view(pred.shape).to(pred.dtype)

        compare = lab == pred
        correct += torch.sum(compare)
        total_examples += len(compare)

    return correct / total_examples
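
The reshape of the labels inside `compute_accuracy` matters: the model output has shape `(batch, 1)` while the labels have shape `(batch,)`, and comparing them directly would broadcast to a `(batch, batch)` matrix. A small NumPy sketch with made-up probabilities shows the thresholding and the shape issue:

```python
import numpy as np

# Made-up values: probabilities shaped (5, 1) like the model output, labels shaped (5,).
probas_demo = np.array([[0.91], [0.12], [0.50], [0.73], [0.08]])
labels_demo = np.array([1.0, 0.0, 1.0, 0.0, 0.0])

# Threshold at 0.5; a probability of exactly 0.5 is assigned to class 0.
preds = (probas_demo > 0.5).astype(int)

# Match the label shape to the prediction shape before comparing.
labels_2d = labels_demo.reshape(preds.shape).astype(int)
accuracy = (preds == labels_2d).mean()
print(accuracy)

# Without the reshape, broadcasting silently produces a 5x5 comparison matrix.
print((preds == labels_demo).shape)
```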

In [23]:
train_acc = compute_accuracy(model, train_loader)
print(f"Training Accuracy: {train_acc*100:.2f}%")

Training Accuracy: 98.36%


In [24]:
val_acc = compute_accuracy(model, val_loader)
print(f"Validation Accuracy: {val_acc*100:.2f}%")

Validation Accuracy: 97.82%
