In [1]:
import sys
sys.path.append("src")

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Is CUDA available: ", torch.cuda.is_available())

Is CUDA available:  False


### Prepare the dataset

In [2]:
from torchvision.datasets import ImageFolder
from torchvision import transforms
from torch.utils.data import Dataset, random_split, DataLoader

train_dataset_path = 'dataset/ogyeiv2/train'
dataset = ImageFolder(train_dataset_path)
train_dataset, val_dataset = random_split(dataset, [0.8, 0.2]) 

test_dataset_path = 'dataset/ogyeiv2/test'
test_dataset_origin = ImageFolder(test_dataset_path)

We're going to use a model pretrained on ImageNet, so we normalize our images with the ImageNet channel means and standard deviations to match the model's training data.

The `transforms.ColorJitter` transform simulates lighting and exposure changes by randomly varying image brightness and contrast by ±25%. This helps the model stay robust to illumination differences and focus on shape and texture instead of light intensity.

In [3]:
from data_pipeline import MedsDataPipeline

data = MedsDataPipeline(train_dataset, val_dataset, test_dataset_origin, batch_size=32)
data.setup()

train_loader = data.train_dataloader()
val_loader = data.val_dataloader()
test_loader = data.test_dataloader()

In [4]:
print(f"Number of classes: {len(dataset.classes)}")
print(f"Class names: {dataset.classes}")
print(f"Training images: {len(data.train_dataset)}")
print(f"Validation images: {len(data.val_dataset)}")
print(f"Test images: {len(data.test_dataset)}")

Number of classes: 84
Class names: ['acc_long_600_mg', 'advil_ultra_forte', 'akineton_2_mg', 'algoflex_forte_dolo_400_mg', 'algoflex_rapid_400_mg', 'algopyrin_500_mg', 'ambroxol_egis_30_mg', 'apranax_550_mg', 'aspirin_ultra_500_mg', 'atoris_20_mg', 'atorvastatin_teva_20_mg', 'betaloc_50_mg', 'bila_git', 'c_vitamin_teva_500_mg', 'calci_kid', 'cataflam_50_mg', 'cataflam_dolo_25_mg', 'cetirizin_10_mg', 'cold_fx', 'coldrex', 'concor_10_mg', 'concor_5_mg', 'condrosulf_800_mg', 'controloc_20_mg', 'covercard_plus_10_mg_2_5_mg_5_mg', 'coverex_4_mg', 'diclopram_75-mg_20-mg', 'dorithricin_mentol', 'dulsevia_60_mg', 'enterol_250_mg', 'favipiravir_meditop_200_mg', 'ibumax_400_mg', 'jutavit_c_vitamin', 'jutavit_cink', 'kalcium_magnezium_cink', 'kalium_r', 'koleszterin_kontroll', 'lactamed', 'lactiv_plus', 'laresin_10_mg', 'letrox_50_mikrogramm', 'lordestin_5_mg', 'merckformin_xr_1000_mg', 'meridian', 'metothyrin_10_mg', 'mezym_forte_10_000_egyseg', 'milgamma', 'milurit_300_mg', 'naprosyn_250_mg', '

### Model

We use a pretrained MobileNetV3-Large as the backbone for pill classification.

The convolutional feature extractor is frozen to retain learned visual patterns from ImageNet,
and the final classifier layer is replaced to match our 84 pill classes.

Because our dataset is small, we train only the last layer of the classifier and leave all other weights unchanged.

In [5]:
from model import MedsClassifier
from torchinfo import summary

model = MedsClassifier(len(dataset.classes)).to(device)

summary(model, input_size=(1, 3, 224, 224), device=device)


Layer (type:depth-idx)                                  Output Shape              Param #
MedsClassifier                                          [1, 84]                   --
├─MobileNetV3: 1-1                                      [1, 84]                   --
│    └─Sequential: 2-1                                  [1, 960, 7, 7]            --
│    │    └─Conv2dNormActivation: 3-1                   [1, 16, 112, 112]         (464)
│    │    └─InvertedResidual: 3-2                       [1, 16, 112, 112]         (464)
│    │    └─InvertedResidual: 3-3                       [1, 24, 56, 56]           (3,440)
│    │    └─InvertedResidual: 3-4                       [1, 24, 56, 56]           (4,440)
│    │    └─InvertedResidual: 3-5                       [1, 40, 28, 28]           (10,328)
│    │    └─InvertedResidual: 3-6                       [1, 40, 28, 28]           (20,992)
│    │    └─InvertedResidual: 3-7

### Training

In [6]:
import os
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import classification_report
from train_utils import train_one_epoch, evaluate_loss_acc

os.makedirs("models", exist_ok=True)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
EPOCHS = 20
best_val_loss = float('inf')

for epoch in range(EPOCHS):
    print(f'Epoch {epoch + 1}/{EPOCHS}')

    # Train for one epoch
    train_one_epoch(model, train_loader, optimizer, criterion, device, epoch)

    # Evaluate on the training and validation sets
    train_loss, train_acc, y_true_train, y_pred_train = evaluate_loss_acc(model, train_loader, criterion, device)
    val_loss,  val_acc, y_true_val, y_pred_val  = evaluate_loss_acc(model, val_loader, criterion, device)

    print(f"Training  - loss: {train_loss:.4f}, accuracy: {train_acc*100:.2f}%")
    print(f"Validation - loss: {val_loss:.4f},  accuracy: {val_acc*100:.2f}%")

    # Save the model with the lowest validation loss so far
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "models/meds_classifier.pt")
        print("✅ Model improved and saved")

    # Stop early once the target validation accuracy is reached
    if val_acc >= 0.75:
        print(f"🎯 Target validation accuracy reached ({val_acc*100:.2f}%). Stopping.")
        break

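print("\n📊 Final evaluation on test set:")
# Reload the best checkpoint before testing; map_location keeps this working on CPU-only machines
model.load_state_dict(torch.load("models/meds_classifier.pt", map_location=device))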
test_loss, test_acc, y_true_test, y_pred_test = evaluate_loss_acc(model, test_loader, criterion, device)
print(f"Test - loss: {test_loss:.4f}, accuracy: {test_acc*100:.2f}%")

print(classification_report(
    y_true_test,
    y_pred_test,
    target_names=test_dataset_origin.classes,
    digits=3,
    zero_division=0
))

Epoch 1/20
Epoch: 1, batch: 9, loss: 4.4261
Epoch: 1, batch: 19, loss: 4.3110
Epoch: 1, batch: 29, loss: 4.0800
Epoch: 1, batch: 39, loss: 3.9568
Epoch: 1, batch: 49, loss: 3.6716
Training  - loss: 3.7175, accuracy: 14.56%
Validation - loss: 3.7163,  accuracy: 15.32%
✅ Model improved and saved
Epoch 2/20
Epoch: 2, batch: 9, loss: 3.1296
Epoch: 2, batch: 19, loss: 2.8309
Epoch: 2, batch: 29, loss: 2.7139
Epoch: 2, batch: 39, loss: 2.6633
Epoch: 2, batch: 49, loss: 2.4300
Training  - loss: 3.0294, accuracy: 21.15%
Validation - loss: 3.1536,  accuracy: 18.51%
✅ Model improved and saved
Epoch 3/20
Epoch: 3, batch: 9, loss: 2.1240
Epoch: 3, batch: 19, loss: 1.9945
Epoch: 3, batch: 29, loss: 2.0669
Epoch: 3, batch: 39, loss: 1.9184
Epoch: 3, batch: 49, loss: 1.8917
Training  - loss: 2.5444, accuracy: 29.91%
Validation - loss: 2.6191,  accuracy: 29.36%
✅ Model improved and saved
Epoch 4/20
Epoch: 4, batch: 9, loss: 1.6930
Epoch: 4, batch: 19, loss: 1.8178
Epoch: 4, batch: 29, loss: 1.55
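
The helper `evaluate_loss_acc` comes from `train_utils` and its code is not shown. As a rough sketch of what a function with the same return signature (loss, accuracy, true labels, predicted labels) might do, assuming a criterion with the default mean reduction, consider the following. The name `evaluate_loss_acc_sketch` and the toy model and data are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_loss_acc_sketch(model, loader, criterion, device):
    """Return (mean loss, accuracy, y_true, y_pred) over all samples in loader."""
    model.eval()  # disable dropout and use running batch-norm statistics
    total_loss, correct, total = 0.0, 0, 0
    y_true, y_pred = [], []
    with torch.no_grad():  # no gradients needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            logits = model(inputs)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(logits, labels).item() * labels.size(0)
            preds = logits.argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
            y_true.extend(labels.cpu().tolist())
            y_pred.extend(preds.cpu().tolist())
    return total_loss / total, correct / total, y_true, y_pred


# Toy check with random data instead of the pill images.
toy_model = nn.Linear(4, 3)
toy_loader = DataLoader(
    TensorDataset(torch.randn(10, 4), torch.randint(0, 3, (10,))), batch_size=4
)
loss, acc, y_true, y_pred = evaluate_loss_acc_sketch(
    toy_model, toy_loader, nn.CrossEntropyLoss(), torch.device("cpu")
)
print(f"loss: {loss:.4f}, accuracy: {acc * 100:.2f}%")
```

Weighting each batch loss by its size keeps the reported mean correct even when the last batch is smaller than the others.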

### Analysis

#### In which 5 classes does the model make mistakes most often?
sedatif_pc - completely unrecognized

lactiv_plus - completely unrecognized

naturland_d_vitamin_forte - almost all predictions are wrong

covercard_plus_10_mg_2_5_mg_5_mg - serious errors

teva_enterobene_2_mg - serious errors
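
A ranking like the one above can also be derived programmatically from the test predictions. The sketch below sorts classes by F1 score using scikit-learn; the function name `worst_classes` is hypothetical, and the toy labels stand in for `y_true_test`, `y_pred_test`, and `test_dataset_origin.classes` from the training cell.

```python
from sklearn.metrics import precision_recall_fscore_support


def worst_classes(y_true, y_pred, class_names, k=5):
    """Return the k classes with the lowest F1 as (name, f1, recall, support)."""
    labels = list(range(len(class_names)))
    _, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, zero_division=0
    )
    rows = [
        (name, float(f), float(r), int(s))
        for name, f, r, s in zip(class_names, f1, recall, support)
    ]
    # Lowest F1 first: these are the classes the model handles worst.
    return sorted(rows, key=lambda row: row[1])[:k]


# Toy example with three classes instead of the 84 pill classes.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 0, 1, 1]
for name, f1, recall, support in worst_classes(toy_true, toy_pred, ["a", "b", "c"], k=2):
    print(f"{name}: f1={f1:.3f}, recall={recall:.3f}, support={support}")
```

In the notebook, the call would presumably be `worst_classes(y_true_test, y_pred_test, test_dataset_origin.classes)`.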

#### Why the model may make mistakes on these classes

Lack of distinctive and informative features: the tablets are white, smooth, and without engravings, textures, or color variations, so the model cannot find stable features to differentiate between them.

Image capture issues: overexposure, poor contrast, uneven lighting, and loss of fine details or textures prevent the CNN from perceiving the actual shape relief or imprinted text.

#### In which classes does the model not make mistakes?
acc_long_600_mg

advil_ultra_forte

algoflex_forte_dolo_400_mg

cataflam_dolo_25_mg

ocutein

strepsils

urzinol

valeriana_teva

#### Why the model recognizes these classes without errors

The tablets in these classes have clear and distinctive visual features (such as color, pattern, shape, or engravings) that remain visible even under imperfect lighting or slight overexposure.

These unique characteristics provide stable visual cues for the neural network, making it easier to differentiate them from other tablets regardless of variations in lighting, angle, or background.

#### How can the classifier's accuracy be improved?

1. _Photograph both sides of the tablet._ Some tablets have engravings or text only on one side, so capturing both sides provides the model with more distinguishing information.
2. _Increase the contrast during image capture._ Avoid overexposure and ensure that engravings and textures are clearly visible; this helps the model recognize surface details more effectively.
3. _Use angled lighting to emphasize shape and relief._ Side lighting creates subtle shadows that highlight the tablet's contours and engraved details.
4. _Use a neutral gray background._ A slightly darker gray background improves contrast with white tablets and avoids introducing misleading color cues for the model.
5. _Photograph tablets at a slight angle to capture 3D features._ Angled shots reveal thickness and shape, helping the model distinguish visually similar tablets.
6. _Use a size reference grid._ A grid helps the model learn scale and relative size, but it should be neutral, low-contrast, and consistent, so it doesn't become a dominant visual feature.

#### How else can the model's results and errors be analyzed?

Build a confusion matrix: it immediately shows which classes are being confused with each other.
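
With 84 classes the full matrix is hard to read at a glance, so one option is to list only the largest off-diagonal entries. The sketch below does this with scikit-learn; the function name `top_confusions` is hypothetical, and the toy labels stand in for `y_true_test`, `y_pred_test`, and `test_dataset_origin.classes`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def top_confusions(y_true, y_pred, class_names, k=10):
    """Return up to k (true class, predicted class, count) tuples, most frequent first."""
    labels = np.arange(len(class_names))
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, columns: predicted

    # Zero the diagonal so only misclassifications remain.
    errors = cm.copy()
    np.fill_diagonal(errors, 0)

    pairs = []
    for flat_idx in np.argsort(errors, axis=None)[::-1][:k]:
        true_idx, pred_idx = np.unravel_index(flat_idx, errors.shape)
        count = int(errors[true_idx, pred_idx])
        if count == 0:
            break  # the remaining entries contain no errors
        pairs.append((class_names[true_idx], class_names[pred_idx], count))
    return pairs


# Toy example with three classes instead of the 84 pill classes.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 0, 1, 1]
for true_name, pred_name, count in top_confusions(toy_true, toy_pred, ["a", "b", "c"]):
    print(f"{true_name} predicted as {pred_name}: {count} times")
```

Each returned pair points to two classes whose images are worth comparing side by side to see why the model mixes them up.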

