<center>
    <font size="5"> Advanced Machine Learning and Deep Learning Methods<br/>
        <small><em>Full-time second-cycle (Master's) studies 2025/2026</em><br/>Field of study: Computer Science<br>Specialization: Intelligent Systems and Augmented Reality</small>
    </font>
</center>
<br>


## Project topic: Dog breed classification. Comparison of a Transformer model with a simple CNN.
### Authors: Jakub Kieliński SIiRRz1, Mateusz Wójtowicz SIiRRz2
### Dataset used for training: [Kaggle link](https://www.kaggle.com/competitions/dog-breed-identification/overview)

## Dataset description - Dog Breed Identification

### Task type:
- Computer Vision
- Multi-class classification
- Number of classes: 120 (dog breeds)
- Each image has exactly one label

### Number of samples:
- Training set: 10,222 images
- Test set: 10,357 images
- Image format: JPG

### Directory structure after unpacking

```text
dog-breed-identification/
├── train/
│   ├── <image_id>.jpg
│   └── ...
├── test/
│   ├── <image_id>.jpg
│   └── ...
├── labels.csv
└── sample_submission.csv
```

### The `labels.csv` file
- Columns:
  - `id` – image identifier (file name without the `.jpg` extension)
  - `breed` – class label (breed name)
- One label per image
- 120 unique values in the `breed` column

### Test data
- No labels
- File structure identical to `train/`
- Intended for model evaluation
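
The properties listed above (one label per image, a fixed number of unique breeds) can be checked before training. The following is a minimal sketch, not part of the original notebook: `summarize_labels` and `SAMPLE_CSV` are hypothetical names, and the four-row sample is invented so that the block runs without the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature stand-in for labels.csv (the real file is much larger).
SAMPLE_CSV = """id,breed
a1,beagle
a2,beagle
a3,pug
a4,boxer
"""


def summarize_labels(csv_file) -> dict:
    """Collect basic integrity statistics from a CSV with `id` and `breed` columns."""
    rows = list(csv.DictReader(csv_file))
    ids = [row["id"] for row in rows]
    per_breed = Counter(row["breed"] for row in rows)
    return {
        "num_images": len(rows),
        "num_classes": len(per_breed),
        # A non-zero value would mean some image has more than one label row.
        "duplicate_ids": len(ids) - len(set(ids)),
        "min_per_class": min(per_breed.values()),
        "max_per_class": max(per_breed.values()),
    }


summary = summarize_labels(io.StringIO(SAMPLE_CSV))
print(summary)
```

With the real dataset, the same function could be given an open handle to `labels.csv`; according to the description above, `num_classes` should then be 120 and `duplicate_ids` should be 0.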

Importing the required libraries

In [8]:
import os, random
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications.efficientnet_v2 import preprocess_input

from sklearn.model_selection import train_test_split

print(f"Tensorflow Version: {tf.__version__}")
print(f"NumPy Version: {np.__version__}")
print(f"Pandas Version: {pd.__version__}")

Tensorflow Version: 2.16.1
NumPy Version: 1.26.4
Pandas Version: 3.0.0


Library versions and GPU availability

In [9]:
print(f'Tensorflow version: {tf.__version__}')
print(f'Numpy version: {np.__version__}')
print(f'Pandas version: {pd.__version__}')

gpus = tf.config.list_physical_devices('GPU')

if gpus:
    for gpu in gpus:
        details = tf.config.experimental.get_device_details(gpu)
        gpu_name = details.get('device_name', 'Unknown GPU')
        print(f"Available GPU: {gpu_name}")
else:
    print("No GPU found.")

Tensorflow version: 2.16.1
Numpy version: 1.26.4
Pandas version: 3.0.0
Available GPU: METAL


Mount Google Drive and copy dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# -r is required because `dataset` is a directory (it contains train/ and labels.csv)
!cp -r /content/drive/MyDrive/dataset /content/

Seed configuration, dataset path, autotune, and basic model parameters

In [None]:
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

DATA_DIR = "/content/dataset"
TRAIN_DIR = os.path.join(DATA_DIR, "train")
LABELS_CSV = os.path.join(DATA_DIR, "labels.csv")

BATCH_SIZE = 16
IMG_SIZE = 320
EPOCHS_STAGE1 = 10
EPOCHS_STAGE2 = 30

# performance management
AUTOTUNE = tf.data.AUTOTUNE

Loading the labels and performing a stratified split

In [11]:
df = pd.read_csv(LABELS_CSV)
df["filename"] = df["id"].astype(str) + ".jpg"
df["filepath"] = df["filename"].apply(lambda x: os.path.join(TRAIN_DIR, x))

# class mapping: breed name -> integer index
class_names = sorted(df["breed"].unique().tolist())
class_to_idx = {c:i for i,c in enumerate(class_names)}
num_classes = len(class_names)

df["label"] = df["breed"].map(class_to_idx).astype(int)

# split the data into training and validation sets (80/20)
train_df, val_df = train_test_split(
    df,
    test_size=0.2,
    random_state=SEED,
    stratify=df["label"]
)

print("Train len:", len(train_df), "Val len:", len(val_df), "Num Classes:", num_classes)

Train len: 8177 Val len: 2045 Num Classes: 120
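
The printed sizes and the effect of `stratify` can be sanity-checked with a little arithmetic. This is a minimal sketch under one stated assumption: the validation size is obtained by rounding up `test_size * n_samples`, which is consistent with the 8177/2045 output above. The helper names `split_sizes` and `max_class_share_gap` are hypothetical, and the toy label lists are invented.

```python
import math
from collections import Counter


def split_sizes(n_samples: int, test_fraction: float) -> tuple[int, int]:
    """Sizes of a (train, validation) split when the validation size is rounded up."""
    n_val = math.ceil(n_samples * test_fraction)
    return n_samples - n_val, n_val


def max_class_share_gap(train_labels, val_labels) -> float:
    """Largest absolute difference in class share between the two subsets."""
    train_counts, val_counts = Counter(train_labels), Counter(val_labels)
    classes = set(train_counts) | set(val_counts)
    return max(
        abs(train_counts[c] / len(train_labels) - val_counts[c] / len(val_labels))
        for c in classes
    )


sizes = split_sizes(10222, 0.2)
print(sizes)  # (8177, 2045)

# Toy labels: both subsets are 75% class 0 and 25% class 1, so the gap is zero.
gap = max_class_share_gap([0, 0, 0, 1] * 4, [0, 0, 0, 1])
print(gap)  # 0.0
```

In the notebook, the same check could be applied to `train_df["label"].tolist()` and `val_df["label"].tolist()`; a stratified split should give a small gap, although not exactly zero, because class counts rarely divide evenly.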


Building the training and validation datasets with asynchronous loading

In [12]:
def load_image(path, label):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [IMG_SIZE, IMG_SIZE])
    img = tf.cast(img, tf.float32)

    return img, tf.one_hot(label, depth=num_classes)


def make_dataset(df, training: bool):
    ds = tf.data.Dataset.from_tensor_slices((df["filepath"].values, df["label"].values))
    
    if training:
        ds = ds.shuffle(buffer_size=len(df), seed=SEED, reshuffle_each_iteration=True)

    ds = ds.map(load_image, num_parallel_calls=AUTOTUNE)
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(AUTOTUNE)

    return ds


train_ds = make_dataset(train_df, training=True)
val_ds = make_dataset(val_df, training=False)
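
Because `ds.batch(BATCH_SIZE)` keeps the final partial batch by default, the number of steps per epoch is the sample count divided by the batch size, rounded up. The sketch below works through this for the split sizes printed earlier (8177 and 2045) and `BATCH_SIZE = 16`. The function names are hypothetical helpers, not part of the notebook.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Number of batches produced from `n_samples` items."""
    if drop_remainder:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


def last_batch_size(n_samples: int, batch_size: int) -> int:
    """Size of the final batch when the remainder is kept."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size


train_steps = steps_per_epoch(8177, 16)
val_steps = steps_per_epoch(2045, 16)
print(train_steps, val_steps)        # 512 128
print(last_batch_size(8177, 16))     # 1
```

The value 512 matches the step count shown in the training progress bars. The last training batch contains a single image, which is harmless here but worth remembering if a layer that depends on batch statistics is ever trained.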

Building the EfficientNetV2B2 model (NASNetLarge was used previously)

In [13]:
# in-model augmentations, active only during training
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.08),
    layers.RandomZoom(0.12),
    layers.RandomContrast(0.15)
], name="augmentation")


def build_model():
    base = keras.applications.EfficientNetV2B2(
        include_top=False,
        weights="imagenet",
        input_shape=(IMG_SIZE, IMG_SIZE, 3)
    )
    # stage 1 training
    base.trainable = False
    
    inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = data_augmentation(inputs)
    x = keras.layers.Lambda(preprocess_input)(x)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.35)(x)
    x = layers.Dense(
        512,
        activation="relu",
        kernel_regularizer=keras.regularizers.l2(1e-4)
    )(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)

    return model, base


model, base_model = build_model()

loss = keras.losses.CategoricalCrossentropy(label_smoothing=0.05)
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

model.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=[
        keras.metrics.CategoricalAccuracy(name="top1"),
        keras.metrics.TopKCategoricalAccuracy(k=5, name="top5"),
    ]
)

model.summary()
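
The `label_smoothing=0.05` argument changes the targets the loss sees: each one-hot vector is mixed with a uniform distribution over the classes. The NumPy sketch below illustrates that arithmetic for 120 classes. It reproduces the formula rather than calling Keras, and the names `smooth_labels` and `cross_entropy` are hypothetical.

```python
import math

import numpy as np


def smooth_labels(one_hot: np.ndarray, smoothing: float) -> np.ndarray:
    """Mix one-hot targets with a uniform distribution over the classes."""
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - smoothing) + smoothing / n_classes


def cross_entropy(targets: np.ndarray, probs: np.ndarray, eps: float = 1e-7) -> float:
    """Mean categorical cross-entropy between target and predicted distributions."""
    probs = np.clip(probs, eps, 1.0)
    return float(-(targets * np.log(probs)).sum(axis=-1).mean())


n_classes = 120
y = np.eye(n_classes)[[3]]             # one sample whose true class is 3
y_smooth = smooth_labels(y, 0.05)
print(y_smooth[0, 3], y_smooth[0, 0])  # about 0.9504 and 0.000417

# A model that predicts a uniform distribution scores ln(120), about 4.79.
uniform = np.full((1, n_classes), 1.0 / n_classes)
baseline_loss = cross_entropy(y_smooth, uniform)
print(baseline_loss)
```

The uniform-prediction loss of roughly 4.79 is a useful reference point: a training loss below it indicates the classifier head is already doing better than chance.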

Defining the model callbacks

In [14]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        "best_model.keras",
        monitor="val_loss",
        save_best_only=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss",
        factor=0.5,
        patience=2,
        min_lr=1e-6,
        verbose=1
    ),
    keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=4,
        restore_best_weights=True
    )
]
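
To make the `ReduceLROnPlateau` settings above more concrete, the following pure-Python sketch simulates how the learning rate would evolve for a given sequence of validation losses. It is a simplified model of the callback, not its implementation: cooldown is ignored, an improvement threshold of `1e-4` is assumed, and the loss values are invented.

```python
import math


def simulate_reduce_lr(val_losses, lr=1e-3, factor=0.5, patience=2,
                       min_lr=1e-6, min_delta=1e-4):
    """Return the learning rate in effect after each epoch's validation loss."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau: scale the rate down, but never below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Invented validation losses: two improvements followed by a plateau.
schedule = simulate_reduce_lr([1.0, 0.9, 0.95, 0.96, 0.97, 0.98])
print(schedule)
```

With `patience=2` the rate is halved after every second consecutive epoch without improvement, so in this toy run it drops from `1e-3` to `5e-4` after the fourth epoch and to `2.5e-4` after the sixth. Since `EarlyStopping` uses `patience=4`, at most two such reductions can happen before training stops on a plateau.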

Model training (Stage 1)

In [15]:
history_stage1 = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=EPOCHS_STAGE1,
    callbacks=callbacks
)

Epoch 1/10

2026-02-06 02:35:54.074284: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.

 80/512 ━━━━━━━━━━━━━━━━━━━━ 1:37 226ms/step - loss: 3.7571 - top1: 0.2888 - top5: 0.4841

KeyboardInterrupt: 
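
The `top1` and `top5` metrics count a prediction as correct when the true class is among the 1 or 5 highest-scoring classes. The NumPy sketch below shows that computation on a made-up three-class example. `top_k_accuracy` is a hypothetical helper that takes integer class indices rather than the one-hot labels used in the notebook, and it does not attempt to reproduce how Keras resolves ties.

```python
import math

import numpy as np


def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    # argpartition places the k largest scores (in any order) in the first k columns.
    top_k = np.argpartition(-probs, k - 1, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Invented predictions for three samples over three classes.
probs = np.array([
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
])
labels = np.array([1, 2, 0])

acc_top1 = top_k_accuracy(probs, labels, k=1)
acc_top2 = top_k_accuracy(probs, labels, k=2)
print(acc_top1, acc_top2)
```

Here only the first sample is right at `k=1`, while the second is recovered at `k=2`, giving 1/3 and 2/3. With 120 visually similar breeds, the gap between top-1 and top-5 gives a rough sense of how often the model confuses a breed with a close neighbour.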

Stage 2: fine-tuning (unfreezing part of the backbone + low LR)

In [None]:
# fine-tuning with BatchNorm frozen: unfreeze the backbone, then re-freeze
# the BatchNorm layers and everything except the last 140 layers
base_model.trainable = True

for layer in base_model.layers:
    if isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False

# keep all but the last 140 backbone layers frozen
for layer in base_model.layers[:-140]:
    layer.trainable = False

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=3e-5),
    loss=keras.losses.CategoricalCrossentropy(label_smoothing=0.02),
    metrics=[
        keras.metrics.CategoricalAccuracy(name="top1"),
        keras.metrics.TopKCategoricalAccuracy(k=5, name="top5"),
    ]
)

history_stage2 = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=EPOCHS_STAGE2,
    callbacks=callbacks
)

Epoch 1/12
512/512 ━━━━━━━━━━━━━━━━━━━━ 155s 270ms/step - acc: 0.7549 - loss: 1.5388 - val_acc: 0.8274 - val_loss: 1.1664 - learning_rate: 1.0000e-05
Epoch 2/12
512/512 ━━━━━━━━━━━━━━━━━━━━ 131s 254ms/step - acc: 0.7876 - loss: 1.3862 - val_acc: 0.8367 - val_loss: 1.1064 - learning_rate: 1.0000e-05
Epoch 3/12
512/512 ━━━━━━━━━━━━━━━━━━━━ 123s 241ms/step - acc: 0.8067 - loss: 1.2985 - val_acc: 0.8474 - val_loss: 1.0761 - learning_rate: 1.0000e-05
Epoch 4/12
512/512 ━━━━━━━━━━━━━━━━━━━━ 130s 253ms/step - acc: 0.8098 - loss: 1.2550 - val_acc: 0.8513 - val_loss: 1.0421 - learning_rate: 1.0000e-05
Epoch 5/12
512/512 ━━━━━━━━━━━━━━━━━━━━ 138s 268ms/step - acc: 0.8281 - loss: 1.2031 - val_acc: 0.8460 - val_loss: 1.0499 - learning_rate: 1.0000e-05
Epoch 6/12
512/512 ━━━━━━━━━━━━━━━━━━━━ 123s 2
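
The freezing rule used in the Stage 2 cell (train only the last N backbone layers, and never the BatchNorm layers) can be illustrated without loading a model. The sketch below uses a hypothetical `FakeLayer` stand-in instead of real Keras layers, and a made-up six-layer backbone. It only demonstrates the selection logic.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    """Hypothetical stand-in for a Keras layer: only the fields the rule needs."""
    name: str
    is_batch_norm: bool
    trainable: bool = False


def unfreeze_top(layers, n_top: int):
    """Mark the last `n_top` layers as trainable, except BatchNorm layers."""
    cutoff = max(len(layers) - n_top, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff and not layer.is_batch_norm
    return [layer.name for layer in layers if layer.trainable]


# Toy backbone: six layers, every odd-indexed one is a BatchNorm layer.
backbone = [FakeLayer(f"layer{i}", is_batch_norm=(i % 2 == 1)) for i in range(6)]
trainable_names = unfreeze_top(backbone, n_top=4)
print(trainable_names)  # ['layer2', 'layer4']
```

In the notebook itself, the outcome of the real freezing loops could be inspected with `sum(layer.trainable for layer in base_model.layers)`, which counts how many backbone layers will actually receive gradient updates.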

Evaluation of the transfer-learning CNN model

Simple CNN model for comparison

Evaluation of the simple CNN model

Model comparison

Conclusions