# Lab 7

## 1. Initial Setup

### a. Dataset

This project uses the [leaf flower fruit annotation](https://www.kaggle.com/datasets/ar5entum/leaf-flower-fruit-annotation) dataset, which is intended for segmenting plants in images. The task has broad practical applications. For example, automatic detection and classification of plant objects can be useful in agricultural technology (crop monitoring, plant disease diagnosis), environmental monitoring, botanical research, and applications that identify plant species from a photo.

In [3]:
from google.colab import files
files.upload()

!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

Saving kaggle.json to kaggle.json


In [4]:
!pip install kaggle
!pip install pandas



In [5]:
!kaggle datasets download -d ar5entum/leaf-flower-fruit-annotation -p data7 --unzip

Dataset URL: https://www.kaggle.com/datasets/ar5entum/leaf-flower-fruit-annotation
License(s): unknown
Downloading leaf-flower-fruit-annotation.zip to data7
  0% 0.00/20.2M [00:00<?, ?B/s]
100% 20.2M/20.2M [00:00<00:00, 1.27GB/s]


In [6]:
!pip install pycocotools



In [7]:
!pip install segmentation_models_pytorch


At this stage we load the dataset, which is provided in COCO format: a widely used data layout that stores images together with their labels and object descriptions. To work with this format we define a custom `Dataset` class that generates segmentation masks from the annotations. These masks are the training targets: they tell the model which image regions belong to the target classes and which are background.

In [15]:
import os
import cv2
import torch
import numpy as np
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
import albumentations as A
from albumentations.pytorch import ToTensorV2
import segmentation_models_pytorch as smp
from pycocotools.coco import COCO

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)


Device: cuda


In [16]:
class PlantSegmentationLoader(Dataset):
    def __init__(self, root_dir, label_file, preprocessor=None):
        super().__init__()
        self.root_dir = root_dir
        self.annotations = COCO(label_file)
        self.record_ids = list(self.annotations.imgs.keys())
        self.category_map = self._build_category_index()
        self.preprocessor = preprocessor

    def _build_category_index(self):
        categories = self.annotations.loadCats(self.annotations.getCatIds())
        return {entry["id"]: idx + 1 for idx, entry in enumerate(categories)}

    def __len__(self):
        return len(self.record_ids)

    def _get_image(self, file_name):
        full_path = os.path.join(self.root_dir, file_name)
        img_data = cv2.imread(full_path)
        if img_data is None:
            raise IOError(f"Cannot read image file: {full_path}")
        return cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB)

    def _compose_mask(self, annotations_list, shape):
        label_mask = np.zeros(shape, dtype=np.uint8)
        for element in annotations_list:
            category_id = element["category_id"]
            mapped_label = self.category_map.get(category_id, 0)
            temp_mask = self.annotations.annToMask(element)
            label_mask[temp_mask.astype(bool)] = mapped_label
        return label_mask

    def __getitem__(self, index):
        current_id = self.record_ids[index]
        meta_info = self.annotations.loadImgs(current_id)[0]
        image_array = self._get_image(meta_info["file_name"])

        annotation_ids = self.annotations.getAnnIds(imgIds=current_id)
        annotation_data = self.annotations.loadAnns(annotation_ids)
        mask_array = self._compose_mask(annotation_data, (meta_info["height"], meta_info["width"]))

        if self.preprocessor:
            result = self.preprocessor(image=image_array, mask=mask_array)
            image_array, mask_array = result["image"], result["mask"]

        return image_array, mask_array.long()
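
The mask-composition logic of the class above can be illustrated without pycocotools. The following is only a minimal sketch: the category ids (7, 12, 30), the tiny binary masks, and the helper names `build_category_index` and `compose_mask` are hypothetical and stand in for what `COCO.getCatIds()` and `annToMask()` would provide.

```python
# Toy re-creation of the label-mask composition, using plain Python lists.

def build_category_index(category_ids):
    """Map arbitrary category ids to contiguous labels 1..N (0 stays background)."""
    return {cat_id: idx + 1 for idx, cat_id in enumerate(category_ids)}


def compose_mask(annotations, height, width, category_index):
    """Paint each binary annotation mask into one label mask.

    Annotations are applied in order, so a later annotation overwrites
    an earlier one wherever the two overlap.
    """
    label_mask = [[0] * width for _ in range(height)]
    for ann in annotations:
        label = category_index.get(ann["category_id"], 0)
        for row in range(height):
            for col in range(width):
                if ann["mask"][row][col]:
                    label_mask[row][col] = label
    return label_mask


category_index = build_category_index([7, 12, 30])  # hypothetical COCO ids
annotations = [
    {"category_id": 7, "mask": [[1, 1, 0], [1, 1, 0]]},
    {"category_id": 30, "mask": [[0, 1, 1], [0, 0, 0]]},
]
mask = compose_mask(annotations, height=2, width=3, category_index=category_index)
print(category_index)  # {7: 1, 12: 2, 30: 3}
print(mask)            # [[1, 3, 3], [1, 1, 0]]
```

The pixel at row 0, column 1 is covered by both annotations and ends up with label 3, which shows that the annotation order decides the label of overlapping regions.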

### b. Quality Metrics

Model quality is measured with the F1 score, which serves as the primary metric. Intersection over Union (IoU) is used as an additional metric that measures how well the predicted regions overlap the ground-truth regions.
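
As a small worked example of the two metrics, both can be written in terms of true positives, false positives, and false negatives. The pixel counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# IoU and F1 computed from pixel-level counts.

def iou(tp, fp, fn):
    """Intersection over Union: overlap divided by the union of both regions."""
    return tp / (tp + fp + fn)


def f1(tp, fp, fn):
    """F1 score (equivalent to the Dice coefficient for binary masks)."""
    return 2 * tp / (2 * tp + fp + fn)


tp, fp, fn = 60, 20, 20          # hypothetical pixel counts
iou_value = iou(tp, fp, fn)      # 60 / 100 = 0.6
f1_value = f1(tp, fp, fn)        # 120 / 160 = 0.75

# For the same counts the two metrics are tied by F1 = 2 * IoU / (1 + IoU).
f1_from_iou = 2 * iou_value / (1 + iou_value)
print(iou_value, f1_value, f1_from_iou)
```

Because of this relation, F1 is always at least as large as IoU for the same counts, and the two metrics rank models in the same order when computed on the same statistics.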

## 2. Building the Baseline and Evaluating Quality


In [17]:
!pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Looking in indexes: https://download.pytorch.org/whl/cu118


To prepare the data, we initialize the annotated image sets and build the DataLoaders that feed batches to the model during training and validation. The number of classes is the number of object categories plus one, because label 0 is reserved for the image background.

In [21]:
import albumentations as albu
from albumentations.pytorch import ToTensorV2
from torch.utils.data import DataLoader

# Augmentation configuration
def get_training_pipeline():
    return albu.Compose([
        albu.Resize(256, 256),
        albu.HorizontalFlip(p=0.5),
        albu.RandomCrop(224, 224),
        albu.Normalize(),
        ToTensorV2()
    ])

def get_validation_pipeline():
    return albu.Compose([
        albu.Resize(224, 224),
        albu.Normalize(),
        ToTensorV2()
    ])

train_images = "data7/semantic-segmentation-of-plants.v2i.coco-segmentation/train"
train_labels = "data7/semantic-segmentation-of-plants.v2i.coco-segmentation/train/train.json"

val_images = "data7/semantic-segmentation-of-plants.v2i.coco-segmentation/valid"
val_labels = "data7/semantic-segmentation-of-plants.v2i.coco-segmentation/valid/valid.json"

# Create the datasets
training_set = PlantSegmentationLoader(
    root_dir=train_images,
    label_file=train_labels,
    preprocessor=get_training_pipeline()
)

validation_set = PlantSegmentationLoader(
    root_dir=val_images,
    label_file=val_labels,
    preprocessor=get_validation_pipeline()
)

# DataLoaders
batch_sz = 32

train_batcher = DataLoader(
    dataset=training_set,
    batch_size=batch_sz,
    shuffle=True,
    num_workers=0
)

val_batcher = DataLoader(
    dataset=validation_set,
    batch_size=batch_sz,
    shuffle=False,
    num_workers=0
)

# Number of classes (object categories + background)
number_of_classes = len(training_set.category_map) + 1


loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


Next we implement a helper function for monitoring intermediate training results. It computes the loss, IoU, and F1 on the validation set, so that training progress and stability can be tracked after each epoch.

In [22]:
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.metrics.functional import get_stats, iou_score, f1_score
import torch
import torch.nn as nn

def calculate_total_loss(predictions, ground_truth, categories):
    dice = smp.losses.DiceLoss(mode='multiclass')
    ce = nn.CrossEntropyLoss()
    return dice(predictions, ground_truth) + ce(predictions, ground_truth)

def evaluate_model(net, dataloader, class_count):
    net.eval()
    cumulative = {"loss": 0.0, "iou": 0.0, "f1": 0.0}
    batches = len(dataloader)

    with torch.no_grad():
        for batch in dataloader:
            input_tensor, target_mask = batch
            input_tensor = input_tensor.to(device)
            target_mask = target_mask.to(device)

            output_logits = net(input_tensor)
            batch_loss = calculate_total_loss(output_logits, target_mask, class_count)
            cumulative["loss"] += batch_loss.item()

            predictions = output_logits.argmax(dim=1)
            true_pos, false_pos, false_neg, true_neg = get_stats(
                predictions,
                target_mask,
                mode="multiclass",
                num_classes=class_count
            )

            cumulative["iou"] += iou_score(true_pos, false_pos, false_neg, true_neg, reduction="micro").item()
            cumulative["f1"]  += f1_score(true_pos, false_pos, false_neg, true_neg, reduction="micro").item()

    averaged_results = {key: val / batches for key, val in cumulative.items()}
    return averaged_results
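
One property of `evaluate_model` is worth keeping in mind: it averages per-batch micro scores, which matches the dataset-level score only when every batch contributes equally. The sketch below uses hypothetical `(tp, fp, fn)` counts for two batches to show how far the two aggregation strategies can drift apart. It is an illustration of the effect, not a measurement on this dataset.

```python
# Mean of per-batch IoU versus IoU computed from pooled statistics.

def micro_iou(tp, fp, fn):
    """IoU from aggregated true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp + fn)


# Hypothetical (tp, fp, fn) counts; the second batch is smaller and harder.
batches = [(90, 5, 5), (10, 20, 20)]

# Strategy 1: score each batch, then average the scores (as evaluate_model does).
mean_of_batches = sum(micro_iou(*counts) for counts in batches) / len(batches)

# Strategy 2: sum the counts over all batches, then score once.
total_tp, total_fp, total_fn = (sum(column) for column in zip(*batches))
pooled = micro_iou(total_tp, total_fp, total_fn)

print(round(mean_of_batches, 4))  # 0.55
print(round(pooled, 4))           # 0.6667
```

With equally sized, similar batches the gap is usually small, but a short final batch can shift the averaged value noticeably.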



For the convolutional architecture we use a U-Net whose encoder is a ResNet34 pretrained on ImageNet, which improves feature extraction. Alongside it we use the transformer-based SegFormer with a MiT-B0 encoder. Both models are moved to the GPU with `.to(device)` so that training can use CUDA.



In [23]:
def initialize_unet(backbone="resnet34", pretrained="imagenet", output_classes=number_of_classes):
    model_instance = smp.Unet(
        encoder_name=backbone,
        encoder_weights=pretrained,
        classes=output_classes,
        activation=None
    )
    return model_instance.to(device)

def initialize_segformer(backbone_variant="mit_b0", pretrained="imagenet", input_channels=3, output_classes=number_of_classes):
    segmentation_network = smp.Segformer(
        encoder_name=backbone_variant,
        encoder_weights=pretrained,
        in_channels=input_channels,
        classes=output_classes,
        activation=None
    )
    return segmentation_network.to(device)

unet_architecture = initialize_unet()
segformer_architecture = initialize_segformer()


Next we define a training function that runs the optimization loop over the training set and prints the key statistics after each epoch.

In [24]:
from tqdm.auto import tqdm
import torch.optim as optim
import torch.nn as nn
import segmentation_models_pytorch as smp

def compute_combined_loss(predictions, ground_truth):
    dice_loss_fn = smp.losses.DiceLoss(mode="multiclass")
    cross_entropy_fn = nn.CrossEntropyLoss()
    return dice_loss_fn(predictions, ground_truth) + cross_entropy_fn(predictions, ground_truth)

def train_segmentation_network(network, train_data, val_data, epochs_count, learning_rate=1e-3):
    optimizer_engine = optim.Adam(network.parameters(), lr=learning_rate)

    for current_epoch in range(1, epochs_count + 1):
        network.train()
        cumulative_train_loss = 0.0

        batch_iterator = tqdm(train_data, desc=f"Epoch {current_epoch}/{epochs_count} - Training", leave=False)

        for batch_images, batch_masks in batch_iterator:
            batch_images = batch_images.to(device)
            batch_masks = batch_masks.to(device)

            optimizer_engine.zero_grad()
            predictions = network(batch_images)
            loss_value = compute_combined_loss(predictions, batch_masks)
            loss_value.backward()
            optimizer_engine.step()

            cumulative_train_loss += loss_value.item()

        validation_metrics = evaluate_model(network, val_data, class_count=number_of_classes)
        average_train_loss = cumulative_train_loss / len(train_data)

        print(f"[{current_epoch:02d}/{epochs_count}] "
              f"Train Loss: {average_train_loss:.5f} | "
              f"Val Loss: {validation_metrics['loss']:.5f} | "
              f"IoU: {validation_metrics['iou']:.5f} | "
              f"F1: {validation_metrics['f1']:.5f}")

Now let's run the training:

In [25]:
total_epochs = 30
initial_lr = 1e-3

train_segmentation_network(
    network=unet_architecture,
    train_data=train_batcher,
    val_data=val_batcher,
    epochs_count=total_epochs,
    learning_rate=initial_lr
)



epoch 01 | train loss: 1.9078 | val loss:   6.8179 | iou:        0.0594 | f1:         0.1122
epoch 02 | train loss: 1.2718 | val loss:   5.7178 | iou:        0.0753 | f1:         0.1400
epoch 03 | train loss: 1.0316 | val loss:   4.1888 | iou:        0.1367 | f1:         0.2405
epoch 04 | train loss: 0.8894 | val loss:   3.2286 | iou:        0.2089 | f1:         0.3457
epoch 05 | train loss: 0.8209 | val loss:   1.8287 | iou:        0.4224 | f1:         0.5939
epoch 06 | train loss: 0.7167 | val loss:   1.3039 | iou:        0.5905 | f1:         0.7425
epoch 07 | train loss: 0.6153 | val loss:   1.7420 | iou:        0.4819 | f1:         0.6504
epoch 08 | train loss: 0.5695 | val loss:   1.7457 | iou:        0.4409 | f1:         0.6120
epoch 09 | train loss: 0.5170 | val loss:   1.1812 | iou:        0.6114 | f1:         0.7589
epoch 10 | train loss: 0.5029 | val loss:   1.1010 | iou:        0.6526 | f1:         0.7898
epoch 11 | train loss: 0.4487 | val loss:   1.1314 | iou:        0.652

The result is fairly good but far from ideal: F1 = 0.7802.

In [26]:
segformer_epochs = 30
segformer_lr = 1e-3

train_segmentation_network(
    network=segformer_architecture,
    train_data=train_batcher,
    val_data=val_batcher,
    epochs_count=segformer_epochs,
    learning_rate=segformer_lr
)


epoch 01 | train loss: 1.7994 | val loss:   4.7516 | iou:        0.2541 | f1:         0.4052
epoch 02 | train loss: 1.0141 | val loss:   1.2042 | iou:        0.6510 | f1:         0.7886
epoch 03 | train loss: 0.7554 | val loss:   1.3775 | iou:        0.6297 | f1:         0.7728
epoch 04 | train loss: 0.6093 | val loss:   0.8074 | iou:        0.6969 | f1:         0.8214
epoch 05 | train loss: 0.5342 | val loss:   0.8031 | iou:        0.7326 | f1:         0.8457
epoch 06 | train loss: 0.4449 | val loss:   0.8191 | iou:        0.7186 | f1:         0.8363
epoch 07 | train loss: 0.4322 | val loss:   1.1862 | iou:        0.6402 | f1:         0.7806
epoch 08 | train loss: 0.3771 | val loss:   1.1581 | iou:        0.6674 | f1:         0.8005
epoch 09 | train loss: 0.3481 | val loss:   1.1137 | iou:        0.6674 | f1:         0.8005
epoch 10 | train loss: 0.3750 | val loss:   1.2271 | iou:        0.6106 | f1:         0.7582
epoch 11 | train loss: 0.3218 | val loss:   1.0348 | iou:        0.686

Here F1 = 0.823, which is better than the CNN model.
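
The validation F1 fluctuates from epoch to epoch in both logs, so the last epoch is not necessarily the best one. A possible refinement, sketched here under the assumption that per-epoch metrics are collected in a list, is to track the best epoch. The helper name `best_epoch` is hypothetical, and the F1 values are copied from the first ten lines of the SegFormer log above.

```python
# Pick the epoch with the highest validation metric from a recorded history.

def best_epoch(history, metric="f1"):
    """Return (epoch_number, metrics_dict) for the highest value of `metric`.

    Epochs are numbered from 1 to match the training log.
    """
    return max(enumerate(history, start=1), key=lambda item: item[1][metric])


# F1 values from epochs 1-10 of the SegFormer baseline log.
segformer_f1 = [0.4052, 0.7886, 0.7728, 0.8214, 0.8457,
                0.8363, 0.7806, 0.8005, 0.8005, 0.7582]
history = [{"f1": value} for value in segformer_f1]

epoch_number, metrics = best_epoch(history)
print(epoch_number, metrics["f1"])  # 5 0.8457
```

In a real training loop the same comparison could decide when to save a checkpoint, so that the reported score corresponds to the best weights rather than the final ones.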

## 3. Improving the Baseline

### Hypotheses

To increase the diversity of the training data and make the model more robust to distortions, we add more aggressive augmentations: random color, brightness, and contrast changes, as well as flips and 90-degree rotations. The optimizer is switched to AdamW, whose decoupled weight decay acts as a regularizer against overfitting. We also add a learning-rate scheduler that adjusts the learning rate during training. The training parameters are adjusted as well: the batch size is reduced to 16, and the initial learning rate is lowered from 1e-3 to 1e-4 for more stable convergence. For the CNN model, the U-Net encoder is also switched from ResNet34 to ResNet50.
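
To make the scheduler's effect concrete, here is a sketch of the closed-form cosine annealing schedule with the settings used in this section (base learning rate 1e-4, `T_max=10`, 30 epochs). The function name `cosine_annealing_lr` is hypothetical, and the sketch assumes that the schedule keeps following the cosine after `T_max` epochs, which, as far as I know, is how PyTorch's `CosineAnnealingLR` behaves.

```python
import math

# Closed-form cosine annealing: lr(t) = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2

def cosine_annealing_lr(epoch, base_lr=1e-4, t_max=10, eta_min=0.0):
    """Learning rate after `epoch` scheduler steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


schedule = [cosine_annealing_lr(epoch) for epoch in range(31)]

for epoch in (0, 5, 10, 15, 20, 30):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.2e}")
```

Under this assumption the learning rate reaches its minimum at epoch 10, climbs back to 1e-4 by epoch 20, and falls again by epoch 30. If a single monotonic decay over the whole run is intended, setting `T_max` to the total number of epochs would be one way to get it.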

In [27]:
# Enhanced augmentations
enhanced_augmentations = albu.Compose([
    albu.Resize(256, 256),
    albu.HorizontalFlip(p=0.5),
    albu.VerticalFlip(p=0.5),
    albu.RandomRotate90(p=0.5),
    albu.ColorJitter(p=0.5),
    albu.RandomCrop(224, 224),
    albu.Normalize(),
    ToTensorV2()
])

# Replace the training transform and rebuild the DataLoader
training_set.preprocessor = enhanced_augmentations

refined_batcher = DataLoader(
    dataset=training_set,
    batch_size=16,
    shuffle=True,
    num_workers=0
)

# Model initialization
improved_model = smp.Unet(
    encoder_name="resnet50",
    encoder_weights="imagenet",
    classes=number_of_classes,
    activation=None
).to(device)

# Optimizer, scheduler, and loss
optimizer_config = optim.AdamW(improved_model.parameters(), lr=1e-4)
lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer_config, T_max=10)

dice_loss_fn = smp.losses.DiceLoss(mode="multiclass")
ce_loss_fn = nn.CrossEntropyLoss()

def advanced_training_loop(net, train_data, val_data, total_epochs):
    for cycle in range(1, total_epochs + 1):
        net.train()
        running_loss = 0.0

        data_iterator = tqdm(train_data, desc=f"Epoch {cycle}/{total_epochs} - Advanced Training", leave=False)

        for batch_images, batch_labels in data_iterator:
            batch_images = batch_images.to(device)
            batch_labels = batch_labels.to(device)

            optimizer_config.zero_grad()
            outputs = net(batch_images)
            loss_value = dice_loss_fn(outputs, batch_labels) + ce_loss_fn(outputs, batch_labels)
            loss_value.backward()
            optimizer_config.step()

            running_loss += loss_value.item()

        lr_scheduler.step()

        validation_results = evaluate_model(net, val_data, class_count=number_of_classes)
        avg_train_loss = running_loss / len(train_data)

        print(f"[{cycle:02d}/{total_epochs}] "
              f"Train Loss: {avg_train_loss:.5f} | "
              f"Val Loss: {validation_results['loss']:.5f} | "
              f"IoU: {validation_results['iou']:.5f} | "
              f"F1: {validation_results['f1']:.5f}")



Train the improved CNN model:

In [28]:
improved_epochs = 30

advanced_training_loop(
    net=improved_model,
    train_data=refined_batcher,
    val_data=val_batcher,
    total_epochs=improved_epochs
)


epoch 01 | train loss: 2.3181 | val loss:   2.2078 | iou:        0.1157 | f1:         0.2073
epoch 02 | train loss: 1.9840 | val loss:   2.1235 | iou:        0.2210 | f1:         0.3620
epoch 03 | train loss: 1.7767 | val loss:   2.0036 | iou:        0.3519 | f1:         0.5205
epoch 04 | train loss: 1.6386 | val loss:   1.7565 | iou:        0.4932 | f1:         0.6606
epoch 05 | train loss: 1.5496 | val loss:   1.7404 | iou:        0.5195 | f1:         0.6837
epoch 06 | train loss: 1.4935 | val loss:   1.6610 | iou:        0.5710 | f1:         0.7269
epoch 07 | train loss: 1.4951 | val loss:   1.6279 | iou:        0.5999 | f1:         0.7499
epoch 08 | train loss: 1.4316 | val loss:   1.6192 | iou:        0.6053 | f1:         0.7542
epoch 09 | train loss: 1.4173 | val loss:   1.6077 | iou:        0.6133 | f1:         0.7603
epoch 10 | train loss: 1.4186 | val loss:   1.6026 | iou:        0.6163 | f1:         0.7626
epoch 11 | train loss: 1.3902 | val loss:   1.5948 | iou:        0.620

The hypothesis works, but in my view not effectively enough: F1 reached 0.8196.

Repeat for the transformer model:

In [29]:
import segmentation_models_pytorch as smp
import torch.optim as optim
import torch.nn as nn
from tqdm.auto import tqdm

refined_segformer = smp.Segformer(
    encoder_name="mit_b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=number_of_classes,
    activation=None
).to(device)

segformer_optimizer = optim.AdamW(refined_segformer.parameters(), lr=1e-4)
segformer_scheduler = optim.lr_scheduler.CosineAnnealingLR(segformer_optimizer, T_max=10)

dice_metric_fn = smp.losses.DiceLoss(mode="multiclass")
cross_entropy_fn = nn.CrossEntropyLoss()

def run_segformer_training(model_core, train_data, val_data, epochs_total):
    for stage in range(1, epochs_total + 1):
        model_core.train()
        cumulative_loss = 0.0

        batch_bar = tqdm(train_data, desc=f"Epoch {stage}/{epochs_total} - Segformer Training", leave=False)

        for image_batch, mask_batch in batch_bar:
            image_batch = image_batch.to(device)
            mask_batch = mask_batch.to(device)

            segformer_optimizer.zero_grad()
            predictions = model_core(image_batch)
            batch_loss = dice_metric_fn(predictions, mask_batch) + cross_entropy_fn(predictions, mask_batch)
            batch_loss.backward()
            segformer_optimizer.step()

            cumulative_loss += batch_loss.item()

        segformer_scheduler.step()
        evaluation_metrics = evaluate_model(model_core, val_data, class_count=number_of_classes)
        avg_epoch_loss = cumulative_loss / len(train_data)

        print(f"[{stage:02d}/{epochs_total}] "
              f"Train Loss: {avg_epoch_loss:.5f} | "
              f"Val Loss: {evaluation_metrics['loss']:.5f} | "
              f"IoU: {evaluation_metrics['iou']:.5f} | "
              f"F1: {evaluation_metrics['f1']:.5f}")


Train the model:

In [30]:
segformer_epochs_v2 = 30

run_segformer_training(
    model_core=refined_segformer,
    train_data=refined_batcher,
    val_data=val_batcher,
    epochs_total=segformer_epochs_v2
)


epoch 01 | train loss: 1.7638 | val loss:   1.5123 | iou:        0.4816 | f1:         0.6501
epoch 02 | train loss: 1.1506 | val loss:   1.1460 | iou:        0.6149 | f1:         0.7615
epoch 03 | train loss: 0.8769 | val loss:   0.9921 | iou:        0.6575 | f1:         0.7934
epoch 04 | train loss: 0.7844 | val loss:   0.8785 | iou:        0.7039 | f1:         0.8263
epoch 05 | train loss: 0.7212 | val loss:   0.9437 | iou:        0.6817 | f1:         0.8108
epoch 06 | train loss: 0.6758 | val loss:   0.9220 | iou:        0.6777 | f1:         0.8079
epoch 07 | train loss: 0.6334 | val loss:   0.8835 | iou:        0.6922 | f1:         0.8181
epoch 08 | train loss: 0.6455 | val loss:   0.8980 | iou:        0.6851 | f1:         0.8131
epoch 09 | train loss: 0.6028 | val loss:   0.8996 | iou:        0.6825 | f1:         0.8113
epoch 10 | train loss: 0.6081 | val loss:   0.8910 | iou:        0.6861 | f1:         0.8138
epoch 11 | train loss: 0.5789 | val loss:   0.8973 | iou:        0.683

An F1 of 0.8051 makes it clear that I did not manage to improve the transformer baseline, and a different hypothesis is needed.

## 4. Implementing the Machine Learning Algorithm


Here we build our own versions of the models: one based on convolutional layers and one using a transformer architecture.

In [31]:
import torch
import torch.nn as nn

class FeatureBlock(nn.Module):
    def __init__(self, inputs, outputs):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(inputs, outputs, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(outputs, outputs, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.layers(x)

class PyramidUNetCore(nn.Module):
    def __init__(self, categories_out):
        super().__init__()

        self.stage_a = FeatureBlock(3, 64)
        self.stage_b = FeatureBlock(64, 128)
        self.stage_c = FeatureBlock(128, 256)
        self.bottleneck = FeatureBlock(256, 512)

        self.downsample = nn.MaxPool2d(2)

        self.up_b = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.decode_b = FeatureBlock(512, 256)

        self.up_c = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.decode_c = FeatureBlock(256, 128)

        self.up_d = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.decode_d = FeatureBlock(128, 64)

        self.final_projection = nn.Conv2d(64, categories_out, kernel_size=1)

    def forward(self, input_map):
        x1 = self.stage_a(input_map)
        x2 = self.stage_b(self.downsample(x1))
        x3 = self.stage_c(self.downsample(x2))
        bridge = self.bottleneck(self.downsample(x3))

        y1 = self.decode_b(torch.cat([self.up_b(bridge), x3], dim=1))
        y2 = self.decode_c(torch.cat([self.up_c(y1), x2], dim=1))
        y3 = self.decode_d(torch.cat([self.up_d(y2), x1], dim=1))

        return self.final_projection(y3)
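
The network above applies `MaxPool2d(2)` three times and then doubles the resolution three times, so the skip connections line up only if the input side is divisible by 8. The sketch below traces the feature-map sizes for a square input. The helper name `unet_feature_sizes` is hypothetical, and it assumes every convolution preserves the spatial size, as the `padding=1` 3x3 convolutions in `FeatureBlock` do.

```python
# Trace the spatial side length through the three pooling steps of the encoder.

def unet_feature_sizes(side, pool_steps=3):
    """Return the side length at each encoder level, from input to bottleneck."""
    factor = 2 ** pool_steps
    if side % factor != 0:
        # With an odd intermediate size, pooling rounds down and the upsampled
        # decoder map no longer matches the skip connection in torch.cat.
        raise ValueError(f"input side {side} must be divisible by {factor}")
    sizes = [side]
    for _ in range(pool_steps):
        sizes.append(sizes[-1] // 2)
    return sizes


print(unet_feature_sizes(224))  # [224, 112, 56, 28]
```

The 224x224 crops and resizes used in the pipelines satisfy this constraint, with a 28x28 bottleneck.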


In [32]:
import torch
import torch.nn as nn

class PatchTokenEmbedding(nn.Module):
    def __init__(self, input_channels, embedding_dim, patch_dim):
        super().__init__()
        self.projection = nn.Conv2d(input_channels, embedding_dim, kernel_size=patch_dim, stride=patch_dim)

    def forward(self, images):
        return self.projection(images)

class HybridVisionEncoder(nn.Module):
    def __init__(self,
                 resolution=224, token_size=32, channels_in=3,
                 embed_channels=128, heads=4, layers=2, class_count=number_of_classes):
        super().__init__()

        assert resolution % token_size == 0, "Image size must be divisible by patch size."
        self.tokenizer = PatchTokenEmbedding(channels_in, embed_channels, token_size)

        total_tokens = (resolution // token_size) ** 2
        self.token_positions = nn.Parameter(torch.zeros(1, total_tokens, embed_channels))

        transformer_block = nn.TransformerEncoderLayer(
            d_model=embed_channels,
            nhead=heads,
            dim_feedforward=embed_channels * 2,
            dropout=0.1,
            activation='gelu'
        )
        self.encoder = nn.TransformerEncoder(transformer_block, num_layers=layers)

        self.reconstruction = nn.Sequential(
            nn.ConvTranspose2d(embed_channels, embed_channels, kernel_size=token_size, stride=token_size),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_channels, embed_channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_channels // 2, class_count, kernel_size=1)
        )

    def forward(self, batch_images):
        features = self.tokenizer(batch_images)
        batch_size, channel_count, height, width = features.shape

        sequence = features.flatten(2).transpose(1, 2) + self.token_positions
        sequence = sequence.permute(1, 0, 2)
        transformed = self.encoder(sequence)
        transformed = transformed.permute(1, 0, 2).transpose(1, 2).view(batch_size, channel_count, height, width)

        return self.reconstruction(transformed)
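
The default arguments of `HybridVisionEncoder` fix the token grid, which can be worked out with simple arithmetic. The sketch below uses a hypothetical helper `patch_grid` and only restates what the constructor computes for `total_tokens`.

```python
# Token-grid arithmetic for a patch-based encoder.

def patch_grid(resolution=224, token_size=32):
    """Return (tokens_per_side, total_tokens) for a square image."""
    if resolution % token_size != 0:
        raise ValueError("Image size must be divisible by patch size.")
    tokens_per_side = resolution // token_size
    return tokens_per_side, tokens_per_side ** 2


tokens_per_side, total_tokens = patch_grid()
print(tokens_per_side, total_tokens)  # 7 49
```

With the defaults, the transformer sees only 49 tokens, and each 32x32 block of the output mask is decoded from a single token by one transposed convolution. This coarse grid is a plausible reason for blurry object boundaries, although the notebook does not test that. Because `token_positions` is sized for exactly 49 tokens, the model as written also accepts only 224x224 inputs.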


A validation function is defined to check the quality of the custom models. It evaluates predictions on the validation data and returns the main metrics.

In [33]:
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.metrics.functional import get_stats, iou_score, f1_score

def compute_validation_loss(pred_map, target_map):
    dice_metric = smp.losses.DiceLoss(mode="multiclass")
    ce_metric = nn.CrossEntropyLoss()
    return dice_metric(pred_map, target_map) + ce_metric(pred_map, target_map)

def assess_model_performance(model_core, data_iterator, categories):
    model_core.eval()
    metrics_accumulator = {"loss": 0.0, "iou": 0.0, "f1": 0.0}
    batch_count = len(data_iterator)

    with torch.no_grad():
        for image_input, ground_truth in tqdm(data_iterator, desc="Evaluating...", leave=False):
            image_input = image_input.to(device)
            ground_truth = ground_truth.to(device)

            output_logits = model_core(image_input)
            batch_loss = compute_validation_loss(output_logits, ground_truth)
            metrics_accumulator["loss"] += batch_loss.item()

            predictions = output_logits.argmax(dim=1)
            tp, fp, fn, tn = get_stats(
                predictions,
                ground_truth,
                mode="multiclass",
                num_classes=categories
            )

            metrics_accumulator["iou"] += iou_score(tp, fp, fn, tn, reduction="micro").item()
            metrics_accumulator["f1"] += f1_score(tp, fp, fn, tn, reduction="micro").item()

    averaged_metrics = {metric: value / batch_count for metric, value in metrics_accumulator.items()}
    return averaged_metrics


Training function for the custom models:

In [34]:
from torch import optim

def execute_custom_training(core_network, data_train, data_val, epoch_limit, initial_lr=1e-3):
    optimizer_engine = optim.Adam(core_network.parameters(), lr=initial_lr)

    for cycle_index in range(1, epoch_limit + 1):
        core_network.train()
        aggregate_loss = 0.0

        progress_bar = tqdm(data_train, desc=f"Cycle {cycle_index}/{epoch_limit} - Custom Train", leave=False)

        for batch_inputs, batch_targets in progress_bar:
            batch_inputs = batch_inputs.to(device)
            batch_targets = batch_targets.to(device)

            optimizer_engine.zero_grad()
            predicted_logits = core_network(batch_inputs)
            total_loss = compute_combined_loss(predicted_logits, batch_targets)
            total_loss.backward()
            optimizer_engine.step()

            aggregate_loss += total_loss.item()

        validation_scores = assess_model_performance(core_network, data_val, categories=number_of_classes)
        average_train_loss = aggregate_loss / len(data_train)

        print(f"[{cycle_index:02d}/{epoch_limit}] "
              f"Train Loss: {average_train_loss:.5f} | "
              f"Val Loss: {validation_scores['loss']:.5f} | "
              f"IoU: {validation_scores['iou']:.5f} | "
              f"F1: {validation_scores['f1']:.5f}")


In [35]:
# Initialize the custom model
custom_segmentation_model = PyramidUNetCore(categories_out=number_of_classes).to(device)


In [36]:
custom_model_epochs = 30
custom_model_lr = 1e-3

execute_custom_training(
    core_network=custom_segmentation_model,
    data_train=refined_batcher,
    data_val=val_batcher,
    epoch_limit=custom_model_epochs,
    initial_lr=custom_model_lr
)


epoch 01| training loss: 1.9355  | val loss: 1.7563 | iou: 0.5320 | f1: 0.6945
epoch 02| training loss: 1.7153  | val loss: 1.7475 | iou: 0.5305 | f1: 0.6932
epoch 03| training loss: 1.6697  | val loss: 1.7488 | iou: 0.4692 | f1: 0.6388
epoch 04| training loss: 1.6379  | val loss: 1.6947 | iou: 0.4755 | f1: 0.6445
epoch 05| training loss: 1.6744  | val loss: 1.6943 | iou: 0.5034 | f1: 0.6697
epoch 06| training loss: 1.6286  | val loss: 1.7395 | iou: 0.5091 | f1: 0.6747
epoch 07| training loss: 1.6194  | val loss: 1.7074 | iou: 0.5084 | f1: 0.6741
epoch 08| training loss: 1.5995  | val loss: 1.7479 | iou: 0.4743 | f1: 0.6434
epoch 09| training loss: 1.6318  | val loss: 1.6541 | iou: 0.4886 | f1: 0.6564
epoch 10| training loss: 1.5756  | val loss: 1.6408 | iou: 0.5133 | f1: 0.6784
epoch 11| training loss: 1.6012  | val loss: 1.6047 | iou: 0.5316 | f1: 0.6942
epoch 12| training loss: 1.5489  | val loss: 1.5223 | iou: 0.5483 | f1: 0.7083
epoch 13| training loss: 1.5680  | val loss: 1.6084 | iou: 0.5160

F1 = 0.4675, which is almost half as good as the library model.
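A quick sanity check on the logged metrics: every complete log line above satisfies F1 = 2·IoU / (1 + IoU). This identity holds when Dice/F1 and IoU are computed from the same aggregated pixel counts. How `assess_model_performance` aggregates is not shown in this section, so this is an inference from the numbers rather than a statement about the implementation. The helper name `f1_from_iou` is hypothetical.

```python
def f1_from_iou(iou: float) -> float:
    """Dice/F1 implied by an IoU computed from the same TP/FP/FN counts."""
    return 2.0 * iou / (1.0 + iou)


# (IoU, F1) pairs copied from the training log above (epochs 1, 3 and 12)
logged_pairs = [(0.5320, 0.6945), (0.4692, 0.6388), (0.5483, 0.7083)]

for iou, f1 in logged_pairs:
    # The log is rounded to 4 decimals, so compare with a small tolerance
    assert abs(f1_from_iou(iou) - f1) < 2e-4
    print(f"IoU={iou:.4f} -> F1={f1_from_iou(iou):.4f} (logged {f1:.4f})")
```

If that reading is right, the two columns carry the same information, and comparing models on either one leads to the same ranking.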

In [37]:
vision_transformer_model = HybridVisionEncoder(
    resolution=224,
    token_size=32,
    channels_in=3,
    embed_channels=128,
    heads=4,
    layers=2,
    class_count=number_of_classes
).to(device)



Let's train the transformer:

In [38]:
vit_epochs = 30
vit_learning_rate = 1e-3

execute_custom_training(
    core_network=vision_transformer_model,
    data_train=refined_batcher,
    data_val=val_batcher,
    epoch_limit=vit_epochs,
    initial_lr=vit_learning_rate
)


epoch 01| taining loss: 1.9870  | val loss: 1.6861 | iou: 0.5320 | f1: 0.6945
epoch 02| taining loss: 1.7048  | val loss: 1.7761 | iou: 0.4637 | f1: 0.6336
epoch 03| taining loss: 1.6772  | val loss: 1.7166 | iou: 0.4058 | f1: 0.5774
epoch 04| taining loss: 1.6272  | val loss: 1.6915 | iou: 0.3749 | f1: 0.5453
epoch 05| taining loss: 1.6164  | val loss: 1.6079 | iou: 0.5268 | f1: 0.6901
epoch 06| taining loss: 1.5431  | val loss: 1.5762 | iou: 0.5215 | f1: 0.6855
epoch 07| taining loss: 1.5761  | val loss: 1.6801 | iou: 0.4795 | f1: 0.6482
epoch 08| taining loss: 1.6004  | val loss: 1.5815 | iou: 0.5245 | f1: 0.6881
epoch 09| taining loss: 1.5384  | val loss: 1.6503 | iou: 0.4444 | f1: 0.6153
epoch 10| taining loss: 1.5351  | val loss: 1.7854 | iou: 0.4105 | f1: 0.5821
epoch 11| taining loss: 1.4905  | val loss: 1.6591 | iou: 0.4363 | f1: 0.6075
epoch 12| taining loss: 1.5027  | val loss: 1.6759 | iou: 0.4091 | f1: 0.5806
epoch 13| taining loss: 1.5329  | val loss: 1.7227 | iou: 0.3851

I got F1 = 0.5701, which is better than the previous model but still falls short of the library model. Let's move on to improving the baseline.

In [39]:
# Set up the optimizers and schedulers
cnn_optimizer = optim.AdamW(custom_segmentation_model.parameters(), lr=1e-4)
cnn_scheduler = CosineAnnealingLR(cnn_optimizer, T_max=10)

vit_optimizer = optim.AdamW(vision_transformer_model.parameters(), lr=1e-4)
vit_scheduler = CosineAnnealingLR(vit_optimizer, T_max=10)

# Loss
combined_dice_loss = smp.losses.DiceLoss(mode="multiclass")
combined_ce_loss = nn.CrossEntropyLoss()

# Improved training loop with a scheduler
def run_improved_training(core_model, optimizer_engine, scheduler_engine, train_data, val_data, total_epochs=30):
    for epoch_counter in range(1, total_epochs + 1):
        core_model.train()
        epoch_loss_sum = 0.0

        loop_bar = tqdm(train_data, desc=f"Epoch {epoch_counter}/{total_epochs} - Improved Train", leave=False)

        for input_data, ground_truth in loop_bar:
            input_data = input_data.to(device)
            ground_truth = ground_truth.to(device)

            optimizer_engine.zero_grad()
            output_logits = core_model(input_data)
            batch_loss = combined_dice_loss(output_logits, ground_truth) + combined_ce_loss(output_logits, ground_truth)
            batch_loss.backward()
            optimizer_engine.step()

            epoch_loss_sum += batch_loss.item()

        scheduler_engine.step()
        validation_outcomes = assess_model_performance(core_model, val_data, categories=number_of_classes)
        average_loss = epoch_loss_sum / len(train_data)

        print(f"[{epoch_counter:02d}/{total_epochs}] "
              f"Train Loss: {average_loss:.5f} | "
              f"Val Loss: {validation_outcomes['loss']:.5f} | "
              f"IoU: {validation_outcomes['iou']:.5f} | "
              f"F1: {validation_outcomes['f1']:.5f}")

Let's train the CNN model:

In [40]:
cnn_improved_epochs = 30

run_improved_training(
    core_model=custom_segmentation_model,
    optimizer_engine=cnn_optimizer,
    scheduler_engine=cnn_scheduler,
    train_data=refined_batcher,
    val_data=val_batcher,
    total_epochs=cnn_improved_epochs
)


epoch 01| tr loss: 1.5552  | val loss: 1.5999 | iou: 0.4347 | f1: 0.6060
epoch 02| tr loss: 1.4538  | val loss: 1.5136 | iou: 0.5018 | f1: 0.6683
epoch 03| tr loss: 1.4502  | val loss: 1.4880 | iou: 0.5119 | f1: 0.6771
epoch 04| tr loss: 1.4189  | val loss: 1.5031 | iou: 0.4859 | f1: 0.6540
epoch 05| tr loss: 1.4478  | val loss: 1.5111 | iou: 0.4672 | f1: 0.6369
epoch 06| tr loss: 1.5082  | val loss: 1.5055 | iou: 0.4673 | f1: 0.6370
epoch 07| tr loss: 1.4142  | val loss: 1.4992 | iou: 0.4727 | f1: 0.6419
epoch 08| tr loss: 1.4659  | val loss: 1.4974 | iou: 0.4744 | f1: 0.6436
epoch 09| tr loss: 1.4305  | val loss: 1.4965 | iou: 0.4746 | f1: 0.6437
epoch 10| tr loss: 1.4175  | val loss: 1.4952 | iou: 0.4752 | f1: 0.6442
epoch 11| tr loss: 1.4606  | val loss: 1.4952 | iou: 0.4752 | f1: 0.6442
epoch 12| tr loss: 1.4322  | val loss: 1.4945 | iou: 0.4755 | f1: 0.6445
epoch 13| tr loss: 1.4426  | val loss: 1.4914 | iou: 0.4766 | f1: 0.6456
epoch 14| tr loss: 1.4027  | val loss: 1.4840 | iou

Training the custom model yielded an F1-score of 0.6894, slightly above the baseline version. This suggests that the applied improvements did raise model quality. Next, we train the transformer architecture.
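One detail of the setup is worth checking before the next run: the schedulers use `T_max=10` while training lasts 30 epochs. The sketch below evaluates the closed-form cosine annealing formula given in the PyTorch documentation, with the values from the optimizer cell (`lr=1e-4`, `T_max=10`, and `eta_min=0` assumed as the default). The function `cosine_annealing_lr` is a hypothetical stand-in written for illustration, not the library scheduler itself.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float = 1e-4,
                        t_max: int = 10, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing value after `epoch` scheduler steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


# Sample the schedule at a few points across the 30-epoch run
for epoch in (0, 5, 10, 15, 20, 25, 30):
    print(f"after {epoch:2d} scheduler steps: lr = {cosine_annealing_lr(epoch):.2e}")
```

Under these assumptions the learning rate falls to roughly zero by epoch 10, climbs back to the starting value by epoch 20, and falls again by epoch 30. That would be consistent with the near-constant validation metrics around epochs 9 to 12 in the log above. Setting `T_max` to the total number of epochs gives a single monotonic decay, if that is what was intended.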



In [41]:
vit_improved_epochs = 30

run_improved_training(
    core_model=vision_transformer_model,
    optimizer_engine=vit_optimizer,
    scheduler_engine=vit_scheduler,
    train_data=refined_batcher,
    val_data=val_batcher,
    total_epochs=vit_improved_epochs
)


epoch 01| tr loss: 1.3883  | val loss: 1.6667 | iou: 0.3953 | f1: 0.5666
epoch 02| tr loss: 1.3923  | val loss: 1.6609 | iou: 0.3990 | f1: 0.5704
epoch 03| tr loss: 1.3709  | val loss: 1.6594 | iou: 0.3964 | f1: 0.5678
epoch 04| tr loss: 1.3280  | val loss: 1.6762 | iou: 0.3898 | f1: 0.5609
epoch 05| tr loss: 1.3141  | val loss: 1.6813 | iou: 0.3879 | f1: 0.5590
epoch 06| tr loss: 1.3477  | val loss: 1.6769 | iou: 0.3917 | f1: 0.5630
epoch 07| tr loss: 1.3448  | val loss: 1.6716 | iou: 0.3958 | f1: 0.5671
epoch 08| tr loss: 1.3053  | val loss: 1.6697 | iou: 0.3967 | f1: 0.5681
epoch 09| tr loss: 1.3626  | val loss: 1.6681 | iou: 0.3970 | f1: 0.5683
epoch 10| tr loss: 1.3401  | val loss: 1.6683 | iou: 0.3971 | f1: 0.5684
epoch 11| tr loss: 1.2690  | val loss: 1.6683 | iou: 0.3971 | f1: 0.5684
epoch 12| tr loss: 1.3446  | val loss: 1.6682 | iou: 0.3970 | f1: 0.5684
epoch 13| tr loss: 1.3138  | val loss: 1.6684 | iou: 0.3968 | f1: 0.5681
epoch 14| tr loss: 1.3281  | val loss: 1.6671 | iou

F1 = 0.5747, so the hypothesis does not hold up after all: it does not improve model quality universally, and the transformer again has the worse score.
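Validation F1 in the logs above moves up and down from epoch to epoch, so the value at the last epoch is not necessarily the best one a run reached. As a minimal sketch, the best epoch can be read off the printed log. The sample lines are copied from the improved CNN run; the helper `best_epoch_by_f1` and the pattern `LOG_LINE` are hypothetical names introduced for illustration. Truncated lines with no `f1` field are skipped.

```python
import re

# Matches complete lines such as
# "epoch 05| tr loss: 1.4478  | val loss: 1.5111 | iou: 0.4672 | f1: 0.6369"
LOG_LINE = re.compile(r"epoch\s+(\d+)\|.*?\|\s*iou:\s*([\d.]+)\s*\|\s*f1:\s*([\d.]+)")


def best_epoch_by_f1(log_text: str):
    """Return (epoch, iou, f1) for the complete log line with the highest F1, or None."""
    rows = [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in LOG_LINE.finditer(log_text)
    ]
    return max(rows, key=lambda row: row[2]) if rows else None


sample_log = """\
epoch 01| tr loss: 1.5552  | val loss: 1.5999 | iou: 0.4347 | f1: 0.6060
epoch 02| tr loss: 1.4538  | val loss: 1.5136 | iou: 0.5018 | f1: 0.6683
epoch 03| tr loss: 1.4502  | val loss: 1.4880 | iou: 0.5119 | f1: 0.6771
epoch 04| tr loss: 1.4189  | val loss: 1.5031 | iou: 0.4859 | f1: 0.6540
epoch 14| tr loss: 1.4027  | val loss: 1.4840 | iou
"""
print(best_epoch_by_f1(sample_log))
```

The same idea applied inside the training loop, keeping the weights from the epoch with the highest validation F1, would make the reported numbers less sensitive to where training happens to stop.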