## Cell detection

Your task: train YOLO to detect yeast cells and microstructures (see [07_object_detection.ipynb](../workshops/07_object_detection.ipynb)). Everything needed to launch training is in the workshop notebook; only a few pieces remain to be done:
- implement the Mean Average Precision computation for the whole validation set
- try to bring the regression loss to the form used in YOLO9000 and YOLOv3
- pick the best anchor box sizes using clustering

Main goal: reach $mAP > 0.6$ on validation by any means.

Use the `torchmetrics.detection.MeanAveragePrecision` class to compute $mAP$.

The value we care about is `map` in the dictionary of metrics: the mean average precision averaged over all intersection-over-union thresholds in the range $[0.5, 0.95]$ (see the class documentation).
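
To make the threshold range concrete, here is a minimal sketch in plain Python. It only illustrates the COCO-style thresholds 0.50, 0.55, ..., 0.95 and how a single prediction's IoU relates to them; it is not how `torchmetrics` computes AP, which also involves precision-recall curves. The helper `matched_fraction` is a hypothetical name introduced for illustration.

```python
# IoU thresholds used by COCO-style mAP: 0.50, 0.55, ..., 0.95
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_fraction(iou: float) -> float:
    """Fraction of thresholds at which a prediction with this IoU counts as a match."""
    hits = sum(iou >= t for t in iou_thresholds)
    return hits / len(iou_thresholds)


print(iou_thresholds)
print(matched_fraction(0.72))  # matches at 0.50 ... 0.70, i.e. 5 of 10 thresholds -> 0.5
```

A box that overlaps the ground truth with IoU 0.72 is a true positive only at the five loosest thresholds, which is why `map` is noticeably stricter than AP at IoU 0.5 alone.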

You may use `lightning` or write the training loop by hand. In the latter case, remember to move the model and the batches to the GPU manually so that training runs faster.

In [None]:
 ! wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3799/yeast_cell_in_microstructures_dataset.zip
 ! unzip yeast_cell_in_microstructures_dataset.zip -d yeast_cell_in_microstructures_dataset

--2024-10-28 20:05:56--  https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3799/yeast_cell_in_microstructures_dataset.zip
Resolving tudatalib.ulb.tu-darmstadt.de (tudatalib.ulb.tu-darmstadt.de)... 130.83.152.157
Connecting to tudatalib.ulb.tu-darmstadt.de (tudatalib.ulb.tu-darmstadt.de)|130.83.152.157|:443... connected.
HTTP request sent, awaiting response... 200 200
Length: unspecified [application/zip]
Saving to: ‘yeast_cell_in_microstructures_dataset.zip’


Archive:  yeast_cell_in_microstructures_dataset.zip
   creating: yeast_cell_in_microstructures_dataset/test/
   creating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/395.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/303.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/352.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/170.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/160.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/202.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/154.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/294.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/236.pt  
  inflating: yeast_cell_in_microstructures_dataset/test/bounding_boxes/78.pt  
  inflating: yeast_c

In [None]:
%pip install lightning==2.4.0

In [None]:
import lightning as L
import numpy as np
import torch
from torch import Tensor, nn

In [None]:
torch.manual_seed(42)

<torch._C.Generator at 0x7fcc2419ce70>

In [None]:
def process_yolo_preds(preds: Tensor, rescaled_anchors: Tensor) -> tuple[Tensor, Tensor, Tensor]:
    """
    Split the model outputs into
    1. The objectness logit (apply a sigmoid to get the probability)
    2. The box position relative to the cell, in cxcywh format
    3. The class logits (apply softmax to get the probabilities)
    """
    rescaled_anchors = rescaled_anchors.view(1, len(rescaled_anchors), 1, 1, 2)
    box_predictions = preds[..., 1:5].clone()

    box_predictions[..., 0:2] = torch.sigmoid(box_predictions[..., 0:2])
    box_predictions[..., 2:] = torch.exp(box_predictions[..., 2:]) * rescaled_anchors

    scores = preds[..., 0:1]
    return scores, box_predictions, preds[..., 5:]
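
The same decoding can be traced by hand for a single cell. The sketch below uses plain Python instead of tensors; `decode_box` is a hypothetical helper that mirrors the sigmoid and `exp` steps of `process_yolo_preds` for one box.

```python
import math


def decode_box(t, anchor_wh):
    """Decode raw outputs t = (tx, ty, tw, th) for one anchor.

    anchor_wh is the anchor size in cell units; the result is
    (x, y, w, h) relative to the cell, also in cell units.
    """
    tx, ty, tw, th = t

    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    return (
        sigmoid(tx),                  # center offset inside the cell, in (0, 1)
        sigmoid(ty),
        anchor_wh[0] * math.exp(tw),  # width as a multiple of the anchor width
        anchor_wh[1] * math.exp(th),  # height as a multiple of the anchor height
    )


# the [64, 64] anchor rescaled to cell units: 64 / 256 * 8 = 2.0
anchor = (64 / 256 * 8, 64 / 256 * 8)
print(decode_box((0.0, 0.0, 0.0, 0.0), anchor))  # (0.5, 0.5, 2.0, 2.0)
```

With all-zero raw outputs the box sits in the middle of the cell and has exactly the anchor's size, so the network only has to learn deviations from the anchor.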

In [None]:
GRID_SIZE = 8
IMAGE_SIZE = 256
ANCHORS = [
    [48, 72],
    [64, 64],
    [72, 48],
]
BATCH_SIZE = 8
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
rescaled_anchors = torch.tensor(ANCHORS, device=device) / IMAGE_SIZE * GRID_SIZE

In [None]:
class CNNBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.activation = nn.LeakyReLU(0.1)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return self.activation(x)

class TinyYOLO(nn.Module):
    def __init__(self, num_classes: int = 2, num_anchors: int = len(ANCHORS), in_channels: int = 1) -> None:
        super().__init__()
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.num_anchors = num_anchors
        self.layers = nn.Sequential(
            CNNBlock(in_channels, 16, kernel_size=3, stride=2, padding=1, dilation=2),
            CNNBlock(16, 32, kernel_size=3, stride=2, padding=1, dilation=2),
            CNNBlock(32, 64, kernel_size=3, stride=2, padding=1, dilation=2),
            CNNBlock(64, 128, kernel_size=3, stride=2, padding=1, groups=8),
            CNNBlock(128, 256, kernel_size=3, stride=1, padding=1, groups=8),
            CNNBlock(256, 256, kernel_size=3, stride=1, padding=1, groups=16),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(256, num_anchors * (num_classes + 5), kernel_size=1)
        )

    def forward(self, x: Tensor) -> Tensor:
        x = self.layers(x)
        B, _, H, W = x.shape
        x = x.view(B, self.num_anchors, self.num_classes + 5, H, W)  # B A F H W
        x = x.permute(0, 1, 3, 4, 2)  # B A H W F
        return x

In [None]:
def iou_wh(wh1: Tensor, wh2: Tensor) -> Tensor:
    # IoU based on width and height of bboxes

    # intersection
    intersection_area = torch.min(wh1[..., 0], wh2[..., 0]) * torch.min(wh1[..., 1], wh2[..., 1])

    # union
    box1_area = wh1[..., 0] * wh1[..., 1]
    box2_area = wh2[..., 0] * wh2[..., 1]
    union_area = box1_area + box2_area - intersection_area

    iou_score = intersection_area / union_area
    return iou_score


def boxes_to_cells(
    boxes: Tensor,
    classes: Tensor,
    rescaled_anchors: Tensor,
    grid_size: int = 8,
    ignore_iou_thresh: float = 0.5,
) -> Tensor:
    """
    Converts the bbox representation into the cell representation. The input is a set of
    boxes in normalized cxcywh format together with their class ids; the output has shape
    (n_anchors, grid_size, grid_size, 6), where the last dimension stores the cell features:
    objectness flag, box center and size relative to the cell, and the object class id

    Args:
        boxes (Tensor): tensor with all boxes
        classes (Tensor): tensor with the object class ids
        rescaled_anchors (Tensor): tensor of shape (n_anchors, 2) with anchor sizes in units of the cell size
        grid_size (int): grid size
        ignore_iou_thresh (float, optional): IoU threshold above which an anchor of the object's cell
            that was not chosen for the object is marked with -1 and ignored by the loss
    """
    targets = torch.zeros((len(rescaled_anchors), grid_size, grid_size, 6))

    # Match each box to a cell and to the best-fitting anchor
    for box, class_label in zip(boxes, classes):
        iou_anchors = iou_wh(box[2:4], rescaled_anchors / grid_size)
        anchor_indices = iou_anchors.argsort(descending=True, dim=0)
        x, y, width, height = box

        # Assign the box to the best-fitting anchor
        has_anchor = False
        for anchor_idx in anchor_indices:
            s = grid_size

            # Find the cell the box belongs to
            i, j = int(s * y), int(s * x)
            anchor_taken = targets[anchor_idx, i, j, 0]

            # Check whether the anchor box is still free for the current cell
            if not anchor_taken and not has_anchor:
                # Recompute the coordinates relative to the cell
                x_cell, y_cell = s * x - j, s * y - i
                width_cell, height_cell = (width * s, height * s)
                box_coordinates = torch.tensor([x_cell, y_cell, width_cell, height_cell])

                # Fill in the contents of the selected cell
                targets[anchor_idx, i, j, 0] = 1  # flag: the cell contains an object
                targets[anchor_idx, i, j, 1:5] = box_coordinates
                targets[anchor_idx, i, j, 5] = int(class_label)

                has_anchor = True

            # If an anchor is already chosen, check the IoU; if it exceeds the threshold, mark the cell with -1
            elif not anchor_taken and iou_anchors[anchor_idx] > ignore_iou_thresh:
                targets[anchor_idx, i, j, 0] = -1

    return targets
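
The matching logic above can be checked on a single box without tensors. The following is a small sketch under stated assumptions: the 50x70 px box and the center (0.30, 0.55) are made-up values, and `rank_anchors` and `cell_of` are hypothetical helpers that mirror the `argsort` and `int(s * y), int(s * x)` steps of `boxes_to_cells`.

```python
ANCHORS = [(48, 72), (64, 64), (72, 48)]  # anchor sizes in pixels, as in the notebook
GRID_SIZE = 8


def iou_wh(wh1, wh2):
    """IoU of two boxes that share a corner, so only width and height matter."""
    inter = min(wh1[0], wh2[0]) * min(wh1[1], wh2[1])
    union = wh1[0] * wh1[1] + wh2[0] * wh2[1] - inter
    return inter / union


def rank_anchors(box_wh):
    """Return anchor indices sorted from best to worst match, plus the IoU values."""
    ious = [iou_wh(box_wh, anchor) for anchor in ANCHORS]
    order = sorted(range(len(ANCHORS)), key=lambda k: ious[k], reverse=True)
    return order, ious


def cell_of(cx, cy, grid_size=GRID_SIZE):
    """Row and column of the grid cell that contains a normalized box center."""
    return int(grid_size * cy), int(grid_size * cx)


order, ious = rank_anchors((50, 70))  # hypothetical 50x70 px box
print(order)                          # [0, 1, 2]
print([round(v, 3) for v in ious])    # [0.934, 0.728, 0.527]
print(cell_of(0.30, 0.55))            # (4, 2)
```

In this example all three anchors overlap the box with IoU above 0.5, so with `ignore_iou_thresh=0.5` the best anchor gets the object and the other two are marked with -1 in that cell.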

In [None]:
from pathlib import Path

from torch.utils.data import DataLoader, Dataset
from torchvision.ops.boxes import box_convert


class YeastDetectionDataset(Dataset):
    def __init__(
        self, subset_dir: Path, anchors: list[tuple[int, int]] = ANCHORS, image_size: int = IMAGE_SIZE, grid_size: int = GRID_SIZE
    ) -> None:
        super().__init__()
        self.subset_dir = subset_dir
        self.items = list((self.subset_dir / "inputs").glob("*.pt"))
        # Ignore IoU threshold
        self.ignore_iou_thresh = 0.5
        self.rescaled_anchors = torch.tensor(anchors) / image_size * grid_size
        self.grid_size = grid_size
        self.image_size = image_size

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> tuple[Tensor, Tensor]:
        image_path = self.items[index]
        # load everything
        image = torch.load(image_path, weights_only=True).unsqueeze(0)
        classes = (
            torch.load(self.subset_dir / "classes" / image_path.parts[-1], weights_only=True) + 1
        )
        boxes = torch.load(
            self.subset_dir / "bounding_boxes" / image_path.parts[-1], weights_only=True
        )
        boxes = box_convert(boxes, "xyxy", "cxcywh") / self.image_size

        # convert boxes to cells
        targets = boxes_to_cells(
            boxes, classes, self.rescaled_anchors, self.grid_size, self.ignore_iou_thresh
        )
        return image, targets

In [None]:
def cells_to_bboxes(cells: Tensor, rescaled_anchors: Tensor, is_predictions=False) -> Tensor:
    """
    Converts the cell representation into the bbox representation, where each box is
    (class id, objectness probability, cx, cy, w, h), and the cell representation
    has shape (batch_size, n_anchors, grid_size, grid_size, 6); the last dimension
    stores the cell features: objectness, box coordinates and size relative to the cell,
    and the object class

    Args:
        cells (Tensor): tensor of shape (batch_size, n_anchors, height, width, 6)
        rescaled_anchors (Tensor): tensor of shape (n_anchors, 2) with anchor sizes in units of the cell size
        is_predictions (bool, optional): whether the input cells are predictions or ground-truth annotations.
    """

    if is_predictions:
        scores, box_predictions, logits = process_yolo_preds(cells, rescaled_anchors)
        scores = torch.sigmoid(scores)
        best_class = torch.argmax(logits, dim=-1).unsqueeze(-1) + 1

    else:
        box_predictions = cells[..., 1:5].clone()
        scores = cells[..., 0:1]
        best_class = cells[..., 5:6]

    # rescale the boxes from cell units [0, grid_size] to image-relative units [0, 1]
    _, _, H, W, _ = cells.shape
    range_y, range_x = torch.meshgrid(
        torch.arange(H, dtype=cells.dtype, device=cells.device),
        torch.arange(W, dtype=cells.dtype, device=cells.device),
        indexing="ij",
    )
    x = torch.cat(
        [
            best_class,
            scores,
            (box_predictions[..., 0:1] + range_x[None, None, :, :, None]) / W,  # X center
            (box_predictions[..., 1:2] + range_y[None, None, :, :, None]) / H,  # Y center
            box_predictions[..., 2:3] / W,  # Width
            box_predictions[..., 3:4] / H,  # Height
        ],
        -1,
    )

    return x.view(-1, 6)
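
The rescaling step is easy to verify for one cell. This is a minimal sketch with made-up numbers; `cell_to_image` is a hypothetical helper that applies the same arithmetic as the `torch.cat` expression above to a single box.

```python
def cell_to_image(x_cell, y_cell, w_cell, h_cell, row, col, grid_size=8):
    """Convert a box given relative to cell (row, col) into image-relative cxcywh in [0, 1]."""
    cx = (x_cell + col) / grid_size  # add the column index, then divide by the grid width
    cy = (y_cell + row) / grid_size  # add the row index, then divide by the grid height
    return cx, cy, w_cell / grid_size, h_cell / grid_size


# a box centered at (0.5, 0.25) inside cell row=4, col=2, two cells wide and one cell high
print(cell_to_image(0.5, 0.25, 2.0, 1.0, row=4, col=2))  # (0.3125, 0.53125, 0.25, 0.125)
```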

In [None]:
class YOLOLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()
        self.bce = nn.BCEWithLogitsLoss()
        self.cross_entropy = nn.CrossEntropyLoss()

    def forward(self, pred: Tensor, target: Tensor, anchors: Tensor) -> Tensor:
        # the input tensors are modified in place below, so clone them
        pred = pred.clone()
        target = target.clone()

        # split the boxes into those that contain objects and those that do not
        # NB: there may also be -1 entries (more than one object fell there); they are ignored
        obj = target[..., 0] == 1
        no_obj = target[..., 0] == 0

        # transform the bbox predictions
        scores, pred_boxes, logits = process_yolo_preds(pred, anchors)

        # no object loss: binary cross-entropy instead of MSE
        no_object_loss = self.bce(
            (scores[no_obj]),
            (target[..., 0:1][no_obj]),
        )

        # object loss: learn to predict the IoU
        ious = intersection_over_union(pred_boxes[obj], target[..., 1:5][obj]).detach()
        object_loss = self.mse(scores[obj].sigmoid(), ious * target[..., 0:1][obj])

        # box coordinate loss: take the logarithm of the box sizes before computing MSE
        anchors = anchors.reshape(1, len(anchors), 1, 1, 2)
        pred[..., 1:3] = pred[..., 1:3].sigmoid()
        target[..., 3:5] = torch.log(1e-6 + target[..., 3:5] / anchors)
        box_loss = self.mse(pred[..., 1:5][obj], target[..., 1:5][obj])

        # class loss: nothing unusual here
        class_loss = self.cross_entropy(logits[obj], target[..., 5][obj].long() - 1)

        # Total loss
        return box_loss + object_loss + no_object_loss + class_loss

In [None]:
from pathlib import Path
from lightning.pytorch.utilities.types import EVAL_DATALOADERS, TRAIN_DATALOADERS

from torchmetrics.functional.detection import intersection_over_union
from torchvision.ops.boxes import box_convert

class YeastDetectionDataModule(L.LightningDataModule):
    def __init__(
        self,
        datadir: Path = Path("yeast_cell_in_microstructures_dataset"),
        anchors: list[tuple[int, int]] = ANCHORS,
        image_size: int = IMAGE_SIZE,
        grid_size: int = GRID_SIZE,
        batch_size: int = BATCH_SIZE,
        train_subdir: str = "train",
        val_subdir: str = "val",
        test_subdir: str = "test",
    ):
        super().__init__()
        self.anchors = anchors
        self.image_size = image_size
        self.grid_size = grid_size
        self.batch_size = batch_size
        self.train_dir = datadir / train_subdir
        self.val_dir = datadir / val_subdir
        self.test_dir = datadir / test_subdir

    def setup(self, stage: str) -> None:
        if stage in ("fit", "validate"):
            self.val_dataset = YeastDetectionDataset(self.val_dir, anchors=self.anchors)
        if stage == "fit":
            self.train_dataset = YeastDetectionDataset(self.train_dir, anchors=self.anchors)
        elif stage == "test":
            self.test_dataset = YeastDetectionDataset(self.test_dir, anchors=self.anchors)

    def train_dataloader(self):
        return DataLoader(
            self.train_dataset,
            batch_size=self.batch_size,
            shuffle=True
        )

    def val_dataloader(self):
        return DataLoader(
            self.val_dataset,
            batch_size=self.batch_size,
            shuffle=False
        )

    def test_dataloader(self):
        return DataLoader(
            self.test_dataset,
            batch_size=self.batch_size,
            shuffle=False
        )

In [None]:
def map_transform(preds, y):
    predicted_boxes = []
    target_boxes = []
    # iterate over the batch elements, collecting the predicted and ground-truth boxes
    for i in range(len(y)):
        # get the predicted boxes
        pred_boxes = cells_to_bboxes(preds[i:i+1], rescaled_anchors, is_predictions=True)
        # IMPORTANT: apply non-max suppression BEFORE computing the metrics
        pred_boxes = pred_boxes[nms(box_convert(pred_boxes[:, 2:], "cxcywh", "xyxy"), pred_boxes[:, 1], iou_threshold=0.3)]
        predicted_boxes.append(
            dict(
                boxes=pred_boxes[:, 2:],
                scores=pred_boxes[:, 1],
                labels=pred_boxes[:, 0].long(),
            )
        )
        true_boxes = cells_to_bboxes(y[i:i+1], rescaled_anchors)
        true_boxes = true_boxes[true_boxes[:, 1] == 1]

        target_boxes.append(
            dict(
                boxes=true_boxes[:, 2:],
                labels=true_boxes[:, 0].long(),
            )
        )
    return predicted_boxes, target_boxes
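
To show what the non-max suppression step does to overlapping predictions, here is a plain-Python sketch of greedy NMS. It is an illustration only, not the `torchvision.ops.nms` implementation; `greedy_nms` and `iou_xyxy` are hypothetical helpers and the three boxes are made-up values.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Keep the highest-scoring boxes, dropping any box whose IoU with a kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for idx in order:
        if all(iou_xyxy(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(idx)
    return keep


boxes = [
    (0.10, 0.10, 0.40, 0.40),  # best-scoring box
    (0.12, 0.12, 0.42, 0.42),  # near-duplicate of the first box
    (0.60, 0.60, 0.90, 0.90),  # separate object
]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Without this step every anchor of every cell would contribute a prediction, and the near-duplicates would count as false positives in the mAP computation.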

In [None]:
from typing import Any
from lightning.pytorch.utilities.types import STEP_OUTPUT
from torchmetrics.detection import MeanAveragePrecision
from torchvision.ops.boxes import nms
from lightning.pytorch.loggers import TensorBoardLogger

class Lit(L.LightningModule):
    def __init__(self, model: nn.Module, learning_rate: float = 0.001) -> None:
        super().__init__()
        self.save_hyperparameters()
        self.model = model
        self.learning_rate = learning_rate
        self.map = MeanAveragePrecision(iou_type="bbox", box_format="cxcywh", class_metrics=True)

    def training_step(
        self, batch: tuple[Tensor, Tensor], batch_idx: int
    ) -> STEP_OUTPUT:
        x, y = batch
        y_hat = self.model(x)

        loss = YOLOLoss().forward(pred=y_hat, target=y, anchors=rescaled_anchors)
        self.log("train_loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(
        self, batch: tuple[Tensor, Tensor], batch_idx: int
    ) -> STEP_OUTPUT | None:
        x, y = batch
        y_hat = self.model(x)
        loss = YOLOLoss().forward(pred=y_hat, target=y, anchors=rescaled_anchors)

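        # only accumulate state here; the metric is computed once per epoch in on_validation_epoch_end
        self.map.update(*map_transform(y_hat, y))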
        self.log("val_loss", loss, on_epoch=True, on_step=False, prog_bar=True)
        return {
            "loss": loss,
        }

    def on_validation_epoch_end(self) -> None:
        self.log("val_map", self.map.compute()['map'], on_epoch=True, on_step=False, prog_bar=True)
        self.map.reset()


    def configure_optimizers(self) -> dict[str, Any]:
        optimizer = torch.optim.Adam(self.model.parameters(), lr=self.learning_rate, weight_decay=0.0001)
        return {
            "optimizer": optimizer,
            "lr_scheduler": torch.optim.lr_scheduler.MultiStepLR(
                optimizer, milestones=[5, 10, 15]
            ),
        }

### Task 1 (3 points). Training loop with Mean Average Precision

Train the model from the workshop on the full training dataset and print the $mAP$ on the validation dataset after training finishes.

In this task, reach $mAP > 0.3$; if everything is done correctly, 30-50 epochs should be enough.

In [None]:
from lightning.pytorch.loggers import TensorBoardLogger
trainer = L.Trainer(
    accelerator="auto",
    max_epochs=35,
    logger=TensorBoardLogger(save_dir="."),
)
lit_module = Lit(
    model=TinyYOLO(), learning_rate=0.001
)
datamodule = YeastDetectionDataModule(image_size=256, anchors=ANCHORS, batch_size=BATCH_SIZE)
trainer.fit(model=lit_module, datamodule=datamodule,)
trainer.validate(model=lit_module, datamodule=datamodule,)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/utilities/parsing.py:208: Attribute 'model' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['model'])`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type                 | Params | Mode 
-------------------------------------------------------
0 | model | TinyYOLO             | 109 K  | train
1 | map   | MeanAveragePrecision | 0      | train
-------------------------------------------------------
109 K     Trainable params
0         Non-trainable params
109 K     Total params
0.438     Total estimated model params size (MB)
29        Modules in train mode
0         Modules in eval mode


Sanity Checking DataLoader 0:  50%|█████     | 1/2 [00:00<00:00,  7.25it/s]

/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


                                                                           

/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (37) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Epoch 0: 100%|██████████| 37/37 [00:01<00:00, 22.43it/s, v_num=16]
Epoch 1: 100%|██████████| 37/37 [00:01<00:00, 22.48it/s, v_num=16, val_loss=0.286, val_map=0.164, train_loss=0.590]

`Trainer.fit` stopped: `max_epochs=35` reached.


Epoch 34: 100%|██████████| 37/37 [00:03<00:00, 10.33it/s, v_num=16, val_loss=0.130, val_map=0.361, train_loss=0.0625]

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.



Validation DataLoader 0: 100%|██████████| 7/7 [00:02<00:00,  3.46it/s]


[{'val_loss': 0.12970031797885895, 'val_map': 0.36059409379959106}]

### Task 2 (2 points). YOLOv3 loss

We mentioned that the workshop did not use exactly the same loss as YOLO. In this task, fix the regression loss in the `YOLOLoss` class so that it matches the description in the paper [YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767) (see Section 2.1, Bounding Box Prediction).

Run training with the modified loss and reach $mAP > 0.3$.
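
For reference, Section 2.1 of the YOLOv3 paper predicts four raw values $t_x, t_y, t_w, t_h$ per box and decodes them using the cell offset $(c_x, c_y)$ and the anchor (prior) size $(p_w, p_h)$. The hat notation for the regression targets below is introduced here for illustration; the targets are obtained by inverting the decoding equations, and the paper states that a sum of squared errors is used on these values.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x, & b_y &= \sigma(t_y) + c_y, \\
b_w &= p_w \, e^{t_w},    & b_h &= p_h \, e^{t_h}, \\[4pt]
\sigma(\hat t_x) &= b_x^{gt} - c_x, & \sigma(\hat t_y) &= b_y^{gt} - c_y, \\
\hat t_w &= \ln \frac{b_w^{gt}}{p_w}, & \hat t_h &= \ln \frac{b_h^{gt}}{p_h}, \\[4pt]
L_{box} &= \sum_{k \in \{x, y, w, h\}} \left( \hat t_k - t_k \right)^2 .
\end{aligned}
```

In the notebook's conventions, $b_x - c_x$ and $b_y - c_y$ correspond to the cell-relative center stored in `target[..., 1:3]`, and $b_w / p_w$, $b_h / p_h$ correspond to `target[..., 3:5] / anchors`.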

In [None]:
class YOLOv3Loss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()
        self.bce = nn.BCEWithLogitsLoss()
        self.cross_entropy = nn.CrossEntropyLoss()

    def forward(self, pred: Tensor, target: Tensor, anchors: Tensor) -> Tensor:
        # the input tensors are modified in place below, so clone them
        pred = pred.clone()
        target = target.clone()

        # split the boxes into those that contain objects and those that do not
        # NB: there may also be -1 entries (more than one object fell there); they are ignored
        obj = target[..., 0] == 1
        no_obj = target[..., 0] == 0

        # transform the bbox predictions
        scores, pred_boxes, logits = process_yolo_preds(pred, anchors)

        # no object loss: binary cross-entropy instead of MSE
        no_object_loss = self.bce(
            (scores[no_obj]),
            (target[..., 0:1][no_obj]),
        )

        # object loss: learn to predict the IoU
        ious = intersection_over_union(pred_boxes[obj], target[..., 1:5][obj]).detach()
        object_loss = self.mse(scores[obj].sigmoid(), ious * target[..., 0:1][obj])

        # box coordinate loss: decode the box sizes via the anchors and compare their square roots with MSE
        anchors = anchors.reshape(1, len(anchors), 1, 1, 2)
        pred[..., 1:3] = pred[..., 1:3].sigmoid()
        pred[..., 3:5] = torch.sqrt(anchors * torch.exp(pred[..., 3:5]))
        target[..., 3:5] = torch.sqrt(target[..., 3:5])
        box_loss = self.mse(pred[..., 1:5][obj], target[..., 1:5][obj])

        # class loss: nothing unusual here
        class_loss = self.cross_entropy(logits[obj], target[..., 5][obj].long() - 1)

        # Total loss
        return box_loss + object_loss + no_object_loss + class_loss

In [None]:
class Lit_v3(L.LightningModule):
    def __init__(self, model: nn.Module, learning_rate: float = 0.001) -> None:
        super().__init__()
        self.save_hyperparameters()
        self.model = model
        self.learning_rate = learning_rate
        self.map = MeanAveragePrecision(iou_type="bbox", box_format="cxcywh", class_metrics=True)

    def training_step(
        self, batch: tuple[Tensor, Tensor], batch_idx: int
    ) -> STEP_OUTPUT:
        x, y = batch
        y_hat = self.model(x)

        loss = YOLOv3Loss().forward(pred=y_hat, target=y, anchors=rescaled_anchors)
        self.log("train_loss", loss, on_step=False, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(
        self, batch: tuple[Tensor, Tensor], batch_idx: int
    ) -> STEP_OUTPUT | None:
        x, y = batch
        y_hat = self.model(x)
        loss = YOLOv3Loss().forward(pred=y_hat, target=y, anchors=rescaled_anchors)

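        # only accumulate state here; the metric is computed once per epoch in on_validation_epoch_end
        self.map.update(*map_transform(y_hat, y))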
        self.log("val_loss", loss, on_epoch=True, on_step=False, prog_bar=True)
        return {
            "loss": loss,
        }

    def on_validation_epoch_end(self) -> None:
        self.log("val_map", self.map.compute()['map'], on_epoch=True, on_step=False, prog_bar=True)
        self.map.reset()


    def configure_optimizers(self) -> dict[str, Any]:
        optimizer = torch.optim.Adam(self.model.parameters(), lr=self.learning_rate, weight_decay=0.0001)
        return {
            "optimizer": optimizer,
            "lr_scheduler": torch.optim.lr_scheduler.MultiStepLR(
                optimizer, milestones=[5, 10, 15]
            ),
        }

In [None]:
trainer_v3 = L.Trainer(
    accelerator="auto",
    max_epochs=50,
    logger=TensorBoardLogger(save_dir="."),
)
lit_module_v3 = Lit_v3(
    model=TinyYOLO(), learning_rate=0.001
)
datamodule = YeastDetectionDataModule(image_size=256, anchors=ANCHORS, batch_size=BATCH_SIZE)
trainer_v3.fit(model=lit_module_v3, datamodule=datamodule,)
trainer_v3.validate(model=lit_module_v3, datamodule=datamodule,)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/utilities/parsing.py:208: Attribute 'model' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['model'])`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type                 | Params | Mode 
-------------------------------------------------------
0 | model | TinyYOLO             | 109 K  | train
1 | map   | MeanAveragePrecision | 0      | train
-------------------------------------------------------
109 K     Trainable params
0         Non-trainable params
109 K     Total params
0.438     Total estimated model params size (MB)
29        Modules in train mode
0         Modules in eval mode


Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]

/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


                                                                           

/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (37) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Epoch 0: 100%|██████████| 37/37 [00:01<00:00, 21.73it/s, v_num=21]
Epoch 1: 100%|██████████| 37/37 [00:01<00:00, 23.14it/s, v_num=21, val_loss=0.319, val_map=0.0612, train_loss=0.642]

`Trainer.fit` stopped: `max_epochs=50` reached.


Epoch 49: 100%|██████████| 37/37 [00:03<00:00, 10.38it/s, v_num=21, val_loss=0.110, val_map=0.324, train_loss=0.0685]

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.



Validation DataLoader 0: 100%|██████████| 7/7 [00:01<00:00,  4.11it/s]


[{'val_loss': 0.10988053679466248, 'val_map': 0.3238782286643982}]

### Task 3 (3 points). Choosing anchors with clustering

Section 2 (Better, Dimension Clusters) of the paper [YOLO9000: Better, Faster, Stronger](https://arxiv.org/abs/1612.08242) describes how to choose anchor boxes by clustering the training dataset.

Do the same for your training dataset to choose several anchor boxes.

As the result, print the anchor sizes obtained for 5 clusters.
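
One way to judge a set of anchors, in the spirit of the average-IoU comparison in the YOLO9000 paper, is the mean IoU between each training box and its closest anchor. The sketch below is a plain-Python illustration with made-up box sizes; `mean_best_iou` is a hypothetical helper, not part of the notebook.

```python
def iou_wh(a, b):
    """IoU of two boxes that share a corner, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def mean_best_iou(boxes_wh, anchors_wh):
    """Average, over all boxes, of the IoU with the best-matching anchor."""
    return sum(max(iou_wh(box, anchor) for anchor in anchors_wh) for box in boxes_wh) / len(boxes_wh)


toy_boxes = [(40, 40), (60, 60), (50, 100)]  # hypothetical (width, height) pairs in pixels

single = mean_best_iou(toy_boxes, [(50, 50)])
perfect = mean_best_iou(toy_boxes, [(40, 40), (60, 60), (50, 100)])
print(round(single, 3))  # 0.611
print(perfect)           # 1.0
```

Computing this score for the hand-picked anchors and for the clustered ones gives a quick check that clustering actually improved the fit before spending time on training.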

In [None]:
%pip install pyclustering

Defaulting to user installation because normal site-packages is not writeable



In [None]:
import pyclustering
from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils.metric import type_metric, distance_metric
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer

In [None]:
class AnchorsDataset(Dataset):
    def __init__(
        self, subset_dir: Path, image_size: int = 256, grid_size: int = GRID_SIZE
    ) -> None:
        super().__init__()
        self.subset_dir = subset_dir
        self.items = list((self.subset_dir / "inputs").glob("*.pt"))
        # Ignore IoU threshold
        self.ignore_iou_thresh = 0.5
        self.grid_size = grid_size
        self.image_size = image_size

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Tensor:
        boxes = torch.load(
            self.subset_dir / "bounding_boxes" / self.items[index].parts[-1], weights_only=True
        )
        boxes = box_convert(boxes, "xyxy", "cxcywh")
        return boxes

In [None]:
torch.manual_seed(42)

<torch._C.Generator at 0x7f4edcb644d0>

In [None]:
train_dataset = AnchorsDataset(Path("yeast_cell_in_microstructures_dataset/train"))
boxes = []
for i in range(len(train_dataset)):
    image_boxes = train_dataset[i]  # load each annotation file only once
    for box in image_boxes:
        # cxcywh layout: keep only the width and the height as plain floats
        boxes.append([box[2].item(), box[3].item()])

In [None]:
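def knn_dist(x, y):
    # Distance for k-means over (width, height) pairs: d = 1 - IoU,
    # where the IoU treats both boxes as sharing the same center.
    intersection = min(x[0], y[0]) * min(x[1], y[1])
    union = x[0] * x[1] + y[0] * y[1] - intersection
    return 1 - intersection / union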

In [None]:
# User-defined 1 - IoU distance (not a Manhattan metric), with k-means++ initialization
iou_metric = distance_metric(type_metric.USER_DEFINED, func=knn_dist)
initial_centers = kmeans_plusplus_initializer(boxes, 5).initialize()
kmeans_ = kmeans(boxes, initial_centers, metric=iou_metric)

kmeans_.process()
clusters = kmeans_.get_clusters()
claster_anchors = [(int(w), int(h)) for w, h in kmeans_.get_centers()]
print(claster_anchors)

[(70, 105), (43, 47), (91, 88), (57, 84), (61, 61)]


In [None]:
rescaled_anchors = torch.tensor(claster_anchors, device = device) / IMAGE_SIZE * GRID_SIZE

In [None]:
trainer = L.Trainer(
    accelerator="auto",
    max_epochs=30,
    logger=TensorBoardLogger(save_dir="."),
)

lit_module = Lit(
    model=TinyYOLO(num_anchors=len(claster_anchors)), learning_rate=0.001
)
datamodule = YeastDetectionDataModule(image_size=256, anchors=claster_anchors, batch_size=BATCH_SIZE)
trainer.fit(model=lit_module, datamodule=datamodule,)
trainer.validate(model=lit_module, datamodule=datamodule,)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/utilities/parsing.py:208: Attribute 'model' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['model'])`.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type                 | Params | Mode 
-------------------------------------------------------
0 | model | TinyYOLO             | 116 K  | train
1 | map   | MeanAveragePrecision | 0      | train
-------------------------------------------------------
116 K     Trainable params
0         Non-trainable params
116 K     Total params
0.467     Total estimated model params size (MB)
29        Modules in train mode
0         Modules in eval mode


Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]

/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


                                                                           

/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (37) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Epoch 0: 100%|██████████| 37/37 [00:01<00:00, 19.68it/s, v_num=39]
Epoch 1: 100%|██████████| 37/37 [00:01<00:00, 19.90it/s, v_num=39, val_loss=0.297, val_map=0.132, train_loss=0.534]

`Trainer.fit` stopped: `max_epochs=30` reached.


Epoch 29: 100%|██████████| 37/37 [00:05<00:00,  7.34it/s, v_num=39, val_loss=0.113, val_map=0.388, train_loss=0.0659]

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/jupyter/.local/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.



Validation DataLoader 0: 100%|██████████| 7/7 [00:02<00:00,  2.67it/s]


[{'val_loss': 0.11313091218471527, 'val_map': 0.38776394724845886}]

### Task 4 (4 points + bonuses for better accuracy). Train the model


Your goal: $mAP > 0.6$ on the validation set.

You may use the whole arsenal:
- multiple anchor boxes (defined by hand or obtained by clustering)
- any changes to the loss function
- any changes to the model architecture, plus regularization
- augmentations (recall `torchvision.transforms` and `albumentations`)
- any training duration, optimizer, and learning rate schedule
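
For detection, a geometric augmentation has to transform the bounding boxes together with the image. As a minimal sketch of that idea, the helper below mirrors boxes for a horizontal flip. The name `hflip_boxes_xyxy` and the sample boxes are hypothetical; it assumes boxes are stored as `(x1, y1, x2, y2)` in pixel coordinates and that the image is 256 pixels wide.

```python
def hflip_boxes_xyxy(boxes, image_width):
    """Mirror (x1, y1, x2, y2) boxes around the vertical axis of the image."""
    flipped = []
    for x1, y1, x2, y2 in boxes:
        # The left and right edges swap roles after mirroring,
        # so the new x1 comes from the old x2 and x1 <= x2 still holds.
        flipped.append((image_width - x2, y1, image_width - x1, y2))
    return flipped


# Illustrative boxes, not taken from the dataset.
sample_boxes = [(10, 20, 50, 80), (200, 30, 250, 60)]
flipped_boxes = hflip_boxes_xyxy(sample_boxes, image_width=256)
print(flipped_boxes)  # [(206, 20, 246, 80), (6, 30, 56, 60)]
```

The image tensor itself can be mirrored with `torch.flip(image, dims=[-1])`; the flip should be applied before the targets are encoded against the anchors and the grid.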

Bonuses for higher accuracy:
- **5 points**: $mAP > 0.65$
- **1 point** for every further $0.01$ (i.e. for $mAP > 0.72$ in this task you get $4 + 12 = 16$ points)
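
The scoring rule can be checked with a few lines of arithmetic. This is one reading of the rule, stated as an assumption: all thresholds are strict, and each further point is earned at $mAP > 0.66$, $mAP > 0.67$, and so on. The function name `task4_points` is hypothetical, and `Decimal` is used to avoid floating-point surprises near the thresholds.

```python
from decimal import ROUND_CEILING, Decimal


def task4_points(map_value: str) -> int:
    """Points for Task 4 under the strict-threshold reading of the rule."""
    m = Decimal(map_value)
    points = 0
    if m > Decimal("0.6"):
        points += 4  # base points for reaching the goal
    if m > Decimal("0.65"):
        points += 5  # first bonus
        # Count thresholds 0.66, 0.67, ... that are strictly exceeded.
        steps = ((m - Decimal("0.65")) / Decimal("0.01")).to_integral_value(
            rounding=ROUND_CEILING
        )
        points += int(steps) - 1
    return points


print(task4_points("0.7201"))  # 16, i.e. 4 + 5 + 7
```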

**Important**: fix the random seed with `torch.manual_seed()` before starting training.

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
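# Variant of the convolutional block with the activation changed to tanh.
# Note: resTinyYOLO below is assembled from the original CNNBlock, so this class is not used by it as written.
class resCNNBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.activation = nn.Tanh()  # the module class is nn.Tanh; nn.tanh does not exist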

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return self.activation(x)

class resTinyYOLO(nn.Module):
    def __init__(self, num_classes: int = 2, num_anchors: int = 1, in_channels: int = 1) -> None:
        super().__init__()
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.num_anchors = num_anchors
        self.layers = nn.Sequential(
            CNNBlock(in_channels, 16, kernel_size=3, stride=2, padding=1, dilation=2),
            CNNBlock(16, 32, kernel_size=3, stride=2, padding=1, dilation=2),
            CNNBlock(32, 64, kernel_size=3, stride=2, padding=1, dilation=2),
            CNNBlock(64, 128, kernel_size=3, stride=2, padding=1, groups=8),
            CNNBlock(128, 256, kernel_size=3, stride=1, padding=1, groups=8),
            CNNBlock(256, 512, kernel_size=3, stride=1, padding=1, groups=16),
            CNNBlock(512, 1024, kernel_size=3, stride=1, padding=1, groups=16),
            CNNBlock(1024, 1024, kernel_size=3, stride=1, padding=1, groups=32),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(1024, num_anchors * (num_classes + 5), kernel_size=1)
        )

    def forward(self, x: Tensor) -> Tensor:
        x = self.layers(x)
        B, _, W, H = x.shape
        x = x.view(B, self.num_anchors, self.num_classes + 5, W, H)  # B A F W H
        x = x.permute(0, 1, 3, 4, 2)  # B A W H F
        return x

In [None]:
torch.manual_seed(42)
trainer = L.Trainer(
    accelerator="auto",
    max_epochs=25,
    logger=TensorBoardLogger(save_dir="."),
)

lit_module = Lit(
    model=resTinyYOLO(num_anchors=len(claster_anchors)), learning_rate=0.003
)
datamodule = YeastDetectionDataModule(image_size=256, anchors=claster_anchors, batch_size=BATCH_SIZE)
trainer.fit(model=lit_module, datamodule=datamodule,)
trainer.validate(model=lit_module, datamodule=datamodule,)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

  | Name  | Type                 | Params | Mode 
-------------------------------------------------------
0 | model | resTinyYOLO          | 774 K  | train
1 | map   | MeanAveragePrecision | 0      | train
-------------------------------------------------------
774 K     Trainable params
0         Non-trainable params
774 K     Total params
3.099     Total estimated model params size (MB)
37        Modules in train mode
0         Modules in eval mode


Epoch 0: 100%|██████████| 37/37 [00:05<00:00,  6.90it/s, v_num=140]        
Epoch 1: 100%|██████████| 37/37 [00:05<00:00,  6.91it/s, v_num=140, val_loss=0.264, val_map=0.202, train_loss=0.446]

`Trainer.fit` stopped: `max_epochs=25` reached.


Epoch 24: 100%|██████████| 37/37 [00:07<00:00,  4.82it/s, v_num=140, val_loss=0.0666, val_map=0.696, train_loss=0.0205]
Validation DataLoader 0: 100%|██████████| 7/7 [00:01<00:00,  3.96it/s]


[{'val_loss': 0.06664577126502991, 'val_map': 0.6955863237380981}]