<a href="https://colab.research.google.com/github/mishc9/dle-notebooks/blob/master/object_detection_and_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Object detection

## Single Shot Detector

We will walk through an SSD implementation in [TensorFlow 2.0](https://github.com/calmisential/TensorFlow2.0_SSD).

In [0]:
!pip3 install git+https://github.com/mishc9/TensorFlow2.0_SSD.git --user -U

In [2]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [3]:
!pip install tensorflow-datasets



In [0]:
import tensorflow as tf
from tensorflow import keras
import tensorflow.keras.backend as K
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt

## Defining the loss for SSD

### Sigmoid focal loss

Standard cross-entropy: $$H(p, q) = -\sum_{i=1}^n p_i \cdot \log(q_i)$$

For two classes (binary cross-entropy, BCE): 
$$L = - \big[p_i \cdot \log(\hat{p_i}) + (1 - p_i) \cdot \log(1 - \hat{p_i}) \big]$$

Define $p_t$ as $\hat{p}$ if $p = 1$, and as $1 - \hat{p}$ otherwise.

Then
 $$L = -\log{p_t}$$
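
A quick numerical check of this identity, as a minimal NumPy sketch (the labels and probabilities are made up for illustration): taking $-\log p_t$ gives the same values as the two-term BCE formula.

```python
import numpy as np

# Illustrative values only: true labels p and predicted probabilities p_hat.
p = np.array([1.0, 0.0, 1.0, 0.0])
p_hat = np.array([0.9, 0.2, 0.3, 0.6])

# Two-term binary cross-entropy, element-wise.
bce = -(p * np.log(p_hat) + (1 - p) * np.log(1 - p_hat))

# p_t is the probability the model assigns to the true class.
p_t = np.where(p == 1, p_hat, 1 - p_hat)
bce_from_p_t = -np.log(p_t)

print(p_t)                             # probability of the correct class per example
print(np.allclose(bce, bce_from_p_t))  # both formulas agree
```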


We can use a weighted BCE:

 $$L = - \alpha_t \cdot \log{p_t}$$

 $\alpha_t$ equals $\alpha$ if $p = 1$, and $1 - \alpha$ otherwise; $\alpha$ and $\gamma$ are hyperparameters of the loss. The value $0 \leq \alpha \leq 1$ is either set from the inverse class frequency or tuned by hand.

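 Now, finally, we can define the focal loss:

$$L = - (1 - p_t)^{\gamma} \cdot \log{p_t} $$ 

When $\gamma > 0$, the factor $(1 - p_t)^{\gamma}$ shrinks the loss of well-classified examples ($p_t$ close to 1), so poorly classified examples are penalized relatively more.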

Combining the two approaches gives the final loss:

$$L = - \alpha_t \cdot (1 - p_t)^{\gamma} \cdot \log{p_t} $$

$\alpha$ is chosen to balance the positive and negative classes.
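
The combined loss can be written as a small NumPy function working on probabilities. This is only a sketch: the name `focal_loss` and the default values `alpha=0.25`, `gamma=2.0` are illustrative choices, not values taken from this notebook.

```python
import numpy as np

def focal_loss(p, p_hat, alpha=0.25, gamma=2.0):
    """Alpha-balanced focal loss per example, from probabilities (not logits)."""
    p_t = np.where(p == 1, p_hat, 1 - p_hat)
    alpha_t = np.where(p == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

p = np.array([1.0, 0.0])
p_hat = np.array([0.3, 0.01])  # a hard positive and an easy negative

# With gamma = 0 and alpha = 0.5 the loss reduces to plain BCE scaled by 0.5.
plain_bce = -np.log(np.where(p == 1, p_hat, 1 - p_hat))
print(np.allclose(focal_loss(p, p_hat, alpha=0.5, gamma=0.0), 0.5 * plain_bce))

# With gamma > 0 the easy negative contributes far less than the hard positive.
losses = focal_loss(p, p_hat)
print(losses)
```

Setting `gamma=0` is a handy sanity check, because the modulating factor then equals 1 and only the class weighting remains.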

### Why is this needed?
In object detection, negative examples vastly outnumber positive ones (most of the time there is no object in the frame). The parameter $\alpha_t$ up-weights the positive examples.

The parameter $\gamma$ reduces the penalty for well-classified examples.


In [0]:
import numpy as np

alpha = (1 - 1/1000)
gamma = 0.5
p_t_neg = 0.99
neg_count = 10000
p_t_pos = 0.5
pos_count = 10

**With plain BCE (binary cross-entropy), the total loss is heavily skewed toward the negative class**

In [6]:
vanilla_neg_loss = -np.log(p_t_neg) * neg_count
vanilla_pos_loss = -np.log(p_t_pos) * pos_count
vanilla_neg_loss, vanilla_pos_loss

(100.5033585350145, 6.931471805599453)

**BCE with the focal term penalizes the well-separated class much less**

(i.e. the class with a high $p_t$)

In [7]:
focal_neg_loss = - (1 - p_t_neg)**gamma * np.log(p_t_neg) * neg_count
focal_pos_loss = - (1 - p_t_pos)**gamma * np.log(p_t_pos) * pos_count
focal_neg_loss, focal_pos_loss

(10.050335853501455, 4.901290717342736)

**Weights reduce the class imbalance**

In [8]:
w_focal_neg_loss = -(1 - alpha) * (1 - p_t_neg)**gamma * np.log(p_t_neg) * neg_count
w_focal_pos_loss = -(alpha) * (1 - p_t_pos)**gamma * np.log(p_t_pos) * pos_count
w_focal_neg_loss, w_focal_pos_loss

(0.010050335853501466, 4.896389426625393)

**Of course, class weighting also works without focal loss**

In [9]:
w_vanilla_neg_loss = (1 - alpha) * -np.log(p_t_neg) * neg_count
w_vanilla_pos_loss = (alpha) * -np.log(p_t_pos) * pos_count
w_vanilla_neg_loss, w_vanilla_pos_loss

(0.1005033585350146, 6.924540333793853)
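
To compare the four variants side by side, the sketch below recomputes the same totals and prints the ratio of negative to positive loss for each. The helper `total_loss` is introduced here only for illustration; the numbers are those of the cells above.

```python
import numpy as np

# Same illustrative setup as in the cells above.
alpha, gamma = 1 - 1 / 1000, 0.5
p_t_neg, neg_count = 0.99, 10000
p_t_pos, pos_count = 0.5, 10

def total_loss(p_t, count, weight=1.0, gamma=0.0):
    """Summed loss of `count` identical examples with true-class probability p_t."""
    return -weight * (1 - p_t) ** gamma * np.log(p_t) * count

# name -> (weight for negatives, weight for positives, gamma)
variants = {
    "BCE": (1.0, 1.0, 0.0),
    "focal": (1.0, 1.0, gamma),
    "weighted BCE": (1 - alpha, alpha, 0.0),
    "weighted focal": (1 - alpha, alpha, gamma),
}
ratios = {}
for name, (w_neg, w_pos, g) in variants.items():
    neg = total_loss(p_t_neg, neg_count, w_neg, g)
    pos = total_loss(p_t_pos, pos_count, w_pos, g)
    ratios[name] = neg / pos
    print(f"{name:>15}: negative / positive = {ratios[name]:.4f}")
```

A ratio above 1 means the negatives dominate the total loss; each modification pushes the ratio further down.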

In [0]:
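def sigmoid_focal_loss(y_true, y_pred, alpha, gamma):
    # y_pred holds raw logits; y_true holds one-hot (0/1) targets of the same shape.
    ce = tf.keras.backend.binary_crossentropy(target=y_true, output=y_pred, from_logits=True)
    pred_prob = tf.math.sigmoid(y_pred)
    # p_t: predicted probability of the true label for each class entry.
    p_t = (y_true * pred_prob) + ((1 - y_true) * (1 - pred_prob))
    # alpha_t: alpha for positive entries, (1 - alpha) for negative ones.
    alpha_factor = y_true * alpha + (1 - y_true) * (1 - alpha)
    modulating_factor = tf.math.pow((1.0 - p_t), gamma)
    # Sum over the class axis, giving one loss value per default box.
    return tf.math.reduce_sum(alpha_factor * modulating_factor * ce, axis=-1)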
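
For checking values by hand without TensorFlow, here is a NumPy sketch that mirrors the function above step by step. The name `sigmoid_focal_loss_np` and the sample inputs are illustrative; the cross-entropy uses the usual numerically stable logits form `max(x, 0) - x * z + log(1 + exp(-|x|))`.

```python
import numpy as np

def sigmoid_focal_loss_np(y_true, logits, alpha, gamma):
    """NumPy sketch of sigmoid focal loss; `logits` are raw scores before the sigmoid."""
    # Numerically stable binary cross-entropy computed from logits.
    ce = np.maximum(logits, 0) - logits * y_true + np.log1p(np.exp(-np.abs(logits)))
    pred_prob = 1.0 / (1.0 + np.exp(-logits))
    p_t = y_true * pred_prob + (1 - y_true) * (1 - pred_prob)
    alpha_factor = y_true * alpha + (1 - y_true) * (1 - alpha)
    modulating_factor = (1.0 - p_t) ** gamma
    # Sum over the class axis, leaving one value per box.
    return np.sum(alpha_factor * modulating_factor * ce, axis=-1)

# Two boxes, three classes (one-hot targets); values are illustrative.
y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
logits = np.array([[-4.0, 3.0, -5.0], [0.0, 0.0, 0.0]])
loss = sigmoid_focal_loss_np(y_true, logits, alpha=0.25, gamma=2.0)
print(loss.shape)  # one loss value per box
print(loss)        # the confident first box has a much smaller loss than the second
```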

### SSD loss

In [0]:
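class SSDLoss(object):
    # NOTE: SmoothL1Loss, reg_loss_weight and NUM_CLASSES are not defined in this cell;
    # they must be in scope before the class is instantiated. alpha and gamma
    # (used in __call__) are read from module-level globals.
    def __init__(self):
        self.smooth_l1_loss = SmoothL1Loss()
        self.reg_loss_weight = reg_loss_weight
        self.cls_loss_weight = 1 - reg_loss_weight
        self.num_classes = NUM_CLASSES + 1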

    @staticmethod
    def __cover_background_boxes(true_boxes):
        symbol = true_boxes[..., -1]
        mask_symbol = tf.where(symbol < 0.5, 0.0, 1.0)
        mask_symbol = tf.expand_dims(input=mask_symbol, axis=-1)
        cover_boxes_tensor = tf.tile(input=mask_symbol, multiples=tf.constant([1, 1, 4], dtype=tf.dtypes.int32))
        return cover_boxes_tensor

    def __call__(self, y_true, y_pred, *args, **kwargs):
        # y_true : tensor, shape: (batch_size, total_num_of_default_boxes, 5)
        # y_pred : tensor, shape: (batch_size, total_num_of_default_boxes, NUM_CLASSES + 5)
        true_class = tf.cast(x=y_true[..., -1], dtype=tf.dtypes.int32)
        pred_class = y_pred[..., :self.num_classes]
        true_class = tf.one_hot(indices=true_class, depth=self.num_classes, axis=-1)
        class_loss_value = tf.math.reduce_sum(sigmoid_focal_loss(y_true=true_class, y_pred=pred_class, alpha=alpha, gamma=gamma))

        cover_boxes = self.__cover_background_boxes(true_boxes=y_true)
        true_coord = y_true[..., :4] * cover_boxes
        pred_coord = y_pred[..., self.num_classes:] * cover_boxes
        reg_loss_value = self.smooth_l1_loss(y_true=true_coord, y_pred=pred_coord)

        loss = self.cls_loss_weight * class_loss_value + self.reg_loss_weight * reg_loss_value
        return loss, class_loss_value, reg_loss_value
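
The regression branch relies on `SmoothL1Loss`, which is not shown in this notebook. The sketch below uses the common textbook definition of smooth L1 together with the background mask from `__cover_background_boxes`; the library's actual class may differ, for example in how it reduces over boxes. All numbers are made up.

```python
import numpy as np

def smooth_l1(y_true, y_pred):
    """Element-wise smooth L1: 0.5 * d^2 if |d| < 1, else |d| - 0.5."""
    d = np.abs(y_true - y_pred)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)

# One image, three default boxes; the last column is the class id (0 = background).
y_true = np.array([[[0.1, 0.2, 0.3, 0.4, 2.0],
                    [0.0, 0.0, 0.0, 0.0, 0.0],
                    [0.5, 0.5, 0.2, 0.2, 1.0]]])
pred_coord = np.array([[[0.1, 0.2, 0.8, 2.4],
                        [9.0, 9.0, 9.0, 9.0],
                        [0.5, 0.5, 0.2, 0.2]]])

# Mask is 1 for boxes matched to an object and 0 for background boxes.
mask = (y_true[..., -1:] >= 0.5).astype(np.float64)
reg = smooth_l1(y_true[..., :4] * mask, pred_coord * mask)
per_box = reg.sum(axis=-1)
print(per_box)  # the background row contributes 0 despite its large raw error
```

Multiplying both targets and predictions by the mask zeroes the difference for background boxes, so only boxes matched to an object affect the regression loss.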

### SSD Model


In [0]:
class SSD(tf.keras.Model):
    def __init__(self):
        super(SSD, self).__init__()
        self.num_classes = NUM_CLASSES + 1
        self.anchor_ratios = ASPECT_RATIOS

        self.backbone = ResNet50()
        self.conv1 = tf.keras.layers.Conv2D(filters=1024, kernel_size=(1, 1), strides=1, padding="same")
        self.conv2_1 = tf.keras.layers.Conv2D(filters=256, kernel_size=(1, 1), strides=1, padding="same")
        self.conv2_2 = tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), strides=2, padding="same")
        self.conv3_1 = tf.keras.layers.Conv2D(filters=128, kernel_size=(1, 1), strides=1, padding="same")
        self.conv3_2 = tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3), strides=2, padding="same")
        self.conv4_1 = tf.keras.layers.Conv2D(filters=128, kernel_size=(1, 1), strides=1, padding="same")
        self.conv4_2 = tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3), strides=2, padding="same")
        self.pool = tf.keras.layers.GlobalAveragePooling2D()

        self.predict_1 = self._predict_layer(k=self._get_k(i=0))
        self.predict_2 = self._predict_layer(k=self._get_k(i=1))
        self.predict_3 = self._predict_layer(k=self._get_k(i=2))
        self.predict_4 = self._predict_layer(k=self._get_k(i=3))
        self.predict_5 = self._predict_layer(k=self._get_k(i=4))
        self.predict_6 = self._predict_layer(k=self._get_k(i=5))

    def _predict_layer(self, k):
        filter_num = k * (self.num_classes + 4)
        return tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3), strides=1, padding="same")

    def _get_k(self, i):
        # k is the number of boxes generated at each position of the feature map.
        return len(self.anchor_ratios[i]) + 1

    def call(self, inputs, training=None, mask=None):
        branch_1, x = self.backbone(inputs, training=training)
        predict_1 = self.predict_1(branch_1)

        x = self.conv1(x)
        branch_2 = x
        predict_2 = self.predict_2(branch_2)

        x = self.conv2_1(x)
        x = self.conv2_2(x)
        branch_3 = x
        predict_3 = self.predict_3(branch_3)

        x = self.conv3_1(x)
        x = self.conv3_2(x)
        branch_4 = x
        predict_4 = self.predict_4(branch_4)

        x = self.conv4_1(x)
        x = self.conv4_2(x)
        branch_5 = x
        predict_5 = self.predict_5(branch_5)

        branch_6 = self.pool(x)
        branch_6 = tf.expand_dims(input=branch_6, axis=1)
        branch_6 = tf.expand_dims(input=branch_6, axis=2)
        predict_6 = self.predict_6(branch_6)

        # predict_i shape : (batch_size, h, w, k * (c+4)), where c is self.num_classes.
        return [predict_1, predict_2, predict_3, predict_4, predict_5, predict_6]


def ssd_prediction(feature_maps, num_classes):
    batch_size = feature_maps[0].shape[0]
    predicted_features_list = []
    for feature in feature_maps:
        predicted_features_list.append(tf.reshape(tensor=feature, shape=(batch_size, -1, num_classes + 4)))
    predicted_features = tf.concat(values=predicted_features_list, axis=1)
    return predicted_features
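
To make the shapes concrete, here is a NumPy sketch of what `ssd_prediction` does to the six prediction maps. The aspect ratios and feature-map sizes are hypothetical placeholders (the real `ASPECT_RATIOS` and backbone output sizes are not shown in this notebook); 20 object classes matches PASCAL VOC.

```python
import numpy as np

num_classes = 20 + 1                 # 20 object classes plus background
aspect_ratios = [[1, 2, 0.5]] * 6    # hypothetical; the real ASPECT_RATIOS may differ
map_sizes = [38, 19, 10, 5, 3, 1]    # hypothetical feature-map side lengths
batch_size = 2

feature_maps = []
for ratios, size in zip(aspect_ratios, map_sizes):
    k = len(ratios) + 1               # boxes per location, as in _get_k
    channels = k * (num_classes + 4)  # number of filters in the prediction conv
    feature_maps.append(np.zeros((batch_size, size, size, channels)))

# Same reshape and concat as ssd_prediction, written with NumPy.
flat = [f.reshape(batch_size, -1, num_classes + 4) for f in feature_maps]
predicted = np.concatenate(flat, axis=1)

total_boxes = sum(s * s * (len(r) + 1) for r, s in zip(aspect_ratios, map_sizes))
print(predicted.shape, total_boxes)
```

Each row of the result holds the class scores and the four box offsets of one default box, which is the layout `SSDLoss` expects for `y_pred`.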

## Training process


#### Download PASCAL VOC

In [0]:
from google.colab import drive 
import os

gdrive_path = '/content/gdrive'
drive.mount(gdrive_path)

In [0]:
!wget -O /content/gdrive/My\ Drive/VOCtrainval_11-May-2012.tar http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

In [0]:
!tar -xvf /content/gdrive/My\ Drive/VOCtrainval_11-May-2012.tar

In [0]:
!apt-get install tree

In [0]:
!mkdir dataset

In [0]:
!mv VOCdevkit/ dataset

In [18]:
!tree -d dataset

dataset
└── VOCdevkit
    └── VOC2012
        ├── Annotations
        ├── ImageSets
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages
        ├── SegmentationClass
        └── SegmentationObject

11 directories


In [0]:
import time

import tf2ssd
from tf2ssd.core import SSD, SSDLoss, TFDataset, ReadDataset, MakeGT, ssd_prediction
from tf2ssd.core.inference import visualize_training_results

In [0]:
from tf2ssd.core.parse_voc import ParsePascalVOC
from tf2ssd.configuration import TXT_DIR

voc = ParsePascalVOC()
voc.write_data_to_txt(txt_dir=TXT_DIR)

In [0]:
# GPU settings
from tf2ssd.configuration import IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS, EPOCHS, NUM_CLASSES, BATCH_SIZE, save_model_dir, \
    load_weights_before_training, load_weights_from_epoch, save_frequency, test_images_during_training, \
    test_images_dir_list

def print_model_summary(network):
    network.build(input_shape=(None, IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS))
    network.summary()

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

dataset = TFDataset()
train_data, train_count = dataset.generate_datatset()

ssd = SSD()
print_model_summary(network=ssd)

In [0]:
if load_weights_before_training:
    ssd.load_weights(filepath=save_model_dir+"epoch-{}".format(load_weights_from_epoch))
    print("Successfully loaded weights!")
else:
    load_weights_from_epoch = -1

### Training loop

#### Define training step first

In [0]:
# loss
loss = SSDLoss()

# optimizer
optimizer = tf.optimizers.Adam(learning_rate=0.001)

# metrics
loss_metric = tf.metrics.Mean()
cls_loss_metric = tf.metrics.Mean()
reg_loss_metric = tf.metrics.Mean()


def train_step(batch_images, batch_labels):
    with tf.GradientTape() as tape:
        pred = ssd(batch_images, training=True)
        output = ssd_prediction(feature_maps=pred, num_classes=NUM_CLASSES + 1)
        gt = MakeGT(batch_labels, pred)
        gt_boxes = gt.generate_gt_boxes()
        loss_value, cls_loss, reg_loss = loss(y_true=gt_boxes, y_pred=output)
    gradients = tape.gradient(loss_value, ssd.trainable_variables)
    optimizer.apply_gradients(grads_and_vars=zip(gradients, ssd.trainable_variables))
    loss_metric.update_state(values=loss_value)
    cls_loss_metric.update_state(values=cls_loss)
    reg_loss_metric.update_state(values=reg_loss)
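
Note that the three `tf.metrics.Mean` objects accumulate a running mean over all steps since the last reset, so the values printed during training are averages over the epoch so far rather than the loss of the latest batch. The pure-Python class below is only a stand-in to illustrate that behaviour; `RunningMean` is a hypothetical name, not part of TensorFlow.

```python
class RunningMean:
    """Pure-Python stand-in for the behaviour of a running-mean metric."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update_state(self, value):
        self.total += float(value)
        self.count += 1

    def result(self):
        # Mean of everything seen since the last reset (0.0 if nothing was seen).
        return self.total / self.count if self.count else 0.0

    def reset_states(self):
        self.total = 0.0
        self.count = 0


metric = RunningMean()
for step_loss in [10.0, 6.0, 2.0]:  # illustrative per-step losses
    metric.update_state(step_loss)
    print(metric.result())          # 10.0, 8.0, 6.0: the mean so far, not the last value
metric.reset_states()
```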

#### Fit the model

In [0]:
for epoch in range(load_weights_from_epoch + 1, EPOCHS):
    for step, batch_data in enumerate(train_data):
        start_time = time.time()
        images, labels = ReadDataset().read(batch_data)
        train_step(batch_images=images, batch_labels=labels)
        spent_time = time.time() - start_time
        print("Epoch: {}/{}, step: {}/{}, time spent: {:.2f}s, loss: {:.5f}, "
              "cls loss: {:.5f}, reg loss: {:.5f}".format(epoch,
                                                          EPOCHS,
                                                          step,
                                                          tf.math.ceil(train_count / BATCH_SIZE),
                                                          spent_time,
                                                          loss_metric.result(),
                                                          cls_loss_metric.result(),
                                                          reg_loss_metric.result()))
    loss_metric.reset_states()
    cls_loss_metric.reset_states()
    reg_loss_metric.reset_states()

    if epoch % save_frequency == 0:
        ssd.save_weights(filepath=save_model_dir+"epoch-{}".format(epoch), save_format="tf")

    if test_images_during_training:
        visualize_training_results(pictures=test_images_dir_list, model=ssd, epoch=epoch)

ssd.save_weights(filepath=save_model_dir+"saved_model", save_format="tf")

Epoch: 0/50, step: 665/2141.0, time spent: 4.18s, loss: 3099724.00000, cls loss: 5195248.00000, reg loss: 1004202.50000
Epoch: 0/50, step: 666/2141.0, time spent: 4.29s, loss: 3095166.00000, cls loss: 5187529.50000, reg loss: 1002804.68750
Epoch: 0/50, step: 667/2141.0, time spent: 4.23s, loss: 3090773.00000, cls loss: 5179988.50000, reg loss: 1001559.68750
Epoch: 0/50, step: 668/2141.0, time spent: 3.65s, loss: 3086153.75000, cls loss: 5172247.00000, reg loss: 1000062.56250
Epoch: 0/50, step: 669/2141.0, time spent: 5.25s, loss: 3081571.00000, cls loss: 5164538.50000, reg loss: 998605.43750
Epoch: 0/50, step: 670/2141.0, time spent: 3.91s, loss: 3077556.00000, cls loss: 5157254.00000, reg loss: 997859.87500
Epoch: 0/50, step: 671/2141.0, time spent: 3.94s, loss: 3073425.25000, cls loss: 5150091.00000, reg loss: 996761.62500
Epoch: 0/50, step: 672/2141.0, time spent: 3.81s, loss: 3068875.25000, cls loss: 5142459.50000, reg loss: 995292.81250
Epoch: 0/50, step: 673/2141.0, time spent: 3