# This notebook combines work from Remek Kinas (SAHI - Slicing Aided Hyper Inference - Yv5 and YX) and Good Moon (Leon-V5-infer 2.0).

# Please upvote them if you find this helpful
https://www.kaggle.com/freshair1996/leon-v5-infer-2-0

https://www.kaggle.com/remekkinas/sahi-slicing-aided-hyper-inference-yv5-and-yx

Unfortunately, because the public score was low, I didn't choose it as a final submission.

This notebook commemorates my second competition.

# SAHI: Slicing Aided Hyper Inference for Yolov5 and YoloX

SAHI is a lightweight vision library for performing large-scale object detection and instance segmentation; this notebook shows how to use it on Kaggle. The full source code and tutorials are available in Fatih Cagatay Akyon's GitHub repository (authors: Akyon, Fatih Cagatay; Cengiz, Cemil; Altinuc, Sinan Onur; Cavusoglu, Devrim; Sahin, Kadir; Eryuksel, Ogulcan): [SAHI: A vision library for large-scale object detection & instance segmentation](https://github.com/obss/sahi)

In this notebook (tutorial) you can find:

* Installation of SAHI on Kaggle
* Sliced inference with SAHI for Yolov5
* Sliced inference with SAHI for YoloX (soon)


<div class="alert alert-success" role="alert">
My other work in this competition:
    <ul>
        <li> <a href="https://www.kaggle.com/remekkinas/yolox-full-training-pipeline-for-cots-dataset">YoloX full training pipeline for COTS dataset</a></li>
        <li> <a href="https://www.kaggle.com/remekkinas/yolox-inference-on-kaggle-for-cots-lb-0-507">YoloX detections submission made on COTS dataset</a></li>
        <li> <a href="https://www.kaggle.com/remekkinas/yolor-p6-w6-one-more-yolo-on-kaggle-infer">YoloR [P6/W6] ... one more yolo on Kaggle [INFER]</a></li>
        <li> <a href="https://www.kaggle.com/remekkinas/yolor-p6-w6-one-more-yolo-on-kaggle-train">YoloR [P6/W6]... one more yolo on Kaggle [TRAIN]</a></li>
    </ul>
    
</div>


<div class="alert alert-warning">Note: My goal was to implement and share a tool for experimentation. I was not looking for the best parameters to score over 0.6 or even 0.7. That is your part of this journey. Enjoy experimenting and progressing!</div>

The concept of sliced inference is simple: perform inference over smaller slices of the original image, then merge the sliced predictions back onto the original image. It can be illustrated as below:

<div align="center"><img src="https://raw.githubusercontent.com/obss/sahi/main/resources/sliced_inference.gif"/></div>
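
To make the slicing step concrete, here is a minimal sketch of how overlapping slice windows could be laid out over a frame. It is an illustration only, not SAHI's own code: the function name `slice_windows` and its parameters are hypothetical, and it assumes the overlap in pixels is the slice size times the overlap ratio, rounded down.

```python
from typing import List, Tuple


def slice_windows(
    image_w: int,
    image_h: int,
    slice_w: int,
    slice_h: int,
    overlap_w_ratio: float = 0.2,
    overlap_h_ratio: float = 0.2,
) -> List[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) windows that cover the whole image.

    Hypothetical helper for illustration; overlap ratios must be below 1.
    """
    # Distance between the starts of neighbouring windows.
    step_x = slice_w - int(slice_w * overlap_w_ratio)
    step_y = slice_h - int(slice_h * overlap_h_ratio)

    windows = []
    y = 0
    while True:
        # Clamp the window to the image and keep its full height where possible.
        y_max = min(y + slice_h, image_h)
        y_min = max(0, y_max - slice_h)
        x = 0
        while True:
            x_max = min(x + slice_w, image_w)
            x_min = max(0, x_max - slice_w)
            windows.append((x_min, y_min, x_max, y_max))
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return windows


for window in slice_windows(1280, 720, 768, 432):
    print(window)
```

With a 1280x720 frame, 768x432 slices and an overlap ratio of 0.2 (the values used later in this notebook), this sketch yields four windows, one per corner of the frame.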

## 0. IMPORT AND INSTALL MODULES

In [None]:
DATASET_PATH = '/kaggle/input/tensorflow-great-barrier-reef/train_images/'
CKPT_PATH = '/kaggle/input/yolov5-655/655.pt'

# CUSTOM_YOLO5_CLASS: if True, use the custom detection class implemented in this notebook;
# if False, use the standard SAHI Yolov5DetectionModel.
CUSTOM_YOLO5_CLASS = True

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import sys
import cv2
import torch
from PIL import Image as Img
from IPython.display import display

### 0.A - CUSTOM MAGIC FUNCTION

I implemented a magic function that skips execution of a notebook cell depending on the CUSTOM_YOLO5_CLASS constant: when it is True, the cell is skipped, so we can run either the standard SAHI prediction class or the custom one.

In [None]:
from IPython.core.magic import (register_line_cell_magic)

In [None]:
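@register_line_cell_magic
def custom_yolo5(line, cell=None):
    # Skip the cell when the magic's argument evaluates to True
    # (e.g. `%%custom_yolo5 $CUSTOM_YOLO5_CLASS`).
    if eval(line):
        print("Cell skipped - not executed")
        return
    # Otherwise run the cell body in the notebook's namespace.
    get_ipython().ex(cell)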

### 0.B - INSTALL MODULES

In [None]:
%cd /kaggle/input/sahihub/s-lib
!pip install ./fire-0.4.0/fire-0.4.0.tar -f ./ --no-index
!pip install terminaltables-3.1.10-py2.py3-none-any.whl -f ./ --no-index
!pip install sahi-0.8.22-py3-none-any.whl -f ./ --no-index
!pip install thop-0.0.31.post2005241907-py3-none-any.whl -f ./ --no-index
!pip install yolov5-6.0.6-py36.py37.py38-none-any.whl -f ./ --no-index
!pip install yolo5-0.0.1-py36.py37.py38-none-any.whl -f ./ --no-index

!mkdir -p /root/.config/Ultralytics
!cp /kaggle/input/sahihub/Arial.ttf /root/.config/Ultralytics/

%cd /kaggle/working

## 1. IMPORT SAHI MODULES

In [None]:
from sahi.model import Yolov5DetectionModel
from sahi.utils.cv import read_image
from sahi.predict import get_prediction, get_sliced_prediction, predict
from IPython.display import Image
from sahi.utils.yolov5 import (
    download_yolov5s6_model,
)

## 2. HELPER FUNCTIONS

In [None]:
def show_prediction(img, bboxes, scores, show = True):
    # Boxes are in COCO format: [x_min, y_min, width, height].
    for box, score in zip(bboxes, scores):
        x_min, y_min = int(box[0]), int(box[1])
        x_max, y_max = int(box[0] + box[2]), int(box[1] + box[3])
        cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (255,0,0), 2)
        cv2.putText(img, f'{score}', (x_min, y_min-3), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255,0,0), 1, cv2.LINE_AA)
    
    # For notebook display, return a resized PIL image; otherwise return the annotated array.
    if show:
        img = Img.fromarray(img).resize((1280, 720))
    return img

## 3. MODELS

<div align="center"><img src="https://user-images.githubusercontent.com/34196005/144092739-c1d9bade-a128-4346-947f-424ce00e5c4f.gif"/></div>

### A. YOLOv5 - get_sliced_prediction

* **image**: str or np.ndarray - Location of image or numpy image matrix to slice
* **detection_model**: model.DetectionModel
* **image_size**: int: Input image size for each inference (the image is scaled while preserving its aspect ratio).
* **slice_height**: int: Height of each slice.  Defaults to ``512``.
* **slice_width**: int: Width of each slice.  Defaults to ``512``.
* **overlap_height_ratio**: float: Fractional overlap in height of each window (e.g. an overlap of 0.2 for a window of size 512 yields an overlap of 102 pixels). Defaults to ``0.2``.
* **overlap_width_ratio**: float: Fractional overlap in width of each window (e.g. an overlap of 0.2 for a window of size 512 yields an overlap of 102 pixels). Defaults to ``0.2``.
* **perform_standard_pred**: bool: Perform a standard prediction on top of sliced predictions to increase large object detection accuracy. Default: True.
* **postprocess_type**: str: Type of the postprocess to be used after sliced inference while merging/eliminating predictions. Options are 'NMM', 'GREEDYNMM' or 'NMS'. Default is 'GREEDYNMM'.
* **postprocess_match_metric**: str: Metric to be used during object prediction matching after sliced prediction. 'IOU' for intersection over union, 'IOS' for intersection over smaller area.
* **postprocess_match_threshold**: float: Sliced predictions having higher iou than postprocess_match_threshold will be postprocessed after sliced prediction.
* **postprocess_class_agnostic**: bool: If True, postprocess will ignore category ids.
* **verbose**: int: 0: no print, 1: print number of slices (default), 2: print number of slices and slice/prediction durations
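
To make the two `postprocess_match_metric` options concrete, here is a minimal sketch of both metrics for a pair of boxes. It is not SAHI's implementation: the helper names `iou` and `ios` are hypothetical, and boxes are assumed to be in `[x_min, y_min, x_max, y_max]` format.

```python
from typing import Sequence


def box_area(box: Sequence[float]) -> float:
    """Area of an [x_min, y_min, x_max, y_max] box (0 if degenerate)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a: Sequence[float], b: Sequence[float]) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def ios(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over the smaller of the two box areas."""
    inter = intersection_area(a, b)
    smaller = min(box_area(a), box_area(b))
    return inter / smaller if smaller > 0 else 0.0


# A small box lying entirely inside a large one, e.g. a partial detection
# from one slice and a full detection from the whole frame.
large = [0, 0, 100, 100]
small = [50, 50, 100, 100]
print(iou(large, small), ios(large, small))
```

For this pair the IOU is 0.25 while the IOS is 1.0, so with a match threshold of 0.45 the two boxes would count as a match under 'IOS' but not under 'IOU'.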

### A1. CUSTOM Yolo5 PREDICTION CLASS
This is not obligatory, but I wrote it to have more control over prediction.
The idea was provided by Dewei Chen (@dwchen) in this discussion: https://www.kaggle.com/c/tensorflow-great-barrier-reef/discussion/302761
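
The custom class in this section passes `shift_amount` and `full_shape` to each `ObjectPrediction`, so that a box predicted on a slice can later be placed on the full frame. As a rough illustration of that idea (the helper `shift_box` is hypothetical and not part of SAHI), shifting a slice-local box by the slice's top-left corner looks like this:

```python
from typing import List, Sequence


def shift_box(box_xyxy: Sequence[int], shift_amount: Sequence[int]) -> List[int]:
    """Move a slice-local [x1, y1, x2, y2] box into full-image coordinates.

    shift_amount is assumed to be [shift_x, shift_y], i.e. the position of the
    slice's top-left corner inside the full image.
    """
    shift_x, shift_y = shift_amount
    x1, y1, x2, y2 = box_xyxy
    return [x1 + shift_x, y1 + shift_y, x2 + shift_x, y2 + shift_y]


# A box found inside the slice whose top-left corner is at (512, 288).
print(shift_box([10, 20, 60, 80], [512, 288]))
```

A prediction made on the full frame (the standard prediction) would use a shift of `[0, 0]` and stay unchanged.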

In [None]:
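import logging
from sahi.prediction import ObjectPrediction
from sahi.model import DetectionModel
from typing import Dict, List, Optional, Union
from sahi.utils.compatibility import fix_full_shape_list, fix_shift_amount_list

# The class below logs skipped (invalid) boxes, so it needs a logger.
logger = logging.getLogger(__name__)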

In [None]:
class COTSYolov5DetectionModel(DetectionModel):

    
    def load_model(self):
        model = torch.hub.load('/kaggle/input/yolov5-lib-ds', 
                               'custom', 
                               path=self.model_path,
                               source='local',
                               force_reload=True)
        
        model.conf = self.confidence_threshold
        self.model = model
        
        if not self.category_mapping:
            category_mapping = {str(ind): category_name for ind, category_name in enumerate(self.category_names)}
            self.category_mapping = category_mapping

    def perform_inference(self, image: np.ndarray, image_size: int = None):
        if image_size is not None:
            warnings.warn("Set 'image_size' at DetectionModel init.", DeprecationWarning)
            prediction_result = self.model(image, size=image_size, augment=True)
            if debug_mode:
                display(Img.fromarray(image).resize((320, 200)))
        elif self.image_size is not None:
            prediction_result = self.model(image, size=self.image_size, augment=True)
        else:
            prediction_result = self.model(image)

        self._original_predictions = prediction_result

    @property
    def num_categories(self):
        """
        Returns number of categories
        """
        return len(self.model.names)

    @property
    def has_mask(self):
        """
        Returns if model output contains segmentation mask
        """
        # YOLOv5 detection models output boxes only, no segmentation masks.
        return False

    @property
    def category_names(self):
        return self.model.names

    def _create_object_prediction_list_from_original_predictions(
        self,
        shift_amount_list: Optional[List[List[int]]] = [[0, 0]],
        full_shape_list: Optional[List[List[int]]] = None,):

        original_predictions = self._original_predictions
        shift_amount_list = fix_shift_amount_list(shift_amount_list)
        full_shape_list = fix_full_shape_list(full_shape_list)

        # handle all predictions
        object_prediction_list_per_image = []
        for image_ind, image_predictions_in_xyxy_format in enumerate(original_predictions.xyxy):
            shift_amount = shift_amount_list[image_ind]
            full_shape = None if full_shape_list is None else full_shape_list[image_ind]
            object_prediction_list = []

            # process predictions
            for prediction in image_predictions_in_xyxy_format.cpu().detach().numpy():
                x1 = int(prediction[0])
                y1 = int(prediction[1])
                x2 = int(prediction[2])
                y2 = int(prediction[3])
                bbox = [x1, y1, x2, y2]
                score = prediction[4]
                category_id = int(prediction[5])
                category_name = self.category_mapping[str(category_id)]

                # ignore invalid predictions
                if bbox[0] > bbox[2] or bbox[1] > bbox[3] or bbox[0] < 0 or bbox[1] < 0 or bbox[2] < 0 or bbox[3] < 0:
                    logger.warning(f"ignoring invalid prediction with bbox: {bbox}")
                    continue
                if full_shape is not None and (
                    bbox[1] > full_shape[0]
                    or bbox[3] > full_shape[0]
                    or bbox[0] > full_shape[1]
                    or bbox[2] > full_shape[1]
                ):
                    logger.warning(f"ignoring invalid prediction with bbox: {bbox}")
                    continue

                object_prediction = ObjectPrediction(
                    bbox=bbox,
                    category_id=category_id,
                    score=score,
                    bool_mask=None,
                    category_name=category_name,
                    shift_amount=shift_amount,
                    full_shape=full_shape,
                )
                object_prediction_list.append(object_prediction)
            object_prediction_list_per_image.append(object_prediction_list)

        self._object_prediction_list_per_image = object_prediction_list_per_image 

### A2. HELPER FUNCTION

In [None]:
# Note: this helper shadows the `predict` function imported from sahi.predict above.
def predict(img, model, sw, sh, ohr, owr, pmt, img_size, verb):
    """Run SAHI sliced inference and return COCO-format boxes [x, y, w, h] with their scores."""
    result = get_sliced_prediction(img,
                                   model,
                                   slice_width = sw,
                                   slice_height = sh,
                                   overlap_height_ratio = ohr,
                                   overlap_width_ratio = owr,
                                   postprocess_match_threshold = pmt,
                                   image_size = img_size,
                                   verbose = verb,
                                   perform_standard_pred = True)
    
    bboxes = []
    scores = []
    # to_coco_annotations() returns one dict per detection.
    for pred in result.to_coco_annotations():
        bboxes.append(pred['bbox'])
        scores.append(pred['score'])
    
    return bboxes, scores 

In [None]:
detection_model = COTSYolov5DetectionModel(
   model_path = CKPT_PATH,
   confidence_threshold = 0.35,
   device="cuda",
)

detection_model.model.iou = 0.4

In [None]:
%%custom_yolo5 $CUSTOM_YOLO5_CLASS

detection_model = Yolov5DetectionModel(
   model_path = CKPT_PATH,
   confidence_threshold = 0.35,
   device="cuda",
)

detection_model.model.iou = 0.4

### A3. PREDICTION

In [None]:
# I introduced debug_mode so you can see how SAHI makes slices for prediction.
# If True, it shows the slices and, when perform_standard_pred is set to True,
# the original image as well (if perform_standard_pred is False, only the slices are shown).

debug_mode = True # show slices and original image

In [None]:
# Named img_dir to avoid shadowing the built-in dir().
img_dir = DATASET_PATH

imgs = [img_dir + f for f in ('video_2/5748.jpg', 'video_2/5748.jpg', 'video_2/5772.jpg')]

# imgs = [img_dir + f for f in ('video_2/5748.jpg',
#                           'video_2/5772.jpg',
#                           'video_2/5820.jpg',
#                           'video_1/4159.jpg', 
#                           'video_1/4183.jpg', 
#                           'video_1/4501.jpg',)]

for img_path in imgs:
    im = cv2.imread(img_path)
    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    if debug_mode:
        print("\n>>>> DEBUG MODE - SHOW SLICES AND FULL FRAME <<<<")
    # 768x432 slices, 0.2 overlap in both directions, match threshold 0.45, image_size 3200, verbose 2
    bboxes, scores = predict(img_path, detection_model, 768, 432, 0.2, 0.2, 0.45, 3200, 2)
    display(show_prediction(im, bboxes, scores))

### B. YoloX

In [None]:
# In progress (coming soon)

## 4. MAKE VIDEO FROM PREDS

In [None]:
import ast
import os
import pandas as pd
import subprocess

from ast import literal_eval
from tqdm.auto import tqdm

from IPython.display import HTML
from base64 import b64encode

In [None]:
df = pd.read_csv("/kaggle/input/tensorflow-great-barrier-reef/train.csv")

In [None]:
def add_path(row):
    # DATASET_PATH already ends with '/', so join the parts instead of adding another slash.
    return os.path.join(DATASET_PATH, f"video_{row.video_id}", f"{row.video_frame}.jpg")

df['path'] = df.apply(add_path, axis=1)

In [None]:
def load_image(video_id, video_frame, image_dir):
    # Despite its name, image_dir is the full path to the image file;
    # video_id and video_frame are unused and kept so existing calls still work.
    assert os.path.exists(image_dir), f'{image_dir} does not exist.'
    img = cv2.imread(image_dir)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img


def decode_annotations(annotations_str):
    return literal_eval(annotations_str)

def load_image_with_annotations(img, annotations_str):
    annotations = decode_annotations(annotations_str)
    # Draw each ground-truth box on the image.
    for ann in annotations:
        cv2.rectangle(img, (ann['x'], ann['y']),
            (ann['x'] + ann['width'], ann['y'] + ann['height']),
            (0, 255, 255), thickness=2,)
    return img

In [None]:
df.query('video_id == 2 and sequence == 22643 and video_frame > 5700 ').head(5)

In [None]:
# I found this code at https://www.kaggle.com/bamps53/create-annotated-video (thank you for sharing).

def make_sahi_video(df, video_id, sequence_id, out_dir):
    fps = 15 
    width = 1280
    height = 720

    save_path = f'{out_dir}/video-{video_id}.mp4'
    tmp_path =  f'{out_dir}/tmp-video-{video_id}.mp4'
    output_video = cv2.VideoWriter(tmp_path, cv2.VideoWriter_fourcc(*"MP4V"), fps, (width, height))
    
    # Generate only part of the video
    video_df = df.query('video_id == @video_id and sequence == @sequence_id and video_frame > 5700 and video_frame < 6000')
    for _, row in tqdm(video_df.iterrows(), total=len(video_df)):
        video_id = row.video_id
        video_frame = row.video_frame
        annotations_str = row.annotations
        img_file = row.path
        img = load_image(video_id, video_frame, img_file)
        bboxes, scores = predict(img, detection_model, 768, 432, 0.2, 0.2, 0.45, 3200, 0)
        img = show_prediction(img, bboxes, scores, False)
        img = load_image_with_annotations(img, annotations_str)
        cv2.putText(img, f'{video_id}-{video_frame}', (10,70), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,0,0), 1, cv2.LINE_AA)
        img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
        output_video.write(img)

    
    output_video.release()

    if os.path.exists(save_path):
        os.remove(save_path)
    subprocess.run(
        ["ffmpeg", "-i", tmp_path, "-crf", "18", "-preset", "veryfast", "-vcodec", "libx264", save_path]
    )
    os.remove(tmp_path)

In [None]:
# To speed things up, I generate only part of the video.
# This prediction is surely overfitted (I can see it in the predictions), but it is for demo purposes only.

debug_mode = False

make_sahi_video(df, 2, 22643, '/kaggle/working/')

In [None]:
def play(filename):
    # Embed the video in the notebook as a base64 data URI.
    with open(filename, 'rb') as f:
        video = f.read()
    src = 'data:video/mp4;base64,' + b64encode(video).decode()
    html = '<video width=800 controls autoplay loop><source src="%s" type="video/mp4"></video>' % src 
    return HTML(html)

play('/kaggle/working/video-2.mp4')

## 5. SUBMIT
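
The submission loop in this section turns each frame's detections into one space-separated string of `score x_min y_min width height` groups. As a minimal standalone sketch of that formatting step (the helper name `format_prediction_string` is hypothetical, and boxes are assumed to be in COCO `[x_min, y_min, width, height]` format as returned by the `predict` helper):

```python
from typing import List, Sequence


def format_prediction_string(bboxes: Sequence[Sequence[float]], scores: Sequence[float]) -> str:
    """Build the annotation string for one frame.

    Each detection becomes "score x_min y_min width height"; detections are
    joined with single spaces. An empty list of boxes gives an empty string.
    """
    parts: List[str] = []
    for (x, y, w, h), score in zip(bboxes, scores):
        parts.append(f"{score:.2f} {int(x)} {int(y)} {int(w)} {int(h)}")
    return " ".join(parts)


print(format_prediction_string([[10.0, 20.0, 30.0, 40.0]], [0.876]))
```

Keeping the formatting in one small function makes it easy to check the string on a few hand-written boxes before running the full test iterator.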

In [None]:
import greatbarrierreef
env = greatbarrierreef.make_env()# initialize the environment
iter_test = env.iter_test()      # an iterator which loops over the test set and sample submission

In [None]:
debug_mode = False

for (image_np, sample_prediction_df) in iter_test:
    
    bboxes, scores = predict(image_np, detection_model, 768, 432, 0.2, 0.2, 0.45, 3200, 0)
    
    predictions = []
    
    # Boxes come back in COCO format [x_min, y_min, width, height];
    # the submission expects "score x_min y_min width height" per detection.
    for box, score in zip(bboxes, scores):
        x_min = int(box[0])
        y_min = int(box[1])
        bbox_width = int(box[2])
        bbox_height = int(box[3])
        
        predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))
    
    
    prediction_str = ' '.join(predictions)
    sample_prediction_df['annotations'] = prediction_str
    env.predict(sample_prediction_df)

In [None]:
sub_df = pd.read_csv('submission.csv')
sub_df.head()