# **SIIM COVID-19 Detectron2 Training**

## Inference, EDA, and Dataset
- [SIIM COVID-19 Detectron2 Inference](https://www.kaggle.com/ammarnassanalhajali/siim-covid19-detectron2-inferance)
- [SIIM-FISABIO-RSNA COVID-19 Detection-EDA](https://www.kaggle.com/ammarnassanalhajali/siim-fisabio-rsna-covid-19-detection-eda)
- [SIIM-COVID-19 Detection Training Labels (Dataset)](https://www.kaggle.com/ammarnassanalhajali/siimcovid19-detection-training-label)


### Hi Kagglers, this is the `training` notebook using `Detectron2`.

> #### Thanks:
> - https://www.kaggle.com/xhlulu/siim-covid19-resized-to-256px-jpg


### If this kernel is useful, <font color='red'>please upvote!</font>

# Detectron2
Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

# Installation
* Detectron2 is not pre-installed in the Kaggle Docker image, so we install it here.
* We need to know the CUDA and PyTorch versions to pick the matching Detectron2 build.

In [None]:
!nvidia-smi

In [None]:
!nvcc --version

In [None]:
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

* This Kaggle Docker image appears to use CUDA 11.0 and torch 1.7.0.
* See the installation guide for details: https://detectron2.readthedocs.io/en/latest/tutorials/install.html
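
The pre-built wheel index is chosen from the CUDA and PyTorch versions printed above. As a minimal sketch, assuming the index URL follows the same pattern as the install command in this notebook (`.../wheels/cu110/torch1.7/index.html`), the helper below builds that URL from version strings. `detectron2_wheel_index` is a hypothetical name for illustration, not part of Detectron2.

```python
def detectron2_wheel_index(cuda_version: str, torch_version: str) -> str:
    """Build the wheel index URL for a CUDA / PyTorch pair (hypothetical helper)."""
    # "11.0" -> "cu110"
    cuda_tag = "cu" + cuda_version.replace(".", "")
    # "1.7.0+cu110" -> "1.7": drop any local build suffix, keep major.minor
    major, minor = torch_version.split("+")[0].split(".")[:2]
    return (
        "https://dl.fbaipublicfiles.com/detectron2/wheels/"
        f"{cuda_tag}/torch{major}.{minor}/index.html"
    )


url = detectron2_wheel_index("11.0", "1.7.0")
print(url)
```

Not every CUDA and PyTorch combination has a published wheel, so the generated URL should be checked against the installation guide before use.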

# Install Pre-Built Detectron2

In [None]:
!pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html

# Import Libraries

In [None]:

import numpy as np
import pandas as pd
from datetime import datetime
import time
import matplotlib.pyplot as plt

import os, json, cv2, random
import skimage.io as io
import copy
import pickle
from pathlib import Path
from typing import Optional
from tqdm import tqdm  # progress bar

# torch
import torch

# Albumentations
import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

#from pycocotools.coco import COCO
from sklearn.model_selection import StratifiedKFold

# glob
from glob import glob

# numba
import numba
from numba import jit

import warnings
warnings.filterwarnings('ignore')  # Suppress all warnings (e.g. FutureWarning, pandas slicing warnings).

# detectron2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import (
    DatasetCatalog,
    MetadataCatalog,
    build_detection_test_loader,
    build_detection_train_loader,
)
from detectron2.data import detection_utils as utils
import detectron2.data.transforms as T
from detectron2.engine import DefaultPredictor, DefaultTrainer, launch
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.structures import BoxMode
from detectron2.utils.logger import setup_logger
from detectron2.utils.visualizer import ColorMode, Visualizer

setup_logger()

# Data Loading

In [None]:
# --- Read data ---
imgdir = "../input/siim-covid19-resized-384512-and-640px/SIIM-COVID19-Resized/img_sz_640"  # images resized to 640 px
# Read the training labels CSV
train_df = pd.read_csv("../input/siimcovid19-detection-training-label/train_image_df.csv")
len(train_df)

# Configs

In [None]:
# --- configs ---
thing_classes = [
    "atypical",
    "indeterminate",
    "negative",
    "typical"
]

debug=False
split_mode="valid20"  # either "valid20" or "all_train"


category_name_to_id = {class_name: index for index, class_name in enumerate(thing_classes)}
category_name_to_id

# Data preparation
* `detectron2` provides a high-level API for training on a custom dataset.

To define a custom dataset, we create a **list of dicts** (`dataset_dicts`) in which each dict contains the following:

 - file_name: path to the image file.
 - image_id: id of the image; the image id string from the metadata CSV is used here.
 - height: height of the image.
 - width: width of the image.
 - annotations: the ground-truth annotations for object detection, a list with one dict per bounding box, each containing:
     - bbox: bounding box pixel coordinates, a list of 4 numbers.
     - bbox_mode: `BoxMode.XYXY_ABS` is used here, meaning that `bbox` holds absolute (x_min, y_min, x_max, y_max) values.
     - category_id: integer class label of the bounding box.

`get_COVID19_data_dicts` prepares the training dataset and `get_COVID19_data_dicts_test` prepares the test dataset.
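
To make the record layout concrete, here is a minimal sketch of one entry with a single box, including the rescaling from original to resized pixel coordinates. The image id, sizes, and box values are made up for illustration, and `scale_bbox_xyxy` is a hypothetical helper. The integer `0` stands in for `BoxMode.XYXY_ABS` (assumed to be its enum value) so that the snippet runs without Detectron2.

```python
XYXY_ABS = 0  # stand-in for detectron2's BoxMode.XYXY_ABS (assumed enum value)


def scale_bbox_xyxy(bbox, orig_hw, resized_hw):
    """Rescale an (x_min, y_min, x_max, y_max) box from original to resized pixels."""
    (orig_h, orig_w), (new_h, new_w) = orig_hw, resized_hw
    w_ratio, h_ratio = new_w / orig_w, new_h / orig_h
    x_min, y_min, x_max, y_max = bbox
    return [x_min * w_ratio, y_min * h_ratio, x_max * w_ratio, y_max * h_ratio]


# Made-up example: a 2000 x 2500 (height x width) image resized to 640 x 640.
record = {
    "file_name": "train/example_id.jpg",  # hypothetical path
    "image_id": "example_id",
    "height": 640,
    "width": 640,
    "annotations": [
        {
            "bbox": scale_bbox_xyxy([500, 400, 1500, 1200], (2000, 2500), (640, 640)),
            "bbox_mode": XYXY_ABS,
            "category_id": 3,  # index of "typical" in thing_classes
        }
    ],
}
print(record["annotations"][0]["bbox"])
```

Note that `height` and `width` in the record describe the resized image, so the box has to be expressed in resized coordinates as well.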

In [None]:
from glob import glob

def get_COVID19_data_dicts(
    imgdir: Path,
    train_df: pd.DataFrame,
    use_cache: bool = True,
    target_indices: Optional[np.ndarray] = None,
    debug: bool = False,
    data_type:str="train"
   
):

    cache_path = Path(".") / f"dataset_dicts_cache_{data_type}.pkl"
    if not use_cache or not cache_path.exists():
        print("Creating data...")
        df_meta = pd.read_csv("../input/siim-covid19-resized-384512-and-640px/SIIM-COVID19-Resized/img_sz_640/meta_sz_640.csv")
        train_meta=df_meta[df_meta.split=="train"]
        if debug:
            train_meta = train_meta.iloc[:100]  # For debug....

        # Load 1 image to get image size.
        image_id = train_meta.iloc[0,0]
        #image_path = str(imgdir / "train" / f"{image_id}.jpg")
        image_path = str(f'../input/siim-covid19-resized-384512-and-640px/SIIM-COVID19-Resized/img_sz_640/train/{image_id}.jpg')
        image = cv2.imread(image_path)
        resized_height, resized_width, ch = image.shape
        print(f"image shape: {image.shape}")

        dataset_dicts = []
        for index, train_meta_row in tqdm(train_meta.iterrows(), total=len(train_meta)):
            record = {}
            image_id, height, width,s = train_meta_row.values
            #filename = str(imgdir / "train" / f"{image_id}.jpg")
            filename = str(f'../input/siim-covid19-resized-384512-and-640px/SIIM-COVID19-Resized/img_sz_640/train/{image_id}.jpg')
            record["file_name"] = filename
            record["image_id"] = image_id
            record["height"] = resized_height
            record["width"] = resized_width
            objs = []
            for index2, row in train_df.query("id == @image_id").iterrows():
                # print(row)
                # print(row["class_name"])
                # class_name = row["class_name"]
                class_id = row["integer_label"]
                if class_id == 2:  # "negative" class
                    # Negative images have no opacity boxes, so use a single
                    # box covering the whole image area as the "negative" target.
                    bbox_resized = [0, 0, resized_width, resized_height]
                    obj = {
                        "bbox": bbox_resized,
                        "bbox_mode": BoxMode.XYXY_ABS,
                        "category_id": class_id,
                    }
                    objs.append(obj)
                else:
                    #bbox_original = [int(row["x_min"]), int(row["y_min"]), int(row["x_max"]), int(row["y_max"])]
                    h_ratio = resized_height / height
                    w_ratio = resized_width / width
                    bbox_resized = [
                        float(row["x_min"]) * w_ratio,
                        float(row["y_min"]) * h_ratio,
                        float(row["x_max"]) * w_ratio,
                        float(row["y_max"]) * h_ratio,
                    ]
                    obj = {
                        "bbox": bbox_resized,# bbox_original, #bbox_resized#
                        "bbox_mode": BoxMode.XYXY_ABS,
                        "category_id": class_id,
                    }
                    objs.append(obj)
            record["annotations"] = objs
            dataset_dicts.append(record)
        with open(cache_path, mode="wb") as f:
            pickle.dump(dataset_dicts, f)

    print(f"Load from cache {cache_path}")
    with open(cache_path, mode="rb") as f:
        dataset_dicts = pickle.load(f)
    if target_indices is not None:
        dataset_dicts = [dataset_dicts[i] for i in target_indices]
    return dataset_dicts


def get_COVID19_data_dicts_test(
    imgdir: Path, test_meta: pd.DataFrame, use_cache: bool = True, debug: bool = False,
):
    cache_path = Path(".") / "dataset_dicts_cache_test.pkl"
    if not use_cache or not cache_path.exists():
        print("Creating data...")
        # test_meta = pd.read_csv(imgdir / "test_meta.csv")
        df_meta = pd.read_csv("../input/siim-covid19-resized-384512-and-640px/SIIM-COVID19-Resized/img_sz_640/meta_sz_640.csv")
        test_meta=df_meta[df_meta.split=="test"]
        if debug:
            test_meta = test_meta.iloc[:100]  # For debug....
        # Load 1 image to get image size.
        image_id = test_meta.iloc[0,0]
        #image_path = str(imgdir / "test" / f"{image_id}.jpg")
        image_path = str(f'../input/siim-covid19-resized-384512-and-640px/SIIM-COVID19-Resized/img_sz_640/test/{image_id}.jpg')
        #image = cv2.imread(image_path)
        #resized_height, resized_width, ch = image.shape
        #print(f"image shape: {image.shape}")

        dataset_dicts = []
        for index, test_meta_row in tqdm(test_meta.iterrows(), total=len(test_meta)):
            record = {}

            image_id, height, width,s = test_meta_row.values
            #filename = str(imgdir / "test" / f"{image_id}.jpg")
            filename = str(f'../input/siim-covid19-resized-384512-and-640px/SIIM-COVID19-Resized/img_sz_640/test/{image_id}.jpg')
            record["file_name"] = filename
            # record["image_id"] = index
            record["image_id"] = image_id
            record["height"] = height
            record["width"] = width
            # objs = []
            # record["annotations"] = objs
            dataset_dicts.append(record)
        with open(cache_path, mode="wb") as f:
            pickle.dump(dataset_dicts, f)

    #print(f"Load from cache {cache_path}")
    with open(cache_path, mode="rb") as f:
        dataset_dicts = pickle.load(f)
    return dataset_dicts
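
Both functions share the same build-or-load caching pattern: build the list once, pickle it, and read it back on later calls. A stripped-down sketch of that pattern follows, with a hypothetical `load_or_build` helper and a toy builder in place of the real image loop.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path: Path, build_fn, use_cache: bool = True):
    """Return the cached object if present, otherwise build it and cache it."""
    if use_cache and cache_path.exists():
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data


calls = []


def toy_builder():
    calls.append(1)  # count how often the expensive step runs
    return [{"image_id": "a"}, {"image_id": "b"}]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset_dicts_cache_demo.pkl"
    first = load_or_build(path, toy_builder)   # builds and writes the cache
    second = load_or_build(path, toy_builder)  # served from the cache
```

One caveat of this pattern: a stale cache is reused silently. In this notebook the cache file name does not depend on `debug`, so the `.pkl` files should be deleted (or `use_cache=False` passed) after switching `debug` or changing the input data.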

In [None]:
if split_mode == "all_train":
    DatasetCatalog.register(
        "COVID19_data_train",
        lambda: get_COVID19_data_dicts(
            imgdir,
            train_df,
            debug=debug,
            data_type="train"
        ),
    )
    MetadataCatalog.get("COVID19_data_train").set(thing_classes=thing_classes)
    
    
    dataset_dicts_train = DatasetCatalog.get("COVID19_data_train")
    metadata_dicts_train = MetadataCatalog.get("COVID19_data_train")
    
    
elif split_mode == "valid20":

    n_dataset = len(
        get_COVID19_data_dicts(
            imgdir, train_df, debug=debug,data_type="All"
        )
    )
    n_train = int(n_dataset * 0.90)  # NOTE: this is a 90/10 split, despite the "valid20" mode name
    print("n_dataset", n_dataset, "n_train", n_train)
    rs = np.random.RandomState(42)
    inds = rs.permutation(n_dataset)
    train_inds, valid_inds = inds[:n_train], inds[n_train:]

    DatasetCatalog.register(
        "COVID19_data_train",
        lambda: get_COVID19_data_dicts(
            imgdir,
            train_df,
            target_indices=train_inds,
            debug=debug,
            data_type="train"
        ),
    )
    MetadataCatalog.get("COVID19_data_train").set(thing_classes=thing_classes)
    

    DatasetCatalog.register(
        "COVID19_data_valid",
        lambda: get_COVID19_data_dicts(
            imgdir,
            train_df,
            target_indices=valid_inds,
            debug=debug,
            data_type="val"
            ),
        )
    MetadataCatalog.get("COVID19_data_valid").set(thing_classes=thing_classes)
    
    dataset_dicts_train = DatasetCatalog.get("COVID19_data_train")
    metadata_dicts_train = MetadataCatalog.get("COVID19_data_train")

    dataset_dicts_valid = DatasetCatalog.get("COVID19_data_valid")
    metadata_dicts_valid = MetadataCatalog.get("COVID19_data_valid")
    
else:
    raise ValueError(f"[ERROR] Unexpected value split_mode={split_mode}")
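
The validation split above is a seeded shuffle of indices cut at a fixed fraction. A minimal sketch of the same idea follows. It swaps NumPy's `RandomState.permutation` for the standard library's `random.Random` so that it runs without dependencies, which means the resulting index order differs from the notebook's. `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_items: int, train_fraction: float, seed: int = 42):
    """Shuffle range(n_items) with a fixed seed and cut it into train/valid parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


demo_train, demo_valid = split_indices(1000, 0.90)
print(len(demo_train), len(demo_valid))
```

Because the seed is fixed, repeated calls return the same split, which keeps the training and validation sets consistent across the separate `DatasetCatalog` registrations.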

In [None]:
dataset_dicts_train[0]

In [None]:
df=train_df[train_df.id=="4e95d8ca7f2f"]
df


<a id="data_vis"></a>
# Data Visualization

It is also easy to visualize the prepared training dataset with `detectron2`.<br/>
It provides a `Visualizer` class, which we can use to draw an image with its bounding boxes as follows.

In [None]:
fig, ax = plt.subplots(2, 4, figsize=(20, 10))
# ax.T.flat walks the 2x4 grid column by column.
for axis, d in zip(ax.T.flat, random.sample(dataset_dicts_train, 8)):
    img = cv2.imread(d["file_name"])  # OpenCV loads images as BGR
    v = Visualizer(img[:, :, ::-1],  # Visualizer expects RGB
                   metadata=metadata_dicts_train,
                   scale=0.5,
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
    )
    out = v.draw_dataset_dict(d)
    axis.grid(False)
    axis.axis('off')
    axis.imshow(out.get_image())  # get_image() returns RGB, which is what imshow expects

# Data Augmentation
The training images are augmented by randomly changing the brightness (by a factor between 0.8 and 1.2) and by applying a vertical flip and a horizontal flip, each with 50% probability.

In [None]:
def custom_mapper(dataset_dict):
    
    dataset_dict = copy.deepcopy(dataset_dict)
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    transform_list = [#T.Resize((640,640)),
                      T.RandomBrightness(0.8, 1.2),
                      T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
                      T.RandomFlip(prob=0.5, horizontal=True, vertical=False)
                      ]
    image, transforms = T.apply_transform_gens(transform_list, image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))

    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict


class AugTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=custom_mapper)
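
When an image is flipped, its boxes must be flipped with it, which is what `utils.transform_instance_annotations` takes care of in `custom_mapper`. The geometry for XYXY boxes can be sketched in plain Python. The helpers below are illustrative only and are not Detectron2 code.

```python
def hflip_bbox_xyxy(bbox, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box left-to-right."""
    x_min, y_min, x_max, y_max = bbox
    # The old right edge becomes the new left edge, and vice versa.
    return [image_width - x_max, y_min, image_width - x_min, y_max]


def vflip_bbox_xyxy(bbox, image_height):
    """Mirror an (x_min, y_min, x_max, y_max) box top-to-bottom."""
    x_min, y_min, x_max, y_max = bbox
    # The old bottom edge becomes the new top edge, and vice versa.
    return [x_min, image_height - y_max, x_max, image_height - y_min]


box = [100, 50, 300, 200]
flipped = hflip_bbox_xyxy(box, image_width=640)
print(flipped)  # [340, 50, 540, 200]
```

Applying the same flip twice returns the original box, which is a quick sanity check for any custom box transform.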

In [None]:
import gc
gc.collect()
torch.cuda.empty_cache()

# Training

In [None]:
cfg = get_cfg()
#config_name = "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml" #1
#config_name = "COCO-Detection/faster_rcnn_R_101_DC5_3x.yaml"      #2
config_name = "COCO-Detection/faster_rcnn_R_101_C4_3x.yaml"      #3


cfg.merge_from_file(model_zoo.get_config_file(config_name))
cfg.DATASETS.TRAIN = ("COVID19_data_train",)

if split_mode == "all_train":
    cfg.DATASETS.TEST = ()
else:
    cfg.DATASETS.TEST = ("COVID19_data_valid",)
    cfg.TEST.EVAL_PERIOD = 1000

cfg.DATALOADER.NUM_WORKERS = 4
#cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)
cfg.MODEL.WEIGHTS="../input/1siim-covid19-detectron2-weights/output/model_final.pth"


cfg.SOLVER.IMS_PER_BATCH = 1
cfg.SOLVER.BASE_LR = 0.00025


cfg.SOLVER.WARMUP_ITERS = 1000
cfg.SOLVER.MAX_ITER = 20000  # increase if validation mAP is still rising, decrease if the model overfits
#cfg.SOLVER.STEPS = (100, 500) # must be less than  MAX_ITER 
#cfg.SOLVER.GAMMA = 0.05


cfg.SOLVER.CHECKPOINT_PERIOD = 1000  # A small value saves checkpoints often and needs a lot of storage.
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4


os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)


#Training using custom trainer defined above
trainer = AugTrainer(cfg) 
#trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()
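
Detectron2 schedules are expressed in iterations rather than epochs. A rough conversion is sketched below, assuming each iteration consumes `IMS_PER_BATCH` images. The dataset size used here is a made-up placeholder, so substitute `len(dataset_dicts_train)` for a real estimate; `iterations_to_epochs` is a hypothetical helper.

```python
def iterations_to_epochs(max_iter: int, ims_per_batch: int, n_train_images: int) -> float:
    """Approximate number of passes over the training set."""
    return max_iter * ims_per_batch / n_train_images


# MAX_ITER and IMS_PER_BATCH match the config above; 5000 images is hypothetical.
epochs = iterations_to_epochs(max_iter=20000, ims_per_batch=1, n_train_images=5000)
print(epochs)  # 4.0
```

The same arithmetic helps when changing `IMS_PER_BATCH`: doubling the batch size doubles the number of epochs covered by a fixed `MAX_ITER`.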

# Evaluator

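* Evaluators for well-known datasets are already implemented in detectron2.
* For example, `COCOEvaluator` calculates several kinds of AP (Average Precision).
* The headline AP from `COCOEvaluator` is averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05; AP at IoU 0.50 (AP50) and 0.75 (AP75) are reported as well.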
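
To unpack the IoU range: a predicted box counts as a match at a given threshold only if its IoU with a ground-truth box reaches that threshold, and COCO-style AP averages over ten such thresholds. The sketch below illustrates only the IoU and threshold part, not the full precision-recall computation performed by `COCOEvaluator`. `iou_xyxy` is a hypothetical helper and the boxes are made up.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds: 0.50, 0.55, ..., 0.95
thresholds = [t / 100 for t in range(50, 100, 5)]

# Two 100x100 boxes shifted by 25 px: intersection 7500, union 12500.
iou = iou_xyxy([0, 0, 100, 100], [25, 0, 125, 100])
matched_at = [t for t in thresholds if iou >= t]
print(iou, matched_at)
```

In this example the prediction counts as correct at the three loosest thresholds and as a miss at the stricter ones, which is why the averaged AP is usually well below AP50.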

In [None]:
evaluator = COCOEvaluator("COVID19_data_valid", cfg, False, output_dir="./output/")
#cfg.MODEL.WEIGHTS="./output/model_final.pth"
#cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.001   # set a custom testing threshold
val_loader = build_detection_test_loader(cfg, "COVID19_data_valid")
inference_on_dataset(trainer.model, val_loader, evaluator)

In [None]:
import pandas as pd
metrics_df = pd.read_json("./output/metrics.json", orient="records", lines=True)
mdf = metrics_df.sort_values("iteration")
mdf.head(10).T

In [None]:
# 1. Loss curve
fig, ax = plt.subplots()

mdf1 = mdf[~mdf["total_loss"].isna()]
ax.plot(mdf1["iteration"], mdf1["total_loss"], c="C0", label="train")
if "validation_loss" in mdf.columns:
    mdf2 = mdf[~mdf["validation_loss"].isna()]
    ax.plot(mdf2["iteration"], mdf2["validation_loss"], c="C1", label="validation")

# ax.set_ylim([0, 0.5])
ax.legend()
ax.set_title("Loss curve")
plt.show()

In [None]:
# 2. Accuracy curve
fig, ax = plt.subplots()

mdf1 = mdf[~mdf["fast_rcnn/cls_accuracy"].isna()]
ax.plot(mdf1["iteration"], mdf1["fast_rcnn/cls_accuracy"], c="C0", label="train")
# ax.set_ylim([0, 0.5])
ax.legend()
ax.set_title("Accuracy curve")
plt.show()
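
The plots above filter out rows where a column is missing because `metrics.json` holds one JSON object per line and not every line carries every key. A minimal sketch of that filtering with the standard library follows. The sample lines and the `example_eval_metric` key are made up for illustration and are not real training output.

```python
import json

# Made-up sample in a one-JSON-object-per-line layout.
sample = """\
{"iteration": 39, "total_loss": 1.10, "lr": 0.00002}
{"iteration": 19, "total_loss": 1.25, "lr": 0.00001}
{"iteration": 999, "example_eval_metric": 12.5}
"""

records = [json.loads(line) for line in sample.splitlines() if line.strip()]
# Keep only rows that actually have a training loss, ordered by iteration.
loss_points = sorted(
    (r["iteration"], r["total_loss"]) for r in records if "total_loss" in r
)
print(loss_points)  # [(19, 1.25), (39, 1.1)]
```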

# References
1. https://www.kaggle.com/ammarnassanalhajali/training-detectron2-for-blood-cells-detection
1. https://www.kaggle.com/corochann/vinbigdata-detectron2-train
