## UNet starter for Steel defect detection challenge


This kernel trains a UNet with a pretrained resnet18 encoder for this challenge. It uses simple augmentations from the albumentations library, BCE loss, and Dice and IoU as metrics. I've used [segmentation_models.pytorch](https://github.com/qubvel/segmentation_models.pytorch), which comes with a lot of pre-implemented segmentation architectures. This is a modified version of my previous [kernel](https://www.kaggle.com/rishabhiitbhu/unet-with-resnet34-encoder-pytorch) for the [siim-acr-pneumothorax-segmentation](https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/) competition.

As internet access is not allowed for this competition, I tried installing `segmentation_models.pytorch` from source using pip, but for some reason it didn't work. So, as a [Jugaad](https://en.wikipedia.org/wiki/Jugaad), I took all of `segmentation_models.pytorch`'s UNet code, put it in a single file, and added it as a dataset for this kernel. Its dependency, [pretrained-models.pytorch](https://github.com/Cadene/pretrained-models.pytorch), is also added as a dataset.

In [None]:
!pip install ../input/pretrainedmodels/pretrainedmodels-0.7.4/pretrainedmodels-0.7.4/ > /dev/null # no output
package_path = '../input/unetmodelscript' # add unet script dataset
import sys
sys.path.append(package_path)
from model import Unet # import Unet from the `model` module in the script dataset (found via the sys.path entry added above)
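
The import above works because Python searches every directory on `sys.path` for a matching module. Here is a minimal sketch of the same mechanism. The temporary directory and the module name `demo_model` are hypothetical stand-ins, not part of the kernel or its datasets.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the dataset folder that holds the model script.
package_dir = tempfile.mkdtemp()
with open(os.path.join(package_dir, "demo_model.py"), "w") as f:
    f.write("def build():\n    return 'unet-placeholder'\n")

# Same trick as the kernel: make the folder importable, then import from it.
sys.path.append(package_dir)
importlib.invalidate_caches()
demo_model = importlib.import_module("demo_model")

result = demo_model.build()
print(result)
```

In the kernel, `package_path` plays the role of `package_dir` and `model` plays the role of `demo_model`.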

## Imports

In [None]:
import os
import cv2
import pdb
import time
import warnings
import random
import numpy as np
import pandas as pd
from tqdm import tqdm_notebook as tqdm
from torch.optim.lr_scheduler import ReduceLROnPlateau
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader, Dataset, sampler
from matplotlib import pyplot as plt
from albumentations import (HorizontalFlip, ShiftScaleRotate, Normalize, Resize, Compose, GaussNoise)
from albumentations.torch import ToTensor
warnings.filterwarnings("ignore")
seed = 69
random.seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)
np.random.seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True

## RLE-Mask utility functions

In [None]:
#https://www.kaggle.com/paulorzp/rle-functions-run-lenght-encode-decode
def mask2rle(img): # probably not used in this kernel
    '''
    img: numpy array, 1 -> mask, 0 -> background
    Returns the run-length encoding as a formatted string
    '''
    pixels= img.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def make_mask(row_id, df):
    '''Given a row index, return image_id and mask (256, 1600, 4) from the dataframe `df`'''
    fname = df.iloc[row_id].name # iloc returns a Series whose name is the original row index (the image id)
    labels = df.iloc[row_id][:4] # Series of the four class columns, without `defects`
    masks = np.zeros((256, 1600, 4), dtype=np.float32) # float32 is very important (BCEWithLogitsLoss expects float targets)
    # 4 channels: classes 1-4 map to channels 0-3

    for idx, label in enumerate(labels.values): # labels.values holds one EncodedPixels string (or NaN) per class
        if pd.notna(label):
            label = label.split(" ") # alternating start positions and run lengths for one class
            positions = map(int, label[0::2])
            length = map(int, label[1::2])
            mask = np.zeros(256 * 1600, dtype=np.uint8) # flat 1-D mask
            for pos, le in zip(positions, length):
                mask[pos - 1:(pos - 1 + le)] = 1 # RLE positions are 1-indexed; mark defect pixels as 1
            masks[:, :, idx] = mask.reshape(256, 1600, order='F') # order='F' fills the 2-D mask column by column
    return fname, masks
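
To make the encoding concrete, here is a small round-trip sketch on a toy 4x3 mask. It assumes the competition's convention of 1-indexed start positions counted column by column, which is also what `mask2rle` produces. The names `rle_decode`, `rle_encode`, and `toy_rle` are hypothetical and used only for illustration.

```python
import numpy as np

def rle_decode(rle, shape):
    """Decode a 1-indexed, column-major RLE string into a binary mask."""
    height, width = shape
    flat = np.zeros(height * width, dtype=np.uint8)
    tokens = [int(t) for t in rle.split()]
    for start, length in zip(tokens[0::2], tokens[1::2]):
        flat[start - 1:start - 1 + length] = 1  # RLE starts count from 1
    return flat.reshape((height, width), order="F")

def rle_encode(mask):
    """Same logic as mask2rle above."""
    pixels = np.concatenate([[0], mask.T.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return " ".join(str(x) for x in runs)

toy_rle = "2 3 10 2"          # pixels 2-4 and 10-11, counted down the columns
toy_mask = rle_decode(toy_rle, shape=(4, 3))
round_trip = rle_encode(toy_mask)
print(toy_mask)
print(round_trip)
```

Decoding and re-encoding should return the original string, which is a quick sanity check for any RLE helper pair.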

## Dataloader

In [None]:
class SteelDataset(Dataset):
    def __init__(self, df, data_folder, mean, std, phase):
        self.df = df
        self.root = data_folder
        self.mean = mean
        self.std = std
        self.phase = phase
        self.transforms = get_transforms(phase, mean, std)
        self.fnames = self.df.index.tolist() # list of image ids

    def __getitem__(self, idx): # idx is an integer position
        image_id, mask = make_mask(idx, self.df)
        image_path = os.path.join(self.root, "train_images",  image_id)
        img = cv2.imread(image_path)
        augmented = self.transforms(image=img, mask=mask) # the composed transforms also take the mask as a target
        img = augmented['image']
        mask = augmented['mask'] # 1x256x1600x4
        mask = mask[0].permute(2, 0, 1) # 4x256x1600
        return img, mask # img is 3x256x1600 (cv2.imread returns three channels by default)

    def __len__(self):
        return len(self.fnames)


def get_transforms(phase, mean, std):
    list_transforms = []
    if phase == "train":
        list_transforms.extend(
            [
                HorizontalFlip(p=0.5), # only horizontal flip as of now: mirrors the image left-right with probability 0.5
            ]
        )
    list_transforms.extend(
        [
            Normalize(mean=mean, std=std, p=1), # subtracts mean and divides by std, both scaled by max_pixel_value (255 by default)
            ToTensor(),
        ]
    )
    list_trfms = Compose(list_transforms)
    return list_trfms

def provider(
    data_folder,
    df_path,
    phase,
    mean=None,
    std=None,
    batch_size=8,
    num_workers=4,
):
    '''Returns dataloader for the model training'''
    df = pd.read_csv(df_path)
    # https://www.kaggle.com/amanooo/defect-detection-starter-u-net
    df['ImageId'], df['ClassId'] = zip(*df['ImageId_ClassId'].str.split('_'))
    df['ClassId'] = df['ClassId'].astype(int)
    df = df.pivot(index='ImageId',columns='ClassId',values='EncodedPixels')
    df['defects'] = df.count(axis=1) # number of non-NaN entries per row, i.e. defect classes present in the image
    
    train_df, val_df = train_test_split(df, test_size=0.2, stratify=df["defects"], random_state=69)
    df = train_df if phase == "train" else val_df
    image_dataset = SteelDataset(df, data_folder, mean, std, phase)
    dataloader = DataLoader(
        image_dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=True,
        shuffle=True,   
    )

    return dataloader
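
The reshaping done in `provider` can be hard to picture, so here is a small sketch on a toy table. It assumes the `ImageId_ClassId` / `EncodedPixels` layout read from `train.csv` above. The image names and RLE strings are made up for illustration.

```python
import pandas as pd

# Toy stand-in for train.csv: one row per (image, class) pair.
toy = pd.DataFrame({
    "ImageId_ClassId": ["a.jpg_1", "a.jpg_2", "a.jpg_3", "a.jpg_4",
                        "b.jpg_1", "b.jpg_2", "b.jpg_3", "b.jpg_4"],
    "EncodedPixels": ["1 3", None, "10 2", None, None, None, None, None],
})

# Split "a.jpg_1" into the image id and the integer class id.
split = toy["ImageId_ClassId"].str.split("_", expand=True)
toy["ImageId"] = split[0]
toy["ClassId"] = split[1].astype(int)

# One row per image, one column per class; missing masks stay empty.
wide = toy.pivot(index="ImageId", columns="ClassId", values="EncodedPixels")
wide["defects"] = wide.count(axis=1)  # how many classes have a mask
print(wide)
```

The `defects` column is what `train_test_split` stratifies on, so the train and validation sets keep a similar mix of images with zero, one, or several defect classes.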


## Some more utility functions

Dice and IoU metric implementations, metric logger for training and validation.
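
As a reference for the Dice part, the score of a binary prediction `P` against ground truth `T` is `2 * |P ∩ T| / (|P| + |T|)`. The sketch below is a NumPy illustration of the two cases that `metric` distinguishes, not the kernel's own code. The helper name `dice_per_image` and the toy arrays are hypothetical.

```python
import numpy as np

def dice_per_image(prob, truth, threshold=0.5):
    """Dice for one image; an empty truth scores 1 only if the prediction is empty too."""
    p = (prob > threshold).astype(np.float32).ravel()
    t = (truth > 0.5).astype(np.float32).ravel()
    if t.sum() == 0:                       # negative image: no defect pixels
        return float(p.sum() == 0)
    return float(2 * (p * t).sum() / (p.sum() + t.sum()))

truth = np.array([[0, 1, 1, 0]])
prob = np.array([[0.1, 0.9, 0.2, 0.7]])    # thresholds to pixels 1 and 3
score_pos = dice_per_image(prob, truth)    # overlap 1, |P| = 2, |T| = 2
score_neg = dice_per_image(np.zeros((1, 4)), np.zeros((1, 4)))
print(score_pos, score_neg)
```

Handling the empty-truth case separately avoids a division by zero and rewards the model for predicting nothing on defect-free images.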

In [None]:
def predict(X, threshold):
    '''X is sigmoid output of the model'''
    X_p = np.copy(X)
    preds = (X_p > threshold).astype('uint8') # 1 where the probability exceeds the threshold
    return preds

def metric(probability, truth, threshold=0.5, reduction='none'):
    '''Calculates dice of positive and negative images separately.
    Positive images contain at least one defect pixel; negative images contain none.
    probability and truth must be torch tensors'''
    batch_size = len(truth) # truth is batch_size x 4 x 256 x 1600, so len gives batch_size
    with torch.no_grad():
        probability = probability.view(batch_size, -1) # flatten to torch.Size([batch_size, 4*256*1600])
        truth = truth.view(batch_size, -1)
        assert(probability.shape == truth.shape)

        p = (probability > threshold).float() # 1 above the threshold, 0 elsewhere; each row is one image
                                              # note: reshaping to [batch_size*4, 256*1600] would instead
                                              # score each (image, class) pair separately
        t = (truth > 0.5).float() # binarize the ground truth and make sure it is float

        t_sum = t.sum(-1) # per-image sum, torch.Size([batch_size])
        p_sum = p.sum(-1)
        neg_index = torch.nonzero(t_sum == 0) # batch indices of images with no defect, torch.Size([num_neg, 1])
        pos_index = torch.nonzero(t_sum >= 1) # batch indices of images with at least one defect pixel

        dice_neg = (p_sum == 0).float() # 1 if nothing is predicted, else 0; torch.Size([batch_size])
        dice_pos = 2 * (p*t).sum(-1)/((p+t).sum(-1)) # per-image dice; the denominator is 0 only when p and t are
                                                     # both empty, and those rows are dropped by pos_index below

        dice_neg = dice_neg[neg_index]
        dice_pos = dice_pos[pos_index]
        dice = torch.cat([dice_pos, dice_neg]) # one score per image, ordered positives first

        dice_neg = np.nan_to_num(dice_neg.mean().item(), 0) # mean as a Python number; the mean of an empty
                                                            # tensor is NaN, which is replaced with 0
        dice_pos = np.nan_to_num(dice_pos.mean().item(), 0)
        dice = dice.mean().item()
        
        num_neg = len(neg_index)
        num_pos = len(pos_index)

    return dice, dice_neg, dice_pos, num_neg, num_pos

class Meter:
    '''A meter to keep track of iou and dice scores throughout an epoch'''
    def __init__(self, phase, epoch):
        self.base_threshold = 0.5 # <<<<<<<<<<< here's the threshold
        self.base_dice_scores = []
        self.dice_neg_scores = []
        self.dice_pos_scores = []
        self.iou_scores = []

    def update(self, targets, outputs): # append this batch's dice, dice_neg, dice_pos and iou to the lists
        probs = torch.sigmoid(outputs)
        dice, dice_neg, dice_pos, _, _ = metric(probs, targets, self.base_threshold)
        self.base_dice_scores.append(dice)
        self.dice_pos_scores.append(dice_pos)
        self.dice_neg_scores.append(dice_neg)
        preds = predict(probs, self.base_threshold)
        iou = compute_iou_batch(preds, targets, classes=[1])
        self.iou_scores.append(iou)

    def get_metrics(self):
        dice = np.mean(self.base_dice_scores)
        dice_neg = np.mean(self.dice_neg_scores)
        dice_pos = np.mean(self.dice_pos_scores)
        dices = [dice, dice_neg, dice_pos]
        iou = np.nanmean(self.iou_scores)
        return dices, iou

def epoch_log(phase, epoch, epoch_loss, meter, start):
    '''logging the metrics at the end of an epoch'''
    dices, iou = meter.get_metrics()
    dice, dice_neg, dice_pos = dices
    print("Loss: %0.4f | IoU: %0.4f | dice: %0.4f | dice_neg: %0.4f | dice_pos: %0.4f" % (epoch_loss, iou, dice, dice_neg, dice_pos))
    return dice, iou

def compute_ious(pred, label, classes, ignore_index=255, only_present=True): # pred and label have shape (4, 256, 1600)
    '''computes iou for one ground truth mask and predicted mask'''
    pred[label == ignore_index] = 0 # no-op here: labels are only 0 or 1, so nothing equals ignore_index
    ious = []
    for c in classes: # classes=[1] in this kernel
        label_c = label == c # boolean mask of pixels labelled c (the defect pixels for c=1)
        if only_present and np.sum(label_c) == 0: # no pixel of class c in the label, i.e. no defect
            ious.append(np.nan)
            continue
        pred_c = pred == c
        intersection = np.logical_and(pred_c, label_c).sum()
        union = np.logical_or(pred_c, label_c).sum()
        if union != 0:
            ious.append(intersection / union) # IoU for class c
    return ious if ious else [1] # an empty list is falsy, so return [1] when no IoU was recorded

def compute_iou_batch(outputs, labels, classes=None): # note: computed per image, not per (image, class) pair
    '''computes mean iou for a batch of ground truth masks and predicted masks'''
    ious = []
    preds = np.copy(outputs) # copy is important: compute_ious modifies pred in place; shape (batch_size, 4, 256, 1600)
    labels = np.array(labels) # tensor to np; labels are the ground-truth masks
    for pred, label in zip(preds, labels): # pred and label have shape (4, 256, 1600)
        ious.append(np.nanmean(compute_ious(pred, label, classes))) # np.nanmean averages while ignoring NaN
    iou = np.nanmean(ious)
    return iou
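
To see how the IoU helpers behave, here is a small NumPy sketch on toy masks. It mirrors the `only_present` logic for the positive class, where an image without any positive label pixel contributes NaN and is skipped by `np.nanmean`. The helper name `iou_binary` and the arrays are hypothetical.

```python
import numpy as np

def iou_binary(pred, label):
    """IoU of the positive class; NaN when the label has no positive pixels."""
    pred_pos = pred == 1
    label_pos = label == 1
    if label_pos.sum() == 0:
        return float("nan")
    intersection = np.logical_and(pred_pos, label_pos).sum()
    union = np.logical_or(pred_pos, label_pos).sum()
    return intersection / union

label = np.array([[1, 1, 0, 0]])
pred = np.array([[0, 1, 1, 0]])
iou = iou_binary(pred, label)                    # intersection 1, union 3
empty = iou_binary(pred, np.zeros_like(label))   # no positives in the label
batch_mean = np.nanmean([iou, empty])            # NaN entries are ignored
print(iou, empty, batch_mean)
```

Because defect-free images are skipped, the reported IoU reflects only images that contain a defect, which is worth remembering when comparing it with the Dice numbers.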


## Model Initialization

In [None]:
!mkdir -p /tmp/.cache/torch/checkpoints/
!cp ../input/resnet18/resnet18.pth /tmp/.cache/torch/checkpoints/resnet18-5c106cde.pth

In [None]:
model = Unet("resnet18", encoder_weights="imagenet", classes=4, activation=None)

In [None]:
model # a *deeper* look

## Training and Validation

In [None]:
class Trainer(object):
    '''This class takes care of training and validation of our model'''
    def __init__(self, model):
        self.num_workers = 6
        self.batch_size = {"train": 4, "val": 4}
        self.accumulation_steps = 32 // self.batch_size['train'] 
        self.lr = 5e-4
        self.num_epochs = 20
        self.best_loss = float("inf") # initialise to infinity
        self.phases = ["train", "val"]
        self.device = torch.device("cuda:0")
        torch.set_default_tensor_type("torch.cuda.FloatTensor")
        self.net = model
        self.criterion = torch.nn.BCEWithLogitsLoss() # built-in loss that applies the sigmoid internally (Dice loss would be an alternative)
        self.optimizer = optim.Adam(self.net.parameters(), lr=self.lr)
        self.scheduler = ReduceLROnPlateau(self.optimizer, mode="min", patience=3, verbose=True)
        self.net = self.net.to(self.device)
        cudnn.benchmark = True
        self.dataloaders = {
            phase: provider(
                data_folder=data_folder,
                df_path=train_df_path,
                phase=phase,
                mean=(0.485, 0.456, 0.406), # ImageNet channel means; cv2.imread loads three channels even for grayscale images
                std=(0.229, 0.224, 0.225),
                batch_size=self.batch_size[phase],
                num_workers=self.num_workers,
            )
            for phase in self.phases
        }
        self.losses = {phase: [] for phase in self.phases}
        self.iou_scores = {phase: [] for phase in self.phases}
        self.dice_scores = {phase: [] for phase in self.phases}
        
    def forward(self, images, targets):
        images = images.to(self.device)
        masks = targets.to(self.device)
        outputs = self.net(images) # outputs has shape batch_size x 4 x 256 x 1600
        loss = self.criterion(outputs, masks)
        return loss, outputs

    def iterate(self, epoch, phase):
        meter = Meter(phase, epoch) # a fresh meter for every epoch and phase
        start = time.strftime("%H:%M:%S")
        print(f"Starting epoch: {epoch} | phase: {phase} | ⏰: {start}")
        batch_size = self.batch_size[phase]
        self.net.train(phase == "train") # switch between train and eval mode
        dataloader = self.dataloaders[phase]
        running_loss = 0.0
        total_batches = len(dataloader)
#         tk0 = tqdm(dataloader, total=total_batches)
        self.optimizer.zero_grad()
        for itr, batch in enumerate(dataloader): # replace `dataloader` with `tk0` for tqdm
            images, targets = batch # images: batch_size x 3 x 256 x 1600, targets: batch_size x 4 x 256 x 1600
            loss, outputs = self.forward(images, targets)
            loss = loss / self.accumulation_steps
            if phase == "train":
                loss.backward()
                if (itr + 1 ) % self.accumulation_steps == 0:
                    self.optimizer.step()
                    self.optimizer.zero_grad() # the optimizer steps only after accumulating gradients (every 8 batches by default)
            running_loss += loss.item()
            outputs = outputs.detach().cpu()
            meter.update(targets, outputs)
#             tk0.set_postfix(loss=(running_loss / ((itr + 1))))
        epoch_loss = (running_loss * self.accumulation_steps) / total_batches
        dice, iou = epoch_log(phase, epoch, epoch_loss, meter, start)
        self.losses[phase].append(epoch_loss)
        self.dice_scores[phase].append(dice)
        self.iou_scores[phase].append(iou)
        torch.cuda.empty_cache()
        return epoch_loss

    def start(self):
        for epoch in range(self.num_epochs):
            self.iterate(epoch, "train")
            state = {
                "epoch": epoch,
                "best_loss": self.best_loss,
                "state_dict": self.net.state_dict(),
                "optimizer": self.optimizer.state_dict(),
            }
            with torch.no_grad():
                val_loss = self.iterate(epoch, "val")
                self.scheduler.step(val_loss)
            if val_loss < self.best_loss:
                print("******** New optimal found, saving state ********")
                state["best_loss"] = self.best_loss = val_loss
                torch.save(state, "./model.pth")
            print()
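
The `iterate` method divides each batch loss by `accumulation_steps` and calls `optimizer.step()` only every `32 // 4 = 8` batches. The sketch below illustrates why this approximates training with a batch of 32. It uses a toy least-squares problem in NumPy rather than the kernel's model; the data and the helper `grad_mse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))      # 32 samples, as in 8 batches of 4
y = rng.normal(size=32)
w = np.zeros(3)

def grad_mse(Xb, yb, w):
    """Gradient of the mean squared error over one batch."""
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

batch_size, accumulation_steps = 4, 8
accumulated = np.zeros_like(w)
for step in range(accumulation_steps):
    rows = slice(step * batch_size, (step + 1) * batch_size)
    # Dividing by accumulation_steps mirrors `loss = loss / self.accumulation_steps`.
    accumulated += grad_mse(X[rows], y[rows], w) / accumulation_steps

full_batch = grad_mse(X, y, w)
print(np.allclose(accumulated, full_batch))
```

With equal-sized batches and a mean-reduced loss, the accumulated gradient matches the full-batch gradient. Layers that depend on batch statistics, such as batch normalization, still see only the small batches, so the equivalence is not exact for the real network.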


In [None]:
sample_submission_path = '../input/severstal-steel-defect-detection/sample_submission.csv'
train_df_path = '../input/severstal-steel-defect-detection/train.csv'
data_folder = "../input/severstal-steel-defect-detection/"
test_data_folder = "../input/severstal-steel-defect-detection/test_images"

In [None]:
model_trainer = Trainer(model)
model_trainer.start()

In [None]:
# PLOT TRAINING
losses = model_trainer.losses
dice_scores = model_trainer.dice_scores # overall dice
iou_scores = model_trainer.iou_scores

def plot(scores, name):
    plt.figure(figsize=(15,5))
    plt.plot(range(len(scores["train"])), scores["train"], label=f'train {name}')
    plt.plot(range(len(scores["train"])), scores["val"], label=f'val {name}')
    plt.title(f'{name} plot'); plt.xlabel('Epoch'); plt.ylabel(f'{name}');
    plt.legend(); 
    plt.show()

plot(losses, "BCE loss")
plot(dice_scores, "Dice score")
plot(iou_scores, "IoU score")

## Test prediction and submission

Training and validation take about 400 minutes, which exceeds Kaggle's GPU usage limit of 60 minutes, so we won't be able to submit the `submission.csv` file generated from this kernel. For test prediction and submission I've written a separate [UNet inference kernel](https://www.kaggle.com/rishabhiitbhu/unet-pytorch-inference-kernel); make sure you add the `model.pth` file generated from this kernel as a dataset to the inference kernel.

I've used the resnet-18 architecture in this kernel. It scores ~0.89 on LB. Try playing around with the other architectures in `segmentation_models.pytorch` and see what works best for you. Let me know in the comments :) and do upvote if you liked this kernel, I need some medals too. 😬

## References

A few kernels from which I've borrowed some code:

* https://www.kaggle.com/amanooo/defect-detection-starter-u-net
* https://www.kaggle.com/go1dfish/clear-mask-visualization-and-simple-eda

A big thank you to all those who share their code on Kaggle, I'm nobody without you guys. I've learnt a lot from fellow Kagglers, special shout-out to [@Abhishek](https://www.kaggle.com/abhishek), [@Yury](https://www.kaggle.com/deyury), [@Heng](https://www.kaggle.com/hengck23), [@Ekhtiar](https://www.kaggle.com/ekhtiar), [@iafoss](https://www.kaggle.com/iafoss), [@Siddhartha](https://www.kaggle.com/meaninglesslives), [@xhlulu](https://www.kaggle.com/xhlulu), and the list goes on.