# Motivation

**In my earlier discussion post, [DRN — Dilated Residual Networks (Image Classification & Semantic Segmentation)](https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/111546),** a Kaggler named Qishen Ha (@haqishen) said "If you want to try out dilated convolution, Deeplabv3 is a good sample to learn" :) So I decided to make this kernel for my fellow Kagglers, and to teach myself a little DeepLabv3 along the way :)

**Note: I am completely new to deep learning and know very little about this field. If you find any implementation error or bug, please let me know in the comments; it will be highly appreciated.**

# Algorithms for Image Segmentation

Image segmentation is a long-standing computer vision problem. Quite a few algorithms have been designed to solve it, such as the Watershed algorithm, image thresholding, K-means clustering, graph partitioning methods, etc.

Many deep learning architectures (like fully convolutional networks for image segmentation) have also been proposed, and Google’s DeepLab models have given some of the best results on standard benchmarks. That’s why we’ll focus on using DeepLab. DeepLab uses atrous (dilated) convolution with rates 6, 12 and 18.

# Why use DeepLabV3-ResNet101?

*For semantic segmentation, GluonCV-Torch mainly provides pre-trained FCN, PSPNet and DeepLab-V3 models. DeepLab-V3 is a widely used open-source model that performs very well on semantic segmentation tasks. The results of these three pre-trained models on the Pascal VOC dataset, which contains 20 categories of images, are shown below:*

![](http://www.programmersought.com/images/427/d258324e597cccec479b706fa7a2a04b.png)

**The following shows the results of the three semantic segmentation models on the ADE20K dataset. ADE20K is a scene parsing dataset published by MIT that contains a variety of scenes, including people, backgrounds, and objects.**

![](http://www.programmersought.com/images/928/522173f7eab9d4d27b9bbdd0b833fde8.png)

*  Source of the information above: http://www.programmersought.com/article/5710352893/

*  <font size="4" color="green">Google's DeepLabv3 uses dilated convolution, so let's first talk a little bit about dilated convolutions!</font>

# Dilated Convolution

Equation of dilated convolution, as used in DilatedNet:
 
![](https://miro.medium.com/proxy/1*mlHFvK6H_wMCyURSZNZWGQ.png) 


The left one is the standard convolution and the right one is the dilated convolution. The only difference is the constraint in the summation: with s+lt=p, the input is sampled only every l points, so some points are skipped during convolution.

1. When l=1, it is standard convolution

2. When l>1, it is dilated convolution.
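
To make the summation concrete, here is a minimal 1D sketch of it in NumPy. The function name `dilated_conv1d`, the odd-length kernel centred at t=0, and the "valid" border handling are my own assumptions for illustration; they are not taken from the paper's code.

```python
import numpy as np

def dilated_conv1d(F, k, l=1):
    """Valid-mode 1D dilated convolution: out(p) = sum over s + l*t = p of F(s) * k(t)."""
    F = np.asarray(F, dtype=float)
    k = np.asarray(k, dtype=float)
    r = len(k) // 2                      # kernel taps are t = -r, ..., r (odd length assumed)
    out = []
    # Only keep positions p where every sampled index s stays inside the signal
    for p in range(l * r, len(F) - l * r):
        total = 0.0
        for t in range(-r, r + 1):
            s = p - l * t                # enforces the constraint s + l*t = p
            total += F[s] * k[t + r]
        out.append(total)
    return np.array(out)

signal = np.arange(10)                   # 0, 1, ..., 9
kernel = [1, 1, 1]

standard = dilated_conv1d(signal, kernel, l=1)   # sums of 3 neighbouring samples
dilated = dilated_conv1d(signal, kernel, l=2)    # sums of 3 samples spaced 2 apart
print(standard)
print(dilated)
```

With l=1 this reduces to an ordinary convolution; with l=2 the same three kernel weights cover a window of five input samples.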

![](https://miro.medium.com/max/1185/1*btockft7dtKyzwXqfq70_w.gif)
**Standard convolution (l=1) (left), dilated convolution (l=2) (right)**

* The above illustrates an example of dilated convolution with l=2. We can see that the receptive field is larger than that of the standard convolution.

![](https://miro.medium.com/proxy/1*tnDNIyPePgHvb8JIx8SbqA.png)

**l=1 (left), l=2 (Middle), l=4 (Right)**

The above figure shows more examples of the receptive field.
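
The growth of the receptive field can be checked with a little arithmetic. The sketch below assumes 3×3 kernels, stride 1, and layers with l=1, 2, 4 stacked one after another; the helper names are hypothetical.

```python
def effective_kernel_size(k, l):
    """Span of input covered by a single k-tap kernel with dilation l."""
    return k + (k - 1) * (l - 1)

def stacked_receptive_field(k, dilations):
    """Receptive field after each layer when stride-1 layers are stacked."""
    rf = 1
    sizes = []
    for l in dilations:
        rf += (k - 1) * l   # each layer widens the field by (k - 1) * l
        sizes.append(rf)
    return sizes

single = [effective_kernel_size(3, l) for l in (1, 2, 4)]
stacked = stacked_receptive_field(3, [1, 2, 4])
print(single)   # [3, 5, 9]
print(stacked)  # [3, 7, 15]
```

So a single 3×3 kernel with l=4 spans 9 pixels per side, and three stacked layers with l=1, 2, 4 see a 15×15 window while each layer still has only 9 weights.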

# Why Dilated Convolution?

1. It has been found that when the output feature map at the end of the network is small, semantic segmentation accuracy drops.

2. FCN also shows that when 32× upsampling is needed, we only get very rough segmentation results. Thus, a larger output feature map is desired.

3. A naive approach is to simply remove the subsampling (striding) steps in the network in order to increase the resolution of the feature map. However, this also reduces the receptive field, which severely reduces the amount of context. Such a reduction in receptive field is an unacceptable price to pay for higher resolution.

4. For this reason, dilated convolutions are used to increase the receptive field of the higher layers, compensating for the reduction in receptive field induced by removing subsampling.

5. The DRN paper (see the sources below) also finds that dilated convolution helps with image classification.
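
The trade-off in points 3 and 4 can be seen on a toy example. This is only a sketch with made-up layer sizes (a single-channel 32×32 input and one 3×3 convolution per path), not a piece of DRN or DeepLab itself.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # dummy single-channel input

# Subsampling path: stride 2 halves the feature map.
strided = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)

# Dilated path: stride 1 with dilation 2 keeps the resolution,
# while each output value still covers a 5x5 input window.
dilated = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=2, dilation=2)

with torch.no_grad():
    out_strided = strided(x)
    out_dilated = dilated(x)

print(out_strided.shape)  # torch.Size([1, 1, 16, 16])
print(out_dilated.shape)  # torch.Size([1, 1, 32, 32])
```

The dilated layer keeps the full 32×32 feature map, which is why it is attractive for dense prediction tasks like segmentation.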


## Dilated Residual Networks (DRN)

1. The paper (see the sources below) uses d as the dilation factor.

2. When d=1, it is standard convolution.

3. When d>1, it is dilated convolution.

![](https://miro.medium.com/max/794/1*-67TMJkhBO3sTtzAg2oUHg.png)

SOURCES : 
1. https://towardsdatascience.com/review-dilated-convolution-semantic-segmentation-9d5a5bd768f5
2. https://towardsdatascience.com/review-drn-dilated-residual-networks-image-classification-semantic-segmentation-d527e1a8fb5

I highly recommend checking out this link: [Semantic Segmentation: Introduction to the Deep Learning Technique Behind Google Pixel’s Camera!](https://www.analyticsvidhya.com/blog/2019/02/tutorial-semantic-segmentation-google-deeplab/)
I have used some of its content; it is definitely a great resource for understanding DeepLabv3 well.

<font size="6" color="red">Please UPVOTE if you find this kernel Helpful!</font>

**IMPORTS**

In [None]:
from __future__ import print_function

# Standard library
import collections
import datetime
import errno
import os
import pdb
import pickle
import random
import time
import warnings
from collections import defaultdict, deque

# Scientific stack, image handling and progress bars
import cv2
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from PIL import Image, ImageFile
from sklearn.model_selection import train_test_split
from tqdm import tqdm_notebook
from tqdm import tqdm_notebook as tqdm

# PyTorch and torchvision
import torch
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torch.utils.data as utils
import torchvision
from torch.nn import functional as F
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader, Dataset, sampler
from torchvision import models, transforms

# fastai and albumentations
from fastai import metrics
from albumentations import (HorizontalFlip,VerticalFlip,Cutout,SmallestMaxSize,
                            ToGray, ShiftScaleRotate, Blur,Normalize, Resize, Compose, GaussNoise)
from albumentations.pytorch import ToTensor

# Let PIL load image files that are truncated
ImageFile.LOAD_TRUNCATED_IMAGES = True



In [None]:
import platform
print(f'Python version: {platform.python_version()}')
print(f'PyTorch version: {torch.__version__}')

In [None]:
def seed_everything(seed=43):
    '''
      Make PyTorch deterministic.
    '''    
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    torch.backends.cudnn.deterministic = True

seed_everything()

IS_DEBUG = False

**Utility**

In [None]:
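def warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor):
    '''
      Linearly ramp the learning rate multiplier from warmup_factor up to 1
      over the first warmup_iters scheduler steps, then keep it at 1.
    '''
    def f(x):
        if x >= warmup_iters:
            return 1
        alpha = float(x) / warmup_iters
        return warmup_factor * (1 - alpha) + alpha

    return torch.optim.lr_scheduler.LambdaLR(optimizer, f)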
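
To see what the warmup does, here is a small sketch that evaluates the same multiplier outside of PyTorch. The helper name `warmup_multiplier` and the chosen iteration numbers are just for illustration; the values 1000 and 1/1000 mirror the ones used later in the training function.

```python
def warmup_multiplier(x, warmup_iters, warmup_factor):
    """Multiplier applied to the base learning rate at scheduler step x."""
    if x >= warmup_iters:
        return 1
    alpha = float(x) / warmup_iters
    return warmup_factor * (1 - alpha) + alpha

base_lr = 0.0001  # example base learning rate
for step in (0, 250, 500, 1000):
    m = warmup_multiplier(step, warmup_iters=1000, warmup_factor=1. / 1000)
    print(step, m, base_lr * m)
```

The multiplier starts at `warmup_factor` (0.001), reaches about 0.5 halfway through, and stays at 1 once `warmup_iters` steps have passed.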

**Loss Function**

In [None]:
class DiceLoss(torch.nn.Module):
    def __init__(self):
        super(DiceLoss, self).__init__()
 
    def forward(self, logits, targets):
        ''' fastai.metrics.dice uses argmax() which is not differentiable, so it 
          can NOT be used in training, however it can be used in prediction.
          see https://github.com/fastai/fastai/blob/master/fastai/metrics.py#L53
        '''
        N = targets.size(0)
        preds = torch.sigmoid(logits)
        #preds = logits.argmax(dim=1) # do NOT use argmax in training, because it is NOT differentiable
        # https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/keras/backend.py#L96
        EPSILON = 1e-7
 
        # Flatten every sample to a vector; reshape() also works on non-contiguous tensors
        preds_flat = preds.reshape(N, -1)
        targets_flat = targets.reshape(N, -1)
 
        # Sum per sample (dim=1) so that every image gets its own Dice score
        intersection = (preds_flat * targets_flat).sum(dim=1)
        union = (preds_flat + targets_flat).sum(dim=1)
        
        dice = (2.0 * intersection + EPSILON) / (union + EPSILON)
        # Average the per-sample Dice scores over the batch and turn them into a loss
        loss = 1 - dice.sum() / N
        return loss
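
As a quick sanity check of the Dice formula, 2·|A∩B| / (|A| + |B|), here is a minimal sketch on toy tensors. The helper `soft_dice` and the example masks are made up for illustration and are not part of the training code.

```python
import torch

def soft_dice(probs, targets, eps=1e-7):
    """Per-sample soft Dice coefficient for tensors of shape (N, ...)."""
    n = targets.size(0)
    p = probs.reshape(n, -1)
    t = targets.reshape(n, -1)
    intersection = (p * t).sum(dim=1)
    total = p.sum(dim=1) + t.sum(dim=1)
    return (2.0 * intersection + eps) / (total + eps)

targets = torch.tensor([[1., 1., 0., 0.]])

perfect = soft_dice(torch.tensor([[1., 1., 0., 0.]]), targets)   # full overlap
half = soft_dice(torch.tensor([[1., 0., 1., 0.]]), targets)      # one of two pixels matches
disjoint = soft_dice(torch.tensor([[0., 0., 1., 1.]]), targets)  # no overlap

print(perfect.item(), half.item(), disjoint.item())
```

Full overlap gives a Dice score of about 1, half overlap about 0.5, and no overlap about 0, so a loss of `1 - dice` is small for good predictions.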

**Function for training**

In [None]:


def train_one_epoch(model, optimizer, data_loader, device, epoch):
    model.train()
    loss_func = DiceLoss() #nn.BCEWithLogitsLoss() 

    lr_scheduler = None
    if epoch == 0:
        warmup_factor = 1. / 1000
        warmup_iters = min(1000, len(data_loader) - 1)

        lr_scheduler = warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor)

    lossf=None
    inner_tq = tqdm(data_loader, total=len(data_loader), leave=False, desc= f'Iteration {epoch}')
    for images, masks in inner_tq:
        y_preds = model(images.to(device))
        y_preds = y_preds['out'][:, 1, :, :]  # keep only the logits of output channel 1

        loss = loss_func(y_preds, masks.to(device))

        if torch.cuda.device_count() > 1:
            loss = loss.mean() # mean() to average on multi-gpu.

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if lr_scheduler is not None:
            lr_scheduler.step()

        # Exponential moving average of the loss, shown in the progress bar
        if lossf is not None:
            lossf = 0.98*lossf+0.02*loss.item()
        else:
            lossf = loss.item()
        inner_tq.set_postfix(loss = lossf)

**Mask Decoder**

In [None]:
def rle2mask(rle, imgshape):
    # imgshape is (height, width), e.g. (256, 1600)
    height, width = imgshape

    mask = np.zeros(height * width, dtype=np.uint8)

    array = np.asarray([int(x) for x in rle.split()])
    starts = array[0::2] - 1  # RLE start positions are 1-indexed
    lengths = array[1::2]

    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1

    # Pixels are numbered top to bottom, then left to right (column-major),
    # so reshape to (width, height) and transpose to get (height, width).
    return mask.reshape(width, height).T
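
Run-length encoding is easier to follow on a tiny image. The sketch below assumes the competition's convention that pixels are 1-indexed and numbered top to bottom, then left to right. `decode_rle` and `encode_rle` are hypothetical helpers written for this example, not functions used elsewhere in the kernel.

```python
import numpy as np

def decode_rle(rle, height, width):
    """Turn 'start length start length ...' into a (height, width) 0/1 mask."""
    flat = np.zeros(height * width, dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1   # starts are 1-indexed
    return flat.reshape(width, height).T         # column-major pixel order

def encode_rle(mask):
    """Inverse of decode_rle for a 0/1 mask."""
    flat = mask.T.flatten()                      # read the mask column by column
    padded = np.concatenate([[0], flat, [0]])
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    starts, ends = changes[0::2], changes[1::2]
    return ' '.join(f'{s} {e - s}' for s, e in zip(starts, ends))

mask = decode_rle('2 2 7 1', height=3, width=4)
print(mask)
print(encode_rle(mask))
```

Here `'2 2'` marks two pixels going down the first column (rows 2 and 3), and `'7 1'` marks the top pixel of the third column; encoding the mask again returns the original string.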

**Steel Dataset paths**

In [None]:
print(os.listdir('../input/severstal-steel-defect-detection/'))

In [None]:

path = '../input/severstal-steel-defect-detection/'


In [None]:
tr = pd.read_csv(path + 'train.csv')
print(len(tr))
tr.head()

In [None]:
df_train = tr[tr['EncodedPixels'].notnull()].reset_index(drop=True)

#df_train1 = df_train[df_train['ImageId_ClassId'].apply(lambda x: x.split('_')[1] == '1')].reset_index(drop=True)
#df_train2 = df_train[df_train['ImageId_ClassId'].apply(lambda x: x.split('_')[1] == '2')].reset_index(drop=True)
#df_train3 = df_train[df_train['ImageId_ClassId'].apply(lambda x: x.split('_')[1] == '3')].reset_index(drop=True)
#df_train4 = df_train[df_train['ImageId_ClassId'].apply(lambda x: x.split('_')[1] == '4')].reset_index(drop=True)

#df_train = tr[tr['EncodedPixels']].reset_index(drop=True)
#df_train = tr
print(len(df_train))
df_train.head()

In [None]:
df_train

**Steel DataLoader**

In [None]:
class ImageData(Dataset):
    def __init__(self, df, transform, subset="train"):
        super().__init__()
        self.df = df
        self.transform = transform
        self.subset = subset
        
        if self.subset == "train":
            self.data_path = path + 'train_images/'
        elif self.subset == "test":
            self.data_path = path + 'test_images/'

    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):                      
        fn = self.df['ImageId_ClassId'].iloc[index].split('_')[0]         
        img = Image.open(self.data_path + fn)
        img = self.transform(img)

        if self.subset == 'train': 
            mask = rle2mask(self.df['EncodedPixels'].iloc[index], (256, 1600))
            # rle2mask returns 0/1 values; scale to 0/255 so that ToTensor() maps the mask to 0.0/1.0
            mask = transforms.ToPILImage()(mask * 255)
            mask = self.transform(mask)
            return img, mask
        else: 
            return img

In [None]:
data_transf = transforms.Compose([
                                  transforms.Resize((256, 1600)),  # transforms.Scale is deprecated in favour of Resize
                                  #HorizontalFlip(p=0.5),
                                  #VerticalFlip(p = 0.5),
                                  #Blur(),
                                  #Cutout(),
                                  #ShiftScaleRotate(),
                                  #GaussNoise(),
                                  #ToGray(),
                                  transforms.ToTensor()])
train_data = ImageData(df = df_train, transform = data_transf)

# Understanding the DeepLab Model Architecture

DeepLab V3 uses a ResNet-101 pretrained on ImageNet, with atrous convolutions, as its main feature extractor. In the modified ResNet model, the last ResNet block uses atrous convolutions with different dilation rates. On top of the modified ResNet block, it uses Atrous Spatial Pyramid Pooling and bilinear upsampling for the decoder module.
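
To give a feel for Atrous Spatial Pyramid Pooling, here is a heavily simplified sketch. The class name `TinyASPP`, the channel sizes and the 16× upsampling factor are my own assumptions; the module also leaves out the batch normalization and ReLU layers of the real implementation, so treat it as an illustration of the idea rather than torchvision's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyASPP(nn.Module):
    """Parallel atrous branches (rates 6, 12, 18) plus 1x1 and image-pooling branches."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        # padding == dilation keeps the spatial size unchanged for 3x3 kernels
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=1)] +
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates]
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Global context: pool to 1x1, then stretch back to the feature map size
        pooled = F.interpolate(self.pool(x), size=(h, w), mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

features = torch.randn(2, 8, 16, 25)   # stand-in for backbone features
aspp = TinyASPP(in_ch=8, out_ch=4)
with torch.no_grad():
    out = aspp(features)
    # Bilinear upsampling back to a (hypothetical) 16x larger input resolution
    logits = F.interpolate(out, scale_factor=16, mode='bilinear', align_corners=False)

print(out.shape, logits.shape)
```

All branches see the same feature map at different dilation rates, so the concatenated result mixes context at several scales before being upsampled.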

DeepLab V3+ uses Aligned Xception as its main feature extractor, with the following modifications:

1. All max pooling operations are replaced by depthwise separable convolution with striding
2. Extra batch normalization and ReLU activation are added after each 3 x 3 depthwise convolution
3. Depth of the model is increased without changing the entry flow network structure

![](https://i0.wp.com/s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2019/01/semantic_8.png?resize=649%2C333&ssl=1)

**[DeepLabV3 model with a ResNet-101 backbone](https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101/)**

In [None]:
# DeepLabV3 with a ResNet-101 backbone and 4 output channels
model_ft = torchvision.models.segmentation.deeplabv3_resnet101(pretrained=False, num_classes=4)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
NUM_GPUS = torch.cuda.device_count()
if NUM_GPUS > 1:
    # Split every batch across all available GPUs
    model_ft = torch.nn.DataParallel(model_ft)
_ = model_ft.to(device)

In [None]:
data_loader = torch.utils.data.DataLoader(
    train_data, batch_size=4, shuffle=True, num_workers=NUM_GPUS,drop_last=True
)

In [None]:
# construct an optimizer
params = [p for p in model_ft.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.0001, momentum=0.9, weight_decay=0.0005)

In [None]:
# and a learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=5,
                                               gamma=0.1)

In [None]:
num_epochs = 2
for epoch in range(num_epochs):
    train_one_epoch(model_ft, optimizer, data_loader, device, epoch)
    lr_scheduler.step()

In [None]:

for param in model_ft.parameters():
    param.requires_grad = False
model_ft.to(device)  # use the device selected above so this also runs without a GPU
#assert model_ft.training == False

model_ft.eval()


In [None]:
torch.save(model_ft.state_dict(), 'deeplabv3Resnet101.pth')
torch.cuda.empty_cache()

# References 
* [mask-rcnn with augmentation and multiple masks](https://www.kaggle.com/abhishek/mask-rcnn-with-augmentation-and-multiple-masks)

* [SIIM-DeepLabV3](https://www.kaggle.com/soulmachine/siim-deeplabv3)

* [PyTorch Starter (U-Net ResNet)](https://www.kaggle.com/ateplyuk/pytorch-starter-u-net-resnet)