<img src="https://storage.googleapis.com/kaggle-media/competitions/MaxPlanck/Teaser_AnimationwLabels.gif" width="800px">



**About the project**


In this challenge, we have to build a model to classify cloud organization patterns from satellite images.
There are many ways in which clouds can organize, but the boundaries between different forms of organization are murky. This makes it challenging to build traditional rule-based algorithms to separate cloud features. The human eye, however, is really good at detecting features, such as clouds that resemble flowers.
The input data are satellite images containing up to four cloud types (fish, flower, gravel and sugar) and a CSV file that describes the position of each cloud type in each image as run-length encoded pixels.
The predicted encodings should be against images that are scaled by 0.25 per side. In other words, while the images in Train and Test are 1400 x 2100 pixels, the predictions should be scaled down to a 350 x 525 pixel image. The reduction is required to achieve reasonable submission evaluation times.
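
As a rough sketch of what this scaling means for a run-length encoded mask: decode it at full resolution, shrink it by a factor of 4 per side, then encode it again. The helper names `rle_to_mask`, `mask_to_rle` and `downscale_by_4` are hypothetical, and plain strided slicing is only an assumed stand-in for an interpolating resize such as `cv2.resize`. The encoding is treated as 1-indexed and column-major, as in the decoding code further down.

```python
import numpy as np

def rle_to_mask(rle, shape=(1400, 2100)):
    """Decode 'start length start length ...' (1-indexed, column-major) into a 0/1 mask."""
    values = np.asarray(rle.split(), dtype=int)
    starts, lengths = values[0::2] - 1, values[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape, order="F")

def mask_to_rle(mask):
    """Encode a 0/1 mask back into the same run-length format."""
    pixels = np.concatenate([[0], mask.flatten(order="F"), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[0::2]
    return " ".join(str(x) for x in runs)

def downscale_by_4(mask):
    """Keep every 4th pixel per axis: 1400 x 2100 -> 350 x 525 (assumed stand-in for a resize)."""
    return mask[::4, ::4]

full = rle_to_mask("1 2800")   # first two columns fully set (2 * 1400 pixels)
small = downscale_by_4(full)
print(small.shape)             # (350, 525)
print(mask_to_rle(small))      # 1 350
```

Only the first of the two filled columns survives the slicing, which is why the downscaled encoding covers a single column of 350 pixels.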

**Libraries:**  

PyTorch (with Catalyst and segmentation_models_pytorch), OpenCV, Albumentations and other standard Python libraries (NumPy, pandas, Matplotlib)  

**Data visualization:**  

Check how many valid Fish, Flower, Gravel and Sugar masks are in the training set  
Check how many cloud types appear per picture in the training set  
Show a specific or random image with the segmentation masks plotted over the original image  
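
The first two checks boil down to counting non-empty rows per cloud type and per image. A minimal sketch in plain Python, using a handful of made-up rows in the `(Image_Label, EncodedPixels)` layout of `train.csv` (the rows themselves are hypothetical, and `None` marks a label without a mask):

```python
from collections import Counter

# Hypothetical rows; real data would come from train.csv.
rows = [
    ("img1.jpg_Fish", "1 5"),
    ("img1.jpg_Flower", None),
    ("img1.jpg_Gravel", "10 3"),
    ("img1.jpg_Sugar", None),
    ("img2.jpg_Fish", None),
    ("img2.jpg_Flower", None),
    ("img2.jpg_Gravel", None),
    ("img2.jpg_Sugar", "7 2"),
]

per_class = Counter()   # valid masks per cloud type
per_image = Counter()   # valid masks per picture
for image_label, encoded in rows:
    if encoded is None:
        continue
    image_id, cloud_type = image_label.rsplit("_", 1)
    per_class[cloud_type] += 1
    per_image[image_id] += 1

# How many pictures contain 1, 2, 3 or 4 cloud types
types_per_picture = Counter(per_image.values())

print(dict(per_class))          # {'Fish': 1, 'Gravel': 1, 'Sugar': 1}
print(dict(types_per_picture))  # {2: 1, 1: 1}
```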

**Image augmentation**  

Final version: Resize(320, 640), HorizontalFlip, VerticalFlip, ShiftScaleRotate  
Other attempts: Blur, MedianBlur, GridDistortion  
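
For segmentation, every geometric augmentation has to be applied to the image and its mask together, otherwise the labels no longer line up with the pixels. A minimal NumPy sketch of that idea for the two flips; `random_flip` is a hypothetical helper, not the Albumentations implementation, and the 0.25 probabilities mirror the values used in the training pipeline below.

```python
import numpy as np

def random_flip(image, mask, rng, p_horizontal=0.25, p_vertical=0.25):
    """Flip image (H, W, 3) and mask (H, W, C) together so they stay aligned."""
    if rng.random() < p_horizontal:
        image, mask = image[:, ::-1], mask[:, ::-1]   # mirror left-right
    if rng.random() < p_vertical:
        image, mask = image[::-1], mask[::-1]         # mirror top-bottom
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)

rng = np.random.default_rng(0)
image = np.zeros((4, 6, 3), dtype=np.uint8)
mask = np.zeros((4, 6, 4), dtype=np.float32)
image[0, 1] = 255        # one bright pixel ...
mask[0, 1, 0] = 1.0      # ... labelled in the first mask channel

for _ in range(5):
    image, mask = random_flip(image, mask, rng)
    # The bright pixel and its label always move together.
    assert np.array_equal(image[..., 0] > 0, mask[..., 0] > 0)
```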
  
**Loss function**  

Although the competition is scored with the Dice coefficient, I tested several loss functions:  
Dice loss  
IoU loss  
Focal loss  
A custom loss: a linear combination of Dice loss, IoU loss and Focal loss, each with its own weight  
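
To make the weighted combination concrete, here is a minimal NumPy sketch using textbook formulations of the three losses. These are assumptions for illustration and may differ in detail from the Catalyst implementations used in the notebook. The function names are hypothetical; the default weights 0.7, 1.5 and 0.4 are the defaults of `CustomLoss` below.

```python
import numpy as np

def dice_loss(prob, target, eps=1e-7):
    intersection = (prob * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (prob.sum() + target.sum() + eps)

def iou_loss(prob, target, eps=1e-7):
    intersection = (prob * target).sum()
    union = prob.sum() + target.sum() - intersection
    return 1.0 - (intersection + eps) / (union + eps)

def focal_loss(prob, target, gamma=2.0, eps=1e-7):
    # Binary focal loss: down-weights pixels that are already classified well.
    prob = np.clip(prob, eps, 1.0 - eps)
    p_t = np.where(target == 1, prob, 1.0 - prob)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

def combined_loss(prob, target, w_iou=0.7, w_dice=1.5, w_focal=0.4):
    # Weighted sum of the three terms, normalized by the sum of the weights.
    total = (w_iou * iou_loss(prob, target)
             + w_dice * dice_loss(prob, target)
             + w_focal * focal_loss(prob, target))
    return total / (w_iou + w_dice + w_focal)

target = np.array([[1, 1, 0, 0]], dtype=float)
good = np.array([[0.9, 0.8, 0.1, 0.2]])   # mostly right
bad = 1.0 - good                           # mostly wrong
print(combined_loss(good, target) < combined_loss(bad, target))  # True
```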

**Network architecture**  

I built the segmentation architecture on top of several encoder networks:  
Resnet50  
Resnet101  
Efficientnet-b2  
Efficientnet-b7  
Densenet121  

**Other aspects**  
  
I tried various learning rates, starting from 1e-3 for the encoder and 1e-2 for the decoder, then smaller ones (5e-4 and 5e-3), and even equal learning rates for the encoder and decoder  
ReduceLROnPlateau with factor=0.3 and patience=5 also helped noticeably  
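
The plateau rule itself is simple enough to sketch without PyTorch. The class below is a simplified, hypothetical stand-in for `ReduceLROnPlateau` (minimization mode only, without the improvement threshold or cooldown of the real scheduler), using the factor and patience values mentioned above.

```python
class PlateauSchedule:
    """Multiply the learning rate by `factor` after more than `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.3, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr

schedule = PlateauSchedule(lr=1e-3)
schedule.step(1.0)                 # first epoch sets the best loss
for _ in range(6):                 # six epochs in a row without improvement
    schedule.step(1.0)
print(abs(schedule.lr - 3e-4) < 1e-12)  # True
```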
  
**Problems and Issues**  

I ran into a problem with the Catalyst callbacks (DiceCallback(), InferCallback()). RAM usage keeps growing with the training iterations until it reaches the kernel limit. There seems to be a memory leak somewhere, or I am not using the callbacks correctly.
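
Whatever the root cause turns out to be, one general way to keep memory flat during evaluation is to fold each batch into running totals instead of keeping every prediction around. A minimal sketch of that idea with a hypothetical `RunningDice` accumulator (not part of Catalyst, and not a confirmed fix for the issue above):

```python
import numpy as np

class RunningDice:
    """Accumulates a mean Dice score batch by batch without storing predictions."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, pred_masks, true_masks):
        # pred_masks, true_masks: boolean arrays of shape (N, H, W)
        for pred, true in zip(pred_masks, true_masks):
            denom = pred.sum() + true.sum()
            if denom == 0:
                score = 1.0   # both masks empty counts as a perfect match
            else:
                score = 2.0 * np.logical_and(pred, true).sum() / denom
            self.total += score
            self.count += 1

    def value(self):
        return self.total / max(self.count, 1)

metric = RunningDice()
# Two hypothetical "batches" of boolean masks
metric.update(np.ones((1, 2, 2), dtype=bool), np.ones((1, 2, 2), dtype=bool))
metric.update(np.zeros((1, 2, 2), dtype=bool), np.zeros((1, 2, 2), dtype=bool))
print(metric.value())  # 1.0
```

Only two numbers are kept between batches, so memory use does not depend on the size of the validation set.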

**Acknowledgements and Inspirations**  

Many thanks to Andrew Lukyanenko. His great kernel was a source of inspiration for the general approach and for a lot of useful functions (https://www.kaggle.com/artgor/segmentation-in-pytorch-using-convenient-tools#Exploring-augmentations-with-albumentations) 
  
**Next steps**  

Do more data visualization and analytics (cloud type correlations per picture); Aleksandra Deis has a great kernel on data exploration and visualization (https://www.kaggle.com/aleksandradeis/understanding-clouds-eda)  
Try other data augmentations (Affine, Fliplr, ElasticTransformation, etc.)  
Try new network architectures  
Implement the same design using PyTorch from scratch (Dhananjay Raut has a great kernel on this: https://www.kaggle.com/dhananjay3/image-segmentation-from-scratch-in-pytorch)

**Best score so far**  

0.6331

In [None]:
!pip install git+https://github.com/qubvel/segmentation_models.pytorch

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import random
from torch.utils.data import Dataset
import os
import cv2
import albumentations as albu
from sklearn.model_selection import train_test_split
import segmentation_models_pytorch as smp
from torch.utils.data import DataLoader
import torch
from catalyst.dl.runner import SupervisedRunner
from catalyst.dl.callbacks import DiceCallback, InferCallback, OptimizerCallback,CriterionCallback,CheckpointCallback,JaccardCallback,IouCallback
from torch.optim.lr_scheduler import ReduceLROnPlateau
import tqdm
from catalyst.contrib.criterion import DiceLoss, IoULoss, FocalLossBinary
import torch.nn as nn
from catalyst.dl import utils
import gc
from torch.optim.optimizer import Optimizer, required
import math

In [None]:
#plot descriptive stats
def DescriptiveStats(trainData):
    #check how many fish,flower,gravel and sugar valid data are in the training set
    occuranceDict={"Fish":0,"Flower":0,"Gravel":0,"Sugar":0}
    for i in range(len(trainData['EncodedPixels'])):
        if (pd.isnull(trainData['EncodedPixels'][i]))==False:
            cloudType=trainData["Image_Label"][i].split('_')[1]
            occuranceDict[cloudType]+=1   
    print("how many fish,flower,gravel and sugar valid data are in the training set")
    labels = occuranceDict.keys()
    sizes = occuranceDict.values()
    explode = (0.1, 0.1, 0.1, 0.1) 
    
    fig1, ax1 = plt.subplots()
    ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
            shadow=True, startangle=90)
    ax1.axis('equal')  
    plt.show()
    print("how many clouds are per picture")
    #check how many clouds are per picture in the training set
    occurancePerPic=trainData.loc[trainData['EncodedPixels'].isnull() == False, 'Image_Label'].apply(lambda x: x.split('_')[0]).value_counts().value_counts()
    labels2 = occurancePerPic.keys() 
    fig2, ax2 = plt.subplots()
    ax2.pie(occurancePerPic, explode=explode, labels=labels2, autopct='%1.1f%%',
            shadow=True, startangle=90)
    ax2.axis('equal')  
    plt.show()

In [None]:
#show a specific image
def ShowImg(trainData,imgName):
    image = Image.open(path+"/train_images/"+imgName)
    print("Original Img")
    plt.imshow(image)
    plt.show()
    ss=trainData.loc[trainData['im_id']==imgName, 'EncodedPixels']
    for row in ss:
        print(trainData.loc[trainData['EncodedPixels']==row, "label"])
        try: # label might not be there!
            mask = rle_decode(row)
        except Exception as exception:
            mask = np.zeros((1400, 2100))
            continue
            
        plt.imshow(image)
        plt.imshow(mask, alpha=0.3, cmap='gray')
        plt.show()

In [None]:
# Custom loss: (alpha*IoULoss + beta*DiceLoss + gamma*FocalLossBinary) / (alpha + beta + gamma)
class CustomLoss(nn.Module):
    def __init__(self, alpha=.7, beta=1.5, gamma=.4):
        super().__init__()
        self.alpha=alpha
        self.beta=beta
        self.gamma=gamma
        self.lossIOU=IoULoss()
        self.lossDice=DiceLoss()
        self.lossFocal=FocalLossBinary()
    def forward(self, input, target):
        # Move the tensors to the CPU once, then scale each term by its own weight
        input, target = input.cpu(), target.cpu()
        loss=self.alpha*self.lossIOU(input, target) + self.beta*self.lossDice(input, target) + self.gamma*self.lossFocal(input, target)
        # Normalize by the sum of the weights
        return loss.mean()/(self.alpha+self.beta+self.gamma)

In [None]:
class RAdam(Optimizer):

    def __init__(self, params, lr=2*1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0, degenerated_to_sgd=True):
        if not 0.0 <= lr:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if not 0.0 <= eps:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))
        
        self.degenerated_to_sgd = degenerated_to_sgd
        if isinstance(params, (list, tuple)) and len(params) > 0 and isinstance(params[0], dict):
            for param in params:
                if 'betas' in param and (param['betas'][0] != betas[0] or param['betas'][1] != betas[1]):
                    param['buffer'] = [[None, None, None] for _ in range(10)]
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, buffer=[[None, None, None] for _ in range(10)])
        super(RAdam, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(RAdam, self).__setstate__(state)

    def step(self, closure=None):

        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:

            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data.float()
                if grad.is_sparse:
                    raise RuntimeError('RAdam does not support sparse gradients')

                p_data_fp32 = p.data.float()

                state = self.state[p]

                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p_data_fp32)
                    state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
                else:
                    state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
                    state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                # Keyword forms of the in-place ops; the positional-scalar overloads are deprecated
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)

                state['step'] += 1
                buffered = group['buffer'][int(state['step'] % 10)]
                if state['step'] == buffered[0]:
                    N_sma, step_size = buffered[1], buffered[2]
                else:
                    buffered[0] = state['step']
                    beta2_t = beta2 ** state['step']
                    N_sma_max = 2 / (1 - beta2) - 1
                    N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
                    buffered[1] = N_sma

                    # more conservative since it's an approximated value
                    if N_sma >= 5:
                        step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
                    elif self.degenerated_to_sgd:
                        step_size = 1.0 / (1 - beta1 ** state['step'])
                    else:
                        step_size = -1
                    buffered[2] = step_size

                # more conservative since it's an approximated value
                if N_sma >= 5:
                    if group['weight_decay'] != 0:
                        p_data_fp32.add_(p_data_fp32, alpha=-group['weight_decay'] * group['lr'])
                    denom = exp_avg_sq.sqrt().add_(group['eps'])
                    p_data_fp32.addcdiv_(exp_avg, denom, value=-step_size * group['lr'])
                    p.data.copy_(p_data_fp32)
                elif step_size > 0:
                    if group['weight_decay'] != 0:
                        p_data_fp32.add_(p_data_fp32, alpha=-group['weight_decay'] * group['lr'])
                    p_data_fp32.add_(exp_avg, alpha=-step_size * group['lr'])
                    p.data.copy_(p_data_fp32)

        return loss

In [None]:
#Create mask based on df, image name and shape
def make_mask(df, image_name= 'img.jpg',
              shape= (1400, 2100)):
    encoded_masks = df.loc[df['im_id'] == image_name, 'EncodedPixels']
    masks = np.zeros((shape[0], shape[1], 4), dtype=np.float32)

    for idx, label in enumerate(encoded_masks.values):
        if pd.notnull(label):  # skip cloud types that have no mask in this image
            mask = rle_decode(label)
            masks[:, :, idx] = mask

    return masks

In [None]:
#Decode rle encoded mask    
def rle_decode(mask_rle='', shape=(1400, 2100)):
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int)
                       for x in (s[0:][::2], s[1:][::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')

In [None]:
def mask2rle(img):
    pixels = img.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

In [None]:
#reshape after using albumentation
def ConvertToTensorFormat(x, **kwargs):
    return x.transpose(2, 0, 1).astype('float32')

In [None]:
def post_process(probability, threshold, min_size):
    mask = cv2.threshold(probability, threshold, 1, cv2.THRESH_BINARY)[1]
    num_component, component = cv2.connectedComponents(mask.astype(np.uint8))
    predictions = np.zeros((350, 525), np.float32)
    num = 0
    for c in range(1, num_component):
        p = (component == c)
        if p.sum() > min_size:
            predictions[p] = 1
            num += 1
    return predictions, num
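
For readers without OpenCV at hand, the logic of `post_process` can be sketched in plain NumPy: threshold the probabilities, group the remaining pixels into connected regions, and keep only regions larger than `min_size`. The function name `filter_small_components` is hypothetical, and the breadth-first labelling is a slow stand-in for `cv2.connectedComponents` (8-connectivity assumed), meant to illustrate the idea only.

```python
from collections import deque
import numpy as np

def filter_small_components(probability, threshold, min_size):
    binary = probability > threshold
    height, width = binary.shape
    visited = np.zeros(binary.shape, dtype=bool)
    kept = np.zeros(binary.shape, dtype=np.float32)
    num_kept = 0
    for r in range(height):
        for c in range(width):
            if not binary[r, c] or visited[r, c]:
                continue
            # Breadth-first search collects one 8-connected component.
            queue, pixels = deque([(r, c)]), []
            visited[r, c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            queue.append((ny, nx))
            # Keep the component only if it is large enough.
            if len(pixels) > min_size:
                ys, xs = zip(*pixels)
                kept[list(ys), list(xs)] = 1
                num_kept += 1
    return kept, num_kept

prob = np.zeros((5, 5))
prob[0:2, 0:2] = 0.9   # a 2 x 2 blob
prob[4, 4] = 0.9       # an isolated pixel
cleaned, count = filter_small_components(prob, threshold=0.5, min_size=1)
print(count, int(cleaned.sum()))  # 1 4
```

The isolated pixel is discarded because its component is not larger than `min_size`, while the 2 x 2 blob survives.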

In [None]:
def dice(img1, img2):
    # np.bool was removed from recent NumPy releases; the builtin bool is equivalent here
    img1 = np.asarray(img1).astype(bool)
    img2 = np.asarray(img2).astype(bool)

    intersection = np.logical_and(img1, img2)

    return 2. * intersection.sum() / (img1.sum() + img2.sum())

def sigmoid(x): return 1/(1+np.exp(-x))

In [None]:
class CloudDataset(Dataset):
    def __init__(self,data,dataSetType,transforms,img_ids,preprocessing):
        self.data=data
        self.preprocessing=preprocessing
        self.transforms=transforms
        self.img_ids=img_ids
        if (dataSetType=="train"):
            self.imgFolder=path+"/train_images"
        if (dataSetType=="test"):
            self.imgFolder=path+"/test_images"

    def __getitem__(self, idx):
        image_name = self.img_ids[idx]

        mask = make_mask(self.data, image_name)
        image_path = os.path.join(self.imgFolder, image_name)
        
        img = cv2.imread(image_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        augmented = self.transforms(image=img, mask=mask)
        img = augmented['image']
        mask = augmented['mask']
        
        if self.preprocessing:
            preprocessed = self.preprocessing(image=img, mask=mask)
            img = preprocessed['image']
            mask = preprocessed['mask']
            
            
        return img, mask

    def __len__(self):
        return len(self.img_ids)      

In [None]:
#preprocess for specific network used
def get_preprocessing(preprocessing_fn=None):
    if preprocessing_fn is not None:
        _transform = [
            albu.Lambda(image=preprocessing_fn),
            albu.Lambda(image=ConvertToTensorFormat, mask=ConvertToTensorFormat),
        ]
    else:
        _transform = [
            albu.Normalize(),
            albu.Lambda(image=ConvertToTensorFormat, mask=ConvertToTensorFormat),
        ]
    return albu.Compose(_transform)
def get_training_augmentation(p=0.5):
    train_transform = [
        albu.Resize(320, 640),
        albu.HorizontalFlip(p=0.25),
        albu.VerticalFlip(p=0.25),
        albu.ShiftScaleRotate(scale_limit=0.5, rotate_limit=0, shift_limit=0.1, p=0.5, border_mode=0),
        albu.GridDistortion(p=0.25)
    ]
    return albu.Compose(train_transform)

#for validation dataset it is just resize
def get_validation_augmentation():
    train_transform = [
        albu.Resize(320, 640)
    ]
    return albu.Compose(train_transform)

In [None]:
path = '../input/understanding_cloud_organization'

#read files
trainData = pd.read_csv(path+'/train.csv')
subSample = pd.read_csv(path+'/sample_submission.csv')

#We can see that there is no major class imbalance in the dataset
DescriptiveStats(trainData)

#rearrange dataframe
trainData['label'] = trainData['Image_Label'].apply(lambda x: x.split('_')[1])
trainData['im_id'] = trainData['Image_Label'].apply(lambda x: x.split('_')[0])

subSample['label'] = subSample['Image_Label'].apply(lambda x: x.split('_')[1])
subSample['im_id'] = subSample['Image_Label'].apply(lambda x: x.split('_')[0])

In [None]:
#plot a random image
imageToPlot=random.choice(trainData['im_id'])
ShowImg(trainData,imageToPlot)


In [None]:
uniqueImgId=trainData.im_id.unique()
#split the image ids into training and validation sets
train_ids, valid_ids = train_test_split(
        uniqueImgId,
        random_state=142,
        test_size=0.08)
#using the se_resnext101_32x4d encoder with imagenet weights
ENCODER = 'se_resnext101_32x4d'
ENCODER_WEIGHTS = 'imagenet'
DEVICE = 'cuda'
preprocessing_fn = smp.encoders.get_preprocessing_fn(ENCODER, ENCODER_WEIGHTS)
num_workers = 0
bs = 6

ACTIVATION = None
model = smp.FPN(
    encoder_name=ENCODER, 
    encoder_weights=ENCODER_WEIGHTS, 
    classes=4, 
    activation=ACTIVATION,
)

In [None]:
train_dataset = CloudDataset(data=trainData, dataSetType='train', img_ids=train_ids, transforms = get_training_augmentation(), preprocessing=get_preprocessing(preprocessing_fn))
valid_dataset = CloudDataset(data=trainData, dataSetType='train', img_ids=valid_ids, transforms = get_validation_augmentation(), preprocessing=get_preprocessing(preprocessing_fn))

train_loader = DataLoader(train_dataset, batch_size=bs, shuffle=True, num_workers=num_workers)
valid_loader = DataLoader(valid_dataset, batch_size=bs, shuffle=False, num_workers=num_workers)
loaders = {
    "train": train_loader,
    "valid": valid_loader
}

In [None]:
num_epochs = 50
logdir = "./logs/CloudsSegmentation/"

optimizer = RAdam(model.parameters())
            
#criterion=CustomLoss()
criterion=smp.utils.losses.BCEDiceLoss()
scheduler = ReduceLROnPlateau(optimizer, factor=0.3, patience=5)

runner = SupervisedRunner()
torch.cuda.empty_cache()
gc.collect()    

In [None]:
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=[CriterionCallback(),OptimizerCallback(accumulation_steps =3)],
    logdir=logdir,
    num_epochs=num_epochs,
    verbose=True
)

In [None]:
#load best checkpoint

encoded_pixels = []
loaders = {"infer": valid_loader}
runner.infer(
    model=model,
    loaders=loaders,
    callbacks=[
        CheckpointCallback(
            resume=f"{logdir}/checkpoints/best.pth"),
        InferCallback()
    ],
)

In [None]:
resizedMasks=[]
probabilities = np.zeros((len(valid_dataset)*4, 350, 525))
# for each valid set and prediction on valid set:
# the predictions should be scaled down to a 350 x 525 pixel image
# make a resizedMasks of the valid elements
#make a probabilities mask with 4*len(valid_dataset) (labels)
for i in range(len(valid_dataset)):
    batch=valid_dataset[i]
    output=runner.callbacks[0].predictions["logits"][i]
    image, mask = batch
    for m in mask:
        m = cv2.resize(m, dsize=(525, 350), interpolation=cv2.INTER_LINEAR)
        resizedMasks.append(m)
        
    for j in range(len(output)):
        probability = cv2.resize(output[j], dsize=(525, 350), interpolation=cv2.INTER_LINEAR)
        probabilities[i * 4 + j, :, :] = probability

In [None]:
#find the optimum threshold for each class

class_params = {}
for class_id in range(4):
    print("Calculating optimum threshold for class:",class_id)
    attempts = []
    # sigmoid outputs lie in (0, 1), so candidate thresholds run from 0.300 to 0.995
    for t in range(300, 1000, 5):
        t /= 1000
        for ms in [9000,10000,11000,12000,13000,14000,15000,16000,17000,18000,19000,20000,21000,22000,23000,24000,25000]:
            masks = []
            for i in range(class_id, len(probabilities), 4):
                probability = probabilities[i]
                predict, num_predict = post_process(sigmoid(probability), t, ms)
                masks.append(predict)

            d = []
            for i, j in zip(masks, resizedMasks[class_id::4]):
                if (i.sum() == 0) & (j.sum() == 0):
                    d.append(1)
                else:
                    d.append(dice(i, j))

            attempts.append((t, ms, np.mean(d)))

    attempts_df = pd.DataFrame(attempts, columns=['threshold', 'size', 'dice'])


    attempts_df = attempts_df.sort_values('dice', ascending=False)
    print(attempts_df.head())
    best_threshold = attempts_df['threshold'].values[0]
    best_size = attempts_df['size'].values[0]
    
    class_params[class_id] = (best_threshold, best_size)
        
        
print(class_params)     
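
The search above is a plain grid search: for every candidate setting, binarize the validation probabilities, score them against the true masks, and keep the setting with the best mean Dice. A toy sketch of the threshold part on made-up data; `dice_score` and `best_threshold` are hypothetical helpers, and the thresholds are assumed to lie in (0, 1) because they are compared against sigmoid outputs.

```python
import numpy as np

def dice_score(pred, true):
    denom = pred.sum() + true.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, true).sum() / denom

def best_threshold(probabilities, truths, thresholds):
    """Return (threshold, mean Dice) for the threshold with the highest mean Dice."""
    scores = [
        (t, float(np.mean([dice_score(p > t, m) for p, m in zip(probabilities, truths)])))
        for t in thresholds
    ]
    return max(scores, key=lambda item: item[1])

truth = np.zeros((4, 4), dtype=bool)
truth[:2, :2] = True
prob = np.where(truth, 0.82, 0.42)   # confident inside the mask, lukewarm outside
t, score = best_threshold([prob], [truth], thresholds=np.arange(0.3, 1.0, 0.05))
print(round(float(t), 2), score)
```

Thresholds below 0.42 mark every pixel as cloud and score poorly, so the search settles on the first threshold that separates the two groups.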

In [None]:
del probabilities
del resizedMasks
del attempts_df
torch.cuda.empty_cache()
gc.collect()   

In [None]:
torch.cuda.empty_cache()
gc.collect()    

test_ids = subSample['Image_Label'].apply(lambda x: x.split('_')[0]).drop_duplicates().values

test_dataset = CloudDataset(data=subSample, dataSetType='test', img_ids=test_ids, transforms = get_validation_augmentation(), preprocessing=get_preprocessing(preprocessing_fn))
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0)

loaders = {"test": test_loader}  


encoded_pixels = []
image_id = 0
for test_batch in tqdm.tqdm(loaders['test']):
    runner_out = runner.predict_batch({"features": test_batch[0].cuda()})['logits']
    for batch in runner_out:
        for probability in batch:
            
            probability = probability.cpu().detach().numpy()
            if probability.shape != (350, 525):
                probability = cv2.resize(probability, dsize=(525, 350), interpolation=cv2.INTER_LINEAR)
            predict, num_predict = post_process(sigmoid(probability), class_params[image_id % 4][0], class_params[image_id % 4][1])
            if num_predict == 0:
                encoded_pixels.append('')
            else:
                r = mask2rle(predict)
                encoded_pixels.append(r)
            image_id += 1

In [None]:
subSample['EncodedPixels'] = encoded_pixels
subSample.to_csv('sub23k.csv', columns=['Image_Label', 'EncodedPixels'], index=False)