### Overview
The primary goal of this competition is the identification and segmentation of pneumothorax in chest radiographic images. This kernel uses a U-net based approach, which provides an end-to-end framework for image segmentation. In prior image segmentation competitions ([Airbus Ship Detection Challenge](https://www.kaggle.com/c/airbus-ship-detection/discussion) and [TGS Salt Identification Challenge](https://www.kaggle.com/c/tgs-salt-identification-challenge)), U-net based architectures demonstrated superior performance, and the top solutions were based on them. The current competition is similar to the TGS Salt Identification Challenge in that the correct mask must be identified from visual inspection of images. Therefore, I have tried a technique that was extremely effective in the Salt competition: [Hypercolumns](https://towardsdatascience.com/review-hypercolumn-instance-segmentation-367180495979).

As a starting point, [this public kernel](https://www.kaggle.com/mnpinto/pneumothorax-fastai-u-net) is used, and the following things are added (see the text below for more details):
* Hypercolumns
* Gradient accumulation
* TTA based on horizontal flip
* Noise removal (if the predicted mask contains too few pixels, it is assumed to be empty)
* Image equalization (histogram-based normalization)
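
The noise-removal rule from the list above can be sketched on plain arrays. This is a minimal NumPy illustration rather than the code used later in the kernel: `noise_threshold` and `remove_small_masks` are hypothetical helper names, and the sketch assumes that the pixel-count cutoff should scale with image area, in the spirit of the `50.0*(sz/128.0)**2` expression in the configuration cell.

```python
import numpy as np

def noise_threshold(sz, base=50.0, base_sz=128.0):
    """Scale a pixel-count threshold with image area (base pixels at base_sz x base_sz)."""
    return base * (sz / base_sz) ** 2

def remove_small_masks(probs, thr, min_pixels):
    """Binarize probability maps and blank out masks with fewer than min_pixels pixels.

    probs: float array of shape (n_images, H, W) with values in [0, 1].
    """
    masks = (probs > thr).astype(np.uint8)
    counts = masks.reshape(len(masks), -1).sum(axis=1)
    masks[counts < min_pixels] = 0  # treat tiny predictions as empty
    return masks

# Toy example: two 8x8 "images", one with a 2-pixel blob and one with a 16-pixel blob.
probs = np.zeros((2, 8, 8), dtype=np.float32)
probs[0, 0, :2] = 0.9
probs[1, 2:6, 2:6] = 0.9
cleaned = remove_small_masks(probs, thr=0.5, min_pixels=5)
print(cleaned.reshape(2, -1).sum(axis=1))  # pixel count per image after cleaning
```

Only the first toy mask is blanked, because its two pixels fall below the cutoff of five. The cutoff itself is a tunable assumption, not a value prescribed by the competition.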

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
# os.environ['CUDA_VISIBLE_DEVICES']='0,1,2,3'
os.environ['CUDA_VISIBLE_DEVICES']='0'
import sys
sys.path.insert(0, '../input/siim-acr-pneumothorax-segmentation')

import fastai
from fastai.vision import *
from mask_functions import *
from fastai.callbacks import SaveModelCallback
import gc
from sklearn.model_selection import KFold
from PIL import Image

fastai.__version__

The original images provided in this competition have 1024x1024 resolution. To avoid additional overhead on image loading, datasets composed of images scaled down to 128x128 and 256x256 are prepared separately and used as input (the configuration cell below also accepts 512x512 and 1024x1024 datasets). Check [this kernel](https://www.kaggle.com/iafoss/data-repack-and-image-statistics) for more details on image rescaling and mask generation. In that kernel I also apply image normalization based on histograms (exposure.equalize_adapthist), which improves the appearance of the images and gives a small boost to model performance. The corresponding pixel statistics are computed in that kernel.
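
As a rough illustration of how such dataset-level pixel statistics can be obtained, the sketch below accumulates the mean and standard deviation over a set of 8-bit grayscale images scaled to [0, 1]. It is an assumption about the procedure, not the code from the linked kernel; `pixel_stats` is a hypothetical helper and the random images only stand in for the real data.

```python
import numpy as np

def pixel_stats(images):
    """Return (mean, std) over all pixels of an iterable of uint8 grayscale images.

    Running sums are accumulated, so the images never need to be in memory together.
    """
    n, s1, s2 = 0, 0.0, 0.0
    for img in images:
        x = np.asarray(img, dtype=np.float64) / 255.0  # scale to [0, 1]
        n += x.size
        s1 += x.sum()
        s2 += (x ** 2).sum()
    mean = s1 / n
    std = np.sqrt(s2 / n - mean ** 2)
    return mean, std

rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, size=(16, 16), dtype=np.uint8) for _ in range(4)]
mean, std = pixel_stats(imgs)
# Repeat the single grayscale value per channel, mirroring the three identical
# entries in the `stats` tuples of this kernel.
stats = ([mean] * 3, [std] * 3)
```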

In [None]:
sz = 1024
bs = 2
n_acc = 64//bs #gradient accumulation steps
nfolds = 4
SEED = 2019

#eliminate all predictions with only a few (fewer than noise_th) pixels
noise_th = 50.0*(sz/128.0)**2 #threshold for the number of predicted pixels
best_thr0 = 0.2 #preliminary value of the threshold for metric calculation

if sz == 256:
    stats = ([0.540,0.540,0.540],[0.264,0.264,0.264])
    TRAIN = '../input/siimacr-pneumothorax-segmentation-data-256/train'
    TEST = '../input/siimacr-pneumothorax-segmentation-data-256/test'
    MASKS = '../input/siimacr-pneumothorax-segmentation-data-256/masks'
elif sz == 128:
    stats = ([0.615,0.615,0.615],[0.291,0.291,0.291])
    TRAIN = '../input/siimacr-pneumothorax-segmentation-data-128/train'
    TEST = '../input/siimacr-pneumothorax-segmentation-data-128/test'
    MASKS = '../input/siimacr-pneumothorax-segmentation-data-128/masks'
elif sz ==512:
#     mean: 0.5292001691897411 , std: 0.2588056436836543
#     mean: 0.5265718130841701 , std: 0.2589975116265956
    stats = ([0.5292,0.5292,0.5292],[0.2588,0.2588,0.2588])
    TRAIN = '../input/siimacr-pneumothorax-segmentation-data-512/train'
    TEST = '../input/siimacr-pneumothorax-segmentation-data-512/test'
    MASKS = '../input/siimacr-pneumothorax-segmentation-data-512/masks'
elif sz ==1024:
    
#     mean: 0.521344684992593 , std: 0.2542653719896922
#     mean: 0.5188847691560139 , std: 0.2546409761578348
    stats = ([0.521,0.521,0.521],[0.254,0.254,0.254])
    TRAIN = '../input/siimacr-pneumothorax-segmentation-data-1024/train'
    TEST = '../input/siimacr-pneumothorax-segmentation-data-1024/test'
    MASKS = '../input/siimacr-pneumothorax-segmentation-data-1024/masks'

# copy pretrained weights for resnet34 to the folder fastai will search by default
Path('/tmp/.cache/torch/checkpoints/').mkdir(exist_ok=True, parents=True)
# !cp '../input/resnet34/resnet34.pth' '/tmp/.cache/torch/checkpoints/resnet34-333f7ec4.pth'

def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    #tf.set_random_seed(seed)
seed_everything(SEED)

In [None]:
from fastai.vision.learner import create_head, cnn_config, num_features_model
from fastai.callbacks.hooks import model_sizes, hook_outputs, dummy_eval, Hook, _hook_inner
from fastai.vision.models.unet import _get_sfs_idxs, UnetBlock

class Hcolumns(nn.Module):
    def __init__(self, hooks:Collection[Hook], nc:Collection[int]=None):
        super(Hcolumns,self).__init__()
        self.hooks = hooks
        self.n = len(self.hooks)
        self.factorization = None 
        if nc is not None:
            self.factorization = nn.ModuleList()
            for i in range(self.n):
                self.factorization.append(conv2d(nc[i],nc[-1],1,bias=True))
        
    def forward(self, x:Tensor):
        n = len(self.hooks)
        out = [F.interpolate(self.hooks[i].stored if self.factorization is None
                    else self.factorization[i](self.hooks[i].stored), scale_factor=2**(self.n-i),
                    mode='bilinear',align_corners=False) for i in range(self.n)] + [x]
        return torch.cat(out, dim=1)

class DynamicUnet_Hcolumns(SequentialEx):
    "Create a U-Net from a given architecture."
    def __init__(self, encoder:nn.Module, n_classes:int, blur:bool=False, blur_final=True, 
                 self_attention:bool=False,
                 y_range:Optional[Tuple[float,float]]=None,
                 last_cross:bool=True, bottle:bool=False, **kwargs):
        imsize = (sz,sz)
        sfs_szs = model_sizes(encoder, size=imsize)
        sfs_idxs = list(reversed(_get_sfs_idxs(sfs_szs)))
        self.sfs = hook_outputs([encoder[i] for i in sfs_idxs])
        x = dummy_eval(encoder, imsize).detach()

        ni = sfs_szs[-1][1]
        middle_conv = nn.Sequential(conv_layer(ni, ni*2, **kwargs),
                                    conv_layer(ni*2, ni, **kwargs)).eval()
        x = middle_conv(x)
        layers = [encoder, batchnorm_2d(ni), nn.ReLU(), middle_conv]

        self.hc_hooks = []
        hc_c = []
        self.hc_hooks.append(Hook(layers[-1], _hook_inner, detach=False))
        hc_c.append(x.shape[1])
        
        for i,idx in enumerate(sfs_idxs):
            not_final = i!=len(sfs_idxs)-1
            up_in_c, x_in_c = int(x.shape[1]), int(sfs_szs[idx][1])
            do_blur = blur and (not_final or blur_final)
            sa = self_attention and (i==len(sfs_idxs)-3)
            unet_block = UnetBlock(up_in_c, x_in_c, self.sfs[i], final_div=not_final, blur=blur, self_attention=sa,
                                   **kwargs).eval()
            layers.append(unet_block)
            x = unet_block(x)
            self.hc_hooks.append(Hook(layers[-1], _hook_inner, detach=False))
            hc_c.append(x.shape[1])

        ni = x.shape[1]
        if imsize != sfs_szs[0][-2:]: layers.append(PixelShuffle_ICNR(ni, **kwargs))
        if last_cross:
            layers.append(MergeLayer(dense=True))
            ni += in_channels(encoder)
            layers.append(res_block(ni, bottle=bottle, **kwargs))
        hc_c.append(ni)
        layers.append(Hcolumns(self.hc_hooks, hc_c))
        layers += [conv_layer(ni*len(hc_c), 1, ks=1, use_activ=False, **kwargs)]
        if y_range is not None: layers.append(SigmoidRange(*y_range))
        super().__init__(*layers)

    def __del__(self):
        if hasattr(self, "sfs"): self.sfs.remove()
            
def unet_learner(data:DataBunch, arch:Callable, pretrained:bool=True, blur_final:bool=True,
                 norm_type:Optional[NormType]=NormType, split_on:Optional[SplitFuncOrIdxList]=None, blur:bool=False,
                 self_attention:bool=False, y_range:Optional[Tuple[float,float]]=None, last_cross:bool=True,
                 bottle:bool=False, cut:Union[int,Callable]=None, hypercolumns=True, **learn_kwargs:Any)->Learner:
    "Build Unet learner from `data` and `arch`."
    meta = cnn_config(arch)
    body = create_body(arch, pretrained, cut)
    M = DynamicUnet_Hcolumns if hypercolumns else DynamicUnet
#     model = to_device(M(body, n_classes=data.c, blur=blur, blur_final=blur_final,
#           self_attention=self_attention, y_range=y_range, norm_type=norm_type, last_cross=last_cross,
#           bottle=bottle), data.device)
    model = to_device(M(body, n_classes=data.c, blur=blur, blur_final=blur_final,
          self_attention=self_attention, y_range=y_range, norm_type=norm_type, last_cross=last_cross,
          bottle=bottle),data.device)
    learn = Learner(data, model, **learn_kwargs)
    learn.split(ifnone(split_on, meta['split']))
    if False:
        learn.model = torch.nn.DataParallel(learn.model)
    
    if pretrained: learn.freeze()
    apply_init(model[2], nn.init.kaiming_normal_)
    return learn

In [None]:
# Setting div=True in open_mask
class SegmentationLabelList(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)
    
class SegmentationItemList(SegmentationItemList):
    _label_cls = SegmentationLabelList

# Setting transformations on masks to False on test set
def transform(self, tfms:Optional[Tuple[TfmList,TfmList]]=(None,None), **kwargs):
    if not tfms: tfms=(None,None)
    assert is_listy(tfms) and len(tfms) == 2
    self.train.transform(tfms[0], **kwargs)
    self.valid.transform(tfms[1], **kwargs)
    kwargs['tfm_y'] = False # Test data has no labels
    if self.test: self.test.transform(tfms[1], **kwargs)
    return self
fastai.data_block.ItemLists.transform = transform

### Model

The model used in this kernel is based on a U-net-like architecture with a ResNet34 encoder. To boost model performance, Hypercolumns are incorporated into the fast.ai DynamicUnet class (see the `DynamicUnet_Hcolumns` code above). The idea of Hypercolumns is illustrated schematically in the following figure. ![](https://i.ibb.co/3y7f8rj/Hypercolumns1.png)
Each upscaling block is connected to the output layer through a bilinear resize to the original image size, so the final mask is produced from the concatenation of the U-net output with the resized outputs of the intermediate layers. These skip-connections provide a shortcut for gradient flow, which improves model performance and convergence speed. Since intermediate layers have many channels, upscaling them and feeding them to the final layer would add significant overhead in computation time and memory. Therefore, 1x1 convolutions (factorization) are applied before the resize to reduce the number of channels.
Further details on Hypercolumns can be found [here](http://home.bharathh.info/pubs/pdfs/BharathCVPR2015.pdf) and [here](https://towardsdatascience.com/review-hypercolumn-instance-segmentation-367180495979).
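
The factorize, resize and concatenate steps can be sketched with plain NumPy. This is a simplified illustration under stated assumptions, not the `Hcolumns` module itself: it uses nearest-neighbour upsampling instead of bilinear interpolation, it applies the 1x1 convolution to every map including the full-resolution one, and all function names, channel counts and sizes are made up for the example.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: mix channels at every pixel. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def hypercolumn(features, weights, out_size):
    """Reduce channels, resize each map to out_size x out_size, concatenate on channels."""
    resized = []
    for x, w in zip(features, weights):
        reduced = conv1x1(x, w)                     # factorization before the resize
        factor = out_size // reduced.shape[-1]
        resized.append(upsample_nearest(reduced, factor))
    return np.concatenate(resized, axis=0)

rng = np.random.default_rng(0)
# Hypothetical decoder outputs at 1/8, 1/4, 1/2 and full resolution of a 32x32 image.
features = [rng.normal(size=(c, s, s)) for c, s in [(64, 4), (32, 8), (16, 16), (8, 32)]]
weights = [rng.normal(size=(8, x.shape[0])) for x in features]  # reduce every map to 8 channels
hc = hypercolumn(features, weights, out_size=32)
print(hc.shape)  # (32, 32, 32): 4 maps x 8 channels each, at full resolution
```

Reducing the channels before the resize is what keeps the cost manageable: here the 64-channel map is upsampled as 8 channels, so the enlarged tensor is 8 times smaller than it would be without factorization.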

Gradients are accumulated over several batches to overcome the problem of batches that are too small. The code is mostly based on [this post](https://forums.fast.ai/t/accumulating-gradients/33219/25), with a slight adjustment to work with mean reduction.
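
Why the accumulated gradient has to be divided by the number of micro-batches under mean reduction can be checked on a toy problem. The sketch below is a NumPy illustration with a hypothetical `mse_grad` helper and random data, not the fastai callback: it sums the gradients of 32 micro-batches of 2 samples (the `bs` and `n_acc` values from the configuration cell) and compares the average with the gradient of the full batch of 64.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean((X @ w - y)**2) with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
w = np.zeros(3)

bs, n_acc = 2, 32                      # 32 micro-batches of 2 samples emulate a batch of 64
acc = np.zeros_like(w)
for i in range(n_acc):
    sl = slice(i * bs, (i + 1) * bs)
    acc += mse_grad(w, X[sl], y[sl])   # like backward() without zeroing the gradients
acc /= n_acc                           # mean reduction: divide the sum by the number of micro-batches

full = mse_grad(w, X, y)
print(np.allclose(acc, full))
```

The equality holds because all micro-batches have the same size. With a ragged last batch, the plain average is only an approximation.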

In [None]:
class AccumulateOptimWrapper(OptimWrapper):
    def step(self):          pass
    def zero_grad(self):      pass
    def real_step(self):      super().step()
    def real_zero_grad(self): super().zero_grad()
        
def acc_create_opt(self, lr:Floats, wd:Floats=0.):
        "Create optimizer with `lr` learning rate and `wd` weight decay."
        self.opt = AccumulateOptimWrapper.create(self.opt_func, lr, self.layer_groups,
                                         wd=wd, true_wd=self.true_wd, bn_wd=self.bn_wd)
Learner.create_opt = acc_create_opt   

@dataclass
class AccumulateStep(LearnerCallback):
    """
    Performs an optimizer step every n_step batches, accumulating gradients in between
    """
    def __init__(self, learn:Learner, n_step:int = 1):
        super().__init__(learn)
        self.n_step = n_step

    def on_epoch_begin(self, **kwargs):
        "init samples and batches, change optimizer"
        self.acc_batches = 0
        
    def on_batch_begin(self, last_input, last_target, **kwargs):
        "accumulate samples and batches"
        self.acc_batches += 1
        #print(f"At batch {self.acc_batches}")
        
    def on_backward_end(self, **kwargs):
        "step if number of desired batches accumulated, reset samples"
        # acc_batches is reset to 0 after every real step, so step once n_step batches have been accumulated
        if self.acc_batches >= self.n_step:
            for p in (self.learn.model.parameters()):
                if p.requires_grad: p.grad.div_(self.acc_batches)
    
            #print(f"Stepping at batch: {self.acc_batches}")
            self.learn.opt.real_step()
            self.learn.opt.real_zero_grad()
            self.acc_batches = 0
    
    def on_epoch_end(self, **kwargs):
        "step the rest of the accumulated grads"
        if self.acc_batches > 0:
            for p in (self.learn.model.parameters()):
                if p.requires_grad: p.grad.div_(self.acc_batches)
            self.learn.opt.real_step()
            self.learn.opt.real_zero_grad()
            self.acc_batches = 0

A slight modification of the default dice metric makes it comparable with the competition metric: dice is computed for each image independently, and the dice of an empty image with an empty prediction is 1. I also apply the same noise removal and a similar threshold as in my prediction pipeline.
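
The per-image convention can be illustrated on binary masks with NumPy. This is a minimal sketch with a hypothetical `dice_per_image` helper, separate from the fastai metric in this kernel. It assumes the masks are already thresholded.

```python
import numpy as np

def dice_per_image(pred, target):
    """Dice score for each image in a batch of binary masks of shape (n, H, W).

    An empty prediction for an empty target scores 1.
    """
    n = len(pred)
    p = pred.reshape(n, -1).astype(np.float64)
    t = target.reshape(n, -1).astype(np.float64)
    intersect = (p * t).sum(axis=1)
    total = p.sum(axis=1) + t.sum(axis=1)
    empty = total == 0
    # Replace zero denominators by 1 to avoid division warnings, then overwrite those scores with 1.
    return np.where(empty, 1.0, 2.0 * intersect / np.where(empty, 1.0, total))

pred = np.zeros((3, 4, 4), dtype=np.uint8)
target = np.zeros((3, 4, 4), dtype=np.uint8)
pred[1, 0, :2] = 1        # image 1: false positive on an empty target
target[2, 0, :] = 1       # image 2: 4 target pixels ...
pred[2, 0, 2:] = 1        # ... 2 of them predicted ...
pred[2, 1, :2] = 1        # ... plus 2 predicted pixels outside the target
scores = dice_per_image(pred, target)
print(scores, scores.mean())
```

Averaging after computing dice per image means that a single false positive on an empty image costs a full point for that image, which is one motivation for removing tiny predicted masks.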

In [None]:
def dice(input:Tensor, targs:Tensor, iou:bool=False, eps:float=1e-8)->Rank0Tensor:
    "Dice coefficient metric for binary target. If iou=True, returns iou metric, classic for segmentation problems."
    n = targs.shape[0]
    if mode =='bce':
        input = torch.softmax(input, dim=1)[:,1,...].view(n,-1)
    else:
        input = torch.sigmoid(input).view(n, -1)
    input = (input > best_thr0).long()
    input[input.sum(-1) < noise_th,...] = 0.0 
    #input = input.argmax(dim=1).view(n,-1)
    targs = targs.view(n,-1)
    intersect = (input * targs).sum(-1).float()
    union = (input+targs).sum(-1).float()
    if not iou: return ((2.0*intersect + eps) / (union+eps)).mean()
    else: return ((intersect + eps) / (union - intersect + eps)).mean()

In [None]:
#dice for threshold selection
def dice_overall(preds, targs):
    n = preds.shape[0]
    preds = preds.view(n, -1)
    targs = targs.view(n, -1)
    intersect = (preds * targs).sum(-1).float()
    union = (preds+targs).sum(-1).float()
    u0 = union==0
    intersect[u0] = 1
    union[u0] = 2
    return (2. * intersect / union)

The following function generates predictions using flip TTA (the results for the original image and a horizontally flipped one are averaged).
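
Before the fastai version, the idea can be shown on plain arrays. The sketch below is an illustration only: `predict_with_flip` and `toy_model` are hypothetical names, and the toy model is chosen to be asymmetric so that the effect of the averaging is visible.

```python
import numpy as np

def predict_with_flip(model, images):
    """Average predictions for images and their horizontally flipped copies.

    model: callable mapping (n, H, W) arrays to per-pixel scores of the same shape.
    """
    preds = model(images)
    preds_flipped = model(images[..., ::-1])          # predict on the mirrored input
    return 0.5 * (preds + preds_flipped[..., ::-1])   # mirror the prediction back before averaging

# Toy "model" that is deliberately not flip-symmetric: it blends each pixel with its left neighbour.
def toy_model(x):
    return 0.5 * (x + np.roll(x, 1, axis=-1))

images = np.random.default_rng(0).random((2, 4, 4))
tta = predict_with_flip(toy_model, images)
```

The key detail is flipping the second prediction back before averaging. Without it, the two maps would not be aligned pixel by pixel.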

In [None]:
# Prediction with flip TTA
def pred_with_flip(learn:fastai.basic_train.Learner,
                   ds_type:fastai.basic_data.DatasetType=DatasetType.Valid):
    #get prediction
    preds, ys = learn.get_preds(ds_type)
    if False:
        preds = preds[:,1,...]
    else:
        preds = torch.sigmoid(preds)
    #add flip to dataset and get prediction
    learn.data.dl(ds_type).dl.dataset.tfms.append(flip_lr())
    preds_lr, ys = learn.get_preds(ds_type)
    del learn.data.dl(ds_type).dl.dataset.tfms[-1]
    if False:
        preds_lr = preds_lr[:,1,...]
    else:
        preds_lr = torch.sigmoid(preds_lr)
    ys = ys.squeeze()
    preds = 0.5*(preds + torch.flip(preds_lr,[-1]))
    del preds_lr
    gc.collect()
    torch.cuda.empty_cache()
    return preds, ys

In [None]:
! ls ../input/ids-for-test/items_test_dgx01_1024

### Data

In [None]:
def get_data(fold):
    kf = KFold(n_splits=nfolds, shuffle=True, random_state=SEED)
    valid_idx = list(kf.split(list(range(len(Path(TRAIN).ls())))))[fold][1]
    segs = pickle.load(open('../input/folder-list/cv_list','rb'))
    
    # Create databunch
    data = (#SegmentationItemList.from_folder(TRAIN)
            segs
            .split_by_idx(valid_idx)
            .label_from_func(lambda x : str(x).replace('train', 'masks'), classes=[0,1])
            .add_test(pickle.load(open('../input/ids-for-test/items_test_dgx01_1024','rb')), label=None)
#             .add_test(Path(TEST).ls(), label=None)
            .transform(get_transforms(), size=sz, tfm_y=True)
            .databunch(path=Path('.'), bs=bs)
            .normalize(stats))
    return data

# Display some images with masks
get_data(0).show_batch()

### Training

Expand the following cell to see the model printout. The model is based on a U-net-like architecture with a pretrained ResNet34 encoder. The upscaling is based on the [pixel shuffling technique](https://arxiv.org/pdf/1609.05158.pdf). On top of that, hypercolumns are added to provide additional skip-connections between the upscaling blocks and the output.
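
As an aside on the upscaling step, the rearrangement behind pixel shuffling can be written in a few lines of NumPy. This sketch covers only the channel-to-space reshuffle for a single image, assuming the sub-pixel layout described in the linked paper. The convolution that precedes it in a real layer is omitted, and `pixel_shuffle` here is an illustrative function, not the fastai layer.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Layout: out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w].
    """
    c_in, h, w = x.shape
    c = c_in // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

x = np.arange(4 * 2 * 2).reshape(4, 2, 2)   # 4 channels -> 1 channel at 2x resolution
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4)
```

Each group of `r*r` channels supplies the `r x r` block of output pixels at one input location, so resolution is gained by spending channels instead of interpolating.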

In [None]:
unet_learner(get_data(0), models.resnet34, metrics=[dice]).model

In [None]:
mode = 'bce_soft_dice'
if mode == 'weighted_bce':
    def criterion_pixel(logit_pixel, truth_pixel):
        logit = logit_pixel.view(-1)
        truth = truth_pixel.view(-1).float()
#         print(logit.shape, truth.shape)
        assert(logit.shape == truth.shape)

        loss = F.binary_cross_entropy_with_logits(logit, truth, reduction='none')
        if 0:
            loss = loss.mean()
        if 1:
            pos = (truth > 0.5).float()
            neg = (truth < 0.5).float()
            pos_weight = pos.sum().item() + 1e-12
            neg_weight = neg.sum().item() + 1e-12
            loss = (0.25* pos*loss/pos_weight + 0.75*neg*loss/neg_weight).sum()
        return loss
elif mode == 'bce_soft_dice':
    def soft_dice_criterion(logit, truth, weight=[0.5, 0.5]):
        batch_size = len(logit)
        probability = torch.sigmoid(logit)
        p = probability.view(batch_size, -1)
        t = truth.view(batch_size, -1)
        w = truth.detach()
        w = w*(weight[1] - weight[0])+weight[0]
        
        p = w*(p*2-1) # convert to [0,1] --> [-1,1]
        t = w*(t*2-1)
        
        intersection = (p*t).sum(-1)
        union = (p*p).sum(-1) + (t*t).sum(-1)
        dice = 1 - 2* intersection/union
        
        loss = dice
        return loss
        
    def criterion_pixel(logit_pixel, truth_pixel):
        batch_size = len(logit_pixel)
        logit = logit_pixel.view(batch_size, -1)
        truth = truth_pixel.view(batch_size,-1).float()
        assert(logit.shape==truth.shape)
        
        loss = soft_dice_criterion(logit, truth)
        
        loss1 = loss.mean()
        
        logit = logit_pixel.view(-1)
        truth = truth_pixel.view(-1).float()
#         print(logit.shape, truth.shape)
        assert(logit.shape == truth.shape)

        loss = F.binary_cross_entropy_with_logits(logit, truth, reduction='none')
        loss2 = loss.mean()
        loss = (loss1 + loss2) /2
        return loss
    
elif mode == 'lovasz':
    def lovasz_loss(logit, truth, margin=[1,5]):
        def compute_lovasz_gradient(truth):
            truth_sum = truth.sum()
            intersection = truth_sum - truth.cumsum(0)
            union = truth_sum + (1-truth).cumsum(0)
            jaccard = 1. - intersection / union
            T = len(truth)
            jaccard[1:T] = jaccard[1:T] - jaccard[0:T-1]

            gradient = jaccard
            return gradient

        def lovasz_hinge_one(logit, truth):
            m = truth.detach()
            m = m*(margin[1] - margin[0])+margin[0]

            truth = truth.float()
            sign = 2. * truth-1
            hinge = (m - logit * sign)
            hinge, permutation = torch.sort(hinge, dim=0, descending= True)
            hinge = F.relu(hinge)

            truth = truth[permutation.data]
            gradient = compute_lovasz_gradient(truth)

            loss = torch.dot(hinge, gradient)
            return loss
        
        lovasz_one = lovasz_hinge_one
        
        batch_size = len(truth)
        loss = torch.zeros(batch_size).cuda()
        for b in range(batch_size):
            l, t = logit[b].view(-1), truth[b].view(-1)
            loss[b] = lovasz_one(l, t)
        return loss

    def criterion_pixel(logit_pixel, truth_pixel):
        batch_size = len(logit_pixel)
        logit = logit_pixel.view(batch_size, -1)
        truth = truth_pixel.view(batch_size,-1).float()
        assert(logit.shape==truth.shape)

        loss = lovasz_loss(logit, truth)

        loss = loss.mean()
        return loss
elif mode == 'bce_sigmoid':
    def criterion_pixel(logit_pixel, truth_pixel):
        logit = logit_pixel.view(-1)
        truth = truth_pixel.view(-1).float()
#         print(logit.shape, truth.shape)
        assert(logit.shape == truth.shape)

        loss = F.binary_cross_entropy_with_logits(logit, truth, reduction='none')
        loss = loss.mean()
        return loss        
else:
    mode = None
    

In [None]:
gc.collect()
torch.cuda.empty_cache()

In [None]:
scores, best_thrs = [],[]
# first phase for bce then lovasz??
fold = 0
if True:
# for fold in range(nfolds):
    print('fold: ', fold)
    data = get_data(fold)
    learn = unet_learner(data, models.resnet34, metrics=[dice])
    print(mode)
    if mode is not None:
        learn.loss_func = criterion_pixel
    else:
        mode = 'bce'
    
    epochpost=12
    
    learn.clip_grad(1.0);
    
    #fit the decoder part of the model keeping the encoder frozen
    lr = 7e-3
    epochpre = 6
#     learn.load('resnet34_imsizefixed_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(sz,fold,lr,epochpre,epochpost))
    learn.fit_one_cycle(epochpre, lr, callbacks = [AccumulateStep(learn,n_acc)])
    
    #fit entire model with saving on the best epoch
    learn.unfreeze()
#     learn.fit_one_cycle(12, slice(lr/80, lr/2), callbacks = [AccumulateStep(learn,n_acc)])
    learn.fit_one_cycle(epochpost, slice(lr/80, lr/2), callbacks = [
        AccumulateStep(learn,n_acc), 
        SaveModelCallback(learn, 
                          monitor='dice', 
                          mode='max',
                          name='resnet34_imsizefixed_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))])
    learn.save('resnet34_imsizefixed_{}_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost));
    if False:
        #prediction on val and test sets
        preds, ys = pred_with_flip(learn)
        pt, _ = pred_with_flip(learn,DatasetType.Test)

        if fold == 0: preds_test = pt
        else: preds_test += pt

        #remove noise
        preds[preds.view(preds.shape[0],-1).sum(-1) < noise_th,...] = 0.0

        #optimal threshold 
        #the best way would be to collect all oof predictions and then compute a single threshold
        #however, it requires too much RAM for high image resolution
        dices = []
        thrs = np.arange(0.01, 1, 0.01)
        for th in progress_bar(thrs):
            preds_m = (preds>th).long()
            dices.append(dice_overall(preds_m, ys).mean())
        dices = np.array(dices)    
        scores.append(dices.max())
        best_thrs.append(thrs[dices.argmax()])

        if fold != nfolds-1: del preds, ys
    gc.collect()
    torch.cuda.empty_cache()
    
# preds_test /= nfolds

In [None]:
# ph2 
scores, best_thrs = [],[]
# first phase for bce then lovasz??
fold = 0
if True:
# for fold in range(nfolds):
    print('fold: ', fold)
    data = get_data(fold)
    learn = unet_learner(data, models.resnet34, metrics=[dice])
    print(mode)
    if mode is not None:
        learn.loss_func = criterion_pixel
    else:
        mode = 'bce'
    
    epochpost=12
    
    learn.clip_grad(1.0);
    
    #load the weights saved by the previous phase (the frozen-encoder fit is skipped here)
    lr = 7e-3
    epochpre = 6
    learn.load('resnet34_imsizefixed_{}_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))
#     learn.fit_one_cycle(epochpre, lr, callbacks = [AccumulateStep(learn,n_acc)])
    
    #fit entire model with saving on the best epoch
    learn.unfreeze()
#     learn.fit_one_cycle(12, slice(lr/80, lr/2), callbacks = [AccumulateStep(learn,n_acc)])
    learn.fit_one_cycle(epochpost, slice(lr/80, lr/2), callbacks = [
        AccumulateStep(learn,n_acc), 
        SaveModelCallback(learn, 
                          monitor='dice', 
                          mode='max',
                          name='resnet34_imsizefixed_finetune_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))])
    learn.save('resnet34_imsizefixed_finetune_{}_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost));
    if False:
        #prediction on val and test sets
        preds, ys = pred_with_flip(learn)
        pt, _ = pred_with_flip(learn,DatasetType.Test)

        if fold == 0: preds_test = pt
        else: preds_test += pt

        #remove noise
        preds[preds.view(preds.shape[0],-1).sum(-1) < noise_th,...] = 0.0

        #optimal threshold 
        #the best way would be to collect all oof predictions and then compute a single threshold
        #however, it requires too much RAM for high image resolution
        dices = []
        thrs = np.arange(0.01, 1, 0.01)
        for th in progress_bar(thrs):
            preds_m = (preds>th).long()
            dices.append(dice_overall(preds_m, ys).mean())
        dices = np.array(dices)    
        scores.append(dices.max())
        best_thrs.append(thrs[dices.argmax()])

        if fold != nfolds-1: del preds, ys
    gc.collect()
    torch.cuda.empty_cache()
    
# preds_test /= nfolds

In [None]:
# ph3
scores, best_thrs = [],[]
# first phase for bce then lovasz??
fold = 0
if True:
# for fold in range(nfolds):
    print('fold: ', fold)
    data = get_data(fold)
    learn = unet_learner(data, models.resnet34, metrics=[dice])
    print(mode)
    if mode is not None:
        learn.loss_func = criterion_pixel
    else:
        mode = 'bce'
    
    epochpost=12
    
    learn.clip_grad(1.0);
    
    #load the weights saved by the previous phase (the frozen-encoder fit is skipped here)
    lr = 7e-3
    epochpre = 6
    learn.load('resnet34_imsizefixed_finetune_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))
#     learn.fit_one_cycle(epochpre, lr, callbacks = [AccumulateStep(learn,n_acc)])
    
    #fit entire model with saving on the best epoch
    learn.unfreeze()
    
    
#     learn.fit_one_cycle(12, slice(lr/80, lr/2), callbacks = [AccumulateStep(learn,n_acc)])
    learn.fit_one_cycle(epochpost, slice(lr/80, lr/4), callbacks = [
        AccumulateStep(learn,n_acc), 
        SaveModelCallback(learn, 
                          monitor='dice', 
                          mode='max',
                          name='resnet34_imsizefixed_finetune_0.25_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))])
    learn.save('resnet34_imsizefixed_finetune_0.25_{}_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost));
    if False:
        #prediction on val and test sets
        preds, ys = pred_with_flip(learn)
        pt, _ = pred_with_flip(learn,DatasetType.Test)

        if fold == 0: preds_test = pt
        else: preds_test += pt

        #remove noise
        preds[preds.view(preds.shape[0],-1).sum(-1) < noise_th,...] = 0.0

        #optimal threshold 
        #the best way would be to collect all oof predictions and then compute a single threshold
        #however, it requires too much RAM for high image resolution
        dices = []
        thrs = np.arange(0.01, 1, 0.01)
        for th in progress_bar(thrs):
            preds_m = (preds>th).long()
            dices.append(dice_overall(preds_m, ys).mean())
        dices = np.array(dices)    
        scores.append(dices.max())
        best_thrs.append(thrs[dices.argmax()])

        if fold != nfolds-1: del preds, ys
    gc.collect()
    torch.cuda.empty_cache()
    
# preds_test /= nfolds

In [None]:
# ph4
scores, best_thrs = [],[]
# first phase for bce then lovasz??
fold = 0
if True:
# for fold in range(nfolds):
    print('fold: ', fold)
    data = get_data(fold)
    learn = unet_learner(data, models.resnet34, metrics=[dice])
    print(mode)
    if mode is not None:
        learn.loss_func = criterion_pixel
    else:
        mode = 'bce'
    
    epochpost=12
    
    learn.clip_grad(1.0);
    
    #load the weights saved by the previous phase (the frozen-encoder fit is skipped here)
    lr = 7e-3
    epochpre = 6
    learn.load('resnet34_imsizefixed_finetune_0.25_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))
#     learn.fit_one_cycle(epochpre, lr, callbacks = [AccumulateStep(learn,n_acc)])
    
    #fit entire model with saving on the best epoch
    learn.unfreeze()
#     learn.fit_one_cycle(12, slice(lr/80, lr/2), callbacks = [AccumulateStep(learn,n_acc)])
    learn.fit_one_cycle(epochpost, slice(lr/80, lr/8), callbacks = [
        AccumulateStep(learn,n_acc), 
        SaveModelCallback(learn, 
                          monitor='dice', 
                          mode='max',
                          name='resnet34_imsizefixed_finetune_0.125_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))])
    learn.save('resnet34_imsizefixed_finetune_0.125_{}_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost));
    if False:
        #prediction on val and test sets
        preds, ys = pred_with_flip(learn)
        pt, _ = pred_with_flip(learn,DatasetType.Test)

        if fold == 0: preds_test = pt
        else: preds_test += pt

        #remove noise
        preds[preds.view(preds.shape[0],-1).sum(-1) < noise_th,...] = 0.0

        #optimal threshold 
        #the best way would be to collect all oof predictions and then compute a single threshold
        #however, it requires too much RAM for high image resolution
        dices = []
        thrs = np.arange(0.01, 1, 0.01)
        for th in progress_bar(thrs):
            preds_m = (preds>th).long()
            dices.append(dice_overall(preds_m, ys).mean())
        dices = np.array(dices)    
        scores.append(dices.max())
        best_thrs.append(thrs[dices.argmax()])

        if fold != nfolds-1: del preds, ys
    gc.collect()
    torch.cuda.empty_cache()
    
# preds_test /= nfolds

In [None]:
learn.recorder.metrics

In [None]:
# compute preds_test probabilities for the submission
scores, best_thrs = [],[]
# if True:
for fold in range(nfolds):
    print('fold: ', fold)
    data = get_data(fold)
    learn = unet_learner(data, models.resnet34, metrics=[dice])
    print(mode)
    if mode is not None:
        learn.loss_func = criterion_pixel
    else:
        mode = 'bce'
    
    epochpost=12
    
    learn.clip_grad(1.0);
    
    #load the weights saved by the last fine-tuning phase (no fitting is done here)
    lr = 7e-3
    epochpre = 6
    learn.load('resnet34_imsizefixed_finetune_0.125_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))
#     learn.fit_one_cycle(epochpre, lr, callbacks = [AccumulateStep(learn,n_acc)])
    if False:
        #fit entire model with saving on the best epoch
        learn.unfreeze()
    #     learn.fit_one_cycle(12, slice(lr/80, lr/2), callbacks = [AccumulateStep(learn,n_acc)])
        learn.fit_one_cycle(epochpost, slice(lr/80, lr/2), callbacks = [
            AccumulateStep(learn,n_acc), 
            SaveModelCallback(learn, 
                              monitor='dice', 
                              mode='max',
                              name='resnet34_imsizefixed_{}_best_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost))])
        learn.save('resnet34_imsizefixed_{}_img{}_fold{}_lr{}_epochpre{}_epochpost{}'.format(mode,sz,fold,lr,epochpre,epochpost));

        #prediction on val and test sets
        preds, ys = pred_with_flip(learn)
    if True:
        pt, _ = pred_with_flip(learn,DatasetType.Test)

        if fold == 0: preds_test = pt
        else: preds_test += pt
    if False:
        #remove noise
        preds[preds.view(preds.shape[0],-1).sum(-1) < noise_th,...] = 0.0

        #optimal threshold 
        #the best way would be collecting all oof predictions followed by single threshold calculation
        #however, it requires too much RAM for high image resolution
        dices = []
        thrs = np.arange(0.01, 1, 0.01)
        for th in progress_bar(thrs):
            preds_m = (preds>th).long()
            dices.append(dice_overall(preds_m, ys).mean())
        dices = np.array(dices)    
        scores.append(dices.max())
        best_thrs.append(thrs[dices.argmax()])

        if fold != nfolds-1: del preds, ys
    gc.collect()
    torch.cuda.empty_cache()
    
preds_test /= nfolds
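
The accumulation pattern above (sum the per-fold test predictions, then divide by `nfolds`) combined with horizontal-flip TTA can be sketched as follows. This is an illustration only: I assume `pred_with_flip` averages the prediction on the original batch with the flipped-back prediction on the mirrored batch, and `predict_fn` is a hypothetical callable standing in for a trained learner.

```python
import numpy as np

def predict_with_hflip(predict_fn, images):
    """Average predictions on the original and on the horizontally flipped batch."""
    plain = predict_fn(images)
    # flip the input along the width axis, predict, then flip the output back
    flipped = predict_fn(images[..., ::-1])[..., ::-1]
    return 0.5 * (plain + flipped)

def average_folds(fold_predict_fns, images):
    """Accumulate TTA predictions fold by fold and divide by the number of folds."""
    total = None
    for fn in fold_predict_fns:
        pt = predict_with_hflip(fn, images)
        total = pt if total is None else total + pt
    return total / len(fold_predict_fns)

# toy "models": each one scales the image by a column-dependent weight
w = np.linspace(0.0, 1.0, 4)
fold_fns = [lambda x: x * w, lambda x: x * w[::-1]]
images = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
preds_avg = average_folds(fold_fns, images)
print(preds_avg.shape)
```

Keeping a running sum instead of a list of per-fold tensors means only one extra prediction array is held in memory at a time, which matters at high image resolution.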

In [None]:
if False:
    torch.save(preds_test, 'preds_test_resnet34_imsizefixed_finetune_0.125_{}_best_img{}_all_lr{}_epochpre{}_epochpost{}'.format(mode,sz,lr,epochpre,epochpost))
# preds_test = torch.load('preds_test_resnet34_imsizefixed_finetune_0.125_{}_best_img{}_all_lr{}_epochpre{}_epochpost{}'.format(mode,sz,lr,epochpre,epochpost))

In [None]:
# for submit and postprocessing
import cv2
sz = 1024
noise_th = noise_th_multiplier*(sz/128.0)**2 #threshold for the number of predicted pixels
best_thr = 0.21
cell_prob_thr = 0.35
# Generate rle encodings (images are first converted to the original size)
preds_t = (preds_test>best_thr).long().numpy()
rles = []
all_probs = []
for i,p in enumerate(preds_t):
    if(p.sum() > 0):
        pred_image = preds_test[i].numpy().T
        im = PIL.Image.fromarray((p.T*255).astype(np.uint8)).resize((1024,1024))
        im = np.asarray(im)
        num_component, component = cv2.connectedComponents(im)
        im_temp = np.zeros((1024,1024),np.float32)
        num = 0
        
        cell_probs = []
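        for c in range(1, num_component):
            comp_mask = (component == c)  # pixels of the c-th connected component
            each_probs = np.mean(pred_image[comp_mask])
            cell_probs.append(each_probs)

            print('cell probs {}'.format(each_probs))
            # keep the component only if its mean probability is high enough
            if each_probs > cell_prob_thr:
                im_temp[comp_mask] = 255
                num += 1
            else:
                print('drop {}'.format(comp_mask.sum()))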
        all_probs.append(cell_probs)
        rles.append(mask2rle(im_temp, 1024, 1024))
        
    else: rles.append('-1')
    
sub_df = pd.DataFrame({'ImageId': ids_test, 'EncodedPixels': rles})
sub_df.to_csv('submission_probpost{}_singlemask_best0.8729_without_noiseremoval_correct.csv'.format(cell_prob_thr), index=False)
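
The per-component filtering used in the cell above can be checked on a small probability map. This sketch swaps in `scipy.ndimage.label` with an 8-connected structuring element in place of `cv2.connectedComponents`, and skips the resize to 1024x1024. The function name `filter_components` and the toy map are hypothetical; the default thresholds mirror `best_thr` and `cell_prob_thr`.

```python
import numpy as np
from scipy import ndimage

def filter_components(prob_map, mask_thr=0.21, comp_prob_thr=0.35):
    """Binarize a probability map, then drop components with low mean probability."""
    binary = prob_map > mask_thr
    # 3x3 structure of ones gives 8-connectivity
    labels, n = ndimage.label(binary, structure=np.ones((3, 3), dtype=int))
    kept = np.zeros_like(binary)
    comp_probs = []
    for c in range(1, n + 1):
        comp = labels == c
        mean_prob = float(prob_map[comp].mean())
        comp_probs.append(mean_prob)
        if mean_prob > comp_prob_thr:
            kept |= comp
    return kept, comp_probs

# toy map: a confident 2x2 blob and an isolated low-confidence pixel
prob_map = np.zeros((5, 5))
prob_map[0:2, 0:2] = 0.9
prob_map[4, 4] = 0.25
kept, comp_probs = filter_components(prob_map)
print(kept.sum(), comp_probs)
```

The isolated pixel passes the pixel-level threshold but is removed because its mean probability falls below the component-level threshold, which is the effect the postprocessing step aims for.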

In [None]:
# ======dgx01 bce+dice
# fold0
# [[tensor(0.7610)],
#  [tensor(0.8041)],
#  [tensor(0.8097)],
#  [tensor(0.7132)],
#  [tensor(0.7936)],
#  [tensor(0.8077)],
#  [tensor(0.8111)],
#  [tensor(0.8253)],
#  [tensor(0.8110)],
#  [tensor(0.8325)],
#  [tensor(0.8410)],
#  [tensor(0.8405)]]

# fold0 ph2
# [[tensor(0.8200)],
#  [tensor(0.7529)],
#  [tensor(0.8317)],
#  [tensor(0.8283)],
#  [tensor(0.8111)],
#  [tensor(0.8371)],
#  [tensor(0.8340)],
#  [tensor(0.8387)],
#  [tensor(0.8224)],
#  [tensor(0.8353)],
#  [tensor(0.8444)],
#  [tensor(0.8385)]]

# fold0 ph3
# [[tensor(0.8403)],
#  [tensor(0.8375)],
#  [tensor(0.8047)],
#  [tensor(0.7944)],
#  [tensor(0.8317)],
#  [tensor(0.8130)],
#  [tensor(0.8389)],
#  [tensor(0.8421)],
#  [tensor(0.8400)],
#  [tensor(0.8402)],
#  [tensor(0.8451)],
#  [tensor(0.8448)]]

# fold0 ph4
# [[tensor(0.8448)],
#  [tensor(0.8384)],
#  [tensor(0.8219)],
#  [tensor(0.8246)],
#  [tensor(0.8281)],
#  [tensor(0.8332)],
#  [tensor(0.8408)],
#  [tensor(0.8447)],
#  [tensor(0.8437)],
#  [tensor(0.8442)],
#  [tensor(0.8433)],
#  [tensor(0.8420)]]

# fold0 ph5 0.8453
# [[tensor(0.8453)],
#  [tensor(0.8339)],
#  [tensor(0.8289)],
#  [tensor(0.8268)],
#  [tensor(0.8417)],
#  [tensor(0.8382)],
#  [tensor(0.8432)],
#  [tensor(0.8388)],
#  [tensor(0.8426)],
#  [tensor(0.8427)],
#  [tensor(0.8444)],
#  [tensor(0.8447)]]

# fold0 ph6 0.8451
# [[tensor(0.8398)],
#  [tensor(0.8412)],
#  [tensor(0.8351)],
#  [tensor(0.8346)],
#  [tensor(0.8438)],
#  [tensor(0.8391)],
#  [tensor(0.8430)],
#  [tensor(0.8423)],
#  [tensor(0.8451)],
#  [tensor(0.8428)],
#  [tensor(0.8448)],
#  [tensor(0.8440)]]

# fold1
# [[tensor(0.7996)],
#  [tensor(0.8182)],
#  [tensor(0.7909)],
#  [tensor(0.7418)],
#  [tensor(0.8085)],
#  [tensor(0.8155)],
#  [tensor(0.8028)],
#  [tensor(0.8173)],
#  [tensor(0.8403)],
#  [tensor(0.8244)],
#  [tensor(0.8425)],
#  [tensor(0.8436)]]

# fold1 ph2
# [[tensor(0.8474)],
#  [tensor(0.8165)],
#  [tensor(0.8410)],
#  [tensor(0.8238)],
#  [tensor(0.8356)],
#  [tensor(0.8425)],
#  [tensor(0.8482)],
#  [tensor(0.8215)],
#  [tensor(0.8382)],
#  [tensor(0.8492)],
#  [tensor(0.8492)],
#  [tensor(0.8510)]]

# fold1 ph3
# [[tensor(0.8461)],
#  [tensor(0.8470)],
#  [tensor(0.8275)],
#  [tensor(0.8319)],
#  [tensor(0.8418)],
#  [tensor(0.8248)],
#  [tensor(0.8291)],
#  [tensor(0.8524)],
#  [tensor(0.8506)],
#  [tensor(0.8439)],
#  [tensor(0.8526)],
#  [tensor(0.8527)]]

# fold1 ph4
# [[tensor(0.8517)],
#  [tensor(0.8512)],
#  [tensor(0.8448)],
#  [tensor(0.8524)],
#  [tensor(0.8482)],
#  [tensor(0.8367)],
#  [tensor(0.8540)],
#  [tensor(0.8519)],
#  [tensor(0.8509)],
#  [tensor(0.8552)],
#  [tensor(0.8544)],
#  [tensor(0.8537)]]

# fold1 ph5
# [[tensor(0.8536)],
#  [tensor(0.8385)],
#  [tensor(0.8545)],
#  [tensor(0.8497)],
#  [tensor(0.8496)],
#  [tensor(0.8483)],
#  [tensor(0.8480)],
#  [tensor(0.8496)],
#  [tensor(0.8518)],
#  [tensor(0.8554)],
#  [tensor(0.8553)],
#  [tensor(0.8537)]]

# fold2
# [[tensor(0.8269)],
#  [tensor(0.7563)],
#  [tensor(0.7875)],
#  [tensor(0.8114)],
#  [tensor(0.7880)],
#  [tensor(0.8278)],
#  [tensor(0.8184)],
#  [tensor(0.8305)],
#  [tensor(0.8379)],
#  [tensor(0.8347)],
#  [tensor(0.8409)],
#  [tensor(0.8393)]]

# fold2 ph2
# [[tensor(0.8361)],
#  [tensor(0.8081)],
#  [tensor(0.8213)],
#  [tensor(0.8044)],
#  [tensor(0.8368)],
#  [tensor(0.8188)],
#  [tensor(0.8359)],
#  [tensor(0.8319)],
#  [tensor(0.8367)],
#  [tensor(0.8410)],
#  [tensor(0.8401)],
#  [tensor(0.8415)]]

# fold2 ph3
# [[tensor(0.8411)],
#  [tensor(0.8151)],
#  [tensor(0.8408)],
#  [tensor(0.8419)],
#  [tensor(0.8409)],
#  [tensor(0.8407)],
#  [tensor(0.8375)],
#  [tensor(0.8343)],
#  [tensor(0.8399)],
#  [tensor(0.8429)],
#  [tensor(0.8410)],
#  [tensor(0.8424)]]

# fold2 ph4
# [[tensor(0.8432)],
#  [tensor(0.8428)],
#  [tensor(0.8133)],
#  [tensor(0.8381)],
#  [tensor(0.8412)],
#  [tensor(0.8365)],
#  [tensor(0.8448)],
#  [tensor(0.8374)],
#  [tensor(0.8430)],
#  [tensor(0.8452)],
#  [tensor(0.8417)],
#  [tensor(0.8451)]]

# fold2 ph5
# [[tensor(0.8452)],
#  [tensor(0.8385)],
#  [tensor(0.8380)],
#  [tensor(0.8391)],
#  [tensor(0.8430)],
#  [tensor(0.8331)],
#  [tensor(0.8442)],
#  [tensor(0.8430)],
#  [tensor(0.8444)],
#  [tensor(0.8433)],
#  [tensor(0.8449)],
#  [tensor(0.8448)]]

# fold3
# [[tensor(0.8108)],
#  [tensor(0.7294)],
#  [tensor(0.8016)],
#  [tensor(0.8030)],
#  [tensor(0.8100)],
#  [tensor(0.7928)],
#  [tensor(0.8057)],
#  [tensor(0.8155)],
#  [tensor(0.8287)],
#  [tensor(0.8241)],
#  [tensor(0.8257)],
#  [tensor(0.8202)]]

# fold3 ph2
# [[tensor(0.8275)],
#  [tensor(0.8160)],
#  [tensor(0.8226)],
#  [tensor(0.8029)],
#  [tensor(0.7808)],
#  [tensor(0.8048)],
#  [tensor(0.8250)],
#  [tensor(0.8262)],
#  [tensor(0.8279)],
#  [tensor(0.8321)],
#  [tensor(0.8344)],
#  [tensor(0.8348)]]

# fold3 ph3
# [[tensor(0.8338)],
#  [tensor(0.8211)],
#  [tensor(0.8198)],
#  [tensor(0.8046)],
#  [tensor(0.8250)],
#  [tensor(0.8252)],
#  [tensor(0.8140)],
#  [tensor(0.8227)],
#  [tensor(0.8303)],
#  [tensor(0.8326)],
#  [tensor(0.8336)],
#  [tensor(0.8318)]]

# fold3 ph4
# [[tensor(0.8298)],
#  [tensor(0.8266)],
#  [tensor(0.8264)],
#  [tensor(0.7987)],
#  [tensor(0.8187)],
#  [tensor(0.8204)],
#  [tensor(0.8256)],
#  [tensor(0.8313)],
#  [tensor(0.8289)],
#  [tensor(0.8313)],
#  [tensor(0.8330)],
#  [tensor(0.8338)]]

# fold3 ph5
# [[tensor(0.8344)],
#  [tensor(0.8274)],
#  [tensor(0.8188)],
#  [tensor(0.8253)],
#  [tensor(0.8288)],
#  [tensor(0.8286)],
#  [tensor(0.8355)],
#  [tensor(0.8326)],
#  [tensor(0.8348)],
#  [tensor(0.8340)],
#  [tensor(0.8353)],
#  [tensor(0.8333)]]

# fold3 ph6
# [[tensor(0.8337)],
#  [tensor(0.8282)],
#  [tensor(0.8276)],
#  [tensor(0.8180)],
#  [tensor(0.8307)],
#  [tensor(0.8309)],
#  [tensor(0.8319)],
#  [tensor(0.8292)],
#  [tensor(0.8302)],
#  [tensor(0.8328)],
#  [tensor(0.8314)],
#  [tensor(0.8315)]]

In [None]:
# ======bce+dice resnet50 3e-4
# fold 0 0.8277
# [[tensor(0.7856)],
#  [tensor(0.7795)],
#  [tensor(0.7898)],
#  [tensor(0.7569)],
#  [tensor(0.8013)],
#  [tensor(0.8114)],
#  [tensor(0.8205)],
#  [tensor(0.8277)],
#  [tensor(0.8241)],
#  [tensor(0.8254)],
#  [tensor(0.8259)],
#  [tensor(0.8224)]]

# fold 0 ph2 0.8300
# epoch	train_loss	valid_loss	dice	time
# 0	0.004460	0.005886	0.821651	51:52
# 1	0.005969	0.005893	0.801328	51:46
# 2	0.007122	0.006117	0.804395	51:43
# 3	0.006183	0.006231	0.812023	51:40
# 4	0.005936	0.006027	0.811091	51:38
# 5	0.003460	0.006016	0.782531	51:38
# 6	0.004310	0.005997	0.804455	51:17
# 7	0.003708	0.006566	0.819687	51:17
# 8	0.003695	0.006318	0.824498	51:17
# 9	0.003874	0.006612	0.821759	51:19
# 10	0.002778	0.006484	0.822675	51:18
# 11	0.002100	0.006779	0.830086	51:18
# Better model found at epoch 0 with dice value: 0.8216508626937866.
# Better model found at epoch 8 with dice value: 0.8244979977607727.
# Better model found at epoch 11 with dice value: 0.8300861120223999

# fold1 0.8309
# [[tensor(0.7777)],
#  [tensor(0.7778)],
#  [tensor(0.7457)],
#  [tensor(0.8054)],
#  [tensor(0.8236)],
#  [tensor(0.7926)],
#  [tensor(0.8081)],
#  [tensor(0.8243)],
#  [tensor(0.8281)],
#  [tensor(0.8304)],
#  [tensor(0.8269)],
#  [tensor(0.8309)]]

# fold1 ph2 0.8346
# [[tensor(0.8257)],
#  [tensor(0.8350)],
#  [tensor(0.8087)],
#  [tensor(0.8362)],
#  [tensor(0.8161)],
#  [tensor(0.8340)],
#  [tensor(0.8198)],
#  [tensor(0.8336)],
#  [tensor(0.8326)],
#  [tensor(0.8346)],
#  [tensor(0.8343)],
#  [tensor(0.8325)]]

# fold2 0.8271
# [[tensor(0.7998)],
#  [tensor(0.7681)],
#  [tensor(0.8180)],
#  [tensor(0.8195)],
#  [tensor(0.7847)],
#  [tensor(0.8117)],
#  [tensor(0.7417)],
#  [tensor(0.8149)],
#  [tensor(0.8271)],
#  [tensor(0.8257)],
#  [tensor(0.8244)],
#  [tensor(0.8193)]]

# fold3 0.8178
# [[tensor(0.7780)],
#  [tensor(0.7779)],
#  [tensor(0.7817)],
#  [tensor(0.7963)],
#  [tensor(0.7897)],
#  [tensor(0.7657)],
#  [tensor(0.8178)],
#  [tensor(0.8102)],
#  [tensor(0.8022)],
#  [tensor(0.8110)],
#  [tensor(0.8114)],
#  [tensor(0.8144)]]

In [None]:
fold 0
epoch	train_loss	valid_loss	dice	time
0	0.005329	0.008973	0.820277	1:30:01
1	0.004792	0.009235	0.824711	1:29:40
2	0.004611	0.010663	0.825052	1:29:36
3	0.004171	0.011810	0.812811	1:29:34
4	0.003634	0.009718	0.824433	1:29:29
5	0.005561	0.010448	0.813646	1:29:27
6	0.004434	0.008877	0.822849	1:29:31
7	0.004877	0.009485	0.822946	1:29:29
8	0.002459	0.010549	0.820190	1:29:31
9	0.003624	0.009636	0.822931	1:29:29
10	0.003978	0.011270	0.822492	1:29:30
11	0.002737	0.011042	0.822096	1:29:34
Better model found at epoch 0 with dice value: 0.8202774524688721.
Better model found at epoch 1 with dice value: 0.8247113227844238.
Better model found at epoch 2 with dice value: 0.8250521421432495.


fold:  1

epoch	train_loss	valid_loss	dice	time
0	0.007127	0.008310	0.836103	1:30:32
1	0.004536	0.009290	0.831446	1:29:56
2	0.004288	0.009331	0.831958	1:29:53
3	0.004563	0.009137	0.833080	1:29:45
4	0.004158	0.010182	0.832579	1:29:41
5	0.003921	0.009543	0.833295	1:29:43
6	0.004194	0.009401	0.835947	1:29:44
7	0.004584	0.009832	0.837901	1:29:45
8	0.004033	0.010654	0.831863	1:29:45
9	0.004907	0.010494	0.835302	1:29:44
10	0.002935	0.010810	0.831416	1:29:44
11	0.002860	0.010834	0.831417	1:29:42
Better model found at epoch 0 with dice value: 0.8361029624938965.
Better model found at epoch 7 with dice value: 0.8379009366035461.

fold:  2
epoch	train_loss	valid_loss	dice	time
0	0.014262	0.014819	0.687674	1:22:04
1	0.011007	0.009866	0.806199	1:21:13
2	0.012556	0.010506	0.805177	1:20:59
3	0.010263	0.008242	0.813777	1:20:57
4	0.006284	0.007857	0.818966	1:21:05
5	0.009344	0.007730	0.818302	1:21:46
epoch	train_loss	valid_loss	dice	time
0	0.009150	0.008387	0.822880	1:29:24
1	0.007065	0.007723	0.819144	1:29:20
2	0.004986	0.009361	0.814605	1:29:31
3	0.007759	0.008976	0.820553	1:29:37
4	0.008809	0.008335	0.822547	1:28:23
5	0.006168	0.008478	0.823449	1:28:11
6	0.005340	0.008670	0.822819	1:28:15
7	0.004951	0.007760	0.822989	1:28:07
8	0.004814	0.007861	0.827206	1:28:12
9	0.009067	0.008105	0.827544	1:28:07
10	0.005683	0.008157	0.827294	1:28:08
11	0.005105	0.008057	0.823705	1:27:51
Better model found at epoch 0 with dice value: 0.8228797912597656.
Better model found at epoch 5 with dice value: 0.8234494924545288.
Better model found at epoch 8 with dice value: 0.8272055983543396.
Better model found at epoch 9 with dice value: 0.8275436758995056.
fold:  3
epoch	train_loss	valid_loss	dice	time
0	0.026892	0.014963	0.770922	1:20:51
1	0.013744	0.011118	0.786418	1:20:20
2	0.007804	0.011807	0.792785	1:20:09
3	0.006909	0.009720	0.807148	1:20:02
4	0.011643	0.009819	0.808711	1:20:11
5	0.005560	0.009124	0.808388	1:21:24
epoch	train_loss	valid_loss	dice	time
0	0.006412	0.009343	0.812534	1:29:12
1	0.005806	0.009915	0.804407	1:29:03
2	0.007079	0.011265	0.800056	1:29:24
3	0.008566	0.009434	0.805174	1:29:20
4	0.006620	0.010455	0.807575	1:28:20
5	0.006316	0.010035	0.805124	1:28:00
6	0.005538	0.009238	0.814438	1:28:01
7	0.007481	0.009090	0.810495	1:27:56
8	0.004882	0.009584	0.815818	1:27:58
9	0.003599	0.009886	0.813863	1:27:56
10	0.004064	0.009721	0.815348	1:27:52
11	0.004554	0.009581	0.818450	1:27:44
Better model found at epoch 0 with dice value: 0.8125340342521667.
Better model found at epoch 6 with dice value: 0.8144382238388062.
Better model found at epoch 8 with dice value: 0.8158184885978699.
Better model found at epoch 11 with dice value: 0.8184497952461243.
-------------------512 finetune 0.25 RESNEXT101-32X8D
fold:  0
epoch	train_loss	valid_loss	dice	time
0	0.005299	0.008649	0.825610	1:30:29
1	0.004567	0.008544	0.826394	1:29:59
2	0.003797	0.009225	0.822685	1:29:55
3	0.003726	0.009869	0.821343	1:29:49
4	0.003118	0.008614	0.828068	1:29:44
5	0.004983	0.009363	0.822326	1:29:49
6	0.003822	0.009243	0.824851	1:29:50
7	0.004120	0.009700	0.821603	1:29:48
8	0.002501	0.010308	0.820887	1:29:50
9	0.003941	0.010059	0.822769	1:29:47
10	0.004392	0.010589	0.821599	1:29:49
11	0.003021	0.010419	0.822123	1:29:49
Better model found at epoch 0 with dice value: 0.8256103992462158.
Better model found at epoch 1 with dice value: 0.8263944387435913.
Better model found at epoch 4 with dice value: 0.8280683159828186.



fold:  2
epoch	train_loss	valid_loss	dice	time
0	0.005731	0.008329	0.828296	1:30:30
1	0.004366	0.008295	0.825781	1:29:55
2	0.005894	0.009523	0.823959	1:29:50
3	0.004642	0.010283	0.820261	1:29:43
4	0.003910	0.009004	0.824968	1:29:39
5	0.005872	0.008353	0.831194	1:29:42
6	0.004779	0.009164	0.827115	1:29:48
7	0.003729	0.009589	0.827741	1:29:42
8	0.002382	0.010495	0.825872	1:29:42
9	0.003647	0.009949	0.827074	1:29:42
10	0.003826	0.010135	0.827529	1:29:39
11	0.003138	0.010171	0.827706	1:29:43
Better model found at epoch 0 with dice value: 0.8282960653305054.
Better model found at epoch 5 with dice value: 0.8311936259269714.
fold3
epoch	train_loss	valid_loss	dice	time
0	0.005510	0.009959	0.814834	1:30:27
1	0.004696	0.009779	0.815714	1:29:54
2	0.004126	0.009552	0.818221	1:29:51
3	0.004121	0.010156	0.817459	1:29:44
4	0.004757	0.010107	0.818174	1:29:38
5	0.003463	0.009922	0.815573	1:29:41
6	0.004251	0.010353	0.814511	1:29:43
7	0.003629	0.010751	0.815229	1:29:42
8	0.003579	0.011269	0.812502	1:29:44
9	0.004614	0.011579	0.811707	1:29:42
10	0.003638	0.010986	0.816639	1:29:42
11	0.004797	0.011186	0.818167	1:29:40
Better model found at epoch 0 with dice value: 0.8148338198661804.
Better model found at epoch 1 with dice value: 0.8157141804695129.
Better model found at epoch 2 with dice value: 0.8182209134101868.




fold:  3
epoch	train_loss	valid_loss	dice	time
0	0.005536	0.010181	0.815544	1:29:43
1	0.005007	0.010228	0.813226	1:29:43
2	0.004829	0.009476	0.805355	1:29:38
3	0.005007	0.010039	0.811850	1:29:34
4	0.005185	0.011137	0.808189	1:29:30
5	0.003862	0.010142	0.815380	1:29:32
6	0.004465	0.009667	0.816439	1:29:30
7	0.004076	0.012367	0.808578	1:29:27
8	0.003637	0.010428	0.818723	1:29:32
9	0.004116	0.011323	0.813577	1:29:30
10	0.003371	0.011082	0.817705	1:29:28
11	0.004306	0.011553	0.816912	1:29:37
Better model found at epoch 0 with dice value: 0.8155437707901001.
Better model found at epoch 6 with dice value: 0.8164392709732056.
Better model found at epoch 8 with dice value: 0.8187227249145508.