# PANDA EfficientNet-B0 Baseline with 36 x tiles_256

Hi everyone,

This notebook shows how to train a single EfficientNet-B0 model that reaches LB 0.87.

The inference kernel is at https://www.kaggle.com/haqishen/panda-inference-w-36-tiles-256

If you find any of the following ideas helpful, please upvote. Thanks!

# Summary of This Baseline

* Tiling method based on https://www.kaggle.com/iafoss/panda-16x128x128-tiles
    * Simply set `N = 36` and `sz = 256`, then extract tiles from the medium-resolution level of each slide
* Create 6x6 big image from 36 tiles
* Efficientnet-B0
* Binning label
    * E.g.
        * `label = [0,0,0,0,0]` means `isup_grade = 0`
        * `label = [1,1,1,0,0]` means `isup_grade = 3`
        * `label = [1,1,1,1,1]` means `isup_grade = 5`
* BCE loss
* Augmentation on both tile level and big image level
* CosineAnnealingLR for one round
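
The binned label in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the exact training code: the helper names `encode_isup` and `decode_logits` are made up for this example, and the decoding step mirrors the `sigmoid().sum(1).round()` rule used later in the validation loop.

```python
import numpy as np

def encode_isup(grade, n_bins=5):
    """Turn an ISUP grade (0..5) into a cumulative binary label of length n_bins."""
    label = np.zeros(n_bins, dtype=np.float32)
    label[:grade] = 1.0  # the first `grade` bins are switched on
    return label

def decode_logits(logits):
    """Recover a grade by summing the per-bin sigmoid probabilities and rounding."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return int(np.round(probs.sum()))

print(encode_isup(3))                                # [1. 1. 1. 0. 0.]
print(decode_logits([4.0, 3.0, 2.5, -3.0, -4.0]))    # 3
```

Because each bin is an independent binary target, BCE loss applies directly, and a prediction that is off by one grade only gets one bin wrong.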

# MEMO

The full training process needs over `10h` to run, so you should run it on your own machine.

# Update
* Version 1
    * Baseline
* Version 2, 3
    * Add some Markdown Text
* Version 4
    * Fix `init_lr` from 3e-5 to 3e-4
* Version 5
    * Add warmup scheduler
    * Add training log for this version
* Version 6
    * Fix a bug that caused the model to train from scratch. It now trains from ImageNet-pretrained weights. I haven't actually tried training from scratch yet.
* Version 7, 8
    * Update the accuracy calculation.
    * Fix a tiny bug.

In [1]:
DEBUG = False

In [2]:
#!pip install git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git

In [3]:
#!pip install efficientnet_pytorch

In [4]:
import os
from pathlib import Path
import sys
# sys.path = [
#     '../input/efficientnet-pytorch/EfficientNet-PyTorch/EfficientNet-PyTorch-master',
# ] + sys.path

In [1]:
import time
import skimage.io
import numpy as np
import pandas as pd
import cv2
import PIL
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.sampler import SubsetRandomSampler, RandomSampler, SequentialSampler
from torchvision import transforms
from warmup_scheduler import GradualWarmupScheduler
from radam import *
from csvlogger import *
from mish_activation import *
from efficientnet_pytorch import EfficientNet
import albumentations
from sklearn.model_selection import StratifiedKFold
import matplotlib.pyplot as plt
from sklearn.metrics import cohen_kappa_score
from tqdm import tqdm

In [2]:
# def load_image(fn, mode=None, **kwargs):
#     "Open and load a `PIL.Image` and convert to `mode`"
#     im = PIL.Image.open(fn, **kwargs)
#     im.load()
#     im = im._new(im.im)
#     return im.convert(mode) if mode else im

# # Cell
# def image2tensor(img):
#     "Transform image to byte tensor in `c*h*w` dim order."
#     res = tensor(img)
#     if res.dim()==2: res = res.unsqueeze(-1)
#     return res.permute(2,0,1)

# def pil2numpy(image,dtype:np.dtype):
#     "Convert PIL style `image` array to torch style image tensor."
#     a = np.asarray(image)
# #     if a.ndim==2 : a = np.expand_dims(a,2)
# #     a = np.transpose(a, (1, 0, 2))
# #     a = np.transpose(a, (2, 1, 0))
#     return a.astype(dtype, copy=False)

In [3]:
# def pil2tensor(image,dtype:np.dtype):
#     "Convert PIL style `image` array to torch style image tensor."
#     a = np.asarray(image)
#     if a.ndim==2 : a = np.expand_dims(a,2)
#     a = np.transpose(a, (1, 0, 2))
#     a = np.transpose(a, (2, 1, 0))
#     return torch.from_numpy(a.astype(dtype, copy=False) )

# def open_image(fn, div=False, convert_mode='RGB'):
#     x = load_image(fn, mode=convert_mode)
#     x = pil2numpy(x,np.float32)
#     if div: 
#         #x.div_(255)
#         x = x/255.0
#     #return 1.0 - x #invert image for zero padding
#     return x

# Config

In [4]:
data_dir = Path('/content/clouderizer/panda/data')
TRAIN = '/content/clouderizer/panda/data/train/'
df_train = pd.read_csv(os.path.join(data_dir, 'train.csv')).sort_values('image_id')
tiles_folder = data_dir / 'train'

enet_type = 'efficientnet-b0'
kernel_type = enet_type
fold = 0
tile_size = 128
image_size = 128
n_tiles = 16
N = 16
batch_size = 2
num_workers = 4
out_dim = 6
init_lr = 3e-4
warmup_factor = 10

warmup_epo = 1
n_epochs = 30

device = torch.device('cuda')

# Create Folds

In [5]:
skf = StratifiedKFold(5, shuffle=True, random_state=42)
df_train['fold'] = -1
for i, (train_idx, valid_idx) in enumerate(skf.split(df_train, df_train['isup_grade'])):
    df_train.loc[valid_idx, 'fold'] = i
df_train.head()

| | image_id | data_provider | isup_grade | gleason_score | fold |
|---|---|---|---|---|---|
| 0 | 0005f7aaab2800f6170c399693a96917 | karolinska | 0 | 0+0 | 4 |
| 1 | 000920ad0b612851f8e01bcc880d9b3d | karolinska | 0 | 0+0 | 0 |
| 2 | 0018ae58b01bdadc8e347995b69f99aa | radboud | 4 | 4+4 | 3 |
| 3 | 001c62abd11fa4b57bf7a6c603a11bb9 | karolinska | 4 | 4+4 | 4 |
| 4 | 001d865e65ef5d2579c190a0e0350d8f | karolinska | 0 | 0+0 | 4 |


# Model

In [6]:
class Model(nn.Module):
    def __init__(self, backbone, out_dim):
        super().__init__()
        self.enet = EfficientNet.from_pretrained(backbone)
        self.myfc = nn.Linear(self.enet._fc.in_features, out_dim)
        self.enet._fc = nn.Identity()

    def extract(self, x):
        return self.enet(x)

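    def forward(self, *x):
        # x is a tuple of n tile batches, each of shape bs x 3 x H x W
        shape = x[0].shape
        n = len(x)
        x = torch.stack(x,1).view(-1,shape[1],shape[2],shape[3])
        #x: bs*N x 3 x 128 x 128
        # extract_features returns the last conv feature map; self.enet(x) would
        # return pooled bs*N x C features and break the reshape below
        x = self.enet.extract_features(x)
        #x: bs*N x C x 4 x 4
        shape = x.shape
        #concatenate the output for tiles into a single map
        x = x.view(-1,n,shape[1],shape[2],shape[3]).permute(0,2,1,3,4).contiguous()\
          .view(-1,shape[1],shape[2]*n,shape[3])
        #x: bs x C x N*4 x 4
        # pool the concatenated map to bs x C before the linear head
        # (average pooling is an assumption; other pooling heads would also work)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        x = self.myfc(x)
        #x: bs x out_dim
        return x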
#         x = self.extract(x)
#         x = self.myfc(x)
#         return x

# Dataset

In [17]:
all_tiles = [[f'{TRAIN}{f}_{i}.png' for i in range(n_tiles)] for f in df_train.image_id]

In [8]:
#all_files = set(str(p) for p in Path(TRAIN).iterdir())

In [9]:
# def get_tiles(img, mode=0):
#         result = []
#         h, w, c = img.shape
#         pad_h = (tile_size - h % tile_size) % tile_size + ((tile_size * mode) // 2)
#         pad_w = (tile_size - w % tile_size) % tile_size + ((tile_size * mode) // 2)

#         img2 = np.pad(img,[[pad_h // 2, pad_h - pad_h // 2], [pad_w // 2,pad_w - pad_w//2], [0,0]], constant_values=255)
#         img3 = img2.reshape(
#             img2.shape[0] // tile_size,
#             tile_size,
#             img2.shape[1] // tile_size,
#             tile_size,
#             3
#         )

#         img3 = img3.transpose(0,2,1,3,4).reshape(-1, tile_size, tile_size,3)
#         n_tiles_with_info = (img3.reshape(img3.shape[0],-1).sum(1) < tile_size ** 2 * 3 * 255).sum()
#         if len(img3) < n_tiles:
#             img3 = np.pad(img3,[[0,n_tiles-len(img3)],[0,0],[0,0],[0,0]], constant_values=255)
#         idxs = np.argsort(img3.reshape(img3.shape[0],-1).sum(-1))[:n_tiles]
#         img3 = img3[idxs]
#         for i in range(len(img3)):
#             result.append({'img':img3[i], 'idx':i})
#         return result, n_tiles_with_info >= n_tiles


# class PANDADataset(Dataset):
#     def __init__(self,
#                  df,
#                  image_size,
#                  n_tiles=n_tiles,
#                  tile_mode=0,
#                  rand=False,
#                  transform=None,
#                 ):

#         self.df = df.reset_index(drop=True)
#         self.image_size = image_size
#         self.n_tiles = n_tiles
#         self.tile_mode = tile_mode
#         self.rand = rand
#         self.transform = transform

#     def __len__(self):
#         return self.df.shape[0]

#     def __getitem__(self, index):
#         row = self.df.iloc[index]
#         img_id = row.image_id
        
#         #tiff_file = os.path.join(image_folder, f'{img_id}.tiff')
#         #image = skimage.io.MultiImage(tiff_file)[1]
#         #tiles, OK = get_tiles(image, self.tile_mode)
        
#         tiles = all_tiles[index]

#         if self.rand:
#             idxes = np.random.choice(list(range(self.n_tiles)), self.n_tiles, replace=False)
#         else:
#             idxes = list(range(self.n_tiles))

#         n_row_tiles = int(np.sqrt(self.n_tiles))
#         images = np.zeros((image_size * n_row_tiles, image_size * n_row_tiles, 3))
#         for h in range(n_row_tiles):
#             for w in range(n_row_tiles):
#                 i = h * n_row_tiles + w
                
#                 if len(tiles) > idxes[i] and tiles[idxes[i]] in all_files:
#                     this_img = open_image(tiles[idxes[i]])
#                 else:
#                      this_img = np.ones((self.image_size, self.image_size, 3)).astype(np.uint8) * 255
#                 this_img = 255 - this_img
#                 if self.transform is not None:
#                     this_img = self.transform(image=this_img)['image'] #['image']
#                 h1 = h * image_size
#                 w1 = w * image_size
#                 images[h1:h1+image_size, w1:w1+image_size] = this_img

#         if self.transform is not None:
#             images = self.transform(image=images)['image']
#         images = images.astype(np.float32)
#         images /= 255
#         images = images.transpose(2, 0, 1)

#         label = np.zeros(5).astype(np.float32)
#         label[:row.isup_grade] = 1.
#         return torch.tensor(images), torch.tensor(label)
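
The commented-out dataset above pastes the selected tiles row by row into one square image. A minimal NumPy sketch of that step follows. The function name `make_grid` and the dummy tiles are assumptions for illustration only, and the canvas is filled with white here, whereas the original code inverts the tiles first.

```python
import numpy as np

def make_grid(tiles, tile_size):
    """Paste n*n square tiles into one (n*tile_size, n*tile_size, 3) image, row by row."""
    n_row = int(np.sqrt(len(tiles)))
    # start from a white canvas so any missing tile stays background
    canvas = np.full((n_row * tile_size, n_row * tile_size, 3), 255, dtype=np.uint8)
    for i, tile in enumerate(tiles[: n_row * n_row]):
        h, w = divmod(i, n_row)  # row and column of tile i in the grid
        canvas[h * tile_size:(h + 1) * tile_size,
               w * tile_size:(w + 1) * tile_size] = tile
    return canvas

# 36 dummy tiles of 256x256, each filled with its own index
tiles = [np.full((256, 256, 3), i, dtype=np.uint8) for i in range(36)]
big = make_grid(tiles, 256)
print(big.shape)  # (1536, 1536, 3)
```

With `N = 36` and `sz = 256` this gives the 6x6 big image described in the summary; with the `n_tiles = 16`, `tile_size = 128` config in this notebook the same code would give a 512x512 image.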


# Augmentations

In [10]:
# transforms_train = albumentations.Compose([
#     albumentations.Transpose(p=0.5),
#     albumentations.VerticalFlip(p=0.5),
#     albumentations.HorizontalFlip(p=0.5),
# ])
# transforms_val = albumentations.Compose([])

In [11]:
# dataset_show = PANDADataset(df_train, image_size, n_tiles, 0, transform=transforms_train)
# from pylab import rcParams
# rcParams['figure.figsize'] = 20,10
# for i in range(2):
#     f, axarr = plt.subplots(1,5)
#     for p in range(5):
#         idx = np.random.randint(0, len(dataset_show))
#         img, label = dataset_show[idx]
#         axarr[p].imshow(1. -  img.transpose(0, 1).transpose(1,2).squeeze())
#         axarr[p].set_title(str(sum(label)))


In [12]:
mean = torch.tensor([1.0-0.90949707, 1.0-0.8188697, 1.0-0.87795304])
std = torch.tensor([0.36357649, 0.49984502, 0.40477625])

The code below (a hidden cell in the original kernel) creates an `ImageItemList` subclass capable of loading multiple tiles per image. It is specific to fast.ai; pure PyTorch code would be much simpler.

In [13]:
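import warnings
# fastai v1 supplies ItemBase, ImageList, Image, pil2tensor, PathOrStr, to_data, get_transforms, ...
# Imported here, after `from PIL import Image`, so that `Image` refers to the fastai class below.
from fastai.vision import *

def open_image(fn:PathOrStr, div:bool=True, convert_mode:str='RGB', cls:type=Image,
        after_open:Callable=None)->Image: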
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        x = PIL.Image.open(fn).convert(convert_mode)
    if after_open: x = after_open(x)
    x = pil2tensor(x,np.float32)
    if div: x.div_(255)
    return cls(1.0-x) #invert image for zero padding

class MImage(ItemBase):
    def __init__(self, imgs):
        self.obj, self.data = \
          (imgs), [(imgs[i].data - mean[...,None,None])/std[...,None,None] for i in range(len(imgs))]
    
    def apply_tfms(self, tfms,*args, **kwargs):
        for i in range(len(self.obj)):
            self.obj[i] = self.obj[i].apply_tfms(tfms, *args, **kwargs)
            self.data[i] = (self.obj[i].data - mean[...,None,None])/std[...,None,None]
        return self
    
    def __repr__(self): return f'{self.__class__.__name__} {img.shape for img in self.obj}'
    def to_one(self):
        img = torch.stack(self.data,1)
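        # `sz` is not defined in this notebook; the tile side length is `image_size`
        img = img.view(3,-1,N,image_size,image_size).permute(0,1,3,2,4).contiguous().view(3,-1,image_size*N)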
        return Image(1.0 - (mean[...,None,None]+img*std[...,None,None]))

class MImageItemList(ImageList):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    
    def __len__(self)->int: return len(self.items) or 1 
    
    def get(self, i):
        fn = Path(self.items[i])
        fnames = [Path(str(fn)+'_'+str(i)+'.png')for i in range(N)]
        imgs = [open_image(fname, convert_mode=self.convert_mode, after_open=self.after_open)
               for fname in fnames]
        return MImage(imgs)

    def reconstruct(self, t):
        return MImage([mean[...,None,None]+_t*std[...,None,None] for _t in t])
    
    def show_xys(self, xs, ys, figsize:Tuple[int,int]=(300,50), **kwargs):
        rows = min(len(xs),8)
        fig, axs = plt.subplots(rows,1,figsize=figsize)
        for i, ax in enumerate(axs.flatten() if rows > 1 else [axs]):
            xs[i].to_one().show(ax=ax, y=ys[i], **kwargs)
        plt.tight_layout()
        

#collate function to combine multiple images into one tensor
def MImage_collate(batch:ItemsList)->Tensor:
    result = torch.utils.data.dataloader.default_collate(to_data(batch))
    if isinstance(result[0],list):
        result = [torch.stack(result[0],1),result[1]]
    return result

In [14]:
def get_data(fold=0):
    return (MImageItemList.from_df(df_train, path='/', folder=TRAIN, cols='image_id')
      .split_by_idx(df_train.index[df_train.fold == fold].tolist())
      .label_from_df(cols=['isup_grade'])
      .transform(get_transforms(flip_vert=True,max_rotate=15),size=image_size,padding_mode='zeros')
      .databunch(bs=batch_size,num_workers=num_workers))

# Loss

In [15]:
criterion = nn.BCEWithLogitsLoss()

# Train & Val

In [16]:
def train_epoch(loader, optimizer):

    model.train()
    train_loss = []
    bar = tqdm(loader)
    for (data, target) in bar:
        
        #data, target = data.to(device), target.to(device)
        loss_func = criterion
        optimizer.zero_grad()
        logits = model(*data)
        loss = loss_func(logits, *target)
        loss.backward()
        optimizer.step()

        loss_np = loss.detach().cpu().numpy()
        train_loss.append(loss_np)
        smooth_loss = sum(train_loss[-100:]) / min(len(train_loss), 100)
        bar.set_description('loss: %.5f, smth: %.5f' % (loss_np, smooth_loss))
    return train_loss


def val_epoch(loader, get_output=False):

    model.eval()
    val_loss = []
    LOGITS = []
    PREDS = []
    TARGETS = []

    with torch.no_grad():
        for (data, target) in tqdm(loader):
            #data, target = data.to(device), target.to(device)
            logits = model(*data)

            loss = criterion(logits, *target)

            pred = logits.sigmoid().sum(1).detach().round()
            LOGITS.append(logits)
            PREDS.append(pred)
            TARGETS.append(*target.sum(1))

            val_loss.append(loss.detach().cpu().numpy())
        val_loss = np.mean(val_loss)

    LOGITS = torch.cat(LOGITS).cpu().numpy()
    PREDS = torch.cat(PREDS).cpu().numpy()
    TARGETS = torch.cat(TARGETS).cpu().numpy()
    acc = (PREDS == TARGETS).mean() * 100.
    
    qwk = cohen_kappa_score(PREDS, TARGETS, weights='quadratic')
    qwk_k = cohen_kappa_score(PREDS[df_valid['data_provider'] == 'karolinska'], df_valid[df_valid['data_provider'] == 'karolinska'].isup_grade.values, weights='quadratic')
    qwk_r = cohen_kappa_score(PREDS[df_valid['data_provider'] == 'radboud'], df_valid[df_valid['data_provider'] == 'radboud'].isup_grade.values, weights='quadratic')
    print('qwk', qwk, 'qwk_k', qwk_k, 'qwk_r', qwk_r)

    if get_output:
        return LOGITS
    else:
        return val_loss, acc, qwk
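
The validation metric above is the quadratic weighted kappa (QWK) from `cohen_kappa_score(..., weights='quadratic')`. To make it clearer what that number measures, here is a minimal NumPy sketch of the same quantity. The function name and the `n_classes=6` default are assumptions for this example; in practice the scikit-learn call used above is the one to rely on.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=6):
    """QWK = 1 - sum(W * O) / sum(W * E) with W[i, j] proportional to (i - j)^2."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # observed confusion matrix O
    observed = np.zeros((n_classes, n_classes), dtype=np.float64)
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # quadratic disagreement weights: larger grade gaps cost more
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # expected matrix E under independence of the two marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

print(quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]))  # 1.0
```

Because the penalty grows with the squared distance between grades, predicting grade 4 for a true grade 5 hurts far less than predicting grade 0, which is why QWK can keep improving even when plain accuracy stalls.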

    

# Create Dataloader & Model & Optimizer

In [None]:
# train_idx = np.where((df_train['fold'] != fold))[0]
# valid_idx = np.where((df_train['fold'] == fold))[0]

# df_this  = df_train.loc[train_idx]
# df_valid = df_train.loc[valid_idx]

# dataset_train = PANDADataset(df_this , image_size, n_tiles, transform=transforms_train)
# dataset_valid = PANDADataset(df_valid, image_size, n_tiles, transform=transforms_val)

# train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=batch_size, sampler=RandomSampler(dataset_train), num_workers=num_workers)
# valid_loader = torch.utils.data.DataLoader(dataset_valid, batch_size=batch_size, sampler=SequentialSampler(dataset_valid), num_workers=num_workers)

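data = get_data(fold)
# val_epoch reads df_valid for the per-provider QWK; this assumes the
# validation loader yields samples in the same row order as df_valid
df_valid = df_train[df_train['fold'] == fold]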

In [None]:
train_loader = data.train_dl
valid_loader = data.valid_dl  # name must match the `valid_loader` used in the training loop

In [None]:
model = Model(enet_type, out_dim=out_dim)
model = model.to(device)

optimizer = optim.Adam(model.parameters(), lr=init_lr/warmup_factor)
scheduler_cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, n_epochs-warmup_epo)
scheduler = GradualWarmupScheduler(optimizer, multiplier=warmup_factor, total_epoch=warmup_epo, after_scheduler=scheduler_cosine)
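
To see roughly what learning rates this warmup-plus-cosine setup produces, here is a pure-Python sketch of the closed-form schedule under the config values above. The function `lr_at` is a hypothetical helper written for this illustration, not part of either scheduler library, and the real schedulers may count steps with a one-epoch offset relative to it.

```python
import math

def lr_at(step, init_lr=3e-4, warmup_factor=10, warmup_epo=1, n_epochs=30):
    """Learning rate after `step` scheduler steps: linear warmup, then cosine decay."""
    base_lr = init_lr / warmup_factor
    if step <= warmup_epo:
        # linear ramp from base_lr up to init_lr over the warmup epochs
        return base_lr * ((warmup_factor - 1.0) * step / warmup_epo + 1.0)
    # cosine annealing from init_lr down to 0 over the remaining epochs
    t = step - warmup_epo
    t_max = n_epochs - warmup_epo
    return 0.5 * init_lr * (1.0 + math.cos(math.pi * t / t_max))

for step in (0, 1, 15, 30):
    print(step, f"{lr_at(step):.7f}")
```

The rate starts at `init_lr / warmup_factor`, reaches `init_lr` at the end of warmup, and then decays along a single half cosine, which is what "CosineAnnealingLR for one round" refers to in the summary.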

# Run Training

In [None]:

qwk_max = 0.
best_file = f'{kernel_type}_best_fold{fold}.pth'
for epoch in range(1, n_epochs+1):
    print(time.ctime(), 'Epoch:', epoch)
    scheduler.step()

    train_loss = train_epoch(train_loader, optimizer)
    val_loss, acc, qwk = val_epoch(valid_loader)

    content = time.ctime() + ' ' + f'Epoch {epoch}, lr: {optimizer.param_groups[0]["lr"]:.7f}, train loss: {np.mean(train_loss):.5f}, val loss: {np.mean(val_loss):.5f}, acc: {(acc):.5f}, qwk: {(qwk):.5f}'
    print(content)
    with open(f'log_{kernel_type}.txt', 'a') as appender:
        appender.write(content + '\n')

    if qwk > qwk_max:
        print('score2 ({:.6f} --> {:.6f}).  Saving model ...'.format(qwk_max, qwk))
        torch.save(model.state_dict(), best_file)
        qwk_max = qwk

torch.save(model.state_dict(), os.path.join(f'{kernel_type}_final_fold{fold}.pth'))

In [None]:
%debug

# My Local Train Log


```
Tue June 2 15:39:21 2020 Epoch 1, lr: 0.0000300, train loss: 0.42295, val loss: 0.29257, acc: 47.50471, qwk: 0.77941
Tue June 2 15:51:56 2020 Epoch 2, lr: 0.0003000, train loss: 0.34800, val loss: 0.48723, acc: 29.09605, qwk: 0.58493
Tue June 2 16:04:28 2020 Epoch 3, lr: 0.0003000, train loss: 0.29207, val loss: 0.27091, acc: 52.49529, qwk: 0.81714
Tue June 2 16:17:01 2020 Epoch 4, lr: 0.0002965, train loss: 0.26521, val loss: 0.26736, acc: 57.15631, qwk: 0.80364
Tue June 2 16:29:33 2020 Epoch 5, lr: 0.0002921, train loss: 0.24412, val loss: 0.24422, acc: 56.07345, qwk: 0.84068
Tue June 2 16:42:05 2020 Epoch 6, lr: 0.0002861, train loss: 0.23085, val loss: 0.25306, acc: 58.05085, qwk: 0.84429
Tue June 2 16:54:38 2020 Epoch 7, lr: 0.0002785, train loss: 0.21998, val loss: 0.21920, acc: 62.14689, qwk: 0.86278
Tue June 2 17:07:10 2020 Epoch 8, lr: 0.0002694, train loss: 0.21062, val loss: 0.23400, acc: 61.91149, qwk: 0.86170
Tue June 2 17:19:47 2020 Epoch 9, lr: 0.0002589, train loss: 0.20040, val loss: 0.27417, acc: 57.10923, qwk: 0.81771
Tue June 2 17:32:25 2020 Epoch 10, lr: 0.0002471, train loss: 0.18900, val loss: 0.26732, acc: 64.92467, qwk: 0.84131
Tue June 2 17:45:05 2020 Epoch 11, lr: 0.0002342, train loss: 0.18640, val loss: 0.21936, acc: 63.27684, qwk: 0.86580
Tue June 2 17:57:42 2020 Epoch 12, lr: 0.0002203, train loss: 0.17387, val loss: 0.22863, acc: 61.25235, qwk: 0.86871
Tue June 2 18:10:23 2020 Epoch 13, lr: 0.0002055, train loss: 0.16491, val loss: 0.23071, acc: 66.85499, qwk: 0.87892
Tue June 2 18:23:00 2020 Epoch 14, lr: 0.0001901, train loss: 0.15448, val loss: 0.24338, acc: 68.45574, qwk: 0.87342
Tue June 2 18:35:39 2020 Epoch 15, lr: 0.0001743, train loss: 0.14536, val loss: 0.22043, acc: 65.11299, qwk: 0.87169
Tue June 2 18:48:18 2020 Epoch 16, lr: 0.0001581, train loss: 0.13918, val loss: 0.22007, acc: 67.65537, qwk: 0.88284
Tue June 2 19:00:55 2020 Epoch 17, lr: 0.0001419, train loss: 0.13121, val loss: 0.24287, acc: 66.71375, qwk: 0.86357
Tue June 2 19:13:35 2020 Epoch 18, lr: 0.0001257, train loss: 0.12249, val loss: 0.21583, acc: 66.80791, qwk: 0.88478
Tue June 2 19:26:14 2020 Epoch 19, lr: 0.0001099, train loss: 0.11325, val loss: 0.21401, acc: 71.13936, qwk: 0.89178
Tue June 2 19:38:55 2020 Epoch 20, lr: 0.0000945, train loss: 0.10602, val loss: 0.21250, acc: 70.00942, qwk: 0.89256
Tue June 2 19:51:32 2020 Epoch 21, lr: 0.0000797, train loss: 0.09965, val loss: 0.21149, acc: 70.33898, qwk: 0.89590
Tue June 2 20:03:59 2020 Epoch 22, lr: 0.0000658, train loss: 0.09425, val loss: 0.22203, acc: 70.76271, qwk: 0.89493
Tue June 2 20:16:28 2020 Epoch 23, lr: 0.0000529, train loss: 0.08843, val loss: 0.22948, acc: 71.70433, qwk: 0.89304
Tue June 2 20:28:56 2020 Epoch 24, lr: 0.0000411, train loss: 0.08448, val loss: 0.21200, acc: 71.18644, qwk: 0.89947
Tue June 2 20:41:25 2020 Epoch 25, lr: 0.0000306, train loss: 0.07898, val loss: 0.21873, acc: 72.55179, qwk: 0.90021
Tue June 2 20:53:53 2020 Epoch 26, lr: 0.0000215, train loss: 0.07369, val loss: 0.21842, acc: 72.64595, qwk: 0.90240
Tue June 2 21:06:20 2020 Epoch 27, lr: 0.0000139, train loss: 0.07264, val loss: 0.21501, acc: 73.21092, qwk: 0.90450
Tue June 2 21:18:49 2020 Epoch 28, lr: 0.0000079, train loss: 0.06950, val loss: 0.21616, acc: 73.35217, qwk: 0.90264
Tue June 2 21:31:16 2020 Epoch 29, lr: 0.0000035, train loss: 0.06787, val loss: 0.21195, acc: 73.11676, qwk: 0.90434
Tue June 2 21:43:43 2020 Epoch 30, lr: 0.0000009, train loss: 0.06801, val loss: 0.21014, acc: 73.11676, qwk: 0.90468
```

# Thank you for reading!