In this notebook, we tackle the PANDA competition dataset using fastai v2. A comprehensive EDA can be found in this [notebook](https://www.kaggle.com/tanulsingh077/prostate-cancer-in-depth-understanding-eda-model). The [solution](https://www.kaggle.com/iafoss/panda-concat-tile-pooling-starter-0-79-lb) uses concat tiling as proposed by @iafoss. The modified datasets used here also come from @iafoss. We also use tricks and models from @DrHB's [2nd place solution](https://github.com/DrHB/PANDA-2nd-place-solution/tree/main/train_drhb).

References and resources

https://www.kaggle.com/iafoss/panda-concat-tile-pooling-starter-0-79-lb

https://www.kaggle.com/tanulsingh077/prostate-cancer-in-depth-understanding-eda-model

https://github.com/kentaroy47/Kaggle-PANDA-1st-place-solution/tree/master/src 

https://docs.google.com/presentation/d/1Ies4vnyVtW5U3XNDr_fom43ZJDIodu1SV6DSK8di6fs/edit#slide=id.g9b10629a30_0_299

https://github.com/DrHB/PANDA-2nd-place-solution/tree/main/train_drhb

# Imports and initial exploration

In [None]:
!pip install timm -q

In [None]:
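from fastai.vision.all import *
# explicit imports for names used below (the fastai star import normally provides them too)
import warnings
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import cohen_kappa_score, confusion_matrix
from timm import create_model
import timm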

In [None]:
path = Path('../input/prostate-cancer-grade-assessment')
path_img = Path('../input/panda-16x128x128-tiles-data/train')

In [None]:
sz = 128
N = 12
n_classes = 6
BS=32

In [None]:
df = pd.read_csv(path/'train.csv')

In [None]:
df.head(5)

In [None]:
len(df)

In [None]:
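# Collect the image ids that have tiles on disk (tile files are named '<image_id>_<tile>.png').
# A set keeps the membership test in the next cell fast.
fns = {f.name.split('_')[0] for f in path_img.ls()}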
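
The cell above relies on the tile naming scheme `<image_id>_<tile_index>.png`, the same pattern that `PandaTransform` uses later. A minimal sketch of the parsing step, using made-up file names and a hypothetical helper `image_id_from_tile` (not part of the original notebook):

```python
from pathlib import Path

def image_id_from_tile(path):
    """Return the slide id from a tile path named '<image_id>_<tile_index>.png'."""
    return Path(path).name.split('_')[0]

# Made-up tile names, for illustration only.
tiles = ['train/abc123_0.png', 'train/abc123_1.png', 'train/def456_0.png']
unique_ids = sorted({image_id_from_tile(t) for t in tiles})
print(unique_ids)  # ['abc123', 'def456']
```

This assumes that image ids themselves contain no underscore, as the original loop does.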

In [None]:
remove_fns = []
for f in list(df['image_id'].values):
    if f not in fns:
        remove_fns.append(f)

In [None]:
len(remove_fns)

In [None]:
df = df[~df['image_id'].isin(remove_fns)].copy()  # copy so that new columns can be added safely

In [None]:
len(df)

In [None]:
N_FOLDS = 3
df['fold'] = -1

strat_kfold = StratifiedKFold(n_splits=N_FOLDS, random_state=42, shuffle=True)
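fold_col = df.columns.get_loc('fold')  # positional index of the 'fold' column
for i, (_, valid_index) in enumerate(strat_kfold.split(df.image_id.values, df['isup_grade'].values)):
    # each sample gets the index of the fold in which it is held out
    df.iloc[valid_index, fold_col] = i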
    
df['fold'] = df['fold'].astype('int')
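
A quick way to sanity-check the stratification is to count grades per fold. The sketch below uses toy labels and a hypothetical helper `fold_class_counts`; it is not part of the original notebook and only needs the standard library.

```python
from collections import Counter, defaultdict

def fold_class_counts(labels, folds):
    """Count how often each label appears in each fold (hypothetical helper)."""
    counts = defaultdict(Counter)
    for label, fold in zip(labels, folds):
        counts[fold][label] += 1
    return {fold: dict(c) for fold, c in counts.items()}

# Toy data: 6 samples of grade 0 and 3 samples of grade 5, spread over 3 folds.
toy_labels = [0, 0, 0, 0, 0, 0, 5, 5, 5]
toy_folds  = [0, 1, 2, 0, 1, 2, 0, 1, 2]

balance = fold_class_counts(toy_labels, toy_folds)
for fold in sorted(balance):
    print(fold, balance[fold])
```

On the real frame, `df.groupby('fold')['isup_grade'].value_counts()` should give the same kind of overview.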

# Preparing data and dataloader

The mean and std come from @iafoss's notebook. The mean is expressed as `1.0 - m` because `open_image` below inverts the pixel values.

In [None]:
mean = torch.tensor([1.0-0.90949707, 1.0-0.8188697, 1.0-0.87795304])
std = torch.tensor([0.36357649, 0.49984502, 0.40477625])

We write a function to open images and prepare them.

In [None]:
def open_image(fn):
    
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
        x = PILImage.create(fn)

    x = torch.Tensor(np.array(x))        # H x W x C, values in [0, 255]
    x = x.permute(2,0,1).float()/255.0   # C x H x W, scaled to [0, 1]
    x = (1.0 - x)  # invert so the white background becomes 0 and matches zero padding
    
    return x
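
Inverting the pixels maps the white slide background to 0, so zero padding introduced later blends in with the background. It also suggests why the channel means above are written as `1.0 - m` while the standard deviations are left unchanged. A small check with made-up pixel values, standard library only:

```python
from statistics import mean, pstdev

# Made-up pixel intensities in [0, 1]; 1.0 stands for white background.
pixels = [1.0, 1.0, 0.9, 0.6, 0.3]
inverted = [1.0 - p for p in pixels]

# Background (1.0) becomes exactly 0.0 after inversion.
print(inverted[:2])  # [0.0, 0.0]

# The mean flips to 1 - mean, while the standard deviation stays the same.
print(round(mean(inverted), 6), round(1.0 - mean(pixels), 6))
print(round(pstdev(inverted), 6), round(pstdev(pixels), 6))
```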

The `PandaImage` class undoes the inversion and lays the N tiles side by side so that a sample can be displayed.

In [None]:
class PandaImage(fastuple):
    def show(self, ctx=None, **kwargs):
        img, label = self
        img = img.view(N,-1,3,sz,sz).permute(1,2,3,0,4).contiguous().view(3,-1,sz*N)
        img = 1- img
        img = img.permute(1,2,0)
        img = np.array(img*255).astype(np.uint8)
        
        return show_image(PILImage.create(img), title=label, ctx=ctx)
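
The `view`/`permute` chain in `show` can be hard to read. The sketch below traces the same sequence of reshapes with NumPy on toy sizes (3 tiles of 2x2 pixels instead of 12 tiles of 128x128); the variable names are illustrative only.

```python
import numpy as np

N, sz = 3, 2  # toy sizes; the notebook uses N=12, sz=128
# Tile k is filled with the value k so it can be traced after the reshape.
tiles = np.stack([np.full((3, sz, sz), k, dtype=np.float32) for k in range(N)])  # (N, 3, sz, sz)

strip = (tiles.reshape(N, -1, 3, sz, sz)   # (N, 1, 3, sz, sz)
              .transpose(1, 2, 3, 0, 4)    # (1, 3, sz, N, sz)
              .reshape(3, -1, sz * N))     # (3, sz, sz*N)

print(strip.shape)  # (3, 2, 6)
print(strip[0])     # each row reads tile 0, tile 1, tile 2 from left to right
```

Each output row contains the corresponding row of every tile in order, so the tiles end up in one horizontal strip.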

We make use of fastcore's Type Dispatch to make `show_batch` work.

In [None]:
@typedispatch
def show_batch(x:PandaImage, y, samples, ctxs=None, max_n=6, nrows=None, ncols=1, figsize=(20,20), **kwargs):
    if figsize is None: figsize = (ncols*6, max_n//ncols * 3)
    if ctxs is None: ctxs = get_grid(min(x[0].shape[0], max_n), nrows=None, ncols=ncols, figsize=figsize)
    for i,ctx in enumerate(ctxs): PandaImage(x[0][i], [x[1][i].item()]).show(ctx=ctx)

`PandaTransform` opens the N tiles of an `image_id`. For the training set it applies data augmentation to each tile; in both cases it stacks the N tiles into a single tensor that the dataloader can batch.

In [None]:
class PandaTransform(Transform):
    def __init__(self, path_img, df, files, valid=False):
        self.files = files
        self.path_img = path_img
        self.df = df
        self.valid = valid
        self.tfms = aug_transforms(flip_vert=True, max_rotate=15, pad_mode='zeros')
        
    def encodes(self, i):
        files_i = self.files[i]
        label  = self.df[self.df['image_id'] == files_i]['isup_grade'].values
        
        # tiles are stored as '<image_id>_<tile>.png'; `j` avoids shadowing the sample index `i`
        fnames = [self.path_img/f'{files_i}_{j}.png' for j in range(N)]
        imgs = [open_image(fname) for fname in fnames]
        
        if not self.valid:
            aug_img = []
            for img in imgs: 
                for t in self.tfms:
                    img = t(img, split_idx=0)
                aug_img.append(img)

            aug_img = torch.stack(aug_img, 0)
            return (PandaImage(aug_img, label))
        
        else:
            return (PandaImage(torch.stack(imgs, 0), label))

In [None]:
train_fns = df[df['fold'] != 0]['image_id'].values
valid_fns = df[df['fold'] == 0]['image_id'].values

In [None]:
train_tl= TfmdLists(range(len(train_fns)), [PandaTransform(path_img, df, train_fns, valid=False)])
valid_tl= TfmdLists(range(len(valid_fns)), [PandaTransform(path_img, df, valid_fns, valid=True)])

In [None]:
dls = DataLoaders.from_dsets(train_tl, valid_tl, 
                             after_batch=[Normalize.from_stats(*(mean, std))], bs=BS)
dls = dls.cuda()

Looks like everything is working fine.

In [None]:
dls.show_batch()

In [None]:
x, y = dls.one_batch()

In [None]:
x.shape

# Model as used in DrHB's 2nd place solution

The following model and tricks come from DrHB's competition solution. The model produces `n_classes + 1` outputs (7 in this case); the extra output is used for regression-based prediction. The `CustomEnd` module attached to the end of the model splits the output into classification logits and a scaled regression value, which are then passed to the loss function. The model also makes use of a `SqueezeExcite` layer.

In [None]:
class CustomEnd(nn.Module):
    def __init__(self, scaler = SigmoidRange(-1, 6.0)):
        super().__init__()
        self.scaler_ = scaler
        
    def forward(self, x):
        classif = x[:, :-1]
        regress = self.scaler_ (x[:, -1])
        return classif, regress
    
def make_divisible(v, divisor=8, min_value=None):
    min_value = min_value or divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

def sigmoid(x, inplace: bool = False):
    return x.sigmoid_() if inplace else x.sigmoid()

class SqueezeExcite(nn.Module):
    def __init__(self, in_chs, se_ratio=0.25, reduced_base_chs=None,
             act_layer=nn.ReLU, gate_fn=sigmoid, divisor=1, **_):
        super(SqueezeExcite, self).__init__()
        self.gate_fn = gate_fn
        reduced_chs = make_divisible((reduced_base_chs or in_chs) * se_ratio, divisor)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True)
        self.act1 = act_layer(inplace=True)
        self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True)
    def forward(self, x):
        x_se = self.avg_pool(x)
        x_se = self.conv_reduce(x_se)
        x_se = self.act1(x_se)
        x_se = self.conv_expand(x_se)
        x = x * self.gate_fn(x_se)
        return x
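
To make the role of `CustomEnd` concrete, here is a minimal pure-Python sketch of what it does to one row of raw outputs. It assumes that `SigmoidRange(low, high)` computes `sigmoid(x) * (high - low) + low`; the helper names `sigmoid_range` and `custom_end` are illustrative.

```python
import math

def sigmoid_range(x, low, high):
    """Squash x into the interval (low, high) with a scaled sigmoid."""
    return 1.0 / (1.0 + math.exp(-x)) * (high - low) + low

def custom_end(row, low=-1.0, high=6.0):
    """Split one row of 7 raw outputs into 6 class logits and 1 scaled regression value."""
    return row[:-1], sigmoid_range(row[-1], low, high)

raw = [0.1, 2.0, -0.3, 0.0, 0.5, -1.2, 0.0]  # made-up raw outputs for one slide
logits, grade = custom_end(raw)
print(len(logits), grade)  # 6 2.5
```

The range (-1, 6) is slightly wider than the valid grades 0 to 5, presumably so that the sigmoid does not have to saturate to reach the extreme grades.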

In [None]:
class DrHBModel(nn.Module):
    def __init__(self, N):
        super().__init__()
        self.N = N
        m = models.resnet34(pretrained=True)
        self.enc = nn.Sequential(*list(m.children())[:-2])       
        nc = list(m.children())[-1].in_features
        self.cb = SqueezeExcite(nc)
        self.head = nn.Sequential(AdaptiveConcatPool2d(),
                                  Flatten(),
                                  nn.Linear(2*nc,512),
                                  nn.ReLU(inplace=True),
                                  nn.Dropout(0.4),
                                  nn.Linear(512,7), 
                                  CustomEnd())
        
    def forward(self, x):
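        # x: bs x N x 3 x sz x sz
        shape = x.shape
        n = shape[1]
        x = x.view(-1,shape[2],shape[3],shape[4])   # bs*N x 3 x sz x sz
        x = self.enc(x)                             # bs*N x C x h x w
        
        # concatenate the tile feature maps along the height: bs x C x N*h x w
        x = x.view(-1, n, x.shape[1], x.shape[2], x.shape[3]).permute(0, 2, 1, 3, 4).contiguous().\
        view(-1, x.shape[1], x.shape[2] * n, x.shape[3])
        # regroup the rows so that the map is closer to square before squeeze-excite and pooling
        x = x.view(x.shape[0], x.shape[1], x.shape[2]//int(np.sqrt(n)), -1)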
        x = self.cb(x)
        x = self.head(x)
        return x
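
The reshapes in `forward` can be traced with a small NumPy sketch. The sizes are toy values (2 samples, 4 tiles, 1 channel, 2x2 feature maps) and the names `stacked` and `grid` are illustrative; the operations mirror the `view`/`permute` calls above.

```python
import numpy as np

bs, n, C, h, w = 2, 4, 1, 2, 2  # toy sizes; n is a perfect square here
# The feature map of tile t in sample b is filled with the value 10*b + t.
feats = np.stack([np.full((C, h, w), 10 * b + t)
                  for b in range(bs) for t in range(n)])  # (bs*n, C, h, w)

# Concatenate the tile maps of each sample along the height axis.
stacked = (feats.reshape(-1, n, C, h, w)
                .transpose(0, 2, 1, 3, 4)
                .reshape(-1, C, h * n, w))
print(stacked.shape)          # (2, 1, 8, 2)
print(stacked[1, 0, :, 0])    # tiles of sample 1, stacked top to bottom

# Regroup rows so that the map is closer to square.
grid = stacked.reshape(stacked.shape[0], stacked.shape[1],
                       stacked.shape[2] // int(np.sqrt(n)), -1)
print(grid.shape)             # (2, 1, 4, 4)
```

Note that the last step is a plain reshape of the underlying memory, so it regroups rows rather than arranging the tiles on a spatial grid. Since the head pools over the whole map, this mainly affects what the squeeze-excite block and pooling see as one map.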

In [None]:
drhbmodel= DrHBModel(N)

In [None]:
def drhbmodel_splitter(m): return L(m.enc, m.cb, m.head).map(params)

# Preparing timm-based model with DrHB's tricks in his 2nd place solution

The following section was written to make use of `timm` models.

In [None]:
def get_timm_vis_model(arch:str, pretrained=True, cut=None):
    model = create_model(arch, pretrained=pretrained)
    if cut is None:
        ll = list(enumerate(model.children()))
        cut = next(i for i,o in reversed(ll) if has_pool_type(o))
    model =  nn.Sequential(*list(model.children())[:cut])
    
    return model

In [None]:
class PandaModel(Module):
    def __init__(self, n_classes, arch, *args, **kwargs):
        self.vis_model = get_timm_vis_model(arch)
        self.cb = SqueezeExcite(num_features_model(self.vis_model))  # note: defined but not applied in forward() below
        self.vis_head  = create_head(num_features_model(self.vis_model), n_classes+1)
        self.custom_end = CustomEnd()
                
    def forward(self, x):
        shape = x[0].shape
        n = shape[0]
        x = x.view(-1,shape[1],shape[2],shape[3]) 
        #x: bs*N x 3 x 128 x 128
        
        x = self.vis_model(x)
        shape = x.shape 
        #x: bs*N x C x 4 x 4
        #concatenate the output for tiles into a single map
        x = x.view(-1,n,shape[1],shape[2],shape[3]).permute(0,2,1,3,4).contiguous().view(-1,shape[1],shape[2]*n,shape[3])
        #x: bs x C x N*4 x 4
        x = self.vis_head(x)
        #x: bs x (n_classes+1)
        x = self.custom_end(x)
        return x

In [None]:
pandamodel = PandaModel(n_classes, 'ssl_resnext50_32x4d')

In [None]:
def pandamodel_splitter(m): return L(m.vis_model, m.vis_head).map(params)

# Preparing Loss Functions and Metrics

In [None]:
class CustomCrossEntropy(nn.CrossEntropyLoss):
    "Cross entropy that accepts targets of shape (bs, 1)"
    def forward(self, input, target):
        # flatten (bs, 1) -> (bs,); unlike squeeze(), this keeps the batch dimension when bs == 1
        target = target.reshape(-1)
        return F.cross_entropy(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)

In [None]:
class CustomLoss(nn.Module):
    def __init__(self, loss_ce, loss_mse):
        super().__init__()
        self.loss_ce  = loss_ce
        self.loss_mse = loss_mse 
        
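    def forward(self, preds, target):
        # preds is the (class logits, regression value) pair returned by CustomEnd
        logits, regress = preds
        loss_cross = self.loss_ce(logits, target)
        loss_mserr = self.loss_mse(regress, target.float().reshape(-1))
        return loss_cross + loss_mserr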
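
As a worked example of the combined loss, the sketch below computes both terms for a single sample in pure Python. The helper names and the input values are made up; the batch version above averages each term over the batch before summing.

```python
import math

def cross_entropy(logits, target):
    """Cross entropy for one sample: -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def combined_loss(logits, regress, target):
    """Unweighted sum of the classification and regression terms for one sample."""
    return cross_entropy(logits, target) + (regress - target) ** 2

logits = [0.0] * 6   # made-up: uniform logits over the 6 grades
loss = combined_loss(logits, regress=2.5, target=3)
print(round(loss, 4))  # ln(6) + 0.25, about 2.0418
```

Because the two terms are simply added, their relative scale matters: a regression error of one grade contributes 1.0, which is comparable to a fairly uncertain classification.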

In [None]:
def _qwk(pred, y):
    "Quadratic weighted kappa of one batch; `pred` and `y` are tensors of grades"
    score = cohen_kappa_score(pred.cpu().numpy(), y.cpu().numpy().ravel(), weights='quadratic')
    return torch.tensor(score, device=y.device)

def qkp_class(y_hat, y):
    # argmax over the logits equals argmax over the softmax probabilities
    return _qwk(y_hat[0].argmax(dim=1), y)

def qkp_regres(y_hat, y):
    # threshold the regression output into grades with the rounder defined below
    p = optR.predict(y_hat[1].cpu().numpy(), coefficients)
    return torch.tensor(cohen_kappa_score(p, y.cpu().numpy().ravel(), weights='quadratic'), device=y.device)

def qkp_combine(y_hat, y):
    # average the regression output and the predicted class, then round to the nearest grade
    return _qwk(torch.round((y_hat[1] + y_hat[0].argmax(dim=1))/2), y)
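
For reference, the quadratic weighted kappa behind these metrics can be written out in a few lines. This is a plain-Python sketch under the assumption that labels are integers in `range(n_classes)`; the function below is illustrative and is not the implementation used by scikit-learn.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes=6):
    """Quadratic weighted kappa for integer labels in range(n_classes)."""
    n = len(y_true)
    # observed[i][j]: number of samples with true grade i and predicted grade j
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2       # quadratic penalty
            num += w * observed[i][j]                      # observed disagreement
            den += w * hist_true[i] * hist_pred[j] / n     # disagreement expected by chance
    return 1.0 - num / den

print(quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]))            # 1.0
print(round(quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 4]), 4))  # 0.9677
```

Note that `cohen_kappa_score` infers the label set from the data unless `labels` is passed, so per-batch values may differ from this sketch when some grades are absent from a batch. The sketch also divides by zero when all samples fall into a single grade.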

In [None]:
class OptimizedRounder():
    def __init__(self):
        self.coef_ = 0

    def _kappa_loss(self, coef, X, y):
        X_p = np.copy(X)
        for i, pred in enumerate(X_p):
            if pred < coef[0]:
                X_p[i] = 0
            elif pred >= coef[0] and pred < coef[1]:
                X_p[i] = 1
            elif pred >= coef[1] and pred < coef[2]:
                X_p[i] = 2
            elif pred >= coef[2] and pred < coef[3]:
                X_p[i] = 3
            elif pred >= coef[3] and pred < coef[4]:
                X_p[i] = 4
            else:
                X_p[i] = 5

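        # quadratic_weighted_kappa is not defined in this notebook; use the scikit-learn metric imported above
        ll = cohen_kappa_score(y, X_p, weights='quadratic')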
        return -ll

    def fit(self, X, y):
        from scipy import optimize  # local import: scipy.optimize is not imported elsewhere in this notebook
        loss_partial = partial(self._kappa_loss, X=X, y=y)
        initial_coef = [0.5, 1.5, 2.5, 3.5, 4.5]
        self.coef_ = optimize.minimize(loss_partial, initial_coef, method='nelder-mead')

    def predict(self, X, coef):
        X_p = np.copy(X)
        for i, pred in enumerate(X_p):
            if pred < coef[0]:
                X_p[i] = 0
            elif pred >= coef[0] and pred < coef[1]:
                X_p[i] = 1
            elif pred >= coef[1] and pred < coef[2]:
                X_p[i] = 2
            elif pred >= coef[2] and pred < coef[3]:
                X_p[i] = 3
            elif pred >= coef[3] and pred < coef[4]:
                X_p[i] = 4
            else:
                X_p[i] = 5
        return X_p

    def coefficients(self):
        return self.coef_['x']

In [None]:
optR = OptimizedRounder()
coefficients = [0.5, 1.5, 2.5, 3.5, 4.5]
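
With sorted thresholds, the `if`/`elif` chain in `predict` amounts to a binary search over the cut points. A minimal sketch using the standard library; `to_grades` is a hypothetical helper and the predictions are made up:

```python
from bisect import bisect_right

def to_grades(preds, thresholds):
    """Map continuous predictions to integer grades using sorted cut points."""
    return [bisect_right(thresholds, p) for p in preds]

thresholds = [0.5, 1.5, 2.5, 3.5, 4.5]     # same defaults as `coefficients`
preds = [-0.7, 0.5, 1.49, 2.5, 4.2, 5.9]   # made-up regression outputs
print(to_grades(preds, thresholds))  # [0, 1, 1, 3, 4, 5]
```

The equivalence only holds while the thresholds are in ascending order, which an unconstrained optimizer such as Nelder-Mead does not guarantee during the search.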

# Training DrHB Model

In [None]:
learn = Learner(dls, 
                drhbmodel, 
                loss_func=CustomLoss(CustomCrossEntropy(), nn.MSELoss()), 
                metrics=[qkp_class, qkp_regres, qkp_combine],
                splitter=drhbmodel_splitter).to_fp16()

In [None]:
learn.freeze()
learn.summary()

In [None]:
learn.fit_one_cycle(2, 1e-3)

In [None]:
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3)

# Training timm Model

In [None]:
learn = Learner(dls, 
                pandamodel, 
                loss_func=CustomLoss(CustomCrossEntropy(), nn.MSELoss()), 
                metrics=[qkp_class, qkp_regres, qkp_combine],
                splitter=pandamodel_splitter).to_fp16()

In [None]:
learn.freeze()
learn.summary()

In [None]:
learn.fit_one_cycle(2, 1e-3)

In [None]:
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3)