https://www.kaggle.com/c/jovian-pytorch-z2g/discussion/163666

3. For both experiments, I used a 10-fold cross-validation approach. This approach is taken from
multilabel-stratification-cv-and-ensemble.

In addition to having multiple labels in each image, the other challenge in this competition is the existence of rare classes and combinations of different classes.

One technique to deal with this is to guarantee a balanced split between the training and validation sets. The usual random `train_test_split` is not ideal in this case because rare cases can all end up in the validation set, and then your model will never learn about them. The stratification available in `scikit-learn` is also not equipped to deal with multilabel targets. The `scikit-multilearn` library provides a stratifier designed for exactly this case.
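
To make the risk concrete, here is a minimal sketch in plain Python (no `scikit-multilearn` needed). The label strings and the helper name `combos_missing_from_train` are made up for illustration; the sketch only shows how a split can leave a rare label combination out of the training set.

```python
from collections import Counter

def combos_missing_from_train(labels, val_idx):
    """Return label combinations that appear only in the validation rows."""
    val_idx = set(val_idx)
    train = Counter(l for i, l in enumerate(labels) if i not in val_idx)
    val = Counter(l for i, l in enumerate(labels) if i in val_idx)
    return sorted(c for c in val if c not in train)

# Toy data (hypothetical): the combination "8 6 5" occurs exactly once.
labels = ["4", "4", "6", "4 6", "4", "6", "8 6 5", "4 6"]

# A split that happens to put the singleton (row 6) in validation.
print(combos_missing_from_train(labels, val_idx=[1, 6]))  # ['8 6 5']
# A split that keeps it in training.
print(combos_missing_from_train(labels, val_idx=[1, 3]))  # []
```

With a purely random split, which of the two situations you get for each singleton is left to chance.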

Update 1: in the previous version I only showed how to create the split dataframe. That is not much help if you are not used to creating datasets in PyTorch. In this version I show how to use the splits in conjunction with the Advanced Transfer Learning Notebook.

In [None]:
import os
import gc
import time
import copy
from pathlib import Path
import multiprocessing as mp
import random
import warnings
warnings.filterwarnings("ignore")

import cv2
import pandas as pd
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm



import torch
import torchvision.models as models
from torch.utils.data import Dataset, random_split, DataLoader
import torchvision.transforms as T
from sklearn.metrics import f1_score
import torch.nn.functional as F
import torch.nn as nn
from torchvision.utils import make_grid
from skmultilearn.model_selection import IterativeStratification
%matplotlib inline

      
ROOT = Path('/kaggle/input/jovian-pytorch-z2g/')
DIR = ROOT / 'Human protein atlas'
TRAIN = DIR / 'train'
TEST = DIR / 'test'
batch_size = 64
size = 256
nfolds = 5
threshold = 0.3
SEED = 2020

## Helper Functions

In [None]:
def show_sample(img, target, invert=True):
    if invert:
        plt.imshow(1 - img.permute((1, 2, 0)))
    else:
        plt.imshow(img.permute(1, 2, 0))
    print('Labels:', decode_target(target, text_labels=True))
    
def show_batch(dl, invert=True):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(16, 8))
        ax.set_xticks([]); ax.set_yticks([])
        data = 1-images if invert else images
        ax.imshow(make_grid(data, nrow=16).permute(1, 2, 0))
        break

def F_score(output, label, threshold=0.5, beta=1):
    prob = output > threshold
    label = label > threshold

    TP = (prob & label).sum(1).float()
    TN = ((~prob) & (~label)).sum(1).float()
    FP = (prob & (~label)).sum(1).float()
    FN = ((~prob) & label).sum(1).float()

    precision = torch.mean(TP / (TP + FP + 1e-12))
    recall = torch.mean(TP / (TP + FN + 1e-12))
    F2 = (1 + beta**2) * precision * recall / (beta**2 * precision + recall + 1e-12)
    return F2.mean(0)

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
        
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
    
class MultilabelImageClassificationBase(nn.Module):

    def training_step(self, batch):
        images, targets = batch 
        out = self(images)                      
        loss = F.binary_cross_entropy(out, targets)      
        return loss
    
    def validation_step(self, batch):
        images, targets = batch 
        out = self(images)                           # Generate predictions
        loss = F.binary_cross_entropy(out, targets)  # Calculate loss
        score = F_score(out, targets)
        return {'val_loss': loss.detach(), 'val_score': score.detach() }
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_scores = [x['val_score'] for x in outputs]
        epoch_score = torch.stack(batch_scores).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_score': epoch_score.item()}
    
    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.4f}, train_loss: {:.4f}, val_loss: {:.4f}, val_score: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_score']))

def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # benchmark mode may pick different kernels between runs

def encode_label(label):
    target = torch.zeros(10)
    for l in str(label).split(' '):
        target[int(l)] = 1.
    return target

def decode_target(target, text_labels=False, threshold=0.5):
    result = []
    for i, x in enumerate(target):
        if (x >= threshold):
            if text_labels:
                result.append(labels[i] + "(" + str(i) + ")")
            else:
                result.append(str(i))
    return ' '.join(result)

seed_everything(SEED)

In [None]:
df = pd.read_csv(DIR / 'train.csv').set_index("Image").sort_index()
submission = pd.read_csv(ROOT / 'submission.csv') # Don't change the order in the submission
DEVICE = get_default_device()

In [None]:
train_images = {int(x.stem): x for x in TRAIN.iterdir() if x.suffix == '.png'}
test_images = {int(x.stem): x for x in TEST.iterdir() if x.suffix == '.png'}

In [None]:
labels = {
    0: 'Mitochondria',
    1: 'Nuclear bodies',
    2: 'Nucleoli',
    3: 'Golgi apparatus',
    4: 'Nucleoplasm',
    5: 'Nucleoli fibrillar center',
    6: 'Cytosol',
    7: 'Plasma membrane',
    8: 'Centrosome',
    9: 'Nuclear speckles'
}

In [None]:
indexes = {v:k for k,v in labels.items()}

## Count the distribution of combinations

In [None]:
df.Label.value_counts().tail(10)

1 9        2
8 6 1      2
6 3 2 7    2
8 6 5      1
0 5 4 7    1
8 3 5 4    1
6 0 2 4    1
8 9 7      1
8 6 9      1
8 6 0      1
Name: Label, dtype: int64

We can see that some combinations have only one example, which certainly makes it harder for our model to generalize.

## Split the label strings

In [None]:
df['Label'] = df.Label.str.split(" ") ; df.head()

Image,Label
0,"[9, 4, 7]"
1,"[3, 2, 4]"
3,[5]
4,"[3, 4]"
6,[4]


## Expand the labels

In [None]:
df = df.explode('Label') ; df.head(10)

Image,Label
0,9
0,4
0,7
1,3
1,2
1,4
3,5
4,3
4,4
6,4


## Count the distributions of individual classes

In [None]:
df.Label.value_counts()

4    9066
6    5711
7    2629
2    2542
0    2088
3    1977
1    1752
9    1278
5    1109
8    1037
Name: Label, dtype: int64

## Turn labels into one-hot encoding columns

In [None]:
df = pd.get_dummies(df) ; df.head()

Image,Label_0,Label_1,Label_2,Label_3,Label_4,Label_5,Label_6,Label_7,Label_8,Label_9
0,0,0,0,0,0,0,0,0,0,1
0,0,0,0,0,1,0,0,0,0,0
0,0,0,0,0,0,0,0,1,0,0
1,0,0,0,1,0,0,0,0,0,0
1,0,0,1,0,0,0,0,0,0,0


## Combine the vectors of the same index and sum them

In [None]:
df = df.groupby(df.index).sum() ; df.head()

Image,Label_0,Label_1,Label_2,Label_3,Label_4,Label_5,Label_6,Label_7,Label_8,Label_9
0,0,0,0,0,1,0,0,1,0,1
1,0,0,1,1,1,0,0,0,0,0
3,0,0,0,0,0,1,0,0,0,0
4,0,0,0,1,1,0,0,0,0,0
6,0,0,0,0,1,0,0,0,0,0
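
The explode / `get_dummies` / `groupby(...).sum()` sequence above amounts to building one multi-hot vector per image. Here is a dependency-free sketch of the same idea, using the first three label lists shown earlier (`multi_hot` is a name chosen here for illustration, not part of the notebook):

```python
def multi_hot(label_string, num_classes=10):
    """Turn a space-separated label string such as '9 4 7' into a 0/1 list."""
    vec = [0] * num_classes
    for token in label_string.split():
        vec[int(token)] = 1
    return vec

# Raw labels of images 0, 1 and 3 as they appear in train.csv.
raw = {0: "9 4 7", 1: "3 2 4", 3: "5"}
encoded = {img: multi_hot(s) for img, s in raw.items()}

for img, vec in encoded.items():
    print(img, vec)
# 0 [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
# 1 [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
# 3 [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

The printed rows match the first three rows of the summed table, which is a handy check that the pandas pipeline did what was intended.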


In [None]:
df.columns = labels.keys() ; df.head()

Image,0,1,2,3,4,5,6,7,8,9
0,0,0,0,0,1,0,0,1,0,1
1,0,0,1,1,1,0,0,0,0,0
3,0,0,0,0,0,1,0,0,0,0
4,0,0,0,1,1,0,0,0,0,0
6,0,0,0,0,1,0,0,0,0,0


Turn the index (image names) and columns (one-hot target) into np.arrays to feed the stratification algorithm

In [None]:
X, y = df.index.values, df.values

In [None]:
k_fold = IterativeStratification(n_splits=nfolds, order=2)

splits = list(k_fold.split(X, y))

In the previous version I used the `order=1` option. According to the documentation, `order` controls how many labels are considered together when balancing the folds: with higher orders, combinations of labels (drawn with replacement) are balanced as well, which helps with rarer label combinations. Experiment with this value.
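
As a rough illustration of what a higher order means, the sketch below lists the label tuples that would be balanced for a single image. It illustrates the idea only and is not the library's internal code; `label_combinations` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations_with_replacement

def label_combinations(active_labels, order=2):
    """All label tuples of length `order`, drawn with replacement."""
    return list(combinations_with_replacement(sorted(active_labels), order))

# Image 0 has labels 4, 7 and 9.
print(label_combinations([4, 7, 9], order=1))
# [(4,), (7,), (9,)]
print(label_combinations([4, 7, 9], order=2))
# [(4, 4), (4, 7), (4, 9), (7, 7), (7, 9), (9, 9)]

# Counting pairs over the first five images (0, 1, 3, 4, 6) shows which
# pairs are rare and therefore need care when distributing them over folds.
rows = [[4, 7, 9], [2, 3, 4], [5], [3, 4], [4]]
pair_counts = Counter(c for r in rows for c in label_combinations(r, order=2))
print(pair_counts[(4, 4)], pair_counts[(3, 4)], pair_counts[(7, 9)])  # 4 2 1
```

With `order=1` only the single-label counts are balanced; with `order=2` pairs such as `(7, 9)` are taken into account too.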

In [None]:
splits[0][0].shape , splits[0][1].shape

((15400,), (3836,))

Now we have a list with 5 items, one per fold. Each item is a pair of index arrays: the first holds the indices of our training set (about 80% of the data) and the second holds the indices of our validation set (about 20% of the data). A more convenient way to index our DataFrame is to add a new column recording the fold in which each row is used for validation.

In [None]:
fold_splits = np.zeros(df.shape[0]).astype(int)  # np.int was removed in NumPy 1.24; use the builtin int

for i in range(nfolds):
    fold_splits[splits[i][1]] = i

df['Split'] = fold_splits

df.head(10)

Image,0,1,2,3,4,5,6,7,8,9,Split
0,0,0,0,0,1,0,0,1,0,1,0
1,0,0,1,1,1,0,0,0,0,0,2
3,0,0,0,0,0,1,0,0,0,0,3
4,0,0,0,1,1,0,0,0,0,0,2
6,0,0,0,0,1,0,0,0,0,0,3
7,0,0,0,0,1,0,0,0,0,0,2
8,1,1,0,0,0,0,0,0,0,0,2
11,0,0,0,0,1,0,1,0,0,0,2
12,0,0,0,0,1,0,1,0,0,0,0
13,0,0,0,0,0,0,0,0,0,1,1


So, for example, for `fold=0`, all the examples with `Split == 0` form our validation set, and all the others form our training set for that fold.

In [None]:
fold = 0

train_df = df[df.Split != fold]
val_df = df[df.Split == fold]
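
A quick sanity check for this kind of fold column is that the validation sets partition the data: every sample should be in exactly one validation fold. A minimal sketch in plain Python, where `toy_splits` is a made-up stand-in for `splits` and `assign_folds` is a hypothetical helper:

```python
def assign_folds(splits, n_samples):
    """Map each sample position to the fold whose validation set contains it."""
    fold_of = [None] * n_samples
    for fold, (_, val_idx) in enumerate(splits):
        for i in val_idx:
            if fold_of[i] is not None:
                raise ValueError(f"sample {i} is in two validation folds")
            fold_of[i] = fold
    if None in fold_of:
        raise ValueError("some samples are in no validation fold")
    return fold_of

# Hypothetical (train_idx, val_idx) pairs for 6 samples and 3 folds.
toy_splits = [
    ([1, 2, 4, 5], [0, 3]),
    ([0, 2, 3, 4], [1, 5]),
    ([0, 1, 3, 5], [2, 4]),
]
fold_of = assign_folds(toy_splits, n_samples=6)
print(fold_of)  # [0, 1, 2, 0, 2, 1]
```

Raising on overlaps or gaps makes a broken split fail loudly instead of silently leaking validation samples into training.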

In [None]:
train_df.head()

Image,0,1,2,3,4,5,6,7,8,9,Split
1,0,0,1,1,1,0,0,0,0,0,2
3,0,0,0,0,0,1,0,0,0,0,3
4,0,0,0,1,1,0,0,0,0,0,2
6,0,0,0,0,1,0,0,0,0,0,3
7,0,0,0,0,1,0,0,0,0,0,2


In [None]:
val_df.head()

Image,0,1,2,3,4,5,6,7,8,9,Split
0,0,0,0,0,1,0,0,1,0,1,0
12,0,0,0,0,1,0,1,0,0,0,0
20,0,0,1,0,0,0,0,1,0,0,0
30,0,0,1,0,1,0,1,0,0,0,0
40,0,0,0,0,1,0,0,0,0,0,0


The following decoded dataframes are for visualization purposes only. We will pass the above dataframes with one-hot encoded labels to our models.

In [None]:
# Drop the 'Split' column first; otherwise it is decoded as a spurious label 10
decoded_train_df = pd.DataFrame({'Label' : list(map(decode_target, train_df.drop(columns='Split').values))}, index=train_df.index)
decoded_val_df = pd.DataFrame({'Label' : list(map(decode_target, val_df.drop(columns='Split').values))}, index=val_df.index)

In [None]:
decoded_train_df.Label.value_counts().tail(10)

1 4 6 7    2
4 7 9      1
2 3 6 7    1
5 6 8      1
1 2 7      1
0 2 4 6    1
0 6 8      1
3 4 5 8    1
3 4 6 8    1
1 9        1
Name: Label, dtype: int64

In [None]:
decoded_val_df.Label.value_counts().tail(10)

2 4 5      1
7 8 9      1
3 4 6 8    1
0 4 7      1
3 5        1
6 8 9      1
4 5 8      1
3 7 9      1
2 4 8      1
1 2 7      1
Name: Label, dtype: int64

## How to use this for cross-validation and training
The following function packages the code above and returns a list of length `nfolds`, where each item is a tuple with the `train_df` and `val_df` for that fold.

In [None]:
def create_split_df(nfolds=5, order=1):

    df = pd.read_csv(DIR / 'train.csv').set_index("Image")

    submission = pd.read_csv(ROOT / 'submission.csv')

    split_df = pd.get_dummies(df.Label.str.split(" ").explode())

    split_df = split_df.groupby(split_df.index).sum() 

    X, y = split_df.index.values, split_df.values

    k_fold = IterativeStratification(n_splits=nfolds, order=order)

    splits = list(k_fold.split(X, y))

    fold_splits = np.zeros(df.shape[0]).astype(int)  # np.int was removed in NumPy 1.24; use the builtin int

    for i in range(nfolds):
        fold_splits[splits[i][1]] = i

    split_df['Split'] = fold_splits    

    df_folds = []

    for fold in range(nfolds):

        df_fold = split_df.copy()
            
        train_df = df_fold[df_fold.Split != fold].drop('Split', axis=1).reset_index()
        
        val_df = df_fold[df_fold.Split == fold].drop('Split', axis=1).reset_index()
        
        df_folds.append((train_df, val_df))

    return df_folds

In [None]:
splits = create_split_df(5, order=2)

## Statistics of the DataSet
To normalize your dataset you can use the ImageNet statistics, or you can calculate the statistics of your own images (train + test) and use those instead. The following snippet does the latter.
Uncomment it if you want to try it yourself, but I have already provided the values. You can see they are very different from the ImageNet ones. Experiment with both!

In [None]:
#train_set = set(TRAIN.iterdir())
#test_set = set(TEST.iterdir())
#whole_set = train_set.union(test_set)

#x_tot, x2_tot = [], []
#for file in tqdm(whole_set):
#    img = cv2.imread(str(file))                  # OpenCV loads images in BGR order
#    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # match the RGB order used by PIL/torchvision
#    img = img/255.0
#    x_tot.append(img.reshape(-1, 3).mean(0))
#    x2_tot.append((img**2).reshape(-1, 3).mean(0))

#image stats
#img_avr =  np.array(x_tot).mean(0)
#img_std =  np.sqrt(np.array(x2_tot).mean(0) - img_avr**2)
#print('mean:',img_avr, ', std:', img_std)   # img_std is already a standard deviation
#mean = torch.as_tensor(img_avr)
#std = torch.as_tensor(img_std)

In [None]:
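mean = torch.tensor([0.05438065, 0.05291743, 0.07920227])
# Note: for values in [0, 1] the std is at most sqrt(mean * (1 - mean)), roughly 0.23 to 0.27 here,
# so these numbers look like the square root of the true std (their squares are about [0.155, 0.113, 0.149]).
std = torch.tensor([0.39414383, 0.33547948, 0.38544176])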
imagenet_mean = torch.tensor([0.485, 0.456, 0.406])
imagenet_std = torch.tensor([0.229, 0.224, 0.225])
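
For reference, the statistics come from the identity std = sqrt(E[x²] − E[x]²), applied per channel. A minimal sketch for a single channel in plain Python, with made-up pixel values (`channel_stats` is a hypothetical helper, not part of the notebook):

```python
import math

def channel_stats(pixels):
    """Mean and std of a flat list of values via E[x] and E[x^2]."""
    n = len(pixels)
    mean = sum(pixels) / n
    mean_sq = sum(p * p for p in pixels) / n
    var = max(mean_sq - mean ** 2, 0.0)  # guard against tiny negative rounding errors
    return mean, math.sqrt(var)

values = [0.0, 0.0, 0.1, 0.3]  # made-up intensities already scaled to [0, 1]
mean, std = channel_stats(values)
print(round(mean, 4), round(std, 4))  # 0.1 0.1225
```

Averaging per-image means, as the commented snippet does, gives the same result as pooling all pixels only when every image has the same number of pixels.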

The following is an adaptation of the [Advanced Transfer Learning Starter Notebook](https://www.kaggle.com/aakashns/advanced-transfer-learning-starter-notebook) that uses the stratified splits and cross-validation, and saves the best model weights per fold.

In [None]:

train_tfms = T.Compose([
    T.Resize(size),
    T.RandomHorizontalFlip(), 
    T.RandomRotation(90), # Since the images are square, I experimented with random rotations of up to 90º in either direction
    T.ToTensor(), 
    T.Normalize(mean, std, inplace=True), 
    T.RandomErasing(inplace=True)
])

valid_tfms = T.Compose([
    T.Resize(size), 
    T.ToTensor(), 
    T.Normalize(mean, std)
])

In [None]:
class HumanProteinDataset(Dataset):
    def __init__(self, df, transform=None, is_test=False):
        self.df = df
        self.transform = transform
        self.files = test_images if is_test else train_images
        
    def __len__(self):
        return len(self.df)    
    
    def __getitem__(self, idx):
        row = self.df.loc[idx]
        img_id, img_label = int(row['Image']), row.drop('Image').values.astype(np.float32)
        img = self.files[img_id] 
        img = Image.open(img)
        if self.transform:
            img = self.transform(img)
        return img, img_label

In [None]:
class Proteinmodel(MultilabelImageClassificationBase):
    def __init__(self, encoder):
        super().__init__()
        # Use a pretrained model
        self.network = encoder(pretrained=True)
        # Replace last layer
        num_ftrs = self.network.fc.in_features
        self.network.fc = nn.Linear(num_ftrs, 10)
    
    def forward(self, xb):
        return torch.sigmoid(self.network(xb))
    
    def freeze(self):
        # To freeze the residual layers
        for param in self.network.parameters():
            param.requires_grad = False  # the attribute is `requires_grad`; setting `require_grad` has no effect
        for param in self.network.fc.parameters():
            param.requires_grad = True
    
    def unfreeze(self):
        # Unfreeze all layers
        for param in self.network.parameters():
            param.requires_grad = True

In [None]:
def get_split_dataloaders(split):
    train_df, val_df = split
    
    train_ds = HumanProteinDataset(train_df, transform=train_tfms)
    val_ds = HumanProteinDataset(val_df, transform=valid_tfms)
    
    train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=mp.cpu_count(), pin_memory=True)
    val_dl = DataLoader(val_ds, batch_size*2, num_workers=mp.cpu_count(), pin_memory=True)
    
    
    train_dl = DeviceDataLoader(train_dl, DEVICE)
    val_dl = DeviceDataLoader(val_dl, DEVICE)
    
    return train_dl, val_dl

In [None]:
def get_test_dl():
    test_ds = HumanProteinDataset(submission, transform=valid_tfms, is_test=True)
    test_dl = DataLoader(test_ds, batch_size*2, num_workers=mp.cpu_count(), pin_memory=True)
    return DeviceDataLoader(test_dl, DEVICE)

In [None]:
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in tqdm(val_loader)]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader, 
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD, save_best='val_loss'):
    
    since = time.time()
    
    torch.cuda.empty_cache()
    history = []
    
    # Set up custom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs, 
                                                steps_per_epoch=len(train_loader))
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss, best_score = 1e4, 0.0
    
    for epoch in range(epochs):
        # Training Phase 
        model.train()
        train_losses = []
        lrs = []
        for batch in tqdm(train_loader):
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # detach so each batch's graph is not kept alive
            loss.backward()
            
            # Gradient clipping
            if grad_clip: 
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            
            optimizer.step()
            optimizer.zero_grad()
            
            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()
        
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        
        if result['val_loss'] < best_loss:   
            best_loss = result['val_loss']
            if save_best == 'val_loss':
                best_model_wts = copy.deepcopy(model.state_dict())
        
            
        if result['val_score'] > best_score:
            best_score = result['val_score']                   
            if save_best == 'val_score':            
                best_model_wts = copy.deepcopy(model.state_dict())          
        
        history.append(result)
        
    time_elapsed = time.time() - since
    
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    
    print(f'Best val Score: {best_score:4f}')
    
    print(f'Best val loss: {best_loss:4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
        
    
    return model, history
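
The best-weights bookkeeping in `fit_one_cycle` can be isolated from the training loop. The sketch below mirrors that logic on plain dictionaries; `track_best` is a hypothetical helper, and the metric values are taken from three consecutive epochs in the training log of this notebook.

```python
def track_best(history, save_best="val_loss"):
    """Return (epoch index whose weights would be kept, best loss, best score)."""
    best_loss, best_score, best_epoch = float("inf"), 0.0, None
    for epoch, result in enumerate(history):
        if result["val_loss"] < best_loss:
            best_loss = result["val_loss"]
            if save_best == "val_loss":
                best_epoch = epoch
        if result["val_score"] > best_score:
            best_score = result["val_score"]
            if save_best == "val_score":
                best_epoch = epoch
    return best_epoch, best_loss, best_score

history = [
    {"val_loss": 0.2945, "val_score": 0.5510},
    {"val_loss": 0.2563, "val_score": 0.6155},
    {"val_loss": 0.2436, "val_score": 0.6120},
]
print(track_best(history, "val_loss"))   # (2, 0.2436, 0.6155)
print(track_best(history, "val_score"))  # (1, 0.2436, 0.6155)
```

The two criteria can select different epochs, so it is worth choosing `save_best` to match the metric you are actually optimizing for.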

In [None]:
def predict_single(image):
    xb = image.unsqueeze(0)
    xb = to_device(xb, device)
    preds = model(xb)
    prediction = preds[0]
    print("Prediction: ", prediction)
    show_sample(image, prediction)
    
@torch.no_grad()
def predict_dl(dl, model):
    torch.cuda.empty_cache()
    batch_probs = []
    for xb, _ in tqdm(dl):
        probs = model(xb)
        batch_probs.append(probs.cpu().detach())
    batch_probs = torch.cat(batch_probs)
    return batch_probs

In [None]:
device = get_default_device()

In [None]:
max_lr = 0.01
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam


histories = []
predictions = []

test_dl = get_test_dl()

since = time.time()


for i, split in enumerate(splits):
    
    history = []
    
    train_dl, val_dl = get_split_dataloaders(split)
    
    # initialize parameters of model to train each fold from scratch and not leak info from different folds
    model = to_device(Proteinmodel(models.resnet50), device)
    
    model.freeze()    
    model, hist = fit_one_cycle(6, max_lr, model, train_dl, val_dl, 
                             grad_clip=grad_clip, 
                             weight_decay=weight_decay, 
                             opt_func=opt_func)
    
    history += hist
    
    model.unfreeze()   
    model, hist  = fit_one_cycle(4, max_lr/10, model, train_dl, val_dl, 
                             grad_clip=grad_clip, 
                             weight_decay=weight_decay, 
                             opt_func=opt_func)
    
    history += hist
    histories.append(history)  # keep the per-fold history; `histories` was otherwise never filled
    
    test_preds = predict_dl(test_dl, model)
    
    predictions.append(test_preds)
    
    del model
    
    gc.collect()
    
print(f'Total Training time: {(time.time() - since)/60:.2f} minutes')

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/checkpoints/resnet50-19c8e357.pth

Epoch [0], last_lr: 0.0060, train_loss: 0.3148, val_loss: 0.6879, val_score: 0.3748
Epoch [1], last_lr: 0.0099, train_loss: 0.3126, val_loss: 0.5967, val_score: 0.3785
Epoch [2], last_lr: 0.0081, train_loss: 0.3062, val_loss: 0.4109, val_score: 0.4545
Epoch [3], last_lr: 0.0046, train_loss: 0.2893, val_loss: 0.3559, val_score: 0.4259
Epoch [4], last_lr: 0.0013, train_loss: 0.2707, val_loss: 0.2777, val_score: 0.5244
Epoch [5], last_lr: 0.0000, train_loss: 0.2522, val_loss: 0.2429, val_score: 0.5961
Training complete in 24m 49s
Best val Score: 0.596129
Best val loss: 0.242945

Epoch [0], last_lr: 0.0009, train_loss: 0.2497, val_loss: 0.2472, val_score: 0.5822
Epoch [1], last_lr: 0.0008, train_loss: 0.2497, val_loss: 0.2562, val_score: 0.5793
Epoch [2], last_lr: 0.0003, train_loss: 0.2433, val_loss: 0.2417, val_score: 0.6048
Epoch [3], last_lr: 0.0000, train_loss: 0.2381, val_loss: 0.2311, val_score: 0.6220
Training complete in 15m 40s
Best val Score: 0.621970
Best val loss: 0.231053

Epoch [0], last_lr: 0.0060, train_loss: 0.3205, val_loss: 0.4015, val_score: 0.3860
Epoch [1], last_lr: 0.0099, train_loss: 0.3149, val_loss: 0.4318, val_score: 0.2304
Epoch [2], last_lr: 0.0081, train_loss: 0.3077, val_loss: 0.3235, val_score: 0.4274
Epoch [3], last_lr: 0.0046, train_loss: 0.2945, val_loss: 0.3544, val_score: 0.3722
Epoch [4], last_lr: 0.0013, train_loss: 0.2750, val_loss: 0.2850, val_score: 0.5072
Epoch [5], last_lr: 0.0000, train_loss: 0.2596, val_loss: 0.2556, val_score: 0.5656
Training complete in 22m 53s
Best val Score: 0.565605
Best val loss: 0.255635

Epoch [0], last_lr: 0.0009, train_loss: 0.2552, val_loss: 0.2628, val_score: 0.5847
Epoch [1], last_lr: 0.0008, train_loss: 0.2563, val_loss: 0.2579, val_score: 0.5680
Epoch [2], last_lr: 0.0003, train_loss: 0.2508, val_loss: 0.2473, val_score: 0.5917
Epoch [3], last_lr: 0.0000, train_loss: 0.2455, val_loss: 0.2442, val_score: 0.6024
Training complete in 15m 10s
Best val Score: 0.602405
Best val loss: 0.244168

Epoch [0], last_lr: 0.0060, train_loss: 0.3172, val_loss: 0.3527, val_score: 0.4776
Epoch [1], last_lr: 0.0099, train_loss: 0.3142, val_loss: 0.3441, val_score: 0.3723
Epoch [2], last_lr: 0.0081, train_loss: 0.3067, val_loss: 0.3632, val_score: 0.4664
Epoch [3], last_lr: 0.0046, train_loss: 0.2910, val_loss: 0.2945, val_score: 0.5510
Epoch [4], last_lr: 0.0013, train_loss: 0.2709, val_loss: 0.2563, val_score: 0.6155
Epoch [5], last_lr: 0.0000, train_loss: 0.2520, val_loss: 0.2436, val_score: 0.6120
Training complete in 22m 58s
Best val Score: 0.615482
Best val loss: 0.243591

Epoch [2], last_lr: 0.0003, train_loss: 0.2427, val_loss: 0.2358, val_score: 0.6512

Epoch [0], last_lr: 0.0060, train_loss: 0.3184, val_loss: 0.4678, val_score: 0.3867
Epoch [1], last_lr: 0.0099, train_loss: 0.3139, val_loss: 2.1558, val_score: 0.1619
Epoch [2], last_lr: 0.0081, train_loss: 0.3106, val_loss: 0.3028, val_score: 0.4463
Epoch [3], last_lr: 0.0046, train_loss: 0.2973, val_loss: 0.2952, val_score: 0.4553

Epoch [0], last_lr: 0.0060, train_loss: 0.3151, val_loss: 0.3492, val_score: 0.4570
Epoch [2], last_lr: 0.0081, train_loss: 0.3037, val_loss: 0.3994, val_score: 0.4438
Epoch [4], last_lr: 0.0013, train_loss: 0.2738, val_loss: 0.2574, val_score: 0.5686

Epoch [1], last_lr: 0.0008, train_loss: 0.2520, val_loss: 0.2421, val_score: 0.6175
Epoch [2], last_lr: 0.0003, train_loss: 0.2453, val_loss: 0.2301, val_score: 0.6296
Epoch [3], last_lr: 0.0000, train_loss: 0.2377, val_loss: 0.2257, val_score: 0.6447
Training complete in 15m 44s
Best val Score: 0.644697
Best val loss: 0.225736

Total Training time: 202.89 minutes


## Ensemble the models by averaging the predictions from each fold

In [None]:
prediction_cv = torch.stack(predictions).mean(axis=0)
decoded_predictions = prediction_cv > threshold  # threshold the fold average, not just the last fold's predictions
submission["Label"] = [decode_target(t.tolist()) for t in  decoded_predictions]
submission.to_csv("submission.csv", index=False)
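
The averaging-then-thresholding step can be sketched without tensors. The probabilities below are made up, and `average_folds` and `decode` are hypothetical helpers that stand in for `torch.stack(...).mean(axis=0)` and `decode_target`:

```python
def average_folds(fold_probs):
    """Element-wise mean over folds; fold_probs[f][i][c] is a probability."""
    n_folds = len(fold_probs)
    n_images = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    return [
        [sum(fold[i][c] for fold in fold_probs) / n_folds for c in range(n_classes)]
        for i in range(n_images)
    ]

def decode(probs, threshold=0.3):
    """Space-separated indices of the classes whose probability exceeds the threshold."""
    return " ".join(str(c) for c, p in enumerate(probs) if p > threshold)

# Made-up predictions: 2 folds, 2 images, 3 classes.
fold_probs = [
    [[0.9, 0.2, 0.1], [0.1, 0.5, 0.2]],
    [[0.7, 0.5, 0.1], [0.1, 0.0, 0.6]],
]
avg = average_folds(fold_probs)
print([decode(p) for p in avg])  # ['0 1', '2']
```

Averaging probabilities before thresholding lets a confident fold outvote an uncertain one, which is usually smoother than voting on already-thresholded labels.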

In [None]:
submission.head(10)

,Image,Label
0,24117,4 9
1,15322,4
2,14546,6
3,8079,0 6
4,13192,4
5,25927,1 4
6,3372,0 3
7,21781,6
8,2847,4
9,16413,9


I've trained for only a few epochs with an image size of 256, just for sharing purposes. Training for longer with bigger images and tuned hyperparameters can make your model a lot better.