**Final update**

- In the section below, I was **wrong** about the official `EfficientNet`. The weird performance was caused by normalizing the data in the data loader; that step is currently not needed because this network has a built-in normalization layer. However, I'm not sure whether this layer will be kept or removed in future releases.
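
To make the double-normalization problem concrete, here is a tiny NumPy sketch. The `builtin_rescale` function is a hypothetical stand-in for a first layer that rescales its own input; the assumption (not verified here against any particular release) is that such a layer expects raw pixel values in `[0, 255]`.

```python
import numpy as np

# Hypothetical stand-in for a network whose first layer rescales inputs itself
# (assumption: it expects raw pixel values in [0, 255]).
def builtin_rescale(x):
    return x / 255.0

raw = np.array([0.0, 127.5, 255.0])        # raw pixel intensities

correct = builtin_rescale(raw)             # loader passes pixels through untouched
double  = builtin_rescale(raw / 255.0)     # loader already divided by 255

print("single scaling:", correct)
print("double scaling:", double)
```

With double scaling the whole image collapses into a very narrow range near zero, which is one plausible explanation for the kind of degraded training described above.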

---

## About this kernel
- **Augmentations**: Advanced augmentations such as `CutMix`, `MixUp`, and `FMix` are used.
- **Data Generator**: A custom Keras data generator is implemented.
- **Training**: Previously we used the official `EfficientNet` models, but because of their weird performance we switched to an unofficial implementation.
- **Training Mechanism**: A convenient object-oriented training mechanism is implemented for K-Fold training, out-of-fold validation, and TTA over the validation set.
- **Submission**: Test Time Augmentation is applied to the model from each K-Fold split, and the predictions are then combined in an ensemble. The `albumentations` library is used for TTA.


## Logs
- **update 2**: Used the unofficial `efficientnet` implementation
- **update 1**: Added Symmetric Cross Entropy Loss for dealing with noisy labels 

Symmetric Cross Entropy Loss, [implementation](https://www.kaggle.com/c/cassava-leaf-disease-classification/discussion/208324). 

---

## Cassava Leaf Disease Classification

Hi,
This is a complete **GPU** baseline starter in `tf.keras` for the **Cassava Leaf Disease Classification** problem. Here the `jpg` samples are used for end-to-end modeling. The main approaches are:

```
- Stratified K-Fold (K=5)
- Image Augmentation
    - General Augmentation (Albumentations)
    - CutMix 
    - MixUp
    - FMix
    - Mosaic (will not use, incomplete implementation)
- Image Modeling
    - EfficientNet (Modified) + Custom Top Layers (Attention Mechanisms) + Softmax 
- Training
    - 5 Fold Training 
    - OOF Evaluation: Optimizing Metrics - Out-of-Fold Weights Ensemble
- Inference
    - Ensemble
    - Test Time Augmentation
```

The implementations of the `CutMix` and `MixUp` augmentations are taken from [Chris Deotte](https://www.kaggle.com/cdeotte) and integrated into a custom [tf.keras.utils.Sequence](https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) generator. For modeling, `EfficientNets` are used by default, but feel free to swap in another architecture in the config class. We've tried to integrate a dual attention mechanism on top of the base model. Next, training runs in a loop over the folds. At inference time, the saved weights are loaded into their respective model architectures and input sizes, **TTA** is run for each model, and the predictions are ensembled.

In [None]:
!pip install /kaggle/input/kerasapplications -q
!pip install /kaggle/input/efficientnet-keras-source-code/ -q --no-deps

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from glob import glob
import albumentations as A 
from pylab import rcParams
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import os, gc, cv2, random, warnings, math, sys, json, pprint

# sklearn
from sklearn.utils import class_weight
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# tf 
import tensorflow as tf
import efficientnet.tfkeras as efn
from tensorflow.keras import backend as K

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
warnings.simplefilter('ignore')

In [None]:
# helper function to plot sample 
def plot_imgs(dataset_show, row, col):
    rcParams['figure.figsize'] = 20,10
    for i in range(row):
        f, ax = plt.subplots(1,col)
        for p in range(col):
            idx = np.random.randint(0, len(dataset_show))
            img, label = dataset_show[idx]
            ax[p].grid(False)
            ax[p].imshow(img[0])
            ax[p].set_title(idx)
    plt.show()
    

def visulize(path, n_images, is_random=True, figsize=(16, 16)):
    plt.figure(figsize=figsize)
    
    w = int(n_images ** .5)
    h = math.ceil(n_images / w)
    
    image_names = os.listdir(path)
    for i in range(n_images):
        image_name = image_names[i]
        if is_random:
            image_name = random.choice(image_names)
            
        img = cv2.imread(os.path.join(path, image_name))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
        plt.subplot(h, w, i + 1)
        plt.imshow(img)
        plt.xticks([])
        plt.yticks([])
    plt.show()

**Enabling Mixed Precision and Accelerated Linear Algebra**

In [None]:
MIXED_PRECISION = True
XLA_ACCELERATE  = False # Didn't work; Dunno Why!

GPUS = tf.config.experimental.list_physical_devices('GPU')
if GPUS:
    try:
        # memory growth must be set on every GPU before the devices are initialized
        for GPU in GPUS:
            tf.config.experimental.set_memory_growth(GPU, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(GPUS), "Physical GPUs,", len(logical_gpus), "Logical GPUs") 
    except RuntimeError as RE:
        print(RE)

if MIXED_PRECISION:
    policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
    tf.keras.mixed_precision.experimental.set_policy(policy)
    print('Mixed precision enabled')

if XLA_ACCELERATE:
    tf.config.optimizer.set_jit(True)
    print('Accelerated Linear Algebra enabled')
    
strategy = tf.distribute.get_strategy()
REPLICAS = strategy.num_replicas_in_sync
print(f'REPLICAS: {REPLICAS}') 
print("Tensorflow version " + tf.__version__)

**Basic Config**

In [None]:
class BaseConfig(object):
    SEED  = 101
    TRAIN_DF       = '../input/cassava-leaf-disease-classification/train.csv'
    TRAIN_IMG_PATH = '../input/cassava-leaf-disease-classification/train_images/'
    TEST_IMG_PATH  = '../input/cassava-leaf-disease-classification/test_images/'
    CLASS_MAP      = '../input/cassava-leaf-disease-classification/label_num_to_disease_map.json'

In [None]:
def seed_all(s):
    random.seed(s)
    np.random.seed(s)
    tf.random.set_seed(s)
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    os.environ['PYTHONHASHSEED'] = str(s) 
    
seed_all(BaseConfig.SEED)

**General Overview**

In [None]:
df = pd.read_csv(BaseConfig.TRAIN_DF)
assert df.shape[0] == len(df.image_id.unique()) , "NOT ALL ID UNIQUE"
print(df.info())
df.head()

**Significant Class Imbalance**

In [None]:
with open(os.path.join(BaseConfig.CLASS_MAP)) as file:
    pprint.pprint(json.loads(file.read()))

In [None]:
temp_df = df.copy()
temp_df[['CBB', 'CBSD', 
         'CGM', 'CMD', 'Healthy']] = pd.get_dummies(temp_df["label"])

fig = go.Figure(data=[go.Pie(labels=temp_df.columns[2:],values=temp_df.iloc[:, 2:].sum().values)])
fig.show()

del temp_df

**Displaying Samples**

In [None]:
visulize(BaseConfig.TRAIN_IMG_PATH, 9, is_random=True)

# Augmentation

The `albumentations` library is primarily used. For more detailed augmentation examples, please refer to another work of mine, [here](https://www.kaggle.com/ipythonx/tf-keras-sota-augmentation-in-sequence-generator). 

In [None]:
import albumentations as A 

# For Training 
def albu_transforms_train(data_resize): 
    return A.Compose([
            A.RandomResizedCrop(data_resize, data_resize),
            A.HorizontalFlip(p=0.5),
            A.VerticalFlip(p=0.5),
            A.ShiftScaleRotate(p=0.5),
            A.RandomRotate90(p=0.5),
            A.HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit=0.2, 
                                 val_shift_limit=0.2, p=0.5),
            A.RandomBrightnessContrast(brightness_limit=(-0.1,0.1), 
                                       contrast_limit=(-0.1, 0.1), p=0.5),
            A.RandomShadow(shadow_roi=(0, 0.5, 1, 1), 
                           shadow_dimension=5, p=0.5),
            A.CoarseDropout(p=0.5),            
            A.ToFloat()
        ], p=1.)

# For Validation 
def albu_transforms_valid(data_resize): 
    return A.Compose([
            A.ToFloat(),
            A.CenterCrop(data_resize, data_resize, p=1.),
            A.Resize(data_resize, data_resize),
        ], p=1.)

# For Inference (TTA)
def albu_transforms_inference(data_resize): 
    return A.Compose([
            A.Transpose(p=0.5),
            A.VerticalFlip(p=0.5),
            A.HorizontalFlip(p=0.5),
            A.RandomRotate90(p=0.5),
            A.ShiftScaleRotate(p=0.5),
            A.HueSaturationValue(hue_shift_limit=0.2, 
                                 sat_shift_limit=0.2, 
                                 val_shift_limit=0.2, p=0.5),
            A.ToFloat(),
            A.RandomResizedCrop(data_resize, data_resize)
        ], p=1.)

**CutMix** Augmentation

In [None]:
def CutMix(image, label, DIM, PROBABILITY = 1.0):
    # input image - is a batch of images of size [n,dim,dim,3] not a single image of [dim,dim,3]
    # output - a batch of images with cutmix applied
    CLASSES = 5
    
    imgs = []; labs = []
    for j in range(len(image)):
        # DO CUTMIX WITH PROBABILITY DEFINED ABOVE
        P = tf.cast( tf.random.uniform([],0,1)<=PROBABILITY, tf.int32)
        
        # CHOOSE RANDOM IMAGE TO CUTMIX WITH
        k = tf.cast( tf.random.uniform([],0,len(image)),tf.int32)
        
        # CHOOSE RANDOM LOCATION
        x = tf.cast( tf.random.uniform([],0,DIM),tf.int32)
        y = tf.cast( tf.random.uniform([],0,DIM),tf.int32)
        
        b = tf.random.uniform([],0,1) # this is beta dist with alpha=1.0
        
        WIDTH = tf.cast( DIM * tf.math.sqrt(1-b),tf.int32) * P
        ya = tf.math.maximum(0,y-WIDTH//2)
        yb = tf.math.minimum(DIM,y+WIDTH//2)
        xa = tf.math.maximum(0,x-WIDTH//2)
        xb = tf.math.minimum(DIM,x+WIDTH//2)
        
        # MAKE CUTMIX IMAGE
        one = image[j,ya:yb,0:xa,:]
        two = image[k,ya:yb,xa:xb,:]
        three = image[j,ya:yb,xb:DIM,:]
        middle = tf.concat([one,two,three],axis=1)
        img = tf.concat([image[j,0:ya,:,:],middle,image[j,yb:DIM,:,:]],axis=0)
        imgs.append(img)
        
        # MAKE CUTMIX LABEL
        a = tf.cast(WIDTH*WIDTH/DIM/DIM,tf.float32)
        labs.append((1-a)*label[j] + a*label[k])
            
    # RESHAPE HACK SO TPU COMPILER KNOWS SHAPE OF OUTPUT TENSOR (maybe use Python typing instead?)
    image2 = tf.reshape(tf.stack(imgs),(len(image),DIM,DIM,3))
    label2 = tf.reshape(tf.stack(labs),(len(image),CLASSES))
    
    return image2,label2
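
To see what the cell above does to a single pair of images, here is a small NumPy-only sketch. It is not the implementation used in this kernel: `cutmix_pair` is a hypothetical helper, and it makes one different choice, namely weighting the labels by the area that is actually pasted after clipping at the image border (the cell above uses `WIDTH*WIDTH/DIM/DIM`).

```python
import numpy as np

def cutmix_pair(img_a, img_b, lab_a, lab_b, rng):
    """Paste a random square of img_b into img_a and mix the labels by area."""
    dim = img_a.shape[0]
    b = rng.uniform()                       # Beta(1, 1) is the uniform distribution
    width = int(dim * np.sqrt(1.0 - b))     # side length of the pasted square
    cy, cx = rng.integers(0, dim, size=2)   # centre of the square
    ya, yb = max(0, cy - width // 2), min(dim, cy + width // 2)
    xa, xb = max(0, cx - width // 2), min(dim, cx + width // 2)

    mixed = img_a.copy()
    mixed[ya:yb, xa:xb] = img_b[ya:yb, xa:xb]

    # Assumption: use the area actually pasted (after border clipping) as weight.
    a = (yb - ya) * (xb - xa) / (dim * dim)
    return mixed, (1.0 - a) * lab_a + a * lab_b, a

rng = np.random.default_rng(0)
img_a, img_b = np.zeros((8, 8, 3)), np.ones((8, 8, 3))
lab_a = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
lab_b = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

mixed, label, a = cutmix_pair(img_a, img_b, lab_a, lab_b, rng)
print("pasted fraction:", a)
print("mixed label:", label)
```

Because `img_a` is all zeros and `img_b` is all ones, the mean of `mixed` equals the pasted fraction, which makes it easy to check that the label weight matches the pixels.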

**MixUp** Augmentation

In [None]:
def MixUp(image, label, DIM, PROBABILITY = 1.0):
    # input image - is a batch of images of size [n,dim,dim,3] not a single image of [dim,dim,3]
    # output - a batch of images with mixup applied
    CLASSES = 5
    
    imgs = []; labs = []
    for j in range(len(image)):
        # DO MIXUP WITH PROBABILITY DEFINED ABOVE
        P = tf.cast( tf.random.uniform([],0,1)<=PROBABILITY, tf.float32)
                   
        # CHOOSE RANDOM
        k = tf.cast( tf.random.uniform([],0,len(image)),tf.int32)
        a = tf.random.uniform([],0,1)*P # this is beta dist with alpha=1.0
                    
        # MAKE MIXUP IMAGE
        img1 = image[j,]
        img2 = image[k,]
        imgs.append((1-a)*img1 + a*img2)
                    
        # MAKE MIXUP LABEL
        labs.append((1-a)*label[j] + a*label[k])
            
    # RESHAPE HACK SO TPU COMPILER KNOWS SHAPE OF OUTPUT TENSOR (maybe use Python typing instead?)
    image2 = tf.reshape(tf.stack(imgs),(len(image),DIM,DIM,3))
    label2 = tf.reshape(tf.stack(labs),(len(image),CLASSES))
    return image2,label2
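
The same idea can be written in a vectorized way. The following is a minimal NumPy sketch, assuming one random partner and one uniform mixing weight per sample; `mixup_batch` is a hypothetical name and is not part of this kernel.

```python
import numpy as np

def mixup_batch(images, labels, rng):
    """Blend every sample with a random partner from the same batch."""
    n = len(images)
    partner = rng.integers(0, n, size=n)     # index of the partner sample
    a = rng.uniform(size=n)                  # Beta(1, 1) mixing weights
    a_img = a.reshape(n, 1, 1, 1)            # broadcast over H, W, C
    mixed_images = (1 - a_img) * images + a_img * images[partner]
    mixed_labels = (1 - a[:, None]) * labels + a[:, None] * labels[partner]
    return mixed_images, mixed_labels

rng = np.random.default_rng(0)
images = rng.uniform(size=(4, 8, 8, 3))      # synthetic batch in [0, 1]
labels = np.eye(5)[[0, 1, 2, 3]]             # one-hot labels for 5 classes

mixed_images, mixed_labels = mixup_batch(images, labels, rng)
print(mixed_labels.round(2))
```

Each mixed label row still sums to one, so it can be fed to a categorical cross-entropy loss without further changes.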

**FMix** Augmentation

In [None]:
sys.path.insert(0, "/kaggle/input/pyutils")
from fmix_utils import sample_mask

def FMix(image, label, DIM,  alpha=1, decay_power=3, max_soft=0.0, reformulate=False):
    lam, mask = sample_mask(alpha, decay_power,(DIM, DIM), max_soft, reformulate)
    index = tf.constant(np.random.permutation(int(image.shape[0])))
    mask  = np.expand_dims(mask, -1)
    
    # samples 
    image1 = image * mask
    image2 = tf.gather(image, index) * (1 - mask)
    image3 = image1 + image2

    # labels
    label1 = label * lam 
    label2 = tf.gather(label, index) * (1 - lam)
    label3 = label1 + label2 
    return image3, label3
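
The internals of `sample_mask` live in the external `fmix_utils` module and are not shown here. As a rough illustration only, the sketch below builds a low-frequency binary mask in plain NumPy and uses it the same way as the cell above. `lowfreq_mask` and `fmix_like` are hypothetical names, and the details (frequency decay, thresholding) are my simplifying assumptions rather than the library's code.

```python
import numpy as np

def lowfreq_mask(shape, lam, rng, decay_power=3.0):
    """Binary mask with roughly a `lam` fraction of ones, built from smooth noise."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    freq[0, 0] = 1.0 / max(h, w)             # avoid dividing by zero at the DC term
    spectrum = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    grey = np.real(np.fft.ifft2(spectrum / freq ** decay_power))
    k = int(round(lam * h * w))              # number of pixels set to one
    mask = np.zeros(h * w)
    mask[np.argsort(grey.ravel())[::-1][:k]] = 1.0
    return mask.reshape(h, w)

def fmix_like(images, labels, lam, rng):
    n, h, w, _ = images.shape
    mask = lowfreq_mask((h, w), lam, rng)[..., None]
    perm = rng.permutation(n)
    mixed_images = images * mask + images[perm] * (1 - mask)
    lam_eff = mask.mean()                    # fraction of pixels kept from `images`
    mixed_labels = lam_eff * labels + (1 - lam_eff) * labels[perm]
    return mixed_images, mixed_labels, mask

rng = np.random.default_rng(0)
lam = 0.3
images = rng.uniform(size=(4, 16, 16, 3))
labels = np.eye(5)[[0, 1, 2, 3]]
mixed_images, mixed_labels, mask = fmix_like(images, labels, lam, rng)
print("mask coverage:", mask.mean())
```

The point of the low-frequency noise is that the mask forms a few connected blobs instead of a rectangle (CutMix) or a global blend (MixUp).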

# Cassava Samples Generator

Here is a custom data generator. It is used standalone for training, validation, and inference. Image resizing and normalizing are done in the augmentation step, however. The thresholds that decide whether **CutMix**, **MixUp**, or **FMix** is applied to a batch are arbitrary. 

In [None]:
class CassavaGenerator(tf.keras.utils.Sequence):
    def __init__(self, img_path, data, batch_size, random_state, 
                 dim, shuffle=True, transform=None, 
                 use_mixup=False, use_cutmix=False,
                 use_fmix=False, is_train=False):
        self.dim  = dim
        self.data = data
        self.random_state = random_state
        self.shuffle  = shuffle
        self.img_path = img_path
        self.is_train = is_train
        self.augment  = transform
        self.use_cutmix = use_cutmix
        self.use_mixup  = use_mixup
        self.use_fmix   = use_fmix 
        self.batch_size = batch_size
        self.list_idx   = self.data.index.values
        self.label = pd.get_dummies(self.data['label'], 
                                    columns = ['label']) if self.is_train else np.nan
        self.on_epoch_end()
        
    def __len__(self):
        return int(np.floor(len(self.list_idx) / self.batch_size))
    
    def __getitem__(self, index):
        batch_idx = self.indices[index*self.batch_size:(index+1)*self.batch_size]
        idx = [self.list_idx[k] for k in batch_idx]
        
        Data   = np.empty((self.batch_size, *self.dim))
        Target = np.empty((self.batch_size, 5), dtype = np.float32)

        for i, k in enumerate(idx):
            # load the image file using cv2
            image = cv2.imread(self.img_path + self.data['image_id'][k])
            image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
            
            res = self.augment(image=image)
            image = res['image']
            
            # assign 
            if self.is_train:
                Data[i,] =  image
                Target[i,] = self.label.iloc[k,].values
            else:
                Data[i,] =  image 
                
        if np.random.rand() > 0.2 and self.use_cutmix:
            Data, Target = CutMix(Data, Target, self.dim[0])
        elif np.random.rand() > 0.2 and self.use_mixup:
            Data, Target = MixUp(Data, Target, self.dim[0]) 
        elif np.random.rand() > 0.1 and self.use_fmix:
            Data, Target = FMix(Data, Target, self.dim[0])

        # parenthesize the tuple: without it the conditional applies to Target only
        return (Data, Target) if self.is_train else Data
    
    def on_epoch_end(self):
        self.indices = np.arange(len(self.list_idx))
        if self.shuffle:
            np.random.seed(self.random_state)
            np.random.shuffle(self.indices)
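
The index bookkeeping of the generator is easier to follow without the image loading. Below is a framework-free sketch; `IndexBatcher` is a hypothetical class that only mimics `__len__`, `__getitem__` and `on_epoch_end`, and unlike the generator above it keeps one persistent random generator so the order changes between epochs (that is my assumption about what is usually wanted, not what the cell above does).

```python
import numpy as np

class IndexBatcher:
    """Hypothetical stand-in showing only how batches of indices are formed."""
    def __init__(self, n_items, batch_size, shuffle=True, seed=0):
        self.n_items = n_items
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.on_epoch_end()

    def __len__(self):
        # number of full batches; a trailing partial batch is dropped
        return self.n_items // self.batch_size

    def __getitem__(self, index):
        start = index * self.batch_size
        return self.indices[start:start + self.batch_size]

    def on_epoch_end(self):
        self.indices = np.arange(self.n_items)
        if self.shuffle:
            self.rng.shuffle(self.indices)

batcher = IndexBatcher(n_items=10, batch_size=4)
batches = [batcher[i] for i in range(len(batcher))]
seen = np.concatenate(batches)
print("batches per epoch:", len(batcher))
print("samples seen:", len(seen), "of", batcher.n_items)
```

With 10 items and a batch size of 4, two samples are skipped in each epoch. This is why the validation and test generators later use a batch size that divides the data length exactly.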

**Sanity Checks**

In [None]:
check_gens = CassavaGenerator(BaseConfig.TRAIN_IMG_PATH, 
                              df, 20, 1234, (128, 128, 3),
                              shuffle = True, is_train = True, 
                              use_mixup = False, use_cutmix = False, 
                              use_fmix = False,transform = albu_transforms_train(128))

print('Only Albumentation')
plot_imgs(check_gens, 2, 4)

**With CutMix** + `np.random.rand() > 0.2`

In [None]:
check_gens = CassavaGenerator(BaseConfig.TRAIN_IMG_PATH, 
                              df, 20,1234,(128, 128, 3),
                              shuffle = True, is_train = True, 
                              use_mixup = False, use_cutmix = True, 
                              use_fmix = False, transform = albu_transforms_train(128))

plot_imgs(check_gens, 2, 4)

**With MixUp** + `np.random.rand() > 0.2`

In [None]:
check_gens = CassavaGenerator(BaseConfig.TRAIN_IMG_PATH, 
                              df, 20, 1234, (128, 128, 3),
                              shuffle = True, is_train = True, 
                              use_mixup = True, use_cutmix = False,
                              use_fmix = False, transform = albu_transforms_train(128))

plot_imgs(check_gens, 2, 4)

**With FMix** + `np.random.rand() > 0.1`

In [None]:
check_gens = CassavaGenerator(BaseConfig.TRAIN_IMG_PATH, 
                              df, 20, 1234, (128, 128, 3),
                              shuffle = True, is_train = True, 
                              use_mixup = False, use_cutmix = False,
                              use_fmix = True, transform = albu_transforms_train(128))

plot_imgs(check_gens, 2, 4)

**Train Config**

A convenient way to experiment with a different setup for each fold. 

In [None]:
class TrainConfig(BaseConfig):
    NUM_CLASS    = 5
    EPOCH        = 25
    TTA          = 3 
    FOLDS        = 3 
    VERBOSITY    = 1
    WORKERS      = 2
    LABEL_SMOOTH = 0
    MULTIPROCESS = False  # Didn't work as expected 
    CLS_BLN_WGS  = False  # Didn't work as expected 

    # randomly applied
    MixUp  = False
    CutMix = False
    FMix   = False
    
    # Learning Rate for each fold
    LR_RATE = {
        '0' : 1e-3,  # learning rate in fold 0
        '1' : 1e-3,  # learning rate in fold 1
        '2' : 1e-3,  # learning rate in fold 2
        '3' : 1e-3,  # learning rate in fold 3
        '4' : 1e-3   # learning rate in fold 4
    }
    
    # Batch Size for each fold 
    BATCH_SIZE = {
        '0' : 64,  # batch size in fold 0
        '1' : 86,  # batch size in fold 1 
        '2' : 86,  # batch size in fold 2
        '3' : 86,  # batch size in fold 3
        '4' : 86   # batch size in fold 4
    }
    
    # Base Networks for each fold
    IMG_SIZE = {
        '0': 260, # size of the image in fold 0
        '1': 260, # size of the image in fold 1
        '2': 260, # size of the image in fold 2
        '3': 260, # size of the image in fold 3
        '4': 260  # size of the image in fold 4
    } 
    
    BASE_NETS = {
        '0' : [efn.EfficientNetB3, 
               '../input/efficientnet-keras-noisystudent-weights-b0b7/efficientnet-b3_noisy-student_notop.h5'],
        '1' : [efn.EfficientNetB1, 
               '../input/efficientnet-keras-noisystudent-weights-b0b7/efficientnet-b1_noisy-student_notop.h5'],
        '2' : [efn.EfficientNetB2, 
               '../input/efficientnet-keras-noisystudent-weights-b0b7/efficientnet-b2_noisy-student_notop.h5'],
        '3' : [efn.EfficientNetB0, 
               '../input/efficientnet-keras-noisystudent-weights-b0b7/efficientnet-b0_noisy-student_notop.h5'],
        '4' : [efn.EfficientNetB1, 
               '../input/efficientnet-keras-noisystudent-weights-b0b7/efficientnet-b1_noisy-student_notop.h5'],
    }

# Image Modeling


Let's build the whole model. Here we're using the subclassing API, but this can easily be implemented with the functional API too. In case you're new to the subclassing API and want to learn it, [here](https://www.linkedin.com/pulse/model-sub-classing-custom-training-loop-from-scratch-tensorflow/?trackingId=z7MZlI8kTTeKPINnkak5Ag%3D%3D) is our recent article where we comprehensively demonstrate model subclassing and a custom training loop from scratch in `tensorflow 2.x`. 

In [None]:
class CassavaClassifier(tf.keras.Model):
    def __init__(self, dim, base_efnet):
        super(CassavaClassifier, self).__init__()
        # Layer of Block
        self.Base  = base_efnet[0](input_shape=dim,include_top = False, 
                                   weights=base_efnet[1])
        # Keras Built-in
        self.GAP = tf.keras.layers.GlobalAveragePooling2D()
        # Tail
        self.DENS = tf.keras.layers.Dense(512, activation=tf.nn.relu)
        self.DROP = tf.keras.layers.Dropout(0.5)
        self.OUT  = tf.keras.layers.Dense(5, activation='softmax')
    
    def call(self, input_tensor, training=False):
        # Base Inputs
        x = self.Base(input_tensor, training=training)
        # Global Average Pooling
        x = self.GAP(x)
        x = self.DENS(x)
        # dropout is only active when training=True
        x = self.DROP(x, training=training)
        return self.OUT(x)
    
    # AFAIK: The most convenient method to print model.summary() 
    # in a subclassed model
    def build_graph(self):
        x = tf.keras.layers.Input(shape=(TrainConfig.IMG_SIZE['0'],
                                         TrainConfig.IMG_SIZE['0'], 3))
        return tf.keras.Model(inputs=[x], 
                              outputs=self.call(x))

# Stratified KFolding

In [None]:
skf = StratifiedKFold(n_splits=TrainConfig.FOLDS, shuffle=True, random_state=BaseConfig.SEED)
    
for fold, (trn_idx, val_idx) in enumerate(skf.split(np.arange(df.shape[0]), df.label.values)):
    df.loc[df.iloc[val_idx].index, 'fold'] = fold
    
print('Class distribution per fold.\n', df.groupby('fold')['label'].value_counts())
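
Since the classes are heavily imbalanced, it is worth checking what stratification buys us. Here is a small sketch on synthetic labels (the label counts are made up for illustration) that assigns a fold to every row in the same way as the cell above and then counts the classes per fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# synthetic, imbalanced labels: 10 / 20 / 70 samples for classes 0 / 1 / 2
labels = np.array([0] * 10 + [1] * 20 + [2] * 70)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=101)

# record, for every sample, the fold in which it is used for validation
fold_of = np.full(len(labels), -1)
for fold, (_, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
    fold_of[val_idx] = fold

per_fold_counts = [np.bincount(labels[fold_of == f], minlength=3) for f in range(5)]
for f, counts in enumerate(per_fold_counts):
    print(f"fold {f}: class counts = {counts}")
```

Every validation fold gets the same class proportions as the full dataset, so a rare class cannot end up missing from a fold by chance.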

# Training Mechanism

In [None]:
# some placeholder for oof 
oof_pred = []; oof_tar = []; oof_names = []; oof_folds = []; oof_val = [] 

# compute the exact batch size and step count, mainly for the validation step
def count_data_items(length, b_max):
    # largest divisor of `length` that does not exceed `b_max`
    batch_size = sorted([int(length/n) for n in range(1, length+1) \
                         if length % n == 0 and length/n <= b_max], reverse=True)[0]  
    steps  = length // batch_size  # exact, because batch_size divides length
    return batch_size, steps

# learning rate schedule
def get_lr_callback(batch_size=8):
    lr_start   = 0.000005
    lr_max     = 0.00000125 * REPLICAS * batch_size
    lr_min     = 0.000001
    lr_ramp_ep = 5
    lr_sus_ep  = 0
    lr_decay   = 0.8
   
    def lrfn(epoch):
        if epoch < lr_ramp_ep:
            lr = (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start
            
        elif epoch < lr_ramp_ep + lr_sus_ep:
            lr = lr_max
            
        else:
            lr = (lr_max - lr_min) * lr_decay**(epoch - lr_ramp_ep - lr_sus_ep) + lr_min
  
        return lr

    return tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=False)
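
To get a feel for the ramp-up and decay schedule defined in `get_lr_callback`, the sketch below re-implements only the `lrfn` arithmetic in plain Python with the same constants, assuming `REPLICAS = 1` and the default `batch_size=8`. `make_lrfn` is a hypothetical wrapper used here just to print the values.

```python
def make_lrfn(batch_size=8, replicas=1, lr_start=0.000005, lr_min=0.000001,
              lr_ramp_ep=5, lr_sus_ep=0, lr_decay=0.8):
    lr_max = 0.00000125 * replicas * batch_size

    def lrfn(epoch):
        if epoch < lr_ramp_ep:
            # linear warm-up from lr_start towards lr_max
            return (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start
        if epoch < lr_ramp_ep + lr_sus_ep:
            # optional plateau at lr_max
            return lr_max
        # exponential decay towards lr_min
        return (lr_max - lr_min) * lr_decay ** (epoch - lr_ramp_ep - lr_sus_ep) + lr_min

    return lrfn

lrfn = make_lrfn(batch_size=8, replicas=1)
schedule = [lrfn(epoch) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.2e}")
```

The rate climbs linearly for five epochs, peaks at `lr_max`, and then decays by a factor of 0.8 per epoch towards `lr_min`. Note that the peak scales with the batch size, so larger batches get a proportionally larger learning rate.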

In [None]:
# Symmetric Cross Entropy loss function for dealing with 
# noisy labels
class SymmetricCrossEntropy(tf.losses.Loss):
    def __init__(self, alpha=0.1, beta=1.0):
        '''
        Paper: https://arxiv.org/abs/1908.06112
        '''
        super(SymmetricCrossEntropy, self).__init__()
        self.alpha = alpha
        self.beta = beta

    def call(self, y_true, y_pred):
        ce_loss = tf.reduce_mean(-tf.reduce_sum(y_true * \
                    tf.math.log(tf.clip_by_value(y_pred, 1e-7, 1.0)), 
                    axis = -1))

        rce_loss = tf.reduce_mean(-tf.reduce_sum(y_pred * \
                   tf.math.log(tf.clip_by_value(y_true, 1e-4, 1.0)), 
                   axis = -1))

        return self.alpha*ce_loss + self.beta*rce_loss
    
# create an instance
SCELoss = SymmetricCrossEntropy()
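
Here is the same loss worked through on one sample in NumPy, using the same clipping constants and default `alpha` / `beta` as the class above. `symmetric_cross_entropy` is a hypothetical helper written only for this check.

```python
import numpy as np

def symmetric_cross_entropy(y_true, y_pred, alpha=0.1, beta=1.0):
    # standard cross entropy: labels weight the log of the predictions
    ce = -np.sum(y_true * np.log(np.clip(y_pred, 1e-7, 1.0)), axis=-1).mean()
    # reverse cross entropy: predictions weight the log of the (clipped) labels
    rce = -np.sum(y_pred * np.log(np.clip(y_true, 1e-4, 1.0)), axis=-1).mean()
    return alpha * ce + beta * rce, ce, rce

y_true = np.array([[0.0, 1.0, 0.0]])   # one-hot label
y_pred = np.array([[0.1, 0.8, 0.1]])   # model probabilities

total, ce, rce = symmetric_cross_entropy(y_true, y_pred)
print(f"ce = {ce:.4f}, rce = {rce:.4f}, total = {total:.4f}")
```

The reverse term only penalizes probability mass placed on classes whose label is zero, at a fixed cost of `-log(1e-4)` per unit of mass. That bounded penalty is what makes the loss less sensitive to wrong labels than plain cross entropy.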

In [None]:
'''
The following class (CassavaTrainer) contains 

- __init__: initiate basic parameter and compile the model and 
            print out basic training information 
  
- fold_generator: based on train_index and valid_index, it 
                  will create training fold and validation fold set

- callbacks: Contains necessary callbacks.

- fold_training: Actual training. Also OOF with TTA over the validation set. 
'''
class CassavaTrainer:
    def __init__(self, fold, trn_idx, val_idx):
        tf.keras.backend.clear_session()
        gc.collect()
        
        self.fold = fold
        self.trn_idx = trn_idx
        self.val_idx = val_idx
        
        # building the complete model and compile
        self.model = CassavaClassifier((TrainConfig.IMG_SIZE[str(self.fold)], 
                                        TrainConfig.IMG_SIZE[str(self.fold)], 3),
                                       TrainConfig.BASE_NETS[str(self.fold)]) 
        self.model.compile(
            loss      = tf.keras.losses.CategoricalCrossentropy(label_smoothing=TrainConfig.LABEL_SMOOTH),
            metrics   = tf.keras.metrics.CategoricalAccuracy(),
            optimizer = tf.keras.optimizers.Adam(learning_rate=TrainConfig.LR_RATE[str(self.fold)]))
    
        # print out the model params
        trainable_count = np.sum([K.count_params(w) \
                                  for w in self.model.trainable_weights]) 
        non_trainable_count = np.sum([K.count_params(w) \
                                      for w in self.model.non_trainable_weights])
        
        print('[INFO]: OOF Fold No. {}'.format(self.fold))
        print('[INFO]: Model Build Successfully.')
        print('[INFO]: Model {} - Image Size {} - Batch Size {}'.\
              format(TrainConfig.BASE_NETS[str(self.fold)][0].__name__, 
              TrainConfig.IMG_SIZE[str(self.fold)], 
                     TrainConfig.BATCH_SIZE[str(self.fold)]))
        print('[INFO]: Initial Learning Rate: {} - Class Balancing: {}'.\
              format(TrainConfig.LR_RATE[str(self.fold)],
                     TrainConfig.CLS_BLN_WGS))
        print('Total params: {:,}'.format(trainable_count + non_trainable_count))
        print('Trainable params: {:,}'.format(trainable_count))
        print('Non-trainable params: {:,}'.format(non_trainable_count))
        
    
    def fold_generator(self):
        # for way one - data generator
        train_labels = df.iloc[self.trn_idx].reset_index(drop=True) 
        val_labels   = df.iloc[self.val_idx].reset_index(drop=True) 

        # training generator  
        train_generator = CassavaGenerator(BaseConfig.TRAIN_IMG_PATH, 
                                           train_labels, TrainConfig.BATCH_SIZE[str(self.fold)],
                                           BaseConfig.SEED,
                                           (TrainConfig.IMG_SIZE[str(self.fold)], 
                                            TrainConfig.IMG_SIZE[str(self.fold)], 3),
                                           shuffle = True, is_train = True, 
                                           use_mixup = TrainConfig.MixUp, 
                                           use_cutmix = TrainConfig.CutMix,
                                           use_fmix = TrainConfig.FMix,
                                           transform = albu_transforms_train(TrainConfig.IMG_SIZE[str(self.fold)]))
        
        # validation generator: no shuffle, no augmentation
        valid_batch, valid_step = count_data_items(len(val_labels), 
                                                   TrainConfig.BATCH_SIZE[str(self.fold)])
        val_generator = CassavaGenerator(BaseConfig.TRAIN_IMG_PATH, 
                                         val_labels, valid_batch, 
                                         BaseConfig.SEED,
                                         (TrainConfig.IMG_SIZE[str(self.fold)], 
                                          TrainConfig.IMG_SIZE[str(self.fold)], 3), 
                                         shuffle = False, is_train = True, 
                                         transform = albu_transforms_valid(TrainConfig.IMG_SIZE[str(self.fold)]))
        
        return train_generator, val_generator, train_labels, val_labels, valid_step, valid_batch

    def callbacks(self):
        # model check point, save best weight based on val loss
        checkpoint = tf.keras.callbacks.ModelCheckpoint('./fold_{}.h5'.format(self.fold),
                                                        monitor='val_loss',
                                                        verbose= 0,save_best_only=True,
                                                        mode= 'min',save_weights_only=True)
        # save model history information in csv file 
        csv_logger = tf.keras.callbacks.CSVLogger('./history_{}.csv'.format(self.fold))
        
        # reduce learning rate based on val loss
        # if we use LearningRateScheduler, we won't use reduceLROnPlat
        reduceLROnPlat = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                              factor=0.3, patience=2,
                                                              verbose=0, mode='auto',
                                                              min_delta=0.0001, cooldown=1, 
                                                              min_lr=0.00001)
        return [checkpoint,
                csv_logger,
                reduceLROnPlat]
    

    def fold_training(self):
        # call each fold set
        train_generator, val_generator, train_labels, \
        val_labels, valid_step, valid_batch = self.fold_generator()

        # invoke callbacks functions
        steps_per_epoch  = np.ceil(float(len(train_labels)) \
                                   / float(TrainConfig.BATCH_SIZE[str(self.fold)])) 
        validation_steps = valid_step 
        
        # try balancing (keyword arguments are required by recent scikit-learn releases)
        classes  = np.array(sorted(train_labels.label.unique()))
        cls_wgts = class_weight.compute_class_weight(class_weight="balanced",
                                                     classes=classes, 
                                                     y=train_labels.label.values)
        cls_wgts = {int(label) : cls_wgts[i] for i, label in enumerate(classes)}
        
        # fit generator
        history = self.model.fit(
            train_generator,
            steps_per_epoch  = steps_per_epoch,
            validation_data  = val_generator,
            validation_steps = validation_steps,
            epochs = TrainConfig.EPOCH, 
            class_weight = None if not TrainConfig.CLS_BLN_WGS else cls_wgts,
            verbose = TrainConfig.VERBOSITY, callbacks = self.callbacks(), 
            workers = TrainConfig.WORKERS, use_multiprocessing = TrainConfig.MULTIPROCESS
        )
    
        print('[INFO]: Loading Best Model...')
        self.model.build((None, *(TrainConfig.IMG_SIZE[str(self.fold)],
                             TrainConfig.IMG_SIZE[str(self.fold)], 3)))
        self.model.load_weights('./fold_{}.h5'.format(self.fold))
        
        # appending / saving 
        oof_names.append(val_labels.image_id.tolist())
        oof_tar.append(val_labels.label.tolist()) 
        oof_folds.append(np.ones_like(oof_tar[-1], dtype='int8')*self.fold ) 
        oof_val.append(np.max(history.history['val_categorical_accuracy']))
        
        print('[INFO]: Predicting TTA over OOF...')
        tta_preds = []
        for _ in range(TrainConfig.TTA):
            # using the tta augmentation on validation set in prediction time 
            tta_val_generator = CassavaGenerator(BaseConfig.TRAIN_IMG_PATH, 
                                                 val_labels, valid_batch,
                                                 BaseConfig.SEED,
                                                 (TrainConfig.IMG_SIZE[str(self.fold)], 
                                                  TrainConfig.IMG_SIZE[str(self.fold)], 3), 
                                                 shuffle = False, is_train = True, 
                                                 transform = albu_transforms_inference(TrainConfig.IMG_SIZE[str(self.fold)]))
            # predict and take mean
            pred = self.model.predict(tta_val_generator, verbose=0)
            tta_preds.append(pred/TrainConfig.TTA)
               
        oof_pred.append(np.argmax(np.sum(tta_preds, axis=0), axis=1))
        print('[INFO]: OOF Acc without TTA: {} - with TTA: {}'.\
              format(oof_val[-1], accuracy_score(oof_tar[-1], oof_pred[-1])))
        print('[INFO]: OOF Balance Acc with TTA: ', balanced_accuracy_score(oof_tar[-1], oof_pred[-1]))
        print('\n'*2)
        
        del self.model, train_generator, val_generator,tta_val_generator, train_labels, val_labels 

In [None]:
# calling method to run on all folds
skf = StratifiedKFold(n_splits=TrainConfig.FOLDS, shuffle=True, random_state=BaseConfig.SEED)
for each_fold, (trn_idx, val_idx) in enumerate(skf.split(np.arange(df.shape[0]), df.label.values)):
    tx = CassavaTrainer(each_fold, trn_idx, val_idx)
    tx.fold_training()
    break  # only the first fold is trained here; remove this line to train all folds
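
The TTA step inside `fold_training` boils down to averaging the class probabilities of several augmented passes and taking the arg-max. Here is a minimal NumPy sketch of that arithmetic; `fake_predict` is a hypothetical stand-in for `model.predict` on a freshly augmented generator, and the data is synthetic.

```python
import numpy as np

def fake_predict(clean_logits, rng):
    """Hypothetical model call: softmax of logits perturbed by 'augmentation' noise."""
    logits = clean_logits + rng.normal(scale=0.5, size=clean_logits.shape)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
clean_logits = np.eye(5)[[0, 1, 2, 3, 4, 0]] * 3.0   # 6 samples, 5 classes
n_tta = 3

# one prediction matrix per augmented pass, averaged over the passes
tta_probs = np.mean([fake_predict(clean_logits, rng) for _ in range(n_tta)], axis=0)
tta_labels = tta_probs.argmax(axis=1)

print(tta_probs.round(3))
print(tta_labels)
```

This only works if every pass returns the samples in the same order, which is why the TTA generators are created with `shuffle = False`.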

## Calculating OOF Accuracy

Here the overall **OOF** score is calculated. We need it when ensembling different models that were trained with the **same configuration**. Basically, the **OOF** predictions are used to determine the best weights for blending these models. To do that, please check these notebooks:

- [Forward Selection OOF Ensemble](https://www.kaggle.com/cdeotte/forward-selection-oof-ensemble-0-942-private)
- [Optimizing Metrics: Out-of-Fold Weights Ensemble](https://www.kaggle.com/ipythonx/optimizing-metrics-out-of-fold-weights-ensemble)
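
As a rough idea of what "finding the blend weights from OOF" means, here is a minimal sketch for two models. It is not taken from the linked notebooks: `best_blend_weight` is a hypothetical helper that grid-searches a single weight, and the OOF scores are synthetic.

```python
import numpy as np

def best_blend_weight(oof_a, oof_b, y_true, grid=np.linspace(0.0, 1.0, 21)):
    """Pick the weight w that maximizes accuracy of w * oof_a + (1 - w) * oof_b."""
    scores = [np.mean(np.argmax(w * oof_a + (1 - w) * oof_b, axis=1) == y_true)
              for w in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=200)
onehot = np.eye(5)[y_true]

# synthetic OOF scores of two imperfect models (true signal plus noise)
oof_a = onehot + rng.normal(scale=0.8, size=onehot.shape)
oof_b = onehot + rng.normal(scale=0.8, size=onehot.shape)

acc_a = np.mean(oof_a.argmax(axis=1) == y_true)
acc_b = np.mean(oof_b.argmax(axis=1) == y_true)
w, acc_blend = best_blend_weight(oof_a, oof_b, y_true)
print(f"model A: {acc_a:.3f}, model B: {acc_b:.3f}, blend (w={w:.2f}): {acc_blend:.3f}")
```

Because the grid includes `w = 0` and `w = 1`, the blend can never score worse on OOF than the better single model. Whether the gain carries over to the test set is a separate question, which is why OOF predictions rather than training predictions are used for the search.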

In [None]:
# COMPUTE OVERALL OOF ACCURACY
oof   = np.concatenate(oof_pred)
true  = np.concatenate(oof_tar)
names = np.concatenate(oof_names)
folds = np.concatenate(oof_folds)

# SAVE OOF TO DISK
df_oof = pd.DataFrame(dict(
    image_id = names, 
    target   = true, 
    pred     = oof, 
    fold     = folds
))

print('Overall OOF Acc: ', accuracy_score(true, oof))
print('Overall OOF Balance Acc: ', balanced_accuracy_score(true, oof))

df_oof.to_csv('oof.csv', index=False)
df_oof.head(10)

# Test Gens and Inference

Here, **TTA** is applied to each trained model, followed by a simple ensemble. The augmentations applied are listed in the **Augmentation** section. They are just a random selection, though; experiments are needed to find the best-performing ones. 

```
- Test Time Augmentation (TTA)
- Ensembling
```

In [None]:
df_test = pd.DataFrame({
    'image_id': os.listdir(BaseConfig.TEST_IMG_PATH),
})
df_test.head()

In [None]:
model_check_points = sorted(glob('./*.h5'))
print(f'Found {len(model_check_points)} Weights Files.')

# container for predictive probabilities
target = []

for i, each_check_points in enumerate(model_check_points):
    # define and load weights
    model = CassavaClassifier((TrainConfig.IMG_SIZE[str(i)], 
                               TrainConfig.IMG_SIZE[str(i)], 3), 
                              TrainConfig.BASE_NETS[str(i)])
    model.build((None, *(TrainConfig.IMG_SIZE[str(i)],
                         TrainConfig.IMG_SIZE[str(i)], 3)))
    model.load_weights(each_check_points)
    
    tta_preds = []
    for _ in range(TrainConfig.TTA):
        test_batch, test_step = count_data_items(len(df_test), 
                                                 TrainConfig.BATCH_SIZE[str(i)])
        test_generator = CassavaGenerator(BaseConfig.TEST_IMG_PATH, df_test,
                                          test_batch, BaseConfig.SEED,
                                          (TrainConfig.IMG_SIZE[str(i)], 
                                           TrainConfig.IMG_SIZE[str(i)], 3), 
                                          shuffle = False, is_train = False,
                                          transform = albu_transforms_inference(TrainConfig.IMG_SIZE[str(i)]))
        # predict and take mean
        pred = model.predict(test_generator, verbose=1)
        tta_preds.append(pred/TrainConfig.TTA)
    
    tta_preds = np.sum(tta_preds, axis=0)
    target.append(tta_preds/len(model_check_points))

    del model, test_generator
    gc.collect()

In [None]:
df_test.loc[:, 'label'] = np.argmax(np.sum(target, axis=0), axis=1)
df_test.to_csv("submission.csv", index=False)
df_test.head()