# Introduction

To start with a fact:

**Petals to the Metal** is actually one of the books in the series The Adventure Zone.
To find out more about it, please check this [link](https://www.amazon.com/Adventure-Zone-Petals-Metal/dp/1250232635).
Keeping that aside, let's get to classifying some flowers.




<center><img src="https://www.fiftyflowers.com/blog/wp-content/uploads/iStock-659171982-1170x449.jpg"></center>

<font color="red" size=3>Please upvote this kernel if you like it. It motivates me to produce more quality content :)</font>

This is a **Getting Started** competition, and its main aim is to teach how to use TPUs when there is a large amount of data.
There is around 4.8 GB of data, consisting of TFRecords for train, validation, and test, plus a sample_submission.csv.
Our task is to build a classifier that distinguishes 104 different types of flowers.

So without any more delay, let's get into it.

# Update Log

### V7
* Adding DenseNet201 to the notebook 
* Visualizing the results

### V9
* Improving notebook aesthetics
* Added visualizations for loss

### V12
* Hiding unnecessary code and making minor code changes

<a class="anchor" id="toc"></a>
<div style="background: #f9f9f9 none repeat scroll 0 0;border: 1px solid #aaa;display: table;font-size: 95%;margin-bottom: 1em;padding: 20px;width: 600px;">
<h1>Contents</h1>
<ul style="font-weight: 700;text-align: left;list-style: outside none none !important;">
<li style="list-style: outside none none !important;font-size:17px"><a href="#1.1">1 Data Preparation</a></li>
      <ul style="font-weight: 700;text-align: left;list-style: outside none none !important;">
            <li style="list-style: outside none none !important;"><a href="#1.1">1.1 Importing Dependencies</a></li>
            <li style="list-style: outside none none !important;"><a href="#1.2">1.2 Setting the parameters</a></li>
            <li style="list-style: outside none none !important;"><a href="#1.3">1.3 Helper Functions</a></li>
            <li style="list-style: outside none none !important;"><a href="#1.4">1.4 Visualization Functions</a></li>
            <li style="list-style: outside none none !important;"><a href="#1.5">1.5 Augmentation Functions</a></li>
      </ul>
<li style="list-style: outside none none !important;font-size:17px"><a href="#2.1">2 Visualizations</a></li>
      <ul style="font-weight: 700;text-align: left;list-style: outside none none !important;">
            <li style="list-style: outside none none !important;"><a href="#2.1">2.1 Training Images</a></li>
            <li style="list-style: outside none none !important;"><a href="#2.2">2.2 Validation Images</a></li>
            <li style="list-style: outside none none !important;"><a href="#2.3">2.3 Test Images</a></li>
            <li style="list-style: outside none none !important;"><a href="#2.4">2.4 Augmentations</a></li>
      </ul>
    
<li style="list-style: outside none none !important;font-size:17px"><a href="#3">3 Modelling</a></li>
      <ul style="font-weight: 700;text-align: left;list-style: outside none none !important;">
            <li style="list-style: outside none none !important;"><a href="#3.1">3.1 Warm-up Layers</a></li>
            <li style="list-style: outside none none !important;"><a href="#3.2">3.2 Fundamental DenseNet Block</a></li>
            <li style="list-style: outside none none !important;"><a href="#3.3">3.3 Model Architecture</a></li>
            <li style="list-style: outside none none !important;"><a href="#3.4">3.4 Fine Tuning Layers</a></li>
            <li style="list-style: outside none none !important;"><a href="#3.5">3.5 Visualize Results</a></li>
      </ul>    
<li style="list-style: outside none none !important;font-size:17px"><a href="#4">4 Acknowledgements</a></li>
</ul>
</div>

# Data Preparation

## Importing Dependencies 
<a class="anchor" id="1.1"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >        Back to the table of contents</a>

In [None]:
!pip install -q efficientnet
import efficientnet.tfkeras as efn
from tensorflow.keras.applications import DenseNet201

import math, os, re, warnings
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from IPython.display import SVG

from kaggle_datasets import KaggleDatasets
from sklearn.utils import class_weight
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras import optimizers, applications, Sequential, layers
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, LearningRateScheduler
import tensorflow as tf, tensorflow.keras.backend as K
from tensorflow.keras.models import Model


import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

def seed_everything(seed):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'

seed = 42
seed_everything(seed)
warnings.filterwarnings("ignore")

The code below detects the available hardware and checks whether the TPU is working. Its output tells us which distribution strategy is in use.

* If the output is **8 replicas**, the TPU is switched on and working fine.
* If the output is **1 replica**, the TPU is not switched on; you can switch it on through **Settings->Accelerator->TPU v3-8**.


In [None]:
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy() # default distribution strategy in Tensorflow. Works on CPU and single GPU.

print("REPLICAS: ", strategy.num_replicas_in_sync)


## Setting the parameters <a class="anchor" id="1.2"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
EPOCHS = 20
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
WARMUP_LEARNING_RATE = 1e-4 * strategy.num_replicas_in_sync
WARMUP_EPOCHS = 3
HEIGHT = 512
WIDTH = 512
IMAGE_SIZE = [224, 224]
CHANNELS = 3
N_CLASSES = 104
ES_PATIENCE = 6
RLROP_PATIENCE = 3
DECAY_DROP = 0.3

model_path = 'DenseNet201_%sx%s.h5' % (HEIGHT, WIDTH)

GCS_PATH = KaggleDatasets().get_gcs_path() + '/tfrecords-jpeg-%sx%s' % (HEIGHT, WIDTH)

TRAINING_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/train/*.tfrec')
VALIDATION_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/val/*.tfrec')
TEST_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/test/*.tfrec')
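
Because `BATCH_SIZE` scales with the number of replicas, the number of training steps per epoch shrinks as replicas are added. The following is a minimal sketch of that arithmetic. It assumes a made-up training set of 12,000 images (the real count is read from the TFRecord file names later in the notebook), and `batch_plan` is a hypothetical helper that is not part of the notebook.

```python
def batch_plan(num_replicas, num_images, per_replica_batch=16):
    """Return (global batch size, full steps per epoch) for a replica count."""
    global_batch = per_replica_batch * num_replicas
    # Floor division, in the same way the notebook later computes STEPS_PER_EPOCH.
    steps_per_epoch = num_images // global_batch
    return global_batch, steps_per_epoch


# Hypothetical image count, used only for illustration.
for replicas in (1, 8):
    print(replicas, batch_plan(replicas, num_images=12_000))
```

With the same data, going from 1 to 8 replicas makes each step process eight times as many images, which is also why the learning rates in the parameters are multiplied by `strategy.num_replicas_in_sync`.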

The 104 flower classes are defined below. Most people could not tell all of them apart, but a machine can; that is how far the field of computer vision has advanced.

In [None]:
CLASSES = [
    'pink primrose', 'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea', 
    'wild geranium', 'tiger lily', 'moon orchid', 'bird of paradise', 'monkshood', 
    'globe thistle', 'snapdragon', "colt's foot", 'king protea', 'spear thistle', 
    'yellow iris', 'globe-flower', 'purple coneflower', 'peruvian lily', 
    'balloon flower', 'giant white arum lily', 'fire lily', 'pincushion flower', 
    'fritillary', 'red ginger', 'grape hyacinth', 'corn poppy', 
    'prince of wales feathers', 'stemless gentian', 'artichoke', 'sweet william', 
    'carnation', 'garden phlox', 'love in the mist', 'cosmos',  'alpine sea holly', 
    'ruby-lipped cattleya', 'cape flower', 'great masterwort',  'siam tulip', 
    'lenten rose', 'barberton daisy', 'daffodil',  'sword lily', 'poinsettia', 
    'bolero deep blue',  'wallflower', 'marigold', 'buttercup', 'daisy', 
    'common dandelion', 'petunia', 'wild pansy', 'primula',  'sunflower', 
    'lilac hibiscus', 'bishop of llandaff', 'gaura',  'geranium', 'orange dahlia', 
    'pink-yellow dahlia', 'cautleya spicata',  'japanese anemone', 'black-eyed susan', 
    'silverbush', 'californian poppy',  'osteospermum', 'spring crocus', 'iris', 
    'windflower',  'tree poppy', 'gazania', 'azalea', 'water lily',  'rose', 
    'thorn apple', 'morning glory', 'passion flower',  'lotus', 'toad lily', 
    'anthurium', 'frangipani',  'clematis', 'hibiscus', 'columbine', 'desert-rose', 
    'tree mallow', 'magnolia', 'cyclamen', 'watercress',  'canna lily', 
    'hippeastrum', 'bee balm', 'pink quill',  'foxglove', 'bougainvillea', 
    'camellia', 'mallow',  'mexican petunia',  'bromelia', 'blanket flower', 
    'trumpet creeper',  'blackberry lily', 'common tulip', 'wild rose']

Below is the learning rate schedule that will be used to train the model.
Note that we do not start with a high learning rate: we are fine-tuning a pretrained model, and a high learning rate at the beginning could destroy the pretrained weights.
To learn more about learning rate warm-up, please refer to [this answer](https://stackoverflow.com/questions/55933867/what-does-learning-rate-warm-up-mean).

In [None]:
learning_rate = 3e-5 * strategy.num_replicas_in_sync
lr_start = 0.00000001
lr_min = 0.000001
lr_max = 3e-5 * strategy.num_replicas_in_sync
lr_rampup_epochs = 3
lr_sustain_epochs = 0
lr_exp_decay = .8

def lrfn(epoch):
    if epoch < lr_rampup_epochs:
        lr = (lr_max - lr_start) / lr_rampup_epochs * epoch + lr_start
    elif epoch < lr_rampup_epochs + lr_sustain_epochs:
        lr = lr_max
    else:
        lr = (lr_max - lr_min) * lr_exp_decay**(epoch - lr_rampup_epochs - lr_sustain_epochs) + lr_min
    return lr
    
    
lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose = True)

rng = [i for i in range(21 if EPOCHS<21 else EPOCHS)]
y = [lrfn(x) for x in rng]


print("Learning rate schedule: {:.3g} to {:.3g} to {:.3g}".format(y[0], max(y), y[-1]))
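
To make the three phases of the schedule concrete, here is a self-contained sketch that mirrors `lrfn` without TensorFlow. It assumes 8 replicas, so `lr_max` is 3e-5 * 8 = 2.4e-4; with a different replica count the peak changes accordingly. The function name `lr_schedule` is introduced only for this illustration.

```python
def lr_schedule(epoch, lr_start=1e-8, lr_max=2.4e-4, lr_min=1e-6,
                rampup_epochs=3, sustain_epochs=0, exp_decay=0.8):
    """Ramp up linearly, optionally hold, then decay exponentially toward lr_min."""
    if epoch < rampup_epochs:
        # Linear ramp from lr_start towards lr_max.
        return (lr_max - lr_start) / rampup_epochs * epoch + lr_start
    if epoch < rampup_epochs + sustain_epochs:
        # Optional plateau at the peak (skipped when sustain_epochs is 0).
        return lr_max
    # Exponential decay of the gap between lr_max and lr_min.
    decay_steps = epoch - rampup_epochs - sustain_epochs
    return (lr_max - lr_min) * exp_decay ** decay_steps + lr_min


# lr_max = 2.4e-4 assumes 8 replicas (an assumption for this example).
for epoch in (0, 1, 3, 4, 19):
    print(f"epoch {epoch:2d}: lr = {lr_schedule(epoch):.3g}")
```

Under these assumptions the rate starts at `lr_start`, reaches its peak at epoch 3 (the first epoch after the ramp), and then shrinks by a factor of 0.8 per epoch towards `lr_min`.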

Visualization makes things easier to understand, so to see how the learning rate changes over the epochs, please hover over the plot.

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=rng, y=y,
                        mode='lines+markers',
                        line=dict(color='royalblue', width=4)))
fig.update_layout(
    title='Learning Rate Schedule',
    title_x=0.5,
    xaxis_title="Range of epochs",
    yaxis_title="Learning rate in 10^-6",
    paper_bgcolor='rgb(252, 252, 255)',
    plot_bgcolor='rgb(248, 248, 255)',
    font=dict(
        size=18,
        color="red"
    )
)
fig.show()

## Helper Functions <a class="anchor" id="1.3"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
# Datasets utility functions
AUTO = tf.data.experimental.AUTOTUNE # lets tf.data choose the level of parallelism (parallel reads, map calls, prefetch buffer) at runtime


def decode_image(image_data):
    '''
    Decodes a JPEG-encoded byte string into a float32 image
    tensor, scales the pixel values to the [0, 1] range, and
    sets the shape to [HEIGHT, WIDTH, 3].
    '''
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.reshape(image, [HEIGHT, WIDTH, 3])
    return image

def read_labeled_tfrecord(example):
    '''
    This method is used to read an example of train tfrecord 
    or validation tfrecord and the output given is an image
    with its label.
    '''
    LABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "class": tf.io.FixedLenFeature([], tf.int64),  # shape [] means single element
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = tf.cast(example['class'], tf.int32)
    return image, label

def read_unlabeled_tfrecord(example):
    '''
    Nearly same as `read_labeled_tfrecord`, but in this
    the test tfrecord is read and we do not have test labels
    so the output is image and id_number.
    '''
    UNLABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "id": tf.io.FixedLenFeature([], tf.string),  # shape [] means single element
        # class is missing, this competition's challenge is to predict flower classes for the test dataset
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    idnum = example['id']
    return image, idnum # returns an (image, id) pair

def load_dataset(filenames, labeled=True, ordered=False):
    '''
    Used to load the train, validation and test dataset
    '''
    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False 
        # disable order, increase speed

    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO) 
    # automatically interleaves reads from multiple files
    dataset = dataset.with_options(ignore_order) 
    # uses data as soon as it streams in, rather than in its original order
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord, num_parallel_calls=AUTO)
    # returns a dataset of (image, label) pairs if labeled=True or (image, id) pairs if labeled=False
    return dataset

def data_augment(image, label):
    '''
    Performing different types of augmentations on the data...
    '''
    image = tf.image.random_flip_left_right(image, seed=seed)
    image = tf.image.random_flip_up_down(image, seed=seed)
    image = tf.image.random_saturation(image, lower=0, upper=2, seed=seed)
    image = tf.image.random_crop(image, size=[int(HEIGHT*.8), int(WIDTH*.8), CHANNELS], seed=seed)

    return image, label

def get_training_dataset(do_aug=True):
    dataset = load_dataset(TRAINING_FILENAMES, labeled=True)
    dataset = dataset.map(data_augment, num_parallel_calls=AUTO)
    if do_aug: 
        dataset = dataset.map(transform, num_parallel_calls=AUTO)
    dataset = dataset.repeat() # the training dataset must repeat for several epochs
    dataset = dataset.shuffle(2048)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO) 
    # prefetch next batch while training (autotune prefetch buffer size)
    return dataset

def get_training_dataset_preview(ordered=True):
    dataset = load_dataset(TRAINING_FILENAMES, labeled=True, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()
    dataset = dataset.prefetch(AUTO)
    return dataset

def get_validation_dataset(ordered=False):
    dataset = load_dataset(VALIDATION_FILENAMES, labeled=True, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()
    dataset = dataset.prefetch(AUTO)
    return dataset

def get_test_dataset(ordered=False):
    dataset = load_dataset(TEST_FILENAMES, labeled=False, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset

def count_data_items(filenames):
    # the number of data items is written in the name of the .tfrec files, i.e. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) for filename in filenames]
    return np.sum(n)
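
As a quick illustration of how `count_data_items` reads the item count out of a file name, here is a small standalone sketch. The file names are made up and only follow the `<name>-<count>.tfrec` convention mentioned in the comment above.

```python
import re

# Hypothetical file names that follow the "<name>-<count>.tfrec" convention.
filenames = ["flowers00-230.tfrec", "flowers01-230.tfrec", "flowers02-112.tfrec"]

# Capture the digits between the last "-" and the "." of the extension.
pattern = re.compile(r"-([0-9]*)\.")
counts = [int(pattern.search(name).group(1)) for name in filenames]
total = sum(counts)

print(counts, total)
```

This avoids scanning every TFRecord just to count examples, at the cost of trusting that the file names are accurate.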

## Visualization functions <a class="anchor" id="1.4"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:

np.set_printoptions(threshold=15, linewidth=80)

def batch_to_numpy_images_and_labels(data):
    images, labels = data
    numpy_images = images.numpy()
    numpy_labels = labels.numpy()
    if numpy_labels.dtype == object: # binary string in this case, these are image ID strings
        numpy_labels = [None for _ in enumerate(numpy_images)]
    # If no labels, only image IDs, return None for labels (this is the case for test data)
    return numpy_images, numpy_labels

def disp_images(databatch):
    row = 2; col = 4;
    FIGSIZE = 16.0
    subplot=(row,col,1)
    plt.figure(figsize=(FIGSIZE,FIGSIZE/col*row))
    images, _ = batch_to_numpy_images_and_labels(databatch)
    for j in range(row*col):
        plt.subplot(row,col,j+1)
        plt.axis('off')
        plt.imshow(images[j,])
    plt.show()

## Augmentation Functions <a class="anchor" id="1.5"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>



I will also be adding CutMix and MixUp in future versions if this notebook receives a good response.

In [None]:
def get_mat(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    # returns a 3x3 transform matrix which transforms indices
        
    # CONVERT DEGREES TO RADIANS
    rotation = math.pi * rotation / 180.
    shear = math.pi * shear / 180.
    
    # ROTATION MATRIX
    c1 = tf.math.cos(rotation)
    s1 = tf.math.sin(rotation)
    one = tf.constant([1],dtype='float32')
    zero = tf.constant([0],dtype='float32')
    rotation_matrix = tf.reshape( tf.concat([c1,s1,zero, -s1,c1,zero, zero,zero,one],axis=0),[3,3] )
        
    # SHEAR MATRIX
    c2 = tf.math.cos(shear)
    s2 = tf.math.sin(shear)
    shear_matrix = tf.reshape( tf.concat([one,s2,zero, zero,c2,zero, zero,zero,one],axis=0),[3,3] )    
    
    # ZOOM MATRIX
    zoom_matrix = tf.reshape( tf.concat([one/height_zoom,zero,zero, zero,one/width_zoom,zero, zero,zero,one],axis=0),[3,3] )
    
    # SHIFT MATRIX
    shift_matrix = tf.reshape( tf.concat([one,zero,height_shift, zero,one,width_shift, zero,zero,one],axis=0),[3,3] )
    
    return K.dot(K.dot(rotation_matrix, shear_matrix), K.dot(zoom_matrix, shift_matrix))


def transform(image,label):
    # input image - is one image of size [dim,dim,3] not a batch of [b,dim,dim,3]
    # output - image randomly rotated, sheared, zoomed, and shifted
    DIM = IMAGE_SIZE[0]
    XDIM = DIM%2 #fix for size 331
    
    rot = 15. * tf.random.normal([1],dtype='float32')
    shr = 5. * tf.random.normal([1],dtype='float32') 
    h_zoom = 1.0 + tf.random.normal([1],dtype='float32')/10.
    w_zoom = 1.0 + tf.random.normal([1],dtype='float32')/10.
    h_shift = 16. * tf.random.normal([1],dtype='float32') 
    w_shift = 16. * tf.random.normal([1],dtype='float32') 
  
    # GET TRANSFORMATION MATRIX
    m = get_mat(rot,shr,h_zoom,w_zoom,h_shift,w_shift) 

    # LIST DESTINATION PIXEL INDICES
    x = tf.repeat( tf.range(DIM//2,-DIM//2,-1), DIM )
    y = tf.tile( tf.range(-DIM//2,DIM//2),[DIM] )
    z = tf.ones([DIM*DIM],dtype='int32')
    idx = tf.stack( [x,y,z] )
    
    # ROTATE DESTINATION PIXELS ONTO ORIGIN PIXELS
    idx2 = K.dot(m,tf.cast(idx,dtype='float32'))
    idx2 = K.cast(idx2,dtype='int32')
    idx2 = K.clip(idx2,-DIM//2+XDIM+1,DIM//2)
    
    # FIND ORIGIN PIXEL VALUES           
    idx3 = tf.stack( [DIM//2-idx2[0,], DIM//2-1+idx2[1,]] )
    d = tf.gather_nd(image,tf.transpose(idx3))
        
    return tf.reshape(d,[DIM,DIM,3]),label
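
The matrix returned by `get_mat` is easier to inspect outside TensorFlow. The sketch below rebuilds the same product, (rotation · shear) · (zoom · shift), with plain Python lists and applies it to a point in homogeneous coordinates. The helper names `matmul3`, `affine_matrix`, and `apply_to_point` are assumptions made for this illustration and do not appear in the notebook.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def affine_matrix(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    """Plain-Python mirror of get_mat; angles are given in degrees."""
    rot = math.radians(rotation)
    shr = math.radians(shear)
    rotation_m = [[math.cos(rot), math.sin(rot), 0.0],
                  [-math.sin(rot), math.cos(rot), 0.0],
                  [0.0, 0.0, 1.0]]
    shear_m = [[1.0, math.sin(shr), 0.0],
               [0.0, math.cos(shr), 0.0],
               [0.0, 0.0, 1.0]]
    zoom_m = [[1.0 / height_zoom, 0.0, 0.0],
              [0.0, 1.0 / width_zoom, 0.0],
              [0.0, 0.0, 1.0]]
    shift_m = [[1.0, 0.0, height_shift],
               [0.0, 1.0, width_shift],
               [0.0, 0.0, 1.0]]
    return matmul3(matmul3(rotation_m, shear_m), matmul3(zoom_m, shift_m))


def apply_to_point(matrix, point):
    """Apply a 3x3 matrix to a homogeneous point [x, y, 1]."""
    return [sum(matrix[i][k] * point[k] for k in range(3)) for i in range(3)]


# Neutral parameters leave every pixel index where it is.
identity = affine_matrix(0.0, 0.0, 1.0, 1.0, 0.0, 0.0)
# A pure shift moves the centre pixel by (height_shift, width_shift).
shifted = apply_to_point(affine_matrix(0.0, 0.0, 1.0, 1.0, 3.0, -2.0), [0.0, 0.0, 1.0])

print(identity)
print(shifted)
```

In `transform`, this matrix is applied to every destination pixel index to find the source pixel it should copy from, which is why the third coordinate is always 1.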

In [None]:
# Train data
NUM_TRAINING_IMAGES = count_data_items(TRAINING_FILENAMES)
train_dataset = get_training_dataset_preview(ordered=True)
y_train = next(iter(train_dataset.unbatch().map(lambda image, label: label).batch(NUM_TRAINING_IMAGES))).numpy()

# Validation data
NUM_VALIDATION_IMAGES = count_data_items(VALIDATION_FILENAMES)
valid_dataset = get_validation_dataset(ordered=True)
y_valid = next(iter(valid_dataset.unbatch().map(lambda image, label: label).batch(NUM_VALIDATION_IMAGES))).numpy()

# Test data
NUM_TEST_IMAGES = count_data_items(TEST_FILENAMES)
test_dataset = get_test_dataset(ordered=True)

print('Dataset:\n {} training images,\n {} validation images,\n {} unlabeled test images'.format(NUM_TRAINING_IMAGES, NUM_VALIDATION_IMAGES, NUM_TEST_IMAGES))

# Visualizations
## Training Images<a class="anchor" id="2.1"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
disp_images(next(iter(train_dataset.unbatch().batch(8))))

## Validation Images<a class="anchor" id="2.2"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
disp_images(next(iter(valid_dataset.unbatch().batch(8))))

## Test Images<a class="anchor" id="2.3"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
disp_images(next(iter(test_dataset.unbatch().batch(8))))

## Augmentations<a class="anchor" id="2.4"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
for i in range(2):
    row = 2; col = 4;
    all_elements = get_training_dataset(do_aug=False).unbatch()
    one_element = tf.data.Dataset.from_tensors( next(iter(all_elements)) )
    augmented_element = one_element.repeat().map(transform).batch(row*col)

    for (img,label) in augmented_element:
        plt.figure(figsize=(16,int(16*row/col)))
        for j in range(row*col):
            plt.subplot(row,col,j+1)
            plt.axis('off')
            plt.imshow(img[j,])
        plt.show()
        break

Now let's look at how many flowers of each type there are in the train and validation datasets.

In [None]:
train_agg = np.asarray([[label, (y_train == index).sum()] for index, label in enumerate(CLASSES)])
valid_agg = np.asarray([[label, (y_valid == index).sum()] for index, label in enumerate(CLASSES)])
fig = go.Figure(data=[
    go.Bar(name='Train', x=train_agg[...,1], y=train_agg[...,0],orientation='h',
        marker=dict(color='rgba(102, 255, 102, 0.5)')),
    go.Bar(name='Validation',x=valid_agg[...,1], y=valid_agg[...,0],orientation='h',
           marker=dict(color='rgba(255, 102, 102, 0.5)'))
])
fig.update_layout(
    title='Train and Validation Class distribution',
    title_x=0.5,
    barmode='stack',
    xaxis_title="",
    yaxis_title="",
    font=dict(
        size=10,
        color="royalblue"
    ),
    paper_bgcolor='rgb(252, 252, 255)',
    plot_bgcolor='rgb(248, 248, 255)',
    autosize=False,
    width=800,
    height=1500,
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=40,
        pad=4
    ))

fig.show()

In the above plot, please note that toggling the entries in the **Legend** changes the x-axis values.

# Modelling<a class="anchor" id="3"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

The first model we will be dealing with is **DenseNet201**          
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. DenseNet connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - DenseNet network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.

For more information on DenseNet201 please follow this [link](https://www.kaggle.com/pytorch/densenet201)
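
To make the comparison between L and L(L+1)/2 connections concrete, here is a tiny sketch that evaluates both for a few layer counts. `dense_connections` is a hypothetical helper written for this example only.

```python
def dense_connections(num_layers):
    """Direct connections among L densely connected layers: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


for num_layers in (1, 5, 10):
    print(num_layers, "layers:",
          num_layers, "plain connections vs",
          dense_connections(num_layers), "dense connections")
```

The dense count grows quadratically with depth, which is why DenseNet applies this pattern within blocks of limited size rather than across the whole network at once.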

In [None]:
def create_model(input_shape, N_CLASSES):
    base_model = applications.DenseNet201(weights='imagenet', 
                                          include_top=False,
                                          input_shape=input_shape)

    base_model.trainable = False # Freeze layers
    model = tf.keras.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dense(N_CLASSES, activation='softmax')
    ])
    
    return model

## Warm-up the top layers<a class="anchor" id="3.1"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
with strategy.scope():
    model = create_model((None, None, CHANNELS), N_CLASSES)

    metric_list = ['sparse_categorical_accuracy']

    # Create the optimizer and compile inside the strategy scope so that
    # all variables are created under the distribution strategy.
    optimizer = optimizers.Adam(learning_rate=WARMUP_LEARNING_RATE)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=metric_list)
model.summary()

As mentioned earlier in the notebook, visualization makes things easier to understand, so let's visualize the model architecture.
## Fundamental DenseNet Block<a class="anchor" id="3.2"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
SVG(tf.keras.utils.model_to_dot(Model(model.layers[0].input, model.layers[0].layers[13].output), dpi=75).create(prog='dot', format='svg'))

The above image shows the fundamental block in the DenseNet architecture. The architecture mainly involves Convolution, Maxpooling, ReLU, and concatenation.
## Model Architecture<a class="anchor" id="3.3"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
SVG(tf.keras.utils.model_to_dot(model, dpi=75).create(prog='dot', format='svg'))

In [None]:
STEPS_PER_EPOCH = NUM_TRAINING_IMAGES // BATCH_SIZE
warmup_history = model.fit(x=get_training_dataset(), 
                           steps_per_epoch=STEPS_PER_EPOCH, 
                           validation_data=get_validation_dataset(),
                           epochs=WARMUP_EPOCHS, 
                           verbose=2).history

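Remember the learning rate plot, where the rate increases linearly over the first three epochs: that schedule is applied during fine-tuning.
The code above instead warms up the top layers at a fixed learning rate (`WARMUP_LEARNING_RATE`) while the pretrained base stays frozen, so that we can fine-tune the whole network in the next step.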
## Fine Tuning all the layers<a class="anchor" id="3.4"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
for layer in model.layers:
    layer.trainable = True # Unfreeze layers

checkpoint = ModelCheckpoint(model_path, monitor='val_loss', mode='min', save_best_only=True)
es = EarlyStopping(monitor='val_loss', mode='min', patience=ES_PATIENCE, 
                   restore_best_weights=True, verbose=1)
lr_callback = LearningRateScheduler(lrfn, verbose=1)

callback_list = [checkpoint, es, lr_callback]

with strategy.scope():
    # Recompile so that the change to `trainable` takes effect
    optimizer = optimizers.Adam(learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=metric_list)
model.summary()

In [None]:
history = model.fit(x=get_training_dataset(), 
                    steps_per_epoch=STEPS_PER_EPOCH, 
                    validation_data=get_validation_dataset(),
                    callbacks=callback_list,
                    epochs=EPOCHS, 
                    verbose=2).history
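
The `EarlyStopping` callback used above stops fine-tuning once `val_loss` has failed to improve for `ES_PATIENCE` consecutive epochs. Below is a simplified plain-Python sketch of that patience rule. It ignores options such as `min_delta`, uses made-up loss values, and `early_stopping_epoch` is a hypothetical helper rather than a Keras function.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, for illustration only.
losses = [1.0, 0.8, 0.9, 0.85, 0.95]
print(early_stopping_epoch(losses, patience=2))
```

Because training can end before `EPOCHS` is reached, the lists in `history` may be shorter than `EPOCHS`, which matters when plotting the results.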

## Visualize the results<a class="anchor" id="3.5"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

In [None]:
def display_training_curves(training, validation):
    fig = go.Figure()
        
    # Use the actual number of recorded epochs: early stopping can end training before EPOCHS
    fig.add_trace(
        go.Scatter(x=np.arange(1, len(training)+1), mode='lines+markers', y=training, marker=dict(color="dodgerblue"),
               name="Train"))
    
    fig.add_trace(
        go.Scatter(x=np.arange(1, len(validation)+1), mode='lines+markers', y=validation, marker=dict(color='red'),
               name="Val"))
    if training != history['loss']:
        fig.update_layout(title_x=0.5,title_text='Accuracy vs Epochs', 
                      yaxis_title='Accuracy', xaxis_title="Epochs",
                      paper_bgcolor='rgb(252, 252, 255)',
                      plot_bgcolor='rgb(248, 248, 255)',)
    else:
        fig.update_layout(title_x=0.5,title_text='Loss vs Epochs', 
                      yaxis_title='Loss', xaxis_title="Epochs",
                      paper_bgcolor='rgb(252, 252, 255)',
                      plot_bgcolor='rgb(248, 248, 255)',)
    fig.show()

In [None]:
display_training_curves(history['sparse_categorical_accuracy'],history['val_sparse_categorical_accuracy'])

In [None]:
display_training_curves(history['loss'],history['val_loss'])

In [None]:
n_epochs = len(history['loss'])  # can be < EPOCHS when early stopping triggers
acc_df = pd.DataFrame({"Epochs": np.arange(1, n_epochs + 1).tolist() * 3,
                       "Stage": ["Train"] * n_epochs + ["Val"] * n_epochs + ["Benchmark"] * n_epochs,
                       "Accuracy": history['sparse_categorical_accuracy'] + history['val_sparse_categorical_accuracy'] + [1.0] * n_epochs})

In [None]:
fig = px.bar(acc_df, x="Accuracy", y="Stage", animation_frame="Epochs", title="Accuracy vs. Epochs", color='Stage',
       color_discrete_map={"Train":"dodgerblue", "Val":"red", "Benchmark":"seagreen"}, orientation="h")

fig.update_layout(
    xaxis = dict(
        autorange=False,
        range=[0, 1]
    )
)

fig.update_layout(title_x=0.5, 
                  paper_bgcolor='rgb(252, 252, 255)',
                  plot_bgcolor='rgb(248, 248, 255)',)

Please click the **PLAY** button to see how the training and validation accuracy change over time.

In [None]:
n_epochs = len(history['loss'])  # can be < EPOCHS when early stopping triggers
acc_df = pd.DataFrame({"Epochs": np.arange(1, n_epochs + 1).tolist() * 2,
                       "Stage": ["Train"] * n_epochs + ["Val"] * n_epochs,
                       "Loss": history['loss'] + history['val_loss']})

In [None]:
fig = px.bar(acc_df, x="Loss", y="Stage", animation_frame="Epochs", title="Loss vs. Epochs", color='Stage',
       color_discrete_map={"Train":"dodgerblue", "Val":"red"}, orientation="h")

fig.update_layout(
    xaxis = dict(
        autorange=False,
        range=[0, max(history['loss'] + history['val_loss'])]  # unlike accuracy, the loss can exceed 1
    )
)

fig.update_layout(title_x=0.5, 
                  paper_bgcolor='rgb(252, 252, 255)',
                  plot_bgcolor='rgb(248, 248, 255)',)

We can see from the above animation that after a certain number of epochs the validation loss stops decreasing.
Also, for the first few epochs, neither the training loss nor the validation loss decreases.

# Acknowledgements
1. [Awesome notebook by Chris Deotte](https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96)
2. [Work of DimitreOliveira](https://www.kaggle.com/dimitreoliveira/flower-classification-with-tpus-eda-and-baseline)
3. [Excellent visualizations by Tarun](https://www.kaggle.com/tarunpaparaju/plant-pathology-2020-eda-models)
4. [Notebook Aesthetics by Ouassim](https://www.kaggle.com/ishivinal/machine-learning-model-evaluation-metrics)
5. [Plotly](https://plotly.com/)


<a class="anchor" id="4"></a>
<a href="#toc"><img src= "https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Circle-icons-arrow-up.svg/1200px-Circle-icons-arrow-up.svg.png" style="width:20px;hight:20px;float:left" >Back to the table of contents</a>

## In the upcoming versions of this notebook I will be adding different models and their performance...