# Your Default Notebook for any Computer Vision Competition, with TPUs

<img src="https://user-images.githubusercontent.com/115424463/267539825-8e845684-c60a-4bc9-838d-f0b800e317b9.jpeg" alt="'Petals to the Metal' by AI">

### Picture: Petals to the Metal

This notebook serves several purposes:

1. It aims to ease the transition from the [**Computer Vision**](https://www.kaggle.com/learn/computer-vision) course to image classification competitions, using [**Petals to the Metal**](https://www.kaggle.com/c/tpu-getting-started) as an example. To achieve this, the code in this notebook is extensively commented to make it accessible to readers regardless of their prior experience, and we have kept to a high-level approach wherever possible.

2. Its objective is to serve as a collection of logical blocks and utility scripts that you can use for exploratory data analysis (EDA) and model training, allowing you to design and implement your own strategies.

3. It includes an example of a custom convolutional neural network (CNN) to illustrate the theoretical concepts from the Computer Vision course.

4. It introduces several essential tools that are typically used in competitions: 
   - *Callbacks (EarlyStopping, LearningRateScheduler, ModelCheckpoint)*
   - *Transfer learning*
   - *Building an ensemble of multiple models* <br>
  
Credits:
- We extend our immense gratitude to the authors of the **Computer Vision** course and the [**Create Your First Submission**](https://www.kaggle.com/code/ryanholbrook/create-your-first-submission) notebook. It's truly miraculous that such materials are available for free!
- Several insights and functions were taken from [**George Zoto's** notebook](https://www.kaggle.com/code/georgezoto/computer-vision-petals-to-the-metal), which we strongly recommend reading in full: it is both comprehensive and inspiring.

<blockquote style="margin-right:auto; margin-left:auto; background-color: #ebf9ff; padding: 1em; margin:24px;">
 <strong> Fork this notebook </strong> by clicking on the Copy and Edit button in the top right corner. It is designed for visual clarity, which is most apparent in edit mode.<br>
    <strong> Please upvote </strong> and comment: it keeps me motivated and makes me feel part of the big Data Science World. <br> </blockquote>

# Step 0: Imports

In [None]:
# !pip install --upgrade pip
# !pip install -U tensorflow==2.11.0

import math, re, os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.preprocessing import image_dataset_from_directory

from sklearn.metrics import f1_score

print(f'TensorFlow version: {tf.__version__}')

# Step 1: Connect to Tensor Processing Units (TPUs)

Kaggle provides limited access to three types of processing units for training your models:
- Central Processing Units (**CPUs**)
- Graphics Processing Units (**GPUs**)
- Tensor Processing Units (**TPUs**)

Here is an [**article**](https://towardsdatascience.com/when-to-use-cpus-vs-gpus-vs-tpus-in-a-kaggle-competition-9af708a8c3eb) to help you decide which one to use. Long story short, "GPUs are a great alternative to CPUs when you want to speed up a variety of data science workflows, and TPUs are best when you specifically want to train a machine learning model as fast as you possibly can". The catch is that you have to adapt your code to make your data digestible for a TPU.

A TPU has **eight cores** (it's like having eight GPUs in one machine). With **distribution strategy**, we instruct TensorFlow on how to utilize all these cores simultaneously. We will employ this object when constructing our neural network model: it will distribute the training by generating eight distinct *replicas* of the model, one for each core.

In [None]:
# Detect TPU, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # An attempt to detect an available TPU (to 'resolve a TPU cluster')
    print('Running on TPU ', tpu.master())                     # Tell the world that the attempt was a success!
except ValueError:
    tpu = None

if tpu:                                                        # If TPU was detected
    tf.config.experimental_connect_to_cluster(tpu)             # connect to the TPU cluster 
    tf.tpu.experimental.initialize_tpu_system(tpu)             # initialize the TPU system
    strategy = tf.distribute.TPUStrategy(tpu)                  # create a distribution strategy for TPU training
else:
    strategy = tf.distribute.get_strategy()                    # If no TPU was found, use the default distribution strategy for CPU or GPU

print("REPLICAS: ", strategy.num_replicas_in_sync)             # Tell the world how many replicas (cores) the strategy trains on

In [None]:
# Global batch size: 16 images per replica, i.e. 16 with TPU off and 128 (= 16 * 8) with an 8-core TPU on
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
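To make the batch arithmetic concrete, here is a small plain-Python sketch. The dataset size (10,000 images) and the helper name `batch_layout` are hypothetical; the point is only how the global batch is shared among replicas and how that changes the number of steps per epoch.

```python
PER_REPLICA_BATCH = 16  # the "16" in BATCH_SIZE = 16 * strategy.num_replicas_in_sync


def batch_layout(num_replicas, num_images, per_replica=PER_REPLICA_BATCH):
    """Hypothetical helper: global batch, per-replica share, and full batches per epoch."""
    global_batch = per_replica * num_replicas          # what BATCH_SIZE evaluates to
    per_replica_share = global_batch // num_replicas   # each replica (core) gets an equal slice
    steps = num_images // global_batch                 # full batches in one pass over the data
    return global_batch, per_replica_share, steps


print(batch_layout(num_replicas=1, num_images=10_000))  # CPU/GPU: (16, 16, 625)
print(batch_layout(num_replicas=8, num_images=10_000))  # TPU:     (128, 16, 78)
```

With eight replicas each step processes eight times as many images, so one epoch needs roughly an eighth of the steps, while every core still works on 16 images at a time.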

# Step 2: Retrieve, Load and Format Data

- When used with TPUs, datasets need to be stored in a [Google Cloud Storage bucket](https://cloud.google.com/storage/) (**GCS**). You can use data from any public GCS bucket by giving its `gs://` path; for a Kaggle dataset, `KaggleDatasets().get_gcs_path()` returns that path, as in the code below.
- You can use data from any public dataset here on Kaggle in just the same way. If you'd like to use data from one of your private datasets, see [here](https://www.kaggle.com/docs/tpu#tpu3pt5).
- When used with TPUs, datasets are serialized into [TFRecords](https://www.kaggle.com/ryanholbrook/tfrecords-basics), a binary format that stores a sequence of records and is well suited to streaming data to each of the TPU's cores.

In [None]:
# Here we create lists of paths to our training, validation and test files
from kaggle_datasets import KaggleDatasets

GCS_DS_PATH     = KaggleDatasets().get_gcs_path('tpu-getting-started')   # You can list the bucket with "!gsutil ls $GCS_DS_PATH"
GCS_DS_PATH_EXT = KaggleDatasets().get_gcs_path('tf-flower-photo-tfrec') # More data from a side source! If you get an error here, add 'tf-flower-photo-tfrec' in the 'Add data' tab

IMAGE_SIZE = [192, 192]                                                  # This is the size for GPU. For TPU use [512, 512]
                 
GCS_PATH_SELECT = {                                                      # Images of different sizes are stored in different directories. The dictionary maps the sizes to the paths
    192: '/tfrecords-jpeg-192x192',
    224: '/tfrecords-jpeg-224x224',
    331: '/tfrecords-jpeg-331x331',
    512: '/tfrecords-jpeg-512x512'
}

GCS_PATH_PER_SIZE = GCS_PATH_SELECT[IMAGE_SIZE[0]]                       # Define the path to the directory depending on the IMAGE_SIZE
GCS_PATH_ORIGINAL = GCS_DS_PATH + GCS_PATH_PER_SIZE                      # This is where the original data for the competition dwells

IMAGENET_FILES    = tf.io.gfile.glob(GCS_DS_PATH_EXT + '/imagenet'    + GCS_PATH_PER_SIZE + '/*.tfrec') # More data from a side source!
INATURELIST_FILES = tf.io.gfile.glob(GCS_DS_PATH_EXT + '/inaturalist' + GCS_PATH_PER_SIZE + '/*.tfrec') # More data from a side source!
OPENIMAGE_FILES   = tf.io.gfile.glob(GCS_DS_PATH_EXT + '/openimage'   + GCS_PATH_PER_SIZE + '/*.tfrec') # More data from a side source!
OXFORD_FILES      = tf.io.gfile.glob(GCS_DS_PATH_EXT + '/oxford_102'  + GCS_PATH_PER_SIZE + '/*.tfrec') # More data from a side source!
TENSORFLOW_FILES  = tf.io.gfile.glob(GCS_DS_PATH_EXT + '/tf_flowers'  + GCS_PATH_PER_SIZE + '/*.tfrec') # More data from a side source!


TRAINING_FILENAMES = tf.io.gfile.glob(GCS_PATH_ORIGINAL  + '/train/*.tfrec')  # Get the list of file paths for training TFRecords
TRAINING_FILENAMES = TRAINING_FILENAMES + IMAGENET_FILES + INATURELIST_FILES + OPENIMAGE_FILES + OXFORD_FILES + TENSORFLOW_FILES  # Add the extra data


VALIDATION_FILENAMES = tf.io.gfile.glob(GCS_PATH_ORIGINAL + '/val/*.tfrec')   # Get the list of file paths for validation TFRecords
TEST_FILENAMES       = tf.io.gfile.glob(GCS_PATH_ORIGINAL + '/test/*.tfrec')  # Get the list of file paths for testing TFRecords

In [None]:
# These are our classification labels 
CLASSES = ['pink primrose',    'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea',     'wild geranium',     'tiger lily',           'moon orchid',              'bird of paradise', 'monkshood',        'globe thistle',         # 00 - 09
           'snapdragon',       "colt's foot",               'king protea',      'spear thistle', 'yellow iris',       'globe-flower',         'purple coneflower',        'peruvian lily',    'balloon flower',   'giant white arum lily', # 10 - 19
           'fire lily',        'pincushion flower',         'fritillary',       'red ginger',    'grape hyacinth',    'corn poppy',           'prince of wales feathers', 'stemless gentian', 'artichoke',        'sweet william',         # 20 - 29
           'carnation',        'garden phlox',              'love in the mist', 'cosmos',        'alpine sea holly',  'ruby-lipped cattleya', 'cape flower',              'great masterwort', 'siam tulip',       'lenten rose',           # 30 - 39
           'barberton daisy',  'daffodil',                  'sword lily',       'poinsettia',    'bolero deep blue',  'wallflower',           'marigold',                 'buttercup',        'daisy',            'common dandelion',      # 40 - 49
           'petunia',          'wild pansy',                'primula',          'sunflower',     'lilac hibiscus',    'bishop of llandaff',   'gaura',                    'geranium',         'orange dahlia',    'pink-yellow dahlia',    # 50 - 59
           'cautleya spicata', 'japanese anemone',          'black-eyed susan', 'silverbush',    'californian poppy', 'osteospermum',         'spring crocus',            'iris',             'windflower',       'tree poppy',            # 60 - 69
           'gazania',          'azalea',                    'water lily',       'rose',          'thorn apple',       'morning glory',        'passion flower',           'lotus',            'toad lily',        'anthurium',             # 70 - 79
           'frangipani',       'clematis',                  'hibiscus',         'columbine',     'desert-rose',       'tree mallow',          'magnolia',                 'cyclamen ',        'watercress',       'canna lily',            # 80 - 89
           'hippeastrum ',     'bee balm',                  'pink quill',       'foxglove',      'bougainvillea',     'camellia',             'mallow',                   'mexican petunia',  'bromelia',         'blanket flower',        # 90 - 99
           'trumpet creeper',  'blackberry lily',           'common tulip',     'wild rose']                                                                                                                                                                       # 100 - 103

The following code reads the TFRecord files into datasets. TFRecord is a data format well suited to processing on TPUs.
If we were not using TPUs and the images were stored as ordinary files in folders, we could simply have used `keras.preprocessing.image_dataset_from_directory()`.

In [None]:
AUTO = tf.data.experimental.AUTOTUNE                     # Configure Auto-tuning for better performance. To be applied in many functions below

def decode_image(image_data):                            
    image = tf.image.decode_jpeg(image_data, channels=3) # Decode the JPEG image to a tensor with 3 color channels (red, green, blue)
    image = tf.cast(image, tf.float32) / 255.0           # Convert pixel values to floating-point numbers in the range [0, 1]
    image = tf.reshape(image, [*IMAGE_SIZE, 3])          # Reshape the image tensor to match the specified IMAGE_SIZE
                                                         # This step ensures that all images have the same dimensions for consistency
    return image

# This function parses one labeled TFRecord example and returns the image and its corresponding label (to be applied on training and validation sets)
def read_labeled_tfrecord(example):                                     # example: a single serialized record from a TFRecord file (one labeled picture for training or validation)
    LABELED_TFREC_FORMAT = {                                            # setting a dictionary that defines the format of a TFRecord (names and dtypes of its features)
        "image": tf.io.FixedLenFeature([], tf.string),                  # tf.string means bytestring
        "class": tf.io.FixedLenFeature([], tf.int64),                   # shape [] means single element
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT) # Parse the single TFRecord example according to the specified format.
    image = decode_image(example['image'])                              # Decode the 'image' feature of the TFRecord example using the 'decode_image' function (previously defined)
    label = tf.cast(example['class'], tf.int32)                         # tf.cast converts tensors from one data type to another. Here it turns the int64 class into an int32 label
    return image, label                                                 # returns one (image, label) pair; Python packs the two values into a tuple automatically. Mapped over a dataset, this yields a dataset of such pairs


# This function parses one unlabeled TFRecord example and returns the image and its ID (to be applied on the test set)
def read_unlabeled_tfrecord(example):                       
    UNLABELED_TFREC_FORMAT = {                             
        "image": tf.io.FixedLenFeature([], tf.string),     
        "id": tf.io.FixedLenFeature([], tf.string),                       # 'class' is missing: this competition's challenge is to predict flower classes for the test dataset
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT) # Parse the single TFRecord example according to the specified format.
    image = decode_image(example['image'])                                # Decode the 'image' feature of a TFRecord example using the 'decode_image' function (previously defined)
    idnum = example['id']
    return image, idnum                                                   # Returns one (image, id) pair; Python packs the two values into a tuple automatically


# Read from TFRecords. For optimal performance, read from multiple files at once and disregard data order.
def load_dataset(filenames, labeled=True, ordered=False):                 # We set values for 'labeled' and 'ordered' in the definition of the function to use them by default. However, we reserve an option to pass different values to these parameters.
    
    options = tf.data.Options()                                           # Create an Options object to configure the dataset (this follows the pattern in the TensorFlow documentation)
    if not ordered:                                                       # If the 'ordered' parameter is 'False' (the default) and hasn't been explicitly set to 'True' when passed to the function
        options.deterministic = False                                     # Disable order, increase speed

    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO) # automatically interleaves reads from multiple files. This 'num_parallel_reads=AUTO' parameter of tf.data.TFRecordDataset() will be used many times in the below code.
    dataset = dataset.with_options(options)                               # uses data as soon as it streams in, rather than in its original order
                          
    # returns a dataset of (image, label) pairs if labeled=True, or (image, id) pairs if labeled=False
    parser = read_labeled_tfrecord if labeled else read_unlabeled_tfrecord
    dataset = dataset.map(parser, num_parallel_calls=AUTO)
    return dataset
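To see what `read_labeled_tfrecord` actually parses, the sketch below writes one record to a temporary TFRecord file and reads it back with the same feature spec. The tiny 2x2 image and the label 73 (the index of `'rose'` in `CLASSES`) are made-up data; only the feature names `"image"` and `"class"` come from the code above.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Made-up 2x2 RGB image, JPEG-encoded like the competition images
pixels = np.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
jpeg_bytes = tf.io.encode_jpeg(pixels).numpy()

# Build one tf.train.Example with the same feature names the labeled files use
example = tf.train.Example(features=tf.train.Features(feature={
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
    "class": tf.train.Feature(int64_list=tf.train.Int64List(value=[73])),
}))

# Write it to a temporary TFRecord file
path = os.path.join(tempfile.mkdtemp(), "demo-1.tfrec")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Read it back and parse it with the same spec as read_labeled_tfrecord
spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "class": tf.io.FixedLenFeature([], tf.int64),
}
for record in tf.data.TFRecordDataset([path]):
    parsed = tf.io.parse_single_example(record, spec)
    image = tf.image.decode_jpeg(parsed["image"], channels=3)
    label = tf.cast(parsed["class"], tf.int32)

print(image.shape, int(label))  # (2, 2, 3) 73
```

Each record is just a serialized dictionary of named features, which is why the parsing functions above need a spec listing the feature names and dtypes.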

# Step 3: Create Pipelines #

In [None]:
def data_augment(image, label):
    seed  = 42                                                       # Setting a seed helps with reproducibility; otherwise, the learning process can produce different results each time, making it hard to control.
    image = tf.image.random_flip_left_right(image, seed=seed)        # These functions are included here to make you aware of their existence, but not all of them necessarily yield optimal performance on the given dataset.
    image = tf.image.random_flip_up_down(image, seed=seed)
#   image = tf.image.random_saturation(image, 0, 2, seed=seed)       # It doesn't seem a great idea to change colours of flowers. But it could work on images of a different kind
#   image = tf.image.random_brightness(image, 0.6, seed=seed)
#   image = tf.image.random_contrast(image, 0.3, 0.5, seed=seed)
    
    return image, label   

def get_training_dataset():
    dataset = load_dataset(TRAINING_FILENAMES, labeled=True)        # See the load_dataset function above: it builds on 'dataset = tf.data.TFRecordDataset(...)' with its parameters
    dataset = dataset.map(data_augment, num_parallel_calls=AUTO)    # Apply data_augment function
    dataset = dataset.repeat()                                      # The repeat method is called on the dataset to make it repeat indefinitely (for all the epochs)
    dataset = dataset.shuffle(2048)                                 # Shuffling the data is important during training to prevent the model from memorizing the order 
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)                                # Prefetch the next batch while the current one is training. The data pipeline runs on the CPU,
                                                                    # and prefetching overlaps it with the TPU's work, so the TPU spends its time computing gradients instead of waiting for data.
    return dataset

def get_validation_dataset(ordered=False):
    dataset = load_dataset(VALIDATION_FILENAMES, labeled=True, ordered=ordered) # 'ordered=ordered' passes the 'ordered' parameter's value from the overarching function 'get_validation_dataset'
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()                                                   # Caching the dataset means that it is temporarily stored in RAM, making it faster to access during subsequent epochs 
    dataset = dataset.prefetch(AUTO)                                            
    return dataset

def get_test_dataset(ordered=False):
    dataset = load_dataset(TEST_FILENAMES, labeled=False, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset

def count_data_items(filenames):             # The number of data items is written in the name of the .tfrec files, e.g. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.")      # This is a 'regular expression'. re.compile() creates here a pattern where a number appears between a hyphen and a period, 
               .search(filename)             # looks for the pattern in the filenames
               .group(1))                    # returns what was found. re.group() regulates which part of a pattern to return: re.group(0) returns the entire matched pattern, and re.group(n) returns the respective subpattern if the pattern contains a number of them.
                for filename in filenames]                          
    return np.sum(n)
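The two active augmentations in `data_augment` are mirror images. `tf.image.random_flip_left_right` flips each image with a 50% chance, and when it does, the result is the image reversed along its width axis; `random_flip_up_down` does the same along the height axis. The NumPy sketch below shows that effect on a made-up 2x3 single-channel array (it illustrates the flip itself, not TensorFlow's random choice).

```python
import numpy as np

# Made-up "image": height 2, width 3, one channel (TensorFlow images are [height, width, channels])
img = np.arange(6).reshape(2, 3, 1)

left_right = img[:, ::-1, :]  # reverse the width axis, like tf.image.flip_left_right
up_down = img[::-1, :, :]     # reverse the height axis, like tf.image.flip_up_down

print(img[..., 0].tolist())         # [[0, 1, 2], [3, 4, 5]]
print(left_right[..., 0].tolist())  # [[2, 1, 0], [5, 4, 3]]
print(up_down[..., 0].tolist())     # [[3, 4, 5], [0, 1, 2]]
```

A flipped flower is still the same species, which is why `data_augment` returns `label` unchanged.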

In [None]:
NUM_TRAINING_IMAGES     = count_data_items(TRAINING_FILENAMES)
NUM_VALIDATION_IMAGES   = count_data_items(VALIDATION_FILENAMES)
NUM_TEST_IMAGES         = count_data_items(TEST_FILENAMES)

print(f'Dataset: \n'
      f'{NUM_TRAINING_IMAGES} training images \n'
      f'{NUM_VALIDATION_IMAGES} validation images \n'
      f'{NUM_TEST_IMAGES} unlabeled test images')
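`count_data_items` relies on the item count being part of each file name. The sketch below runs the same regular expression on a few file names; the names themselves are hypothetical examples of the `<name>-<count>.tfrec` convention described in the function's comment.

```python
import re

pattern = re.compile(r"-([0-9]*)\.")  # the same pattern as in count_data_items

# Hypothetical file names following the "<name>-<count>.tfrec" convention
filenames = [
    "train/00-192x192-798.tfrec",  # "-192x192" does not match: 'x' is neither a digit nor '.'
    "train/01-192x192-798.tfrec",
    "flowers00-230.tfrec",
]

match = pattern.search(filenames[0])
print(match.group(0), match.group(1))  # -798. 798  (whole match vs. the captured digits)

counts = [int(pattern.search(name).group(1)) for name in filenames]
print(counts, sum(counts))  # [798, 798, 230] 1826
```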

This next cell creates the datasets (training, validation and test).
- These datasets are `tf.data.Dataset` objects. You can think about a dataset in TensorFlow as a *stream* of data records. 
- The training and validation sets are streams of `(image, label)` pairs.
- The test set is a stream of `(image, idnum)` pairs; we'll use these `idnum` (ID numbers) later to make our submission `csv` file.
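As a preview of how `idnum` is used, here is a minimal pandas sketch of a submission file. It assumes the competition expects two columns named `id` and `label` (check the competition's sample submission to be sure); the IDs and probabilities are made up. Because each ID travels through the pipeline together with its image, the pairing stays correct even though `get_test_dataset()` does not preserve file order by default.

```python
import numpy as np
import pandas as pd

# Made-up test IDs and model outputs (one row of class probabilities per image)
test_ids = np.array(["a1b2c3", "d4e5f6", "0718ab"])
probabilities = np.array([[0.1, 0.7, 0.2],
                          [0.6, 0.3, 0.1],
                          [0.2, 0.2, 0.6]])

predictions = np.argmax(probabilities, axis=-1)  # index of the most likely class per image

submission = pd.DataFrame({"id": test_ids, "label": predictions})
submission.to_csv("submission.csv", index=False)  # writes the CSV file that would be submitted
print(submission.to_csv(index=False))
```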

In [None]:
ds_train = get_training_dataset()
ds_valid = get_validation_dataset()
ds_test  = get_test_dataset()

In [None]:
# Let's take a look at the data shapes
np.set_printoptions(threshold=15, linewidth=80)          # Set NumPy print options so that only part of each array is displayed rather than all of it

print("Training data shapes:")
for image, label in ds_train.take(3):                    # Iterate through the first 3 elements of the training dataset
    print(image.numpy().shape, label.numpy().shape)      # .numpy() converts a TensorFlow tensor to a NumPy array
print("Training data label examples:", label.numpy())

print ('---')

print("Test data shapes:")
for image, idnum in ds_test.take(3):
    print(image.numpy().shape, idnum.numpy().shape)
print("Test data IDs:", idnum.numpy().astype('U')) # U=unicode string

# Step 4: Explore Data #
Let's take a moment to look at some of the images in the dataset.

In [None]:
from matplotlib import pyplot as plt

def batch_to_numpy_images_and_labels(data):
    images, labels = data                                          # unpack the tuples of (image, label) and (image, idnum). See above read_labeled_tfrecord and read_unlabeled_tfrecord functions
    numpy_images   = images.numpy()                                # .numpy() converts a TensorFlow tensor to a NumPy array
    numpy_labels   = labels.numpy()
    if numpy_labels.dtype == object:                               # Remember that in our case `label` is tf.int32 (numeric format) and `idnum` is tf.string (bytestring, an `object`)
        numpy_labels = [None for _ in enumerate(numpy_images)]     # So, if numpy_labels ends up carrying 'idnum' values (not the 'labels'), this statement sets them to None (for test data)
    return numpy_images, numpy_labels


# A function to generate a title based on the predicted and true target values
def title_from_label_and_target(label, correct_label):             # it takes predictions (labels) and true values (correct_label) as arguments
    if correct_label is None:                                      # if we deal with the test set, where no correct_labels are available
        return CLASSES[label], True                                # it simply returns the prediction
    
    correct = (label == correct_label)                             # if the target value (correct_label) is available, compare it with the prediction to get a boolean (True/False)
    
    return "{} [{}{}{}]".format(CLASSES[label],                    # returns the prediction
            'OK' if correct else 'NO',                             # 'OK' if it is True, 'NO'          if it is False
            u"\u2192" if not correct else '',                      # ''   if it is True, '→'           if it is False 
            CLASSES[correct_label] if not correct else ''),correct # ''   if it is True, correct_label if it is False, separate value for 'True' or 'False'
                                                         
                                                            

    
# a function to display a single flower image with a title
def display_one_flower(image, title, subplot,                       # subplot is what you need to display several pictures at once (on one plot)
                       red=False, titlesize=16):
    plt.subplot(*subplot)                                           # '*subplot' syntax unpacks the values in the subplot tuple (rows, columns, index) that specify the subplot layout 
    plt.axis('off')
    plt.imshow(image)                                               # plt.imshow stands for 'show image'
    if len(title) > 0:                                                    # if a title is available
        plt.title(title,                                                  # set parameters for this title's display
          fontsize = int(titlesize) if not red else int(titlesize/1.2),   # bigger fontsize for correct (black) titles, smaller fontsize for the wrong (red) titles
          color='red' if red else 'black',                                # depending on the argument passed to the function
          fontdict={'verticalalignment':'center'}, 
          pad=int(titlesize/1.5))
    
    return (subplot[0],                                                   # the number of rows in the subplot grid
            subplot[1],                                                   # the number of columns in the subplot grid
            subplot[2]+1)                                                 # the next index (position) within the grid: +1 advances it, so each call to this function moves on to the next image
    

# this function makes several pictures appear on the screen at the same time
def display_batch_of_images(databatch, predictions=None):
    """This function accepts the following call patterns:
    display_batch_of_images(images)
    display_batch_of_images(images, predictions)
    display_batch_of_images((images, labels))
    display_batch_of_images((images, labels), predictions)
    """
    images, labels = batch_to_numpy_images_and_labels(databatch) # data
    if labels is None:
        labels = [None for _ in enumerate(images)]               # creates a list of None values with the same length as the images list: to ensure that there is a label for each image.
    rows = int(math.sqrt(len(images)))                           # auto-squaring: this will drop data from the display that does not fit into square or square-ish rectangle
    cols = len(images)//rows                                     # calculates the number of columns based on the number of rows and the total number of images. It uses integer division (//) to ensure that the grid is as square as possible.
        
    FIGSIZE = 13.0
    SPACING = 0.1
    subplot = (rows,cols,1)                                      # you already know that subplot has three parameters: (rows, columns, index)
    if rows < cols:                                              # if there are more columns than rows
        plt.figure(figsize=(FIGSIZE,FIGSIZE/cols*rows))          # set landscape (wide) orientation
    else:                                                        # if there are at least as many rows as columns
        plt.figure(figsize=(FIGSIZE/rows*cols,FIGSIZE))          # set portrait (tall) or square orientation
    
    # display
    display_dict = zip(images[:rows*cols], labels[:rows*cols])                      # pairs each of the first rows*cols images with its label (a zip of (image, label) tuples, not a dictionary)
    for i, (image, label) in enumerate(display_dict):                               # iterate over the (image, label) pairs, keeping the index i
        title = '' if label is None else CLASSES[label]                             # determine the title for the subplot based on the label
        correct = True                                                              # set the default value for 'correct'
        if predictions is not None:                                                 # if predictions are passed to the function
            title, correct = title_from_label_and_target(predictions[i], label)     # apply the function above, passing the prediction at index i and the corresponding true label
        dynamic_titlesize  = FIGSIZE*SPACING/max(rows,cols)*40+3                    # magic formula tested to work from 1x1 to 10x10 images
        subplot = display_one_flower(image, title, subplot, 
                                     not correct,                                   # value for the 'red' parameter of display_one_flower: if the prediction is wrong (correct=False), we flip it (not correct) and pass red=True
                                     titlesize=dynamic_titlesize)
    
    #layout
    plt.tight_layout()                                                # ensure that the subplots (in this case, the displayed images and titles) fit within the figure without overlapping or being cut off.
    if label is None and predictions is None:                         # if there are no predictions and true labels     
        plt.subplots_adjust(wspace=0, hspace=0)                       # no spacing between the images             
    else:                                                             # otherwise
        plt.subplots_adjust(wspace=SPACING, hspace=SPACING)           # make spaces
    plt.show()
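The grid arithmetic in `display_batch_of_images` is easy to check by hand. This plain-Python sketch (the helper name `grid_layout` is hypothetical) repeats the same row, column and figure-size formulas, and shows that when a batch size is not a product rows*cols, the extra images are left out of the display.

```python
import math

FIGSIZE = 13.0  # same constant as in display_batch_of_images


def grid_layout(n_images):
    """Hypothetical helper mirroring the rows/cols/figsize logic of display_batch_of_images."""
    rows = int(math.sqrt(n_images))  # side of the largest square that fits
    cols = n_images // rows          # as many columns as the images allow (never fewer than rows)
    if rows < cols:
        figsize = (FIGSIZE, FIGSIZE / cols * rows)  # wider than tall
    else:
        figsize = (FIGSIZE / rows * cols, FIGSIZE)  # rows == cols: square
    return rows, cols, rows * cols, figsize


for n in (20, 16, 7):
    rows, cols, shown, figsize = grid_layout(n)
    print(f"{n} images -> {rows}x{cols} grid, {shown} shown, figsize={figsize}")
```

For the batches of 20 used below this gives a 4x5 grid with all 20 images shown; a batch of 7 would show only 6.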

- You can display a single batch of images from a dataset with another of our helper functions. 
- The next cell will turn the dataset into an iterator of batches of 20 images.
- Use the Python `next` function to pop out the next batch in the stream and display it with the helper function.
- By defining `ds_iter` and `one_batch` in separate cells, you only need to rerun the second cell to see a new batch of images.

In [None]:
ds_iter = iter(ds_train.unbatch().batch(20))

In [None]:
one_batch = next(ds_iter)
display_batch_of_images(one_batch)

# Step 5: Callbacks
A callback is an object passed to `.fit()` through its `callbacks` parameter; Keras calls it at fixed points during training (e.g. at the end of every epoch). We will use three of them:
- **`Learning Rate Schedule`**: adjusts the learning rate as training progresses, e.g. as a function of the epoch number or when a monitored metric stops improving
- **`Early Stopping`**: stops training when the monitored metric has not improved for a given number of epochs
- **`Checkpoint`**: saves the model's weights at the end of an epoch if they are the best seen so far during `model.fit`

- There are [many other options](https://keras.io/api/callbacks/), but we will build these three here.
We will apply the same callbacks to all the models that follow.

In [None]:
EPOCHS = 30                                           # An upper bound: EarlyStopping will likely stop training sooner
STEPS_PER_EPOCH = NUM_TRAINING_IMAGES // BATCH_SIZE   # Batches per epoch

### Callback 1: learning rate schedule

In [None]:
def exponential_lr(epoch,                                   # The current training epoch
                   start_lr = 0.00001,                      # The initial learning rate
                   min_lr = 0.00001,                        # The minimum learning rate
                   max_lr = 0.00005,                        # The maximum learning rate
                   rampup_epochs = 5,                       # The number of epochs for a linear increase in learning rate
                   sustain_epochs = 0,                      # The number of epochs to sustain the maximum learning rate
                   exp_decay = 0.8):                        # The exponential decay factor for learning rate reduction

    # calculates the learning rate for a given epoch based on the provided parameters
    def lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay): 
        
        if epoch < rampup_epochs:                             # For epochs less than rampup_epochs, the learning rate increases from start_lr to max_lr.
            lr = ((max_lr - start_lr) /
                  rampup_epochs * epoch + start_lr)
        
        elif epoch < rampup_epochs + sustain_epochs:          # From 'rampup_epochs' till 'rampup_epochs + sustain_epochs', the learning rate remains constant at max_lr
            lr = max_lr
        
        else:                                                 # exponential decay towards min_lr
            lr = ((max_lr - min_lr) *
                  exp_decay**(epoch - rampup_epochs - sustain_epochs) +
                  min_lr)
        return lr
    return lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay)

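# This is what it was all about. We pass our custom function to the Keras LearningRateScheduler to create a callback.
# The lambda matters: Keras first tries schedule(epoch, current_lr), which would silently pass the optimizer's current
# learning rate into exponential_lr's 'start_lr' parameter. A one-argument lambda makes Keras fall back to schedule(epoch).
lr_callback = tf.keras.callbacks.LearningRateScheduler(lambda epoch: exponential_lr(epoch), verbose=True)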

# plot our custom learning rate per epoch
rng = [i for i in range(EPOCHS)]     
y = [exponential_lr(x) for x in rng]
plt.plot(rng, y)
print(f'Learning rate schedule: \n'
      f'from {y[0]:.3g} \n'
      f'to {max(y):.3g} \n'
      f'and then back to {y[-1]:.3g}')
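To see the numbers behind the plot, here is a minimal sketch that recomputes the same ramp-up/decay formula in plain Python with the default parameters above. It uses its own names (`schedule_sketch`, `demo_epochs`) so it does not overwrite the notebook's `exponential_lr` or `EPOCHS`; `demo_epochs = 12` is a hypothetical value.

```python
def schedule_sketch(epoch, start_lr=0.00001, min_lr=0.00001, max_lr=0.00005,
                    rampup_epochs=5, sustain_epochs=0, exp_decay=0.8):
    # Same three phases as exponential_lr: linear ramp-up, plateau, exponential decay
    if epoch < rampup_epochs:
        return (max_lr - start_lr) / rampup_epochs * epoch + start_lr
    if epoch < rampup_epochs + sustain_epochs:
        return max_lr
    return (max_lr - min_lr) * exp_decay ** (epoch - rampup_epochs - sustain_epochs) + min_lr

demo_epochs = 12  # hypothetical; the notebook uses its own EPOCHS
schedule = [schedule_sketch(epoch) for epoch in range(demo_epochs)]
for epoch, lr in enumerate(schedule):
    print(f'epoch {epoch:2d}: {lr:.3e}')

peak_epoch = schedule.index(max(schedule))
print('peak learning rate at epoch', peak_epoch)
```

With `sustain_epochs=0`, the peak lands exactly at epoch `rampup_epochs` (here 5), after which each epoch moves the learning rate 20% closer to `min_lr`.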

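### Callback 2: EarlyStopping
It stops training when the validation loss has not improved by more than `min_delta` for `patience` consecutive epochs. With `restore_best_weights=True`, the weights from the best epoch are restored when training stops early.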

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor='val_loss',
    min_delta=0.001,              # minimum decrease in val_loss that counts as an improvement
    patience=5,                   # number of epochs without improvement before stopping
    restore_best_weights=True,
)
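The rule behind `min_delta` and `patience` can be sketched in a few lines of plain Python. This is a simplified re-implementation for a monitored loss (lower is better), not Keras's actual class, which has extra options such as `baseline`; the validation losses below are hypothetical.

```python
def simulate_early_stopping(val_losses, min_delta=0.001, patience=5):
    """Simplified EarlyStopping rule for monitor='val_loss'.

    Returns (stopped_epoch, best_epoch); stopped_epoch is None if training runs to the end.
    """
    best = float('inf')
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:      # improved by more than min_delta
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1                    # one more epoch without (enough) improvement
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation losses, one per epoch
es_losses = [0.9, 0.7, 0.6, 0.5995, 0.61, 0.6, 0.62, 0.63, 0.5]
stopped_epoch, best_epoch = simulate_early_stopping(es_losses)
print(f'stopped after epoch {stopped_epoch}; best weights from epoch {best_epoch}')
```

Note that epoch 3 (0.5995) does not count as an improvement over 0.6, because the drop is smaller than `min_delta`, so the counter keeps running and the late improvement at epoch 8 is never reached.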

### Callback 3: Checkpoint
- A checkpoint saves the best-performing model weights seen so far (here, those with the lowest validation loss) to a file.
- You can later load these weights into the model from that file instead of training it again.

In [None]:
Xception_checkpoint_filepath = 'Xception.h5'

Xception_checkpoint = tf.keras.callbacks.ModelCheckpoint(
                        filepath=Xception_checkpoint_filepath,
                        save_weights_only=True,
                        monitor='val_loss',
                        mode='min',
                        save_best_only=True
)
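As a rough sketch of what `save_best_only=True` with `mode='min'` does, the function below (a hypothetical helper, not part of Keras) lists the epochs at which the weights file would be overwritten. It assumes any strict decrease in `val_loss` triggers a save, since `ModelCheckpoint` has no `min_delta` threshold; the losses are hypothetical.

```python
def epochs_saved_by_checkpoint(val_losses):
    """Epochs at which a save-best-only checkpoint monitoring val_loss (mode='min')
    would overwrite its file, under the simplified rule described above."""
    best = float('inf')
    saved = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:          # any strict improvement counts; no min_delta here
            best = loss
            saved.append(epoch)
    return saved

ckpt_losses = [0.9, 0.7, 0.6, 0.5995, 0.61, 0.6, 0.62, 0.63]
saved_epochs = epochs_saved_by_checkpoint(ckpt_losses)
print('file overwritten after epochs', saved_epochs, '-> final file holds epoch', saved_epochs[-1])
```

Under this simplified model, the checkpoint file and EarlyStopping's restored weights can come from different epochs: the file keeps epoch 3, while EarlyStopping with `min_delta=0.001` would treat epoch 2 as its best.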

# Step 6: Model_1 - Transfer learning

Now we're ready to create a neural network for classifying images! 
- We'll use what's known as **transfer learning**: take a large pretrained model (the base) and put your own Keras layers on top of it (the head).
- We will use **Xception** as the base because it performs well on this dataset. Run the cell below to see the list of bases available in Keras.
- The distribution strategy we created earlier contains a [context manager](https://docs.python.org/3/reference/compound_stmts.html#with), `strategy.scope`. When using a TPU, it's important to define your model inside a `strategy.scope()` context.

In [None]:
# The list of available pretrained models (bases); public names only, which also include the submodules
', '.join(name for name in dir(tf.keras.applications) if not name.startswith('_'))

In [None]:
with strategy.scope():
    pretrained_model = tf.keras.applications.Xception(
                       weights='imagenet',                        # load weights pretrained on ImageNet
                       include_top=False,                         # drop Xception's own classification layers ('decapitate' the base); we will build our own head
                       input_shape=[*IMAGE_SIZE, 3]               # '*' unpacks the IMAGE_SIZE tuple, passing its two elements as separate values
    )
    pretrained_model.trainable = False                            # freeze the base so only the new head is trained (transfer learning)
    
    Xception = tf.keras.Sequential([                              # Here is our eventual model:
        pretrained_model,                                         # the pretrained base
        tf.keras.layers.GlobalAveragePooling2D(),                 # new head: average each feature map produced by the base down to one value per channel, giving a single feature vector per image, which is just what a classifier needs
        tf.keras.layers.Dropout(0.3),                             # regularization
        tf.keras.layers.Dense(                                    # output layer:
                            len(CLASSES),                         # one neuron per class
                            activation='softmax')                 # softmax turns the outputs into class probabilities for multi-class classification
    ])
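To make the head concrete, here is a small NumPy sketch of what `GlobalAveragePooling2D` computes with the default channels-last layout, plus the arithmetic for the trainable parameters of the new `Dense` layer. The feature-map shape is a hypothetical toy example, and `n_classes_example = 104` is an assumption standing in for `len(CLASSES)`; 2048 is the number of channels Xception's base outputs.

```python
import numpy as np

# Hypothetical feature maps: a batch of 2 images, 4x4 spatial grid, 3 channels
# (Xception's real output has 2048 channels; its spatial size depends on IMAGE_SIZE)
feature_maps = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)

# GlobalAveragePooling2D (channels_last): mean over height and width
pooled = feature_maps.mean(axis=(1, 2))
print(pooled.shape)   # one value per image and channel

def dense_params(n_inputs, n_units):
    # weight matrix + one bias per unit
    return n_inputs * n_units + n_units

n_classes_example = 104   # assumption: the notebook uses len(CLASSES)
print('trainable parameters in the head:', dense_params(2048, n_classes_example))
```

Pooling and dropout have no weights, so with the base frozen, the `Dense` layer's weights and biases are the only parameters updated during training.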

In [None]:
Xception.compile(
    optimizer= 'nadam',                            # Nesterov-accelerated Adaptive Moment Estimation (Nadam), an extension of Adaptive Moment Estimation (Adam)
    loss     = 'sparse_categorical_crossentropy',  # multi-class loss for integer class labels (not one-hot vectors)
    metrics  = ['sparse_categorical_accuracy'],    # accuracy metric for integer class labels
)
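For intuition about the "sparse" loss and metric, this NumPy sketch computes both by hand on hypothetical softmax outputs. It mirrors the definitions (negative log-probability of the true class, averaged over the batch; share of correct argmax predictions) and ignores the small clipping Keras applies to avoid `log(0)`.

```python
import numpy as np

# Hypothetical softmax outputs for 2 images and 3 classes, plus integer labels
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
int_labels = np.array([0, 1])   # "sparse": class indices, not one-hot vectors

# sparse categorical crossentropy: -log(probability assigned to the true class), averaged
loss = -np.log(probs[np.arange(len(int_labels)), int_labels]).mean()

# sparse categorical accuracy: fraction of images whose argmax equals the true index
accuracy = (probs.argmax(axis=-1) == int_labels).mean()
print(f'loss={loss:.4f}, accuracy={accuracy:.2f}')
```

The second image is misclassified (argmax is class 2), and its low probability for the true class 1 dominates the loss.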

In [None]:
Xception_training = Xception.fit(
                    ds_train,
                    validation_data=ds_valid,
                    epochs=EPOCHS,
                    steps_per_epoch=STEPS_PER_EPOCH,
                    callbacks=[lr_callback, early_stopping, Xception_checkpoint]    # Here is where our callbacks go
)

In [None]:
Xception_history_frame = pd.DataFrame(Xception_training.history)
Xception_history_frame.loc[:, ['loss', 'val_loss']].plot()
Xception_history_frame.loc[:, ['sparse_categorical_accuracy', 'val_sparse_categorical_accuracy']].plot()

# Step 7: Model_2 - Custom CNN

In [None]:
with strategy.scope():
    custom_model = keras.Sequential([
        # Preprocessing (data augmentation: random flips, rotations and shifts, active only during training)
        preprocessing.RandomFlip(mode='horizontal'),  # meaning, left-to-right
        preprocessing.RandomFlip(mode='vertical'),    # meaning, top-to-bottom
        preprocessing.RandomRotation(factor=0.20),
        preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),

        # Block One
        layers.BatchNormalization(renorm=True),
        layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same'),
        layers.MaxPool2D(),

        # Block Two
        layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
        layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
        layers.MaxPool2D(),

        # Block Three
        layers.BatchNormalization(renorm=True),
        layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
        layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
        layers.Conv2D(filters=256, kernel_size=3, activation='relu', padding='same'),
        layers.MaxPool2D(),

        # Head
        layers.BatchNormalization(renorm=True),
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.1),
        layers.Dense(len(CLASSES), activation='softmax') 
    ])
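A quick sketch of the shape and parameter arithmetic for the convolutional blocks above. Each `Conv2D` has `kernel_size² × in_channels × filters` weights plus one bias per filter; `padding='same'` keeps the height and width, and each `MaxPool2D` (default `pool_size=2`, `padding='valid'`) roughly halves them. `side = 512` is an assumed image size; the `BatchNormalization` layers add parameters not counted here.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # kernel weights + one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

def maxpool_out(size, pool=2, stride=2):
    # output length of a 'valid' pooling window along one axis
    return (size - pool) // stride + 1

# (in_channels, filters) for every Conv2D in custom_model, block by block
conv_layers = [(3, 64),                              # Block One (RGB input)
               (64, 128), (128, 128),                # Block Two
               (128, 256), (256, 256), (256, 256)]   # Block Three
conv_param_counts = [conv2d_params(3, c_in, c_out) for c_in, c_out in conv_layers]
print(conv_param_counts, 'total:', sum(conv_param_counts))

side = 512   # assumption: IMAGE_SIZE = [512, 512]
for _ in range(3):   # three MaxPool2D layers
    side = maxpool_out(side)
print(f'feature maps before GlobalAveragePooling2D: {side}x{side}x256')
```

Most of the weights sit in Block Three, where 256-channel inputs meet 256 filters.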


In [None]:
custom_model.compile(
    optimizer='nadam',
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'],
)

In [None]:
# Checkpoint for the second model
custom_model_checkpoint_filepath = 'custom_model.h5'

custom_model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
                        filepath=custom_model_checkpoint_filepath,
                        save_weights_only=True,
                        monitor='val_loss',
                        mode='min',
                        save_best_only=True
)

In [None]:
custom_model_training = custom_model.fit(
                ds_train,
                validation_data=ds_valid,
                epochs=EPOCHS,
                steps_per_epoch=STEPS_PER_EPOCH,
                callbacks=[lr_callback, early_stopping, custom_model_checkpoint]
)

In [None]:
history_frame = pd.DataFrame(custom_model_training.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['sparse_categorical_accuracy', 'val_sparse_categorical_accuracy']].plot()

# Step 8: Visual Validation
- We will apply our **display_batch_of_images()** function to display the flowers together with their predicted and true classes.
- **Visual validation** can help reveal patterns in the images the model struggles with.

In [None]:
# Visual validation
dataset = get_validation_dataset()
dataset = dataset.unbatch().batch(20)     # Display 20 images at a time. Feel free to change this number
batch = iter(dataset)

In [None]:
# Xception
# Run the cell again to see another set
images, labels = next(batch)
probabilities = Xception.predict(images)
predictions = np.argmax(probabilities, axis=-1)
display_batch_of_images((images, labels), predictions)

In [None]:
# custom_model
# Run the cell again to see another set
images, labels = next(batch)
probabilities = custom_model.predict(images)
predictions = np.argmax(probabilities, axis=-1)
display_batch_of_images((images, labels), predictions)
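Alongside the pictures, it can help to list which positions in a batch were misclassified. A minimal NumPy sketch with hypothetical labels and predictions; in the notebook, `labels` is a tensor, so something like `labels.numpy()` would be needed to get a NumPy array (an assumption about eager execution).

```python
import numpy as np

# Hypothetical true labels and model predictions for one batch of 8 images
batch_labels = np.array([3, 7, 7, 12, 0, 5, 3, 9])
batch_predictions = np.array([3, 7, 2, 12, 0, 3, 3, 1])

wrong = np.flatnonzero(batch_predictions != batch_labels)   # positions of the mistakes
pairs = list(zip(batch_labels[wrong].tolist(), batch_predictions[wrong].tolist()))
print('misclassified positions:', wrong.tolist())
print('(true, predicted) pairs:', pairs)
print(f'batch accuracy: {1 - len(wrong) / len(batch_labels):.3f}')
```

Recurring (true, predicted) pairs across several batches point at classes the model confuses.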

# Step 9: Ensemble
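The ensemble cell picks the blending weight by macro-averaged F1, which scores each class separately and then takes the unweighted mean, so rare flower classes count as much as common ones. A from-scratch NumPy sketch of that metric (the helper name `macro_f1` and the data are hypothetical; a class absent from both labels and predictions is scored 0 in this sketch):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), then the unweighted mean over classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

# Hypothetical labels/predictions: per-class F1 = 0.5, 0.8, 2/3
print(macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3))
```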

In [None]:
cmdataset = get_validation_dataset(ordered=True)                               # since we are splitting the dataset and iterating separately on images and labels, order matters.
images_ds = cmdataset.map(lambda image, label: image)                          # makes a data set of images only
labels_ds = cmdataset.map(lambda image, label: label).unbatch()                # makes a data set of labels only

cm_correct_labels = next(iter(labels_ds.batch(NUM_VALIDATION_IMAGES))).numpy() # gets everything as one batch in np.array format.

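# Optional: load the best checkpointed weights from a file instead of using the weights left in memory after training
# Xception.load_weights('/kaggle/input/petals-to-the-metal/Xception.h5')
# custom_model.load_weights('/kaggle/working/custom_model.h5')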

predictions_1 = Xception.predict(images_ds)
predictions_2 = custom_model.predict(images_ds)

alphas = np.linspace(0, 1, 100)                                              # 100 evenly spaced candidate weights from 0 to 1 (both included, so the step is 1/99)
scores = []                                                                  # create a list to store F1 scores for different alpha values
for alpha in alphas:
    cm_probabilities = alpha * predictions_1 + (1 - alpha) * predictions_2   # Combine predictions from two models using the alpha weight
    cm_predictions = np.argmax(cm_probabilities, axis=-1)                    # For each example, select the class with the highest probability as the predicted class
    scores.append(f1_score(cm_correct_labels,                                # Calculate the F1 score between the correct labels
                           cm_predictions,                                   # and the combined predictions.
                           labels=range(len(CLASSES)),                       # It computes the F1 score for each class
                           average='macro'))                                 # and returns the macro-average.

best_alpha = alphas[np.argmax(scores)]                                       # the alpha that produced the highest macro-F1 score

cm_predictions = np.argmax(best_alpha * predictions_1 + (1 - best_alpha) * predictions_2, axis=-1)   # ensemble predictions at the best alpha
print("Correct labels: ",   cm_correct_labels.shape, cm_correct_labels)
print("Predicted labels: ", cm_predictions.shape,    cm_predictions)

plt.plot(alphas, scores)
plt.xlabel('alpha (weight of Xception)')
plt.ylabel('macro F1 score')
print (f'best_alpha: {best_alpha}')
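Two details of the alpha search, sketched in NumPy with hypothetical probabilities: blending can flip a prediction depending on the weight, and `np.linspace(0, 1, 100)` includes both endpoints, so neighbouring candidates are 1/99 apart. Index `i` therefore corresponds to `alpha = i / 99`, which is why looking the value up in the grid is safer than dividing the index by 100.

```python
import numpy as np

# Hypothetical softmax outputs of the two models for one image with 2 classes
p_xception = np.array([0.6, 0.4])
p_custom   = np.array([0.1, 0.9])

for weight in (0.7, 0.9):
    blended = weight * p_xception + (1 - weight) * p_custom
    print(weight, blended.round(2), '-> class', int(np.argmax(blended)))

alphas_demo = np.linspace(0, 1, 100)
print(alphas_demo[35], 35 / 99, 35 / 100)   # grid value vs. i/99 vs. i/100
```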

# Step 10: Make Test Predictions

In [None]:
test_ds = get_test_dataset(ordered=True)

print('Computing predictions...')
test_images_ds = test_ds.map(lambda image, idnum: image)

m1 = Xception.predict(test_images_ds)
m2 = custom_model.predict(test_images_ds)

probabilities = best_alpha * m1 + (1 - best_alpha) * m2   # weighted average of the two models' class probabilities
predictions = np.argmax(probabilities, axis=-1)    # Find the class with the highest probability for each image:
                                                   # 'probabilities' is a 2D array of shape (number of test images, number of classes);
                                                   # each row holds one image's probability for every class.
                                                   # argmax along the last axis returns the index of the largest value in each row, i.e. the predicted class
print(predictions)
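A small NumPy sketch of the shapes involved, with hypothetical model outputs: a weighted average of two softmax outputs is still a probability distribution per row (the weights sum to 1), and `argmax` over the last axis collapses each row to one class index. The `softmax` helper and all data here are illustrative assumptions.

```python
import numpy as np

demo_rng = np.random.default_rng(0)

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical outputs of two models: 5 test images x 4 classes
m1_demo = softmax(demo_rng.normal(size=(5, 4)))
m2_demo = softmax(demo_rng.normal(size=(5, 4)))

alpha_demo = 0.4   # hypothetical blend weight
blended = alpha_demo * m1_demo + (1 - alpha_demo) * m2_demo
labels_demo = np.argmax(blended, axis=-1)

print(blended.shape, '->', labels_demo.shape)     # (5, 4) -> (5,)
print(np.allclose(blended.sum(axis=-1), 1.0))     # True: rows still sum to 1
```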

In [None]:
print('Generating submission.csv file...')

test_ids_ds = test_ds.map(lambda image, idnum: idnum).unbatch()                # Get image ids from the test set
test_ids = next(iter(test_ids_ds.batch(NUM_TEST_IMAGES))).numpy().astype('U')  # gather all ids in one batch and convert the byte strings to Unicode strings
submission_df = pd.DataFrame({'id': test_ids, 'label': predictions})
submission_df.to_csv('submission.csv', index=False)


# Look at the first few predictions
!head submission.csv
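For reference, a minimal sketch of the layout the cell above writes (a header `id,label`, then one row per test image), built with Python's `csv` module so the format is explicit. The ids and labels are hypothetical; the `.decode('utf-8')` step stands in for converting tensor byte strings to text.

```python
import csv
import io

# Hypothetical image ids as byte strings (as a tf.string tensor yields them) and predicted classes
raw_ids = [b'252d840db', b'1c4736dea', b'c37a485ce']
predicted = [67, 28, 83]

ids = [raw.decode('utf-8') for raw in raw_ids]   # bytes -> str

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['id', 'label'])
writer.writerows(zip(ids, predicted))
print(buffer.getvalue())
```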

# Step 11: Make a submission 

If you haven't already, create your own editable copy of this notebook by clicking on the **Copy and Edit** button in the top right corner. Then, submit to the competition by following these steps:

1. Begin by clicking on the blue **Save Version** button in the top right corner of the window.  This will generate a pop-up window.  
2. Ensure that the **Save and Run All** option is selected, and then click on the blue **Save** button.
3. This generates a window in the bottom left corner of the notebook.  After it has finished running, click on the number to the right of the **Save Version** button.  This pulls up a list of versions on the right of the screen.  Click on the ellipsis **(...)** to the right of the most recent version, and select **Open in Viewer**.  This brings you into view mode of the same page. You will need to scroll down to get back to these instructions.
4. Click on the **Output** tab on the right of the screen.  Then, click on the file you would like to submit, and click on the blue **Submit** button to submit your results to the leaderboard.

# Step 12: Upvote the notebook

I hope you enjoyed this journey, and as promised, it was indeed a pleasure. <br> If you've read this far, please consider upvoting the notebook. Comments are welcome.