## TPU (Tensor Processing Unit)

TPUs are hardware accelerators specialized for deep learning tasks. Their extra compute can speed up training, typically by allowing a larger training batch size.

TPUs pair a classic vector processor with a dedicated matrix multiply unit and excel at any task where large matrix multiplications dominate, such as neural networks.

### The hardware
#### MXU and VPU

A TPU v2 core is made of a Matrix Multiply Unit (MXU) which runs matrix multiplications and a Vector Processing Unit (VPU) for all other tasks such as activations, softmax, etc. The VPU handles float32 and int32 computations. The MXU on the other hand operates in a mixed precision 16-32 bit floating point format.

### Mixed precision floating point and bfloat16

The MXU computes matrix multiplications using bfloat16 inputs and float32 outputs. Intermediate accumulations are performed in float32 precision. Google introduced the bfloat16 format in TPUs. bfloat16 is a truncated float32: it keeps the same 8 exponent bits, and therefore the same range, as float32, but only 7 mantissa bits. Combined with the fact that TPUs compute matrix multiplications in mixed precision, with bfloat16 inputs but float32 outputs, this means that typically no code changes are necessary to benefit from the performance gains of reduced precision.
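
To make the "truncated float32" idea concrete, here is a minimal NumPy sketch that emulates bfloat16 by keeping only the top 16 bits of each float32 value. The helper name `truncate_to_bfloat16` is made up for this illustration, and plain truncation is a simplifying assumption: real conversions may round instead of truncating.

```python
import numpy as np

def truncate_to_bfloat16(values):
    """Emulate bfloat16 by zeroing the low 16 bits of float32 values.

    Hypothetical helper for illustration only; it truncates, whereas
    real conversions may round instead.
    """
    as_float32 = np.atleast_1d(np.asarray(values, dtype=np.float32))
    bits = as_float32.view(np.uint32)
    # Keep the sign bit, the 8 exponent bits and the top 7 mantissa bits.
    kept = bits & np.uint32(0xFFFF0000)
    return kept.view(np.float32)

x = np.array([1.0, 3.14159265, 1e38, 1e-30], dtype=np.float32)
y = truncate_to_bfloat16(x)
print(x)
print(y)
```

Very large and very small values survive because the exponent bits are untouched; only the number of significant digits drops.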

### The software

#### Large batch size training

The ideal batch size for TPUs is 128 data items per TPU core but the hardware can already show good utilization from 8 data items per TPU core. Remember that one Cloud TPU has 8 cores.

In this notebook, we will be using the Keras API. In Keras, the batch size you specify is the global batch size for the entire TPU. Our batches will automatically be split into 8 and run on the 8 cores of the TPU.
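
The relationship between the global batch size and the per-core batch size is simple arithmetic. The sketch below assumes one 8-core TPU and a global batch that divides evenly across cores; `per_core_batch_size` is a hypothetical helper, not part of Keras.

```python
NUM_TPU_CORES = 8  # assumption: one Cloud TPU with 8 cores

def per_core_batch_size(global_batch_size, num_cores=NUM_TPU_CORES):
    """Return how many items each core receives from one global batch."""
    if global_batch_size % num_cores != 0:
        # Simplifying assumption for this sketch: batches divide evenly.
        raise ValueError("global batch size should be divisible by the number of cores")
    return global_batch_size // num_cores

ideal_global = 128 * NUM_TPU_CORES    # 128 items per core
minimum_global = 8 * NUM_TPU_CORES    # 8 items per core
print(ideal_global, per_core_batch_size(ideal_global))
print(minimum_global, per_core_batch_size(minimum_global))
```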

## Imports

In [None]:
from kaggle_datasets import KaggleDatasets
import numpy as np
import tensorflow as tf
import re
from tensorflow import keras

print(f'TensorFlow version: {tf.__version__}')

## Distribution Strategy

In [None]:
# TPU detection
try:
    # TPU detection. No parameters necessary if TPU_NAME environment variable is set.
    # On Kaggle this is always the case.
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

# TPUStrategy for distributing training
if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else: # default strategy that works on CPU and single GPU
    strategy = tf.distribute.get_strategy()

print('Replicas ',strategy.num_replicas_in_sync)

In this code snippet:

- **TPUClusterResolver()** finds the TPU on the network.
- **TPUStrategy** is the part that implements the distribution and the 'All-Reduce' gradient synchronization algorithm.
- The strategy is applied through a scope. The model must be defined within `strategy.scope()`.
- The **model.fit** function expects a tf.data.Dataset object as input for TPU training.
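
To give an intuition for what 'All-Reduce' gradient synchronization achieves, here is a small NumPy simulation. It is only a sketch under stated assumptions (a linear model with a mean-squared-error loss, 8 simulated replicas, equal-sized shards) and does not reflect how TPUStrategy is implemented internally. Each simulated replica computes a gradient on its shard of the global batch, and averaging those gradients gives every replica the same result as computing the gradient on the whole batch.

```python
import numpy as np

NUM_REPLICAS = 8  # assumption: mirrors the 8 cores of one TPU
rng = np.random.default_rng(0)

# A toy global batch: 128 examples with 4 features each.
features = rng.normal(size=(128, 4))
targets = rng.normal(size=128)
weights = rng.normal(size=4)

def mse_gradient(x, y, w):
    """Gradient of mean((x @ w - y) ** 2) with respect to w."""
    return 2.0 * x.T @ (x @ w - y) / len(y)

# Each replica receives an equal shard of the global batch.
feature_shards = np.split(features, NUM_REPLICAS)
target_shards = np.split(targets, NUM_REPLICAS)
local_grads = [mse_gradient(x, y, weights)
               for x, y in zip(feature_shards, target_shards)]

# Simulated all-reduce: sum the local gradients, then average.
synced_grad = np.sum(local_grads, axis=0) / NUM_REPLICAS

full_batch_grad = mse_gradient(features, targets, weights)
print(np.allclose(synced_grad, full_batch_grad))
```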

In [None]:
AUTO = tf.data.experimental.AUTOTUNE

# CONFIGURATIONS
IMAGE_SIZE =  [192, 192]
EPOCHS = 20
FOLDS = 3
BATCH_SIZE = 16 * strategy.num_replicas_in_sync

## Loading Data

**Get GCS Path**
* When used with TPUs, datasets must be stored in a [Google Cloud Storage bucket](https://cloud.google.com/storage).

We can use data from any public GCS bucket by giving its path just like we would data from '/kaggle/input'. The following will retrieve the GCS path for this competition's dataset.

# Data Directories

In [None]:
GCS_DS_PATH = KaggleDatasets().get_gcs_path()
print(GCS_DS_PATH)

GCS_PATH_SELECT = { # Image Sizes
    192: GCS_DS_PATH + '/tfrecords-jpeg-192x192',
    224: GCS_DS_PATH + '/tfrecords-jpeg-224x224',
    331: GCS_DS_PATH + '/tfrecords-jpeg-331x331',
    512: GCS_DS_PATH + '/tfrecords-jpeg-512x512'
}

GCS_PATH = GCS_PATH_SELECT[IMAGE_SIZE[0]]
print(GCS_PATH)

TRAINING_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/train/*.tfrec')
VALIDATION_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/val/*.tfrec')
TEST_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/test/*.tfrec') # predictions on this dataset should be submitted for the competition

# CLASSES

In [None]:
# 0 to 103
CLASSES = [
    "pink primrose",
    "hard-leaved pocket orchid",
    "canterbury bells",
    "sweet pea",
    "wild geranium",
    "tiger lily",
    "moon orchid",
    "bird of paradise",
    "monkshood",
    "globe thistle",
    "snapdragon",
    "colt's foot",
    "king protea",
    "spear thistle",
    "yellow iris",
    "globe-flower",
    "purple coneflower",
    "peruvian lily",
    "balloon flower",
    "giant white arum lily",
    "fire lily",
    "pincushion flower",
    "fritillary",
    "red ginger",
    "grape hyacinth",
    "corn poppy",
    "prince of wales feathers",
    "stemless gentian",
    "artichoke",
    "sweet william",
    "carnation",
    "garden phlox",
    "love in the mist",
    "cosmos",
    "alpine sea holly",
    "ruby-lipped cattleya",
    "cape flower",
    "great masterwort",
    "siam tulip",
    "lenten rose",
    "barberton daisy",
    "daffodil",
    "sword lily",
    "poinsettia",
    "bolero deep blue",
    "wallflower",
    "marigold",
    "buttercup",
    "daisy",
    "common dandelion",
    "petunia",
    "wild pansy",
    "primula",
    "sunflower",
    "lilac hibiscus",
    "bishop of llandaff",
    "gaura",
    "geranium",
    "orange dahlia",
    "pink-yellow dahlia",
    "cautleya spicata",
    "japanese anemone",
    "black-eyed susan",
    "silverbush",
    "californian poppy",
    "osteospermum",
    "spring crocus",
    "iris",
    "windflower",
    "tree poppy",
    "gazania",
    "azalea",
    "water lily",
    "rose",
    "thorn apple",
    "morning glory",
    "passion flower",
    "lotus",
    "toad lily",
    "anthurium",
    "frangipani",
    "clematis",
    "hibiscus",
    "columbine",
    "desert-rose",
    "tree mallow",
    "magnolia",
    "cyclamen",
    "watercress",
    "canna lily",
    "hippeastrum",
    "bee balm",
    "pink quill",
    "foxglove",
    "bougainvillea",
    "camellia",
    "mallow",
    "mexican petunia",
    "bromelia",
    "blanket flower",
    "trumpet creeper",
    "blackberry lily",
    "common tulip",
    "wild rose",
]

# DATASET FUNCTIONS

In [None]:
def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels = 3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.reshape(image, [*IMAGE_SIZE, 3])
    return image

In [None]:
def read_labeled_tfrecord(example):
    LABELED_TFREC_FORMAT = {
        'image':tf.io.FixedLenFeature([], tf.string),
        'class':tf.io.FixedLenFeature([], tf.int64),
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = tf.cast(example['class'], tf.int32)
    return image, label

In [None]:
def read_unlabeled_tfrecord(test_example):
    UNLABELED_TFREC_FORMAT = {
        'image':tf.io.FixedLenFeature([], tf.string),
        'id':tf.io.FixedLenFeature([], tf.string),
    }
    example = tf.io.parse_single_example(test_example,UNLABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    idnum = example['id']
    return image, idnum

In [None]:
def data_augment(image, label):
    # data augmentation. Thanks to the dataset.prefetch(AUTO) statement in the next function (below),
    # this happens essentially for free on TPU. Data pipeline code is executed on the "CPU" part
    # of the TPU while the TPU itself is computing gradients.
    image = tf.image.random_flip_left_right(image)
    return image, label 

In [None]:
def load_dataset(filenames, labeled = True, ordered = False):
    ignore_order = tf.data.Options()
    if not ordered:
        # disable order, increase speed
        ignore_order.experimental_deterministic = False
    # automatically interleaves reads from multiple files
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads = AUTO)
    # apply the ordering option before parsing the records
    dataset = dataset.with_options(ignore_order)
    # parse each record into (image, label) or (image, id) pairs
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord, num_parallel_calls = AUTO)
    return dataset

In [None]:
def get_training_dataset():
    dataset = load_dataset(TRAINING_FILENAMES, labeled = True, ordered = False)
    dataset = dataset.map(data_augment, num_parallel_calls = AUTO)
    dataset = dataset.repeat()
    dataset = dataset.shuffle(2048)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO) # prefetch next batch while training (autotune prefetch buffer size)
    return dataset

In [None]:
def get_validation_dataset():
    dataset = load_dataset(VALIDATION_FILENAMES, labeled = True, ordered = False)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()
    dataset = dataset.prefetch(AUTO) # prefetch next batch while training (autotune prefetch buffer size)
    return dataset

In [None]:
def get_test_dataset():
    dataset = load_dataset(TEST_FILENAMES, labeled = False, ordered = True)
    dataset = dataset.batch(BATCH_SIZE)
    # prefetch next batch while the current one is being predicted (autotune prefetch buffer size)
    dataset = dataset.prefetch(AUTO)
    return dataset

In [None]:
def count_data_items(filenames):
    # the number of data items is written in the name of the .tfrec files, 
    # i.e. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) for filename in filenames]
    return np.sum(n)
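
As a quick illustration of the filename convention used by `count_data_items`, the sketch below applies the same regular expression idea to a few made-up filenames. The paths are hypothetical; only the `-<count>.tfrec` suffix pattern is taken from the comment above.

```python
import re

# Matches the item count between the last '-' and the file extension.
COUNT_PATTERN = re.compile(r"-([0-9]+)\.")

def count_items(filenames):
    """Sum the item counts encoded in .tfrec filenames."""
    return sum(int(COUNT_PATTERN.search(name).group(1)) for name in filenames)

# Hypothetical filenames following the 'name-<count>.tfrec' convention.
example_files = [
    "gs://example-bucket/train/flowers00-230.tfrec",
    "gs://example-bucket/train/flowers01-230.tfrec",
    "gs://example-bucket/train/flowers02-115.tfrec",
]
total = count_items(example_files)
print(total)
```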

In [None]:
NUM_TRAINING_IMAGES = count_data_items(TRAINING_FILENAMES)
NUM_VALIDATION_IMAGES = count_data_items(VALIDATION_FILENAMES)
NUM_TEST_IMAGES = count_data_items(TEST_FILENAMES)
STEPS_PER_EPOCH = NUM_TRAINING_IMAGES // BATCH_SIZE
print('Dataset: {} training images, {} validation images, {} unlabeled test images'.format(NUM_TRAINING_IMAGES, NUM_VALIDATION_IMAGES, NUM_TEST_IMAGES))

In [None]:
training_dataset = get_training_dataset()
validation_dataset = get_validation_dataset()

# Visualizing dataset

In [None]:
# Take the first batch (BATCH_SIZE images; 128 on an 8-core TPU)
for img, label in training_dataset.take(1):
    # Keep the first 16 images and their labels
    data = [img[0:16,:,:,:].numpy(), label[0:16].numpy()]

In [None]:
data[0].shape, data[1].shape

In [None]:
import matplotlib.pyplot as plt

rows = 4
cols = 4
fig = plt.figure(figsize = (18, 10))
for index in range(1, rows * cols + 1):
    ax = fig.add_subplot(rows, cols, index)
    img = data[0][index - 1]
    label = data[1][index - 1]
    ax.axis('off')
    # draw on this subplot explicitly rather than on pyplot's current axes
    ax.imshow(img)
    ax.set_title(CLASSES[label])
plt.show()

# Custom Learning Rate Scheduler

In [None]:
# Learning rate schedule for TPU, GPU and CPU.
# Using an LR ramp up because fine-tuning a pre-trained model.
# Starting with a high LR would break the pre-trained weights.
LR_START = 0.00001
LR_MAX = 0.00005 * strategy.num_replicas_in_sync
LR_MIN = 0.00001
LR_RAMPUP_EPOCHS = 5
LR_SUSTAIN_EPOCHS = 0
LR_EXP_DECAY = 0.8

#  (0, LR_START) <-> (LR_RAMPUP_EPOCHS, LR_MAX)

def lrfun(epoch):
    if epoch < LR_RAMPUP_EPOCHS:
        # Equation of a straight line
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS*epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        lr = LR_MAX
    else:
        lr = (LR_MAX - LR_MIN)*LR_EXP_DECAY**(epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS) + LR_MIN
    return lr
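
To see what this schedule produces, the sketch below re-implements the same piecewise formula as a standalone function and evaluates it at a few epochs. It assumes 8 replicas (one 8-core TPU), which gives `LR_MAX = 0.0004`; on CPU or a single GPU the peak would be lower. The name `lr_at` is made up for this example.

```python
NUM_REPLICAS = 8  # assumption: strategy.num_replicas_in_sync on one TPU

LR_START = 0.00001
LR_MAX = 0.00005 * NUM_REPLICAS
LR_MIN = 0.00001
LR_RAMPUP_EPOCHS = 5
LR_SUSTAIN_EPOCHS = 0
LR_EXP_DECAY = 0.8

def lr_at(epoch):
    """Linear ramp-up, optional plateau, then exponential decay towards LR_MIN."""
    if epoch < LR_RAMPUP_EPOCHS:
        return (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    if epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        return LR_MAX
    decay_steps = epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS
    return (LR_MAX - LR_MIN) * LR_EXP_DECAY ** decay_steps + LR_MIN

for epoch in (0, 1, 5, 6, 19):
    print(epoch, f"{lr_at(epoch):.3g}")
```

The rate starts at `LR_START`, reaches `LR_MAX` at epoch 5, and then shrinks by a factor of 0.8 per epoch towards `LR_MIN`.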

In [None]:
lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfun, verbose = True)

rng = list(range(max(25, EPOCHS)))
y = [lrfun(x) for x in rng]
plt.plot(rng, y)
print("Learning rate schedule: {:.3g} to {:.3g} to {:.3g}".format(y[0], max(y), y[-1]))

### Using Pretrained Model VGG16

In [None]:
# with strategy.scope():
#     pretrained_model = tf.keras.applications.VGG16(weights = 'imagenet', include_top = False, input_shape = [*IMAGE_SIZE, 3])
#     pretrained_model.trainable = False
    
#     model = tf.keras.Sequential([
#         pretrained_model,
#         tf.keras.layers.GlobalAveragePooling2D(),
#         tf.keras.layers.Dense(len(CLASSES), activation = 'softmax', dtype = 'float32')
#     ])
    
# model.compile(
#     optimizer = 'adam',
#     loss = 'sparse_categorical_crossentropy',
#     metrics = ['sparse_categorical_accuracy'])

# historical = model.fit(
#     training_dataset,
#     steps_per_epoch = STEPS_PER_EPOCH,
#     epochs = EPOCHS,
#     callbacks = [lr_callback],
#     validation_data = validation_dataset)

### Using Pretrained Model VGG19

In [None]:
# with strategy.scope():
#     pretrained_model = tf.keras.applications.VGG19(weights = 'imagenet', include_top = False, input_shape = [*IMAGE_SIZE, 3])
#     pretrained_model.trainable = False
    
#     model = tf.keras.Sequential([
#         pretrained_model,
#         tf.keras.layers.GlobalAveragePooling2D(),
#         tf.keras.layers.Dense(len(CLASSES), activation = 'softmax', dtype = 'float32')
#     ])
    
# model.compile(
#     optimizer = 'adam',
#     loss = 'sparse_categorical_crossentropy',
#     metrics = ['sparse_categorical_accuracy'])

# historical = model.fit(
#     training_dataset,
#     steps_per_epoch = STEPS_PER_EPOCH,
#     epochs = EPOCHS,
#     callbacks = [lr_callback],
#     validation_data = validation_dataset)

### Using Pretrained Model DenseNet201

In [None]:
# with strategy.scope():
#     pretrained_model = tf.keras.applications.DenseNet201(weights = 'imagenet', include_top = False, input_shape = [*IMAGE_SIZE, 3])
#     pretrained_model.trainable = False
    
#     model = tf.keras.Sequential([
#         pretrained_model,
#         tf.keras.layers.GlobalAveragePooling2D(),
#         tf.keras.layers.Dense(len(CLASSES), activation = 'softmax', dtype = 'float32')
#     ])
    
# model.compile(
#     optimizer = 'adam',
#     loss = 'sparse_categorical_crossentropy',
#     metrics = ['sparse_categorical_accuracy'])

# historical = model.fit(
#     training_dataset,
#     steps_per_epoch = STEPS_PER_EPOCH,
#     epochs = EPOCHS,
#     callbacks = [lr_callback],
#     validation_data = validation_dataset)

### Using Trainable DenseNet201

In [None]:
with strategy.scope():
    pretrained_model = tf.keras.applications.DenseNet201(weights = 'imagenet', include_top = False, input_shape = [*IMAGE_SIZE, 3])
    pretrained_model.trainable = True
    
    model = tf.keras.Sequential([
        pretrained_model,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(len(CLASSES), activation = 'softmax', dtype = 'float32')
    ])
    
model.compile(
    optimizer = 'adam',
    loss = 'sparse_categorical_crossentropy',
    metrics = ['sparse_categorical_accuracy'])

historical = model.fit(
    training_dataset,
    steps_per_epoch = STEPS_PER_EPOCH,
    epochs = EPOCHS,
    callbacks = [lr_callback],
    validation_data = validation_dataset)

# Plotting the results

In [None]:
training_loss = historical.history['loss']
training_sparse_categorical_accuracy = historical.history['sparse_categorical_accuracy']

validation_loss = historical.history['val_loss']
validation_sparse_categorical_accuracy = historical.history['val_sparse_categorical_accuracy']

epochs = np.arange(EPOCHS)

In [None]:
fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize = (14, 5))

# left panel: loss curves
ax_loss.plot(epochs, training_loss, label = 'Training Loss')
ax_loss.plot(epochs, validation_loss, label = 'Validation Loss')
ax_loss.set_xlabel('Epochs')
ax_loss.legend()

# right panel: accuracy curves
ax_acc.plot(epochs, training_sparse_categorical_accuracy, label = 'Training Accuracy')
ax_acc.plot(epochs, validation_sparse_categorical_accuracy, label = 'Validation Accuracy')
ax_acc.set_xlabel('Epochs')
ax_acc.legend()

plt.show()

## Confusion Matrix and Validation Score

In [None]:
# TODO
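
One possible way to fill in this step is sketched below with plain NumPy. It assumes the true and predicted class indices for the validation set have already been collected into two integer arrays; the toy arrays and the `confusion_matrix` helper are hypothetical stand-ins, not output from this notebook's model.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# Toy data standing in for validation labels and model predictions.
true_labels = np.array([0, 1, 2, 2, 1])
predicted_labels = np.array([0, 2, 2, 2, 1])

cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

For the flower dataset, `num_classes` would be `len(CLASSES)`, and the diagonal of the matrix holds the correctly classified counts per class.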

# Computing Predictions on the test set

In [None]:
test_dataset = get_test_dataset()
test_images = test_dataset.map(lambda image, idnum: image)
prob = model.predict(test_images)
pred = np.argmax(prob, axis = -1)
print(pred)

In [None]:
test_ids_ds = test_dataset.map(lambda image, idnum: idnum).unbatch()
test_ids = next(iter(test_ids_ds.batch(NUM_TEST_IMAGES))).numpy().astype('U') # all in one batch
np.savetxt('submission.csv', np.rec.fromarrays([test_ids, pred]), fmt=['%s', '%d'], delimiter=',', header='id,label', comments='')