# Introduction

The challenge of [**Petals to the Metal - Flower Classification on TPU**](https://www.kaggle.com/c/tpu-getting-started/) is to build a machine learning model that classifies 104 types of flowers in a test dataset on Kaggle.com.  I learned a lot while exploring different techniques to build, train and tune two convolutional neural networks.  I wrote a report covering the purpose, scope, methodology, results and conclusions of this study, as well as the lessons learned from participating in this code competition.  Please take a look at my [**report**](https://saukha.weebly.com/uploads/1/1/8/0/118025626/flower_classification_cnn_-_sau_kha_fall_2020_rev.3.pdf), which is included in [**my portfolio**](https://saukha.weebly.com/portfolio.html), and leave me some comments below.

#### First Notebook  
After briefly exploring different pretrained models and different data dimensions, I built my first model with a pretrained DenseNet201 on 512x512 data.  I used external data, data augmentation and an L2 kernel regularizer to overcome overfitting. I also learned that adding a dropout layer, and where it is placed, can make a difference.  (See [my first notebook](https://www.kaggle.com/saukha/petals-to-the-metals-flower-classification) on Kaggle.)   

#### Second Notebook  
In this notebook, I built on EfficientNetB7 as the pretrained base model and tried different amounts of external data along with other techniques.  I used various random data augmentation techniques and compared the results with having them turned off, repositioned the dropout layer and compared results for dropout rates ranging from 0 to 0.5, added a BatchNorm layer, tried different L2 regularizer factors for the kernel of the dense layer, and lastly, tried "noisy-student" versus "imagenet" for the initial weights.  So far, my best score is 0.95403, obtained with just a little external data, no augmentation, and the following settings:  
* initialize with "imagenet"  
* 512x512 image size  
* no augmentation  
* not a lot of external data: just the 941 images from tf-flower-photo-tfrec/oxford_102_no_test/tfrecords-jpeg-512x512/ 
* added a dropout layer on top of the EfficientNetB7 base, with 0.4 drop rate  
* followed by a GlobalAveragePooling2D layer and a batch normalization layer   
* and then the final dense layer, with a factor 0.035 for the L2 regularizer and "softmax" for activation  
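
To make the last setting concrete: a Keras L2 kernel regularizer adds the factor times the sum of the squared kernel weights to the training loss. The sketch below reproduces that arithmetic in plain Python with the 0.035 factor from the list. The tiny 2x2 kernel and the helper name `l2_penalty` are made up for illustration and are not taken from the trained model.

```python
# Illustrative only: a hypothetical 2x2 dense kernel, not weights from the notebook.
REG_FACTOR = 0.035  # L2 factor listed above

def l2_penalty(kernel, factor):
    """Return factor * sum of squared weights, the term an L2 regularizer adds to the loss."""
    return factor * sum(w * w for row in kernel for w in row)

kernel = [[0.5, -1.0],
          [2.0,  0.0]]
penalty = l2_penalty(kernel, REG_FACTOR)
print(f"L2 penalty: {penalty:.5f}")
```

A larger factor pushes the dense-layer weights toward zero more strongly, which is the lever that tuning the regularizer adjusts.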

I think EfficientNet rightfully earned the name "Efficient".  Even though the base has seven blocks with a total of 806 layers, plus the few layers I added on top, my model finished training 35/35 epochs in 5379.3 seconds.  The val_loss was still gradually decreasing at that point.  I could have let it train for more epochs to get a better score, at the cost of a longer run time, while staying within 180 minutes.  Or, I could have used the time quota to increase the regularizer factor or the dropout rate just a little.  Anyway, after comparing EfficientNetB7 with DenseNet201, I would choose EfficientNet over DenseNet201.  It offers more flexibility in how many layers to use for the base model: the EfficientNet family has models lined up from B0 through B7.  See [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946), by Mingxing Tan and Quoc V. Le, arxiv.org, last updated 11 September 2020, v5. 
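
As a rough check on how much headroom is left, the timing above can be turned into a per-epoch cost. This is a back-of-the-envelope sketch that assumes every epoch takes the same time and ignores start-up, validation and inference overhead, so the real ceiling would be lower.

```python
import math

TRAIN_SECONDS = 5379.3   # reported time for 35 epochs
EPOCHS_RUN = 35
BUDGET_MINUTES = 180     # run-time budget mentioned above

seconds_per_epoch = TRAIN_SECONDS / EPOCHS_RUN
# largest whole number of epochs that fits the budget under the constant-time assumption
max_epochs = math.floor(BUDGET_MINUTES * 60 / seconds_per_epoch)

print(f"{seconds_per_epoch:.1f} s per epoch, at most {max_epochs} epochs in {BUDGET_MINUTES} min")
```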

#### Model Improvement   
I highly recommend checking out the following to take this code challenge to the next level.  There is a nice tutorial on TensorFlow.org.
* [tfa.image.random_cutout](https://www.tensorflow.org/addons/api_docs/python/tfa/image/random_cutout?hl=en),  
* [tf.image.crop_to_bounding_box](https://www.tensorflow.org/api_docs/python/tf/image/crop_to_bounding_box?version=nightly), and  
* [tf.image.sample_distorted_bounding_box](https://www.tensorflow.org/api_docs/python/tf/image/sample_distorted_bounding_box?version=nightly), [tf.image.draw_bounding_boxes](https://www.tensorflow.org/api_docs/python/tf/image/draw_bounding_boxes), and [tf.slice](https://www.tensorflow.org/api_docs/python/tf/slice).    

**Data Sources:**  
[Petals to the Metal - Flower Classification on TPU](https://www.kaggle.com/c/tpu-getting-started/data) - for this challenge  
[tf_flower_photo_tfrec](https://www.kaggle.com/kirillblinov/tf-flower-photo-tfrec) - external data shared by [Kirill Blinov](https://www.kaggle.com/kirillblinov)    

**References:**  
[A Simple Petals TF 2.2 notebook](https://www.kaggle.com/philculliton/a-simple-petals-tf-2-2-notebook) by [Phil Culliton](https://www.kaggle.com/philculliton)  
[Create Your First Submission](https://www.kaggle.com/ryanholbrook/create-your-first-submission) by [Ryan Holbrook](https://www.kaggle.com/ryanholbrook). 

# Step 1a: Install new libraries

In [None]:
# upgrade pip if needed
#!pip install --upgrade pip  # get the latest version
# install tensorflow-addons
!pip install -q -U tensorflow-addons
# install EfficientNet
!pip install -q efficientnet

# Step 1b: Import Libraries

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed on Kaggle
# It is defined by the [kaggle/python Docker image](https://github.com/kaggle/docker-python)
import re, os
import numpy as np 
import seaborn as sns
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow_addons as tfa
import efficientnet.tfkeras as efn
from tensorflow.keras import regularizers      # mitigate overfitting 
from kaggle_datasets  import KaggleDatasets    # import kaggle data files
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
print("Tensorflow version " + tf.__version__)  # verify that the TensorFlow version is 2.x

# Step 2: Detect Hardware & Distribution Strategy

In [None]:
# Detect hardware, return appropriate distribution strategy
try:
    # TPU detection. No parameters necessary if TPU_NAME environment  
    # variable is set.  On Kaggle this is always the case.
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    # default distribution strategy in Tensorflow. 
    #  Works on CPU and single GPU.
    strategy = tf.distribute.get_strategy() 

print("REPLICAS: ", strategy.num_replicas_in_sync)

# Step 3: Set Input Paths

## Data Directories

In [None]:
# Input Directory
# Data files are available in the read-only "kaggle/input/" directory
# image files are in TFRecords format, each of which contains a sequence
# of records and can only be read sequentially.

IMG_DIM = '512x512'
for dirpath, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        if IMG_DIM in dirpath:  # list only files for the chosen image size
            print(os.path.join(dirpath, filename))

In [None]:
# Working Directory
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved 
# as output when you create a version using "Save & Run All" 
print('list of entries contained in /kaggle/working/:',tf.io.gfile.listdir('/kaggle/working'))   

# Temp Directory  
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of 
# the current session
!gsutil cp /kaggle/input/tpu-getting-started/sample_submission.csv /kaggle/temp/test.csv
print('list of entries contained in /kaggle/temp/:',tf.io.gfile.listdir('/kaggle/temp'))    

In [None]:
# Google Cloud Storage Data Paths
GCS_DS_PATH_0 = KaggleDatasets().get_gcs_path("tpu-getting-started")  # Google Cloud Storage
print(GCS_DS_PATH_0)
print('Entries in the bucket:')
!gsutil ls $GCS_DS_PATH_0 # list items in the bucket 

In [None]:
GCS_DS_PATH_2 = KaggleDatasets().get_gcs_path("tf-flower-photo-tfrec")  # Google Cloud Storage
print(GCS_DS_PATH_2)
print('Entries in the bucket:')
!gsutil ls $GCS_DS_PATH_2  # list items in the bucket 

## Image Files

In [None]:
# current competition data: "tpu-getting-started" 
VAL_FILENAMES_0 = tf.io.gfile.glob(
    GCS_DS_PATH_0 + '/tfrecords-jpeg-512x512/val/*.tfrec') 
TEST_FILENAMES_0 = tf.io.gfile.glob(
    GCS_DS_PATH_0 + '/tfrecords-jpeg-512x512/test/*.tfrec') 
TRAIN_FILENAMES_0 = tf.io.gfile.glob(
    GCS_DS_PATH_0 + '/tfrecords-jpeg-512x512/train/*.tfrec') 

# external data: "tf-flower-photo-tfrec" - duplicates removed
TRAIN_FILENAMES_2A = tf.io.gfile.glob(
    GCS_DS_PATH_2 + '/imagenet_no_test/tfrecords-jpeg-512x512/*.tfrec')  
TRAIN_FILENAMES_2B = tf.io.gfile.glob(
    GCS_DS_PATH_2 + '/inaturalist_no_test/tfrecords-jpeg-512x512/*.tfrec')
TRAIN_FILENAMES_2C = tf.io.gfile.glob(
    GCS_DS_PATH_2 + '/openimage_no_test/tfrecords-jpeg-512x512/*.tfrec')
TRAIN_FILENAMES_2D = tf.io.gfile.glob(
    GCS_DS_PATH_2 + '/tf_flowers_no_test/tfrecords-jpeg-512x512/*.tfrec')
TRAIN_FILENAMES_2E = tf.io.gfile.glob(
    GCS_DS_PATH_2 + '/oxford_102_no_test/tfrecords-jpeg-512x512/*.tfrec') 

# Step 4: Set Parameters

In [None]:
# parameters 
IMAGE_SIZE         = [512, 512] 
HEIGHT             = IMAGE_SIZE[0]
WIDTH              = IMAGE_SIZE[1]
EPOCHS             = 35
BATCH_SIZE         = 16 * strategy.num_replicas_in_sync
DROP_RATE          = 0.4
REG_FACTOR         = 0.035
AUGMENTATION       = False
WEIGHTS            = 'imagenet'  #  or 'noisy-student'
MODEL_NAME         = 'EffNetB7_imagenet_NoAug_XD_DO_BN_v34.h5'

# competition and external data for training
TRAIN_FILENAMES    = TRAIN_FILENAMES_0 + TRAIN_FILENAMES_2E
VAL_FILENAMES      = VAL_FILENAMES_0
TEST_FILENAMES     = TEST_FILENAMES_0

# number of images
def count_data_items(filenames):
    # the number of data items is written in the name of the .tfrec files, 
    # i.e. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) 
         for filename in filenames]
    return np.sum(n)
NUM_TRAIN_IMAGES = count_data_items(TRAIN_FILENAMES)
NUM_VAL_IMAGES   = count_data_items(VAL_FILENAMES)
NUM_TEST_IMAGES  = count_data_items(TEST_FILENAMES)
STEPS_PER_EPOCH  = NUM_TRAIN_IMAGES // BATCH_SIZE
AUTO             = tf.data.experimental.AUTOTUNE

# print total number of images
print('Training Images:  ', NUM_TRAIN_IMAGES)
print('Validation Images:', NUM_VAL_IMAGES)
print('Test Images:      ', NUM_TEST_IMAGES)
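
To see what `count_data_items` and the step arithmetic do without touching the real files, here is a small stand-alone sketch. Apart from the `flowers00-230.tfrec` pattern quoted in the comment above, the file names and the replica count of 8 are assumptions chosen for illustration.

```python
import re

# Same pattern as count_data_items: the count is the run of digits
# between a '-' and the following '.'
COUNT_PATTERN = re.compile(r"-([0-9]*)\.")

def count_items(filenames):
    """Sum the per-file item counts encoded in TFRecord file names."""
    return sum(int(COUNT_PATTERN.search(name).group(1)) for name in filenames)

# Hypothetical file list, for illustration only
example_files = [
    "train/flowers00-230.tfrec",
    "train/flowers01-230.tfrec",
    "train/flowers02-181.tfrec",
]

num_replicas = 8                    # assumed replica count
batch_size = 16 * num_replicas      # same formula as BATCH_SIZE
num_images = count_items(example_files)
steps_per_epoch = num_images // batch_size  # integer division drops the partial batch from the count

print(num_images, batch_size, steps_per_epoch)
```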

In [None]:
# class names of flowers in the order of label idnum
CLASSES = [
    'pink primrose', 'hard-leaved pocket orchid', 'canterbury bells',   
    'sweet pea', 'wild geranium', 'tiger lily', 'moon orchid',             
    'bird of paradise', 'monkshood', 'globe thistle', 'snapdragon',                    
    "colt's foot", 'king protea', 'spear thistle', 'yellow iris',   
    'globe-flower', 'purple coneflower', 'peruvian lily', 'balloon flower', 
    'giant white arum lily', 'fire lily', 'pincushion flower',  
    'fritillary', 'red ginger', 'grape hyacinth', 'corn poppy',  
    'prince of wales feathers', 'stemless gentian', 'artichoke',
    'sweet william', 'carnation', 'garden phlox', 'love in the mist',  
    'cosmos', 'alpine sea holly', 'ruby-lipped cattleya', 'cape flower', 
    'great masterwort', 'siam tulip', 'lenten rose', 'barberton daisy',  
    'daffodil', 'sword lily', 'poinsettia', 'bolero deep blue',   
    'wallflower', 'marigold', 'buttercup', 'daisy', 'common dandelion',
    'petunia', 'wild pansy', 'primula',  'sunflower', 'lilac hibiscus', 
    'bishop of llandaff', 'gaura', 'geranium', 'orange dahlia', 
    'pink-yellow dahlia', 'cautleya spicata', 'japanese anemone', 
    'black-eyed susan', 'silverbush',  'californian poppy', 'osteospermum',      
    'spring crocus', 'iris', 'windflower', 'tree poppy', 'gazania', 
    'azalea', 'water lily', 'rose', 'thorn apple', 'morning glory',    
    'passion flower', 'lotus', 'toad lily', 'anthurium', 'frangipani',  
    'clematis', 'hibiscus', 'columbine', 'desert-rose', 'tree mallow', 
    'magnolia', 'cyclamen', 'watercress', 'canna lily', 'hippeastrum', 
    'bee balm', 'pink quill', 'foxglove', 'bougainvillea', 'camellia',  
    'mallow', 'mexican petunia', 'bromelia', 'blanket flower', 
    'trumpet creeper', 'blackberry lily', 'common tulip',  'wild rose'               
]    
print('Number of classes:', len(CLASSES))

# Step 5: Utility Functions to handle data

In [None]:
def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3)
    # convert image to floats in [0, 1] range
    image = tf.cast(image, tf.float32) / 255.0  
    # explicit size needed for TPU
    image = tf.reshape(image, [*IMAGE_SIZE, 3]) 
    return image
    
def read_labeled_tfrecord(example):
    """parses one serialized example and returns an (image, label) pair"""
    LABELED_TFREC_FORMAT = {
        # shape [] means single element; tf.string means bytestring
        "image": tf.io.FixedLenFeature([], tf.string), 
        "class": tf.io.FixedLenFeature([], tf.int64), # from 0 to 103
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = tf.cast(example['class'], tf.int32)
    return image, label 

def read_unlabeled_tfrecord(example):
    """parses one serialized example and returns an (image, idnum) pair"""
    UNLABELED_TFREC_FORMAT = {
        # shape [] means single element; tf.string means bytestring 
        "image": tf.io.FixedLenFeature([], tf.string), 
        "id": tf.io.FixedLenFeature([], tf.string),    
        # class is missing, to be predicted flower classes for the test dataset
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    idnum = example['id']
    return image, idnum 

def load_dataset(filenames, labeled=True, ordered=False):
    """
    Read from TFRecords. For optimal performance, reading from multiple files 
    at once and disregarding data order. Order does not matter since we will
    be shuffling the data anyway.
    """
    ignore_order = tf.data.Options()
    if not ordered:
        # disable order, increase speed
        ignore_order.experimental_deterministic = False 
    
    # automatically interleaves reads from multiple files
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO) 
    
    # uses data as soon as it streams in, rather than in its original order
    dataset = dataset.with_options(ignore_order) 
    
    # returns a dataset of (image, label) pairs if labeled=True 
    #   or (image, id) pairs if labeled=False
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord, 
                          num_parallel_calls=AUTO)   
    return dataset

def get_validation_dataset(ordered=False):
    dataset = load_dataset(VAL_FILENAMES,labeled=True, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.cache()
    # prefetch next batch while training (autotune prefetch buffer size)
    dataset = dataset.prefetch(AUTO) 
    return dataset

def get_test_dataset(ordered=True):  # order matters to submit predictions to Kaggle
    dataset = load_dataset(TEST_FILENAMES, labeled=False, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    # prefetch next batch while training (autotune prefetch buffer size)
    dataset = dataset.prefetch(AUTO) 
    return dataset


## Data Augmentation Modules

Data augmentation is a way to reduce overfitting by randomly altering the training images as they are fed to the model for training.  The images therefore differ at each epoch, as though they came from a different dataset, which effectively increases the data size.  More info is available at [TensorFlow](https://www.tensorflow.org/tutorials/images/data_augmentation#overview).
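
A toy illustration of the idea, in plain Python rather than TensorFlow: the helper below mirrors an image left to right with probability 0.5, which is the behaviour `tf.image.random_flip_left_right` provides for real tensors. The 2x3 nested-list "image" and the function defined here are hypothetical stand-ins, not code from this notebook.

```python
import random

def random_flip_left_right(image, rng):
    """Mirror each row with probability 0.5; otherwise return an unchanged copy."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

# Hypothetical 2x3 single-channel "image"
image = [[1, 2, 3],
         [4, 5, 6]]

rng = random.Random(42)  # seeded so the run is repeatable
# Each "epoch" draws a fresh random decision, so the model may see a different variant
epochs = [random_flip_left_right(image, rng) for _ in range(5)]
for epoch, variant in enumerate(epochs, start=1):
    print(epoch, variant)
```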


## 1. Use tf.image or tfa.image  

As shown in the data_augment(image, label) function below, tf.image is used to apply random augmentation on one image at a time:

    image = tf.image.random_flip_left_right(image) 
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_brightness(image,max_delta=0.1)
    image = tf.image.random_saturation(image, lower=0.7, upper=1.0)
    image = tfa.image.mean_filter2d(image, filter_shape = 3)
    etc

This function is then mapped over the dataset, one image at a time, as the images are fetched for model training:

    if augmentation:
        # map the data_augment function to each image in dataset prefetched during training
        dataset = dataset.map(data_augment, num_parallel_calls=AUTO)   
        

## 2. Use Keras preprocessing layers  

I explored a few image preprocessing layers for augmentation:

    data_aug_layers = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.RandomRotation(
            0.125, fill_mode='constant'),
        tf.keras.layers.experimental.preprocessing.RandomZoom(
            (-0.17, -0.01), fill_mode='constant'),
        tf.keras.layers.experimental.preprocessing.RandomFlip(mode='horizontal'),
        tf.keras.layers.experimental.preprocessing.RandomTranslation(
            (-.15,.15), (-.15,.15), fill_mode='constant')
    ])

* **Option 1**: Apply the preprocessing layers to the dataset at the time images are fetched prior to model training:

        if augmentation:  
            # apply data augmentation preprocessing layers in batch of images
            dataset = dataset.map(lambda image, y: (data_aug_layers(image, training=True), y))  

* **Option 2**: Make the preprocessing layers part of the model  

        with strategy.scope():    
            pretrained_model = efn.EfficientNetB7(
                weights='imagenet', 
                include_top=False,
                input_shape=[*IMAGE_SIZE, 3]
            )
            pretrained_model.trainable = True # transfer learning
            model = tf.keras.Sequential([
                # preprocessing layers as part of the model; placed in front of
                # the base so that they act on the input images
                data_aug_layers,
                pretrained_model, 
                tf.keras.layers.GlobalAveragePooling2D(),
                tf.keras.layers.Dense(len(CLASSES), kernel_regularizer=regularizers.L2(0.005), 
                    activation='softmax')
            ])


## 3. ImageDataGenerator  
 
I am adding this third option to randomly transform my training dataset.  This is great because one ImageDataGenerator with one random_transform method can apply many different random augmentations to an image in one pass.  More info at [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator).

    tf.keras.preprocessing.image.ImageDataGenerator(
        featurewise_center=False, samplewise_center=False,
        featurewise_std_normalization=False, samplewise_std_normalization=False,
        zca_whitening=False, zca_epsilon=1e-06, rotation_range=0, width_shift_range=0.0,
        height_shift_range=0.0, brightness_range=None, shear_range=0.0, zoom_range=0.0,
        channel_shift_range=0.0, fill_mode='nearest', cval=0.0,
        horizontal_flip=False, vertical_flip=False, rescale=None,
        preprocessing_function=None, data_format=None, validation_split=0.0, dtype=None
    )
     
My understanding is that you first create the image.ImageDataGenerator with whatever data augmentation arguments you choose. Here is an example from TensorFlow:
 
    img_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, rotation_range=20)

Then, use the generator with the [random_transform method](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#random_transform) to apply a random transformation to a single image according to the pre-set arguments/ranges of the generator.  Below, the input x is a 3D tensor representing a single image. The output is a randomly transformed version of x with the same shape.

    random_transform(
        x, seed=None
    )

## Random Data Augmentation Functions

In [None]:
################### tf.image; tfa.image ################

# define data augmentation function, one image at a time                  
def data_augment(image,  label):
    
    # using tf.image 
    image = tf.image.random_flip_left_right(image) 
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_brightness(image, max_delta=0.1) 
    image = tf.image.random_saturation(image, lower=0.85, upper=1.0)
    # these commented out:
    # Pad the image with a black, 90-pixel border
    #image = tf.image.resize_with_crop_or_pad(
    #            image, HEIGHT + 180, WIDTH + 180
    #)
    # Randomly crop to original size from the padded image
    #image = tf.image.random_crop(image, size=[*IMAGE_SIZE,3])

    # using tfa.image 
    #rdn = tf.random.normal([1], mean=0, stddev=1, dtype=tf.float32) 
    #if rdn > 2.0:  # blur about 2.3% of the images (1 tail, 2 stddev above mean)
    #    image = tfa.image.mean_filter2d(image, filter_shape = 3,
    #                                   padding='constant')
        
    return image, label
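
The commented-out blur branch above fires when a draw from a standard normal distribution exceeds 2.0. The stand-alone check below, which uses only Python's `math` module, estimates how often that happens; the function name is an illustrative choice. It suggests the one-tailed probability at 2.0 is close to 2.3%, while a 2.5% tail corresponds to a threshold near 1.96.

```python
import math

def upper_tail_probability(threshold):
    """P(Z > threshold) for a standard normal variable Z."""
    return 0.5 * math.erfc(threshold / math.sqrt(2))

p_blur = upper_tail_probability(2.0)    # threshold used in the blur branch
p_196 = upper_tail_probability(1.96)    # threshold that gives roughly a 2.5% tail

print(f"P(Z > 2.00) = {p_blur:.4f}")
print(f"P(Z > 1.96) = {p_196:.4f}")
```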

In [None]:
################# Keras preprocessing layers #################

# create image augmentation layers
# 0.125 rotation = 360*0.125 = 45 deg
data_aug_layers = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomRotation(
        0.125, fill_mode='constant'),
    tf.keras.layers.experimental.preprocessing.RandomZoom(
        height_factor=(-0.5, 0.25), width_factor=(-0.5, 0.25), 
        fill_mode='constant')
])
# these layers removed:
#    tf.keras.layers.experimental.preprocessing.RandomTranslation(
#        (-.15,.15),(-.15,.15), fill_mode='constant'),
#    tf.keras.layers.experimental.preprocessing.RandomZoom(
#        (-0.17, -0.01), fill_mode='constant'),  
#    tf.keras.layers.experimental.preprocessing.RandomFlip(
#        mode='horizontal'), 

In [None]:
################### ImageDataGenerator  ###################

# create an ImageDataGenerator 
# update this based on image augmentation exploration results
img_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=45, width_shift_range=0.25, height_shift_range=0.25,
    brightness_range=None, zoom_range=[0.5, 1.25], shear_range=0.2, fill_mode='constant', 
    horizontal_flip=True, 
    preprocessing_function=None  # must be a callable or None, not True
)

# define data augmentation function with random_transform method 
# for dataset.map( ... )
def img_gen_random_transform(image, label):
    # apply random_transform method to single image
    image = img_gen.random_transform(image)
    return image, label

def img_gen_random_transform_image(image):
    # apply random_transform method to single image
    image = img_gen.random_transform(image)
    return image

## Get Training Dataset Function
 `augmentation=True` or `augmentation=False`

In [None]:
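# get training dataset with augmentation option
# Note: img_gen.random_transform raises errors in user code when called in
# get_training_dataset (the two lines marked "didn't work" below); the error points to
# /opt/conda/lib/python3.7/site-packages/keras_preprocessing/image/image_data_generator.py
# A likely cause: random_transform works on NumPy arrays, while dataset.map
# passes symbolic tensors.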
def get_training_dataset(augmentation=False):
    dataset = load_dataset(TRAIN_FILENAMES, labeled=True, ordered=False)
    if augmentation:
        # map the data_augment function, one image at a time
        dataset = dataset.map(data_augment, num_parallel_calls=AUTO)
        # map the img_gen_random_transform function across images of the dataset
        #dataset = dataset.map(img_gen_random_transform, num_parallel_calls=AUTO)  # didn't work  
               
    dataset = dataset.repeat() # the training dataset must repeat for several epochs
    dataset = dataset.shuffle(buffer_size=1920)
    dataset = dataset.batch(BATCH_SIZE)
    
    if augmentation:
        # apply data augmentation preprocessing layers in batch of images
        dataset = dataset.map(lambda image, y: (data_aug_layers(image, training=True), y))
        # map the img_gen_random_transform_image function in batch of images
        #dataset = dataset.map(lambda image, y: (img_gen_random_transform_image(image), y)) #didn't work
        
    dataset = dataset.prefetch(AUTO)  # prefetch next batch while training
    return dataset

## Image Visualization Functions (in batches)

In [None]:
np.set_printoptions(threshold=15, linewidth=80)

def batch_to_numpy_images_and_labels(databatch):
    images, labels = databatch
    numpy_images = images.numpy()
    numpy_labels = labels.numpy()
    if numpy_labels.dtype == object: # binary string in this case, these are image ID strings
        numpy_labels = [None for _ in enumerate(numpy_images)]
    # If no labels, only image IDs, return None for labels (this is the case for test data)
    return numpy_images, numpy_labels

def numpylabels_to_classlabels(numpy_labels):
    # test images carry no labels: the list holds None placeholders
    if numpy_labels[0] is None:
        return numpy_labels
    # map each label idnum to its class name
    return [CLASSES[num] for num in numpy_labels]

def show_images(databatch, row=6, col=8):  # row, col of subplots
    numpy_images, numpy_labels = batch_to_numpy_images_and_labels(databatch)
    labels = numpylabels_to_classlabels(numpy_labels)   

    FIGSIZE = (col*3, row*3)  # 3X3 inch per image
    plt.figure(figsize=FIGSIZE)      
    for j in range(row*col):
        plt.subplot(row,col,j+1)
        plt.axis('off')
        plt.title(labels[j])
        plt.imshow(numpy_images[j])
    plt.show()


# Step 6: Image Augmentation Exploration

In [None]:
# get training dataset with augmentation=False
no_aug_train_set = get_training_dataset(augmentation=False)

In [None]:
# Re-run this cell to get the next batch of training images without augmentation
no_aug_train_batch = (next(iter(no_aug_train_set.unbatch().batch(16)))) 
images, _ = batch_to_numpy_images_and_labels(no_aug_train_batch)

## Explore image data augmentation: tf.image & tfa.image

In [None]:
# function to show images with random data augmentation
def show_data_aug(images):
    ROW=len(images)
    COL=7  # 1 no-aug plus 6 aug images
    plt.figure(figsize=(COL*2,ROW*2))
    i=0
    for image in images:
        plt.subplot(ROW,COL,i*COL+1)
        plt.title('rdm flip L/R')
        plt.axis('off')  
        # augmented with random flip
        plt.imshow(tf.image.random_flip_left_right(image))       

        plt.subplot(ROW,COL,i*COL+2)
        plt.title('resize & rdm crop')
        plt.axis('off')    
        # Pad the image with a black, 90-pixel border
        image1 = tf.image.resize_with_crop_or_pad(
            image, HEIGHT + 180, WIDTH + 180
        )
        # Randomly crop to original size from the padded image
        image1 = tf.image.random_crop(image1, size=[*IMAGE_SIZE,3])
        plt.imshow(image1)

        plt.subplot(ROW,COL,i*COL+3)
        plt.title('rdm contrast')
        plt.axis('off')
        # augmented with contrast
        plt.imshow(tf.image.random_contrast(image, lower=0.8, upper=1.2))  

        plt.subplot(ROW,COL,i*COL+4)
        plt.title('rdm brightness')
        plt.axis('off')
        # augmented with brightness
        plt.imshow(tf.image.random_brightness(image, max_delta=0.1))       

        plt.subplot(ROW,COL,i*COL+5)
        plt.title('no aug')
        plt.axis('off')
        plt.imshow(image)

        plt.subplot(ROW,COL,i*COL+6)
        plt.title('rdm saturation')
        plt.axis('off')
        # augmented with saturation
        plt.imshow(tf.image.random_saturation(image, lower=0.7, upper=1.3))  

        plt.subplot(ROW,COL,i*COL+7)
        plt.title('rdm blur')
        plt.axis('off')        
        # output a random value from a standard normal distribution 
        rdn = tf.random.normal([1], mean=0, stddev=1, dtype=tf.float32)              
        if rdn > 2.0:  # 2 stddev above mean  
            # blur about 2.3% of the images (one tail)
            # using tfa.image mean filter
            plt.imshow(
                tfa.image.mean_filter2d(image, filter_shape = 3,
                padding='constant')
            )  
        else:
            plt.imshow(image)
            
        i+=1
        
    plt.show()

In [None]:
# Re-run this after adjusting image augmentation settings 
#   of the show_data_aug() function
# compare no aug training images with random data augmentation
print('Training Dataset')
print('Image Augmentation with tf.image and tfa.image')
show_data_aug(images)

## Explore image data augmentation: for keras.layers

In [None]:
# Run these to visualize effects before implementing in
#     tf.keras.layers.experimental.preprocessing.Random___()
print('Training Dataset')
print('Image Augmentation with tf.keras.preprocessing.image.random ...')
ROW=len(images)
COL=4  # 1 no-aug plus 3 aug images
plt.figure(figsize=(COL*3,ROW*3))
i=0
for image in images:
    plt.subplot(ROW,COL,i*4+1)
    plt.title('no aug')
    plt.axis('off')
    plt.imshow(image)
    
    plt.subplot(ROW,COL,i*4+2)
    plt.title('rdm shift')
    plt.axis('off')
    # random shift on one numpy image tensor 
    # compared to tf.keras.layers.experimental.preprocessing.RandomTranslation(...)
    # a single image is (height, width, channels): rows=0, cols=1, channels=2
    image2 = tf.keras.preprocessing.image.random_shift(
        image, wrg=0.15, hrg=0.15, row_axis=0, col_axis=1, channel_axis=2,
        fill_mode='constant'
    )    
    plt.imshow(image2) 

    plt.subplot(ROW,COL,i*4+3)
    plt.title('rdm 45-deg rotation')
    plt.axis('off')
    # random rotation on one numpy image tensor
    # compared to tf.keras.layers.experimental.preprocessing.RandomRotation(...)
    # a single image is (height, width, channels): rows=0, cols=1, channels=2
    image3 = tf.keras.preprocessing.image.random_rotation(
        image, rg=45, row_axis=0, col_axis=1, channel_axis=2, fill_mode='constant'
    )
    plt.imshow(image3)

    plt.subplot(ROW,COL,i*4+4)
    plt.title('rdm zoom')
    plt.axis('off')
    # random zoom on one numpy image tensor
    # compared to tf.keras.layers.experimental.preprocessing.RandomZoom(...)
    image4 = tf.keras.preprocessing.image.random_zoom(
        #image, (.75, 1.0), row_axis=1, col_axis=2, channel_axis=2, fill_mode='constant'
        image, (.5, 1.25), row_axis=0, col_axis=1, channel_axis=2, fill_mode='constant'
    )
    plt.imshow(image4)
    i+=1
plt.show()

## Explore image data augmentation: ImageDataGenerator

### random_transform method

In [None]:
# create an ImageDataGenerator for random transformation
explore_img_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=45, width_shift_range=0.25, height_shift_range=0.25,
    brightness_range=None, zoom_range=[0.5, 1.25], shear_range=0.2, fill_mode='constant', 
    horizontal_flip=True, preprocessing_function=None  # a callable or None, not True
)

print('Training Dataset')
print('Image Augmentation with random_transform method from ImageDataGenerator')
i = 0
ROW=8  # rows of subplots
COL=4  # cols of subplots
plt.figure(figsize=(COL*3.4,ROW*3))
for im in images:
    plt.subplot(ROW,COL,i*2+1)
    plt.title('no augmentation')
    plt.axis('off')
    plt.imshow(im)
    plt.subplot(ROW,COL,i*2+2)
    plt.title('rdm transform from img_gen')
    plt.axis('off')
    plt.imshow(explore_img_gen.random_transform(im))
    i+=1
plt.show()

### Using functions for dataset.map()

In [None]:
print('Training Dataset')
print('Image Augmentation with ImageDataGenerator')
i = 0
ROW=8  # rows of subplots
COL=4  # cols of subplots
plt.figure(figsize=(COL*3.4,ROW*3))

for im in images:
    plt.subplot(ROW,COL,i*2+1)
    plt.title('no augmentation')
    plt.axis('off')
    plt.imshow(im)
    # use one of the functions, both work here 
    #   but not with get_training_dataset()
    #im = img_gen_random_transform_image(im) # augmented
    im, _ = img_gen_random_transform(im, _) # augmented
    plt.subplot(ROW,COL,i*2+2)
    plt.title('rdm transform from img_gen')
    plt.axis('off')
    plt.imshow(im)
    i+=1
plt.show()

## Decide on final settings for image data augmentation
After exploring data augmentation with tf.image, tfa.image, tf.keras.preprocessing.image and the random_transform method of ImageDataGenerator, remember to go back and finalize the data augmentation function.  Note: for demonstration purposes, some augmentations were actually implemented with keras.layers, where the final augmentation settings were adjusted, too.  

To train with data augmentation, set `augmentation=True` before getting traning images from `training_dataset = get_training_dataset(augmentation=True)`   

To train only with autoaug feature that comes with EfficientNetB7, get data with `augmentation=False`.

## Visualization: Sample Images of All Datasets

### observations  

The following allows me to get an idea about the three datasets.  I was surprised to see a picture that looks like a bridge (no flowers), a little girl (holding a very tiny flower behind her back) and a tattoo of flowers on someone's leg in the training dataset.  There is a blank (white) picture (label=70) in the validation dataset.  There is a picture of a fountain (I didn't see flowers) in the test dataset.  Pictures with people, hands, pets and insects appear in all datasets.  Pictures taken from top and side views, of a single flower or flowers in bundles, close-up, from far away in a meadow or along a riverside, zoomed-in and cropped, etc., are common in all datasets.

In [None]:
# Get datasets for visualization
training_dataset   = get_training_dataset(augmentation=True) # with data augmentation
validation_dataset = get_validation_dataset(ordered=False)
test_dataset       = get_test_dataset(ordered=False)  # not for prediction

In [None]:
# you may run these lines multiple times to view different samples from the image sets
R = 7     # rows of subplots/images
C = 6     # cols of subplots/images
B = R*C   # number of images in a batch
print('Training Images - optional random data augmentation')
show_images(next(iter(training_dataset.unbatch().batch(B))), row=R, col=C)

In [None]:
# you may run these lines multiple times to view different samples from the image sets
print('Validation Images')
show_images(next(iter(validation_dataset.unbatch().batch(B))), row=R, col=C)

In [None]:
# you may run these lines multiple times to view different samples from the image sets
print('Test Images - not ordered, shuffled') # randomly shuffles the test dataset for visualization
show_images(next(iter(test_dataset.unbatch().shuffle(buffer_size=BATCH_SIZE).batch(B))), 
            row=R, col=C)

# Step 7: Callbacks for Model Training 

## Learning Rate Scheduler callback  
The [Learning Rate Scheduler callback](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LearningRateScheduler) gets the updated learning rate value from schedule function at the beginning of each epoch.  The schedule function takes an epoch index (integer, indexed from 0) and current learning rate (float) as inputs and returns a new learning rate as output (float).  

    tf.keras.callbacks.LearningRateScheduler(
        schedule, verbose=0
    )
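
To make the calling convention concrete, here is a minimal pure-Python sketch of how a schedule function is invoked once per epoch with the epoch index and the current learning rate. The schedule `halve_every_two_epochs` and the helper `simulate_scheduler` are hypothetical names made up for illustration; they only imitate the callback and are not part of TensorFlow or of this notebook's training code.

```python
def halve_every_two_epochs(epoch, lr):
    """Hypothetical schedule: halve the learning rate at every even epoch after 0."""
    if epoch > 0 and epoch % 2 == 0:
        return lr * 0.5
    return lr


def simulate_scheduler(schedule, initial_lr, epochs):
    """Imitate the callback: call schedule(epoch, current_lr) at the start of each epoch."""
    lr = initial_lr
    history = []
    for epoch in range(epochs):
        lr = float(schedule(epoch, lr))  # the returned value becomes the new learning rate
        history.append(lr)
    return history


lrs = simulate_scheduler(halve_every_two_epochs, initial_lr=1e-3, epochs=5)
print(lrs)
```

Because the current learning rate arrives as the second positional argument, a schedule function with extra parameters should reserve its second positional slot for it.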

In [None]:
# define a fine-tuned schedule for the Learning Rate Scheduler
# Note: the callback calls schedule(epoch, current_lr), so the second positional
#   parameter absorbs the current learning rate instead of overwriting start_lr.
def exponential_lr(epoch, current_lr=None,
                   start_lr=0.00001, min_lr=0.00001, max_lr=0.00005,
                   rampup_epochs=5, sustain_epochs=0,
                   exp_decay=0.8):
    if epoch < rampup_epochs:
        # linear increase from start_lr to max_lr over rampup_epochs
        lr = (max_lr - start_lr) / rampup_epochs * epoch + start_lr
    elif epoch < rampup_epochs + sustain_epochs:
        # hold at max_lr for sustain_epochs
        lr = max_lr
    else:
        # exponential decay from max_lr toward min_lr
        lr = ((max_lr - min_lr) * exp_decay ** (epoch - rampup_epochs - sustain_epochs)
              + min_lr)
    return lr

# set learning rate scheduler for callback
lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule=exponential_lr,verbose=True)

# learning rate chart
epoch_rng = [i for i in range(EPOCHS)] 
y = [exponential_lr(x) for x in epoch_rng]
plt.plot(epoch_rng,y)
plt.xlim(-1, EPOCHS)

print("Learning rate schedule: start = {:.3g}; peak = {:.3g}; end = {:.3g}".format(y[0], max(y), y[-1]))

## EarlyStopping callback
Reference: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping    
The metric "val_loss" is monitored, which allows training to stop early when the metric stops improving.  I set patience = 2, so training stops after 2 consecutive epochs with no improvement.

    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', min_delta=0, patience=0, verbose=0,
        mode='auto', baseline=None, restore_best_weights=False
    )
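
The patience rule can be illustrated with a small pure-Python sketch. It is a simplified imitation of the behaviour described above for a loss that should decrease, not the Keras implementation; the function name `stopped_epoch` and the loss values are made up for illustration.

```python
def stopped_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 0-indexed epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value (0.70) occurs at epoch index 2.
losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.60]
print(stopped_epoch(losses, patience=2))
```

In this toy sequence, training would stop before reaching the final, lower loss, which is the trade-off of choosing a small patience.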

In [None]:
# set earlystopping for callback
es_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

## Checkpoint callback

I referenced the TensorFlow tutorials on [Save and load models](https://www.tensorflow.org/tutorials/keras/save_and_load) and on [Save checkpoints during training](https://www.tensorflow.org/tutorials/keras/save_and_load#save_checkpoints_during_training).  

    tf.keras.callbacks.ModelCheckpoint(
        filepath, monitor='val_loss', verbose=0, save_best_only=False,
        save_weights_only=False, mode='auto', save_freq='epoch',
        options=None, **kwargs
    )

With `tf.keras.callbacks.ModelCheckpoint(..., save_best_only=True, save_weights_only=False, ...)`, the model is saved at the end of an epoch only if it is the best one so far during training.  With the saved models, I could do the following:  
1. After training, I may load my best model prior to computing predictions. (The trained model at the last epoch may not be the best.)  
2. I may download my best model as a pretrained base model to use on another platform.  When creating a notebook version using "Save & Run All", the model will be saved locally and preserved as output in my Kaggle working directory: `/kaggle/working/`.  I may download and use the model on another computing platform, like Google Colab. I may use the saved model as a base model and build on it by adding more layers, or make an adversarial model with it, etc., and see whether that makes a difference in the classification accuracy.  
3. I may use models that are output to my kaggle/working directory in other notebooks, by simply clicking "Add Data" and selecting "Notebook Output Files".  I may continue to train the model in another session with more external data or more epochs, etc.  Remember to save your model with a different filename each time so you don't overwrite what you have.  
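
For the last point, one possible way to avoid overwriting an earlier model is to derive a fresh filename from the entries already in the output directory. The helper `next_model_path` and the `_v2`, `_v3` suffix convention are assumptions made up for this sketch, and so are the example filenames; this is not what the notebook itself does.

```python
import os


def next_model_path(directory, model_name, existing_entries):
    """Return a path in `directory` whose filename is not among `existing_entries`."""
    stem, ext = os.path.splitext(model_name)
    taken = set(existing_entries)
    candidate, version = model_name, 1
    while candidate in taken:
        version += 1
        candidate = f"{stem}_v{version}{ext}"  # e.g. model_v2.h5, model_v3.h5, ...
    return f"{directory.rstrip('/')}/{candidate}"


# Hypothetical directory listing for illustration.
print(next_model_path("/kaggle/working", "model.h5", ["model.h5", "model_v2.h5"]))
```

In a notebook, the listing could come from `tf.io.gfile.listdir(...)` on the working directory, as used later in this notebook.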

### Writing checkpoints locally from a TPU model
I spent a lot of time looking for a solution to the UnimplementedError, so I want to capture what I found here. The following [example from Kaggle](https://www.kaggle.com/docs/tpu#tpu5a) shows an important argument for tf.keras.callbacks.ModelCheckpoint: `options=save_locally`, where `save_locally` is configured to save the model on `experimental_io_device='/job:localhost'`.
```
save_locally   = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
checkpoints_cb = tf.keras.callbacks.ModelCheckpoint('./checkpoints', options=save_locally)
model.fit( . . . , callbacks=[checkpoints_cb])
```
TensorFlow's [API documentation](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint) explains how to set the related arguments to save weights only, as opposed to saving the whole model locally. It looks simple in TensorFlow's argument table for the argument "options": 

```
Optional tf.train.CheckpointOptions object if save_weights_only is true, or, 
optional tf.saved_model.SaveOptions object if save_weights_only is false.
```
That means the following:  

#### **A)** If you **save weights only**, i.e. set `save_weights_only=True`:

```
tf.keras.callbacks.ModelCheckpoint(...,
                                   save_weights_only=True,
                                   options=save_locally,
                                   ...)
```
then, `options= ...` must be a **tf.train.CheckpointOptions( ... ) object**, i.e. have `save_locally` pre-defined as below:  
`save_locally = tf.train.CheckpointOptions(experimental_io_device='/job:localhost')`  

Note: if not saving weights locally, set `options=None`   

#### **B)** If you **save the model**, i.e. set `save_weights_only=False`:
```
tf.keras.callbacks.ModelCheckpoint(...,
                                   save_weights_only=False,
                                   options=save_locally,
                                   ...)
```
then, `options= ...` must be a **tf.saved_model.SaveOptions( ... ) object**, i.e. have `save_locally` pre-defined as below:  
`save_locally = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')`


If options to save locally is not set up properly, the following error will occur when checkpoint files are to be saved during training:  
```
UnimplementedError - File system scheme '[local]' not implemented 
(file: '/kaggle/working/checkpoint_temp/variables/variables_temp/part-00000-of-00001') 
Encountered when executing an operation using EagerExecutor. 
This error cancels all future operations and poisons their output tensors.
```

The following code sets up the checkpoint callback to save the best model locally.

In [None]:
# set file path to save model locally to your /kaggle/working/
BEST_MODEL_PATH = "/kaggle/working/" + MODEL_NAME   
FILE_DIR = os.path.dirname(BEST_MODEL_PATH)                  

# Create a checkpoint callback that saves the best trained model locally
#   save_best_only=True: save only when val_loss improves on the best value so far
save_locally = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
cp_callback  = tf.keras.callbacks.ModelCheckpoint(filepath=BEST_MODEL_PATH,      
                   options=save_locally, monitor='val_loss', verbose=1,
                   save_best_only=True, save_weights_only=False, mode='min')

# show current entries saved in Kaggle output directory
print('list of entries contained in', FILE_DIR, tf.io.gfile.listdir(FILE_DIR))
# the following show current entries saved in Kaggle output directory too
#!ls {FILE_DIR}  # same as 
!ls "/kaggle/working/"

# Step 8: Build the model

In [None]:
# With pretrained model 
with strategy.scope():    
    pretrained_model = efn.EfficientNetB7(
        weights=WEIGHTS, 
        include_top=False ,
        input_shape=[*IMAGE_SIZE, 3]
    )
    pretrained_model.trainable = True # transfer learning
    model = tf.keras.Sequential([
        pretrained_model, 
        tf.keras.layers.Dropout(DROP_RATE),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(len(CLASSES), kernel_regularizer=regularizers.L2(REG_FACTOR), 
            activation='softmax')
    ])

In [None]:
# display model summary, including all layers, the output shape and the number of parameters for each layer,  
#  the number of trainable parameters and the number of non-trainable parameters. 
pretrained_model.summary()

In [None]:
print('---------------- Pretrained Model: EfficientNetB7 ----------------')
print('Total number of layers in pretrained base model =', len(pretrained_model.layers))
print('')

print('---------------------------- My Model ----------------------------')
model.summary()

## Compile the model 

`optimizer='adam'` selects the Adam optimizer with default values for its arguments, e.g. learning_rate. Adam optimization is a stochastic gradient descent method.

`loss = 'sparse_categorical_crossentropy'` specifies that the crossentropy loss is computed between the labels and predictions. This loss is used when there are two or more label classes. Labels are expected to be provided as integers. In this flower classification challenge, there are 104 different classes of flowers.

`metrics=['sparse_categorical_accuracy']`  
Integer labels are used in the training, validation and test datasets. Thus the metric is set to sparse categorical accuracy, which calculates how often predictions match the integer labels.
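
As a worked illustration of these two quantities, the sketch below computes the sparse categorical crossentropy and the sparse categorical accuracy by hand for a toy batch with 3 classes. The function names mirror the Keras strings but are plain Python written for this example, the probabilities are made up, and the numerical safeguards a real implementation applies (such as clipping probabilities away from zero) are left out.

```python
import math


def sparse_categorical_crossentropy(labels, probabilities):
    """Mean of -log(probability assigned to the true integer label)."""
    losses = [-math.log(p[y]) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)


def sparse_categorical_accuracy(labels, probabilities):
    """Fraction of samples whose most probable class equals the integer label."""
    hits = [
        int(max(range(len(p)), key=lambda k: p[k]) == y)
        for y, p in zip(labels, probabilities)
    ]
    return sum(hits) / len(hits)


# Toy batch: 3 samples, 3 classes (made-up softmax outputs).
labels = [0, 1, 0]
probabilities = [
    [0.7, 0.2, 0.1],  # predicted class 0, correct
    [0.1, 0.8, 0.1],  # predicted class 1, correct
    [0.3, 0.3, 0.4],  # predicted class 2, wrong
]

loss = sparse_categorical_crossentropy(labels, probabilities)
accuracy = sparse_categorical_accuracy(labels, probabilities)
print("loss = {:.4f}, accuracy = {:.4f}".format(loss, accuracy))
```

Note that the labels stay as plain integers; no one-hot encoding is needed, which is what "sparse" refers to.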

In [None]:
model.compile(
    optimizer='adam',
    loss = 'sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy']
)

# Step 9: Train the model - with callbacks list

Now that the model is created and compiled with its loss and metrics via `model.compile(...)`, it is time to train the model with `model.fit()`, which returns a History object.  

    historical = model.fit( ...,                  
                           callbacks=[..., ...]
                           )                   
                           
As the code indicates, while the model is being trained, events are recorded into the History object named "historical".   This object's attribute, `historical.history`, is then used to create plots of the accuracy and loss metrics in the next section.  
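
To show the shape of that attribute, here is a minimal sketch that treats it as a dictionary mapping each metric name to a list with one value per epoch, and looks up the epoch with the lowest validation loss. The numbers are made up for illustration only and are not results from this notebook.

```python
# Made-up values for illustration only; a real run fills these in via model.fit().
history = {
    "loss": [1.90, 0.80, 0.45, 0.30],
    "val_loss": [1.20, 0.70, 0.55, 0.60],
    "sparse_categorical_accuracy": [0.55, 0.80, 0.89, 0.93],
    "val_sparse_categorical_accuracy": [0.70, 0.83, 0.88, 0.87],
}

val_loss = history["val_loss"]
# index of the smallest validation loss (0-indexed), then shift to 1-indexed epochs
best_index = min(range(len(val_loss)), key=lambda i: val_loss[i])
best_epoch = best_index + 1

print("best epoch:", best_epoch)
print("val_loss at best epoch:", val_loss[best_index])
print("val accuracy at best epoch:", history["val_sparse_categorical_accuracy"][best_index])
```

In this toy example the last epoch is not the best one, which is the situation the checkpoint callback is meant to handle.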

### Callbacks
The callbacks argument `callbacks=[lr_callback, es_callback, cp_callback]` applies the listed callbacks during training. TensorFlow documents the available [keras.callbacks.Callback](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks) classes.  

The cp_callback works well and saves the best model in the .h5 format.  This allows the best model to be loaded prior to making predictions on the test dataset.  (The trained model at the last epoch may not necessarily be the best model.)  

In [None]:
# get datasets for model training and validation
# AUGMENTATION = True or False 
training_dataset   = get_training_dataset(augmentation=AUGMENTATION) 
validation_dataset = get_validation_dataset(ordered=False)
print('training dataset:       ', training_dataset)
print('validation dataset:     ', validation_dataset)

In [None]:
# fit/train the model 
# save the History object to the variable "historical"
# save checkpoints during training
historical = model.fit(
    training_dataset, 
    steps_per_epoch=STEPS_PER_EPOCH, 
    epochs=EPOCHS, 
    validation_data=validation_dataset,
    callbacks=[lr_callback, es_callback, cp_callback]
) 

In [None]:
# load best model saved by cp_callback during training
model = tf.keras.models.load_model(BEST_MODEL_PATH)

# Step 10: Evaluate the Model

## Confusion Matrix

In [None]:
# Construct confusion matrix
def display_confusion_matrix(cmat, score, precision, recall):
    plt.figure(figsize=(15, 15))
    ax = plt.gca()
    ax.matshow(cmat, cmap='Reds')
    ax.set_xticks(range(len(CLASSES)))
    ax.set_xticklabels(CLASSES, fontdict={'fontsize': 7})
    plt.setp(ax.get_xticklabels(), rotation=45, ha="left", rotation_mode="anchor")
    ax.set_yticks(range(len(CLASSES)))
    ax.set_yticklabels(CLASSES, fontdict={'fontsize': 7})
    plt.setp(ax.get_yticklabels(), rotation=45, ha="right", rotation_mode="anchor")
    titlestring = ""
    if score is not None:
        titlestring += 'f1 = {:.3f} '.format(score)
    if precision is not None:
        titlestring += '\nprecision = {:.3f} '.format(precision)
    if recall is not None:
        titlestring += '\nrecall = {:.3f} '.format(recall)
    if len(titlestring) > 0:
        ax.text(101, 1, titlestring, fontdict={'fontsize': 18, 'horizontalalignment': 'right', 'verticalalignment': 'top', 'color': '#804040'})
    plt.show()


cmdataset = get_validation_dataset(ordered=True)
images_ds = cmdataset.map(lambda image, label: image)
labels_ds = cmdataset.map(lambda image, label: label).unbatch()

cm_correct_labels = next(iter(labels_ds.batch(NUM_VAL_IMAGES))).numpy()
cm_probabilities = model.predict(images_ds)
cm_predictions = np.argmax(cm_probabilities, axis=-1)

labels = range(len(CLASSES))
cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=labels)
cmat = (cmat.T / cmat.sum(axis=1)).T  # normalize

score = f1_score(cm_correct_labels, cm_predictions, labels=labels, average='macro')
precision = precision_score(cm_correct_labels, cm_predictions, labels=labels, average='macro')
recall = recall_score(cm_correct_labels, cm_predictions, labels=labels, average='macro')
display_confusion_matrix(cmat, score, precision, recall)

## Plots: accuracy and loss metrics


In [None]:
# Create plots of loss and accuracy on the training and validation datasets.

acc = historical.history['sparse_categorical_accuracy']
val_acc = historical.history['val_sparse_categorical_accuracy']

loss = historical.history['loss']
val_loss = historical.history['val_loss']

epochs_range = range(1, len(historical.history['loss'])+1)

plt.figure(figsize=(14, 14))
plt.subplot(2, 1, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')

plt.subplot(2, 1, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.show()

# Step 11: Compute predictions 
If you are happy with training results above, it is time to use the best trained model to make predictions with `model.predict(...)`.  

`np.savetxt(...)` will create a file that can be submitted to the competition.


In [None]:
# get test dataset for prediction
# ordered for prediction for submission to Kaggle
test_dataset = get_test_dataset(ordered=True) 
print('test dataset:', test_dataset)

In [None]:
# show files in local working directories 
print('list of entries contained in', FILE_DIR, tf.io.gfile.listdir(FILE_DIR)) 

# predict probabilities and match to the most probable integer label for each image
print('Computing predictions...')
test_images_ds = test_dataset.map(lambda image, idnum: image)
probabilities = model.predict(test_images_ds)  
predictions = np.argmax(probabilities, axis=-1)
print(predictions)

# create file to submit to the competition
print('Generating submission.csv file...')
test_ids_ds = test_dataset.map(lambda image, idnum: idnum).unbatch()
test_ids = next(iter(test_ids_ds.batch(NUM_TEST_IMAGES))).numpy().astype('U') # all in one batch
np.savetxt('submission.csv', np.rec.fromarrays([test_ids, predictions]), 
           fmt=['%s', '%d'], delimiter=',', header='id,label', comments='')