# Final Project: Flower Classification
#### By Harsh Deshpande

## Premise

The premise of this project is fairly simple: given a picture of a flower, classify it appropriately. I got my data from a Flower Classification playground competition that I was originally planning to compete in, but after some thought and technical difficulties, I decided not to compete after all.

I instead decided to make a web app that uses an appropriately trained model. In the web app, one can upload a picture of a flower, and the app displays what type of flower it is. The notebook below shows all the steps I took to produce a suitable model.

For this project, since I was dealing with very large images, I decided to use a Tensor Processing Unit (TPU). A TPU is a piece of hardware like the CPU and GPU. It has far more arithmetic logic units than a GPU or CPU, but they are not general purpose and can only work on tensors (multi-dimensional arrays). The TPU sped up my work greatly and was able to train neural network models quite quickly.

NOTE: If you try to run this notebook locally, it WON'T work. Some of the data import libraries, as well as the code that establishes a connection to a cloud TPU, only work inside a Kaggle notebook environment. If you want to run this, feel free to fork this notebook and make sure you have selected a TPU as the Accelerator in the notebook settings.

Here are some important links that I referenced:

The competition where the data came from: https://www.kaggle.com/c/flower-classification-with-tpus

A starter code notebook for the competition: https://www.kaggle.com/mgornergoogle/getting-started-with-100-flowers-on-tpu/ 

(NOTE: I used some of the code in the starter code notebook for file decoding, TPU connections, and other global variables.)

## Getting Started

In [None]:
import math, re, os
import tensorflow as tf
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from kaggle_datasets import KaggleDatasets #Specific to Kaggle notebook environment
from tensorflow import keras

In [None]:
print(tf.__version__)
print(np.__version__)

The code below connects to a Google Cloud TPU. This code was taken from the starter notebook linked above, as there really is no other way to connect.

In [None]:
# Detect hardware, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection. No parameters necessary if TPU_NAME environment variable is set. On Kaggle this is always the case.
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy() # default distribution strategy in Tensorflow. Works on CPU and single GPU.

print("REPLICAS: ", strategy.num_replicas_in_sync)

## Importing and Managing Data

Below I have set up a `files` dictionary that stores the filenames of all the test, train, and validation data for each of the different image sizes.

In [None]:
files = {'192': {'train': [], 'val': [], 'test': []}, '224':{'train': [], 'val': [], 'test': []}, '331': {'train': [], 'val': [], 'test': []}, '512': {'train': [], 'val': [], 'test': []}}

In [None]:
try:
    data_loc = KaggleDatasets().get_gcs_path() #Built-in GCP Bucket in Kaggle for Data
except Exception: # fall back to the local Kaggle input directory
    data_loc = '/kaggle/input/flower-classification-with-tpus'
    
!gsutil ls $data_loc

for k in files.keys():
    subdir = f'/tfrecords-jpeg-{k}x{k}'
    loc = data_loc+subdir
    #print(loc)
    files[k]['train'] = tf.io.gfile.glob(loc + '/train/*.tfrec')
    files[k]['test'] = tf.io.gfile.glob(loc + '/test/*.tfrec')
    files[k]['val'] = tf.io.gfile.glob(loc + '/val/*.tfrec')
        
print(files['192'])

The Kaggle notebook environment stores input data in a Google Cloud Storage bucket. To open the files, I first need to make a `KaggleDatasets().get_gcs_path()` call, which returns the path of the bucket, and only then can I get the input data. I then store the filenames of all the input data in the `files` dictionary.

In [None]:
IMAGE_SIZE = [192, 192]

CLASSES = ['pink primrose',    'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea',     'wild geranium',     'tiger lily',           'moon orchid',              'bird of paradise', 'monkshood',        'globe thistle',         # 00 - 09
           'snapdragon',       "colt's foot",               'king protea',      'spear thistle', 'yellow iris',       'globe-flower',         'purple coneflower',        'peruvian lily',    'balloon flower',   'giant white arum lily', # 10 - 19
           'fire lily',        'pincushion flower',         'fritillary',       'red ginger',    'grape hyacinth',    'corn poppy',           'prince of wales feathers', 'stemless gentian', 'artichoke',        'sweet william',         # 20 - 29
           'carnation',        'garden phlox',              'love in the mist', 'cosmos',        'alpine sea holly',  'ruby-lipped cattleya', 'cape flower',              'great masterwort', 'siam tulip',       'lenten rose',           # 30 - 39
           'barberton daisy',  'daffodil',                  'sword lily',       'poinsettia',    'bolero deep blue',  'wallflower',           'marigold',                 'buttercup',        'daisy',            'common dandelion',      # 40 - 49
           'petunia',          'wild pansy',                'primula',          'sunflower',     'lilac hibiscus',    'bishop of llandaff',   'gaura',                    'geranium',         'orange dahlia',    'pink-yellow dahlia',    # 50 - 59
           'cautleya spicata', 'japanese anemone',          'black-eyed susan', 'silverbush',    'californian poppy', 'osteospermum',         'spring crocus',            'iris',             'windflower',       'tree poppy',            # 60 - 69
           'gazania',          'azalea',                    'water lily',       'rose',          'thorn apple',       'morning glory',        'passion flower',           'lotus',            'toad lily',        'anthurium',             # 70 - 79
           'frangipani',       'clematis',                  'hibiscus',         'columbine',     'desert-rose',       'tree mallow',          'magnolia',                 'cyclamen ',        'watercress',       'canna lily',            # 80 - 89
           'hippeastrum ',     'bee balm',                  'pink quill',       'foxglove',      'bougainvillea',     'camellia',             'mallow',                   'mexican petunia',  'bromelia',         'blanket flower',        # 90 - 99
           'trumpet creeper',  'blackberry lily',           'common tulip',     'wild rose']

def readTrainVal(ex):
    LABELED_TFREC_FORMAT_TRVAL = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "class": tf.io.FixedLenFeature([], tf.int64)  # shape [] means single element
    }
    ex = tf.io.parse_single_example(ex, LABELED_TFREC_FORMAT_TRVAL)
    image_data = ex['image']
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # convert image to floats in [0, 1] range
    image = tf.reshape(image, [*IMAGE_SIZE, 3]) 
    label = tf.cast(ex['class'], tf.int32)
    return image, label # returns a dataset of (image, label) pairs

def readTst(ex):
    LABELED_TFREC_FORMAT_TEST = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "id": tf.io.FixedLenFeature([], tf.string)
    }
    ex = tf.io.parse_single_example(ex, LABELED_TFREC_FORMAT_TEST)
    image_data = ex['image']
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # convert image to floats in [0, 1] range
    image = tf.reshape(image, [*IMAGE_SIZE, 3]) 
    return image, ex['id'] # returns a dataset of (image, id) pairs

ds_test = tf.data.TFRecordDataset(files['192']['train'][0])
for d_rec in ds_test.take(1):
    i,l = readTrainVal(d_rec)
    plt.imshow(i)
    print(CLASSES[l])

The functions above parse the records in the `.tfrec` files. The layout of a `.tfrec` record is similar to JSON, except that instead of a `{key:value}` format it uses a format more akin to `{key{value}}`. In the training and validation data, three keys make up an image object: the photo id, the label of the photo (which corresponds to its class), and the encoded bytes of the photo. The test dataset does not have photo labels. First, template dictionaries are made. For the training and validation files, ids are not really necessary, so the dictionary only contains the photo bytes and the label. For the test data, an id would have been necessary to compete in the competition (which I eventually ended up not doing), so the photo bytes and the id are included. The dictionaries are used to parse the appropriate parts of each record and retrieve the values of the specified keys. The image is decoded, divided by 255 so that the values are normalized between 0 and 1, and reshaped into a $w \times h \times 3$ shape. Labels are returned with the training and validation images, and ids are returned with the test images.

The `CLASSES` array was taken from the linked starter notebook, and I decided to reuse it here to display class names.

In [None]:
ds_test_2 = tf.data.TFRecordDataset(files['224']['train'][0])
IMAGE_SIZE = [224, 224]
for d_rec in ds_test_2.take(1):
    i,l = readTrainVal(d_rec)
    plt.imshow(i)
    print(CLASSES[l])
    

In [None]:
ds_test_3 = tf.data.TFRecordDataset(files['331']['train'][0])
IMAGE_SIZE = [331, 331]
for d_rec in ds_test_3.take(1):
    i,l = readTrainVal(d_rec)
    plt.imshow(i)
    print(CLASSES[l])
    

In [None]:
ds_test_4 = tf.data.TFRecordDataset(files['512']['train'][0])
IMAGE_SIZE = [512, 512]
for d_rec in ds_test_4.take(1):
    i,l = readTrainVal(d_rec)
    plt.imshow(i)
    print(CLASSES[l])
    
IMAGE_SIZE = [192, 192]

As seen above, the test, train, and validation pictures are the same across all four sizes. The only thing that differs is the aforementioned image size. In the interest of time and TPU availability, I will only be working with the 192x192 images.

In [None]:
num_img = {'test': 462, 'train': 798, 'val': 232}
AUTO = tf.data.experimental.AUTOTUNE

def training_dataset(f=None, batch_factor=2):
    d_train = tf.data.TFRecordDataset(files[str(IMAGE_SIZE[0])]['train'])
    d_train = d_train.map(readTrainVal)
    if f is not None:
        d_train = d_train.map(f)
    b_size =  num_img['train'] // batch_factor
    d_train = d_train.shuffle(2048)
    d_train = d_train.batch(b_size)
    d_train = d_train.prefetch(AUTO)
    return d_train

def validation_dataset(batch_factor=2):
    d_val = tf.data.TFRecordDataset(files[str(IMAGE_SIZE[0])]['val'])
    d_val = d_val.map(readTrainVal)
    b_size = num_img['val'] // batch_factor
    d_val = d_val.shuffle(2048)
    d_val = d_val.batch(b_size)
    d_val = d_val.prefetch(AUTO)
    return d_val

def testing_dataset(batch_factor=2):
    d_tst = tf.data.TFRecordDataset(files[str(IMAGE_SIZE[0])]['test'])
    d_tst = d_tst.map(readTst)
    b_size =  num_img['test'] // batch_factor
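    # Not shuffled: export() iterates this dataset twice, and a reshuffle between passes would pair predictions with the wrong images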
    d_tst = d_tst.batch(b_size)
    d_tst = d_tst.prefetch(AUTO)
    return d_tst
    
    
tr = training_dataset()
val = validation_dataset()
tst = testing_dataset()

The functions above build the test, train, and validation datasets for the 192x192 images. Because there are so many images and each one is large, the data has to be split into batches. The `num_img` dictionary declared above holds the number of images inside one file of the test, train, and val folders. The `.tfrec` filenames follow the layout `[number]-[image_size]x[image_size]-[# images].tfrec`, and `num_img` takes its values from the `[# images]` part of the filename. I decided to make the batch size 1/2 the number of images in a single test/train/val file.
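The `[# images]` part of each filename can also be read programmatically rather than typed in by hand. The following is a minimal sketch in plain Python, assuming every filename ends in `-<count>.tfrec` as described above. The helper name `count_data_items` and the example paths are hypothetical, not taken from this notebook.

```python
import re

def count_data_items(filenames):
    """Sum the image counts encoded at the end of .tfrec filenames."""
    total = 0
    for name in filenames:
        # The count sits between the last '-' and the '.tfrec' extension.
        match = re.search(r"-(\d+)\.tfrec$", name)
        if match is None:
            raise ValueError(f"no image count found in {name!r}")
        total += int(match.group(1))
    return total

# Hypothetical paths that follow the naming layout described above.
example_files = [
    "gs://some-bucket/tfrecords-jpeg-192x192/train/00-192x192-798.tfrec",
    "gs://some-bucket/tfrecords-jpeg-192x192/train/01-192x192-798.tfrec",
]
print(count_data_items(example_files))
```

Passing `files['192']['train']` instead of `example_files` would give the total number of training images, provided the real filenames follow the same layout.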

In [None]:
print("Training data shapes:")
for image, label in tr.take(-1):
    print(image.numpy().shape, label.numpy().shape)
print("Training data label examples:", label.numpy()[:10])

As seen above, there are 32 batches in the training dataset. Each line of `(399, 192, 192, 3) (399,)` indicates one batch. Now we can start building our model. For this first model, I will build a simple neural network with one convolutional layer to check whether the images are being processed properly.
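The batch count follows from simple arithmetic. Here is a minimal sketch, assuming a total of 12,753 training images. That total is an assumed figure for this dataset and is not printed anywhere in this notebook. The helper names are hypothetical.

```python
import math

def batch_size_for(images_per_file, batch_factor=2):
    # Mirrors the `num_img[...] // batch_factor` expression used above.
    return images_per_file // batch_factor

def num_batches(total_images, batch_size):
    # Without drop_remainder the last batch may be smaller, hence the ceiling.
    return math.ceil(total_images / batch_size)

ASSUMED_TRAIN_IMAGES = 12753  # assumption, see the note above

b = batch_size_for(798)
print(b, num_batches(ASSUMED_TRAIN_IMAGES, b))
```

Under that assumption the result is 32 batches of at most 399 images, which matches the output above. The final batch would hold fewer than 399 images.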

In [None]:
from tensorflow.keras.layers import Input, Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.models import Model

In [None]:
#Demo model - will need to run convolutional layers on tpus

with strategy.scope(): #This is for determining if the model runs on the TPU
    mod_conv = keras.Sequential()
    mod_conv.add(Conv2D(32, kernel_size=3, padding='same' ,activation='relu', input_shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3)))
    mod_conv.add(MaxPooling2D(pool_size=(2, 2)))
    mod_conv.add(Flatten())
    mod_conv.add(Dense(512, activation='relu'))
    mod_conv.add(Dense(104,activation='softmax'))
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_conv.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
mod_conv.summary()

In [None]:
mod_conv.fit(tr, epochs=5, validation_data=val) #Too much to run on cpu

As seen above, a neural network was trained successfully. A sparse categorical cross-entropy loss was used because the labels are given as integer class indices rather than in one-hot form. Accuracy also seems to be the best metric here, given the classification nature of the problem. Since the model only ran for 5 epochs and the batch sizes were relatively large, accuracy was low for both the training and validation data. To increase it, we will have to increase the number of epochs and the number of layers, augment the training data with different image transformations, and perhaps incorporate some pre-trained models from Keras as well. After this, we can test the accuracy by submitting results to the competition and exporting our model.
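To illustrate the choice of loss, the sketch below compares the sparse and one-hot forms of cross-entropy on a single made-up prediction. It is written in plain Python rather than Keras, and the three-class probabilities are invented for the example.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    # The label is an integer class index, as stored in the .tfrec files.
    return -math.log(probs[label])

def categorical_crossentropy(one_hot, probs):
    # The label is a one-hot vector; only the true class contributes.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

probs = [0.1, 0.7, 0.2]      # invented softmax output for 3 classes
label = 1                    # integer form of the true class
one_hot = [0.0, 1.0, 0.0]    # one-hot form of the same class

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Both forms give the same value, so the sparse version simply saves the step of converting 104 integer labels into one-hot vectors.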

## Image Augmentation

In [None]:
augs = []
for d_rec in ds_test.take(1):
    i, l = readTrainVal(d_rec)
    plt.imshow(i)
    augs.append(tf.image.random_brightness(i, 0.8))
    augs.append(tf.image.random_contrast(i, 0.2, 0.8))
    augs.append(tf.image.random_flip_left_right(i))
    augs.append(tf.image.random_flip_up_down(i))
    augs.append(tf.image.random_hue(i, 0.3))
    augs.append(tf.image.random_jpeg_quality(i, 30, 95))
    augs.append(tf.image.random_saturation(i, 0.1, 0.8))
    augs.append(tf.image.central_crop(i, central_fraction=np.random.rand()))
    augs.append(tf.image.rot90(i, k=np.random.randint(4)))

for i in augs:
    fig, ax = plt.subplots()
    plt.imshow(i)


The first of the images above shows the original. The next 9 images show different transformations of it. The data augmentation function will randomly apply some of these 9 transformations to some of the images in the training set to help prevent overfitting.

In [None]:
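def augment(img, lab):
    # NOTE: these NumPy draws run in Python. When this function is passed to
    # Dataset.map it is traced into a graph once, so the draws below are made
    # once at trace time rather than once per image.
    ch = np.random.rand()
    if ch <= 0.6:
        return img, lab #keep ~60% of images in batch same
    n_apply = np.random.randint(9) # number of transformations to apply (0 to 8)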
    make_trans = np.random.randint(9, size=n_apply)
    for m in make_trans:
        if m == 0:
            img = tf.image.random_brightness(img, 0.8)
        elif m == 1:
            img = tf.image.random_contrast(img, 0.2, 0.8)
        elif m == 2:
            img = tf.image.random_flip_left_right(img)
        elif m == 3:
            img = tf.image.random_flip_up_down(img)
        elif m == 4:
            img = tf.image.random_hue(img, 0.3)
        elif m == 5: 
            img = tf.image.random_jpeg_quality(img, 30, 95)
        elif m == 6:
            img = tf.image.random_saturation(img, 0.1, 0.8)
        elif m == 7:
            img = tf.image.central_crop(img, central_fraction=np.random.rand())
        else:
            img = tf.image.rot90(img, k=np.random.randint(4))
    return img, lab

tr = training_dataset(f=augment)

In [None]:
for d_rec in ds_test.take(1):
    i, l = readTrainVal(d_rec)
    i, l = augment(i,l)
    plt.imshow(i)

The code above applies a random number of the 9 transformations to about 40% of the images. The sample image above shows one possible result of applying the `augment` function to an image. Training the previous neural network model on this new dataset, we get:
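The "about 40%" figure can be checked with a small simulation of the selection logic. This is a sketch in plain Python that mimics only the random choices made when `augment` is called directly, not the image operations. The function names are hypothetical.

```python
import random

def fraction_transformed(keep_prob=0.6, n_choices=9):
    """Probability that an image receives at least one transformation."""
    # The number of transformations is drawn from 0..n_choices-1,
    # so it is zero with probability 1/n_choices.
    return (1 - keep_prob) * (1 - 1 / n_choices)

def simulate(n_images=100_000, seed=0):
    rng = random.Random(seed)
    transformed = 0
    for _ in range(n_images):
        if rng.random() <= 0.6:
            continue                 # image is returned unchanged
        n_apply = rng.randrange(9)   # same range as np.random.randint(9)
        if n_apply > 0:
            transformed += 1
    return transformed / n_images

print(fraction_transformed(), simulate())
```

Because the number of transformations can be zero, the share of images that are actually altered comes out slightly below 40%, at roughly 36%.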

In [None]:
mod_conv.fit(tr, epochs=5, validation_data=val)

The accuracy for both training and validation is still very low, as predicted, due to the low number of epochs and layers. It is slightly higher because I am continuing to fit a previously trained model. I should also note that I do not plan on augmenting the validation images, as I want to keep checking the model against real-life photos rather than the possibly altered and unrealistic photos that the training set should now contain.

## Building the Neural Network

### Creating Convolutional Layers

In [None]:
for image, label in tr.take(1): #Get first batch of images
    for i in range(10): #take 10 images from first batch
        fig, ax = plt.subplots()
        plt.title("train")
        plt.imshow(image[i])
        
for image, label in val.take(1): 
    for i in range(10): 
        fig, ax = plt.subplots()
        plt.title("val")
        plt.imshow(image[i])

As seen above, much of each flower image is not the actual flower, but the background. Because of this, I believe that the first convolutional layer should have a fairly large filter. From the images above, a 25x25 filter size seems good. However, there are more differentiating details of the flower as you move toward its middle, which is why I think the stride should be small throughout. The subsequent convolutional layers should have a smaller filter size and stride. First I will try to build a model with three convolutional layers and a basic dense layer. I will also increase the number of epochs to 30.

In [None]:
with strategy.scope(): #This is for determining if the model runs on the TPU
    mod_conv_2 = keras.Sequential()
    mod_conv_2.add(Conv2D(32, kernel_size=25, strides=3,padding='same' ,activation='relu', input_shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3)))
    mod_conv_2.add(MaxPooling2D(pool_size=(6, 6))) 
    mod_conv_2.add(Conv2D(64, kernel_size=11,padding='same' ,activation='relu'))
    mod_conv_2.add(MaxPooling2D(pool_size=(2, 2))) 
    mod_conv_2.add(Conv2D(128, kernel_size=3 ,padding='same' ,activation='relu'))
    mod_conv_2.add(MaxPooling2D(pool_size=(2, 2))) 
    mod_conv_2.add(Flatten())
    mod_conv_2.add(Dense(512, activation='relu'))
    mod_conv_2.add(Dense(104,activation='softmax'))
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_conv_2.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
mod_conv_2.summary()

In [None]:
history = mod_conv_2.fit(tr, epochs=30, validation_data=val)

In [None]:
def plotLoss(ep, h):
    fig, ax = plt.subplots()
    plt.plot(range(ep), h.history['loss'], label='training loss')
    plt.plot(range(ep), h.history['val_loss'], label='validation loss')
    plt.legend()
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    
def plotAcc(ep, h):
    fig, ax = plt.subplots()
    plt.plot(range(ep), h.history['accuracy'], label='accuracy')
    plt.plot(range(ep), h.history['val_accuracy'], label='validation accuracy')
    plt.legend()
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")

In [None]:
plotLoss(30, history)
plotAcc(30, history)

As seen above, while training accuracy and loss keep improving, the validation accuracy peaks and the validation loss reaches its lowest point at around 20 epochs. This means that our current model is overfitting.
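The turning point can also be read from the recorded losses instead of from the plot. Below is a minimal sketch with invented loss values. The helper `best_epoch` is hypothetical and not part of Keras.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss, and that loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]

# Invented values for illustration only, not results from this notebook.
val_loss = [4.2, 3.9, 3.7, 3.6, 3.65, 3.8]
print(best_epoch(val_loss))
```

With a real run, `best_epoch(history.history['val_loss'])` would report where the validation loss bottomed out. Keras also provides a `keras.callbacks.EarlyStopping` callback that can stop training automatically around that point.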

In [None]:
with strategy.scope(): #This is for determining if the model runs on the TPU
    mod_conv_3 = keras.Sequential()
    mod_conv_3.add(Conv2D(32, kernel_size=11,padding='same' ,activation='relu', input_shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3)))
    mod_conv_3.add(MaxPooling2D(pool_size=(2, 2))) 
    mod_conv_3.add(Conv2D(64, kernel_size=3 ,padding='same' ,activation='relu'))
    mod_conv_3.add(MaxPooling2D(pool_size=(2, 2))) 
    mod_conv_3.add(Flatten())
    mod_conv_3.add(Dense(512, activation='relu'))
    mod_conv_3.add(Dense(104,activation='softmax'))
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_conv_3.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
mod_conv_3.summary()

The model above removes one convolutional layer and decreases the filter sizes, the stride, and the number of pooling layers. As a result, the output of each pooling layer is larger. The overfitting may have been caused by the large filters, which led to overly liberal generalizations in the pooling layers.

In [None]:
history = mod_conv_3.fit(tr, epochs=30, validation_data=val)
plotLoss(30, history)
plotAcc(30, history)

The new model has an even more egregious overfitting problem. While the neural network is able to get close to 100% accuracy on the training dataset, the validation accuracy flattens out at about 25%. I also noticed that the number of nodes in the hidden layer (512) is far smaller than the number of inputs it receives from the Flatten layer (147456), so I will try adding larger dense layers. Another option is to increase the number of filters produced, which may also help boost validation scores. While validation scores are still very low, the model above did a much better job of classifying the training data, so it is possible that the convolutional layers need not change.
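The 147456 figure can be reproduced by tracking the spatial size through each layer. The sketch below applies the usual Keras shape rules (`'same'` padding gives `ceil(n / stride)`, `'valid'` gives `floor((n - kernel) / stride) + 1`, and `MaxPooling2D` defaults to a stride equal to the pool size). The helper names are hypothetical.

```python
import math

def conv_out(n, kernel, stride=1, padding="same"):
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1   # 'valid' padding

def pool_out(n, pool):
    # MaxPooling2D defaults: stride equals pool size, 'valid' padding.
    return (n - pool) // pool + 1

# mod_conv_3: Conv(11) -> Pool(2) -> Conv(3) -> Pool(2), 64 final filters
n = 192
n = conv_out(n, 11)
n = pool_out(n, 2)
n = conv_out(n, 3)
n = pool_out(n, 2)
flat_mod_conv_3 = n * n * 64

# mod_conv_2: Conv(25, stride 3) -> Pool(6) -> Conv(11) -> Pool(2) -> Conv(3) -> Pool(2), 128 final filters
m = 192
m = conv_out(m, 25, stride=3)
m = pool_out(m, 6)
m = conv_out(m, 11)
m = pool_out(m, 2)
m = conv_out(m, 3)
m = pool_out(m, 2)
flat_mod_conv_2 = m * m * 128

print(flat_mod_conv_3, flat_mod_conv_2)
```

By the same arithmetic, the earlier three-layer model passes only 512 values to its dense layer. This shows how much more aggressively its stride and pooling shrink the image.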

In [None]:
with strategy.scope(): #This is for determining if the model runs on the TPU
    mod_conv_4 = keras.Sequential()
    mod_conv_4.add(Conv2D(32, kernel_size=11,padding='same' ,activation='relu', input_shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3)))
    mod_conv_4.add(Conv2D(32, kernel_size=11, activation='relu')) # input_shape is only needed on the first layer
    mod_conv_4.add(MaxPooling2D(pool_size=(2, 2))) 
    mod_conv_4.add(Conv2D(64, kernel_size=3 ,padding='same' ,activation='relu'))
    mod_conv_4.add(Conv2D(64, kernel_size=3,activation='relu'))
    mod_conv_4.add(MaxPooling2D(pool_size=(2, 2))) 
    mod_conv_4.add(Flatten())
    mod_conv_4.add(Dense(2048, activation='relu'))
    mod_conv_4.add(Dense(104,activation='softmax'))
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_conv_4.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
mod_conv_4.summary()

In [None]:
history = mod_conv_4.fit(tr, epochs=30, validation_data=val)
plotLoss(30, history)
plotAcc(30, history)

The results are much worse for this neural network than for the previous one. The previous neural network produced the best results so far, despite its overfitting. However, I also noticed that there is another version of accuracy, sparse categorical accuracy, which I will try with the previous two models.

I should also mention that I could not make the first Dense layer as large as the output of the Flatten layer, as that kept throwing size errors. I made the Dense layer as large as I could without breaking the model.
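Counting the weights such a layer would need suggests one plausible cause of the size errors. This is a sketch only. The Flatten size of 123904 (44 x 44 x 64) is a value I derived from the layer settings of `mod_conv_4` using the usual Keras shape rules, not a number taken from the model summary, and the float32 storage size is likewise an assumption.

```python
def dense_params(n_inputs, n_units):
    # One weight per input/unit pair, plus one bias per unit.
    return n_inputs * n_units + n_units

FLATTEN_SIZE = 44 * 44 * 64   # derived assumption, see the note above
BYTES_PER_FLOAT32 = 4

used = dense_params(FLATTEN_SIZE, 2048)
full = dense_params(FLATTEN_SIZE, FLATTEN_SIZE)

print(f"Dense(2048):      {used:,} parameters, {used * BYTES_PER_FLOAT32 / 1e9:.1f} GB")
print(f"Dense({FLATTEN_SIZE}): {full:,} parameters, {full * BYTES_PER_FLOAT32 / 1e9:.1f} GB")
```

A dense layer as wide as the Flatten output would need billions of weights, so memory limits may well be what the errors were reporting.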

In [None]:
def plotSparse(ep, h):
    fig, ax = plt.subplots()
    plt.plot(range(ep), h.history['sparse_categorical_accuracy'], label='accuracy')
    plt.plot(range(ep), h.history['val_sparse_categorical_accuracy'], label='validation accuracy')
    plt.legend()
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")

In [None]:
with strategy.scope():
    mod_conv_5 = keras.models.clone_model(mod_conv_3)
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_conv_5.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
mod_conv_5.summary()

In [None]:
history = mod_conv_5.fit(tr, epochs=30, validation_data=val)

In [None]:
plotLoss(30, history)
plotSparse(30, history)

In [None]:
with strategy.scope():
    mod_conv_6 = keras.models.clone_model(mod_conv_4)
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_conv_6.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
mod_conv_6.summary()

In [None]:
history = mod_conv_6.fit(tr, epochs=30, validation_data=val)
plotLoss(30, history)
plotSparse(30, history)

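As seen above, the new metric makes no difference for the duplicate of the 3rd model. This is expected: with integer labels and a sparse categorical cross-entropy loss, Keras resolves the `'accuracy'` string to sparse categorical accuracy, so the two metrics are the same computation. For the duplicate of the 4th model, the curves stay mostly the same, but they do start to improve towards the end, especially the loss. We can try to run this for 20 more epochs.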

In [None]:
history = mod_conv_6.fit(tr, epochs=20, validation_data=val)

In [None]:
plotLoss(20, history)
plotSparse(20, history)

As seen above, the validation sparse categorical accuracy does continue to improve, but very slowly, so it is not efficient to keep training this model. None of the convolutional neural networks built from scratch are working particularly well, which is why I plan on incorporating a pre-trained neural network into the model.

### Using Pre-trained Models


After doing some research, I believe that the best neural network architectures to use would be ResNet and Google's InceptionNet, as they seem to be the most detailed and least error-prone. Both will use weights pre-trained on the ImageNet database. After looking around on the ImageNet website, it does seem to have a detailed collection of flower images.

In [None]:
with strategy.scope():
    pt = keras.applications.resnet.ResNet50(weights='imagenet', input_shape=[*IMAGE_SIZE, 3], include_top=False)
    mod_pt_1 = keras.Sequential()
    mod_pt_1.add(pt)
    mod_pt_1.add(keras.layers.GlobalAveragePooling2D()) #using this to get average of large datasets
    mod_pt_1.add(Dense(2048, activation='relu'))
    mod_pt_1.add(Dense(104,activation='softmax'))
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_pt_1.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
mod_pt_1.summary()

In [None]:
history = mod_pt_1.fit(tr, epochs=30, validation_data=val)

In [None]:
plotLoss(30, history)
plotSparse(30, history)

As seen above, while there is still some overfitting, this pre-trained model does a much better job of classifying validation data, and therefore overfits less, than any of the convolutional models made from scratch.

In [None]:
with strategy.scope():
    pt = keras.applications.inception_v3.InceptionV3(weights='imagenet', input_shape=[*IMAGE_SIZE, 3], include_top=False)
    mod_pt_2 = keras.Sequential()
    mod_pt_2.add(pt)
    mod_pt_2.add(keras.layers.GlobalAveragePooling2D()) #using this to get average of large datasets
    mod_pt_2.add(Dense(2048, activation='relu'))
    mod_pt_2.add(Dense(104,activation='softmax'))
    o = keras.optimizers.SGD(learning_rate = 0.05, momentum=0.3)
    mod_pt_2.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
mod_pt_2.summary()

In [None]:
history = mod_pt_2.fit(tr, epochs=30, validation_data=val)

In [None]:
plotLoss(30, history)
plotSparse(30, history)

The InceptionNet model is by far the most effective model made yet. While there is still a slight overfitting concern, the loss and accuracy scores for training and validation are the closest to each other out of all the models. It does not make sense to train this model for more epochs, as the curves flatten out.
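One detail that may be worth checking with pre-trained weights is the input range. The ImageNet weights for InceptionV3 in Keras expect pixels scaled to [-1, 1], which is what `keras.applications.inception_v3.preprocess_input` produces from 0-255 pixels, while the pipeline in this notebook produces values in [0, 1]. I did not test whether this affected the results above. The sketch below only shows the rescaling arithmetic in plain Python, and the helper name is hypothetical.

```python
def to_inception_range(pixels):
    """Map pixel values from [0, 1] to [-1, 1]."""
    return [p * 2.0 - 1.0 for p in pixels]

# Invented pixel values in the [0, 1] range used by this notebook.
print(to_inception_range([0.0, 0.5, 1.0]))
```

ResNet50 uses a different convention (channel-wise mean subtraction on 0-255 pixels), so each pre-trained model would need its own `preprocess_input` step if this were adopted.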

### Optimizers

So far I have been using a custom SGD optimizer with `learning_rate = 0.05` and `momentum = 0.3`. From previous experience throughout the semester, I have learned that too high a learning rate and/or momentum generally gives poor results, but I can still test them again. Additionally, I will be testing the Adagrad and Adam optimizers. Since the InceptionNet model gave the best results of all the models, I will be reusing it.
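To make the roles of the two SGD settings concrete, here is a minimal sketch of the momentum update on a toy problem. It follows the update rule documented for Keras' SGD without Nesterov momentum (`velocity = momentum * velocity - learning_rate * gradient`, then `w = w + velocity`). The toy function f(w) = w² is made up for illustration.

```python
def sgd_momentum_step(w, v, grad, learning_rate=0.05, momentum=0.3):
    # The velocity carries over a fraction of the previous step.
    v = momentum * v - learning_rate * grad
    return w + v, v

# Minimise f(w) = w**2, whose gradient is 2*w, starting from w = 1.0.
w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, v, grad=2 * w)
    print(step + 1, round(w, 4), round(v, 4))
```

A larger `momentum` makes each step depend more on earlier gradients, while a larger `learning_rate` scales the step directly. Pushing either one too high can make the weights overshoot the minimum.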

In [None]:
opt = [keras.optimizers.SGD(momentum=0.7), keras.optimizers.SGD(learning_rate = 0.7), 'adagrad', 'adam']
hist = []
for o in opt:
    with strategy.scope():
        pt = keras.applications.inception_v3.InceptionV3(weights='imagenet', input_shape=[*IMAGE_SIZE, 3], include_top=False)
        m = keras.Sequential()
        m.add(pt)
        m.add(keras.layers.GlobalAveragePooling2D()) #using this to get average of large datasets
        m.add(Dense(2048, activation='relu'))
        m.add(Dense(104,activation='softmax'))
        m.compile(optimizer=o, loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
    history = m.fit(tr, epochs=30, validation_data=val)
    print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')
    hist.append(history)

In [None]:
print(len(hist))
for i,h in enumerate(hist):
    if i == 0:
        print('High Momentum')
    elif i == 1:
        print('High learning rate')
    elif i == 2:
        print('Adagrad')
    else:
        print('Adam')
    plotLoss(30, h)
    plotSparse(30, h)

From the results above, it seems that the custom optimizer used throughout the notebook and Adagrad perform very similarly, but Adagrad narrowly edges it out as the best optimizer with a slightly higher accuracy.

### Other Tweaks

#### Batch Sizing

As mentioned earlier, overly large batch sizes could have been a problem, so I will now try reducing them and test the results.

In [None]:
with strategy.scope():
    pt = keras.applications.inception_v3.InceptionV3(weights='imagenet', input_shape=[*IMAGE_SIZE, 3], include_top=False)
    best_model = keras.Sequential()
    best_model.add(pt)
    best_model.add(keras.layers.GlobalAveragePooling2D()) #using this to get average of large datasets
    best_model.add(Dense(2048, activation='relu'))
    best_model.add(Dense(104,activation='softmax'))
    best_model.compile(optimizer='adagrad', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])

best_model.summary()
tr_2 = training_dataset(batch_factor=3)
val_2 = validation_dataset(batch_factor=3)

In [None]:
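with strategy.scope(): # build the clone inside the TPU strategy scope, like the other models
    bt_1 = keras.models.clone_model(best_model)
    bt_1.set_weights(best_model.get_weights()) # clone_model re-initializes weights, so copy the ImageNet weights back in
    bt_1.compile(optimizer='adagrad', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
h = bt_1.fit(tr_2, epochs=20, validation_data=val_2)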
plotLoss(20, h)
plotSparse(20, h)

I ended the training for this model early because it was taking too long and the accuracy was not significantly improving with the smaller batch size. Sticking with the original batch size (half the images in a file, i.e. 399 training images per batch) seems to be the best choice at this point.

#### Dropout Layers

To reduce overfitting, I will add a dropout layer. Since there are some overfitting concerns, I will make the dropout rate fairly high (0.3). This should yield a lower accuracy score for both the validation and training data.
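As a rough picture of what a dropout rate of 0.3 does during training, the sketch below zeroes about 30% of a list of activations and scales the rest by 1 / (1 - 0.3), which is how Keras' `Dropout` layer is documented to behave. It is plain Python with invented activations, not the Keras implementation.

```python
import random

def dropout(values, rate=0.3, seed=0):
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)
    # Each value is dropped with probability `rate`; survivors are scaled up
    # so that the expected sum stays the same.
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

activations = [1.0] * 10_000          # invented activations
dropped = dropout(activations)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(round(zero_fraction, 3), round(mean_after, 3))
```

At prediction time dropout is switched off, so the layer only affects training.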

In [None]:
with strategy.scope():
    pt = keras.applications.inception_v3.InceptionV3(weights='imagenet', input_shape=[*IMAGE_SIZE, 3], include_top=False)
    final_model = keras.Sequential()
    final_model.add(pt)
    final_model.add(keras.layers.GlobalAveragePooling2D()) #using this to get average of large datasets
    final_model.add(Dropout(0.3))
    final_model.add(Dense(2048, activation='relu'))
    final_model.add(Dense(104,activation='softmax'))
    final_model.compile(optimizer='adagrad', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])

final_model.summary()

In [None]:
h = final_model.fit(tr, epochs=30, validation_data=val)
plotLoss(30, h)
plotSparse(30, h)

Overall, the Dropout layer didn't affect accuracy or loss all that much.

## Conclusions

In [None]:
def export(model):
    pic_batch = tst.map(lambda image, idnum: image)
    p = model.predict(pic_batch)
    pred = np.argmax(p, axis=-1)
    i = 0
    for f in pic_batch.unbatch().take(10):
        fig, ax = plt.subplots()
        plt.title(CLASSES[pred[i]])
        plt.imshow(f)
        i += 1

export(final_model)
    

As seen above with some of the test data, the predictions have ended up not working so well. Nonetheless, this was a very complex project with a lot of filtering required. Many times the main feature in a flower picture was not even the flower itself, which probably led to some inaccurate results. It's also clear that overfitting was prevalent in this model, and more bias towards the validation data may have resulted from more and more runs. But given the circumstances and time constraints, this is probably the most accurate model so far, so I will export it and use it in the web app.

In [None]:
final_model.save('final_model.h5')