# Dog Breed Identification

Who's a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don't have all the answers. However, maybe they can answer that ubiquitous question we all ask when meeting a four-legged stranger: what kind of good pup is that?

In this playground competition, you are provided a strictly canine subset of ImageNet in order to practice fine-grained image categorization. How well can you tell your Norfolk Terriers from your Norwich Terriers? With 120 breeds of dogs and a limited number of training images per class, you might find the problem more, err, ruff than you anticipated.

![Dataset samples](https://storage.googleapis.com/kaggle-competitions/kaggle/3333/media/border_collies.png)

## Acknowledgments

We extend our gratitude to the creators of the [Stanford Dogs Dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/) for making this competition possible: Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Fei-Fei Li.

## Problem

Identifying the breed of a dog given an image of the dog.

## Data

The data we're using is from Kaggle's [Dog Breed Identification](https://www.kaggle.com/c/dog-breed-identification/data) competition.

## Evaluation

The evaluation is a file with prediction probabilities for each dog breed of each test image, as stated [here](https://www.kaggle.com/c/dog-breed-identification/overview/evaluation).
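
For context, the linked evaluation page describes the scoring metric as multi-class log loss, as far as we understand it. The following is a minimal sketch of that metric in plain Python, not Kaggle's own scoring code. The function name `multiclass_log_loss`, the toy numbers and the clipping constant `eps` are all assumptions made for illustration.

```python
import math


def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    y_true: list of true class indexes, one per sample
    y_pred: list of probability lists, one per sample
    eps: clipping constant to avoid log(0) (an assumed value)
    """
    total = 0.0
    for true_idx, probs in zip(y_true, y_pred):
        # Clip the probability so that log() never receives exactly 0 or 1
        p = min(max(probs[true_idx], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Two samples, three classes (toy numbers)
toy_true = [0, 2]
toy_pred = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
toy_loss = multiclass_log_loss(toy_true, toy_pred)
print(toy_loss)
```

Lower values are better: the loss shrinks as the model puts more probability on the correct breed.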

## Features

Some information about the data:

* We're dealing with images (unstructured data) so it's probably best we use deep learning/transfer learning.
* There are 120 breeds of dogs (this means there are 120 different classes).
* There are around 10,000+ images in the training set (these images have labels).
* There are around 10,000+ images in the test set (these images have no labels, because we'll want to predict them).

## Workspace setup

In [None]:
# Import necessary tools
import tensorflow as tf
import tensorflow_hub as hub
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import os
import datetime

In [None]:
print("TF version:", tf.__version__)
print("TF Hub version:", hub.__version__)

# Check for GPU availability
print("GPU", "available" if tf.config.list_physical_devices("GPU") else "not available")

In [None]:
DATA_PATH = "/kaggle/input/dog-breed-identification/"
MODELS_PATH = "/kaggle/working/models/"
LOGS_PATH = "/kaggle/working/logs/"
OUTPUT_PATH = "/kaggle/working/output/"

# Make sure that the required directories exist
for path in (MODELS_PATH, LOGS_PATH, OUTPUT_PATH):
    os.makedirs(path, exist_ok=True)

# Data Loading

As with all machine learning models, our data has to be in numerical format. So that's what we'll be doing first: turning our images into **Tensors**.

Let's start by accessing our data and checking out the labels.

In [None]:
labels_csv = pd.read_csv(DATA_PATH + "labels.csv")
display(labels_csv.describe())
display(labels_csv.head())

In [None]:
# How many images are there of each breed?
labels_csv.breed.value_counts().plot.bar(figsize=(20, 10))

In [None]:
labels_csv.breed.value_counts().median()

In [None]:
# Let's view an image
from IPython.display import Image
Image(DATA_PATH + "train/001513dfcb2ffafc82cccf4d8bbaba97.jpg")

## Getting images and their labels

Let's get a list of all our image file path names.

In [None]:
filenames = [DATA_PATH + f"train/{fname}.jpg" for fname in labels_csv["id"]]
filenames[:10]

In [None]:
# Check whether number of filenames matches number of actual image files
if len(os.listdir(DATA_PATH + "train")) == len(filenames):
    print("Filenames match actual amount of files! Proceed.")
else:
    print(
        "Filenames do not match actual amount of files! Check target directory."
    )

In [None]:
# One more check
print(labels_csv.breed[9000])
Image(filenames[9000])

Since we've got our training image filepaths in a list, let's prepare our labels.

In [None]:
labels = labels_csv.breed.values
labels

In [None]:
len(labels)

In [None]:
# See if number of labels matches the number of filenames
if len(labels) == len(filenames):
    print("Number of labels matches number of filenames!")
else:
    print(
        "Number of labels does not match number of filenames! Check data directory."
    )

In [None]:
# Find the unique label values
unique_breeds = np.unique(labels)
print(len(unique_breeds))
print(unique_breeds)

## One-Hot Encoding

In [None]:
# Turn a single label into an array of booleans (one-hot array)
print(labels[0])
labels[0] == unique_breeds

In [None]:
# Turn every label into a boolean array
one_hot_labels = [label == unique_breeds for label in labels]
one_hot_labels[:2]
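
To make the encoding concrete, here is a small sketch of the same comparison trick on made-up data. The names `toy_labels`, `toy_classes` and `toy_one_hot` are hypothetical stand-ins for `labels`, `unique_breeds` and `one_hot_labels`. It also shows that `argmax` turns a one-hot vector back into its label.

```python
import numpy as np

# Hypothetical toy labels standing in for the breed labels
toy_labels = np.array(["pug", "beagle", "pug", "collie"])
toy_classes = np.unique(toy_labels)  # sorted unique classes

# Comparing one label against all classes gives a boolean one-hot vector
toy_one_hot = np.array([label == toy_classes for label in toy_labels])
print(toy_one_hot.astype(int))

# argmax recovers the class index, and indexing recovers the label
decoded = toy_classes[toy_one_hot.argmax(axis=1)]
print(decoded)
```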

## Creating our own validation set

Since the dataset from Kaggle doesn't come with a validation set, we're going to create our own.

In [None]:
# Setup X & y
X = filenames
y = one_hot_labels

We're going to start off experimenting with ~1000 images and increase as needed.

In [None]:
# Set number of images to use for experimenting
NUM_IMAGES = 1000

In [None]:
# Split our data into training and validation of total size NUM_IMAGES
X_train, X_val, y_train, y_val = train_test_split(X[:NUM_IMAGES],
                                                  y[:NUM_IMAGES],
                                                  test_size=0.2,
                                                  random_state=42)

len(X_train), len(X_val), len(y_train), len(y_val)

In [None]:
# Let's have a look at our training data
X_train[:2], y_train[:2]

## Preprocessing Image (turning images into Tensors)

To preprocess our images into Tensors we're going to write a function which does a few things:

1. Take an image filepath as input
2. Use TensorFlow to read the file and save it to a variable `image`
3. Turn our `image` (a jpg) into Tensors
4. Resize the `image` to be a shape of (224, 224)
5. Return the modified `image`


In [None]:
IMG_SIZE = 224


# Function for preprocessing images
def process_image(image_path, img_size=IMG_SIZE):
    """
    Takes an image filepath and turns it into a Tensor
    """
    # Read the image file
    image = tf.io.read_file(image_path)
    # Turn the jpeg image into a numerical Tensor with 3 color channels (Red, Green, Blue)
    image = tf.image.decode_jpeg(image, channels=3)
    # Convert the color channels values range from 0-255 to 0-1
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Resize the image to our desired values (224, 224)
    image = tf.image.resize(image, size=(img_size, img_size))
    # Return the modified image
    return image
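
As a rough illustration of the normalization step, the sketch below uses NumPy as a stand-in for the TensorFlow call. It assumes, as we understand the behaviour of `tf.image.convert_image_dtype`, that converting 8-bit integers to `float32` rescales values from the 0-255 range to the 0-1 range. The array `toy_image` is made up.

```python
import numpy as np

# A toy 2x2 RGB "image" with 8-bit integer pixel values
toy_image = np.array([[[0, 128, 255], [255, 255, 255]],
                      [[64, 64, 64], [0, 0, 0]]], dtype=np.uint8)

# Scale from the 0-255 integer range to the 0-1 float range
scaled = toy_image.astype(np.float32) / 255.0
print(scaled.dtype, scaled.min(), scaled.max())
```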

## Turning our data into batches

Why turn our data into batches?

Let's say you're trying to process 10,000+ images in one go... they might not all fit into memory.

So that's why we do about 32 (this is batch size) images at a time (you can manually adjust the batch size if needed).
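
The idea of batching can be sketched in plain Python, independent of TensorFlow. The helper `iterate_batches` and the file names below are hypothetical; they only show how 1000 items break down into chunks of 32.

```python
import math


def iterate_batches(items, batch_size=32):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 1000 hypothetical file paths split into batches of 32
toy_paths = [f"image_{i}.jpg" for i in range(1000)]
toy_batches = list(iterate_batches(toy_paths, batch_size=32))
expected_batches = math.ceil(len(toy_paths) / 32)

print(len(toy_batches), expected_batches)  # number of batches
print(len(toy_batches[-1]))                # size of the final, smaller batch
```

Only one batch needs to be held in memory at a time, and the final batch is simply smaller when the total is not a multiple of the batch size.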

In order to use TensorFlow effectively, we need our data in the form of Tensor tuples which look like this: (`image`, `label`)

In [None]:
# Simple function to return a tuple (image, label)
def get_image_label(image_path, label):
    """
    Takes an image filepath name and the associated label, processes the image and returns a tuple of (image, label)
    """
    image = process_image(image_path)
    return image, label

Now we've got a way to turn our data into tuples of Tensors in the form: (`image`, `label`), let's make a function to turn all of our data (`X` & `y`) into batches!

In [None]:
# Define the batch size. 32 is a good start
BATCH_SIZE = 32


# Function to turn data into batches
def create_data_batches(X,
                        y=None,
                        batch_size=BATCH_SIZE,
                        valid_data=False,
                        test_data=False):
    """
    Creates batches of data out of image (X) and label (y) pairs.
    Shuffles the data if it's training data, but doesn't shuffle validation data.
    Also accepts test data as input (no labels).
    """
    # If the data is a test dataset, we probably don't have labels
    if test_data:
        print("Creating test data batches...")
        data = tf.data.Dataset.from_tensor_slices(
            (tf.constant(X)))  # only filepaths (no labels)
        data_batch = data.map(process_image).batch(batch_size)
        return data_batch

    # If the data is a validation dataset, we don't need to shuffle it
    elif valid_data:
        print("Creating validation data batches...")
        data = tf.data.Dataset.from_tensor_slices((
            tf.constant(X),  # filepaths
            tf.constant(y)))  # labels
        data_batch = data.map(get_image_label).batch(batch_size)
        return data_batch

    else:
        print("Creating training data batches...")
        data = tf.data.Dataset.from_tensor_slices(
            (tf.constant(X), tf.constant(y)))
        # Shuffling pathnames and labels before mapping the image processor function is faster than shuffling images
        data = data.shuffle(buffer_size=len(X))

        # Create (image, label) tuples (this also turns the image path into a preprocessed image)
        data_batch = data.map(get_image_label).batch(batch_size)

        return data_batch

In [None]:
# Create training and validation data batches
train_data = create_data_batches(X_train, y_train)
val_data = create_data_batches(X_val, y_val, valid_data=True)

In [None]:
# Check out the different attributes of our data batches
train_data.element_spec, val_data.element_spec

## Visualizing data batches

Our data is now in batches. However, these can be a little hard to understand/comprehend. Let's visualize them!

In [None]:
# Function for viewing images in a data batch
def show_25_images(images, labels):
    """
    Displays a plot of 25 images and their labels from a data batch.
    """
    # Setup the figure
    plt.figure(figsize=(10, 10))
    # Loop through 25 images and display them in a 5 x 5 grid
    for i in range(25):
        ax = plt.subplot(5, 5, i + 1)
        # Display an image
        plt.imshow(images[i])
        # Add the image label as the title
        plt.title(unique_breeds[labels[i].argmax()])
        # Turn the grid lines off
        plt.axis("off")

In [None]:
# Let's visualize our training set
train_images, train_labels = next(train_data.as_numpy_iterator())
show_25_images(train_images, train_labels)

In [None]:
# Now let's visualize our validation set
val_images, val_labels = next(val_data.as_numpy_iterator())
show_25_images(val_images, val_labels)

# Building a model

Before we build a model, there are a few things we need to define:

* The input shape (our images shape, in the form of Tensors) to our model.
* The output shape (image label, in the form of Tensors) of our model.
* The URL of the model we want to use from [TensorFlow Hub](https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4)

In [None]:
INPUT_SHAPE = [None, IMG_SIZE, IMG_SIZE,
               3]  # batch, height, width, color channels

# Setup output shape of our model
OUTPUT_SHAPE = len(unique_breeds)

# Setup the MobileNetV2 model URL from TensorFlow hub
MODEL_URL = "https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4"

Now we've got our inputs, outputs and model ready to go, let's put them together into a Keras deep learning model.

Knowing this, let's create a function which:
* Takes an input shape, output shape and the model we've chosen as parameters.
* Defines the layers in a Keras model in sequential fashion (do this first, then this, then that).
* Compiles the model (says how it should be evaluated and improved).
* Builds the model (tells the model the input shape it'll be getting).
* Returns the model.

All these steps can be found [here](https://www.tensorflow.org/guide/keras/sequential_model).

In [None]:
# Function which builds a Keras model
def create_model(input_shape=INPUT_SHAPE,
                 output_shape=OUTPUT_SHAPE,
                 model_url=MODEL_URL):
    print("Building model with:", model_url)

    # Setup the model layers
    model = tf.keras.Sequential([
        hub.KerasLayer(model_url),  # layer 1 (input layer)
        tf.keras.layers.Dense(units=output_shape,
                              activation="softmax")  # layer 2 (output layer)
    ])

    # Compile the model
    model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=["accuracy"])

    # Build the model
    model.build(input_shape)

    return model

In [None]:
model = create_model()
model.summary()
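
The output layer above uses a softmax activation, which turns raw scores into probabilities that sum to 1. Here is a minimal NumPy sketch of that calculation. It is not the Keras implementation, and the `softmax` helper and the scores in `toy_logits` are made up for illustration.

```python
import numpy as np


def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)


toy_logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
toy_probs = softmax(toy_logits)
print(toy_probs, toy_probs.sum())
```

In our model the vector has 120 entries, one per breed, and the largest entry is taken as the predicted breed.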

## Creating callbacks

Callbacks are helper functions a model can use during training to do such things as save its progress, check its progress or stop training early if a model stops improving.

We'll create two callbacks. One for TensorBoard, which helps track our model's progress, and another for early stopping, which prevents our model from training for too long.

### TensorBoard callback

To set up a TensorBoard callback, we need to do 3 things:
1. Load the TensorBoard notebook extension.
2. Create a TensorBoard callback which is able to save logs to a directory and pass it to our model's `fit()` function.
3. Visualize our model's training logs with the `%tensorboard` magic function (we'll do this after model training).

In [None]:
# Load TensorBoard notebook extension
%load_ext tensorboard

In [None]:
# Function to build a TensorBoard callback
def create_tensorboard_callback():
    # Create a log directory for storing TensorBoard logs
    logdir = os.path.join(
        LOGS_PATH,  # make it so the logs get tracked whenever we run an experiment
        datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    return tf.keras.callbacks.TensorBoard(logdir)

### Early Stopping callback

[Early Stopping](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping) helps stop our model from overfitting by stopping training if a certain metric stops improving.

In [None]:
# Create Early Stopping callback
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                                  patience=3)
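
To get a feel for what `patience=3` means, here is a simplified sketch of the stopping rule in plain Python. It is not the Keras implementation, which has more options. The function `epochs_until_stop` and the accuracy values are hypothetical.

```python
def epochs_until_stop(metric_history, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: stop once the metric has failed to beat its best
    value for `patience` consecutive epochs (higher is better).
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(metric_history, start=1):
        if value > best:
            # New best value: remember it and reset the counter
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


toy_val_accuracy = [0.50, 0.60, 0.65, 0.64, 0.65, 0.63, 0.62]
stop_epoch = epochs_until_stop(toy_val_accuracy, patience=3)
print(stop_epoch)
```

In this toy history the best value is reached at epoch 3, so three epochs without improvement are used up by epoch 6.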

# Training a model (on a subset of data)

Our first model is only going to train on 1000 images, to make sure everything is working.

Let's create a function which trains a model.
* Create a model using `create_model()`.
* Setup a TensorBoard callback using `create_tensorboard_callback()`.
* Call the `fit()` function on our model passing it the training data, validation data, number of epochs to train for (`NUM_EPOCHS`) and the callbacks we'd like to use.
* Return the model.

In [None]:
NUM_EPOCHS = 100


# Function to train and return a trained model
def train_model(num_epochs=NUM_EPOCHS):
    """
    Creates a model, trains it for up to num_epochs epochs and returns the trained version.
    """
    # Create a model
    model = create_model()

    # Create a new TensorBoard session every time we train a model
    tensorboard = create_tensorboard_callback()

    # Fit the model to the data passing it the callbacks we created
    model.fit(x=train_data,
              epochs=num_epochs,
              validation_data=val_data,
              validation_freq=1,
              callbacks=[tensorboard, early_stopping])

    # Return the fitted model
    return model

In [None]:
# Fit the model to the data
model = train_model()

## Checking the TensorBoard logs

The TensorBoard magic function (`%tensorboard`) will access the logs directory we created earlier and visualize its contents.

In [None]:
%tensorboard --logdir $LOGS_PATH

# Making and evaluating predictions using a trained model

In [None]:
predictions = model.predict(val_data, verbose=1)
predictions

In [None]:
index = 42
print(predictions[index])
print(f"Max value (probability of prediction): {np.max(predictions[index])}")
print(f"Sum: {np.sum(predictions[index])}")
print(f"Max index: {np.argmax(predictions[index])}")
print(f"Predicted label: {unique_breeds[np.argmax(predictions[index])]}")

In [None]:
unique_breeds[113]

Having the above functionality is great but we want to be able to do it at scale. And it would be even better if we could see the image the prediction is being made on!

**Note:** Prediction probabilities are also known as *confidence levels*.

In [None]:
# Turn prediction probabilities into their respective label (easier to understand)
def get_pred_label(prediction_probabilities):
    """
  Turn an array of prediction probabilities into a label
  """
    return unique_breeds[np.argmax(prediction_probabilities)]

In [None]:
# Get a predicted label based on an array of prediction probabilities
get_pred_label(predictions[81])

Now since our validation data is still in a batch dataset, we'll have to unbatchify it to get the validation images and then compare the predictions to the validation labels (truth labels).

In [None]:
# Function to unbatchify a batch dataset
def unbatchify(data):
    """
    Takes a batched dataset of (image, label) Tensors and returns separate lists
    of images and labels
    """
    images = []
    labels = []
    # Loop through unbatched data
    for image, label in data.unbatch().as_numpy_iterator():
        images.append(image)
        # Turn the one-hot label back into its breed name
        labels.append(get_pred_label(label))

    return images, labels

In [None]:
# Unbatchify the validation data
val_images, val_labels = unbatchify(val_data)
val_images[0], val_labels[0]

Now we've got ways to get:
* Prediction labels
* Validation labels
* Validation images

Let's make some functions to make these a bit more visual.

We'll create a function which:
* Takes an array of prediction probabilities, an array of truth labels, an array of images and an integer.
* Converts the prediction probabilities to a predicted label.
* Plots the predicted label, its probability, the truth label and the target image in a single plot.

In [None]:
def plot_pred(prediction_probabilities, labels, images, n=1):
    """
    View the prediction, ground truth and image for sample n
    """
    pred_prob, true_label, image = prediction_probabilities[n], labels[
        n], images[n]

    # Get the pred label
    pred_label = get_pred_label(pred_prob)

    # Plot the image & remove ticks
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])

    # Change the color of the title depending if the prediction is right or wrong
    if pred_label == true_label:
        color = "green"
    else:
        color = "red"

    # Change plot title to be predicted, probability of prediction and truth label
    plt.title(f"{pred_label} {np.max(pred_prob)*100:2.0f}% {true_label}",
              color=color)

In [None]:
plot_pred(prediction_probabilities=predictions,
          labels=val_labels,
          images=val_images)

Now we've got one function to visualize our model's top prediction, let's make another to view our model's top 10 predictions.

This function will:
* Take an array of prediction probabilities, a ground truth array and an integer as input.
* Find the predicted label using `get_pred_label()`.
* Find the top 10:
  * Prediction probability indexes
  * Prediction probability values
  * Prediction labels
* Plot the top 10 prediction probability values and labels, coloring the true label green.

In [None]:
def plot_pred_conf(prediction_probabilities, labels, n=1):
    """
  Plot the top 10 highest prediction confidences along with the truth label for
  sample n
  """
    pred_prob, true_label = prediction_probabilities[n], labels[n]

    # Get the predicted label
    pred_label = get_pred_label(pred_prob)

    # Find the top 10 prediction confidence indexes
    top_10_pred_indexes = pred_prob.argsort()[-10:][::-1]
    # Find the top 10 prediction confidence values
    top_10_pred_values = pred_prob[top_10_pred_indexes]
    # Find the top 10 prediction labels
    top_10_pred_labels = unique_breeds[top_10_pred_indexes]

    # Setup plot
    top_plot = plt.bar(np.arange(len(top_10_pred_labels)),
                       top_10_pred_values,
                       color="grey")
    plt.xticks(np.arange(len(top_10_pred_labels)),
               labels=top_10_pred_labels,
               rotation="vertical")

    # Change color of true label
    if np.isin(true_label, top_10_pred_labels):
        top_plot[np.argmax(
            top_10_pred_labels == true_label)].set_color("green")
    else:
        pass

In [None]:
plot_pred_conf(prediction_probabilities=predictions, labels=val_labels, n=9)
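
The `argsort()[-10:][::-1]` idiom used in `plot_pred_conf` can be hard to read at first. Here is a small sketch of the same idea with made-up probabilities and `k = 3` instead of 10 (the names `toy_probs` and `k` are ours).

```python
import numpy as np

toy_probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # hypothetical probabilities
k = 3

# argsort sorts ascending, so take the last k indexes and reverse them
top_k_indexes = toy_probs.argsort()[-k:][::-1]
top_k_values = toy_probs[top_k_indexes]
print(top_k_indexes, top_k_values)
```

The result lists the indexes of the `k` largest values, from most to least confident.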

Now we've got some functions to help us visualize our predictions and evaluate our model, let's check out a few.

In [None]:
# Let's check out a few predictions and their different values
i_multiplier = 10
n_rows = 3
n_cols = 2
n_images = n_cols * n_rows
plt.figure(figsize=(10 * n_cols, 5 * n_rows))
for i in range(n_images):
    plt.subplot(n_rows, 2 * n_cols, 2 * i + 1)
    plot_pred(prediction_probabilities=predictions,
              labels=val_labels,
              images=val_images,
              n=i + i_multiplier)
    plt.subplot(n_rows, 2 * n_cols, 2 * i + 2)
    plot_pred_conf(prediction_probabilities=predictions,
                   labels=val_labels,
                   n=i + i_multiplier)
plt.show()

## Confusion matrix

In [None]:
def plot_conf_matrix(prediction_probabilities, labels):
    """
  Plot the confusion matrix of a trained model given its prediction
  probabilities and desired labels
  """
    # First, we get the corresponding labels of the predictions
    pred_labels = [
        get_pred_label(pred_probs) for pred_probs in prediction_probabilities
    ]

    # Check which breeds are present in both the true and the predicted labels
    breeds_in_true_labels = set(labels)
    breeds_in_pred_labels = set(pred_labels)
    breeds_in_set = [
        breed for breed in unique_breeds
        if breed in breeds_in_pred_labels and breed in breeds_in_true_labels
    ]

    # Computes the confusion matrix
    conf_mat = confusion_matrix(labels, pred_labels, labels=breeds_in_set)

    # Builds the confusion matrix dataframe (for the x and y ticks in the heatmap)
    conf_df = pd.DataFrame(conf_mat,
                           index=breeds_in_set,
                           columns=breeds_in_set)
    conf_df.dropna(inplace=True)

    # Now we plot the confusion matrix
    fig, ax = plt.subplots(figsize=(20, 20))
    conf_plot = sns.heatmap(conf_df, annot=True, cbar=False)

    plt.title("Confusion matrix")
    # Rows of the confusion matrix are true labels, columns are predicted labels
    plt.xlabel("Predicted label")
    plt.ylabel("True label")

In [None]:
plot_conf_matrix(predictions, val_labels)
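
When reading a confusion matrix it helps to keep the layout in mind: in scikit-learn's `confusion_matrix`, rows correspond to true labels and columns to predicted labels. The sketch below rebuilds that layout by hand with NumPy on made-up labels, purely to illustrate the convention. All `toy_*` names are hypothetical.

```python
import numpy as np

toy_classes = ["beagle", "collie", "pug"]  # hypothetical class names
toy_true = ["beagle", "pug", "pug", "collie", "pug"]
toy_pred = ["beagle", "pug", "collie", "collie", "pug"]

class_to_index = {name: i for i, name in enumerate(toy_classes)}
toy_conf = np.zeros((len(toy_classes), len(toy_classes)), dtype=int)

# Row = true class, column = predicted class
for true_name, pred_name in zip(toy_true, toy_pred):
    toy_conf[class_to_index[true_name], class_to_index[pred_name]] += 1

print(toy_conf)
```

Correct predictions land on the diagonal; the single off-diagonal count here is a pug that was predicted as a collie.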

## Saving and reloading a trained model

In [None]:
# Create a function to save a model
def save_model(model, suffix=None):
    """
    Saves a given model in the models directory, appending an optional suffix (string)
    to the timestamped file name.
    """
    # Build a model file path from the current time
    model_path = os.path.join(MODELS_PATH,
                              datetime.datetime.now().strftime("%Y%m%d_%H%M%S"))
    # Only append the suffix if one was given (avoids a TypeError when suffix is None)
    if suffix:
        model_path += "_" + suffix
    model_path += ".h5"  # model save format
    print(f"Saving model to: {model_path}...")
    model.save(model_path)
    return model_path

In [None]:
# Create a function to load a trained model
def load_model(model_path):
    print(f"Loading saved model from: {model_path}...")
    model = tf.keras.models.load_model(
        model_path, custom_objects={"KerasLayer": hub.KerasLayer})
    return model

Now we've got functions to save and load a trained model, let's make sure they work!

In [None]:
# Save our model trained on 1000 images
model_path = save_model(model, suffix="1000_images_mobilenetv2_Adam")

In [None]:
# Load a trained model
loaded_1000_image_model = load_model(model_path)

In [None]:
model.evaluate(val_data)

In [None]:
model.metrics_names

# Training on the full data

In [None]:
# Create a data batch with the full data set
full_data = create_data_batches(X, y)

In [None]:
full_data

In [None]:
# Create a model for the full data set
full_model = create_model()

In [None]:
# Create full model callbacks
full_model_tensorboard = create_tensorboard_callback()
# No validation set when training on all the data, so we can't monitor validation accuracy
full_model_early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="accuracy", patience=3)

In [None]:
# Fit the full model to the full data
full_model.fit(x=full_data,
               epochs=NUM_EPOCHS,
               callbacks=[full_model_tensorboard, full_model_early_stopping])

In [None]:
full_model_path = save_model(full_model, suffix="full_image_set_mobilenetv2_Adam")

In [None]:
loaded_full_model = load_model(full_model_path)

# Making predictions on the test dataset

Since our model has been trained on images in the form of Tensor batches, to make predictions on the test data, we'll have to get it into the same format.

To make predictions on the test data, we'll:

* Get the test image filenames.
* Convert the filenames into test data batches using `create_data_batches()` and setting the `test_data` parameter to `True` (since the test data doesn't have labels).
* Make a predictions array by passing the test batches to the `predict()` method called on our model.

In [None]:
# Load test image filenames
test_path = DATA_PATH + "test/"
test_filenames = [test_path + fname for fname in os.listdir(test_path)]
test_filenames[:10]

In [None]:
len(test_filenames)

In [None]:
# Create test data batch
test_data = create_data_batches(test_filenames, test_data=True)

In [None]:
test_data

In [None]:
# Make predictions on test data batch using the loaded full model
test_predictions = loaded_full_model.predict(test_data, verbose=1)

In [None]:
# Save predictions (NumPy array) to csv file (for access later)
np.savetxt(OUTPUT_PATH + "preds_array.csv", test_predictions, delimiter=",")

In [None]:
test_predictions = np.loadtxt(OUTPUT_PATH + "preds_array.csv", delimiter=",")

In [None]:
test_predictions

In [None]:
test_predictions.shape

# Preparing test dataset predictions for Kaggle

Looking at the [Kaggle sample submission](https://www.kaggle.com/c/dog-breed-identification/overview/evaluation), we find that it wants our model's prediction probability outputs in a DataFrame with an ID and a column for each dog breed.

To get the data in this format, we'll:
* Create a pandas DataFrame with an ID column as well as a column for each dog breed.
* Add data to the ID column by extracting the test image IDs from their filepaths.
* Add data (the prediction probabilities) to each of the dog breed columns.
* Export the DataFrame as a CSV to submit it to Kaggle.

In [None]:
# Create a pandas DataFrame with empty columns
preds_df = pd.DataFrame(columns=["id"] + list(unique_breeds))
preds_df

In [None]:
# Add test image IDs to the predictions DataFrame
# (taken from test_filenames so the order matches the order of the predictions)
test_ids = [os.path.splitext(os.path.basename(path))[0] for path in test_filenames]
preds_df["id"] = test_ids
preds_df.head()
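
Extracting an image ID from a filepath comes down to two standard-library calls. Here is a minimal sketch on a made-up path (`toy_path` is hypothetical and not a real file in the dataset).

```python
import os

# Hypothetical file path in the same style as the test image paths
toy_path = "/some/folder/test/0a1b2c3d.jpg"

file_name = os.path.basename(toy_path)     # drop the directories
image_id = os.path.splitext(file_name)[0]  # drop the ".jpg" extension
print(image_id)
```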

In [None]:
# Add the prediction probabilities to each dog breed column
preds_df[list(unique_breeds)] = test_predictions
preds_df.head()

In [None]:
# Save our predictions dataframe to CSV for submission to Kaggle
preds_df.to_csv(OUTPUT_PATH +
                "full_model_predictions_submission_1_mobilenetV2.csv",
                index=False)