# Dog Breed Identification built with Fast.ai's CNN using transfer learning
---
## Description

This project will take on a dog breed identification challenge by [Kaggle](https://www.kaggle.com/competitions/dog-breed-identification). The challenge uses the [Stanford Dogs Dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/), which is a subset of the much larger ImageNet dataset.

This notebook automatically downloads the Stanford dataset from my personal Google Drive via a public link. If you prefer, you can [download](https://www.kaggle.com/competitions/dog-breed-identification/data) the dataset as a `.zip` file from Kaggle (a free Kaggle account is required). If you download the `.zip` file yourself, unzip it in the repo's root directory and rename the extracted folder to `stanford-dogs-dataset`.

This project employs Python and the [Fast.ai](https://github.com/fastai/fastai) library to create an image classification model that leverages transfer learning and a convolutional neural network (CNN) to accurately and efficiently identify dog breeds trained on the Stanford Dogs dataset.

This project also serves as the technical foundation for my bachelor's thesis on dog breed classification. The aim of this project, as well as my thesis, is to evaluate the efficiency and accuracy of my model when compared to similar models trained on the Stanford Dogs Dataset.

This notebook additionally explores exploratory data analysis (EDA), data augmentation, image pre-processing, and comprehensive logging of training statistics, among other things.

---
## Goals

The goal of an image classification problem is to minimize the loss. Loss refers to the measure of how well a model's predictions match the actual classes/labels of the training data. A lower loss value indicates that the model is more accurate at making predictions.

Striving for a high level of accuracy is also key. Accuracy is measured by how well the trained model can correctly predict the classes of unseen new images.
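
To make the two quantities concrete, here is a minimal sketch in plain Python. It assumes a cross-entropy loss, which is the usual choice for single-label classification; the text above does not name the loss function, so treat that as an assumption. The probabilities and labels are made-up toy values for three images and three classes.

```python
import math

def cross_entropy(probs, targets):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def accuracy(probs, targets):
    """Fraction of samples whose most probable class is the true class."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == t
               for p, t in zip(probs, targets))
    return hits / len(targets)

# Toy predictions: one row of class probabilities per image
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
targets = [0, 1, 2]  # true class index per image

loss = cross_entropy(probs, targets)
acc = accuracy(probs, targets)
print(f'loss: {loss:.4f}, accuracy: {acc:.2%}')
```

The third image is misclassified, which lowers the accuracy and also raises the loss, because the model assigned only 0.3 to the true class.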

---
## Structure

This is a broad overview of the main table of contents of this notebook:
1.   Installs, Imports & Settings
2.   Load the dataset
3.   EDA
4.   Dataloader
5.   Training
6.   Logging
7.   Post-Training Analysis
8.   Predictions
9.   Exports
---
## Technical Specifications

Begin by downloading or cloning the repo from [GitHub](https://github.com/krullmizter/dog-breed-id-fastai).

### Local Development (Anaconda)

If you run this notebook locally, I recommend using Jupyter Notebook through [Anaconda](https://anaconda.org/), creating a new environment, and running Anaconda with administrative privileges.

You can download and use the base environment files: `environment.yaml` for conda and `requirements.txt` for Python (pip), respectively. The files can be found in the [repo](https://github.com/krullmizter/dog-breed-id-fastai/tree/main/venv).

Create a conda env. from the terminal:
`conda env create -f environment.yaml`, or import the `environment.yaml` file into your Anaconda navigator.

Install all the base Python packages with pip:
`pip install -r requirements.txt`

#### Errors
#### `PackagesNotFoundError`
If your conda installation can't find a certain package, a tip is to name the dependency explicitly and use the `-c` flag to specify the channel to download it from:

`conda install fastai pytorch pytorch-cuda -c fastai -c pytorch -c nvidia`

### Google Colab

If you want an easy way to run this notebook, use cloud-hosted GPUs, and have an easy time with dependencies and packages, then I recommend [Google Colab](https://colab.research.google.com/). To get started upload the `main.ipynb` to Colab.

### Training Stats

When running this notebook, a directory called `trained` will be created in the root folder. It will hold a `.json` file with the stats of every training run since the first successful one. This way, one can view the past training stats to help with tweaking the model further. The directory will also hold the exported trained model as a `.pkl` file.

### Development

My training was computed locally on an RTX-3070 GPU.

The main software and libraries I used (specified versions are not required):
* Anaconda (1.11.1)
    * Conda (23.3.1)
* Python (3.10.9)
    * pip (22.3.1)
* PyTorch (2.0.0)
    * PyTorch CUDA (11.7)
* Fast.ai (2.7.12)

---
## TODO
* Better: `item_tfms` and `batch_tfms`.
* Single or multi-item detection.
* View bounding boxes.
* Hover effect over the second scatter plot.
* Link to thesis when done.
---

## Copyright 

Copyright (C) 2023 Samuel Granvik

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

---
This code was created by Samuel Granvik. If you use or modify this code, please give attribution to Samuel Granvik. 

Links: [Email](mailto:samgran@outlook.com) | [GitHub](https://github.com/krullmizter/) | [LinkedIn](https://www.linkedin.com/in/samuel-granvik-93977013a/)

## Installs & Imports

In [None]:
try:
  import warnings
  warnings.filterwarnings('ignore', category=UserWarning) 

  import os # Lets us interact with the underlying OS
  import json # Read and write JSON
  import gdown # Download large files from Google Drive
  import random # Random numbers
  import hashlib # Create and use hashes
  import zipfile # Extract the dataset archive
  import requests # Handles HTTP requests
  import numpy as np # Math functions
  import pandas as pd # Data analysis and manipulation
  from bs4 import BeautifulSoup # Parse HTML
  from datetime import datetime # Lets us use date and time
  from matplotlib import pyplot as plt # Visualisations

  from fastai import __version__
  from fastai.vision.all import * # Computer vision
  from fastai.metrics import error_rate, accuracy, F1Score # Additional metrics
    
  print('Imports complete.\n')

except ImportError as e:
  print(f'Error importing one or more libraries: {e}\n')

if torch.cuda.is_available():
  gpu_name = torch.cuda.get_device_name(0)
  gpu_mem  = torch.cuda.mem_get_info()

  print('Using versions: \nFastai v.', __version__, '\nPyTorch v.', torch.__version__, '\nCUDA v.', torch.version.cuda)
  print(f'\nGPU: {gpu_name}\nGPU Memory: {gpu_mem[1] / 1024 / 1024 / 1024:.2f} GB')
else:
  print('No CUDA GPU detected. Please install PyTorch with CUDA support, training is much faster on a GPU.')

Imports complete.

Using versions: 
Fastai v. 2.7.12 
PyTorch v. 2.0.0+cu118 
CUDA v. 11.8

GPU: Tesla T4
GPU Memory: 14.75 GB


## Settings & Paths

In [None]:
# Settings, variables & paths

'''
If export_model is set to True, the trained model is exported as a .pkl file to the trained directory
If show_plots is set to True, plots of the training images' widths and heights are displayed
If log is set to True, the training stats are written to the stats (.json) file

Default settings: False, False, True
'''

export_model, show_plots, log = False, False, True

# Automatic reloading, and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# Releases unused cached GPU memory held by PyTorch's caching allocator; tensors that are still in use are not affected
torch.cuda.empty_cache()

# Paths 

# Directories
base_dir    = os.getcwd()
dataset_dir = os.path.join(base_dir, 'stanford-dogs-dataset')
train_dir   = os.path.join(dataset_dir, 'train')
test_dir    = os.path.join(dataset_dir, 'test')
trained_dir = os.path.join(base_dir, 'trained')

# Files
dataset_zip   = os.path.join(base_dir, 'stanford-dogs-dataset.zip')
stats_file    = os.path.join(trained_dir, 'trained_model_stats.json')
trained_model = os.path.join(trained_dir, 'trained_model.pkl')
laban         = os.path.join(base_dir, 'laban.jpg')

## Load Dataset

In [None]:
''' 
This will download the Kaggle .zip file containing the Stanford Dog Breeds dataset from MY Google Drive. I host the .zip file publicly:
https://drive.google.com/file/d/1fQY2bnPPyGw9xHURMJ-SVlYB8WYztixu/view?usp=sharing
'''

if not os.path.exists(dataset_dir):
  url = 'https://drive.google.com/u/0/uc?id=1fQY2bnPPyGw9xHURMJ-SVlYB8WYztixu'
  output = dataset_zip
  gdown.download(url, output, quiet=False)

  print(f'Unzipping: {dataset_zip}')
        
  # The with statement closes the archive automatically
  with zipfile.ZipFile(dataset_zip, 'r') as z:
    z.extractall(dataset_dir)
        
  print(f'\nUnzipped the dataset, will remove {dataset_zip}')

  os.remove(dataset_zip)
else:
  print(f'{dataset_dir} already exists.')

/content/stanford-dogs-dataset already exists.


## EDA - Exploratory Data Analysis

### EDA - Labels

In [None]:
labels_df = pd.read_csv(os.path.join(dataset_dir, 'labels.csv'))

print(f'Some basic info of the labels.\n')
display(labels_df.info())

print('Shows us the labels.csv file, containing IDs for images, and their corresponding breed.')
labels_df.head()

Some basic info of the labels.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10222 entries, 0 to 10221
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      10222 non-null  object
 1   breed   10222 non-null  object
dtypes: object(2)
memory usage: 159.8+ KB


None

Shows us the labels.csv file, containing IDs for images, and their corresponding breed.


| | id | breed |
|---|---|---|
| 0 | 000bec180eb18c7604dcecc8fe0dba07 | boston_bull |
| 1 | 001513dfcb2ffafc82cccf4d8bbaba97 | dingo |
| 2 | 001cdf01b096e06d78e9e5112d419397 | pekinese |
| 3 | 00214f311d5d2247d5dfe4fe24b2303d | bluetick |
| 4 | 0021f9ceb3235effd7fcde7f7538ed62 | golden_retriever |


In [None]:
print('The three breeds/classes with the most, and least amount of images.')

amount_breed = labels_df.pivot_table(index='breed', aggfunc=len).rename(columns={'id': 'amount'})

largest  = amount_breed.nlargest(3, 'amount')
smallest = amount_breed.nsmallest(3, 'amount')

pd.concat([largest, smallest])

The three breeds/classes with the most, and least amount of images.


| breed | amount |
|---|---|
| scottish_deerhound | 126 |
| maltese_dog | 117 |
| afghan_hound | 116 |
| briard | 66 |
| eskimo_dog | 66 |
| brabancon_griffon | 67 |


### EDA - Images

In [None]:
if show_plots:
  # Analyze training images widths and heights
  all_widths, all_heights, min_res_list, max_res_list = [], [], [], []

  min_res_name, max_res_name = '', ''

  min_pxs = float('inf')
  max_pxs = float('-inf')
    
  # Loop over all images in the training dir. 
  for f in os.listdir(train_dir):
    img_path = os.path.join(train_dir, f)
        
    # Add the resolution of each image to separate arrays of widths and heights
    with Image.open(img_path) as img:
      w, h = img.size
        
      all_widths.append(w)
      all_heights.append(h)
        
      pxs = w * h
            
      # Track the images with the smallest and largest resolution
      # Two independent checks, so that a single image can set both records
      if pxs < min_pxs:
        min_pxs = pxs
        min_res_list = [w, h]
        min_res_name = f
      if pxs > max_pxs:
        max_pxs = pxs
        max_res_list = [w, h]
        max_res_name = f
            
  print(f'Resolution Statistics:')
  print(f'Average: { int(sum(all_widths) / len(all_widths)) }x{ int(sum(all_heights) / len(all_heights)) }px')
  print(f'Smallest: {min_res_list[0]}x{min_res_list[1]}px ({min_res_name})')
  print(f'Largest: {max_res_list[0]}x{max_res_list[1]}px ({max_res_name})')

  # Plot the distributions of the training images' widths and heights on two scatter plots
  fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))

  min_widths  = min(all_widths)+1
  max_widths  = max(all_widths)+1
  min_heights = min(all_heights)+1
  max_heights = max(all_heights)+1

  # Shows the entire training dataset distribution
  ax1.scatter(all_widths, all_heights, alpha=0.25, s=3, color='green')
  ax1.set_title('Distribution of the training images by resolution.')
  ax1.set_xlabel('Width (px)')
  ax1.set_xticks(np.arange(0, max_widths, 200))
  ax1.set_ylabel('Height (px)')
  ax1.set_yticks(np.arange(0, max_heights, 200))

  # Shows a plot which is zoomed in on the more concentrated values
  xmin, xmax, ymin, ymax = min_res_list[0], 750, min_res_list[1], 750
  ax2.scatter(all_widths, all_heights, alpha=0.25, s=25, color='green')
  ax2.set_xlim(xmin, xmax)
  ax2.set_ylim(ymin, ymax)
  ax2.set_title('Zoomed-in view of the distribution of the training images by resolution.')
  ax2.set_xlabel('Width (px)')
  ax2.set_xticks(np.arange(xmin, xmax+1, 25))
  ax2.set_ylabel('Height (px)')
  ax2.set_yticks(np.arange(ymin, ymax+1, 50))

  plt.subplots_adjust(hspace=.25)
  plt.show()
else:
  print('If you wish to show the plots, change the show_plots variable in the settings cell.')

If you wish to show the plots, change the show_plots variable in the settings cell.


## Training

### Hyperparameters & Data Augmentation

In [None]:
# Free up some GPU memory if possible
torch.cuda.empty_cache()

arch = convnext_base # Pre-trained model we train upon
ep = 3 # One epoch is one full pass of the entire training dataset through the neural network
bs = 50 # Amount of images to feed in one batch to the neural network during one training iteration
sz = 224 # The target size of each image that gets fed into the network
opt_func = Adam # The optimization function to use, such as Adam, Ranger or SGD

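# Intended to cap the split size (in MB) of GPU memory blocks to lower GPU memory fragmentation during training. Note: max_split_size_mb is an option of PyTorch's CUDA caching allocator, which is normally set through the PYTORCH_CUDA_ALLOC_CONF environment variable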
max_split_size = 50

'''
We use transforms (tfms) to manipulate our data before training the model. The tfms are applied by the dataloader.

item_tfms: Transformations applied to each image individually to standardize the input images
A resize (here RandomResizedCrop) is always needed to bring all input images to the same size, otherwise the images can't be collated into batches

batch_tfms: Transformations such as flips, rotations or pixel value changes applied to a whole batch of images at once
The * is used to unpack a list or tuple. 
'''
 
item_tfms = [
    RandomResizedCrop(sz)
]

batch_tfms = [
    *aug_transforms(size=sz), # The aug_transforms is a pre-defined list of transforms: flips, rotations, zooming, lighting among others
    Normalize.from_stats(*imagenet_stats),
]

# Flag whether any transforms are applied beyond the single resize/crop item transform. This is used later in the stats file.
transforms = len(item_tfms) > 1 or bool(batch_tfms)

print(f'Training will use: {ep} epochs, with a batch size of: {bs}, and the target size of each input image is: {sz}px. Applied transformations?: {transforms}')

Training will use: 3 epochs, with a batch size of: 50, and the target size of each input image is: 224px. Applied transformations?: True
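
PyTorch documents `max_split_size_mb` as an option of its CUDA caching allocator, configured through the `PYTORCH_CUDA_ALLOC_CONF` environment variable. The following is a minimal sketch of setting it from Python, assuming it runs before PyTorch makes its first CUDA allocation (for example at the top of the imports cell). The value 50 simply mirrors `max_split_size` above and is not a tuned recommendation.

```python
import os

# Assumed value, mirroring max_split_size in the hyperparameters cell
max_split_size = 50

# Has to be set before the first CUDA allocation to take effect
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = f'max_split_size_mb:{max_split_size}'

print(os.environ['PYTORCH_CUDA_ALLOC_CONF'])
```

If out-of-memory errors persist, lowering `bs` or `sz` is usually the more direct lever, since the allocator option only reduces fragmentation.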


### Dataloader

In [None]:
torch.cuda.empty_cache()

# We use a dataloader to load our dataset, and also augment and pre-process the data
dls = ImageDataLoaders.from_csv(
  path=dataset_dir,
  folder='train',
  test='test',
  suff='.jpg',
  size=sz,
  bs=bs,
  item_tfms=item_tfms, 
  batch_tfms=batch_tfms,
  max_split_size_mb=max_split_size,
  device = torch.device('cuda')
)

train_len = len(dls.train_ds)
val_len   = len(dls.valid_ds)
test_len  = len(os.listdir(test_dir))

# Find the percentage of the valid. dataset 
val_pct = round((val_len / (val_len + train_len) * 100))

print(f'Amount of images in each dataset\nTotal: { (train_len + val_len) + test_len }\n')
print(f'Training: {train_len}\nValidation: {val_len} ({val_pct}% of training dataset) \nTesting: {test_len}')

Amount of images in each dataset
Total: 20579

Training: 8178
Validation: 2044 (20% of training dataset) 
Testing: 10357


In [None]:
# Show some random training images and their corresponding labels
dls.show_batch(max_n=3)

### Learner

In [None]:
'''
Here we create the main training object, a learner, which sets up the model and runs the training loop
The learner combines the previously created dls and our chosen pre-trained network, along with the training metrics
'''

# vision_learner is the current name of cnn_learner, which fastai 2.7 deprecates
learner = vision_learner(
  dls,
  arch,
  opt_func=opt_func,
  metrics=[error_rate, accuracy]
)

print(f'Will use {arch.__name__}, and {learner.opt_func.__name__} as the optimization function.')

Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100%|██████████| 548M/548M [00:06<00:00, 90.2MB/s]


Will use vgg19, and Adam as the optimization function.


In [None]:
# This lets us view our models architecture, layers and our defined hyperparameters
learner.summary()

### Learning Rate

In [None]:
torch.cuda.empty_cache()

'''
In this cell we try to find a good learning rate (lr) using the lr_find method
lr_find trains on a few mini-batches while increasing the lr exponentially, and stops when the loss diverges
We use the .valley value as our base / suggested learning rate in later cells
The lr controls the step size of the gradient descent during training
'''

lr = learner.lr_find(suggest_funcs=(minimum, steep, valley, slide))
lr_sug = lr.valley
lr_sug_ex = format(lr_sug, '.2e')

print(f'Suggested lr: {lr_sug}, and its exponential notation: {lr_sug_ex}')

OutOfMemoryError: ignored
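
To illustrate how the learning rate grows during the search, here is an approximate sketch of an exponential sweep in plain Python. The defaults (`start_lr=1e-7`, `end_lr=10`, `num_it=100`) are the ones I understand fastai's `lr_find` to use; they are an assumption here, so check the signature in your installed version. The helper `lr_sweep` is hypothetical and not part of fastai.

```python
def lr_sweep(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Exponentially spaced learning rates from start_lr up to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

lrs = lr_sweep()

# Each step multiplies the lr by the same constant factor
factor = lrs[1] / lrs[0]
print(f'first: {lrs[0]:.1e}, last: {lrs[-1]:.1e}, step factor: {factor:.3f}')
```

Because each step multiplies the learning rate by a constant factor, the loss is usually plotted against a logarithmic lr axis, and a suggestion such as `valley` is read off that curve.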

### Fine Tuning (Training)

In [None]:
torch.cuda.empty_cache()

'''
The fine_tune method lets us adapt a pre-trained model to new data.
It first trains only the new head for one epoch (by default) while the pre-trained layers are frozen,
then unfreezes all layers and trains the whole model for the requested number of epochs.
fine_tune calls fit_one_cycle internally; calling fit_one_cycle directly gives more control, for example when training a model from scratch.
'''

print(f'Training with: {ep} epochs, with a learning rate of: {lr_sug_ex}\n')

# Start training timing
start_time = datetime.now()

# Train/Fine-tune
learner.fine_tune(ep, lr_sug)

# End training timing
end_time = datetime.now()

# Get the total training time in seconds
total_seconds = int((end_time - start_time).total_seconds())

# Compute hours, minutes, and seconds (named mins/secs to avoid shadowing the built-in min)
hr   = total_seconds // 3600
mins = (total_seconds % 3600) // 60
secs = total_seconds % 60

# Format the result as hh:mm:ss
training_time = f'{hr:02d}:{mins:02d}:{secs:02d}'

print(f'\nFine-tuning (training) complete.\nIt took: {training_time}')

Training with: 5 epochs, with a learning rate of: 1.74e-03



| epoch | train_loss | valid_loss | error_rate | accuracy | time |
|---|---|---|---|---|---|
| 0 | 1.916582 | 0.20579 | 0.062622 | 0.937378 | 03:38 |


| epoch | train_loss | valid_loss | error_rate | accuracy | time |
|---|---|---|---|---|---|


OutOfMemoryError: ignored
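
The two result tables in the output above correspond to the two phases of `fine_tune`. The sketch below paraphrases that schedule in plain Python, assuming fastai's documented defaults (`freeze_epochs=1`, `lr_mult=100`) and that the base lr is halved before the second phase. `fine_tune_plan` is a hypothetical helper for illustration only, not a fastai function.

```python
def fine_tune_plan(epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100):
    """Return the two phases as (description, epochs, (lowest_lr, highest_lr))."""
    frozen = ('pre-trained layers frozen, head only', freeze_epochs, (base_lr, base_lr))
    base_lr /= 2  # the second phase starts from half the base lr
    unfrozen = ('all layers unfrozen', epochs, (base_lr / lr_mult, base_lr))
    return [frozen, unfrozen]

plan = fine_tune_plan(epochs=3, base_lr=1.74e-3)
for name, n, (lo, hi) in plan:
    print(f'{name}: {n} epoch(s), lr from {lo:.2e} to {hi:.2e}')
```

In the second phase the earliest layers get the lowest learning rate and the head the highest, so the pre-trained features are changed more gently than the newly added layers.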

## Evaluation & Logging

### Evaluate Validation Dataset

In [None]:
# The validate method is used to output the performance of the trained model against the validation dataset
valid_metrics = learner.validate()

# Scale the validation metrics by 100 and round to two decimals, i.e. percentage values (note that the first entry is the loss, which is not a true percentage)
valid_metrics = [round(x*100, 2) for x in valid_metrics]

In [None]:
'''
We combine the validation metrics, hyperparameters, and additional metrics for the trained model in a single cell and save/log them into a .json file
This helps us gain a comprehensive understanding of the model's performance, preserve previous training statistics, and identify areas that require improvement
We use a hash function that creates a hash based on all the changeable parameters like GPU, EP, BS, and Arch among others. The hash changes if any of the parameters are changed
This approach enables us to check if the metrics we are attempting to append to the stats file already exist, thereby preventing duplicates
'''

if log:
  # Get current date and time
  now = datetime.now()
  curr_time = now.strftime('%d/%m/%Y - %H:%M:%S')

  # A list of additional metrics, and the hyperparameters used during training
  add_metrics = [curr_time, training_time, arch.__name__, learner.opt_func.__name__,gpu_name, val_pct, ep, bs, sz, lr_sug_ex, transforms]

  # Store both the additional and validation metrics in a single dataframe, with custom column names
  columns = ['Time Created', 'Training Time', 'Arch', 'Opt. Func.', 'GPU', 'Train/Val. Split(%)', 'EP', 'BS', 'SZ', 'LR', 'Transforms?', 'Loss(%)', 'Error(%)', 'Accuracy(%)']
  trained_stats_df = pd.DataFrame([add_metrics + valid_metrics], columns=columns)
        
  # Convert the df to a JSON formatted string
  trained_stats_json = trained_stats_df.to_json(orient='records', double_precision=2)

  # Create a Python dictionary of the above created JSON string
  trained_stats_dict = json.loads(trained_stats_json)

  # Create the trained dir. if it doesn't exist
  if not os.path.exists(trained_dir):
    os.mkdir(trained_dir)
    print(f'Created: {trained_dir}')

  # Create a unique hash based on all the changeable parameters
  hash_values = arch.__name__ + learner.opt_func.__name__ + gpu_name + str(val_pct) + str(ep) + str(bs) + str(sz) + lr_sug_ex + str(transforms)
  curr_hash = hashlib.sha256(hash_values.encode('utf-8')).hexdigest()
    
  '''
  If the stats file exists, and the current hash isn't in the JSON file, append the metrics to the stats file
  If the stats file doesn't exist, create it and add the metrics to the stats file
  '''

  if os.path.exists(stats_file):
    # The with statement closes the file, so no explicit close() is needed
    with open(stats_file, 'r') as file_read:
      stats_file_data = json.load(file_read)

    if curr_hash in stats_file_data:
      print('The metrics you are trying to append to the stats file already exist.')
    else:
      stats_file_data[curr_hash] = trained_stats_dict

      with open(stats_file, 'w') as file_write:
        json.dump(stats_file_data, file_write, indent=2)

      print('Appended new metrics to the stats file.')
  else:
    with open(stats_file, 'w') as file_write:
      json.dump({curr_hash: trained_stats_dict}, file_write, indent=2)

    print('Created the stats file, and added the metrics to it.')
else:
  print('To log training stats, set the log variable in the settings cell to True.')
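
The hash-keyed logging idea can be tried in isolation. The sketch below is a simplified stand-in for the cell above, not a copy of it: `run_hash`, `log_run` and `best_run` are hypothetical helper names, the file is written to a temporary directory, and the architectures and accuracies are made-up numbers. It keeps the same file layout, a dictionary that maps each hash to a list holding one record.

```python
import hashlib
import json
import os
import tempfile

def run_hash(params):
    """Hash the settings that define a run; identical settings give the same key."""
    joined = '|'.join(str(params[k]) for k in sorted(params))
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()

def log_run(path, params, metrics):
    """Append a run to the stats file unless its hash is already present."""
    stats = {}
    if os.path.exists(path):
        with open(path) as f:
            stats = json.load(f)
    key = run_hash(params)
    if key in stats:
        return False
    stats[key] = [{**params, **metrics}]
    with open(path, 'w') as f:
        json.dump(stats, f, indent=2)
    return True

def best_run(path, metric='Accuracy(%)'):
    """Return the logged record with the highest value for the given metric."""
    with open(path) as f:
        stats = json.load(f)
    records = (rec for recs in stats.values() for rec in recs)
    return max(records, key=lambda rec: rec[metric])

# Toy usage with made-up values
path = os.path.join(tempfile.mkdtemp(), 'trained_model_stats.json')
first = log_run(path, {'Arch': 'resnet34', 'EP': 3, 'BS': 50}, {'Accuracy(%)': 80.0})
again = log_run(path, {'Arch': 'resnet34', 'EP': 3, 'BS': 50}, {'Accuracy(%)': 81.0})
other = log_run(path, {'Arch': 'resnet50', 'EP': 3, 'BS': 50}, {'Accuracy(%)': 85.0})

print(first, again, other)      # True False True
print(best_run(path)['Arch'])   # resnet50
```

Note the consequence of keying on the settings only: a repeated run with identical settings is skipped even if its metrics differ, which is the duplicate protection described above.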

### Evaluate Test Dataset

In [None]:
# Here we use get_preds to run the trained model on the test dataset. Unlike the validation set used by validate, the test images are unlabeled 'real-world' data, so we only get predictions and cannot compute metrics

test_files = get_image_files(test_dir)

# Create a test dataloader
test_dl = dls.test_dl(test_files, bs=bs)

# Get the predictions on the test dataloader
preds, targs = learner.get_preds(dl=test_dl)

preds.shape

## Post Training Analysis

In [None]:
# Shows the steps, callbacks, and performance of the model during each training iteration
learner.show_training_loop()

In [None]:
# Show a random batch of images from the trained model
learner.show_results(max_n=3)

In [None]:
# Lets us view, interpret and analyze performance of our trained classification model
interp = ClassificationInterpretation.from_learner(learner)

In [None]:
# We plot a matrix that shows us the distribution of correctly and incorrectly classified classes
interp.plot_confusion_matrix(figsize=(15, 15))

In [None]:
# The 6 worst predicted images i.e. the images with highest losses
interp.plot_top_losses(6, figsize=(15, 10))

In [None]:
# Lets us view the most frequently confused class pairs, and how many times each pair was wrongly predicted
# most_confused returns tuples ordered as (actual, predicted, count)
df = pd.DataFrame(data=interp.most_confused(min_val=5))
df.columns = ['Actual', 'Predicted', 'Amount of wrong predictions']
df.head()

In [None]:
'''
We use the F1-score to summarise the model's performance per class, for both binary and multi-class classification
It is the harmonic mean of the precision and recall of a trained model, F1-score = 2 * (precision * recall) / (precision + recall)
F1-scores range between 0 and 1: 1 = perfect precision and recall, 0 = the precision or the recall is zero
'''

interp.print_classification_report()
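
As a worked example of the F1 formula, here is a small sketch in plain Python. The counts are made up for a single hypothetical breed and are not taken from this model's results.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up counts for one breed: 8 correct, 2 false alarms, 4 missed
score = f1_score(tp=8, fp=2, fn=4)
print(round(score, 3))  # 0.727
```

Here the precision is 0.8 and the recall is about 0.667; the harmonic mean of about 0.727 sits closer to the lower of the two, which is why F1 penalises a model that is strong on only one of them.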

## Predictions

In [None]:
# A function that predicts the class and confidence for a single image, given its path, using the trained model
def predict_img(img):
  print(img)
  pred_class, pred_idx, probs = learner.predict(img)
  conf = probs[pred_idx] * 100
  print(f'Predicted class: {pred_class.capitalize()}, confidence: {conf:.2f}%')

In [None]:
# Choose an unlabeled, unseen image from the test dataset for prediction
test_img = Path(test_dir).ls()[123]
predict_img(test_img)
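
Since many breeds look alike, it can help to inspect the runner-up classes and not only the top prediction. The sketch below ranks a probability vector in plain Python. It assumes the probabilities and class names have been converted to ordinary lists first, for example from the `probs` tensor returned by `learner.predict` and from `learner.dls.vocab`; the `top_k` helper, vocabulary and numbers are made up for illustration.

```python
def top_k(probs, vocab, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up class names and probabilities
vocab = ['beagle', 'dingo', 'pug', 'whippet']
probs = [0.10, 0.55, 0.05, 0.30]

best = top_k(probs, vocab)
for label, p in best:
    print(f'{label.capitalize()}: {p * 100:.2f}%')
```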

In [None]:
# A function that uses the requests and BeautifulSoup libraries to search Google Images and download a random dog image
def rand_dog_image_pred():
  from PIL import Image

  url = 'https://www.google.com/search?q=dog&tbm=isch'
  res = requests.get(url)
    
  # Parse the HTML content using BeautifulSoup
  soup = BeautifulSoup(res.content, 'html.parser')
    
  # Fetch all image tags from the page
  img_tags = soup.find_all('img')
    
  # Keep only image tags whose src is a regular http(s) URL that requests can fetch
  img_tags = [img for img in img_tags if img.get('src', '').startswith('http')]
    
  # Choose a random image
  random_img_tag = random.choice(img_tags)

  img_url = random_img_tag['src']
    
  # Do a HTTP request to get the image content
  img_res = requests.get(img_url)

  img_name = 'rand_dog.jpg'
    
  # Write the random dog image to a file named rand_dog.jpg in the working dir.
  with open(img_name, 'wb') as f:
    f.write(img_res.content)
        
  print(f'Downloaded one random dog image from:\n{img_url}\n')
    
  # Display the downloaded image; the with statement releases the file before it is deleted
  with Image.open(img_name) as img:
    img.show()

  # Predict
  predict_img(os.path.realpath(img_name))

  # Remove the image after the prediction
  os.remove(img_name)

In [None]:
rand_dog_image_pred()

In [None]:
predict_img(Path(laban))

In [None]:
'''
Here we evaluate the predictions of our trained model on the entire test dataset
We first need to list all the image file paths using get_image_files
Then we create a test dataloader object and pass it to the get_preds method
get_preds returns a tuple of predictions and labels
'''

test_files = get_image_files(test_dir)
# Use the batch size (bs) here, not the image size (sz)
test_dl = learner.dls.test_dl(test_files, bs=bs)

preds, targs = learner.get_preds(dl=test_dl)
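
The test predictions could then be turned into a submission file. The sketch below assumes the format I understand the Kaggle competition to expect, one row per test image `id` followed by one probability column per breed; verify this against the competition's sample submission. The `write_submission` helper and the toy ids, breeds and probabilities are hypothetical. In the notebook the inputs would roughly be `[f.stem for f in test_files]`, `preds.tolist()` and `list(learner.dls.vocab)`.

```python
import csv
import io

def write_submission(out, ids, preds, vocab):
    """Write a header row, then one row of class probabilities per image id."""
    writer = csv.writer(out)
    writer.writerow(['id', *vocab])
    for img_id, row in zip(ids, preds):
        writer.writerow([img_id, *(f'{p:.6f}' for p in row)])

# Toy data standing in for the real ids, predictions and vocabulary
vocab = ['beagle', 'dingo']
ids = ['img_a', 'img_b']
preds = [[0.9, 0.1], [0.25, 0.75]]

buf = io.StringIO()
write_submission(buf, ids, preds, vocab)
lines = buf.getvalue().splitlines()
print(lines)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open('submission.csv', 'w', newline='')`.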

## Export Trained Model

In [None]:
''' 
Exports the trained model as a .pkl file in the trained directory
This .pkl file can be used to predict new images
The exported .pkl file is intended for inference: it is saved without the training data and optimizer state, so it is used for direct predictions
'''

if export_model:
  learner.export(trained_model)
  print(f'Exported trained model file {trained_model}')       
else:
  print('Didn\'t export the trained model, change the export_model variable in the settings cell if you wish to export.')