<a href="https://colab.research.google.com/github/Agewerc/wildlifeComputerVision/blob/master/wild_animal_computer_vision.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An Algorithm for Wildlife Animals Identification

In this notebook we work with the Oregon Wildlife [dataset](https://www.kaggle.com/virtualdvid/oregon-wildlife/kernels) created by [David Molina](https://www.kaggle.com/virtualdvid) with a Google image scraper. It contains about 14,000 pictures of 19 different wildlife species, such as deer, cougars, and gray wolves.

Wouldn't it be fun to have an app that tells you which animals you are observing in nature when you are camping or hiking? This project may be a first step toward creating that app.

We aim to build a model that takes an image and identifies the animal species it shows. To achieve this goal we will use **Convolutional Neural Networks.** <br><br>

## How does a Convolutional Neural Network function?

[deep.ai](https://deepai.org/machine-learning-glossary-and-terms/convolutional-neural-network): CNNs process images as volumes, receiving a color image as a rectangular box where the width and height are measured by the number of pixels associated with each dimension, and the depth is three layers deep, one for each color (RGB). These layers are called channels. Within each pixel of the image, the intensity of the R, G, or B is expressed by a number. That number is part of three stacked two-dimensional matrices that make up the image volume and form the initial data that is fed into the convolutional network. The network then begins to filter the image by grouping squares of pixels together and looking for patterns, performing what is known as a convolution. This process of pattern analysis is the foundation of CNN functions.<br><br>


![CNN](https://miro.medium.com/max/2510/1*vkQ0hXDaQv57sALXAJquxA.jpeg)
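To make the description above concrete, here is a minimal NumPy sketch of an image stored as a height x width x 3 volume and of one filter sliding over a single channel. The 4 x 4 image, the `convolve2d_valid` helper, and the hand-written edge kernel are illustrative assumptions and not part of this notebook's pipeline; a real CNN layer learns its kernels and applies many of them at once. As in most deep learning libraries, the "convolution" here is computed without flipping the kernel.

```python
import numpy as np

# A tiny 4x4 "color image": height x width x 3 channels (R, G, B)
image = np.zeros((4, 4, 3), dtype=np.float32)
image[:, 2:, 0] = 255.0  # the right half is pure red


def convolve2d_valid(channel, kernel):
    """Slide `kernel` over `channel` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = channel.shape[0] - kh + 1
    out_w = channel.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out


# Vertical-edge filter: responds where intensity increases from left to right
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=np.float32)

red_response = convolve2d_valid(image[:, :, 0], edge_kernel)
print(image.shape)         # (4, 4, 3)
print(red_response.shape)  # (2, 2)
print(red_response)        # every entry is 765.0, because each window straddles the edge
```

Each 3 x 3 window covers the dark-to-red boundary, so the filter responds strongly everywhere; on a uniform region the same filter would return 0.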


### Libraries
- numpy (Linear Algebra)
- pandas (Data Manipulation and Analysis)
- matplotlib (Plotting)
- glob (File Manipulation)
- os (File Manipulation)
- re (Regular Expressions for Text Patterns)
- random (Sampling Data)
- PIL (Image Processing)
- scikit-learn (Label Encoding and Evaluation Metrics)
- PyTorch (Deep Learning)


### The Road Ahead

We break the notebook into separate steps.  Feel free to use the links below to navigate the notebook.

* [Step 1](#step1): Import Libraries and Load the Dataset
* [Step 2](#step2): Create a CNN to Classify Wild Animals (from Scratch)
* [Step 3](#step3): Create a CNN to Classify Wild Animals (using Transfer Learning)
* [Step 4](#step3): Test the model and create an Algorithm

We first mount the Google Drive folder that contains the dataset.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


https://github.com/Agewerc/wildlifeComputerVision/tree/master

<a id='step1'></a>
## Step 1: Import Dataset and Libraries

Our first step is to import all the libraries used in this project.

In [None]:
import os
import re
import random
from glob import glob

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from PIL import Image
from skimage import io
from sklearn.preprocessing import LabelEncoder

import torch
import torch.nn as nn  # Neural network modules: nn.Linear, nn.Conv2d, BatchNorm, loss functions
import torch.optim as optim  # Optimization algorithms: SGD, Adam, etc.
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms  # Transformations we can apply to the dataset
from torch.utils.data import Dataset, DataLoader  # Dataset management and mini-batch creation
from torchvision.datasets import ImageFolder

## Visualize the data

First, let's examine the files.
- Which are the species?
- How many pictures do we have from each animal?



In [None]:
!pip install kaggle




In [None]:
!kaggle datasets download -d virtualdvid/oregon-wildlife


Dataset URL: https://www.kaggle.com/datasets/virtualdvid/oregon-wildlife
License(s): unknown
Downloading oregon-wildlife.zip to /content
100% 4.54G/4.55G [00:56<00:00, 104MB/s]
100% 4.55G/4.55G [00:56<00:00, 86.5MB/s]


In [None]:
!unzip oregon-wildlife.zip


Streaming output truncated to the last 5000 lines.
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f393079cfbbc7b8be6.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f3be83575a3a92d435.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f3e101456dcdac39dd.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f4acd5fcab454fef8f.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f4c09f2366d77e2ae6.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f50d7a4e8bd196a9eb.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f53e5ae3b6b216d79f.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f6116e16c85f34b0b2.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f71539caada8e56085.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f9bd8ffa9626adbd93.png  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/f9c45486969596ebfe.jpg  
  inflating: oregon_wildlife/oregon_wildlife/gray_wolf/

In [None]:
import os

# Path to the unzipped dataset; the archive nests the class folders one level down
dataset_path = "./oregon_wildlife/oregon_wildlife"
animals_list = sorted(os.listdir(dataset_path))
animals_file_list = []

for animal in animals_list:
    # List all files in this animal's directory and report how many there are
    files_for_animal = os.listdir(os.path.join(dataset_path, animal))
    animals_file_list.append(files_for_animal)
    print('There are', len(files_for_animal), animal, 'images.')


The dataset looks roughly balanced: each species has a similar (but not equal) number of images.
To get a better feel for the data, let's look at one image per species.






In [None]:
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from torchvision import transforms
from PIL import Image

# Path to the unzipped dataset
dataset_path = "./oregon_wildlife/oregon_wildlife" # Modified path to include the nested directory

animals_list = os.listdir(dataset_path)
animals_file_list = []

# Create a list of files (images) for each animal category
for i in range(len(animals_list)):
    animals_file_list.append(os.listdir(os.path.join(dataset_path, animals_list[i])))

# Check the structure of your lists
print(animals_list)  # Check the folder names
print(animals_file_list)  # Check the image file names in each folder

# Plot settings
fig = plt.figure(figsize=(16, 16))
columns = 4
rows = 5

# Loop through the animals and display images
for i in range(1, len(animals_list) + 1):
    # Construct the full path to the image
    img_path = os.path.join(dataset_path, animals_list[i-1], animals_file_list[i-1][0])

    # Check if the image file exists
    if not os.path.exists(img_path):
        print(f"File not found: {img_path}")
        continue  # Skip this iteration if the file doesn't exist

    # Load the image and force three RGB channels (some files are PNGs)
    img = Image.open(img_path).convert("RGB")

    # Resize to 256x256 and convert to a (channels, height, width) float tensor in [0, 1]
    compose = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor()
    ])
    img = compose(img)

    # Plot each image
    fig.add_subplot(rows, columns, i)
    plt.axis('off')
    plt.title(animals_list[i-1])
    plt.imshow(img.permute(1, 2, 0))  # Re-order dimensions for displaying (channels first to channels last)

plt.show()

Very beautiful animals, right?

## Load Data

Now we will use PyTorch's `transforms`, `ImageFolder`, and `DataLoader` to load the data.

The next steps are the following.

1. Create a dataframe with the name of each file, the animal, and the file path.
2. Assign each file to the train, validation, or test set.
3. Apply transformations such as resizing, cropping, and rotation, so that images of different sizes can be analyzed together.
4. Load the datasets with `DataLoader`, which serves the images as batches of tensors to be analyzed by the CNN.

In [None]:
dir = '/content/gdrive/My Drive/oregon_wildlife'
files = glob(dir + "/**/*", recursive=True)  # list the paths of everything under the Drive folder

In [None]:
import os
import pandas as pd
from glob import glob

# The archive nests the class folders one level down, so point at the inner directory
dir = './oregon_wildlife/oregon_wildlife'
# Collect the paths of all image files (directories are skipped)
files = [f for f in glob(dir + "/**/*", recursive=True) if os.path.isfile(f)]

df_animals = pd.DataFrame({"file_path": files})  # one row per image
# The animal name is the parent folder; the file name is the last path component
df_animals['animal'] = df_animals['file_path'].map(lambda p: os.path.basename(os.path.dirname(p)))
df_animals['file'] = df_animals['file_path'].map(os.path.basename)
df_animals = df_animals.dropna()  # drop rows with missing values

Now we split the data into train, validation, and test sets (inside the dataframe).

In [None]:
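random.seed(42)  # fix the seed so the split is reproducible

animal_set = sorted(set(df_animals['animal']))  # sorted so the iteration order is stable
train_val_test_list = [0, 1, 2]   # 0 = train, 1 = validation, 2 = test
train_val_weights = [70, 15, 15]  # sampling weights, so the proportions are approximate
df_animals['train_val_test'] = 'NA'

for an in animal_set:
  n = sum(df_animals['animal'] == an)  # number of images of this animal
  train_val_test = random.choices(train_val_test_list, weights=train_val_weights, k=n)
  df_animals.loc[df_animals['animal'] == an, 'train_val_test'] = train_val_test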
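
Because `random.choices` samples each label independently, the 70/15/15 proportions above are only approximate for each animal. If exact proportions matter, one alternative is to build the label list first and then shuffle it. The sketch below is an assumption about how that could look, not the method used in this notebook; `split_labels` is a hypothetical helper.

```python
import random


def split_labels(n_items, weights=(70, 15, 15), seed=0):
    """Return n_items shuffled labels (0 = train, 1 = valid, 2 = test) in near-exact proportions."""
    total = sum(weights)
    n_train = round(n_items * weights[0] / total)
    n_valid = round(n_items * weights[1] / total)
    n_test = n_items - n_train - n_valid  # the remainder goes to the test set
    labels = [0] * n_train + [1] * n_valid + [2] * n_test
    random.Random(seed).shuffle(labels)  # local generator, so the global seed is untouched
    return labels


labels = split_labels(700)
print(labels.count(0), labels.count(1), labels.count(2))  # 490 105 105
```

The returned list has one label per image, so it could be assigned to the rows of a single animal in the same way as the `train_val_test` list in the cell above.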

Now we create the dictionary `transform`, which holds the transformations for the train, validation, and test datasets.
We apply different transformations to the training set than to the validation and test sets. Data augmentation is used on the training set to reduce overfitting, that is, very good performance on the training set but bad performance on the validation and test sets (poor generalization). The methods used are:
- Random horizontal flip: mirror the image left to right with probability 0.5.
- Random resized crop: take a random region of the image and resize it to 224 × 224 pixels.
- Random rotation: rotate the image by a random angle of up to 10 degrees in either direction.
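
The crop and flip ideas can be sketched in a few lines of NumPy. This is a simplified stand-in and not what torchvision does internally: `random_crop` and `random_horizontal_flip` are hypothetical helpers, and the crop here keeps a fixed 224 × 224 window from a 256 × 256 array without any rescaling.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_crop(image, size):
    """Cut a random size x size window out of an H x W x C array."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]


def random_horizontal_flip(image, p=0.5):
    """Mirror the image left to right with probability p."""
    if rng.random() < p:
        return image[:, ::-1]
    return image


image = rng.random((256, 256, 3))  # stand-in for a resized photo
augmented = random_horizontal_flip(random_crop(image, 224))
print(augmented.shape)  # (224, 224, 3)
```

Every call produces a slightly different view of the same photo, which is why the network sees more variety during training than the raw number of files suggests.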


In [None]:
transform = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ]),
}


We create auxiliary functions that tell `ImageFolder` which files belong to the train, validation, and test sets.

In [None]:
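# Map each file path to its split label once, so every lookup is fast
split_by_path = dict(zip(df_animals['file_path'], df_animals['train_val_test']))

def check_train(path):
    # True only for files assigned to the training set (label 0)
    return bool(split_by_path.get(path) == 0)

def check_valid(path):
    # True only for files assigned to the validation set (label 1)
    return bool(split_by_path.get(path) == 1)

def check_test(path):
    # True only for files assigned to the test set (label 2)
    return bool(split_by_path.get(path) == 2)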

#### Load the dataset

In [None]:
# Reading Dataset
image_datasets = {
    'train' : ImageFolder(root= dir, transform=transform['train'], is_valid_file=check_train),
    'valid' : ImageFolder(root=dir, transform=transform['valid'], is_valid_file=check_valid),
    'test' : ImageFolder(root=dir, transform=transform['test'], is_valid_file=check_test)
}

In [None]:
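num_workers = 0  # load batches in the main process
batch_size = 50

loaders_scratch = {
    # shuffle only the training data; evaluation does not depend on the order
    'train' : DataLoader(image_datasets['train'], shuffle=True, batch_size=batch_size, num_workers=num_workers),
    'valid' : DataLoader(image_datasets['valid'], shuffle=False, batch_size=batch_size, num_workers=num_workers),
    'test' : DataLoader(image_datasets['test'], shuffle=False, batch_size=batch_size, num_workers=num_workers)
}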

#### USE GPU

In [None]:
# check if CUDA is available
use_cuda = torch.cuda.is_available()

<a id='step2'></a>
## Step 2: Create a CNN to Classify Wild Animals (from Scratch)

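We create a CNN that receives `224 x 224 x 3` images (that's how we prepared the dataset; PyTorch stores them as `3 x 224 x 224` tensors).

Some of the elements of our network built from scratch:
- Five convolutional layers with 3 x 3 kernels and padding equal to 1. The last one outputs 256 channels; after pooling, its feature map is `7 x 7 x 256`.
- A ReLU function applied after every convolutional layer.
- A max-pooling function applied after every ReLU, halving the width and height each time.
- Batch normalization applied after every pooling step.
- Dropout (p = 0.25) applied twice, once before each fully connected layer, to reduce overfitting.
- Two fully connected layers. The first maps the flattened `256 x 7 x 7` features to 512 units and is followed by a ReLU; the second maps those 512 units to the class scores.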
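
The layer sizes can be checked with a little arithmetic. The sketch below applies the standard output-size formulas for a convolution and a pooling layer five times in a row; `conv_output_size` and `pool_output_size` are hypothetical helper names used only for this illustration.

```python
def conv_output_size(size, kernel=3, padding=1, stride=1):
    """Width (or height) after a convolution: (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, kernel=2, stride=2):
    """Width (or height) after max pooling with a 2x2 window and stride 2."""
    return (size - kernel) // stride + 1


size = 224
channels = [16, 32, 64, 128, 256]
for out_channels in channels:
    # a 3x3 convolution with padding 1 keeps the size; pooling halves it
    size = pool_output_size(conv_output_size(size))
    print(f"{out_channels} channels, {size} x {size}")

flattened = channels[-1] * size * size
print(flattened)  # 12544
```

The spatial size goes 224, 112, 56, 28, 14, 7, so the first fully connected layer needs `256 * 7 * 7 = 12544` inputs.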


In [None]:
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees 224 x 224 x 3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees 112 x 112 x 16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees 56 x 56 x 32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # convolutional layer (sees 28 x 28 x 64 tensor)
        self.conv4 = nn.Conv2d(64, 128, 3, padding=1)
        # convolutional layer (sees 14 x 14 x 128 tensor)
        self.conv5 = nn.Conv2d(128, 256, 3, padding=1)

        # max pooling layer (halves width and height)
        self.pool = nn.MaxPool2d(2, 2)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25)

        # batch normalization applied to the pooled output of each conv layer
        self.conv_bn2 = nn.BatchNorm2d(16)
        self.conv_bn3 = nn.BatchNorm2d(32)
        self.conv_bn4 = nn.BatchNorm2d(64)
        self.conv_bn5 = nn.BatchNorm2d(128)
        self.conv_bn6 = nn.BatchNorm2d(256)

        # linear layer (256 * 7 * 7 -> 512)
        self.fc1 = nn.Linear(256 * 7 * 7, 512)
        # linear layer (512 -> 20 class scores)
        self.fc2 = nn.Linear(512, 20)


    def forward(self, x):
        # conv -> ReLU -> max pool -> batch norm, five times
        x = self.conv_bn2(self.pool(F.relu(self.conv1(x))))
        x = self.conv_bn3(self.pool(F.relu(self.conv2(x))))
        x = self.conv_bn4(self.pool(F.relu(self.conv3(x))))
        x = self.conv_bn5(self.pool(F.relu(self.conv4(x))))
        x = self.conv_bn6(self.pool(F.relu(self.conv5(x))))
        # flatten the 256 x 7 x 7 feature maps into one vector per image
        x = x.view(-1, 256 * 7 * 7)
        # add dropout layer
        x = self.dropout(x)
        # add hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # output layer: raw class scores (no activation; the loss applies softmax)
        x = self.fc2(x)
        return x


# instantiate the CNN
model_scratch = Net()
print(model_scratch)

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()

We define an optimizer and a loss function.

- **loss function**: Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0. From the [ml-cheatsheet](https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html)

- **optimizer**: Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data). Especially in high-dimensional optimization problems this reduces the computational burden, achieving faster iterations in trade for a lower convergence rate. From [wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
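
As a worked example of the loss described above, the sketch below computes cross-entropy by hand. It assumes the model outputs raw class scores (logits), which is what `nn.CrossEntropyLoss` expects; `cross_entropy` is a hypothetical helper written for this illustration, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log of the softmax probability assigned to the true class."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


# Predicting probability 0.012 for the true class gives a high loss
print(round(-math.log(0.012), 4))  # 4.4228

# The same scores are cheap when the true class is 0 and expensive when it is 1
confident_and_right = cross_entropy([5.0, 0.0, 0.0], 0)
confident_and_wrong = cross_entropy([5.0, 0.0, 0.0], 1)
print(confident_and_right < confident_and_wrong)  # True

# A model that gives all 20 classes the same score has a loss of ln(20)
chance_level = cross_entropy([0.0] * 20, 3)
print(round(chance_level, 4))  # 2.9957
```

The last value is a useful reference point: with 20 classes, a loss close to 3.0 means the model is doing no better than guessing.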

In [None]:

# specify loss function
criterion_scratch = nn.CrossEntropyLoss()

# specify optimizer
optimizer_scratch = optim.SGD(model_scratch.parameters(), lr=0.001, momentum=0.9)

In [None]:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # Checkpoints are always written to this fixed Drive path; `save_path` is accepted but not used
    checkpoint_path = '/content/gdrive/My Drive/model_scratch.pt'
    # initialize tracker for minimum validation loss
    valid_loss_min = float('inf')

    for epoch in range(1, n_epochs+1):
        # running averages of the training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # clear the gradients left over from the previous batch
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update the running average of the training loss
            train_loss = train_loss + (1 / (batch_idx + 1)) * (loss.item() - train_loss)

        ######################
        # validate the model #
        ######################
        model.eval()
        with torch.no_grad():  # gradients are not needed for validation
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(data)
                # calculate the batch loss
                loss = criterion(output, target)
                # update the running average of the validation loss
                valid_loss = valid_loss + (1 / (batch_idx + 1)) * (loss.item() - valid_loss)

        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch,
            train_loss,
            valid_loss
            ))

        # save the model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            torch.save(model.state_dict(), checkpoint_path)
            valid_loss_min = valid_loss

    # return trained model
    return model

# train the model
model_scratch = train(25, loaders_scratch, model_scratch, optimizer_scratch, criterion_scratch, use_cuda, 'model_scratch.pt')

# load the model with the lowest validation loss
model_scratch.load_state_dict(torch.load('/content/gdrive/My Drive/model_scratch.pt'))



Epoch: 1 	Training Loss: 0.088085 	Validation Loss: 0.030234
Validation loss decreased (inf --> 0.030234).  Saving model ...
Epoch: 2 	Training Loss: 0.000973 	Validation Loss: 0.018603
Validation loss decreased (0.030234 --> 0.018603).  Saving model ...
Epoch: 3 	Training Loss: 0.000654 	Validation Loss: 0.012961
Validation loss decreased (0.018603 --> 0.012961).  Saving model ...
Epoch: 4 	Training Loss: 0.000458 	Validation Loss: 0.010354
Validation loss decreased (0.012961 --> 0.010354).  Saving model ...
Epoch: 5 	Training Loss: 0.000359 	Validation Loss: 0.008294
Validation loss decreased (0.010354 --> 0.008294).  Saving model ...
Epoch: 6 	Training Loss: 0.000263 	Validation Loss: 0.007032
Validation loss decreased (0.008294 --> 0.007032).  Saving model ...
Epoch: 7 	Training Loss: 0.000242 	Validation Loss: 0.005830
Validation loss decreased (0.007032 --> 0.005830).  Saving model ...
Epoch: 8 	Training Loss: 0.000218 	Validation Loss: 0.005224
Validation loss decreased (0.00583

### Test the Model

We now try out the model on the test set of wildlife images. The code cell below calculates and prints the test loss and accuracy.

In [None]:
def test(loaders, model, criterion, use_cuda):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    if torch.cuda.is_available():
      model.cuda()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)

    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))

# call test function
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
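
A single accuracy number can hide species that the model confuses often. The sketch below, which is an optional addition and not part of the original evaluation, computes accuracy per class from an array of class scores and an array of true labels. It uses NumPy only; `per_class_accuracy` and the toy arrays are hypothetical.

```python
import numpy as np


def per_class_accuracy(scores, targets, n_classes):
    """Fraction of correctly classified samples for each class (NaN if a class has no samples)."""
    preds = scores.argmax(axis=1)  # predicted class = index of the largest score
    correct = np.bincount(targets[preds == targets], minlength=n_classes)
    totals = np.bincount(targets, minlength=n_classes)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(totals > 0, correct / totals, np.nan)


# Toy example: 4 samples, 3 classes
scores = np.array([[2.0, 1.0, 0.0],
                   [0.0, 3.0, 1.0],
                   [1.0, 0.0, 2.0],
                   [3.0, 0.0, 1.0]])
targets = np.array([0, 1, 1, 0])

accuracy = per_class_accuracy(scores, targets, n_classes=3)
print(accuracy)  # class 0: 1.0, class 1: 0.5, class 2: nan (no samples)
```

To use it with the model, the scores and targets of every test batch would first have to be moved to the CPU and concatenated into two arrays.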

<a id='step3'></a>
## Step 3: Create a CNN to Classify Wild Animals (using Transfer Learning)

We will now use transfer learning to create a CNN that can identify the animals from the images.

From [neurohive](https://neurohive.io/en/popular-networks/vgg16/):
*VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy in ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. It was one of the famous model submitted to ILSVRC-2014. It makes the improvement over AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layer, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks and was using NVIDIA Titan Black GPU’s.*










In [None]:
# Reuse the data loaders from the previous step
loaders_transfer = loaders_scratch

In [None]:
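import torchvision.models as models
import torch.nn as nn

# Load VGG16 with ImageNet weights
# (requires torchvision 0.13 or newer; older versions use `pretrained=True`)
model_transfer = models.vgg16(weights=models.VGG16_Weights.DEFAULT)

# Freeze the convolutional feature extractor so only the classifier is trained
for param in model_transfer.features.parameters():
    param.requires_grad = False

# Replace the last classifier layer with one output per animal class
n_inputs = model_transfer.classifier[6].in_features
n_classes = len(image_datasets['train'].classes)
last_layer = nn.Linear(n_inputs, n_classes)
model_transfer.classifier[6] = last_layer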


# if GPU is available, move the model to GPU
if use_cuda:
    model_transfer.cuda()
print(model_transfer)
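
The effect of freezing the feature extractor and swapping the last layer can be seen on a much smaller network. The sketch below uses a hypothetical `TinyNet` that only mimics the `features` / `classifier` layout of VGG16; it is an illustration under that assumption and downloads no pretrained weights.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier layout as VGG16."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Sequential(nn.Linear(8, 1000))

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyNet()

# Freeze the feature extractor: these weights receive no gradient updates
for param in model.features.parameters():
    param.requires_grad = False

# Replace the 1000-class head with a new, trainable 20-class head
model.classifier[0] = nn.Linear(model.classifier[0].in_features, 20)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(trainable, frozen)  # 180 224

output = model(torch.zeros(2, 3, 32, 32))
print(output.shape)  # torch.Size([2, 20])
```

Only the 180 parameters of the new head are trainable here; the same pattern applied to VGG16 leaves the convolutional layers untouched while the classifier adapts to the new classes.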

In [None]:
criterion_transfer = nn.CrossEntropyLoss()
optimizer_transfer = optim.SGD(model_transfer.classifier.parameters(), lr=0.001)

In [None]:
# train the model
model_transfer = train(25, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')

In [None]:
model_transfer.load_state_dict(torch.load('/content/gdrive/My Drive/model_scratch.pt'))

In [None]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)

In [None]:
# Function that takes a path to an image as input
# and returns the animal predicted by the model.

import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random

# create a list of human-readable class names
class_names = [item.replace("_", " ") for item in image_datasets['train'].classes]

def predict_animal_transfer(img_path):

    # load the image and force three RGB channels
    img = Image.open(img_path).convert('RGB')

    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )

    # same preprocessing as the validation and test sets
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize]
    )

    img_tensor = preprocess(img).unsqueeze(0)  # add a batch dimension at index 0

    if use_cuda:
        img_tensor = img_tensor.cuda()

    model_transfer.eval()
    with torch.no_grad():  # gradients are not needed for prediction
        output = model_transfer(img_tensor)  # tensor of shape (batch, number of classes)

    # the prediction is the index of the class with the largest score
    predict_index = output.argmax(dim=1).item()
    predicted_animal = class_names[predict_index]

    # the true label is the name of the folder that contains the image
    true_animal = os.path.basename(os.path.dirname(img_path)).replace("_", " ")

    return (predicted_animal, true_animal)

# Create list of test image paths (files only)
test_img_paths = [p for p in df_animals[df_animals.train_val_test == 2].file_path if os.path.isfile(p)]
np.random.shuffle(test_img_paths)

for img_path in test_img_paths[0:20]:
    predicted_animal, true_animal = predict_animal_transfer(img_path)
    print("Predicted Animal:", predicted_animal, "\n", "True Animal:", true_animal)
    img = mpimg.imread(img_path)
    imgplot = plt.imshow(img)
    plt.show()
