# Dog Model Detector
## By: Jon Barker
## Date: 15-January-2024

This is a model I am working through for my February Machine Learning YouTube series. I will be using the Stanford Dogs dataset to train a Convolutional Neural Network (CNN) to recognize dog faces and dog breeds. Once the model is trained, the goal is to run it on a live webcam stream and use its detections to trigger some type of hardware.

Libraries:

This model uses sklearn and Keras for the machine learning processing.

In [58]:
import numpy as np
import random
import cv2                
import matplotlib.pyplot as plt   
import sys
import os
import dlib
import pandas as pd

from skimage import io
from tqdm import tqdm
from glob import glob
from sklearn.datasets import load_files


from keras.callbacks import ModelCheckpoint  
from keras.preprocessing import image                  
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
import np_utils

from IPython.display import Image, display

import xml.etree.ElementTree as ET
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split



Tensor breakdown:

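The functions below extract features from images using pre-trained deep learning models from the Keras library. Each function works with one specific model: VGG16, VGG19, ResNet50, Xception, or InceptionV3. Because `include_top=False` is set, each call returns the output of the network without its final classification layers. Note that every call builds the model again before predicting.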

In [59]:
def extract_VGG16(tensor):
	from keras.applications.vgg16 import VGG16, preprocess_input
	return VGG16(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def extract_VGG19(tensor):
	from keras.applications.vgg19 import VGG19, preprocess_input
	return VGG19(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def extract_Resnet50(tensor):
	from keras.applications.resnet50 import ResNet50, preprocess_input
	return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def extract_Xception(tensor):
	from keras.applications.xception import Xception, preprocess_input
	return Xception(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def extract_InceptionV3(tensor):
	from keras.applications.inception_v3 import InceptionV3, preprocess_input
	return InceptionV3(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

## Dataset Cleaning Babyyyy:

The `load_dataset` function processes the image and annotation directories to create the required datasets. It uses a label encoder to convert breed names into numerical labels and then one-hot encodes these labels. The function returns the image file paths, the one-hot encoded targets, and the list of dog breed names (`dog_names`) for the training dataset. The same list applies to the validation and test datasets but is not returned again, since the breed names are identical across all sets.

When you run this code, ensure that the paths ('dogImages/train', 'dogImages/valid', 'dogImages/test', and 'annotations/Annotation') match the structure of your dataset. Also, make sure all necessary Python packages (like `glob`, `numpy`, `xml.etree.ElementTree`, `sklearn.preprocessing`, etc.) are installed and imported.
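
The label-encoding and one-hot steps described above can be sketched with plain NumPy. This is only an illustration of the idea, not the notebook's `load_dataset` implementation: the helper name `one_hot_breeds` and the sample breed list are made up for the example. `np.unique` sorts the names, which should give the same ordering a scikit-learn `LabelEncoder` produces.

```python
import numpy as np

def one_hot_breeds(breeds):
    """Hypothetical helper: map breed names to integer ids, then to one-hot rows."""
    # np.unique returns the sorted unique names and, for each input item,
    # its index into that sorted array.
    dog_names, labels = np.unique(breeds, return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class i.
    targets = np.eye(len(dog_names), dtype=np.float32)[labels]
    return dog_names, labels, targets

# Made-up sample of breed names, one entry per image.
breeds = ["pug", "basenji", "pug", "Rottweiler"]
dog_names, labels, targets = one_hot_breeds(breeds)
print(dog_names)  # sorted unique breed names
print(labels)     # integer label per image
print(targets)    # one row per image, one column per breed
```

Each row of `targets` has a single 1 in the column of its breed, which is the format a softmax output layer is usually trained against.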


When the Stanford Dogs dataset is unzipped, it is organized into two folders: the images themselves and the annotations, which are XML files saved without the `.xml` extension.

The `parse_annotation` function reads one annotation file and returns a Python dictionary with its folder, filename, class name, and bounding box.


In [60]:
# Updated function to parse the annotation files (without .xml extension)
def parse_annotation(file_path):
    """Parse an annotation file and extract details."""
    tree = ET.parse(file_path)
    root = tree.getroot()

    annotation_details = {
        "folder": root.find("folder").text,
        "filename": root.find("filename").text,
        "class_name": root.find("object/name").text,
        "bndbox": {
            "xmin": int(root.find("object/bndbox/xmin").text),
            "ymin": int(root.find("object/bndbox/ymin").text),
            "xmax": int(root.find("object/bndbox/xmax").text),
            "ymax": int(root.find("object/bndbox/ymax").text)
        }
    }
    return annotation_details
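
To see what the parser extracts, here is a small self-contained sketch that runs the same lookups on an in-memory XML string. The sample annotation is invented for illustration (its values are not taken from the dataset), and `parse_annotation_root` is a hypothetical variant that accepts an already-parsed root element so that no file is needed.

```python
import xml.etree.ElementTree as ET

# Invented sample in the layout the parser above expects.
SAMPLE_ANNOTATION = """
<annotation>
  <folder>02097658</folder>
  <filename>n02097658_26</filename>
  <object>
    <name>silky_terrier</name>
    <bndbox>
      <xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax>
    </bndbox>
  </object>
</annotation>
"""

def parse_annotation_root(root):
    """Hypothetical variant of parse_annotation that works on a parsed root."""
    box = root.find("object/bndbox")
    return {
        "folder": root.find("folder").text,
        "filename": root.find("filename").text,
        # find() returns only the first <object>; an image with several dogs
        # would need root.findall("object") instead.
        "class_name": root.find("object/name").text,
        "bndbox": {key: int(box.find(key).text)
                   for key in ("xmin", "ymin", "xmax", "ymax")},
    }

details = parse_annotation_root(ET.fromstring(SAMPLE_ANNOTATION.strip()))
print(details["class_name"], details["bndbox"])
```

The bounding-box values are converted to `int` so they can be used directly for cropping later.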

Once the annotations are parsed into Python dictionaries, we need to match each one with its image in the images folder. The code below walks the annotation folders and pairs every annotation with the image in the nested images path.

In [61]:
# Step 2: Loop through annotations and match with images

# Paths to the images and annotations directories
dataset_base_path = os.getcwd()  # Update this with the actual dataset path
images_nested_path = os.path.join(dataset_base_path, 'dogImages', 'images', 'Images')
annotations_nested_path = os.path.join(dataset_base_path, 'dogImages', 'annotations', 'Annotation')

# Extract data
data = []
nested_annotation_files = os.listdir(annotations_nested_path)

for breed_folder in nested_annotation_files:
    breed_annotation_path = os.path.join(annotations_nested_path, breed_folder)
    if os.path.isdir(breed_annotation_path):
        for annotation_file in os.listdir(breed_annotation_path):
            annotation_path = os.path.join(breed_annotation_path, annotation_file)
            if os.path.isfile(annotation_path):
                annotation_data = parse_annotation(annotation_path)
                image_path = os.path.join(images_nested_path, breed_folder, annotation_data["filename"] + ".jpg")
                data.append({
                    "image_path": image_path,
                    "class_name": annotation_data["class_name"],
                    "bounding_box": annotation_data["bndbox"],
                    "breed": breed_folder
                })
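
Some annotation files appear to carry a literal `%s` in their `filename` field, which is why paths ending in `%s.jpg` show up (and get filtered out) later in this notebook. One possible workaround, sketched here under the assumption that every image shares its annotation file's base name, is to build the path from the annotation file name instead. `image_path_for` is a hypothetical helper, not part of the original code.

```python
import os

def image_path_for(images_root, breed_folder, annotation_file):
    """Hypothetical helper: build the image path from the annotation file name.

    Assumes each image is stored as <annotation file name>.jpg inside the
    breed folder with the same name as the annotation folder.
    """
    return os.path.join(images_root, breed_folder, annotation_file + ".jpg")

# Example with a breed folder and file name that appear elsewhere in this notebook.
path = image_path_for(os.path.join("dogImages", "images", "Images"),
                      "n02093754-Border_terrier",
                      "n02093754_5404")
print(path)
```

If that assumption holds for your copy of the dataset, the `%s.jpg` filter further down would no longer be needed.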

Pandas is the goat, so we need a panda

In [62]:
# Create DataFrame
df = pd.DataFrame(data)
df.head()

| | image_path | class_name | bounding_box | breed |
|---|---|---|---|---|
| 0 | /Users/jon/Documents/Python/dog_detector/dog_d... | silky_terrier | {'xmin': 93, 'ymin': 117, 'xmax': 269, 'ymax':... | n02097658-silky_terrier |
| 1 | /Users/jon/Documents/Python/dog_detector/dog_d... | silky_terrier | {'xmin': 216, 'ymin': 69, 'xmax': 498, 'ymax':... | n02097658-silky_terrier |
| 2 | /Users/jon/Documents/Python/dog_detector/dog_d... | silky_terrier | {'xmin': 0, 'ymin': 0, 'xmax': 366, 'ymax': 331} | n02097658-silky_terrier |
| 3 | /Users/jon/Documents/Python/dog_detector/dog_d... | silky_terrier | {'xmin': 174, 'ymin': 1, 'xmax': 1018, 'ymax':... | n02097658-silky_terrier |
| 4 | /Users/jon/Documents/Python/dog_detector/dog_d... | silky_terrier | {'xmin': 124, 'ymin': 27, 'xmax': 357, 'ymax':... | n02097658-silky_terrier |


In [63]:
# Split into train and test sets
train_df, test_df = train_test_split(df, test_size=0.2)  # 80% train, 20% test

# Display first few rows of train and test DataFrames
print("Train DataFrame:")

print(train_df.shape)
train_df.head()


Train DataFrame:
(16464, 4)


| | image_path | class_name | bounding_box | breed |
|---|---|---|---|---|
| 1631 | /Users/jon/Documents/Python/dog_detector/dog_d... | Pembroke | {'xmin': 185, 'ymin': 169, 'xmax': 346, 'ymax'... | n02113023-Pembroke |
| 16864 | /Users/jon/Documents/Python/dog_detector/dog_d... | basenji | {'xmin': 52, 'ymin': 46, 'xmax': 400, 'ymax': ... | n02110806-basenji |
| 14713 | /Users/jon/Documents/Python/dog_detector/dog_d... | African_hunting_dog | {'xmin': 110, 'ymin': 13, 'xmax': 474, 'ymax':... | n02116738-African_hunting_dog |
| 18206 | /Users/jon/Documents/Python/dog_detector/dog_d... | Afghan_hound | {'xmin': 1, 'ymin': 28, 'xmax': 302, 'ymax': 463} | n02088094-Afghan_hound |
| 14180 | /Users/jon/Documents/Python/dog_detector/dog_d... | Australian_terrier | {'xmin': 45, 'ymin': 50, 'xmax': 240, 'ymax': ... | n02096294-Australian_terrier |


In [64]:
print(test_df.shape)
test_df.head()

(4116, 4)


| | image_path | class_name | bounding_box | breed |
|---|---|---|---|---|
| 14052 | /Users/jon/Documents/Python/dog_detector/dog_d... | Rottweiler | {'xmin': 15, 'ymin': 42, 'xmax': 321, 'ymax': ... | n02106550-Rottweiler |
| 18560 | /Users/jon/Documents/Python/dog_detector/dog_d... | Welsh_springer_spaniel | {'xmin': 58, 'ymin': 16, 'xmax': 373, 'ymax': ... | n02102177-Welsh_springer_spaniel |
| 13998 | /Users/jon/Documents/Python/dog_detector/dog_d... | Rottweiler | {'xmin': 1, 'ymin': 92, 'xmax': 467, 'ymax': 373} | n02106550-Rottweiler |
| 17205 | /Users/jon/Documents/Python/dog_detector/dog_d... | Bedlington_terrier | {'xmin': 256, 'ymin': 94, 'xmax': 458, 'ymax':... | n02093647-Bedlington_terrier |
| 2057 | /Users/jon/Documents/Python/dog_detector/dog_d... | Staffordshire_bullterrier | {'xmin': 115, 'ymin': 11, 'xmax': 499, 'ymax':... | n02093256-Staffordshire_bullterrier |


In [66]:
# Print statistics about the dataset, using the DataFrames built above
# (this notebook has a train/test split only, so there is no validation count)
print('There are %d total dog categories.' % df['breed'].nunique())
print('There are %d total dog images.\n' % len(df))
print('There are %d training dog images.' % len(train_df))
print('There are %d test dog images.' % len(test_df))

# Step 2: Detect Dogs

In this section, we use a pre-trained ResNet-50 model to detect dogs in images. Our first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

In [67]:
from tensorflow.keras.applications import ResNet50
# define ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')

# Pre-process the Data
When using TensorFlow as backend, Keras CNNs require a 4D array (which we'll also refer to as a 4D tensor) as input, with shape 

<center>(nb_samples,rows,columns,channels)</center>

where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

The `path_to_tensor` function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then expanded to a 4D tensor by adding a leading batch axis. Since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape

<center>(1,224,224,3).</center>

The `paths_to_tensor` function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape

<center>(nb_samples,224,224,3).</center>

Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image) in your dataset!
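
The shapes described above can be checked without loading any image. The sketch below uses an all-zero array as a stand-in for a decoded 224×224 RGB picture; the variable names are only for illustration.

```python
import numpy as np

# Stand-in for one decoded 224x224 RGB image (a 3D tensor).
fake_image = np.zeros((224, 224, 3), dtype=np.float32)

# Adding a leading axis turns the 3D image into a batch containing one sample.
single = np.expand_dims(fake_image, axis=0)
print(single.shape)

# Stacking several one-sample batches along the first axis gives
# a tensor of shape (nb_samples, 224, 224, 3).
batch = np.vstack([single, single, single])
print(batch.shape)
```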

In [68]:
def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

# Making Predictions with ResNet-50
Getting the 4D tensor ready for ResNet-50 requires some additional processing, and the other pre-trained Keras models each come with their own preprocessing step as well. First, the RGB image is converted to BGR by reordering the channels. Then the mean pixel (expressed in BGR as [103.939, 116.779, 123.68] and calculated from all pixels in all images in ImageNet) is subtracted from every pixel in each image. This is implemented in the imported function `preprocess_input`. If you're curious, you can read its source in `keras.applications.imagenet_utils`.
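
As a rough illustration of those two steps, here is a NumPy-only sketch of the channel swap and mean subtraction. It is a simplified stand-in written for this walkthrough, not the Keras implementation, and `caffe_style_preprocess` is a hypothetical name.

```python
import numpy as np

# Per-channel ImageNet means, listed in BGR order.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(batch_rgb):
    """Hypothetical re-implementation: RGB -> BGR, then subtract channel means."""
    batch_bgr = batch_rgb[..., ::-1].astype(np.float32)  # reverse the channel axis
    return batch_bgr - IMAGENET_MEAN_BGR                 # broadcasts over the last axis

# A single pure-red pixel (R=255, G=0, B=0) makes the channel swap easy to see.
pixel = np.array([[[[255.0, 0.0, 0.0]]]], dtype=np.float32)  # shape (1, 1, 1, 3)
out = caffe_style_preprocess(pixel)
print(out)
```

After the swap the red value sits in the last channel, so only that entry ends up positive once the means are subtracted.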

Now that we have a way to format our image for supplying to ResNet-50, we are ready to use the model to extract the predictions. This is accomplished with the `predict` method, which returns an array whose i-th entry is the model's predicted probability that the image belongs to the i-th ImageNet category. This is implemented in the `ResNet50_predict_labels` function below.

By taking the argmax of the predicted probability vector, we obtain an integer corresponding to the model's predicted object class, which we can map to an object category using the ImageNet class-index dictionary.

In [69]:
def ResNet50_predict_labels(img_path):
    # returns the index of the most likely ImageNet class for the image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

## Write a Dog Detector
While looking at the dictionary, you will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151-268, inclusive, to include all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check to see if an image is predicted to contain a dog by the pre-trained ResNet-50 model, we need only check if the ResNet50_predict_labels function above returns a value between 151 and 268 (inclusive).

We use these ideas to complete the dog_detector function below, which returns True if a dog is detected in an image (and False if not).

In [70]:
### returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    # ImageNet class indices 151-268 (inclusive) are the dog breeds
    return bool(151 <= prediction <= 268)
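
The range check can be tried out without loading the network. The sketch below feeds made-up 1000-way probability vectors through the same argmax-and-compare logic; `is_dog_class` is a hypothetical helper used only for this illustration.

```python
import numpy as np

DOG_FIRST, DOG_LAST = 151, 268  # dog-breed class indices, inclusive

def is_dog_class(probabilities):
    """Hypothetical helper: True if the most likely class index is a dog breed."""
    predicted = int(np.argmax(probabilities))
    return DOG_FIRST <= predicted <= DOG_LAST

def fake_prediction(class_index):
    """Build a made-up probability vector that puts all mass on one class."""
    probabilities = np.zeros(1000)
    probabilities[class_index] = 1.0
    return probabilities

print(is_dog_class(fake_prediction(200)))  # index inside the dog range
print(is_dog_class(fake_prediction(500)))  # index outside the dog range
```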

## (IMPLEMENTATION) Assess the Dog Detector
Question 3: Use the code cell below to test the performance of your dog_detector function.

What percentage of the images in human_files_short have a detected dog?

What percentage of the images in dog_files_short have a detected dog?

Answer: The pre-trained ResNet-50 model detects a dog in 2.0% of the first 100 images in human_files and in 100.0% of the first 100 images in dog_files. In this notebook, the run further down on the first 100 rows of `test_df` reports 95.0%.

In [73]:
def performance_dog_detector(dog_files_short):
    """Print the percentage of images in dog_files_short where dog_detector fires."""
    dog_files_short = dog_files_short.tolist()

    # 1 if a dog was detected in the image, 0 otherwise
    face_dog = [int(dog_detector(dog_img)) for dog_img in dog_files_short]
    print(f"face_dog = {face_dog}")
    ratio_dog = sum(face_dog)/len(face_dog)*100

    print('{}% of dogs faces in the first 100 images in dog_files with ResNet-50'.format(ratio_dog))

In [74]:
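data = test_df
# Drop rows whose path ends in the literal placeholder '%s.jpg'
data = data[~data['image_path'].str.endswith('%s.jpg')]
# Keep the first 100 remaining test images for a quick check
data_first_100 = data.iloc[:100]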


performance_dog_detector(data_first_100['image_path'])

face_dog = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1]
95.0% of dogs faces in the first 100 images in dog_files with ResNet-50
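
The percentage printed above is simply the mean of the 0/1 flags, scaled to 100. A minimal sketch of that arithmetic, using a hypothetical helper `detection_rate` and a flag list with the same counts as the output above (95 hits, 5 misses):

```python
def detection_rate(flags):
    """Hypothetical helper: percentage of 1s in a list of 0/1 detection flags."""
    if not flags:
        raise ValueError("flags must not be empty")
    return 100.0 * sum(flags) / len(flags)

# 100 flags with 5 misses, matching the counts in the output above.
flags = [1] * 95 + [0] * 5
print(detection_rate(flags))
```

Guarding against an empty list avoids a division by zero if the `%s.jpg` filter ever removes every row.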


In [78]:
data_first_100['image_path'].iloc[11]

'/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02093754-Border_terrier/n02093754_5404.jpg'

In [35]:
dog_files_short = test_df['image_path'].tolist()
print(dog_files_short)
    

['/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02109961-Eskimo_dog/n02109961_12107.jpg', '/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02089867-Walker_hound/n02089867_1208.jpg', '/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02100877-Irish_setter/n02100877_131.jpg', '/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02104029-kuvasz/n02104029_3766.jpg', '/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02110958-pug/n02110958_14654.jpg', '/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02086240-Shih-Tzu/n02086240_646.jpg', '/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02105251-briard/n02105251_8075.jpg', '/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02115913-dhole/n02115913_4025.jpg', '/Users/jon/Documents/Python/dog_detector/dog_det

In [None]:
'/Users/jon/Documents/Python/dog_detector/dog_detector/dogImages/images/Images/n02091635-otterhound/%s.jpg'