# Dogs vs Cats Image Classification

In this notebook, we explore several options for implementing a model that classifies images of cats and dogs.

In [28]:
import os
import random as rd
import shutil
import tensorflow as tf

In [3]:
# Basic initialization
rd.seed(9)
tf.random.set_seed(9)
CAT_AND_DOGS_DATA_URL: str = "https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"

## Data loading

To build our image classifier, we begin by downloading the dataset. We use a filtered version of the <a href="https://www.kaggle.com/c/dogs-vs-cats/data" target="_blank">Dogs vs. Cats</a> dataset from Kaggle (the original data is provided by Microsoft Research).

In [23]:
data_folder_name: str = "cats_and_dogs_filtered"
zip_dir: str = tf.keras.utils.get_file(
    fname=f"{data_folder_name}.zip",
    origin=CAT_AND_DOGS_DATA_URL,
    extract=True  # Extract the zip file's contents into the .keras cache folder
)

data_folder_path: str = f"{os.getcwd()}/{data_folder_name}"
# Check if the data already exists in the current directory
if not os.path.isdir(data_folder_path):
    # Copy the data from the .keras folder to the current folder
    shutil.copytree(src=f"{os.path.dirname(zip_dir)}/{data_folder_name}", dst=data_folder_path)

train_dog_dir: str = f"{data_folder_path}/train/dogs"
train_cat_dir: str = f"{data_folder_path}/train/cats"
validation_dog_dir: str = f"{data_folder_path}/validation/dogs"
validation_cat_dir: str = f"{data_folder_path}/validation/cats"

print(f"Number of training images for dogs: {len(os.listdir(train_dog_dir))}")
print(f"Number of training images for cats: {len(os.listdir(train_cat_dir))}")
print(f"Number of validation images for dogs: {len(os.listdir(validation_dog_dir))}")
print(f"Number of validation images for cats: {len(os.listdir(validation_cat_dir))}")

Number of training images for dogs: 1000
Number of training images for cats: 1000
Number of validation images for dogs: 500
Number of validation images for cats: 500


The dataset ships with only training and validation splits, not a test split. To get one, we move half of the validation images into a new test set.

In [33]:
nb_test_dog_images: int = len(os.listdir(validation_dog_dir)) // 2
nb_test_cat_images: int = len(os.listdir(validation_cat_dir)) // 2

test_folder_path: str = f"{data_folder_path}/test"
test_dog_dir: str = f"{test_folder_path}/dogs"
test_cat_dir: str = f"{test_folder_path}/cats"
# Only split once: skip if the test directory already exists
if not os.path.isdir(test_folder_path):
    os.mkdir(test_folder_path)

    # Pick random files from the validation set to move to the test set.
    # Sorting first keeps the seeded sample independent of os.listdir's arbitrary order.
    os.mkdir(test_dog_dir)
    for file_name in rd.sample(sorted(os.listdir(validation_dog_dir)), nb_test_dog_images):
        shutil.move(src=f"{validation_dog_dir}/{file_name}", dst=test_dog_dir)
    os.mkdir(test_cat_dir)
    for file_name in rd.sample(sorted(os.listdir(validation_cat_dir)), nb_test_cat_images):
        shutil.move(src=f"{validation_cat_dir}/{file_name}", dst=test_cat_dir)

print(f"Number of training images for dogs: {len(os.listdir(train_dog_dir))}")
print(f"Number of training images for cats: {len(os.listdir(train_cat_dir))}")
print(f"Number of validation images for dogs: {len(os.listdir(validation_dog_dir))}")
print(f"Number of validation images for cats: {len(os.listdir(validation_cat_dir))}")
print(f"Number of test images for dogs: {len(os.listdir(test_dog_dir))}")
print(f"Number of test images for cats: {len(os.listdir(test_cat_dir))}")

Number of training images for dogs: 1000
Number of training images for cats: 1000
Number of validation images for dogs: 250
Number of validation images for cats: 250
Number of test images for dogs: 250
Number of test images for cats: 250
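
The split above could also be wrapped in a reusable helper. The following is a minimal sketch, not part of the original notebook: the function name `move_random_fraction` and its `fraction` and `seed` parameters are hypothetical, chosen only for illustration. It assumes every regular file in the source folder is an image that may be moved, and it uses a local `random.Random` instance so the global seed set earlier is left untouched.

```python
import random
import shutil
import tempfile
from pathlib import Path
from typing import List


def move_random_fraction(src_dir: Path, dst_dir: Path, fraction: float = 0.5, seed: int = 9) -> List[str]:
    """Move a seeded random sample of files from src_dir to dst_dir; return the moved names, sorted."""
    # Sort so the sample does not depend on the file system's listing order
    file_names = sorted(p.name for p in src_dir.iterdir() if p.is_file())
    nb_to_move = int(len(file_names) * fraction)
    # Sample without replacement, so no file is picked twice
    chosen = random.Random(seed).sample(file_names, nb_to_move)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for name in chosen:
        shutil.move(str(src_dir / name), str(dst_dir / name))
    return sorted(chosen)


# Demo on throwaway files (hypothetical names), so the real dataset is not touched
with tempfile.TemporaryDirectory() as tmp:
    demo_src = Path(tmp) / "validation" / "dogs"
    demo_dst = Path(tmp) / "test" / "dogs"
    demo_src.mkdir(parents=True)
    for i in range(10):
        (demo_src / f"dog.{i}.jpg").write_text("placeholder")
    demo_moved = move_random_fraction(demo_src, demo_dst)
    print(len(demo_moved), len(list(demo_src.iterdir())))  # prints: 5 5
```

Because the file list is sorted and the generator is seeded locally, calling the helper on an identical folder should select the same files each time, which makes the validation/test split easier to reproduce.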


In [31]:
f"{validation_dog_dir}/{rd.choice(os.listdir(validation_dog_dir))}"

'/home/luiky/Documents/repositories/dogs_vs_cats_api/cats_and_dogs_filtered/validation/dogs/dog.2225.jpg'