# Building A Convolutional Image Classifier With Keras and TensorFlow

This project uses modern deep-learning networks to build an image classifier with Keras. We will design our own custom convnet with reusable blocks and perform visual feature extraction. We will also use transfer learning to improve our model and use data augmentation to extend our dataset.

Before we begin, let's break down the theory behind the project so we understand it better.

The goal of our project is to design a neural network that can "understand" a natural image well enough to solve the same kinds of problems the human visual system can solve.

There are many kinds of neural networks (e.g., RNNs, GNNs, CNNs), each suited to different applications in machine learning. For example, recurrent neural networks (RNNs) work well on sequential data such as text, which makes them a good fit for tasks like sentiment analysis. The networks most commonly used for image classification are convolutional neural networks (CNNs or convnets).

A CNN consists of two parts: a convolutional base and a dense head. 

The base is used to extract features from an image. What does this mean? Each convolutional layer applies filters (small matrices of weights) that detect specific patterns in different parts of the image. These filters break the image down into different levels of abstraction:
- Early layers detect basic features (edges, corners, textures).
- Middle layers detect more complex structures (shapes, objects).
- Deeper layers recognize high-level features (e.g., faces, cats, cars).

Each layer transforms its input into multiple feature maps, which are "filtered versions" of the image that highlight different aspects of it. By building up from low-level details to high-level concepts, the CNN forms an abstract representation of the input image, which makes classification easier.
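
To make "filter" and "feature map" concrete, here is a minimal sketch that slides one hand-written 3x3 vertical-edge filter over a tiny synthetic image with `tf.nn.conv2d`. The image and the kernel values are illustrative assumptions; a trained CNN learns its own filter values rather than using hand-picked ones.

```python
import numpy as np
import tensorflow as tf

# Synthetic 6x6 grayscale image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Hypothetical hand-written vertical-edge filter; a real CNN learns these values.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

# tf.nn.conv2d expects NHWC inputs and filters shaped [height, width, in_channels, out_channels].
x = image.reshape(1, 6, 6, 1)
k = kernel.reshape(3, 3, 1, 1)

# VALID padding: the 3x3 window only visits positions fully inside the image, giving a 4x4 map.
feature_map = tf.nn.conv2d(x, k, strides=1, padding="VALID")

print(feature_map.numpy()[0, :, :, 0])
```

Every row of the resulting 4x4 feature map is `[0, 3, 3, 0]`: the filter responds only where its window straddles the dark-to-bright boundary and stays silent over flat regions. Note that `tf.nn.conv2d` computes a cross-correlation (the kernel is not flipped), which is the convention deep-learning libraries use.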

The head then receives meaningful, structured information from the base instead of raw pixels, which lets it make an informed prediction about the image's class.
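
The split between base and head can be sketched as a tiny convnet built from scratch. The number of layers, filter counts, and unit counts below are arbitrary illustrative choices, not taken from this project; only the 128x128x3 input and the single sigmoid output mirror the car-or-truck setup.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Convolutional base: turns raw pixels into a stack of feature maps.
base = tf.keras.Sequential([
    layers.Conv2D(16, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling2D(),  # 128x128 -> 64x64
    layers.Conv2D(32, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling2D(),  # 64x64 -> 32x32
], name="base")

# Dense head: turns the extracted features into a class probability.
head = tf.keras.Sequential([
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # one unit for a binary label
], name="head")

model = tf.keras.Sequential([tf.keras.Input(shape=(128, 128, 3)), base, head])

dummy = np.zeros((2, 128, 128, 3), dtype=np.float32)
features = base(dummy)  # what the head receives instead of raw pixels
probs = model(dummy)    # one probability per image

print(features.shape)
print(probs.shape)
```

The base turns each 128x128 image into 32 feature maps of size 32x32, and the head reduces those to a single probability per image.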

During training, we want our network to learn two things:
1. Which features to extract from an image
2. Which class goes with which features

CNNs are rarely trained from scratch. A more common approach is to reuse the base of a pretrained model and attach an untrained head to it. In other words, we reuse the part of a network that has already learned to extract features and attach some fresh layers that learn to classify.
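
A minimal sketch of the "frozen base + fresh head" pattern. To keep it self-contained, a hypothetical one-layer `pretrained_base` stands in for a real pretrained network (this project uses VGG16 for that role). Setting `trainable = False` on the base means only the new head's weights are updated during training.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in for a pretrained base (the real project uses VGG16).
# Giving it an Input builds it immediately, so its weights exist before we freeze it.
pretrained_base = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])
pretrained_base.trainable = False  # freeze: training will not update these weights

# Fresh, untrained head attached on top of the frozen base.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    pretrained_base,
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

print(len(model.trainable_weights))      # kernel + bias of each Dense layer in the head
print(len(model.non_trainable_weights))  # kernel + bias of the frozen Conv2D layer
```

With the base frozen, training would update only the two `Dense` layers (four weight tensors), while the stand-in base's `Conv2D` kernel and bias (two tensors) stay fixed.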

Enough talking, let's get coding! 

## Step 1 - Loading the Data

```python
# Install KaggleHub (IPython/Jupyter magic; only needed once per environment)
%pip install kagglehub

# Imports
import os, warnings
import matplotlib.pyplot as plt
from matplotlib import gridspec

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory
import kagglehub

# Reproducibility
def set_seed(seed=31415):
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
set_seed()

# Set Matplotlib defaults
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')
warnings.filterwarnings("ignore") # to clean up output cells

# Download the dataset using the KaggleHub API (returns the local download path)
path = kagglehub.dataset_download("ryanholbrook/car-or-truck")

# Load training and validation sets; labels are inferred from the subfolder names
ds_train_ = image_dataset_from_directory(
    os.path.join(path, 'train'),
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)
ds_valid_ = image_dataset_from_directory(
    os.path.join(path, 'valid'),
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=False,
)

# Data pipeline
def convert_to_float(image, label):
    # image_dataset_from_directory already yields float32 images in [0, 255];
    # convert_image_dtype only rescales integer inputs, so here it is effectively a cast.
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_train = (
    ds_train_
    .map(convert_to_float)
    .cache()                         # keep images in memory after the first pass
    .prefetch(buffer_size=AUTOTUNE)  # overlap data loading with training
)
ds_valid = (
    ds_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
```


Found 5117 files belonging to 2 classes.


Found 5051 files belonging to 2 classes.
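
A note on `convert_to_float`: `image_dataset_from_directory` yields `float32` images with values in [0, 255], and `tf.image.convert_image_dtype` only rescales when converting from an integer dtype. The small check below, on a few hand-picked pixel values, shows both cases.

```python
import numpy as np
import tensorflow as tf

pixels_uint8 = tf.constant([[0, 128, 255]], dtype=tf.uint8)
pixels_float = tf.cast(pixels_uint8, tf.float32)  # float32 in [0, 255], like the loaded images

# Integer -> float: values are rescaled into [0, 1] (divided by 255).
from_uint8 = tf.image.convert_image_dtype(pixels_uint8, dtype=tf.float32)

# Float -> same float dtype: returned unchanged, values stay in [0, 255].
from_float = tf.image.convert_image_dtype(pixels_float, dtype=tf.float32)

print(from_uint8.numpy())
print(from_float.numpy())
```

So in this pipeline the mapping keeps the pixel values as loaded; if values in [0, 1] were wanted, an explicit division by 255 would be needed.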


## Step 2 - Define Pretrained Base 

The most commonly used dataset for pretraining is ImageNet, a large dataset of many kinds of natural images. Keras includes a variety of models pretrained on ImageNet in its `applications` module. The pretrained model we'll use is called VGG16.

Some information about VGG16:

Architecture:

VGG16 has 16 weight layers in total (pooling layers have no weights and are not counted):
- 13 convolutional layers: these apply filters to the input image to detect features like edges, textures, and patterns.
- 3 fully connected layers: once the convolutions have extracted features, these layers classify the image based on them.

The structure is simple and uniform: every convolutional layer uses 3x3 filters, and the convolutions are grouped into five blocks, each followed by a 2x2 max-pooling layer that halves the spatial resolution. Stacking these blocks lets the model learn a spatial hierarchy of features. With `include_top=False`, as in the code below, the three fully connected layers are dropped and only the convolutional base is kept.
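
The layer counts above can be checked directly in Keras. This sketch builds VGG16 with `weights=None` (an assumption made only so that no weights file is downloaded; the architecture is the same) and `include_top=False`, then counts the convolution and max-pooling layers and runs a dummy 128x128 image through the base.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16

# weights=None builds the architecture without downloading the ImageNet weights.
base = VGG16(include_top=False, weights=None, input_shape=(128, 128, 3))

conv_layers = [layer for layer in base.layers if isinstance(layer, tf.keras.layers.Conv2D)]
pool_layers = [layer for layer in base.layers if isinstance(layer, tf.keras.layers.MaxPooling2D)]
kernel_sizes = {layer.kernel_size for layer in conv_layers}

print(len(conv_layers))  # 13
print(len(pool_layers))  # 5
print(kernel_sizes)      # {(3, 3)}

# Five 2x2 poolings halve 128 five times: 128 / 2**5 = 4.
features = base(np.zeros((1, 128, 128, 3), dtype=np.float32))
print(features.shape)    # (1, 4, 4, 512)
```

Each 128x128 input therefore reaches the head as 512 feature maps of size 4x4.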

Pretrained on ImageNet:
- VGG16 is commonly used as a pretrained model whose weights were learned on ImageNet, which contains over a million labeled images across 1,000 categories (e.g., animals, objects, scenes). Because of this, VGG16 has already learned to extract general features from images (e.g., edges, textures, shapes) and can be reused through transfer learning for a new task (such as classifying new categories of images).

Why use VGG16?
- Transfer learning: instead of training a deep neural network from scratch (which is computationally expensive), you can start from VGG16's ImageNet weights and reuse the features it has already learned, either as a frozen feature extractor or by fine-tuning it on your own dataset. This is usually faster and requires less data.
- Performance: VGG16 has proven to be an effective model for image classification and is widely used in research and real-world applications.

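```python
from tensorflow.keras.applications import VGG16

# Load the VGG16 convolutional base pre-trained on ImageNet.
# include_top=False drops the three fully connected layers and keeps the 13 conv layers.
pretrained_base = VGG16(
    include_top=False,
    weights='imagenet',
    input_shape=(128, 128, 3)  # matches image_size=[128, 128] used when loading the data
)
# Freeze the base so its ImageNet weights are not updated during training
pretrained_base.trainable = False
```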

