# Why Convolutional Neural Networks 

* Densely connected layers learn global patterns in their input feature space
* Convolution layers learn local patterns

* The patterns they learn are translation invariant
> A pattern learned in the lower-right corner of one image can be recognized anywhere else, for example in another corner of another image. A dense network would have to learn the pattern anew at each location, which makes convnets more data efficient. The visual world is fundamentally translation invariant.
* They can learn spatial hierarchies of patterns
> The first layer might learn small local patterns such as edges; the next layer learns larger patterns built from them (eyes, ears, a nose), and so on. This lets convnets learn increasingly complex and abstract features. The visual world is inherently spatially hierarchical.
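
To make the translation-invariance point concrete, here is a minimal pure-Python sketch (not Keras code). The helper names `response_map`, `place`, and `peak` are made up for this illustration. The same 2x2 detector is slid over two images that contain the same pattern in different corners; the peak response has the same value in both, it just shows up at a different position.

```python
def response_map(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the grid of dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def place(pattern, top, left, size=6):
    """Return a size x size image of zeros with `pattern` pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for di, row in enumerate(pattern):
        for dj, value in enumerate(row):
            image[top + di][left + dj] = value
    return image


def peak(grid):
    """Return ((row, col), value) of the largest entry in a 2D grid."""
    best_value = max(max(row) for row in grid)
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value == best_value:
                return (i, j), best_value


pattern = [[1, 0],
           [0, 1]]      # a tiny diagonal stroke
kernel = pattern        # assumption: the "learned" filter matches the pattern exactly

pos_a, peak_a = peak(response_map(place(pattern, 0, 0), kernel))  # pattern in the top-left corner
pos_b, peak_b = peak(response_map(place(pattern, 4, 4), kernel))  # pattern in the bottom-right corner

print(pos_a, peak_a)
print(pos_b, peak_b)
```

A dense layer has a separate weight for every pixel position, so it would need training examples of the stroke at each location to respond to both images.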

* 3D tensors -> feature maps
* Two spatial axes (height and width) and one depth axis (the channels axis; color channels for an input image)
* The convolution operation extracts patches from the input feature map and applies the same transformation to all of these patches -> output feature map
* The depth of the output feature map no longer stands for color channels; it stands for _filters_
* Filters encode specific aspects of the input data; at a high level, a single filter could encode "presence of a face in the input"
* (28, 28, 1) ---> (26, 26, 32) = went from a feature map of size (28, 28, 1) to a feature map of size (26, 26, 32)
* Each of the 32 output channels contains a 26x26 grid of values, which is the response map of one filter over the input
* So every channel along the depth axis (32 of them) is a feature (or a filter)
* The response map of filter "n" -> [:, :, n]
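
The (28, 28, 1) -> (26, 26, 32) shape change can be reproduced with a naive convolution written in plain Python. This is only a sketch under stated assumptions: the function `conv2d_valid` and the random filter values are made up for illustration, and it leaves out the bias and activation that a real Keras layer would add.

```python
import random


def conv2d_valid(feature_map, filters):
    """Naive convolution with no padding and stride 1.

    feature_map: nested list indexed [row][col][channel]
    filters:     nested list indexed [filter][window_row][window_col][channel]
    returns:     nested list indexed [row][col][filter]
    """
    height, width = len(feature_map), len(feature_map[0])
    in_depth = len(feature_map[0][0])
    kh, kw = len(filters[0]), len(filters[0][0])
    output = []
    for i in range(height - kh + 1):
        row = []
        for j in range(width - kw + 1):
            # One patch -> one vector with a single value per filter.
            vector = []
            for kernel in filters:
                total = 0.0
                for di in range(kh):
                    for dj in range(kw):
                        for c in range(in_depth):
                            total += feature_map[i + di][j + dj][c] * kernel[di][dj][c]
                vector.append(total)
            row.append(vector)
        output.append(row)
    return output


def shape(tensor):
    """Shape of a 3D nested list as (height, width, depth)."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


rng = random.Random(0)
image = [[[rng.random()] for _ in range(28)] for _ in range(28)]          # (28, 28, 1)
filters = [[[[rng.uniform(-1, 1)] for _ in range(3)] for _ in range(3)]   # 32 filters of 3x3x1
           for _ in range(32)]

feature_map = conv2d_valid(image, filters)
print(shape(image), "->", shape(feature_map))

# The response map of filter n is the slice [:, :, n]; here n = 5.
response_map_5 = [[cell[5] for cell in row] for row in feature_map]
```

Each spatial position of the output holds one value per filter, which is why the output depth equals the number of filters rather than the number of input channels.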

* Keras Conv2D -> Conv2D(output_depth, (window_height, window_width))
* A window slides over the input feature map. Each patch of shape (window_height, window_width, input_depth) is transformed into a 1D tensor of shape (output_depth,), and these vectors are spatially reassembled into a 3D output map of shape (height, width, output_depth)
* Padding - Adds an appropriate number of rows and columns on each side of the input so that the output keeps the same spatial dimensions as the input
* Stride - Distance between two successive windows as the filter moves over the input. It is normally 1; higher stride values downsample the feature map.
* Max Pooling - Downsamples feature maps. Instead of transforming local patches via a learned linear transformation (as convolution does), it transforms them via a hardcoded max operation. Usually 2x2 windows with stride 2
* Conv2D usually uses 3x3 windows with stride 1
* We downsample the feature maps to reduce the number of feature-map coefficients to process (and therefore the number of parameters in the final dense layers), and to build spatial hierarchies by making successive convolution layers look at increasingly large windows of the original input.
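
The effect of window size, stride, and padding on the spatial dimensions can be checked with a little arithmetic. The sketch below is plain Python, not Keras; the helpers `output_size` and `max_pool_2d` are hypothetical names. It borrows the Keras padding mode names `"valid"` (no padding) and `"same"` (pad so that, with stride 1, the output matches the input size).

```python
import math


def output_size(input_size, window, stride=1, padding="valid"):
    """Length of one spatial axis after a convolution or pooling layer."""
    if padding == "valid":
        return (input_size - window) // stride + 1
    if padding == "same":
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")


def max_pool_2d(grid, window=2, stride=2):
    """Max pooling over a 2D grid (a single channel), no padding."""
    out_h = output_size(len(grid), window, stride)
    out_w = output_size(len(grid[0]), window, stride)
    return [
        [
            max(grid[i * stride + di][j * stride + dj]
                for di in range(window) for dj in range(window))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


print(output_size(28, 3))                  # 3x3 conv, stride 1, no padding   -> 26
print(output_size(28, 3, padding="same"))  # 3x3 conv, stride 1, same padding -> 28
print(output_size(26, 2, stride=2))        # 2x2 max pooling, stride 2        -> 13

grid = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 9, 8],
        [2, 6, 7, 3]]
pooled = max_pool_2d(grid)
print(pooled)                              # [[4, 5], [6, 9]]
```

Halving each spatial axis with 2x2 pooling cuts the number of values per channel by four, which is where the savings in later layers come from.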

# Convnet from Scratch

In [1]:
#Dataset taken from https://www.kaggle.com/datasets/carlosrunner/pizza-not-pizza

import os, shutil

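# os.path.join keeps the paths portable (the hardcoded "\\" separator only works on Windows)
print("Number of pizza images are ", len(os.listdir(os.path.join("pizza_not_pizza", "pizza"))))
print("Number of not pizza images are ", len(os.listdir(os.path.join("pizza_not_pizza", "not_pizza"))))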

Number of pizza images are  983
Number of not pizza images are  983


In [17]:
# Create the train / val / test directory tree, with one sub-folder per class.
# os.makedirs(..., exist_ok=True) also creates missing parent folders and lets
# this cell be re-run without raising FileExistsError.

original_dataset_dir = "pizza_not_pizza"
base_dir = "dataset"

train_dir = os.path.join(base_dir, "train")
test_dir = os.path.join(base_dir, "test")
val_dir = os.path.join(base_dir, "val")

train_pizza_images_dir = os.path.join(train_dir, "pizza")
train_not_pizza_images_dir = os.path.join(train_dir, "not_pizza")

test_pizza_images_dir = os.path.join(test_dir, "pizza")
test_not_pizza_images_dir = os.path.join(test_dir, "not_pizza")

val_pizza_images_dir = os.path.join(val_dir, "pizza")
val_not_pizza_images_dir = os.path.join(val_dir, "not_pizza")

for directory in (
    train_pizza_images_dir, train_not_pizza_images_dir,
    test_pizza_images_dir, test_not_pizza_images_dir,
    val_pizza_images_dir, val_not_pizza_images_dir,
):
    os.makedirs(directory, exist_ok=True)

In [18]:
from tqdm.notebook import tqdm
import random

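# Shuffle the file names so each split gets a random sample of every class
pizza_images_list = os.listdir(os.path.join(original_dataset_dir, "pizza"))
random.shuffle(pizza_images_list)
not_pizza_images_list = os.listdir(os.path.join(original_dataset_dir, "not_pizza"))
random.shuffle(not_pizza_images_list)

# These are cumulative end indices into the shuffled lists, not split sizes:
# [0, 500) -> train (500 images), [500, 800) -> val (300), [800, 983) -> test (183)
train_len = 500
val_len = 800
test_len = 983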

In [19]:
# Wrap the list (not the enumerate object) so tqdm knows the total and can show real progress
for index, file in enumerate(tqdm(pizza_images_list)):
    if index < train_len:
        dest_dir = train_pizza_images_dir
    elif index < val_len:
        dest_dir = val_pizza_images_dir
    elif index < test_len:
        dest_dir = test_pizza_images_dir
    else:
        continue  # past the last boundary: leave the file uncopied

    source_path = os.path.join(original_dataset_dir, "pizza", file)
    shutil.copy(source_path, os.path.join(dest_dir, file))

print("Training Pizza Images are ", len(os.listdir(train_pizza_images_dir)))
print("Testing pizza images are", len(os.listdir(test_pizza_images_dir)))
print("Validation pizza images are", len(os.listdir(val_pizza_images_dir)))

Training Pizza Images are  500
Testing pizza images are 183
Validation pizza images are 300


In [20]:
# Same split as above, applied to the not_pizza class
for index, file in enumerate(tqdm(not_pizza_images_list)):
    if index < train_len:
        dest_dir = train_not_pizza_images_dir
    elif index < val_len:
        dest_dir = val_not_pizza_images_dir
    elif index < test_len:
        dest_dir = test_not_pizza_images_dir
    else:
        continue  # past the last boundary: leave the file uncopied

    source_path = os.path.join(original_dataset_dir, "not_pizza", file)
    shutil.copy(source_path, os.path.join(dest_dir, file))

print("Training Non Pizza Images are ", len(os.listdir(train_not_pizza_images_dir)))
print("Testing Non pizza images are", len(os.listdir(test_not_pizza_images_dir)))
print("Validation Non pizza images are", len(os.listdir(val_not_pizza_images_dir)))

Training Non Pizza Images are  500
Testing Non pizza images are 183
Validation Non pizza images are 300
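
The two copy loops above differ only in the class name, so they could be folded into one helper. The following is a hedged sketch rather than part of the original notebook: `split_and_copy` is a hypothetical function, and the demo runs on placeholder files in a temporary directory instead of the real dataset.

```python
import os
import shutil
import tempfile


def split_and_copy(file_names, source_dir, split_dirs, boundaries):
    """Copy files into split directories and return how many went into each.

    boundaries are cumulative end indices, e.g. (500, 800, 983) sends
    files [0, 500) to the first directory, [500, 800) to the second, and
    [800, 983) to the third.
    """
    counts = []
    start = 0
    for split_dir, end in zip(split_dirs, boundaries):
        os.makedirs(split_dir, exist_ok=True)
        chunk = file_names[start:end]
        for name in chunk:
            shutil.copy(os.path.join(source_dir, name), os.path.join(split_dir, name))
        counts.append(len(chunk))
        start = end
    return counts


# Demo on throwaway placeholder files (assumption: 10 fake images, split 5 / 3 / 2).
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source")
    os.makedirs(source)
    names = [f"img_{i}.jpg" for i in range(10)]
    for name in names:
        with open(os.path.join(source, name), "w") as handle:
            handle.write("placeholder")

    split_dirs = [os.path.join(tmp, split) for split in ("train", "val", "test")]
    counts = split_and_copy(names, source, split_dirs, boundaries=(5, 8, 10))
    copied = [len(os.listdir(directory)) for directory in split_dirs]

print(counts)
```

With the notebook's variables, the same call would be made once per class, passing the shuffled file list, the class folder inside `original_dataset_dir`, the three class-specific target folders, and `(train_len, val_len, test_len)` as the boundaries.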

