<a href="https://colab.research.google.com/github/rylan-berry/DeepLearningIndependentStudy/blob/main/Chapter9ProblemSet_RylanBerry.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
The task is to explore Convolutional Neural Networks (CNNs) by answering conceptual questions, implementing core operations like 2D convolution and pooling from scratch, building and training a simple CNN model on a standard image dataset, visualizing feature maps, and summarizing the findings.

## Conceptual Questions on CNNs

### Subtask:
Formulate a set of conceptual questions covering fundamental aspects of Convolutional Networks, such as the convolution operation, padding, stride, pooling layers (max and average), activation functions, feature maps, and the benefits of CNNs for image processing, drawing from Chapter 9 of the Deep Learning book and the provided video.


1.  **What is the convolution operation in the context of CNNs, and what is its primary purpose?**

     - *The convolution operation slides a small matrix (the kernel) over a larger one (the input). At every position where the kernel fits, it multiplies the overlapping values element-wise and sums them into a single output value. The purpose is to extract local features: each output value summarizes a pixel together with its neighboring pixels, so nearby pixels inform one another.*

2.  **Explain the role of padding in a convolutional layer. Name two common types of padding and when they might be used.**

    - *Padding adds extra values around the border of the input (commonly zeros) before the convolution. It controls the output size, and it preserves edge information: border pixels are normally covered by only a few kernel positions, but with padding they contribute to more outputs.*
    - *SAME padding (zero padding): pad the input before the convolution so that, at stride 1, the output has the same size as the input. Useful when stacking many layers without shrinking the feature maps.*
    - *VALID padding (no padding): the kernel is applied only where it fits entirely inside the input, so the output is smaller than the input. Useful for reducing the size of an image, or when the model should focus on interior features and discard boundary data.*

3.  **How does stride affect the output dimensions of a convolutional layer?**

    - *A larger stride reduces the output size, since stride is the number of pixels the kernel moves at each step. A greater step size means fewer kernel positions, and fewer positions mean fewer output values. Each output dimension is (padded input size - kernel size) // stride + 1.*

4.  **Compare and contrast Max Pooling and Average Pooling layers. What is the main purpose of pooling layers in general?**

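    - *Pooling slides a window over the input and reduces each window to a single value (windows can overlap, as in the convolution operation). Max pooling outputs the highest value in the window, which keeps the strongest response, while average pooling outputs the mean of the window, which smooths the responses. The main purpose of pooling in general is to downsample the feature maps, which cuts computation and makes the output less sensitive to small shifts in the input.*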

5.  **Identify a common activation function used in CNNs and describe its role.**

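    - *Rectified Linear Unit (ReLU), max(0, x), is common for CNNs. It introduces nonlinearity and is cheap to compute. Its gradient is either 0 or 1, so for positive inputs the gradient passes through unchanged, which helps against vanishing gradients.*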

6.  **What do feature maps represent in a CNN?**
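    - *A feature map is the output of sliding one kernel across the whole input. Each value indicates how strongly that kernel's feature (an edge, a corner, a texture) is present at that location; it is a response strength, not a percentage. With many feature maps stacked across layers, a model should be able to decide what in an image is what.*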

7.  **What are the key benefits of using CNNs for image processing compared to traditional neural networks (e.g., fully connected networks)?**
    - *A CNN is much simpler: there are fewer operations, since each output depends only on a small neighborhood of the input instead of on every input, and the same kernel weights are reused at every position. This makes CNNs faster to compute and gives them far fewer parameters. The architecture is also inspired by how the visual system identifies things, which is pretty cool!!*
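
The padding and stride answers above can be checked with a few lines of arithmetic. The sketch below is only an illustration: `conv_output_size` is a hypothetical helper, and it assumes the TensorFlow-style convention that 'same' padding yields `ceil(n / stride)` outputs along each axis.

```python
from math import ceil

def conv_output_size(n, k, stride=1, padding="valid"):
    """Output length along one axis (hypothetical helper for illustration)."""
    if padding == "valid":
        # The kernel must fit entirely inside the unpadded input.
        return (n - k) // stride + 1
    if padding == "same":
        # Assumed convention: enough zeros are added that only the stride shrinks the output.
        return ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")

# A 5-wide input with a 3-wide kernel.
for padding in ("valid", "same"):
    for stride in (1, 2):
        print(padding, stride, conv_output_size(5, 3, stride, padding))
```

For a 5x5 input and a 3x3 kernel this gives 3 and 2 for 'valid' (stride 1 and 2) and 5 and 3 for 'same'.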

## Implement 2D Convolution Operation

### Subtask:
Implement a 2D convolution operation from scratch using libraries like NumPy, including options for padding and stride. This step should involve generating example input data and a kernel, then applying the convolution.


In [1]:
import numpy as np
print("NumPy imported successfully.")

NumPy imported successfully.


In [3]:
from math import ceil

def convolve2d(input_matrix, kernel_matrix, padding='valid', stride=1):
    input_height, input_width = input_matrix.shape
    kernel_height, kernel_width = kernel_matrix.shape

    pad_top, pad_bottom, pad_left, pad_right = 0, 0, 0, 0

    if padding == 'same':
        # Calculate output dimensions based on 'same' padding logic:
        # output_dim = ceil(input_dim / stride)
        # This is a common interpretation in frameworks like TensorFlow/Keras
        output_height_target = ceil(input_height / stride)
        output_width_target = ceil(input_width / stride)

        # Calculate total padding needed to achieve the target output dimensions
        total_padding_h = max(0, (output_height_target - 1) * stride + kernel_height - input_height)
        total_padding_w = max(0, (output_width_target - 1) * stride + kernel_width - input_width)

        # Distribute padding to top/bottom and left/right. If padding is odd, more goes to bottom/right.
        pad_top = total_padding_h // 2
        pad_bottom = total_padding_h - pad_top
        pad_left = total_padding_w // 2
        pad_right = total_padding_w - pad_left
    elif padding != 'valid':
        # Fail loudly instead of printing and returning None
        raise ValueError(f"Unknown padding method: {padding!r} (expected 'valid' or 'same')")

    # Apply padding to the input matrix
    padded_input = np.pad(input_matrix, ((pad_top, pad_bottom), (pad_left, pad_right)), mode='constant')

    # Calculate the actual output dimensions after padding and considering stride
    # Formula: (Padded_Input_Dimension - Kernel_Dimension) // Stride + 1
    output_height = (padded_input.shape[0] - kernel_height) // stride + 1
    output_width = (padded_input.shape[1] - kernel_width) // stride + 1

    # Initialize the output matrix with zeros
    output_matrix = np.zeros((output_height, output_width))

    # Perform the convolution operation
    for i in range(output_height):
        for j in range(output_width):
            # Extract the current window (receptive field) from the padded input
            window = padded_input[i * stride : i * stride + kernel_height,
                                  j * stride : j * stride + kernel_width]

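            # Element-wise multiplication and sum over the window.
            # The kernel is not flipped, so strictly this is cross-correlation,
            # which is what deep learning frameworks call convolution.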
            output_matrix[i, j] = np.sum(window * kernel_matrix)

    return output_matrix

print("convolve2d function defined.")

# Generate sample input data and a kernel
input_data = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0]
])

kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1]
])

print("\nSample Input Data:\n", input_data)
print("\nSample Kernel:\n", kernel)

# Experiment with different padding and stride values
print("\n--- Experimenting with different parameters ---")

# 1. 'valid' padding, stride 1
# Expected output height: (5 - 3) // 1 + 1 = 3
# Expected output width: (5 - 3) // 1 + 1 = 3
output_valid_stride1 = convolve2d(input_data, kernel, padding='valid', stride=1)
print("\nOutput (padding='valid', stride=1):\n", output_valid_stride1)

# 2. 'same' padding, stride 1
# Expected output height: ceil(5/1) = 5
# Expected output width: ceil(5/1) = 5
output_same_stride1 = convolve2d(input_data, kernel, padding='same', stride=1)
print("\nOutput (padding='same', stride=1):\n", output_same_stride1)

# 3. 'valid' padding, stride 2
# Expected output height: (5 - 3) // 2 + 1 = 2
# Expected output width: (5 - 3) // 2 + 1 = 2
output_valid_stride2 = convolve2d(input_data, kernel, padding='valid', stride=2)
print("\nOutput (padding='valid', stride=2):\n", output_valid_stride2)

# 4. 'same' padding, stride 2
# Expected output height: ceil(5/2) = 3
# Expected output width: ceil(5/2) = 3
output_same_stride2 = convolve2d(input_data, kernel, padding='same', stride=2)
print("\nOutput (padding='same', stride=2):\n", output_same_stride2)


convolve2d function defined.

Sample Input Data:
 [[1 1 1 0 0]
 [0 1 1 1 0]
 [0 0 1 1 1]
 [0 0 1 1 0]
 [0 1 1 0 0]]

Sample Kernel:
 [[1 0 1]
 [0 1 0]
 [1 0 1]]

--- Experimenting with different parameters ---

Output (padding='valid', stride=1):
 [[4. 3. 4.]
 [2. 4. 3.]
 [2. 3. 4.]]

Output (padding='same', stride=1):
 [[2. 2. 3. 1. 1.]
 [1. 4. 3. 4. 1.]
 [1. 2. 4. 3. 3.]
 [1. 2. 3. 4. 1.]
 [0. 2. 2. 1. 1.]]

Output (padding='valid', stride=2):
 [[4. 4.]
 [2. 4.]]

Output (padding='same', stride=2):
 [[2. 3. 1.]
 [1. 4. 3.]
 [0. 2. 1.]]
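
The nested loops above are easy to read but slow on large inputs. As a rough alternative, the same windows can be gathered with NumPy's `sliding_window_view` (NumPy 1.20 or later) and reduced in one call. This is a sketch, not the notebook's implementation: `convolve2d_vectorized` is a hypothetical name, it handles only 'valid' padding, and 'same' padding is emulated here by zero-padding the input by hand.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve2d_vectorized(x, k, stride=1):
    """'valid' cross-correlation without Python loops (hypothetical helper)."""
    windows = sliding_window_view(x, k.shape)   # shape: (H-kh+1, W-kw+1, kh, kw)
    windows = windows[::stride, ::stride]       # keep every stride-th window
    # Multiply each window by the kernel and sum over the window axes.
    return np.einsum("ijkl,kl->ij", windows, k).astype(float)

x = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
k = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

valid_s1 = convolve2d_vectorized(x, k)
valid_s2 = convolve2d_vectorized(x, k, stride=2)
# One ring of zeros reproduces 'same' padding for a 3x3 kernel at stride 1.
same_s1 = convolve2d_vectorized(np.pad(x, 1), k)

print(valid_s1)
print(valid_s2)
print(same_s1)
```

On the sample input and kernel, the three results agree with the corresponding loop-based outputs printed above.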


## Implement Pooling Layers

### Subtask:
Implement Max Pooling and Average Pooling layers from scratch. Provide example input data and demonstrate how these layers reduce spatial dimensions.


In [4]:
def max_pooling(input_matrix, pool_size, stride=None):
  in_h, in_w = input_matrix.shape
  p_h, p_w = pool_size
  if stride is None:
    stride = 1  # default used here; many frameworks instead default the stride to pool_size
  output_height = (in_h - p_h) // stride + 1
  output_width = (in_w - p_w) // stride + 1
  out = np.zeros((output_height, output_width))
  for i in range(output_height):
    for j in range(output_width):
      window = input_matrix[i*stride : i*stride+p_h, j*stride : j*stride+p_w]
      out[i][j] = np.max(window)
  return out


print("max_pooling function defined.")

max_pooling function defined.


In [5]:
def average_pooling(input_matrix, pool_size, stride=None):
  in_h, in_w = input_matrix.shape
  p_h, p_w = pool_size
  if stride is None:
    stride = 1  # default used here; many frameworks instead default the stride to pool_size
  output_height = (in_h - p_h) // stride + 1
  output_width = (in_w - p_w) // stride + 1
  out = np.zeros((output_height, output_width))
  for i in range(output_height):
    for j in range(output_width):
      window = input_matrix[i*stride : i*stride+p_h, j*stride : j*stride+p_w]
      out[i][j] = np.mean(window)
  return out

print("average_pooling function defined.")

average_pooling function defined.


In [6]:
print("\n--- Demonstrating Pooling Operations ---")

# Generate a sample input matrix
pooling_input_data = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])
pool_size = (2, 2)

print("\nOriginal Input Data for Pooling:\n", pooling_input_data)
print("Pool Size:", pool_size)

# Apply Max Pooling
max_pooled_output = max_pooling(pooling_input_data, pool_size)
print("\nMax Pooled Output:\n", max_pooled_output)

# Apply Average Pooling
average_pooled_output = average_pooling(pooling_input_data, pool_size)
print("\nAverage Pooled Output:\n", average_pooled_output)


--- Demonstrating Pooling Operations ---

Original Input Data for Pooling:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
Pool Size: (2, 2)

Max Pooled Output:
 [[ 6.  7.  8.]
 [10. 11. 12.]
 [14. 15. 16.]]

Average Pooled Output:
 [[ 3.5  4.5  5.5]
 [ 7.5  8.5  9.5]
 [11.5 12.5 13.5]]
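
With the default stride of 1 the windows overlap, so the 4x4 input only shrinks to 3x3. Pooling is more often applied with a stride equal to the pool size, which halves each dimension for a 2x2 pool. A minimal sketch of that non-overlapping case, using a reshape trick instead of loops (`pool_non_overlapping` is a hypothetical helper, not part of the notebook or the library):

```python
import numpy as np

def pool_non_overlapping(x, pool_size, reduce_fn):
    """Pool with stride equal to pool_size (hypothetical helper)."""
    p_h, p_w = pool_size
    h, w = x.shape
    # Trim rows/columns that do not fill a complete window.
    x = x[: h - h % p_h, : w - w % p_w]
    # Split each axis into (number of blocks, block size).
    blocks = x.reshape(x.shape[0] // p_h, p_h, x.shape[1] // p_w, p_w)
    # Reduce over the two block-size axes.
    return reduce_fn(blocks, axis=(1, 3))

x = np.arange(1, 17).reshape(4, 4)
max_pooled = pool_non_overlapping(x, (2, 2), np.max)
avg_pooled = pool_non_overlapping(x, (2, 2), np.mean)

print(max_pooled)   # 2x2 result: [[6, 8], [14, 16]]
print(avg_pooled)   # 2x2 result: [[3.5, 5.5], [11.5, 13.5]]
```

The `max_pooling` and `average_pooling` functions above should give the same 2x2 results when called with `stride=2`.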


## Build a Simple CNN Model

Integrate CNN layers into the NN package and build a new MNIST model. Include a flatten layer to convert the CNN's 2D output to 1D. Use the sample from the answer-key version as inspiration for the model.
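
The flatten step mentioned above amounts to a reshape that keeps the batch axis and collapses the rest. The sketch below shows the idea in plain NumPy; `flatten` is a hypothetical function and does not reflect how `rb_deeplearning_lib` implements or names its layer.

```python
import numpy as np

def flatten(batch):
    """Collapse every axis except the first (batch) axis into one."""
    return batch.reshape(batch.shape[0], -1)

# 10 feature maps of size 2x2, e.g. the output of a conv + pooling stack.
feature_maps = np.arange(10 * 2 * 2).reshape(10, 2, 2)
flat = flatten(feature_maps)

print(flat.shape)  # (10, 4)
print(flat[0])     # [0 1 2 3]
```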

In [7]:
!pip install rb-deeplearning-lib==0.2.9
import rb_deeplearning_lib

Collecting rb-deeplearning-lib==0.2.9
  Downloading rb_deeplearning_lib-0.2.9-py3-none-any.whl.metadata (8.8 kB)
Downloading rb_deeplearning_lib-0.2.9-py3-none-any.whl (13 kB)
Installing collected packages: rb-deeplearning-lib
  Attempting uninstall: rb-deeplearning-lib
    Found existing installation: rb-deeplearning-lib 0.2.8
    Uninstalling rb-deeplearning-lib-0.2.8:
      Successfully uninstalled rb-deeplearning-lib-0.2.8
Successfully installed rb-deeplearning-lib-0.2.9


In [8]:
from rb_deeplearning_lib import Model, mse_loss

In [10]:
from rb_deeplearning_lib import Convo2D

In [11]:
kernel

array([[1, 0, 1],
       [0, 1, 0],
       [1, 0, 1]])

In [12]:
input_data

array([[1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 1, 1, 1],
       [0, 0, 1, 1, 0],
       [0, 1, 1, 0, 0]])

In [13]:
conTest = Convo2D(kernel, padding='same', stride=2) # 'same' padding with stride 2, to compare against output_same_stride2
conTest.params()[0].vals

array([1, 0, 1])

In [14]:
conTest(input_data) == output_same_stride2

array([[[ True,  True,  True],
        [ True,  True,  True],
        [ True,  True,  True]]])

In [15]:
pooling_input_data

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [16]:
pool_size

(2, 2)

In [17]:
poolTest = MaxPooling(pool_size) #using default stride

In [18]:
poolTest(pooling_input_data) == max_pooled_output

array([[[ True,  True,  True],
        [ True,  True,  True],
        [ True,  True,  True]]])

In [19]:
poolTest2 = AvgPooling(pool_size) #using default stride

In [20]:
poolTest2(pooling_input_data) == average_pooled_output

array([[[ True,  True,  True],
        [ True,  True,  True],
        [ True,  True,  True]]])

In [21]:
kernel.shape

(3, 3)

In [22]:
m = Model([Convo2D(kernel),MaxPooling(pool_size)],loss_fn=mse_loss)


In [23]:
x_train = np.random.rand(10, 5, 5) # 10 samples, each 5x5 image
y_train = np.random.randint(0, 2, (10, 2, 2)) # 10 random 2x2 binary targets, matching the model's output shape

x_val = np.random.rand(2, 5, 5) # 2 validation samples, each 5x5 image
y_val = np.random.randint(0, 2, (2, 2, 2)) # 2 random 2x2 validation targets

In [24]:
y_train.shape

(10, 2, 2)

In [25]:
y_val.shape

(2, 2, 2)

In [26]:
m(x_train).shape

(10, 2, 2)

In [27]:
m(x_val).shape

(2, 2, 2)

In [28]:
m.train(20, x_train, y_train, x_val, y_val, batch_size=1)

# The loss does not decrease steadily. This may not be a backprop issue but an update issue:
# the regular optimizer works on other NN models but not here, so the problem likely lies in params.

epoch: 0 	 loss: 34.68726232446199
ep0: b0/10ep0: b1/10ep0: b2/10ep0: b3/10ep0: b4/10ep0: b5/10ep0: b6/10ep0: b7/10ep0: b8/10ep0: b9/10epoch: 1 	 loss: 0.27791138066704774
epoch: 2 	 loss: 1.1705174391047195
epoch: 3 	 loss: 0.5984476946135048
epoch: 4 	 loss: 0.7314372662331357
epoch: 5 	 loss: 0.3676912589457245
epoch: 6 	 loss: 0.3478450964378932
epoch: 7 	 loss: 0.5546801397186385
epoch: 8 	 loss: 1.012329095422709
epoch: 9 	 loss: 0.5979963242352069
epoch: 10 	 loss: 1.128466386792609
epoch: 11 	 loss: 0.6179974795546153
epoch: 12 	 loss: 0.7052842574186196
epoch: 13 	 loss: 0.7002898687592528
epoch: 14 	 loss: 0.8524257311404697
epoch: 15 	 loss: 0.4696200677217798
epoch: 16 	 loss: 0.7299302468499798
epoch: 17 	 loss: 0.631450615565303
epoch: 18 	 loss: 0.831587707647354
epoch: 19 	 loss: 0.37690052391472284
epoch: 20 	 loss: 0.7429505358522794


In [29]:
import numpy as np
from rb_deeplearning_lib import Model, mse_loss

# Re-initialize the model (Convo2D and MaxPooling are already defined in previous cells)
m = Model([Convo2D(kernel), MaxPooling(pool_size)], loss_fn=mse_loss)

# Re-initialize training data for a fresh start
x_train = np.random.rand(10, 5, 5) # 10 samples, each 5x5 image
y_train = np.random.randint(0, 2, 10) # 10 labels (binary classification example)

print("Model and data re-initialized.")

# 1. Perform a single forward pass with a small batch
# Get a single input and its corresponding target
input_batch = Values(x_train[0:1]) # Wrap input in Values object
target_batch = Values(np.array([y_train[0:1]]).astype(float)) # Wrap target in Values object, ensure float type

# Perform forward pass
output = m(input_batch)

# Calculate loss using the model's loss_fn
loss = m.loss_fn(output, target_batch)

print(f"\nInitial loss after one forward pass: {loss.vals}")

# 2. Call loss.backward() to compute gradients
loss.backward()
print("Backward pass completed.")

# 3. Inspect the gradient of the Convo2D layer's kernel
conv_kernel_grad = m.blocks.arr[0].kernel.grad

print("\n--- Convo2D Kernel Gradient Inspection ---")
print(f"Shape of Convo2D kernel gradient: {conv_kernel_grad.shape}")
print(f"Max value of Convo2D kernel gradient: {np.max(conv_kernel_grad)}")
print(f"Min value of Convo2D kernel gradient: {np.min(conv_kernel_grad)}")
print(f"Mean value of Convo2D kernel gradient: {np.mean(conv_kernel_grad)}")

print(f"Convo2D Params: {m.blocks.arr[0].params()}")
print(f"Seq Params: {m.blocks.params()}")

# Inspect gradients of other layers if any
# In this simple model, only the Convo2D layer has learnable parameters (the kernel).
# The MaxPooling layer does not have learnable parameters, so no grad to inspect directly.


Model and data re-initialized.

Initial loss after one forward pass: 3.1213152097116525
Backward pass completed.

--- Convo2D Kernel Gradient Inspection ---
Shape of Convo2D kernel gradient: (1, 3, 3)
Max value of Convo2D kernel gradient: 27.27228882384024
Min value of Convo2D kernel gradient: 0.7873614957679098
Mean value of Convo2D kernel gradient: 13.946390005474433
Convo2D Params: vals: array([[1, 0, 1],
       [0, 1, 0],
       [1, 0, 1]])
grads: array([[[27.27228882,  0.7873615 , 23.06911026],
        [ 6.74720045,  7.84812259, 24.73351547],
        [13.80482566,  9.41239693, 11.84268837]]])
Seq Params: [vals: array([[1, 0, 1],
       [0, 1, 0],
       [1, 0, 1]])
grads: array([[[27.27228882,  0.7873615 , 23.06911026],
        [ 6.74720045,  7.84812259, 24.73351547],
        [13.80482566,  9.41239693, 11.84268837]]])]
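
The inspection above prints gradient values but gives no reference to compare them against. One common way to get a reference is a finite-difference check. The sketch below is plain NumPy and does not use `rb_deeplearning_lib`. All helper names are hypothetical, it covers only a 'valid' stride-1 convolution followed by an MSE loss, and the max pooling layer is left out for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_valid(x, k):
    """'valid' stride-1 cross-correlation of x with k."""
    return np.einsum("ijkl,kl->ij", sliding_window_view(x, k.shape), k)

def mse(out, target):
    return np.mean((out - target) ** 2)

def analytic_kernel_grad(x, k, target):
    out = conv_valid(x, k)
    d_out = 2.0 * (out - target) / out.size   # dL/d(out) for the MSE loss
    # dL/dk[a, b] = sum over (i, j) of d_out[i, j] * x[i + a, j + b],
    # which is itself a 'valid' cross-correlation of x with d_out.
    return conv_valid(x, d_out)

def numeric_kernel_grad(x, k, target, eps=1e-6):
    grad = np.zeros_like(k)
    for idx in np.ndindex(*k.shape):
        k_plus, k_minus = k.copy(), k.copy()
        k_plus[idx] += eps
        k_minus[idx] -= eps
        # Central difference on one kernel entry at a time.
        loss_plus = mse(conv_valid(x, k_plus), target)
        loss_minus = mse(conv_valid(x, k_minus), target)
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.random((5, 5))        # one 5x5 input
k = rng.random((3, 3))        # float kernel, so small perturbations are representable
target = rng.random((3, 3))   # same shape as the conv output

analytic = analytic_kernel_grad(x, k, target)
numeric = numeric_kernel_grad(x, k, target)
max_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_diff)
```

If the two gradients agree to within a small tolerance, the backward pass for the kernel is likely fine, which would point the search toward the parameter-update step instead.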
