<div style="  background: linear-gradient(145deg, #0f172a, #1e293b);  border: 4px solid transparent;  border-radius: 14px;  padding: 18px 22px;  margin: 12px 0;  font-size: 26px;  font-weight: 600;  color: #f8fafc;  box-shadow: 0 6px 14px rgba(0,0,0,0.25);  background-clip: padding-box;  position: relative;">  <div style="    position: absolute;    inset: 0;    padding: 4px;    border-radius: 14px;    background: linear-gradient(90deg, #06b6d4, #3b82f6, #8b5cf6);    -webkit-mask:       linear-gradient(#fff 0 0) content-box,       linear-gradient(#fff 0 0);    -webkit-mask-composite: xor;    mask-composite: exclude;    pointer-events: none;  "></div>    <b>GOING DEEPER: IMAGE MODELING WITH KERAS</b>    <br/>  <span style="color:#9ca3af; font-size: 18px; font-weight: 400;">(Deep Learning Architectures, Parameter Counting, and Pooling)</span></div>

## Table of Contents
1. [Building Deep Networks](#section-1)
2. [Why Do We Want Deep Networks?](#section-2)
3. [How Many Parameters?](#section-3)
4. [Reducing Parameters with Pooling](#section-4)
5. [Conclusion](#section-5)

***

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 1. BUILDING DEEP NETWORKS</span><br>

### Introduction to Deep Architectures
In image modeling, we often start with a simple Convolutional Neural Network (CNN). A basic network might consist of a single convolutional layer followed by a flattening step and a dense output layer. However, to capture more complex patterns, we need to "go deeper" by stacking multiple convolutional layers.

### Single Convolutional Layer
The fundamental building block is a `Conv2D` layer, which extracts features using filters (kernels). It is followed by `Flatten`, which unrolls the resulting stack of feature maps (height × width × filters) into a 1D vector, and finally by a `Dense` layer for classification.

#### Original Code (From PDF)


In [None]:
model = Sequential()
model.add(Conv2D(10, kernel_size=2, activation='relu',
                 input_shape=(img_rows, img_cols, 1)))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))



### Building a Deeper Network
To build a deeper network, we add more `Conv2D` layers before flattening. This allows the network to process the output of the first convolution, effectively creating a hierarchy of features.

<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip:</b> In the PDF, the code snippet uses <code>padding='equal'</code>. This is likely a typo or pseudocode. For <code>Conv2D</code>, Keras accepts <code>'valid'</code> (no padding) or <code>'same'</code> (zero-padding so that, with stride 1, the output has the same height and width as the input). We will use <code>'same'</code> in the executable code below. </div>
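
The effect of the two padding modes on feature-map size can be worked out by hand. The following is a minimal sketch in plain Python, assuming square inputs and a stride of 1. `conv_output_size` is a hypothetical helper written for this illustration and is not part of Keras. It traces the two-layer network of this section, where the first layer uses `'same'` and the second falls back to the default `'valid'`.

```python
def conv_output_size(size, kernel_size, padding="valid"):
    """Spatial size after a stride-1 convolution (hypothetical helper, not a Keras function)."""
    if padding == "same":
        # Zero-padding keeps height and width unchanged at stride 1.
        return size
    if padding == "valid":
        # No padding: the kernel only visits positions where it fits entirely.
        return size - kernel_size + 1
    raise ValueError(f"unknown padding mode: {padding!r}")


size = 28
after_conv1 = conv_output_size(size, kernel_size=2, padding="same")
after_conv2 = conv_output_size(after_conv1, kernel_size=2, padding="valid")
flattened_units = after_conv2 * after_conv2 * 10  # 10 filters in the second layer

print(after_conv1, after_conv2, flattened_units)  # 28 27 7290
```

Under these assumptions, the `Flatten` layer of the deep network hands 7,290 values to the final `Dense` layer.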

#### Enhanced Executable Code
Below is a complete, runnable example of building both a shallow and a deep CNN using Keras.



In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten

# Define dummy image dimensions for the example
img_rows, img_cols = 28, 28

# --- 1. Shallow Network ---
print("Building Shallow Network...")
model_shallow = Sequential()
model_shallow.add(Conv2D(10, kernel_size=2, activation='relu', 
                         input_shape=(img_rows, img_cols, 1)))
model_shallow.add(Flatten())
model_shallow.add(Dense(3, activation='softmax'))
model_shallow.summary()

# --- 2. Deep Network ---
print("\nBuilding Deep Network...")
model_deep = Sequential()

# First convolutional layer
model_deep.add(Conv2D(10, kernel_size=2, activation='relu', 
                      input_shape=(img_rows, img_cols, 1), 
                      padding='same')) # Corrected from 'equal' to 'same'

# Second convolutional layer
model_deep.add(Conv2D(10, kernel_size=2, activation='relu'))

model_deep.add(Flatten())
model_deep.add(Dense(3, activation='softmax'))

model_deep.summary()



***

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 2. WHY DO WE WANT DEEP NETWORKS?</span><br>

### Feature Hierarchy
Deep networks are powerful because they learn representations of data in a hierarchical fashion. As data flows through the layers, the network extracts increasingly complex features.

| Layer Depth | Feature Complexity | Visual Examples |
| :--- | :--- | :--- |
| **Early Layers** | Low-level features | Edges, lines, simple textures, diagonal gradients. |
| **Intermediate Layers** | Mid-level features | Patterns, curves, simple shapes (circles, squares), eyes, corners. |
| **Late Layers** | High-level features | Complex objects, faces, buildings, entire distinct entities. |

### Trade-offs
While depth improves the model's ability to understand complex data, it comes with costs:
1.  **Computational Cost:** More layers mean more operations (multiplications and additions), requiring more powerful hardware (GPUs) and longer training times.
2.  **Data Requirements:** Deeper networks generally have more parameters, which increases the risk of overfitting unless significantly more training data is provided.
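
To make the computational cost concrete, the sketch below gives a rough estimate of the multiply-accumulate operations (MACs) in the convolutional layers of the shallow and deep networks from Section 1. It is a back-of-the-envelope count under stated assumptions: stride 1, a 28×28×1 input, and only the convolutions themselves (biases, activations, and the `Dense` layer are ignored). `conv_macs` is a hypothetical helper, not a Keras function.

```python
def conv_macs(out_h, out_w, kernel_size, c_in, c_out):
    """Multiply-accumulates for one stride-1 Conv2D layer (hypothetical helper)."""
    # Each output value needs kernel_size * kernel_size * c_in multiplications,
    # and there are out_h * out_w * c_out output values.
    return out_h * out_w * kernel_size * kernel_size * c_in * c_out


# Shallow network: one 2x2 'valid' convolution, 28x28 -> 27x27, 10 filters.
shallow_macs = conv_macs(27, 27, 2, 1, 10)

# Deep network: a 2x2 'same' convolution (28x28 output, 1 -> 10 channels)
# followed by a 2x2 'valid' convolution (27x27 output, 10 -> 10 channels).
deep_macs = conv_macs(28, 28, 2, 1, 10) + conv_macs(27, 27, 2, 10, 10)

print(shallow_macs, deep_macs)  # 29160 322960
```

In this estimate, most of the extra work comes from the second layer, because it reads 10 input channels instead of 1.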

***

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 3. HOW MANY PARAMETERS?</span><br>

Understanding the number of parameters is crucial for managing model size and computational efficiency. The calculation differs between Dense (Fully Connected) layers and Convolutional layers.

### 3.1 Counting Parameters in Dense Layers

For a Dense layer, every input unit is connected to every output unit.

**Formula:**
$$ Parameters = (N_{in} \times N_{out}) + N_{bias} $$
*   $N_{in}$: Number of input units.
*   $N_{out}$: Number of units in the current layer.
*   $N_{bias}$: One bias term per output unit ($N_{out}$).

#### Example Calculation (From PDF)
Consider a model with:
1.  Input: 784 units
2.  Dense Layer 1: 10 units
3.  Dense Layer 2: 10 units
4.  Output Layer: 3 units

*   **Layer 1:** $784 \times 10 + 10 = 7850$
*   **Layer 2:** $10 \times 10 + 10 = 110$
*   **Output:** $10 \times 3 + 3 = 33$
*   **Total:** $7850 + 110 + 33 = 7993$
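
The hand calculation above can be reproduced in a few lines of plain Python. This is a minimal sketch, and `dense_params` is a hypothetical helper name that simply encodes the formula $(N_{in} \times N_{out}) + N_{out}$.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit (hypothetical helper)."""
    return n_in * n_out + n_out


# Input width followed by the width of each Dense layer.
layer_sizes = [784, 10, 10, 3]

# Pair each layer's input width with its output width.
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [7850, 110, 33] 7993
```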

#### Executable Code: Dense Parameter Count


In [None]:
model_dense = Sequential()
model_dense.add(Dense(10, activation='relu', input_shape=(784,)))
model_dense.add(Dense(10, activation='relu'))
model_dense.add(Dense(3, activation='softmax'))

# Verify the calculation
model_dense.summary()



### 3.2 Counting Parameters in CNNs

Convolutional layers share weights across the image, making them much more parameter-efficient than Dense layers.

**Formula:**
$$ Parameters = (K_h \times K_w \times C_{in} \times C_{out}) + C_{bias} $$
*   $K_h, K_w$: Kernel height and width.
*   $C_{in}$: Number of input channels (depth).
*   $C_{out}$: Number of output filters (depth).
*   $C_{bias}$: One bias term per output filter ($C_{out}$).

#### Example Calculation (From PDF)
Consider a CNN with:
1.  Input: $28 \times 28 \times 1$
2.  Conv2D Layer 1: 10 filters, kernel size 3.
3.  Conv2D Layer 2: 10 filters, kernel size 3.
4.  Flatten
5.  Dense Output: 3 units.

*   **Conv2D_1:** $3 \times 3 \times 1 \text{ (input)} \times 10 \text{ (filters)} + 10 = 100$
*   **Conv2D_2:** $3 \times 3 \times 10 \text{ (input from prev)} \times 10 \text{ (filters)} + 10 = 910$
*   **Flatten:** 0 parameters (just reshaping).
*   **Dense:** Input size is $28 \times 28 \times 10 = 7840$ (assuming padding='same').
    $7840 \times 3 + 3 = 23,523$
*   **Total:** $100 + 910 + 0 + 23,523 = 24,533$
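
The same arithmetic can be sketched in plain Python. The helper names `conv2d_params` and `dense_params` are hypothetical. The code assumes square kernels and, as in the calculation above, `padding='same'`, so the feature maps stay 28×28.

```python
def conv2d_params(kernel_size, c_in, c_out):
    """Kernel weights plus one bias per filter (hypothetical helper)."""
    return kernel_size * kernel_size * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit (hypothetical helper)."""
    return n_in * n_out + n_out


conv1 = conv2d_params(kernel_size=3, c_in=1, c_out=10)
conv2 = conv2d_params(kernel_size=3, c_in=10, c_out=10)
flattened = 28 * 28 * 10            # Flatten adds no parameters
dense = dense_params(flattened, 3)
total = conv1 + conv2 + dense

print(conv1, conv2, dense, total)  # 100 910 23523 24533
```

Note that the image size (28×28) never enters `conv2d_params`. It only affects the `Dense` layer through the flattened vector.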

#### Executable Code: CNN Parameter Count


In [None]:
model_cnn = Sequential()
# Layer 1
model_cnn.add(Conv2D(10, kernel_size=3, activation='relu', 
                     input_shape=(28, 28, 1), padding='same'))
# Layer 2
model_cnn.add(Conv2D(10, kernel_size=3, activation='relu', 
                     padding='same'))
# Flatten
model_cnn.add(Flatten())
# Output
model_cnn.add(Dense(3, activation='softmax'))

# Verify the calculation
model_cnn.summary()



### 3.3 Increasing Units/Filters
Increasing the number of units or filters drastically increases the parameter count, especially in the Dense layers following a Flatten operation.

**Example Comparison:**
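If we change the filter counts in the CNN example above to 5 (layer 1) and 15 (layer 2), keeping `padding='same'`:
*   Flatten output becomes: $28 \times 28 \times 15 = 11,760$.
*   Dense layer params: $11,760 \times 3 + 3 = 35,283$.
*   Total params rise from $24,533$ to $50 + 690 + 35,283 = 36,023$, even though the two convolutional layers together now hold fewer parameters than before ($740$ versus $1,010$).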
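
A small sketch makes the comparison explicit by computing the total for both filter configurations and the share contributed by the final `Dense` layer. The function `cnn_param_breakdown` is a hypothetical helper. It assumes the architecture of Section 3.2: two 3×3 convolutions with `padding='same'` on a 28×28×1 input, followed by a 3-unit output layer.

```python
def cnn_param_breakdown(filters1, filters2, size=28, kernel_size=3, n_classes=3):
    """Return (conv parameters, dense parameters) for the two-conv CNN (hypothetical helper)."""
    k2 = kernel_size * kernel_size
    conv1 = k2 * 1 * filters1 + filters1
    conv2 = k2 * filters1 * filters2 + filters2
    # With padding='same' the maps stay size x size, so Flatten yields size*size*filters2 values.
    dense = size * size * filters2 * n_classes + n_classes
    return conv1 + conv2, dense


for filters in [(10, 10), (5, 15)]:
    conv, dense = cnn_param_breakdown(*filters)
    total = conv + dense
    print(filters, "conv:", conv, "dense:", dense, "total:", total,
          f"dense share: {dense / total:.1%}")
```

In both configurations the `Dense` layer accounts for well over 90% of the parameters, so the number of filters in the last convolutional layer drives the total.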



In [None]:
# High parameter model example
model_large = Sequential()
model_large.add(Conv2D(5, kernel_size=3, activation='relu', 
                       input_shape=(28, 28, 1), padding='same'))
model_large.add(Conv2D(15, kernel_size=3, activation='relu', 
                       padding='same'))
model_large.add(Flatten())
model_large.add(Dense(3, activation='softmax'))

model_large.summary()



***

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 4. REDUCING PARAMETERS WITH POOLING</span><br>

### The Problem
As seen in the previous section, the `Flatten` layer can result in a massive vector if the spatial dimensions ($Height \times Width$) of the feature maps are large. This leads to an explosion of parameters in the subsequent Dense layer.

### The Solution: Max Pooling
Pooling layers reduce the spatial dimensions of the input volume. **Max Pooling** is the most common type. It operates by sliding a window over the input and taking the maximum value within that window.

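*   **Effect:** Reduces the feature-map size (e.g., a 2×2 window with stride 2 halves both height and width).
*   **Parameters:** 0 (Pooling has no learnable weights).
*   **Benefit:** Summarizes features and makes the output less sensitive to small shifts in the input (a limited form of translation invariance).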

### Manual Implementation (NumPy)
To understand how Max Pooling works, we can implement it manually using NumPy.



In [None]:
import numpy as np

# Create a dummy image (4x6)
im = np.array([
    [1, 3, 2, 4, 1, 0],
    [5, 2, 1, 0, 3, 2],
    [4, 1, 0, 5, 2, 1],
    [2, 6, 3, 1, 4, 0]
])

print("Original Image Shape:", im.shape)
print(im)

# Initialize result array (half the size)
result = np.zeros((im.shape[0]//2, im.shape[1]//2))

# Manual Max Pooling (Window size 2x2, Stride 2)
for ii in range(result.shape[0]):
    for jj in range(result.shape[1]):
        # Extract 2x2 window
        window = im[ii*2 : ii*2+2, jj*2 : jj*2+2]
        # Take max
        result[ii, jj] = np.max(window)

print("\nPooled Image Shape:", result.shape)
print(result)
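
The explicit loops can also be replaced by a reshape. The following is one possible vectorized sketch, not the approach used in the PDF. `max_pool_2x2` is a hypothetical helper that handles only non-overlapping 2×2 windows on a 2D array and drops any leftover odd row or column.

```python
import numpy as np

im = np.array([
    [1, 3, 2, 4, 1, 0],
    [5, 2, 1, 0, 3, 2],
    [4, 1, 0, 5, 2, 1],
    [2, 6, 3, 1, 4, 0]
])


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2D array (hypothetical helper)."""
    out_h, out_w = image.shape[0] // 2, image.shape[1] // 2
    # Drop a trailing odd row/column so the array splits evenly into 2x2 blocks.
    trimmed = image[:out_h * 2, :out_w * 2]
    # Axes become (block row, row within block, block column, column within block).
    blocks = trimmed.reshape(out_h, 2, out_w, 2)
    # Take the maximum over the two within-block axes.
    return blocks.max(axis=(1, 3))


pooled = max_pool_2x2(im)
print(pooled)
# [[5 4 3]
#  [6 5 4]]
```

The values match the loop-based result above. The only difference is the dtype, since the loop version stores its output in a float array created by `np.zeros`.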



### Keras Implementation
In Keras, we use the `MaxPool2D` layer.

<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip:</b> Adding pooling layers significantly reduces the number of parameters in the final Dense layer because the input to <code>Flatten</code> is much smaller. </div>

#### Executable Code: CNN with Max Pooling


In [None]:
from tensorflow.keras.layers import MaxPool2D

model_pool = Sequential()

# Layer 1: Conv + Pool
model_pool.add(Conv2D(5, kernel_size=3, activation='relu', 
                      input_shape=(28, 28, 1)))
model_pool.add(MaxPool2D(2))  # 26x26 -> 13x13 (the 'valid' conv above already trimmed 28x28 to 26x26)

# Layer 2: Conv + Pool
model_pool.add(Conv2D(15, kernel_size=3, activation='relu'))
model_pool.add(MaxPool2D(2))  # 11x11 -> 5x5 (the leftover odd row and column are dropped)

model_pool.add(Flatten())
model_pool.add(Dense(3, activation='softmax'))

# Observe the reduction in parameters compared to the previous section
model_pool.summary()



**Analysis of Output:**
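1.  Notice the `Output Shape` decreases after every `MaxPooling2D` layer (the class name that `MaxPool2D` is an alias for).
2.  Notice the `Param #` for pooling layers is 0.
3.  Notice the final `Dense` layer has significantly fewer parameters than in the model without pooling: its input is $5 \times 5 \times 15 = 375$ values, giving $375 \times 3 + 3 = 1,128$ parameters instead of $35,283$.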
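
The shapes and parameter counts reported by `summary()` can be cross-checked by hand. The sketch below walks through the layers of `model_pool` in plain Python. It assumes the Keras defaults used in the code above: `'valid'` padding and stride 1 for the convolutions, and non-overlapping windows for pooling. The layer list is a simplified stand-in for the model, not a Keras object.

```python
KERNEL = 3  # kernel size used by both Conv2D layers above

# (layer kind, number of filters or pool size), in model order.
layers = [("conv", 5), ("pool", 2), ("conv", 15), ("pool", 2)]

size, channels, total_params = 28, 1, 0
for kind, value in layers:
    if kind == "conv":
        params = KERNEL * KERNEL * channels * value + value
        size, channels = size - KERNEL + 1, value  # 'valid' padding, stride 1
    else:
        params = 0
        size = size // value  # leftover rows/columns are dropped
    total_params += params
    print(f"{kind:4s} -> {size}x{size}x{channels}, params={params}")

flattened = size * size * channels
dense = flattened * 3 + 3
total_params += dense

print(flattened, dense, total_params)  # 375 1128 1868
```

Under these assumptions the whole model has 1,868 parameters, compared with 36,023 for the same filter counts without pooling (and with `padding='same'`) in Section 3.3.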

***

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 5. CONCLUSION</span><br>

In this notebook, we explored the mechanics of "going deeper" with Keras image models.

**Key Takeaways:**
1.  **Depth Matters:** Deeper networks allow the model to learn a hierarchy of features, from simple edges to complex objects.
2.  **Parameter Counting:**
    *   **Dense Layers:** Parameters grow multiplicatively with input and output size ($N_{in} \times N_{out}$).
    *   **Conv Layers:** Parameters depend on the kernel size and the number of input channels and filters, not on the image size ($K \times K \times C_{in} \times C_{out}$), making them efficient.
3.  **Pooling is Essential:** Max Pooling reduces the spatial dimensions of feature maps. This controls the explosion of parameters in the final dense layers, reduces computational cost, and helps prevent overfitting.

**Next Steps:**
*   Experiment with different kernel sizes (e.g., 5x5 vs 3x3).
*   Try adding `Dropout` layers to further prevent overfitting in deep networks.
*   Apply these architectures to real-world datasets like MNIST or CIFAR-10.
