# Exploring Convolutional Layers: Fashion-MNIST Experiments

## 1. Context and Objective

In this notebook, we will explore the impact of Convolutional Neural Networks (CNNs) on image classification tasks. We will move beyond treating neural networks as black boxes and analyze how architectural choices affect performance.

**Dataset Selected**: Fashion-MNIST
- **Description**: A training set of 60,000 28x28 grayscale images spanning 10 fashion categories, plus a test set of 10,000 images.
- **Justification**: It serves as a more challenging direct replacement for the original MNIST dataset. While MNIST digits are easy to classify with simple dense networks, Fashion-MNIST requires capturing more complex spatial patterns (textures, shapes of clothing), making it ideal for demonstrating the advantages of inductive bias in CNNs.

**Tasks covered in this notebook**:
1. Dataset Exploration (EDA)
2. Baseline Model (Dense Only)
3. CNN Architecture Design
4. Controlled Experiments (Kernel Size Analysis)
5. Interpretation
6. SageMaker Deployment Code

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models, datasets

# Set random seed for reproducibility
tf.random.set_seed(42)
np.random.seed(42)

print(f"TensorFlow Version: {tf.__version__}")

## 2. Dataset Exploration (EDA)
We start by loading the dataset and understanding its structure.

In [None]:
# Load Fashion-MNIST dataset
(x_train, y_train), (x_test, y_test) = datasets.fashion_mnist.load_data()

# Class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")
print(f"Number of classes: {len(np.unique(y_train))}")

### Visualization
Let's visualize one sample from each class (the first occurrence in the training set) to understand the complexity of the task.

In [None]:
plt.figure(figsize=(10, 5))
for i in range(10):
    # Find the first image of each class
    idx = np.where(y_train == i)[0][0]
    plt.subplot(2, 5, i+1)
    plt.imshow(x_train[idx], cmap='gray')
    plt.title(class_names[i])
    plt.axis('off')
plt.tight_layout()
plt.show()

### Class Distribution
Checking if the dataset is balanced.
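
A bar chart shows balance visually; a numeric check makes it explicit. The helper below is a minimal sketch: `class_balance` is a hypothetical name, and the labels are synthetic stand-ins for `y_train` so the snippet runs without downloading the dataset.

```python
import numpy as np

def class_balance(labels, num_classes=10):
    """Return per-class counts and the ratio of largest to smallest class."""
    counts = np.bincount(labels, minlength=num_classes)
    # Assumes every class appears at least once (otherwise min() is 0)
    return counts, counts.max() / counts.min()

# Synthetic labels standing in for y_train: 6 samples for each of 10 classes
demo_labels = np.repeat(np.arange(10), 6)
counts, imbalance_ratio = class_balance(demo_labels)
print(counts)
print(imbalance_ratio)  # 1.0 means perfectly balanced
```

In the notebook, the same call would be `class_balance(y_train)`; a ratio close to 1.0 indicates that plain accuracy is a reasonable metric.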

In [None]:
unique, counts = np.unique(y_train, return_counts=True)
plt.bar(class_names, counts)
plt.xticks(rotation=45)
plt.title("Class Distribution in Training Set")
plt.show()

## 3. Preprocessing
Neural networks converge faster on normalized data. We also need to reshape the images to include the channel dimension (H, W, C), which is (28, 28, 1) for grayscale.

In [None]:
# Normalize pixel values to be between 0 and 1
x_train_norm = x_train.astype('float32') / 255.0
x_test_norm = x_test.astype('float32') / 255.0

# Reshape for CNN (batch, height, width, channels)
x_train_cnn = x_train_norm.reshape((-1, 28, 28, 1))
x_test_cnn = x_test_norm.reshape((-1, 28, 28, 1))

print(f"New shape for CNN: {x_train_cnn.shape}")

## 4. Baseline Model (Non-Convolutional)
We implement a simple Multi-Layer Perceptron (MLP) containing only Dense (Fully Connected) layers. This ignores the 2D spatial structure of the image.
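
The size of this baseline can be estimated by hand before building it. The sketch below assumes the layer widths used in this section (784 flattened inputs, then 128, 64 and 10 units); `dense_params` is a helper introduced here for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# Layer widths: flattened 28x28x1 input, two hidden layers, 10 classes
widths = [28 * 28 * 1, 128, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total_params = sum(per_layer)

print(per_layer)     # [100480, 8256, 650]
print(total_params)  # 109386
```

Almost all of the parameters sit in the first layer, because every one of the 784 pixels connects to every hidden unit. The total should match what `model.summary()` reports.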

In [None]:
def build_baseline_model():
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(128, activation='relu'),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

baseline_model = build_baseline_model()
baseline_model.summary()

In [None]:
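# Train Baseline
# Note: the test set doubles as validation data here, so the curves are not an independent held-out estimate
history_baseline = baseline_model.fit(x_train_cnn, y_train, epochs=10, validation_data=(x_test_cnn, y_test), verbose=1)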

## 5. CNN Architecture Design
We design a custom CNN.

**Architecture Decisions:**
- **Conv2D Layers**: To extract spatial features.
- **ReLU Activation**: To introduce non-linearity.
- **MaxPooling**: To provide a degree of invariance to small shifts and reduce dimensionality.
- **Structure**: Conv -> Pool -> Conv -> Pool -> Flatten -> Dense.
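
To see how this structure shrinks the feature maps, the spatial size can be traced by hand. This is a sketch that assumes the Keras defaults: unpadded (`'valid'`) convolutions with stride 1, and 2x2 pooling with stride 2. The function names are introduced here for illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling (stride equals pool size)."""
    return size // pool

def trace_sizes(size=28, kernel=3):
    """Spatial size after each layer of Conv -> Pool -> Conv -> Pool."""
    sizes = [size]
    for _ in range(2):  # two Conv -> Pool blocks
        size = conv_out(size, kernel)
        sizes.append(size)
        size = pool_out(size)
        sizes.append(size)
    return sizes

print(trace_sizes(kernel=3))  # [28, 26, 13, 11, 5]
print(trace_sizes(kernel=5))  # [28, 24, 12, 8, 4]
```

With 3x3 kernels the final feature map is 5x5 per channel, so `Flatten` receives 5 * 5 * 64 = 1600 values.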

In [None]:
def build_cnn_model(kernel_size=(3,3)):
    model = models.Sequential([
        # First Convolutional Block
        layers.Conv2D(32, kernel_size, activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        
        # Second Convolutional Block
        layers.Conv2D(64, kernel_size, activation='relu'),
        layers.MaxPooling2D((2, 2)),
        
        # Classification Head
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

## 6. Controlled Experiments: Kernel Size
We investigate the effect of Kernel Size on the model's performance.
- **Model A**: Kernel Size 3x3 (Standard, captures fine details)
- **Model B**: Kernel Size 5x5 (Larger receptive field, might miss fine details or over-smooth)

We keep everything else (filters, layers, pooling) constant.
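
Changing the kernel size also changes the parameter budget, which is worth knowing before comparing results. The sketch below assumes the architecture from the previous section (32 and 64 filters, unpadded convolutions, 2x2 pooling, a 64-unit dense layer); the helper names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one Conv2D layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of one Dense layer."""
    return n_in * n_out + n_out

def cnn_param_count(kernel, size=28):
    # Spatial size after two unpadded Conv -> 2x2 Pool blocks
    for _ in range(2):
        size = (size - kernel + 1) // 2
    flat = size * size * 64  # input width of the first Dense layer
    return {
        "conv1": conv_params(kernel, 1, 32),
        "conv2": conv_params(kernel, 32, 64),
        "dense1": dense_params(flat, 64),
        "dense2": dense_params(64, 10),
    }

for k in (3, 5):
    counts = cnn_param_count(k)
    print(k, counts, sum(counts.values()))
```

Under these assumptions the 5x5 model has more convolutional weights (51,264 vs 18,496 in the second block) but a smaller flattened feature map, so the totals end up close: 121,930 for 3x3 and 118,346 for 5x5. The two models therefore differ in where the capacity sits, not only in kernel size.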

In [None]:
# Experiment A: 3x3 Kernels
print("Training CNN with 3x3 Kernels...")
cnn_3x3 = build_cnn_model(kernel_size=(3,3))
cnn_3x3.summary()
history_3x3 = cnn_3x3.fit(x_train_cnn, y_train, epochs=10, validation_data=(x_test_cnn, y_test), verbose=1)

In [None]:
# Experiment B: 5x5 Kernels
print("Training CNN with 5x5 Kernels...")
cnn_5x5 = build_cnn_model(kernel_size=(5,5))
history_5x5 = cnn_5x5.fit(x_train_cnn, y_train, epochs=10, validation_data=(x_test_cnn, y_test), verbose=1)

### Results Comparison
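
Alongside the curves, a compact numeric summary makes the three runs easier to compare. The helper below is a sketch: `summarize_history` is a hypothetical name, and the demo values are placeholders for illustration only, not results from this notebook.

```python
def summarize_history(name, history):
    """Report the best validation accuracy, its 1-based epoch, and the final validation loss."""
    val_acc = history["val_accuracy"]
    best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "model": name,
        "best_val_accuracy": val_acc[best_epoch],
        "epoch": best_epoch + 1,
        "final_val_loss": history["val_loss"][-1],
    }

# Placeholder numbers for illustration only, NOT results from this notebook
demo_history = {"val_accuracy": [0.80, 0.85, 0.84], "val_loss": [0.60, 0.45, 0.47]}
summary = summarize_history("demo", demo_history)
print(summary)
```

In the notebook, the same helper could be applied to each run, for example `summarize_history("CNN (3x3)", history_3x3.history)`.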

In [None]:
plt.figure(figsize=(12, 4))

# Accuracy Plot
plt.subplot(1, 2, 1)
plt.plot(history_baseline.history['val_accuracy'], label='Baseline (Dense)')
plt.plot(history_3x3.history['val_accuracy'], label='CNN (3x3)')
plt.plot(history_5x5.history['val_accuracy'], label='CNN (5x5)')
plt.title('Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

# Loss Plot
plt.subplot(1, 2, 2)
plt.plot(history_baseline.history['val_loss'], label='Baseline (Dense)')
plt.plot(history_3x3.history['val_loss'], label='CNN (3x3)')
plt.plot(history_5x5.history['val_loss'], label='CNN (5x5)')
plt.title('Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

## 7. Interpretation and Architectural Reasoning

### Why did the CNN outperform the Baseline?
The CNN outperforms the dense baseline because of an **Inductive Bias** tailored for images. Dense layers treat each pixel as a separate input feature connected to every neuron in the next layer, with no notion of which pixels are neighbors. CNNs assume that:
1.  **Local Connectivity**: Pixels close to each other form meaningful features (edges, textures).
2.  **Translation Invariance**: A feature (like a button or a corner) is detected the same way regardless of where it appears in the image, thanks to weight sharing in convolution filters. Strictly speaking, convolution is translation-equivariant (the response moves with the feature), and pooling adds a degree of invariance on top.
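
The weight-sharing property can be checked on a toy example. The sketch below uses a hand-written unpadded cross-correlation in NumPy (not the Keras layer itself) with an assumed Sobel-like filter and an 8x8 toy image; it shows that shifting the input shifts the filter response by the same offset.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Plain unpadded cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 8x8 image with a small bright patch away from the borders
image = np.zeros((8, 8))
image[2:4, 2:4] = 1.0
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])  # Sobel-like vertical edge detector

shift = (2, 1)  # move the patch 2 rows down and 1 column right
shifted_image = np.roll(image, shift, axis=(0, 1))

response = correlate2d_valid(image, kernel)
shifted_response = correlate2d_valid(shifted_image, kernel)

# The same filter gives the same response, moved by the same offset
print(np.array_equal(shifted_response, np.roll(response, shift, axis=(0, 1))))  # True
```

A dense layer has no such guarantee: it would need to learn the same detector separately for every position.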

### Effect of Kernel Size
Comparing 3x3 vs 5x5 kernels helps us understand receptive fields.
- **3x3** kernels are generally preferred in modern architectures (like VGG) because stacking them increases non-linearity while keeping parameter count low: two stacked 3x3 layers cover the same 5x5 receptive field as a single 5x5 layer with fewer weights.
- **5x5** kernels capture a larger area in a single layer, which can be efficient for simple patterns (as in LeNet-5). In Fashion-MNIST, where objects are centered and structures are relatively simple, both tend to perform well; 3x3 kernels use fewer weights per convolutional layer and often generalize slightly better.
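
The stacking argument can be made concrete with a little arithmetic. The sketch below assumes stride-1 convolutions and an illustrative channel width of 64 on both input and output; biases are ignored. Note that the models in this notebook place pooling between their convolutions, so this illustrates the general VGG-style argument, not the exact networks trained above.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions along one axis."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

def conv_weights(kernel_sizes, channels):
    """Total weights (biases ignored) when every layer maps channels -> channels."""
    return sum(k * k * channels * channels for k in kernel_sizes)

channels = 64  # illustrative channel width, an assumption
print(receptive_field([3, 3]), conv_weights([3, 3], channels))  # 5 73728
print(receptive_field([5]), conv_weights([5], channels))        # 5 102400
```

Both options see a 5x5 neighborhood, but the stacked version uses 18 weights per channel pair instead of 25 and applies two non-linearities instead of one.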

### When NOT to use Convolution?
Convolution is not appropriate when data **does not have a grid-like topology** or local spatial correlation. For example:
- Tabular data (rows/columns in a spreadsheet).
- Sets of independent measurements.
- Graphs (use Graph Neural Networks).

## 8. Deployment in SageMaker

To deploy this model in AWS SageMaker, we need to wrap the training code into a script `train.py` and use the SageMaker Python SDK.

**Note**: This section requires an active AWS session.
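
In script mode, SageMaker passes the estimator's hyperparameters to the training script as command-line flags, which is why the script reads them with `argparse`. The sketch below simulates such a command line locally. It assumes the training job may also pass flags the script does not declare (the TensorFlow estimator is commonly reported to add `--model_dir`); the S3 path is a made-up placeholder and `parse_hyperparameters` is a hypothetical helper.

```python
import argparse

def parse_hyperparameters(argv):
    """Parse known hyperparameters and collect any undeclared flags separately."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=32)
    # parse_known_args tolerates flags the script does not declare
    args, unknown = parser.parse_known_args(argv)
    return args, unknown

# Simulated command line; the S3 path is a made-up placeholder
argv = ['--epochs', '10', '--model_dir', 's3://example-bucket/model']
args, unknown = parse_hyperparameters(argv)
print(args.epochs, args.batch_size)  # 10 32
print(unknown)
```

With plain `parse_args()`, an undeclared flag would abort the script with an "unrecognized arguments" error, so `parse_known_args()` is the more forgiving choice for a training entry point.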

In [None]:
%%writefile train.py
import tensorflow as tf
import os
import argparse

def train(args):
    # Load Data
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train = x_train.reshape((-1, 28, 28, 1)).astype('float32') / 255.0
    x_test = x_test.reshape((-1, 28, 28, 1)).astype('float32') / 255.0
    
    # Define Model (CNN 3x3)
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    # Train
    print("Starting training...")
    model.fit(x_train, y_train, epochs=args.epochs, batch_size=args.batch_size)
    
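    # Save Model (SageMaker exposes the output directory via SM_MODEL_DIR, /opt/ml/model by default)
    model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
    # Numbered subdirectory: TensorFlow Serving expects a version folder under the model directory
    model.save(os.path.join(model_dir, '00000001'))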
    print("Model saved.")

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=32)
    # parse_known_args ignores extra flags that the training job may pass (e.g. --model_dir)
    args, _ = parser.parse_known_args()
    train(args)

In [None]:
import sagemaker
from sagemaker.tensorflow import TensorFlow

# Define SageMaker session
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role() # Needs AWS Configuration

# Create Estimator
estimator = TensorFlow(
    entry_point='train.py',
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    framework_version='2.11',
    py_version='py39',
    hyperparameters={'epochs': 10},
    sagemaker_session=sagemaker_session
)

# Note: Uncomment to run training and deployment
# estimator.fit()
# predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')
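
Once an endpoint is live, requests must be preprocessed the same way as the training data. The sketch below builds a request body in the TensorFlow Serving REST style (`{"instances": [...]}`), which is the format we assume the deployed endpoint accepts. `build_payload` is a hypothetical helper, the images are blank placeholders, and the prediction call stays commented out because it requires AWS.

```python
import json
import numpy as np

def build_payload(images):
    """Normalize uint8 images and wrap them in a TensorFlow Serving style request body."""
    batch = images.astype('float32') / 255.0
    batch = batch.reshape((-1, 28, 28, 1))
    return {'instances': batch.tolist()}

# Two blank placeholder images standing in for x_test[:2]
dummy_images = np.zeros((2, 28, 28), dtype='uint8')
payload = build_payload(dummy_images)
body = json.dumps(payload)  # confirms the payload is JSON-serializable

# With a live endpoint (requires AWS; response keys are an assumption):
# result = predictor.predict(payload)
# predicted = np.argmax(result['predictions'], axis=1)
```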