# Exploring Convolutional Layers Through Data and Experiments
## Fashion-MNIST Case Study

### Context and Motivation
In this project, neural networks are treated as architectural components rather than black boxes.
The goal is to understand how convolutional layers introduce inductive bias that improves learning
on image-based data.

Using the Fashion-MNIST dataset, we compare a baseline fully connected network against a
convolutional neural network (CNN), and perform controlled experiments to analyze the effect
of convolutional design choices.


In [1]:
%pip install numpy matplotlib pandas torch scikit-learn


ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'c:\\Users\\Santi\\Documents\\Libros de la universidad\\noveno\\trabajos de TDSE\\Exploring-Convolutional-Layers-Through-Data\\.venv\\Lib\\site-packages\\torch\\include\\ATen\\native\\transformers\\cuda\\mem_eff_attention\\iterators\\predicated_tile_access_iterator_residual_last.h'



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from sklearn.model_selection import train_test_split


In [None]:
# Load processed Fashion-MNIST CSV files
train_df = pd.read_csv("data/processed/fashion-mnist-train.csv")
test_df  = pd.read_csv("data/processed/fashion-mnist-test.csv")

train_df.head()


In [None]:
print("Training set shape:", train_df.shape)
print("Test set shape:", test_df.shape)

train_df["label"].value_counts().sort_index()


- Each sample contains 784 pixel values (a flattened 28×28 image) plus a label column
- Labels range from 0 to 9 (10 clothing categories)
- Images are grayscale (a single channel)
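
The integer labels correspond to clothing categories. A minimal lookup, assuming the standard Fashion-MNIST label order (`CLASS_NAMES` and `label_name` are illustrative names, not part of the notebook):

```python
# Standard Fashion-MNIST label order (list index = integer label).
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label: int) -> str:
    """Return the human-readable class name for an integer label in 0..9."""
    if not 0 <= label < len(CLASS_NAMES):
        raise ValueError(f"label must be in 0..9, got {label}")
    return CLASS_NAMES[label]

print(label_name(9))  # Ankle boot
```

A helper like this could replace the numeric title in `show_samples` with a readable class name.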


In [None]:
def show_samples(df, n=6):
    plt.figure(figsize=(8,3))
    for i in range(n):
        pixels = df.iloc[i, 1:].values.reshape(28, 28)
        label = df.iloc[i, 0]
        plt.subplot(1, n, i+1)
        plt.imshow(pixels, cmap="gray")
        plt.title(f"Label: {label}")
        plt.axis("off")
    plt.show()

show_samples(train_df)


Preprocessing steps:
- Normalize pixel values from [0, 255] to [0, 1]
- Reshape each flat 784-value row to (1, 28, 28) for CNN input
- Convert images and labels to tensors


In [None]:
X_train = train_df.iloc[:, 1:].values / 255.0
y_train = train_df.iloc[:, 0].values

X_test = test_df.iloc[:, 1:].values / 255.0
y_test = test_df.iloc[:, 0].values

X_train.shape, X_test.shape


In [None]:
class FashionMNISTDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        image = self.X[idx].reshape(1, 28, 28)  # 1 channel
        label = self.y[idx]
        return image, label


In [None]:
train_dataset = FashionMNISTDataset(X_train, y_train)
test_dataset  = FashionMNISTDataset(X_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader  = DataLoader(test_dataset, batch_size=64)
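
`train_test_split` is imported earlier but never used. One plausible use, sketched here as an assumption rather than the notebook's actual procedure, is to hold out a stratified validation set from the training data before building the loaders. The toy arrays and the 20% split size are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for (X_train, y_train): 100 flattened images, 10 per class.
X = np.zeros((100, 784), dtype=np.float32)
y = np.repeat(np.arange(10), 10)

# stratify=y keeps the class proportions the same in both splits.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_tr.shape, X_val.shape)
print(np.bincount(y_val))
```

The resulting arrays could be wrapped in `FashionMNISTDataset` exactly like `X_train` and `X_test`, which would keep the test set untouched until the final comparison.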


In [None]:
class BaselineNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.model(x)


In [None]:
baseline_model = BaselineNN()
sum(p.numel() for p in baseline_model.parameters())
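
The count returned above can be checked by hand: a `Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases. A small sketch (the helper name `linear_params` is illustrative):

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

hidden = linear_params(28 * 28, 256)  # 200,960
output = linear_params(256, 10)       # 2,570
print(hidden + output)                # 203530
```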


In [None]:
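def train_model(model, loader, epochs=5):
    """Train `model` with Adam and cross-entropy; return the mean loss per epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    history = []
    model.train()  # make sure the model is in training mode

    for epoch in range(epochs):
        total_loss = 0.0
        for x, y in loader:
            optimizer.zero_grad()       # clear gradients from the previous step
            preds = model(x)            # raw logits, shape (batch, 10)
            loss = criterion(preds, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        avg_loss = total_loss / len(loader)  # mean of the per-batch losses
        history.append(avg_loss)
        print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")

    return history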


In [None]:
baseline_history = train_model(baseline_model, train_loader)
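
`train_model` reports only training loss, and `test_loader` is not used anywhere in the notebook. A hedged sketch of an accuracy helper follows. `evaluate` is an illustrative name, and the toy identity model and three-sample dataset exist only to make the snippet self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model: nn.Module, loader: DataLoader) -> float:
    """Return the classification accuracy of `model` over `loader`."""
    model.eval()  # disable dropout / batch-norm updates, if any
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for x, y in loader:
            preds = model(x).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.size(0)
    return correct / total

# Toy check: inputs act as logits, so the identity "model" predicts argmax(x).
toy_x = torch.eye(3)
toy_y = torch.tensor([0, 1, 0])  # third label deliberately mismatched
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=2)
print(evaluate(nn.Identity(), toy_loader))
```

In the notebook this would be called as `evaluate(baseline_model, test_loader)`, and in the same way for each CNN once it is trained, so that the comparison rests on held-out accuracy rather than training loss alone.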


In [None]:
class FashionMNISTCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = self.conv(x)
        x = self.fc(x)
        return x


In [None]:
cnn_model = FashionMNISTCNN()
sum(p.numel() for p in cnn_model.parameters())
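
The `64 * 7 * 7` input size of the first linear layer follows from the shapes: each 3×3 convolution with padding 1 keeps the resolution, and each 2×2 max-pool halves it (28 → 14 → 7). The arithmetic below is a sketch with illustrative helper names:

```python
def conv_out(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus biases of a Conv2d layer with a square kernel."""
    return c_in * c_out * kernel * kernel + c_out

size = 28
for _ in range(2):                              # two conv + pool blocks
    size = conv_out(size, kernel=3, padding=1)  # 3x3, padding 1: size unchanged
    size //= 2                                  # MaxPool2d(2) halves the resolution
print(size)  # 7

total = (
    conv_params(1, 32, 3)                # 320
    + conv_params(32, 64, 3)             # 18,496
    + (64 * size * size * 128 + 128)     # 401,536
    + (128 * 10 + 10)                    # 1,290
)
print(total)  # 421642
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.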


In [None]:
cnn_history = train_model(cnn_model, train_loader)


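Experiment:
- Compare kernel size 3×3 vs 5×5
- Intended control: keep all other parameters fixed. Note that `CNNKernel5` as defined in the next cell
  uses a single convolutional block and no hidden fully connected layer, so it differs from
  `FashionMNISTCNN` in depth as well as in kernel size.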


In [None]:
class CNNKernel5(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, 10)
        )

    def forward(self, x):
        return self.fc(self.conv(x))


In [None]:
cnn_k5 = CNNKernel5()
cnn_k5_history = train_model(cnn_k5, train_loader)
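
To isolate kernel size, the two variants can be generated from one class that differs only in `kernel_size`. This is a sketch under that assumption, not the notebook's `CNNKernel5`. `KernelSizeCNN` is an illustrative name, and `padding = kernel_size // 2` preserves the resolution only for odd kernel sizes.

```python
import torch
import torch.nn as nn

class KernelSizeCNN(nn.Module):
    """Same layout as FashionMNISTCNN, with a configurable odd kernel size."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2  # "same" padding for odd kernels
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=kernel_size, padding=pad),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=kernel_size, padding=pad),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

models = {k: KernelSizeCNN(kernel_size=k) for k in (3, 5)}
dummy = torch.zeros(2, 1, 28, 28)  # batch of two blank images
for k, m in models.items():
    n_params = sum(p.numel() for p in m.parameters())
    print(k, tuple(m(dummy).shape), n_params)
```

Each model would then be trained with `train_model(m, train_loader)`. Calling `torch.manual_seed` with the same value before constructing each model also helps keep the comparison fair.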


### Why did convolutional layers outperform the baseline?
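Convolutional layers exploit spatial locality and weight sharing: each filter reuses the same small
set of weights at every image position, so spatial structure is preserved instead of being flattened away.
Weight sharing keeps the convolutional layers themselves small (320 and 18,496 parameters here), although
`FashionMNISTCNN` as a whole has more parameters than `BaselineNN` (421,642 vs. 203,530) because of its
3136→128 fully connected layer.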

### What inductive bias does convolution introduce?
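Locality (each output depends only on a small neighborhood of the input) and translation equivariance
(shifting the input shifts the feature map by the same amount). Pooling then adds a degree of
translation invariance.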
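
The effect of sharing one kernel across positions can be checked numerically: shifting the input and then filtering gives the same result as filtering and then shifting. A small 1-D NumPy sketch (toy signal and kernel chosen for illustration; the signal has enough zeros around the feature that nothing reaches the borders):

```python
import numpy as np

signal = np.zeros(10)
signal[3] = 1.0                      # a single "feature" at position 3
kernel = np.array([1.0, 2.0, 3.0])   # the shared weights

def filt(x):
    # The same kernel is applied at every position (weight sharing).
    return np.convolve(x, kernel, mode="same")

shift = 2
shift_then_filter = filt(np.roll(signal, shift))
filter_then_shift = np.roll(filt(signal), shift)

print(np.allclose(shift_then_filter, filter_then_shift))  # True
# A global max over positions is unchanged by the shift.
print(filt(signal).max() == shift_then_filter.max())      # True
```

A fully connected layer has no such guarantee, because every input position has its own independent weights.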

### When is convolution not appropriate?
For data without meaningful neighborhood structure, such as tabular business metrics or symbolic data,
where adjacent columns have no inherent spatial relationship.


In [None]:
# Save trained CNN model
torch.save(cnn_model.state_dict(), "model.pth")
print("Model saved as model.pth")
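
Because only the `state_dict` is saved, reloading requires constructing the architecture first and then loading the weights into it. A round-trip sketch, using a small stand-in model and a temporary file rather than the notebook's `cnn_model` and `model.pth`:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for FashionMNISTCNN

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")
    torch.save(model.state_dict(), path)

    restored = nn.Linear(4, 2)  # must match the saved architecture
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # switch to inference mode before predicting
print(torch.equal(model.weight, restored.weight))
```

For the notebook's model, the equivalent would be to instantiate `FashionMNISTCNN()` and call `load_state_dict` with the contents of `model.pth`.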


This experiment demonstrates that convolutional layers are not merely performance optimizations,
but architectural components that encode domain assumptions.

Understanding these assumptions is critical for designing robust and explainable AI systems
in enterprise environments.
