# Fashion MNIST Classification using PyTorch

This notebook showcases how to build, train, and evaluate an Artificial Neural Network (ANN) from scratch using PyTorch. The goal is to classify fashion items into one of ten categories using the **Fashion MNIST dataset**.

### Project Highlights
- Built an ANN with two hidden layers to classify grayscale images of clothing items.
- Normalized input features for improved training stability.
- Created a custom `Dataset` class and used `DataLoader` for batching and shuffling.
- Trained the model using Stochastic Gradient Descent (SGD) and CrossEntropyLoss.
- Evaluated the model using manually computed accuracy on test data.

> This project was independently implemented by **Feroz Khan** as part of a hands-on deep learning journey using PyTorch.

---

**Dataset**: [Fashion MNIST (Kaggle)](https://www.kaggle.com/zalando-research/fashionmnist)  
**Framework**: PyTorch  
**Language**: Python 3  

Update: Added GPU support to train on the full dataset.


In [None]:
from google.colab import files
files.upload()

In [None]:
# Importing Essential Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

In [None]:
# Set manual seed for reproducibility
torch.manual_seed(42)

<torch._C.Generator at 0x787d8a640870>

In [None]:
# Check GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # fall back to CPU when no GPU is available
print(f"Using Device: {device}")

In [None]:
df = pd.read_csv('/content/fmnist_small.csv')
df.head()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,9,0,0,0,0,0,0,0,0,0,...,0,7,0,50,205,196,213,165,0,0
1,7,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,0,...,142,142,142,21,0,3,0,0,0,0
3,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,8,0,0,0,0,0,0,0,0,0,...,213,203,174,151,188,10,0,0,0,0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6000 entries, 0 to 5999
Columns: 785 entries, label to pixel784
dtypes: int64(785)
memory usage: 35.9 MB


In [None]:
# Separate features (pixel columns) from labels (first column)
X = df.iloc[:,1:]
y = df.iloc[:,0]

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Scale the features for stable training - (Range between 0 and 1)
X_train = X_train / 255.0
X_test = X_test / 255.0

In [None]:
# Create CustomDataset Class
class CustomDataset(Dataset):
  def __init__(self, features, labels):
    # Also converting to tensors
    self.features = torch.tensor(features.values, dtype = torch.float32)
    self.labels = torch.tensor(labels.values, dtype= torch.long)

  def __len__(self):
    return len(self.features)

  def __getitem__(self, index):
    return self.features[index], self.labels[index]

In [None]:
# Create train_dataset and test_dataset objects
train_dataset = CustomDataset(X_train, y_train)
test_dataset = CustomDataset(X_test, y_test)

In [None]:
# Create train and test loader
train_loader = DataLoader(train_dataset, batch_size= 32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size = 32, shuffle=False)  # We don't want to shuffle data during prediction
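As a quick sanity check on the loaders above, the number of batches per epoch follows from the 6000 rows reported by `df.info()`, the 80/20 split, and the batch size of 32. The sketch below is plain Python and only does the arithmetic; the helper name `num_batches` is hypothetical, and it assumes the `DataLoader` default of keeping the final partial batch (`drop_last=False`).

```python
import math

def num_batches(num_samples, batch_size):
    # With drop_last=False (the DataLoader default), the last partial batch is kept
    return math.ceil(num_samples / batch_size)

total_rows = 6000                    # from df.info()
test_rows = total_rows * 2 // 10     # test_size=0.2
train_rows = total_rows - test_rows

train_batches = num_batches(train_rows, 32)
test_batches = num_batches(test_rows, 32)
print(train_rows, test_rows, train_batches, test_batches)
```

These values should match `len(train_loader)` and `len(test_loader)`, which is also the divisor used when averaging the loss per epoch.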

In [None]:
# Define NN Class
class ANN(nn.Module):
  def __init__(self, num_features):
    super().__init__()

    self.network = nn.Sequential(
        nn.Linear(num_features, 128),
        nn.ReLU(),
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Linear(64, 10),

    )

  # Forward Pass
  def forward(self, features):
    return self.network(features)

In [None]:
# Set learning rate and epochs
lr = 1e-1
epochs = 100

In [None]:
# Instantiate the model
model = ANN(X_train.shape[1])
# Loss Function
criteria = nn.CrossEntropyLoss()
# Optimizer: plain SGD, as stated in the project highlights
optimizer = optim.SGD(model.parameters(), lr = lr)
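`nn.CrossEntropyLoss` applies log-softmax internally, which is why the `ANN` class ends with a plain `nn.Linear(64, 10)` and no softmax layer. The sketch below illustrates the equivalence on a small random batch; the tensors `logits` and `targets` are made up for the example and are not taken from the dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 10 raw class scores (logits) each
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# CrossEntropyLoss consumes the raw logits directly
ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step form: log-softmax, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, manual))
```

Adding an explicit softmax before this loss would apply the normalization twice and tends to slow learning.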

In [None]:
# Training Loop

for epoch in range(epochs):

  total_epoch_loss = 0

  for batch_features, batch_labels in train_loader:

    # Forward Pass
    y_pred = model(batch_features)

    # Calculate Loss
    loss = criteria(y_pred, batch_labels)

    # Clear gradients
    optimizer.zero_grad()

    # Backpropagation
    loss.backward()

    # Update params
    optimizer.step()

    # Accumulate this batch's loss for the epoch average
    total_epoch_loss += loss.item()

  print(f"Epoch: {epoch+1} --> Batch Loss: {total_epoch_loss/len(train_loader)}")


Epoch: 1 --> Batch Loss: 0.7174815807739894
Epoch: 2 --> Batch Loss: 0.7537097035845121
Epoch: 3 --> Batch Loss: 0.7309567685921987
Epoch: 4 --> Batch Loss: 0.7046292368570963
Epoch: 5 --> Batch Loss: 0.705610785484314
Epoch: 6 --> Batch Loss: 0.7232549188534418
Epoch: 7 --> Batch Loss: 0.7247406369447709
Epoch: 8 --> Batch Loss: 0.75518454571565
Epoch: 9 --> Batch Loss: 0.7242428803443909
Epoch: 10 --> Batch Loss: 0.7067167170842489
Epoch: 11 --> Batch Loss: 0.6934777277708054
Epoch: 12 --> Batch Loss: 0.7225390124320984
Epoch: 13 --> Batch Loss: 0.7402949102719625
Epoch: 14 --> Batch Loss: 0.7274574979146322
Epoch: 15 --> Batch Loss: 0.6931825445095698
Epoch: 16 --> Batch Loss: 0.7365703785419464
Epoch: 17 --> Batch Loss: 0.7366181371609369
Epoch: 18 --> Batch Loss: 0.7284923322995503
Epoch: 19 --> Batch Loss: 0.7324538131554922
Epoch: 20 --> Batch Loss: 0.7217838974793752
Epoch: 21 --> Batch Loss: 0.749851337770621
Epoch: 22 --> Batch Loss: 0.716709914803505
Epoch: 23 --> Batch Loss
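The `device` selected earlier is not used by the training loop above, so the model and every batch stay on the CPU. The following is a minimal sketch of one way to make an epoch device-aware. It runs on synthetic tensors standing in for the Fashion MNIST data, and the function name `train_one_epoch` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic stand-in for the real features and labels (assumption for illustration)
torch.manual_seed(42)
features = torch.rand(64, 784)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
).to(device)  # move the parameters to the selected device once
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss = 0.0
    for batch_features, batch_labels in loader:
        # Each batch must live on the same device as the model
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)

        optimizer.zero_grad()
        loss = criterion(model(batch_features), batch_labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    # Average loss per batch over the epoch
    return total_loss / len(loader)

avg_loss = train_one_epoch(model, loader, criterion, optimizer, device)
print(f"Average loss: {avg_loss:.4f}")
```

Moving batches inside the loop keeps the full dataset in host memory, which matters once the full training CSV is loaded.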

In [None]:
from torchinfo import summary

In [None]:
summary(model, input_size=(32, 784))

Layer (type:depth-idx)                   Output Shape              Param #
ANN                                      [32, 10]                  --
├─Sequential: 1-1                        [32, 10]                  --
│    └─Linear: 2-1                       [32, 128]                 100,480
│    └─ReLU: 2-2                         [32, 128]                 --
│    └─Linear: 2-3                       [32, 64]                  8,256
│    └─ReLU: 2-4                         [32, 64]                  --
│    └─Linear: 2-5                       [32, 10]                  650
│    └─Softmax: 2-6                      [32, 10]                  --
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 3.50
Input size (MB): 0.10
Forward/backward pass size (MB): 0.05
Params size (MB): 0.44
Estimated Total Size (MB): 0.59
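The parameter counts in the summary can be reproduced by hand: a fully connected layer has `in_features * out_features` weights plus `out_features` biases. A short plain-Python check, where `linear_params` is a hypothetical helper name:

```python
def linear_params(in_features, out_features):
    # Weight matrix entries plus one bias per output unit
    return in_features * out_features + out_features

layer_sizes = [(784, 128), (128, 64), (64, 10)]
counts = [linear_params(i, o) for i, o in layer_sizes]
total = sum(counts)

# float32 parameters take 4 bytes each
size_mb = total * 4 / 1e6

print(counts, total, round(size_mb, 2))
```

The total agrees with the 109,386 trainable parameters reported by `torchinfo`, and the size estimate with the 0.44 MB figure.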

In [None]:
# Model Evaluation
model.eval()    # Switch to evaluation behavior: dropout is disabled and batch norm uses running statistics (gradients are turned off separately by torch.no_grad())

total = 0
correct = 0

with torch.no_grad():

  for batch_features, batch_labels in test_loader:
    y_pred = model(batch_features)  # 32 x 10
    y_pred = torch.argmax(y_pred, dim = 1)

    total = total + batch_labels.shape[0]

    correct = correct + (y_pred == batch_labels).sum().item()

print(correct/total)


0.6266666666666667
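A single overall accuracy hides which clothing categories the model confuses. One possible extension is a per-class breakdown, sketched below on a small hand-made example. The function name `per_class_accuracy` and the toy tensors are assumptions for illustration, not part of the original notebook.

```python
import torch

def per_class_accuracy(preds, labels, num_classes):
    # Count correct predictions per true class
    correct = torch.bincount(labels[preds == labels], minlength=num_classes)
    # Count how many samples each class has
    totals = torch.bincount(labels, minlength=num_classes)
    # clamp avoids division by zero for classes absent from the data
    return correct.float() / totals.clamp(min=1).float()

# Toy predictions and labels for 3 classes
toy_preds = torch.tensor([0, 1, 0, 2, 2])
toy_labels = torch.tensor([0, 1, 1, 2, 0])

acc = per_class_accuracy(toy_preds, toy_labels, num_classes=3)
print(acc)
```

In the evaluation loop, the predictions and labels of all batches could be collected with `torch.cat` and passed to this helper with `num_classes=10`.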


### Next Steps
- Train on the full dataset (70,000 images) using GPU
- Experiment with optimizers like Adam or RMSprop
- Add regularization techniques like Dropout or Batch Normalization
- Tune model architecture and hyperparameters to push accuracy beyond 90%
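As a starting point for the regularization step, here is a hedged sketch of the same architecture with `nn.BatchNorm1d` and `nn.Dropout` added after each hidden layer. The class name `RegularizedANN` and the dropout rate of 0.3 are illustrative assumptions, not tuned values.

```python
import torch
import torch.nn as nn

class RegularizedANN(nn.Module):  # hypothetical variant of the ANN class
    def __init__(self, num_features, num_classes=10, dropout=0.3):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, num_classes),  # raw logits for CrossEntropyLoss
        )

    def forward(self, features):
        return self.network(features)

model = RegularizedANN(784)
model.eval()  # dropout off, batch norm uses running statistics
with torch.no_grad():
    out = model(torch.rand(4, 784))
print(out.shape)
```

With these layers in place, calling `model.train()` before each training epoch and `model.eval()` before evaluation becomes necessary rather than optional.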

> This project was created by **Feroz Khan** to apply foundational concepts of PyTorch in a real-world classification task.



In [None]:
## Let's use GPUs to load more data and train efficiently

In [None]:
from google.colab import files
files.upload()

Saving fashion-mnist_train.csv to fashion-mnist_train.csv
Buffered data was truncated after reaching the output size limit.

In [None]:
# Check GPU
import torch

# 'gpu' is not a valid device string and raises a RuntimeError; PyTorch expects 'cuda', with 'cpu' as the fallback
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)