## Question 2 Problem (B)

### STEP 1: Manual 3×3 Convolution (from scratch) + Verification vs nn.Conv2d


We first implement:

manual_conv2d(x, weight, bias, padding=1, stride=1)

Required shapes:

- x: (B, Cin, H, W)
- weight: (Cout, Cin, 3, 3)
- bias: (Cout,)
- output y: (B, Cout, Hout, Wout)

Then we verify correctness:

- Create nn.Conv2d(Cin, Cout, kernel_size=3, padding=1, stride=1, bias=True)
- Draw a random input x
- Compute ytorch = conv(x)
- Extract the layer's parameters and compute ymanual
- Print max_abs_diff and assert that it is < 1e-5
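
The output size in the required shapes follows from the standard convolution size formula, Hout = floor((H + 2*padding - K) / stride) + 1. A minimal sketch in plain Python; `conv_output_size` is a hypothetical helper name used only for illustration and is not part of the assignment:

```python
def conv_output_size(n, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# With a 3x3 kernel, padding=1 and stride=1 the spatial size is preserved,
# so Hout = H and Wout = W for the shapes used in this problem.
for n in (8, 28):
    print(n, "->", conv_output_size(n))
```

The same expression appears inside manual_conv2d when Hout and Wout are computed.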

In [1]:
# ==========================================================
# Problem 2(b) - STEP 1: Manual 3x3 Conv2D + Verification
# ==========================================================

import torch
import torch.nn as nn

def manual_conv2d(x, weight, bias=None, padding=1, stride=1):
    """
    Manual 2D convolution with 3x3 kernel using clear nested loops.
    Supports:
      - x:      (B, Cin, H, W)
      - weight: (Cout, Cin, 3, 3)
      - bias:   (Cout,) or None
      - padding=1, stride=1 (required)
    Returns:
      - y: (B, Cout, Hout, Wout)
    """
    assert stride == 1, "This manual implementation only supports stride=1."
    assert padding == 1, "This manual implementation only supports padding=1."
    assert weight.shape[2:] == (3, 3), "Kernel must be 3x3."

    B, Cin, H, W = x.shape
    Cout, Cin_w, Kh, Kw = weight.shape
    assert Cin == Cin_w, "Cin mismatch between x and weight."

    # Pad input: (B, Cin, H+2p, W+2p)
    x_pad = torch.nn.functional.pad(x, (padding, padding, padding, padding))  # (left,right,top,bottom)

    # Output spatial dims for stride=1, padding=1, kernel=3:
    # Hout = H, Wout = W
    Hout = (H + 2*padding - Kh)//stride + 1
    Wout = (W + 2*padding - Kw)//stride + 1

    # Allocate output
    y = torch.zeros((B, Cout, Hout, Wout), dtype=x.dtype, device=x.device)

    # Nested loops: readability > speed
    for b in range(B):
        for cout in range(Cout):
            for i in range(Hout):
                for j in range(Wout):
                    acc = 0.0
                    for cin in range(Cin):
                        for u in range(3):
                            for v in range(3):
                                acc += float(weight[cout, cin, u, v]) * float(x_pad[b, cin, i+u, j+v])
                    if bias is not None:
                        acc += float(bias[cout])
                    y[b, cout, i, j] = acc

    return y


# -------------------------
# Verification / Sanity Test
# -------------------------

# Reproducibility
SEED = 42
torch.manual_seed(SEED)
print(f"[INFO] Random Seed = {SEED}")

# Choose small sizes for quick verification
B, Cin, Cout, H, W = 2, 3, 4, 8, 8

# Create conv layer
conv = nn.Conv2d(Cin, Cout, kernel_size=3, padding=1, stride=1, bias=True)

# Random input
x = torch.randn(B, Cin, H, W)

# PyTorch output
ytorch = conv(x)

# Extract parameters
w = conv.weight.detach().clone()
b = conv.bias.detach().clone()

# Manual output
ymanual = manual_conv2d(x, w, b, padding=1, stride=1)

# Compare
max_abs_diff = (ytorch.detach() - ymanual).abs().max().item()
print(f"max abs diff = {max_abs_diff:.2e}", "(PASS)" if max_abs_diff < 1e-5 else "(FAIL)")

# Required assertion
assert max_abs_diff < 1e-5, "Manual conv does NOT match nn.Conv2d within tolerance!"


[INFO] Random Seed = 42
max abs diff = 2.38e-07 (PASS)
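
A difference on the order of 1e-7 is consistent with ordinary float32 rounding, since the loop version and the library kernel sum the same terms in a different order. As an additional cross-check, the same convolution can be sketched without Python loops using `F.unfold` (im2col) followed by a matrix multiply. This is an alternative formulation, not the required loop implementation, and `unfold_conv2d` is a hypothetical name:

```python
import torch
import torch.nn.functional as F

def unfold_conv2d(x, weight, bias=None, padding=1, stride=1):
    """Convolution as im2col + matmul (assumed alternative to the loop version)."""
    B, Cin, H, W = x.shape
    Cout, _, Kh, Kw = weight.shape
    # Each column holds one flattened receptive field: (B, Cin*Kh*Kw, Hout*Wout)
    cols = F.unfold(x, kernel_size=(Kh, Kw), padding=padding, stride=stride)
    w_flat = weight.reshape(Cout, -1)          # (Cout, Cin*Kh*Kw)
    out = w_flat @ cols                        # broadcasts to (B, Cout, Hout*Wout)
    if bias is not None:
        out = out + bias.view(1, -1, 1)
    Hout = (H + 2 * padding - Kh) // stride + 1
    Wout = (W + 2 * padding - Kw) // stride + 1
    return out.view(B, Cout, Hout, Wout)

torch.manual_seed(0)
x_chk = torch.randn(2, 3, 8, 8)
w_chk = torch.randn(4, 3, 3, 3)
b_chk = torch.randn(4)
ref = F.conv2d(x_chk, w_chk, b_chk, padding=1, stride=1)
out = unfold_conv2d(x_chk, w_chk, b_chk)
print("max abs diff:", (ref - out).abs().max().item())
```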


### STEP 2: Define CNN-Base-BN (required architecture) + shape sanity check


Now we build the required CNN exactly:

- Conv(1→16, 3×3, padding=1) → BN → ReLU
- Conv(16→32, 3×3, padding=1) → BN → ReLU → MaxPool(2×2)
- Conv(32→64, 3×3, padding=1) → BN → ReLU → MaxPool(2×2)
- Flatten → Linear(64×7×7 → 10)

We’ll also do a shape sanity check with one batch:

- input: (B,1,28,28)
- after pool1: 14×14
- after pool2: 7×7
- final logits: (B,10)
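
The expected sizes can be derived by hand before running the model. The sketch below traces the spatial size through each layer in plain Python; the helper names `conv3x3_same` and `maxpool2` are hypothetical and exist only for this illustration:

```python
def conv3x3_same(hw):
    # 3x3 conv with padding=1, stride=1 keeps the spatial size
    return (hw + 2 * 1 - 3) // 1 + 1

def maxpool2(hw):
    # 2x2 max pool with stride 2 halves the spatial size
    return (hw - 2) // 2 + 1

hw = 28
trace = [("input", 1, hw)]
hw = conv3x3_same(hw)
trace.append(("conv1", 16, hw))
hw = conv3x3_same(hw)
trace.append(("conv2", 32, hw))
hw = maxpool2(hw)
trace.append(("pool1", 32, hw))
hw = conv3x3_same(hw)
trace.append(("conv3", 64, hw))
hw = maxpool2(hw)
trace.append(("pool2", 64, hw))

# Number of features entering the Linear layer
flat_features = trace[-1][1] * hw * hw

for name, channels, size in trace:
    print(f"{name:6s} channels={channels:2d} size={size}x{size}")
print("flatten ->", flat_features)
```

BN and ReLU are omitted from the trace because they do not change the tensor shape.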

In [2]:
# ==========================================================
# Problem 2(b) - STEP 2: MNIST Setup + CNN-Base-BN + Shape Check
# ==========================================================

import torch
import random
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# ---- Reproducibility ----
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
print("[INFO] Random Seed =", SEED)

# ---- Dataset ----
transform = transforms.ToTensor()  # converts to a float tensor and scales pixels to [0,1]

train_ds = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_ds  = datasets.MNIST(root="./data", train=False, download=True, transform=transform)

BATCH_SIZE = 128

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
test_loader  = DataLoader(test_ds, batch_size=BATCH_SIZE, shuffle=False)

print("[INFO] Train size:", len(train_ds))
print("[INFO] Test size:", len(test_ds))

# ---- CNN Definition ----
import torch.nn as nn
import torch.nn.functional as F

class CNN_Base_BN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.bn1   = nn.BatchNorm2d(16)

        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.bn2   = nn.BatchNorm2d(32)

        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn3   = nn.BatchNorm2d(64)

        self.pool = nn.MaxPool2d(2,2)
        self.fc   = nn.Linear(64*7*7, num_classes)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool(x)

        x = F.relu(self.bn3(self.conv3(x)))
        x = self.pool(x)

        x = x.view(x.size(0), -1)
        return self.fc(x)

# ---- Shape sanity check ----
model = CNN_Base_BN()

x_batch, y_batch = next(iter(train_loader))
logits = model(x_batch)

print("[SANITY] input shape:", x_batch.shape)
print("[SANITY] logits shape:", logits.shape)
print("[SANITY] labels shape:", y_batch.shape)


[INFO] Random Seed = 42


[INFO] Train size: 60000
[INFO] Test size: 10000
[SANITY] input shape: torch.Size([128, 1, 28, 28])
[SANITY] logits shape: torch.Size([128, 10])
[SANITY] labels shape: torch.Size([128])
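
The printed check covers only the input and the final logits. To also confirm the intermediate 14×14 and 7×7 sizes, forward hooks can record the output shape of each pooling layer. The sketch below uses an `nn.Sequential` stand-in with the same layer order as CNN_Base_BN (an assumption made only to keep the example self-contained); `record_shape` is a hypothetical helper:

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sequence as CNN_Base_BN (illustrative only)
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 10),
)

shapes = {}

def record_shape(name):
    # Returns a forward hook that stores the output shape under `name`
    def hook(module, inputs, output):
        shapes[name] = tuple(output.shape)
    return hook

# Indices 6 and 10 are the two MaxPool2d layers, 12 is the Linear layer
net[6].register_forward_hook(record_shape("pool1"))
net[10].register_forward_hook(record_shape("pool2"))
net[12].register_forward_hook(record_shape("logits"))

net.eval()
with torch.no_grad():
    net(torch.zeros(4, 1, 28, 28))

print(shapes)
```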


### STEP 3: Train CNN-Base-BN (Cross-Entropy) with proper BatchNorm train/eval + logging


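Now we train the CNN with:

- Loss: nn.CrossEntropyLoss(), which applies log-softmax and negative log-likelihood internally, so the model outputs raw logits
- Optimizer: Adam
- Epochs: 5
- Logging each epoch: train loss, train accuracy, test accuracy

Critical requirement for BatchNorm:

- model.train() in the training loop (BN normalizes with batch statistics and updates its running estimates)
- model.eval() in the evaluation loop (BN normalizes with the stored running estimates)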
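
Why the mode switch matters can be seen on a single BatchNorm2d layer. The following is a small sketch on synthetic data (the tensor sizes and the shift/scale of the input are arbitrary assumptions): in train mode the layer normalizes with the statistics of the current batch and updates its running mean, while in eval mode it uses the stored running estimates and leaves them unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)
x_demo = torch.randn(8, 4, 5, 5) * 3.0 + 2.0   # synthetic input, mean ~2, std ~3

# Train mode: normalize with batch statistics, update running estimates
bn.train()
y_train = bn(x_demo)
running_after_train = bn.running_mean.clone()

# Eval mode: normalize with running estimates, do not update them
bn.eval()
y_eval = bn(x_demo)
running_after_eval = bn.running_mean.clone()

print("train-mode per-channel mean:", y_train.mean(dim=(0, 2, 3)))
print("eval-mode per-channel mean: ", y_eval.mean(dim=(0, 2, 3)))
print("running_mean changed in eval:", not torch.equal(running_after_train, running_after_eval))
```

Forgetting model.eval() at test time would make each test prediction depend on the other samples in its batch.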

In [3]:
# ==========================================================
# Problem 2(b) - STEP 3: Train CNN-Base-BN with Logging
# ==========================================================

import torch
import torch.nn as nn
import torch.optim as optim
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("[INFO] device:", device)

model = CNN_Base_BN().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

EPOCHS = 5

def accuracy_from_logits(logits, y):
    preds = torch.argmax(logits, dim=1)
    return (preds == y).float().mean().item()

for epoch in range(1, EPOCHS + 1):
    start_t = time.time()

    # --------------------
    # TRAIN (BN uses batch stats)
    # --------------------
    model.train()
    train_loss_sum, train_acc_sum, n_batches = 0.0, 0.0, 0

    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        logits = model(x)
        loss = criterion(logits, y)

        loss.backward()
        optimizer.step()

        train_loss_sum += loss.item()
        train_acc_sum += accuracy_from_logits(logits, y)
        n_batches += 1

    avg_train_loss = train_loss_sum / n_batches
    avg_train_acc = train_acc_sum / n_batches

    # --------------------
    # EVAL (BN uses running estimates)
    # --------------------
    model.eval()
    test_acc_sum, test_batches = 0.0, 0

    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            test_acc_sum += accuracy_from_logits(logits, y)
            test_batches += 1

    avg_test_acc = test_acc_sum / test_batches
    epoch_time = time.time() - start_t

    print(f"[EPOCH {epoch}] train_loss={avg_train_loss:.4f} | "
          f"train_acc={avg_train_acc*100:.2f}% | "
          f"test_acc={avg_test_acc*100:.2f}% | "
          f"time={epoch_time:.2f}s")


[INFO] device: cpu
[EPOCH 1] train_loss=0.1400 | train_acc=95.72% | test_acc=98.44% | time=123.21s
[EPOCH 2] train_loss=0.0450 | train_acc=98.64% | test_acc=97.92% | time=129.99s
[EPOCH 3] train_loss=0.0320 | train_acc=98.98% | test_acc=98.66% | time=129.67s
[EPOCH 4] train_loss=0.0291 | train_acc=99.07% | test_acc=98.99% | time=128.58s
[EPOCH 5] train_loss=0.0241 | train_acc=99.23% | test_acc=99.01% | time=132.13s
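
The training loop passes raw logits to nn.CrossEntropyLoss() because the loss applies the softmax itself. A minimal sketch on random logits (the batch size and target labels here are arbitrary assumptions) showing that it matches log-softmax followed by the mean negative log-likelihood of the target class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits_demo = torch.randn(4, 10)            # raw scores, no softmax applied
targets_demo = torch.tensor([3, 0, 9, 1])   # arbitrary class labels

# Built-in loss on raw logits
loss_ce = nn.CrossEntropyLoss()(logits_demo, targets_demo)

# Same quantity computed step by step
log_probs = F.log_softmax(logits_demo, dim=1)
loss_manual = -log_probs[torch.arange(4), targets_demo].mean()

print(loss_ce.item(), loss_manual.item())
```

Adding an explicit softmax layer before this loss would therefore apply the softmax twice.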


In [4]:
# ==========================================================
# Homework 1 - Problem 2(b)
# Manual 3x3 Convolution Verification + CNN-Base-BN Training
# ==========================================================

import time
import random
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


# -------------------------
# 0) Reproducibility + Data
# -------------------------
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

print(f"[INFO] Random Seed = {SEED}")

transform = transforms.ToTensor()  # scales MNIST pixels to [0,1]

train_ds = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_ds  = datasets.MNIST(root="./data", train=False, download=True, transform=transform)

BATCH_SIZE = 128
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
test_loader  = DataLoader(test_ds,  batch_size=BATCH_SIZE, shuffle=False)

print(f"[INFO] Train size: {len(train_ds)} | Test size: {len(test_ds)} | Batch size: {BATCH_SIZE}")


# -----------------------------------------
# 1) Part A: Manual 3x3 Convolution Function
# -----------------------------------------
def manual_conv2d(x, weight, bias=None, padding=1, stride=1):
    """
    Manual 2D convolution for 3x3 kernels using explicit loops (CPU).
    Shapes:
      x:      (B, Cin, H, W)
      weight: (Cout, Cin, 3, 3)
      bias:   (Cout,) or None
    Required: padding=1, stride=1
    """
    assert stride == 1, "Required: stride=1"
    assert padding == 1, "Required: padding=1"
    assert weight.shape[2:] == (3, 3), "Kernel must be 3x3"

    B, Cin, H, W = x.shape
    Cout, Cin_w, Kh, Kw = weight.shape
    assert Cin == Cin_w, "Cin mismatch between x and weight"

    # Pad input: (left,right,top,bottom)
    x_pad = F.pad(x, (padding, padding, padding, padding))

    # Output dims for stride=1, padding=1, kernel=3 -> Hout=H, Wout=W
    Hout = (H + 2 * padding - Kh) // stride + 1
    Wout = (W + 2 * padding - Kw) // stride + 1

    y = torch.zeros((B, Cout, Hout, Wout), dtype=x.dtype, device=x.device)

    # Sliding window loops
    for b in range(B):
        for cout in range(Cout):
            for i in range(Hout):
                for j in range(Wout):
                    acc = 0.0
                    for cin in range(Cin):
                        for u in range(3):
                            for v in range(3):
                                acc += float(weight[cout, cin, u, v]) * float(x_pad[b, cin, i + u, j + v])
                    if bias is not None:
                        acc += float(bias[cout])
                    y[b, cout, i, j] = acc
    return y


# -------------------------------------------
# 2) Part A Verification vs nn.Conv2d (PASS)
# -------------------------------------------
B, Cin, Cout, H, W = 2, 3, 4, 8, 8
conv = nn.Conv2d(Cin, Cout, kernel_size=3, padding=1, stride=1, bias=True)

x = torch.randn(B, Cin, H, W)
ytorch = conv(x)

w = conv.weight.detach().clone()
b = conv.bias.detach().clone()

ymanual = manual_conv2d(x, w, b, padding=1, stride=1)

max_abs_diff = (ytorch.detach() - ymanual).abs().max().item()
print(f"max abs diff = {max_abs_diff:.2e}", "(PASS)" if max_abs_diff < 1e-5 else "(FAIL)")
assert max_abs_diff < 1e-5, "Manual conv does NOT match nn.Conv2d within tolerance!"


# ---------------------------------------
# 3) Part B: CNN-Base-BN (Required Model)
# ---------------------------------------
class CNN_Base_BN(nn.Module):
    """
    Required block pattern: Conv(3x3) -> BatchNorm2d -> ReLU
    Architecture:
      1) Conv(1->16, 3x3, pad=1) -> BN -> ReLU
      2) Conv(16->32, 3x3, pad=1) -> BN -> ReLU -> MaxPool(2)
      3) Conv(32->64, 3x3, pad=1) -> BN -> ReLU -> MaxPool(2)
      4) Flatten -> Linear(64*7*7 -> 10)
    """
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1, stride=1, bias=True)
        self.bn1   = nn.BatchNorm2d(16)

        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1, stride=1, bias=True)
        self.bn2   = nn.BatchNorm2d(32)

        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1, stride=1, bias=True)
        self.bn3   = nn.BatchNorm2d(64)

        self.pool  = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc    = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))   # 28x28
        x = F.relu(self.bn2(self.conv2(x)))   # 28x28
        x = self.pool(x)                      # 14x14

        x = F.relu(self.bn3(self.conv3(x)))   # 14x14
        x = self.pool(x)                      # 7x7

        x = x.view(x.size(0), -1)             # 64*7*7
        logits = self.fc(x)                   # (B,10)
        return logits


# ---------------------------
# 4) Shape sanity check
# ---------------------------
model_tmp = CNN_Base_BN()
xb, yb = next(iter(train_loader))
out = model_tmp(xb)
print("[SANITY] input shape:", xb.shape)
print("[SANITY] logits shape:", out.shape)
print("[SANITY] labels shape:", yb.shape)


# ---------------------------
# 5) Train / Eval with Logging
# ---------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("[INFO] device:", device)

model = CNN_Base_BN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

EPOCHS = 5

def accuracy_from_logits(logits, y):
    preds = torch.argmax(logits, dim=1)
    return (preds == y).float().mean().item()

for epoch in range(1, EPOCHS + 1):
    start_t = time.time()

    # TRAIN MODE (BN uses batch statistics)
    model.train()
    train_loss_sum, train_acc_sum, n_batches = 0.0, 0.0, 0

    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        logits = model(x)
        loss = criterion(logits, y)

        loss.backward()
        optimizer.step()

        train_loss_sum += loss.item()
        train_acc_sum += accuracy_from_logits(logits, y)
        n_batches += 1

    avg_train_loss = train_loss_sum / n_batches
    avg_train_acc = train_acc_sum / n_batches

    # EVAL MODE (BN uses running mean/var)
    model.eval()
    test_acc_sum, test_batches = 0.0, 0

    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            test_acc_sum += accuracy_from_logits(logits, y)
            test_batches += 1

    avg_test_acc = test_acc_sum / test_batches
    epoch_time = time.time() - start_t

    print(f"[EPOCH {epoch}] train_loss={avg_train_loss:.4f} | "
          f"train_acc={avg_train_acc*100:.2f}% | "
          f"test_acc={avg_test_acc*100:.2f}% | "
          f"time={epoch_time:.2f}s")


[INFO] Random Seed = 42
[INFO] Train size: 60000 | Test size: 10000 | Batch size: 128
max abs diff = 2.38e-07 (PASS)
[SANITY] input shape: torch.Size([128, 1, 28, 28])
[SANITY] logits shape: torch.Size([128, 10])
[SANITY] labels shape: torch.Size([128])
[INFO] device: cpu
[EPOCH 1] train_loss=0.1312 | train_acc=96.04% | test_acc=98.43% | time=122.27s
[EPOCH 2] train_loss=0.0451 | train_acc=98.54% | test_acc=98.89% | time=119.38s
[EPOCH 3] train_loss=0.0350 | train_acc=98.90% | test_acc=98.67% | time=120.31s
[EPOCH 4] train_loss=0.0276 | train_acc=99.10% | test_acc=98.67% | time=119.30s
[EPOCH 5] train_loss=0.0218 | train_acc=99.28% | test_acc=98.94% | time=120.09s
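
The loops above average per-batch accuracies, which weights every batch equally. With 10,000 test images and a batch size of 128, the last batch holds only 16 images, so the logged value can differ slightly from the exact fraction of correct predictions. The sketch below contrasts the two ways of averaging in plain Python; the per-batch counts are made-up numbers for illustration, not results from this notebook:

```python
def batch_mean_accuracy(batches):
    """Average of per-batch accuracies (what the loops above compute)."""
    return sum(correct / n for correct, n in batches) / len(batches)

def sample_weighted_accuracy(batches):
    """Total correct predictions divided by total samples."""
    total_correct = sum(correct for correct, _ in batches)
    total = sum(n for _, n in batches)
    return total_correct / total

# Hypothetical counts: 78 full batches of 128 and one final batch of 16
batches = [(127, 128)] * 78 + [(12, 16)]

print(f"batch-mean:      {batch_mean_accuracy(batches):.4f}")
print(f"sample-weighted: {sample_weighted_accuracy(batches):.4f}")
```

In a training loop, the sample-weighted version amounts to accumulating `(preds == y).sum().item()` and `y.size(0)` instead of per-batch means.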


### Part B only (CNN-Base-BN training)

- MNIST loaders (pixels scaled to [0,1], batch size 128, fixed seed)
- CNN-Base-BN architecture (only 3×3 convs, BN after each conv and before ReLU)
- Train with nn.CrossEntropyLoss()
- Adam optimizer
- Explicit model.train() and model.eval()
- Log per epoch: train loss, train acc, test acc

In [5]:
# ==========================================================
# Problem 2(b) - PART B: CNN-Base-BN Training on MNIST
# ==========================================================

import time
import random
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# -----------------------
# 1) Reproducibility
# -----------------------
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

print(f"[INFO] Random Seed = {SEED}")

# -----------------------
# 2) Dataset + DataLoaders
# -----------------------
transform = transforms.ToTensor()  # scales pixels to [0,1]

train_ds = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_ds  = datasets.MNIST(root="./data", train=False, download=True, transform=transform)

BATCH_SIZE = 128
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
test_loader  = DataLoader(test_ds,  batch_size=BATCH_SIZE, shuffle=False)

print(f"[INFO] Train size = {len(train_ds)} | Test size = {len(test_ds)} | Batch size = {BATCH_SIZE}")

# -----------------------
# 3) CNN-Base-BN Model (Required)
# -----------------------
class CNN_Base_BN(nn.Module):
    """
    Required Block Pattern: Conv(3x3) -> BatchNorm2d -> ReLU

    Architecture:
    1) Conv(1->16, 3x3, padding=1) -> BN -> ReLU
    2) Conv(16->32, 3x3, padding=1) -> BN -> ReLU -> MaxPool(2x2)
    3) Conv(32->64, 3x3, padding=1) -> BN -> ReLU -> MaxPool(2x2)
    4) Flatten -> Linear(64*7*7 -> 10)
    """
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1, stride=1, bias=True)
        self.bn1   = nn.BatchNorm2d(16)

        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1, stride=1, bias=True)
        self.bn2   = nn.BatchNorm2d(32)

        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1, stride=1, bias=True)
        self.bn3   = nn.BatchNorm2d(64)

        self.pool  = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc    = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        # Block 1
        x = self.conv1(x)
        x = self.bn1(x)      # BN before ReLU (required)
        x = F.relu(x)

        # Block 2
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.pool(x)     # 28x28 -> 14x14

        # Block 3
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.pool(x)     # 14x14 -> 7x7

        # Flatten + Linear
        x = x.view(x.size(0), -1)  # (B, 64*7*7)
        logits = self.fc(x)        # (B, 10)
        return logits

# -----------------------
# 4) Device + Loss + Optimizer
# -----------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("[INFO] device:", device)

model = CNN_Base_BN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

EPOCHS = 5
print(f"[INFO] Optimizer=Adam | lr=1e-3 | epochs={EPOCHS}")

def accuracy_from_logits(logits, y):
    preds = torch.argmax(logits, dim=1)
    return (preds == y).float().mean().item()

# -----------------------
# 5) Train + Eval Loop (BN train/eval required)
# -----------------------
for epoch in range(1, EPOCHS + 1):
    start_t = time.time()

    # ---- TRAIN: uses batch statistics + updates running stats ----
    model.train()
    train_loss_sum, train_acc_sum, n_batches = 0.0, 0.0, 0

    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        logits = model(x)
        loss = criterion(logits, y)

        loss.backward()
        optimizer.step()

        train_loss_sum += loss.item()
        train_acc_sum += accuracy_from_logits(logits, y)
        n_batches += 1

    avg_train_loss = train_loss_sum / n_batches
    avg_train_acc  = train_acc_sum / n_batches

    # ---- EVAL: uses running mean/var (no updates) ----
    model.eval()
    test_acc_sum, test_batches = 0.0, 0

    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            test_acc_sum += accuracy_from_logits(logits, y)
            test_batches += 1

    avg_test_acc = test_acc_sum / test_batches
    epoch_time = time.time() - start_t

    print(f"[EPOCH {epoch}] train_loss={avg_train_loss:.4f} | "
          f"train_acc={avg_train_acc*100:.2f}% | "
          f"test_acc={avg_test_acc*100:.2f}% | "
          f"time={epoch_time:.2f}s")

# Optional: report the test accuracy of the last epoch (not necessarily the best epoch)
print(f"[RESULT] Final test accuracy after {EPOCHS} epochs: {avg_test_acc*100:.2f}%")


[INFO] Random Seed = 42
[INFO] Train size = 60000 | Test size = 10000 | Batch size = 128
[INFO] device: cpu
[INFO] Optimizer=Adam | lr=1e-3 | epochs=5
[EPOCH 1] train_loss=0.1358 | train_acc=95.87% | test_acc=98.70% | time=119.15s
[EPOCH 2] train_loss=0.0436 | train_acc=98.64% | test_acc=98.68% | time=118.15s
[EPOCH 3] train_loss=0.0332 | train_acc=98.95% | test_acc=98.86% | time=119.86s
[EPOCH 4] train_loss=0.0272 | train_acc=99.12% | test_acc=99.04% | time=119.25s
[EPOCH 5] train_loss=0.0212 | train_acc=99.32% | test_acc=98.81% | time=119.91s
[RESULT] Final test accuracy after 5 epochs: 98.81%
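
The three training runs in this notebook log slightly different numbers even though the seed is fixed at 42. One likely contributor is that weight initialization and the shuffle order both draw from the global random number generator, whose state depends on what ran earlier in the cell. A possible way to decouple the shuffle order from that state is to give the DataLoader its own seeded `torch.Generator`. The sketch below uses a toy dataset as a stand-in for MNIST, and `first_batch` is a hypothetical helper:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def first_batch(seed):
    """Return the first shuffled batch of a toy dataset for a given seed."""
    ds = TensorDataset(torch.arange(100))    # toy stand-in for MNIST
    g = torch.Generator()
    g.manual_seed(seed)                      # generator used only for shuffling
    loader = DataLoader(ds, batch_size=10, shuffle=True, generator=g)
    (batch,) = next(iter(loader))
    return batch.tolist()

a = first_batch(42)
_ = torch.randn(1000)    # consume global RNG state in between
b = first_batch(42)
print("same shuffle order:", a == b)
```

This only fixes the data order; model initialization would still need to be seeded immediately before the model is constructed.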
