# Buggy CNN — MNIST Classification with Intentional Issues

This notebook implements a CNN with **intentional problems** to test the ML diagnostics system:

- **Over-parameterized layers**: Unnecessarily large hidden layers
- **Redundant/duplicate layers**: Consecutive layers producing correlated outputs
- **Vanishing gradients**: Deep network (8 conv layers) without skip connections
- **Dead neurons**: Some layers initialized poorly, prone to dying ReLU
- **Longer runs**: 20 epochs to trigger early stopping/diminishing returns detection
- **Overfitting**: No dropout or regularization
- **Memory inefficiency**: Retaining a loss tensor for every training step across all epochs
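
The "dead neurons" and "correlated outputs" items above can be made concrete with two simple metrics. Below is a hypothetical pure-Python sketch. The helper names `dead_fraction` and `pearson` and the toy activation values are illustrative assumptions, not the observer's implementation.

```python
import math
from typing import Sequence


def dead_fraction(activations: Sequence[Sequence[float]], eps: float = 0.0) -> float:
    """Fraction of units whose post-ReLU activation is <= eps for every sample.

    `activations` is a batch: one row per sample, one column per unit.
    """
    num_units = len(activations[0])
    dead = sum(
        1 for u in range(num_units)
        if all(row[u] <= eps for row in activations)
    )
    return dead / num_units


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant signal has no defined correlation; report 0
    return cov / (sx * sy)


# Toy batch (3 samples x 4 units, made-up values): units 1 and 3 never fire.
batch = [
    [0.5, 0.0, 1.2, 0.0],
    [0.1, 0.0, 0.7, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]
print(dead_fraction(batch))  # 0.5

# Made-up mean activations of two consecutive layers over four inputs.
layer_a = [0.2, 0.4, 0.6, 0.8]
layer_b = [0.21, 0.39, 0.62, 0.79]
print(round(pearson(layer_a, layer_b), 3))
```

In practice such metrics would be computed on activations captured from the real model (for example via forward hooks). A correlation near 1 between consecutive layers suggests one of them adds little.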

In [None]:
import logging
import os
import sys
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# observer.py lives in the parent directory (neural_network/).
# Notebooks don't define __file__, so ".." is resolved relative to the working directory.
sys.path.insert(0, os.path.abspath(".."))
from observer import Observer, ObserverConfig

## Configuration & Hyperparameters

Intentionally problematic settings:
- **20 epochs**: Far more than MNIST needs (it should converge by ~5)
- **High learning rate**: 0.01 can cause instability
- **Plain SGD**: No momentum, so convergence is slower
- **No weight decay**: Encourages overfitting
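
The cost of 20 epochs can be checked with plain arithmetic. MNIST has 60,000 training images. With `batch_size = 64`, the DataLoader yields 938 batches per epoch because it keeps the final partial batch by default (`drop_last=False`). Here is a minimal sketch; `steps_per_epoch` is a hypothetical helper.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


per_epoch = steps_per_epoch(60_000, 64)
print(per_epoch)       # 938
print(per_epoch * 20)  # 18760 optimizer steps for the configured 20 epochs
print(per_epoch * 5)   # 4690 steps if training stopped around epoch 5
```

If the model does converge by about epoch 5, roughly 75% of the optimizer steps are spent after that point.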

In [None]:
batch_size = 64
num_epochs = 20  # BUG: Way too many epochs for MNIST
lr = 0.01  # BUG: High learning rate can cause instability
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

seed = 42
torch.manual_seed(seed)

print(f"Device: {device}")

## Observer Setup

In [None]:
observer_config = ObserverConfig(
    track_profiler=True,
    profile_every_n_steps=100,
    track_memory=True,
    track_throughput=True,
    track_loss=True,
    track_console_logs=True,
    track_error_logs=True,
    track_hyperparameters=True,
    track_system_resources=True,
    track_layer_graph=True,
    track_layer_health=True,       # Important for detecting dead neurons, vanishing gradients
    track_sustainability=True,     # Required for redundant layer detection
    track_carbon_emissions=True,   # Track energy/CO2
)

observer = Observer(
    project_id="1",
    run_name="buggy-cnn-mnist",
    config=observer_config,
)

observer.log_hyperparameters({
    "batch_size": batch_size,
    "num_epochs": num_epochs,
    "learning_rate": lr,
    "optimizer": "SGD",  # BUG: SGD without momentum is slower
    "weight_decay": 0,   # BUG: No regularization
    "dataset": "MNIST",
    "seed": seed,
    "device": device,
})

## Dataset

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_dataset = datasets.MNIST("data", train=True, download=True, transform=transform)
test_dataset = datasets.MNIST("data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"Training samples: {len(train_dataset):,}")
print(f"Test samples:     {len(test_dataset):,}")
print(f"Batches per epoch: {len(train_loader)}")

## Model Definition — Intentionally Buggy

Issues embedded in this architecture:

1. **8 conv layers deep**: Prone to vanishing gradients without skip connections
2. **Redundant conv pairs**: conv3 & conv4 share an identical 64→64 config; conv6 (128→128) stacks a same-width layer directly on conv5's output
3. **Over-parameterized FC layer**: fc1's 2048 neurons are overkill for MNIST (2304 × 2048 + 2048 = 4,720,640 parameters in that layer alone)
4. **Duplicate FC layers**: fc3 & fc4 are redundant (both 512 → 512)
5. **No dropout**: Encourages overfitting
6. **Bad initialization on some layers**: Constant (conv7) and near-zero (fc3, fc4) weights shrink activations and gradients, so many units look dead
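
The shape comment in the model (28 → 14 → 7 → 3) and the size of fc1 can be checked by hand. The 3×3 convs use `padding=1` and conv7 is 1×1, so only the three pools change the spatial size. This pure-Python sketch recomputes the flattened feature size and the per-layer parameter counts from the layer definitions; the helper names are hypothetical. The total should match `sum(p.numel() for p in model.parameters())`. The sketch also gives a rough estimate of how much a `std=0.001` Linear layer shrinks the backward gradient norm. That estimate assumes i.i.d. Gaussian weights and ignores the ReLU mask.

```python
import math


def pooled(size: int, times: int, kernel: int = 2) -> int:
    """Spatial size after `times` MaxPool2d(kernel) layers (stride = kernel, no padding)."""
    for _ in range(times):
        size //= kernel  # floor division, matching PyTorch's default ceil_mode=False
    return size


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases of an nn.Conv2d with a square k x k kernel."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of an nn.Linear layer."""
    return n_in * n_out + n_out


side = pooled(28, 3)       # three pools: 28 -> 14 -> 7 -> 3
flat = 256 * side * side   # conv8 emits 256 channels

layers = {
    "conv1": conv_params(1, 32, 3),
    "conv2": conv_params(32, 64, 3),
    "conv3": conv_params(64, 64, 3),
    "conv4": conv_params(64, 64, 3),
    "conv5": conv_params(64, 128, 3),
    "conv6": conv_params(128, 128, 3),
    "conv7": conv_params(128, 16, 1),
    "conv8": conv_params(16, 256, 3),
    "fc1": linear_params(flat, 2048),
    "fc2": linear_params(2048, 512),
    "fc3": linear_params(512, 512),
    "fc4": linear_params(512, 512),
    "fc_out": linear_params(512, 10),
}
total = sum(layers.values())

# Rough backward gain of a Linear layer with i.i.d. N(0, std^2) weights:
# E||W^T g||^2 = in_features * std^2 * ||g||^2, so the gradient norm scales
# by about std * sqrt(in_features). ReLU masking is ignored here.
gain_fc3 = 0.001 * math.sqrt(512)

print(side, flat)    # 3 2304
print(f"{total:,}")  # 6,653,466
print(f"fc1 share of all parameters: {layers['fc1'] / total:.1%}")
print(f"gradient-norm gain per near-zero layer: {gain_fc3:.3f}")  # 0.023
```

Under these assumptions, the two near-zero layers (fc3, fc4) together scale the gradient reaching fc2 and everything before it by roughly 0.023², about 5 × 10⁻⁴.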

In [None]:
class BuggyCNN(nn.Module):
    """
    Intentionally problematic CNN for testing diagnostics.
    
    BUGS:
    - Too deep (8 conv layers) for MNIST
    - Redundant identical layer pairs
    - Over-parameterized fully connected layers
    - Poor initialization on some layers
    - No regularization
    """
    
    def __init__(self):
        super().__init__()
        
        # BUG: Too many conv layers for MNIST (vanishing gradients)
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        
        # BUG: Redundant pair - identical 64->64 configs, likely to produce correlated outputs
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # DUPLICATE config
        
        self.conv5 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # BUG: Another redundant pair - conv6 keeps conv5's 128 output channels
        self.conv6 = nn.Conv2d(128, 128, kernel_size=3, padding=1)  # Same width as conv5's output
        
        # BUG: Tiny bottleneck followed by large layer - creates gradient issues
        self.conv7 = nn.Conv2d(128, 16, kernel_size=1)  # Squeeze to 16 channels
        self.conv8 = nn.Conv2d(16, 256, kernel_size=3, padding=1)  # Expand to 256
        
        self.pool = nn.MaxPool2d(2)
        
        # BUG: Massively over-parameterized FC layers for MNIST
        # After pooling: 28 -> 14 -> 7 -> 3 (MaxPool2d floors 7/2 to 3), so 256 * 3 * 3 = 2304
        self.fc1 = nn.Linear(256 * 3 * 3, 2048)  # BUG: Too large
        
        # BUG: Redundant FC layers with same size
        self.fc2 = nn.Linear(2048, 512)
        self.fc3 = nn.Linear(512, 512)  # DUPLICATE - square 512->512 layer adds depth without changing width
        self.fc4 = nn.Linear(512, 512)  # DUPLICATE - another redundant layer
        
        self.fc_out = nn.Linear(512, 10)
        
        # BUG: Bad initialization - constant (conv7) and near-zero (fc3, fc4) weights yield near-zero, dead-looking units
        self._bad_init()
    
    def _bad_init(self):
        """Intentionally bad initialization for some layers."""
        # conv7 gets constant small weights: all 16 filters start identical and tiny,
        # which shrinks gradients flowing back to earlier layers (vanishing gradients)
        nn.init.constant_(self.conv7.weight, 0.001)
        nn.init.zeros_(self.conv7.bias)
        
        # Initialize fc3 and fc4 with near-zero weights (redundant/dead layers)
        nn.init.normal_(self.fc3.weight, mean=0, std=0.001)
        nn.init.zeros_(self.fc3.bias)
        nn.init.normal_(self.fc4.weight, mean=0, std=0.001)
        nn.init.zeros_(self.fc4.bias)
    
    def forward(self, x, targets=None):
        # Block 1: 28x28 -> 14x14
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        
        # Block 2: 14x14 -> 7x7 (redundant conv3, conv4)
        x = F.relu(self.conv3(x))
        x = self.pool(F.relu(self.conv4(x)))  # Redundant
        
        # Block 3: 7x7 -> 3x3 (redundant conv5, conv6)
        x = F.relu(self.conv5(x))
        x = self.pool(F.relu(self.conv6(x)))  # Redundant
        
        # Block 4: bottleneck (causes gradient issues)
        x = F.relu(self.conv7(x))  # Squeeze
        x = F.relu(self.conv8(x))  # Expand
        
        # Flatten
        x = x.view(x.size(0), -1)
        
        # Over-parameterized FC layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))  # Redundant
        x = F.relu(self.fc4(x))  # Redundant
        
        logits = self.fc_out(x)
        
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits, targets)
        
        return logits, loss

In [None]:
model = BuggyCNN().to(device)

num_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {num_params:,}")
print("\nThis is MASSIVELY over-parameterized for MNIST!")
print(f"A compact MNIST CNN needs ~50k params; this one has {num_params:,}")

observer.register_model(model)

## Training

Training uses plain SGD without momentum (slower convergence) and runs for 20 epochs, far more than MNIST needs.
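
A toy problem shows why momentum matters. The sketch below is a 1-D quadratic, not the CNN, and `minimize_quadratic` is a hypothetical helper. It applies the momentum update that `torch.optim.SGD` documents for zero dampening (`buf = momentum * buf + grad; w -= lr * buf`). It then compares plain SGD against momentum 0.9 at the notebook's learning rate of 0.01.

```python
def minimize_quadratic(lr: float, momentum: float, steps: int, curvature: float = 1.0) -> float:
    """Run SGD on f(w) = 0.5 * curvature * w**2 starting from w = 1; return the final w."""
    w, buf = 1.0, 0.0
    for _ in range(steps):
        grad = curvature * w           # df/dw
        buf = momentum * buf + grad    # with momentum=0 this is just the gradient
        w -= lr * buf
    return w


plain = minimize_quadratic(lr=0.01, momentum=0.0, steps=100)
heavy = minimize_quadratic(lr=0.01, momentum=0.9, steps=100)
print(f"plain SGD: |w| = {abs(plain):.4f}")  # 0.99**100, about 0.366
print(f"momentum:  |w| = {abs(heavy):.4f}")
```

With momentum 0.9, consistent gradients accumulate in `buf`. The effective step along a steady direction therefore approaches `lr / (1 - momentum)`, here 0.1 instead of 0.01.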

In [None]:
@torch.no_grad()
def evaluate(model, loader):
    """Compute average loss and accuracy on a DataLoader."""
    model.eval()
    total_loss = 0.0
    correct = 0
    total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits, loss = model(x, y)
        total_loss += loss.item() * x.size(0)
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += x.size(0)
    model.train()
    return total_loss / total, correct / total

In [None]:
# BUG: Using SGD without momentum (slower) and no weight decay (overfitting)
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

# BUG: Intentionally keeping track of losses in a list that grows (memory leak pattern)
all_losses = []  # Memory inefficiency

print(f"Starting training for {num_epochs} epochs...")
print("WARNING: This is intentionally buggy and will run longer than necessary!")
training_start = time.time()
global_step = 0

for epoch in range(num_epochs):
    epoch_losses = []
    for step, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)

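        # Zero gradients before both paths so the profiled step's backward pass
        # doesn't add onto gradients left over from the previous step.
        optimizer.zero_grad(set_to_none=True)
        if observer.should_profile(global_step):
            # profile_step runs the forward and backward pass itself
            logits, loss = observer.profile_step(model, x, y)
        else:
            logits, loss = model(x, y)
            loss.backward()
        optimizer.step()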

        # BUG: Keeping a 0-d tensor per step instead of a float from .item(); all of them end up in all_losses
        epoch_losses.append(loss.detach().cpu())
        observer.step(global_step, loss, batch_size=x.size(0))
        global_step += 1
    
    # BUG: Accumulating all epoch losses (memory growth)
    all_losses.extend(epoch_losses)

    # Validation at end of each epoch
    val_loss, val_acc = evaluate(model, test_loader)
    step_report = observer.flush(val_metrics={
        "val_loss": val_loss,
        "val_acc": val_acc,
    })

    elapsed = time.time() - training_start
    train_loss = step_report['loss']['train_mean']
    
    # Detect overfitting pattern
    overfit_warning = ""
    if val_loss > train_loss * 1.5:
        overfit_warning = " [OVERFITTING!]"
    
    print(
        f"Epoch {epoch:2d}: "
        f"train_loss={train_loss:.4f}  "
        f"val_loss={val_loss:.4f}  val_acc={val_acc:.4f}  "
        f"({elapsed:.1f}s){overfit_warning}"
    )

training_time = time.time() - training_start
print(f"\nTraining completed in {training_time:.2f}s ({training_time/60:.2f} min)")
print(f"Accumulated loss tensors kept in memory: {len(all_losses):,}")

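## Evaluation

The test set also served as the per-epoch validation set, so this final score is not an independent held-out estimate.
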
In [None]:
test_loss, test_acc = evaluate(model, test_loader)
print(f"Final test loss:     {test_loss:.4f}")
print(f"Final test accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)")

print("\n" + "=" * 60)
print("EXPECTED ISSUES FOR DIAGNOSTICS TO DETECT:")
print("="*60)
print("1. Diminishing returns / Early stopping opportunity")
print("2. Over-parameterized layers (fc1 with 2048 neurons)")
print("3. Redundant layers (conv3/conv4, conv5/conv6, fc3/fc4)")
print("4. Vanishing gradients (deep network, bottleneck at conv7)")
print("5. Dead/near-zero weights (fc3, fc4 bad init)")
print("6. Loss plateau (after ~5 epochs)")
print("7. Overfitting (val_loss > train_loss)")
print("8. Memory growth (accumulated losses)")
print("9. CPU-only training (sustainability warning; only if no CUDA/MPS device was found)")
print("="*60)

## Observer Report

In [None]:
os.makedirs("observer_reports", exist_ok=True)  # export() may not create the directory itself
report = observer.export(os.path.join("observer_reports", f"{observer.run_id}.json"))

# ── Print summary ──
summary = report["summary"]
print("=" * 60)
print("OBSERVER SUMMARY")
print("=" * 60)
print(f"Total steps recorded:   {summary.get('total_steps', 0)}")
print(f"Total training time:    {summary.get('total_duration_s', 0):.2f}s")

if "loss_trend" in summary:
    lt = summary["loss_trend"]
    print(f"\nLoss trend:")
    print(f"  First interval:  {lt['first']:.4f}")
    print(f"  Last interval:   {lt['last']:.4f}")
    print(f"  Best:            {lt['best']:.4f}")
    print(f"  Improved:        {lt['improved']}")

if "avg_tokens_per_sec" in summary:
    print(f"\nAvg throughput:  {summary['avg_tokens_per_sec']:.0f} tokens/sec")

print("=" * 60)
print(f"Full report saved to: observer_reports/{observer.run_id}.json")
print(f"\nRun diagnostics API: POST /diagnostics/sessions/{{session_id}}/run")

observer.close()

## How to Test Diagnostics

After running this notebook:

1. **Upload the report** to the backend database
2. **Call the diagnostics API**:
   ```bash
   curl -X POST http://localhost:8000/diagnostics/sessions/{session_id}/run
   ```
3. **Expected diagnostics findings**:
   - `sustainability`: Early stopping opportunity, wasted compute
   - `sustainability`: Over-parameterized layers
   - `sustainability`: Redundant layers (correlated outputs)
   - `sustainability`: Vanishing gradients
   - `sustainability`: Dead neurons / near-zero weights
   - `loss`: Plateau detection
   - `loss`: Overfitting warning
   - `memory`: Memory growth
   - `system`: CPU-only training
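
The loss-based findings in this list (plateau, early stopping opportunity, overfitting) can be approximated with simple rules. The sketch below is a hypothetical stand-in, not the diagnostics backend's actual logic. The function names, `patience`, `min_delta`, and the example loss values are illustrative assumptions. The overfitting rule mirrors the `val_loss > train_loss * 1.5` check in the training loop.

```python
from typing import Optional, Sequence


def early_stop_epoch(val_losses: Sequence[float], patience: int = 3,
                     min_delta: float = 1e-3) -> Optional[int]:
    """Epoch at which patience-based early stopping would halt, or None.

    An epoch counts as an improvement only if it beats the best loss so far
    by more than `min_delta`.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


def is_overfitting(train_loss: float, val_loss: float, ratio: float = 1.5) -> bool:
    """Flag epochs where validation loss exceeds `ratio` times the training loss."""
    return val_loss > train_loss * ratio


# Illustrative (made-up) validation losses that flatten out after epoch 4.
val = [0.60, 0.30, 0.15, 0.09, 0.07, 0.0705, 0.0702, 0.0710, 0.0699]
print(early_stop_epoch(val))       # 7
print(is_overfitting(0.02, 0.07))  # True
```

Here early stopping would halt at epoch 7 after three epochs without a meaningful improvement. Every epoch after that point counts as wasted compute.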