# Module 20: Debugging PyTorch Models

**Finding and Fixing Issues in Neural Networks**

---

## Objectives

By the end of this notebook, you will:
- Master model inspection techniques
- Debug shape mismatches systematically
- Identify common PyTorch errors and their fixes
- Use hooks for debugging forward/backward passes
- Profile memory and performance
- Handle device (CPU/GPU) issues
- Debug data pipelines

---

In [29]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import sys

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.9.0+cpu
CUDA available: False


---

# Part 1: Model Inspection

---

## 1.1 Viewing Model Architecture

In [30]:
class SampleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(self.relu(self.bn1(self.conv1(x))))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)  # Flatten
        x = self.dropout(self.relu(self.fc1(x)))
        return self.fc2(x)

model = SampleNet()

# Method 1: Simple print
print("=" * 50)
print("MODEL ARCHITECTURE")
print("=" * 50)
print(model)

MODEL ARCHITECTURE
SampleNet(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=2048, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.5, inplace=False)
)


In [31]:
# Method 2: Named modules (shows hierarchy)
print("\n" + "=" * 50)
print("NAMED MODULES")
print("=" * 50)
for name, module in model.named_modules():
    if name:  # Skip root module
        print(f"{name}: {module.__class__.__name__}")


NAMED MODULES
conv1: Conv2d
bn1: BatchNorm2d
conv2: Conv2d
pool: MaxPool2d
fc1: Linear
fc2: Linear
relu: ReLU
dropout: Dropout


In [32]:
# Method 3: Named parameters with shapes
print("\n" + "=" * 50)
print("PARAMETERS")
print("=" * 50)
total_params = 0
trainable_params = 0

for name, param in model.named_parameters():
    print(f"{name:30} | Shape: {str(list(param.shape)):20} | Trainable: {param.requires_grad}")
    total_params += param.numel()
    if param.requires_grad:
        trainable_params += param.numel()

print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")


PARAMETERS
conv1.weight                   | Shape: [16, 3, 3, 3]        | Trainable: True
conv1.bias                     | Shape: [16]                 | Trainable: True
bn1.weight                     | Shape: [16]                 | Trainable: True
bn1.bias                       | Shape: [16]                 | Trainable: True
conv2.weight                   | Shape: [32, 16, 3, 3]       | Trainable: True
conv2.bias                     | Shape: [32]                 | Trainable: True
fc1.weight                     | Shape: [128, 2048]          | Trainable: True
fc1.bias                       | Shape: [128]                | Trainable: True
fc2.weight                     | Shape: [10, 128]            | Trainable: True
fc2.bias                       | Shape: [10]                 | Trainable: True

Total parameters: 268,682
Trainable parameters: 268,682
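
The totals above can be cross-checked by hand. The following is a minimal sketch, assuming the usual parameter-count formulas for convolution and linear layers with bias; the helper names `conv2d_params` and `linear_params` are illustrative and not part of PyTorch.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weights: out_ch * in_ch * k * k, plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + (out_ch if bias else 0)

def linear_params(in_features, out_features, bias=True):
    """Weights: in_features * out_features, plus one bias per output feature."""
    return in_features * out_features + (out_features if bias else 0)

# Layer sizes taken from SampleNet
counts = {
    "conv1": conv2d_params(3, 16, 3),
    "bn1": 2 * 16,                          # affine weight + bias
    "conv2": conv2d_params(16, 32, 3),
    "fc1": linear_params(32 * 8 * 8, 128),
    "fc2": linear_params(128, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:6} {n:>10,}")
print(f"Total parameters: {total:,}")
```

If the hand-computed total disagrees with `named_parameters()`, a layer is probably missing from the model or was created with unexpected sizes.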


In [33]:
# Method 4: Named buffers (non-trainable state like BatchNorm running stats)
print("\n" + "=" * 50)
print("BUFFERS (non-trainable state)")
print("=" * 50)
for name, buffer in model.named_buffers():
    print(f"{name:30} | Shape: {list(buffer.shape)}")


BUFFERS (non-trainable state)
bn1.running_mean               | Shape: [16]
bn1.running_var                | Shape: [16]
bn1.num_batches_tracked        | Shape: []


## 1.2 Accessing Specific Layers

In [34]:
# Access layer by name
print("Access conv1:")
print(model.conv1)
print(f"conv1 weight shape: {model.conv1.weight.shape}")
print(f"conv1 bias shape: {model.conv1.bias.shape}")

# Access using getattr (useful for dynamic access)
layer_name = 'fc1'
layer = getattr(model, layer_name)
print(f"\n{layer_name} layer: {layer}")

Access conv1:
Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv1 weight shape: torch.Size([16, 3, 3, 3])
conv1 bias shape: torch.Size([16])

fc1 layer: Linear(in_features=2048, out_features=128, bias=True)


In [35]:
# Get children (immediate sub-modules)
print("Children (immediate sub-modules):")
for i, child in enumerate(model.children()):
    print(f"{i}: {child.__class__.__name__}")

Children (immediate sub-modules):
0: Conv2d
1: BatchNorm2d
2: Conv2d
3: MaxPool2d
4: Linear
5: Linear
6: ReLU
7: Dropout


## 1.3 Model Summary (like Keras)

In [36]:
def model_summary(model, input_size, batch_size=1):
    """
    Generate a Keras-like model summary.
    """
    def register_hook(module):
        def hook(module, input, output):
            class_name = module.__class__.__name__

            # Get output shape
            if isinstance(output, (list, tuple)):
                output_shape = [list(o.shape) for o in output]
            else:
                output_shape = list(output.shape)

            # Count parameters
            params = sum(p.numel() for p in module.parameters(recurse=False))

            summary.append({
                'name': class_name,
                'output_shape': output_shape,
                'params': params
            })

        # Skip containers and the root model itself
        if not isinstance(module, (nn.Sequential, nn.ModuleList)) and module is not model:
            hooks.append(module.register_forward_hook(hook))

    summary = []
    hooks = []

    model.apply(register_hook)

    # Create dummy input and run forward pass
    x = torch.zeros(batch_size, *input_size)
    model(x)

    # Remove hooks
    for hook in hooks:
        hook.remove()

    # Print summary
    print("=" * 70)
    print(f"{'Layer':20} {'Output Shape':25} {'Params':>15}")
    print("=" * 70)

    total_params = 0
    for layer in summary:
        print(f"{layer['name']:20} {str(layer['output_shape']):25} {layer['params']:>15,}")
        total_params += layer['params']

    print("=" * 70)
    print(f"Total params: {total_params:,}")
    print("=" * 70)

# Example usage
model_summary(model, input_size=(3, 32, 32))

Layer                Output Shape                       Params
Conv2d               [1, 16, 32, 32]                       448
BatchNorm2d          [1, 16, 32, 32]                        32
ReLU                 [1, 16, 32, 32]                         0
MaxPool2d            [1, 16, 16, 16]                         0
Conv2d               [1, 32, 16, 16]                     4,640
ReLU                 [1, 32, 16, 16]                         0
MaxPool2d            [1, 32, 8, 8]                           0
Linear               [1, 128]                          262,272
ReLU                 [1, 128]                                0
Dropout              [1, 128]                                0
Linear               [1, 10]                             1,290
Total params: 268,682


---

# Part 2: Shape Debugging

---

**Shape mismatches are among the most common sources of PyTorch errors.**

## 2.1 Common Shape Errors

In [37]:
# Error 1: Matrix multiplication dimension mismatch
print("ERROR 1: Matrix dimension mismatch")
print("-" * 40)

try:
    A = torch.randn(3, 4)
    B = torch.randn(5, 6)  # Wrong! Should be (4, n)
    C = A @ B
except RuntimeError as e:
    print(f"Error: {e}")
    print("\nFIX: For A @ B, A.shape[1] must equal B.shape[0]")
    print(f"A.shape = {A.shape}, B.shape = {B.shape}")
    print(f"4 != 5, so this fails!")

ERROR 1: Matrix dimension mismatch
----------------------------------------
Error: mat1 and mat2 shapes cannot be multiplied (3x4 and 5x6)

FIX: For A @ B, A.shape[1] must equal B.shape[0]
A.shape = torch.Size([3, 4]), B.shape = torch.Size([5, 6])
4 != 5, so this fails!


In [38]:
# Error 2: Conv2d expects 4D input
print("\nERROR 2: Wrong input dimensions for Conv2d")
print("-" * 40)

conv = nn.Conv2d(3, 16, 3)

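try:
    # Note: recent PyTorch versions treat a 3D tensor as a single unbatched
    # (C, H, W) image, so this call may succeed instead of raising.
    x = torch.randn(3, 32, 32)  # Missing batch dimension!
    out = conv(x)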
except RuntimeError as e:
    print(f"Error: {e}")
    print("\nFIX: Conv2d expects (batch, channels, height, width)")
    print(f"Your input shape: {x.shape}")
    print(f"Should be: (batch_size, 3, 32, 32)")

    # Correct way
    x_correct = x.unsqueeze(0)  # Add batch dimension
    print(f"\nFixed shape: {x_correct.shape}")
    out = conv(x_correct)
    print(f"Output shape: {out.shape}")


ERROR 2: Wrong input dimensions for Conv2d
----------------------------------------
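
No error message appears in the output above. In recent PyTorch releases `nn.Conv2d` also accepts a 3D tensor and treats it as one unbatched `(C, H, W)` image, so the `try` block completes without raising. The sketch below illustrates this, assuming a recent PyTorch version such as the 2.9.0 build used in this notebook; a 2D tensor is used to trigger an actual dimension error.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3)

# 3D input is interpreted as one unbatched image (C, H, W)
unbatched_out = conv(torch.randn(3, 32, 32))
print(f"Unbatched output shape: {tuple(unbatched_out.shape)}")

# 2D input has no channel dimension, so Conv2d rejects it
raised = False
try:
    conv(torch.randn(32, 32))
except RuntimeError as e:
    raised = True
    print(f"Error: {e}")
```

Because the unbatched call succeeds, a missing batch dimension can go unnoticed until a later layer such as `x.view(x.size(0), -1)` produces an unexpected shape, so it is worth printing `x.shape` before the first layer.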


In [39]:
# Error 3: Flatten dimension mismatch with Linear
print("\nERROR 3: Wrong flatten size for Linear layer")
print("-" * 40)

class BrokenNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.fc = nn.Linear(1000, 10)  # Wrong size!

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)  # Flatten
        return self.fc(x)

try:
    broken = BrokenNet()
    x = torch.randn(1, 3, 32, 32)
    out = broken(x)
except RuntimeError as e:
    print(f"Error: {e}")

    # Debug: find the actual flatten size
    conv_out = broken.conv(x)
    flatten_size = conv_out.view(1, -1).size(1)
    print(f"\nDEBUG INFO:")
    print(f"Conv output shape: {conv_out.shape}")
    print(f"Flattened size: {flatten_size}")
    print(f"\nFIX: Change nn.Linear(1000, 10) to nn.Linear({flatten_size}, 10)")


ERROR 3: Wrong flatten size for Linear layer
----------------------------------------
Error: mat1 and mat2 shapes cannot be multiplied (1x14400 and 1000x10)

DEBUG INFO:
Conv output shape: torch.Size([1, 16, 30, 30])
Flattened size: 14400

FIX: Change nn.Linear(1000, 10) to nn.Linear(14400, 10)
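
The flattened size can also be predicted before running the model. This is a minimal sketch of the output-size formula documented for `nn.Conv2d` and `nn.MaxPool2d`, applied along one spatial dimension; `conv2d_out_size` is a hypothetical helper name.

```python
def conv2d_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# BrokenNet: Conv2d(3, 16, 3) on a 32x32 input, no padding
h = w = conv2d_out_size(32, kernel=3)
flatten_size = 16 * h * w
print(f"Conv output: 16 x {h} x {w}, flattened size: {flatten_size}")

# SampleNet: conv (padding=1) keeps 32, each 2x2 pool halves it
after_pool1 = conv2d_out_size(conv2d_out_size(32, 3, padding=1), 2, stride=2)
after_pool2 = conv2d_out_size(conv2d_out_size(after_pool1, 3, padding=1), 2, stride=2)
print(f"SampleNet spatial size before fc1: {after_pool2}")
```

The second calculation explains why `SampleNet` uses `nn.Linear(32 * 8 * 8, 128)`.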


## 2.2 Shape Debugging Tools

In [40]:
def shape_debugger(model, input_tensor):
    """
    Track shape changes through each layer.
    """
    shapes = []

    def hook(module, input, output):
        if isinstance(output, torch.Tensor):
            shapes.append({
                'layer': module.__class__.__name__,
                'input_shape': list(input[0].shape) if isinstance(input, tuple) else list(input.shape),
                'output_shape': list(output.shape)
            })

    hooks = []
    for layer in model.modules():
        if layer != model:
            hooks.append(layer.register_forward_hook(hook))

    try:
        with torch.no_grad():
            output = model(input_tensor)
    except Exception as e:
        print(f"Forward pass failed: {e}")
        print("\nShapes before failure:")

    for hook in hooks:
        hook.remove()

    print(f"{'Layer':20} {'Input Shape':25} {'Output Shape':25}")
    print("=" * 70)
    for s in shapes:
        print(f"{s['layer']:20} {str(s['input_shape']):25} {str(s['output_shape']):25}")

    return shapes

# Use it
model = SampleNet()
x = torch.randn(2, 3, 32, 32)
shapes = shape_debugger(model, x)

Layer                Input Shape               Output Shape             
Conv2d               [2, 3, 32, 32]            [2, 16, 32, 32]          
BatchNorm2d          [2, 16, 32, 32]           [2, 16, 32, 32]          
ReLU                 [2, 16, 32, 32]           [2, 16, 32, 32]          
MaxPool2d            [2, 16, 32, 32]           [2, 16, 16, 16]          
Conv2d               [2, 16, 16, 16]           [2, 32, 16, 16]          
ReLU                 [2, 32, 16, 16]           [2, 32, 16, 16]          
MaxPool2d            [2, 32, 16, 16]           [2, 32, 8, 8]            
Linear               [2, 2048]                 [2, 128]                 
ReLU                 [2, 128]                  [2, 128]                 
Dropout              [2, 128]                  [2, 128]                 
Linear               [2, 128]                  [2, 10]                  


In [41]:
# Quick shape check decorator
def debug_shapes(func):
    """
    Decorator to print input/output shapes of forward method.
    """
    def wrapper(self, x, *args, **kwargs):
        print(f"Input shape: {x.shape}")
        output = func(self, x, *args, **kwargs)
        print(f"Output shape: {output.shape}")
        return output
    return wrapper

# Usage: Add @debug_shapes above forward method during debugging
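
As a usage sketch, the decorator can be applied to a small model's `forward` method. `TinyNet` is a hypothetical model used only for illustration, and `functools.wraps` is an addition here (not in the cell above) so that the wrapped method keeps its name and docstring.

```python
import functools
import torch
import torch.nn as nn

def debug_shapes(func):
    """Print input/output shapes of a forward method."""
    @functools.wraps(func)
    def wrapper(self, x, *args, **kwargs):
        print(f"Input shape: {x.shape}")
        output = func(self, x, *args, **kwargs)
        print(f"Output shape: {output.shape}")
        return output
    return wrapper

class TinyNet(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 5)

    @debug_shapes
    def forward(self, x):
        return self.fc(x)

out = TinyNet()(torch.randn(2, 10))
```

Remove the decorator once the shapes are confirmed, since it prints on every forward pass.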

---

# Part 3: Common PyTorch Errors & Fixes

---

In [42]:
# Error 4: Device mismatch
print("ERROR 4: Device mismatch")
print("-" * 40)

if torch.cuda.is_available():
    try:
        model_gpu = nn.Linear(10, 5).cuda()
        x_cpu = torch.randn(2, 10)  # On CPU!
        out = model_gpu(x_cpu)
    except RuntimeError as e:
        print(f"Error: {e}")
        print("\nFIX: Ensure model and data are on the same device")
        print("x = x.to(device)  OR  x = x.cuda()")
else:
    print("CUDA not available - skipping this example")
    print("\nBut the fix is: always use model.to(device) and x.to(device)")

ERROR 4: Device mismatch
----------------------------------------
CUDA not available - skipping this example

But the fix is: always use model.to(device) and x.to(device)
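
A common way to apply this fix is to pick the device once and move both the model and every batch to it. The following sketch runs on CPU-only machines as well; the variable names are illustrative.

```python
import torch
import torch.nn as nn

# Choose the device once and reuse it everywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 5).to(device)   # moves parameters in place and returns the module
x = torch.randn(2, 10).to(device)     # returns a new tensor, so reassign the result

out = model(x)
print(f"model device: {next(model.parameters()).device}, input device: {x.device}")
```

Note the asymmetry: `model.to(device)` modifies the module, while `x.to(device)` returns a copy, so writing `x.to(device)` without assignment leaves `x` where it was.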


In [43]:
# Error 5: In-place operation on leaf variable
print("\nERROR 5: In-place operation on gradient-required tensor")
print("-" * 40)

try:
    x = torch.randn(3, requires_grad=True)
    x += 1  # In-place operation!
except RuntimeError as e:
    print(f"Error: {e}")
    print("\nFIX: Use out-of-place operations")
    print("Instead of: x += 1")
    print("Use: x = x + 1")


ERROR 5: In-place operation on gradient-required tensor
----------------------------------------
Error: a leaf Variable that requires grad is being used in an in-place operation.

FIX: Use out-of-place operations
Instead of: x += 1
Use: x = x + 1
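
Note that `x = x + 1` avoids the error by rebinding `x` to a new, non-leaf tensor. When the goal is to update a parameter in place (for example in a hand-written gradient step), a frequently used alternative is to perform the in-place operation inside `torch.no_grad()`. A minimal sketch, with an arbitrary learning rate of 0.1:

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()

# In-place updates on a leaf are allowed while autograd tracking is disabled
with torch.no_grad():
    w -= 0.1 * w.grad

# w is still the same leaf tensor and still tracks gradients
print(f"is_leaf: {w.is_leaf}, requires_grad: {w.requires_grad}")

# By contrast, the out-of-place form produces a non-leaf tensor
w_new = w + 1
print(f"w_new.is_leaf: {w_new.is_leaf}")
```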


In [44]:
# Error 6: Forgetting to call optimizer.zero_grad()
print("\nERROR 6: Gradient accumulation (forgetting zero_grad)")
print("-" * 40)

model = nn.Linear(5, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(3, 5)
y = torch.randn(3, 2)

# Wrong: no zero_grad()
for i in range(3):
    loss = criterion(model(x), y)
    loss.backward()
    if i == 0:
        first_grad = model.weight.grad.clone()
    print(f"Iteration {i}: grad mean = {model.weight.grad.mean():.4f}")

print("\nNotice: gradients are ACCUMULATING, not being reset!")
print("FIX: Add optimizer.zero_grad() at the start of each iteration")


ERROR 6: Gradient accumulation (forgetting zero_grad)
----------------------------------------
Iteration 0: grad mean = -0.3570
Iteration 1: grad mean = -0.7141
Iteration 2: grad mean = -1.0711

Notice: gradients are ACCUMULATING, not being reset!
FIX: Add optimizer.zero_grad() at the start of each iteration
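
For comparison, here is a sketch of the corrected loop. `optimizer.step()` is deliberately left out so the weights stay fixed; with the gradients reset each iteration, the printed value should then be the same every time.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(5, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x, y = torch.randn(3, 5), torch.randn(3, 2)

grad_means = []
for i in range(3):
    optimizer.zero_grad()              # reset gradients from the previous iteration
    loss = criterion(model(x), y)
    loss.backward()
    grad_means.append(model.weight.grad.mean().item())
    print(f"Iteration {i}: grad mean = {grad_means[-1]:.4f}")
```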


In [45]:
# Error 7: Model not in eval mode during inference
print("\nERROR 7: Forgetting model.eval() during inference")
print("-" * 40)

model = nn.Sequential(
    nn.Linear(10, 20),
    nn.Dropout(0.5),  # Behaves differently in train vs eval!
    nn.Linear(20, 5)
)

x = torch.randn(1, 10)

# In training mode
model.train()
outputs_train = [model(x).sum().item() for _ in range(5)]
print(f"Training mode outputs: {outputs_train}")
print("Notice: outputs vary due to dropout!")

# In eval mode
model.eval()
outputs_eval = [model(x).sum().item() for _ in range(5)]
print(f"\nEval mode outputs: {outputs_eval}")
print("Notice: outputs are consistent!")

print("\nFIX: Always use model.eval() before inference and model.train() for training")


ERROR 7: Forgetting model.eval() during inference
----------------------------------------
Training mode outputs: [-0.6422128081321716, -2.0309603214263916, -0.7500288486480713, -1.548919439315796, -0.830426812171936]
Notice: outputs vary due to dropout!

Eval mode outputs: [-1.3059141635894775, -1.3059141635894775, -1.3059141635894775, -1.3059141635894775, -1.3059141635894775]
Notice: outputs are consistent!

FIX: Always use model.eval() before inference and model.train() for training


In [46]:
# Error 8: Softmax + CrossEntropyLoss
print("\nERROR 8: Double softmax (common mistake!)")
print("-" * 40)

# WRONG
class WrongModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 3)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.fc(x)
        return self.softmax(x)  # DON'T DO THIS with CrossEntropyLoss!

print("CrossEntropyLoss ALREADY includes Softmax internally!")
print("If you apply Softmax in your model, you're doing it twice.")
print("\nFIX: Return raw logits from the model")
print("Apply Softmax only during inference for probabilities")


ERROR 8: Double softmax (common mistake!)
----------------------------------------
CrossEntropyLoss ALREADY includes Softmax internally!
If you apply Softmax in your model, you're doing it twice.

FIX: Return raw logits from the model
Apply Softmax only during inference for probabilities
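
More precisely, `nn.CrossEntropyLoss` combines a log-softmax with the negative log-likelihood loss. The sketch below checks that equivalence numerically and shows that feeding already-softmaxed outputs gives a different loss value; the logits and targets are arbitrary illustrative data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])
criterion = nn.CrossEntropyLoss()

loss_logits = criterion(logits, targets)                         # correct: raw logits
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)  # same computation, spelled out
loss_double = criterion(F.softmax(logits, dim=1), targets)       # softmax applied twice

print(f"logits: {loss_logits.item():.4f}")
print(f"manual: {loss_manual.item():.4f}")
print(f"double: {loss_double.item():.4f}")
```

The double-softmax version does not raise an error, which is what makes this mistake easy to miss: training still runs, but the loss is computed from a compressed range of values.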


In [47]:
# Error 9: Broadcasting confusion
print("\nERROR 9: Unexpected broadcasting")
print("-" * 40)

a = torch.randn(3, 4)
b = torch.randn(4)  # Will broadcast!

print(f"a.shape: {a.shape}")
print(f"b.shape: {b.shape}")
print(f"(a + b).shape: {(a + b).shape}")

# This can cause silent bugs!
c = torch.randn(3, 1)
d = torch.randn(1, 4)
print(f"\nc.shape: {c.shape}")
print(f"d.shape: {d.shape}")
print(f"(c + d).shape: {(c + d).shape}  # Broadcasted to 3x4!")

print("\nFIX: Be explicit about shapes. Use .unsqueeze() or .expand() to control broadcasting.")


ERROR 9: Unexpected broadcasting
----------------------------------------
a.shape: torch.Size([3, 4])
b.shape: torch.Size([4])
(a + b).shape: torch.Size([3, 4])

c.shape: torch.Size([3, 1])
d.shape: torch.Size([1, 4])
(c + d).shape: torch.Size([3, 4])  # Broadcasted to 3x4!

FIX: Be explicit about shapes. Use .unsqueeze() or .expand() to control broadcasting.
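
A typical place where this bites is a regression model whose output has shape `(N, 1)` while the targets have shape `(N,)`. A minimal sketch with illustrative tensors:

```python
import torch

pred = torch.randn(4, 1)     # model output with a trailing singleton dimension
target = torch.randn(4)      # targets without it

diff_bad = pred - target              # silently broadcasts to (4, 4)
diff_ok = pred.squeeze(1) - target    # shapes match: (4,)

print(f"bad diff shape: {tuple(diff_bad.shape)}")
print(f"ok diff shape:  {tuple(diff_ok.shape)}")

# A cheap guard that fails loudly instead of broadcasting
assert pred.squeeze(1).shape == target.shape
```

An explicit shape assertion before the loss is a low-cost way to catch this class of bug.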


---

# Part 4: Using Hooks for Debugging

---

In [48]:
# Forward hooks: debug activations
class ActivationDebugger:
    def __init__(self, model):
        self.activations = {}
        self.hooks = []

        for name, module in model.named_modules():
            if name:  # Skip root
                hook = module.register_forward_hook(self._get_hook(name))
                self.hooks.append(hook)

    def _get_hook(self, name):
        def hook(module, input, output):
            if isinstance(output, torch.Tensor):
                self.activations[name] = {
                    'shape': output.shape,
                    'mean': output.mean().item(),
                    'std': output.std().item(),
                    'min': output.min().item(),
                    'max': output.max().item(),
                    'has_nan': torch.isnan(output).any().item(),
                    'has_inf': torch.isinf(output).any().item()
                }
        return hook

    def remove_hooks(self):
        for hook in self.hooks:
            hook.remove()

    def print_report(self):
        print(f"{'Layer':20} {'Shape':20} {'Mean':>10} {'Std':>10} {'NaN?':>6} {'Inf?':>6}")
        print("=" * 80)
        for name, stats in self.activations.items():
            print(f"{name:20} {str(list(stats['shape'])):20} {stats['mean']:>10.4f} {stats['std']:>10.4f} {str(stats['has_nan']):>6} {str(stats['has_inf']):>6}")

# Usage
model = SampleNet()
debugger = ActivationDebugger(model)

x = torch.randn(2, 3, 32, 32)
with torch.no_grad():
    out = model(x)

debugger.print_report()
debugger.remove_hooks()

Layer                Shape                      Mean        Std   NaN?   Inf?
conv1                [2, 16, 32, 32]         -0.0246     0.5847  False  False
bn1                  [2, 16, 32, 32]          0.0000     1.0000  False  False
relu                 [2, 128]                 0.1400     0.1968  False  False
pool                 [2, 32, 8, 8]            0.3886     0.4691  False  False
conv2                [2, 32, 16, 16]         -0.1615     0.7085  False  False
fc1                  [2, 128]                 0.0111     0.3366  False  False
dropout              [2, 128]                 0.1287     0.2877  False  False
fc2                  [2, 10]                  0.0044     0.1541  False  False
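
In the report above, `relu` and `pool` each appear only once even though `SampleNet.forward` calls them several times. The dictionary is keyed by module name, so each call overwrites the previous entry and the row shows only the last call (which is why `relu` reports shape `[2, 128]`). One possible way to keep every call is to append to a list per name. The sketch below assumes that approach; `ReuseNet` and `calls` are hypothetical names for illustration.

```python
from collections import defaultdict
import torch
import torch.nn as nn

class ReuseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, 2)
        self.relu = nn.ReLU()      # one module, called twice in forward

    def forward(self, x):
        return self.relu(self.fc2(self.relu(self.fc1(x))))

model = ReuseNet()
calls = defaultdict(list)

def make_hook(name):
    def hook(module, inputs, output):
        calls[name].append(tuple(output.shape))  # append instead of overwrite
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules() if n]
with torch.no_grad():
    model(torch.randn(3, 8))
for h in handles:
    h.remove()

print(dict(calls))
```

Another option is to give each activation its own module instance (for example `self.relu1`, `self.relu2`), which makes every call addressable by name.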


In [49]:
# Backward hooks: debug gradients
class GradientDebugger:
    def __init__(self, model):
        self.gradients = {}
        self.hooks = []

        for name, module in model.named_modules():
            if name:
                hook = module.register_full_backward_hook(self._get_hook(name))
                self.hooks.append(hook)

    def _get_hook(self, name):
        def hook(module, grad_input, grad_output):
            if grad_output[0] is not None:
                grad = grad_output[0]
                self.gradients[name] = {
                    'shape': grad.shape,
                    'mean': grad.abs().mean().item(),
                    'max': grad.abs().max().item(),
                    'has_nan': torch.isnan(grad).any().item(),
                    'has_inf': torch.isinf(grad).any().item()
                }
        return hook

    def remove_hooks(self):
        for hook in self.hooks:
            hook.remove()

    def print_report(self):
        print(f"{'Layer':20} {'Grad Mean':>12} {'Grad Max':>12} {'NaN?':>6} {'Inf?':>6}")
        print("=" * 60)
        for name, stats in self.gradients.items():
            print(f"{name:20} {stats['mean']:>12.6f} {stats['max']:>12.6f} {str(stats['has_nan']):>6} {str(stats['has_inf']):>6}")

# Usage
model = SampleNet()
grad_debugger = GradientDebugger(model)

x = torch.randn(2, 3, 32, 32)
y = torch.randint(0, 10, (2,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()

print("\nGradient Flow Analysis:")
grad_debugger.print_report()
grad_debugger.remove_hooks()


Gradient Flow Analysis:
Layer                   Grad Mean     Grad Max   NaN?   Inf?
fc2                      0.089077     0.447347  False  False
dropout                  0.019701     0.047303  False  False
relu                     0.000220     0.004372  False  False
fc1                      0.008683     0.081869  False  False
pool                     0.000878     0.004372  False  False
conv2                    0.000504     0.011987  False  False
bn1                      0.000205     0.004372  False  False
conv1                    0.000385     0.008243  False  False



---

# Part 5: Memory Debugging

---

In [50]:
def memory_stats():
    """Print current GPU memory usage."""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**2
        reserved = torch.cuda.memory_reserved() / 1024**2
        print(f"GPU Memory: Allocated={allocated:.1f}MB, Reserved={reserved:.1f}MB")
    else:
        print("CUDA not available")

def count_tensors():
    """Count all tensors in memory."""
    import gc
    tensors = []
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                tensors.append((type(obj), obj.size(), obj.device))
        except Exception:  # some objects raise on inspection; skip them
            pass
    return len(tensors)

print(f"Tensors in memory: {count_tensors()}")
memory_stats()

Tensors in memory: 43
CUDA not available


In [51]:
# Common memory leaks
print("Common Memory Leak Patterns:\n")

print("1. Storing tensors in a list during training loop:")
print("   BAD: losses.append(loss)")
print("   GOOD: losses.append(loss.item())\n")

print("2. Not using torch.no_grad() during inference:")
print("   BAD: output = model(x)")
print("   GOOD: with torch.no_grad(): output = model(x)\n")

print("3. Not detaching tensors before numpy conversion:")
print("   BAD: x.numpy()")
print("   GOOD: x.detach().cpu().numpy()\n")

print("4. Keeping references to intermediate activations:")
print("   Use del to explicitly remove")

Common Memory Leak Patterns:

1. Storing tensors in a list during training loop:
   BAD: losses.append(loss)
   GOOD: losses.append(loss.item())

2. Not using torch.no_grad() during inference:
   BAD: output = model(x)
   GOOD: with torch.no_grad(): output = model(x)

3. Not detaching tensors before numpy conversion:
   BAD: x.numpy()
   GOOD: x.detach().cpu().numpy()

4. Keeping references to intermediate activations:
   Use del to explicitly remove
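
Pattern 1 can be seen directly by inspecting what is being stored. In this sketch (illustrative model and data), the loss tensor still carries a `grad_fn`, which keeps its computation graph alive, while `.item()` returns a plain Python float.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
criterion = nn.MSELoss()
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = criterion(model(x), y)

kept_tensor = loss          # still attached to the autograd graph
kept_float = loss.item()    # plain Python float, no graph reference

print(f"tensor has grad_fn: {kept_tensor.grad_fn is not None}")
print(f"item() type: {type(kept_float).__name__}")
```

If the tensor itself is needed later (for example for plotting on the same device), `loss.detach()` also drops the graph reference while keeping a tensor.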


In [52]:
# Memory profiling per layer
def profile_memory(model, input_shape, device='cpu'):
    """
    Profile memory usage per layer.
    """
    model = model.to(device)
    x = torch.randn(*input_shape, device=device)

    memory_usage = []

    def hook(module, input, output):
        # Estimate memory of output tensor
        if isinstance(output, torch.Tensor):
            # element_size() gives bytes per element (4 for float32)
            mem = output.numel() * output.element_size() / 1024**2  # MB
            memory_usage.append((module.__class__.__name__, mem))

    hooks = []
    for module in model.modules():
        if module != model:
            hooks.append(module.register_forward_hook(hook))

    with torch.no_grad():
        _ = model(x)

    for h in hooks:
        h.remove()

    print(f"{'Layer':25} {'Activation Memory (MB)':>25}")
    print("=" * 55)
    total = 0
    for name, mem in memory_usage:
        print(f"{name:25} {mem:>25.4f}")
        total += mem
    print("=" * 55)
    print(f"{'TOTAL':25} {total:>25.4f}")

model = SampleNet()
profile_memory(model, (16, 3, 32, 32))  # Batch of 16

Layer                        Activation Memory (MB)
Conv2d                                       1.0000
BatchNorm2d                                  1.0000
ReLU                                         1.0000
MaxPool2d                                    0.2500
Conv2d                                       0.5000
ReLU                                         0.5000
MaxPool2d                                    0.1250
Linear                                       0.0078
ReLU                                         0.0078
Dropout                                      0.0078
Linear                                       0.0006
TOTAL                                        4.3990
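
The per-layer figures follow from the output shape alone: number of elements times bytes per element. A small sketch of that arithmetic, assuming float32 (4 bytes) by default; `activation_mb` is a hypothetical helper name.

```python
import math

def activation_mb(shape, bytes_per_element=4):
    """Memory of one activation tensor in MB (float32 by default)."""
    return math.prod(shape) * bytes_per_element / 1024**2

first_conv = activation_mb((16, 16, 32, 32))    # first Conv2d output, batch of 16
second_pool = activation_mb((16, 32, 8, 8))     # second MaxPool2d output
first_conv_fp16 = activation_mb((16, 16, 32, 32), bytes_per_element=2)

print(f"{first_conv:.4f} MB, {second_pool:.4f} MB, fp16: {first_conv_fp16:.4f} MB")
```

These are estimates of output tensors only; actual peak usage during training is typically higher because autograd may retain additional buffers for the backward pass.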


---

# Part 6: Data Pipeline Debugging

---

In [53]:
# Debugging DataLoader
from torch.utils.data import Dataset

class DebuggableDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Create sample data
X = torch.randn(100, 3, 32, 32)
y = torch.randint(0, 10, (100,))

dataset = DebuggableDataset(X, y)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Debug first batch
print("DataLoader Debug Info:")
print("=" * 50)
print(f"Dataset size: {len(dataset)}")
print(f"Number of batches: {len(loader)}")
print(f"Batch size: {loader.batch_size}")

# Inspect first batch
batch_x, batch_y = next(iter(loader))
print(f"\nFirst batch:")
print(f"  X shape: {batch_x.shape}")
print(f"  y shape: {batch_y.shape}")
print(f"  X dtype: {batch_x.dtype}")
print(f"  y dtype: {batch_y.dtype}")
print(f"  X range: [{batch_x.min():.2f}, {batch_x.max():.2f}]")
print(f"  y values: {batch_y.unique().tolist()}")

DataLoader Debug Info:
Dataset size: 100
Number of batches: 7
Batch size: 16

First batch:
  X shape: torch.Size([16, 3, 32, 32])
  y shape: torch.Size([16])
  X dtype: torch.float32
  y dtype: torch.int64
  X range: [-4.26, 4.21]
  y values: [0, 1, 2, 3, 4, 5, 6, 8, 9]
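
The 7 batches reported above come from `ceil(100 / 16)`, which means the final batch is smaller than the rest. A short sketch, using random stand-in data, of how to check the last batch size and how `drop_last=True` changes the count:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))

loader = DataLoader(dataset, batch_size=16)
loader_drop = DataLoader(dataset, batch_size=16, drop_last=True)

batch_sizes = [len(xb) for xb, _ in loader]
print(f"batches: {len(loader)}, with drop_last: {len(loader_drop)}")
print(f"last batch size: {batch_sizes[-1]}")
```

A short final batch matters for code that hard-codes the batch size (for example in a `view` call) and for BatchNorm layers, which can behave poorly on very small batches.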


In [54]:
# Common data pipeline issues
print("Common Data Pipeline Issues:\n")

print("1. Data type mismatch:")
print("   FIX: Ensure tensors are float32 for inputs, long for class labels")
print("   x = x.float(), y = y.long()\n")

print("2. Data not normalized:")
print("   FIX: x = (x - mean) / std")
print("   Or use transforms.Normalize()\n")

print("3. Wrong label format:")
print("   CrossEntropyLoss expects class indices (0, 1, 2, ...), not one-hot")
print("   BCELoss expects float targets\n")

print("4. Image channels in wrong order:")
print("   PyTorch expects (C, H, W), not (H, W, C)")
print("   FIX: x = x.permute(2, 0, 1) or np.transpose(x, (2, 0, 1))")

Common Data Pipeline Issues:

1. Data type mismatch:
   FIX: Ensure tensors are float32 for inputs, long for class labels
   x = x.float(), y = y.long()

2. Data not normalized:
   FIX: x = (x - mean) / std
   Or use transforms.Normalize()

3. Wrong label format:
   CrossEntropyLoss expects class indices (0, 1, 2, ...), not one-hot
   BCELoss expects float targets

4. Image channels in wrong order:
   PyTorch expects (C, H, W), not (H, W, C)
   FIX: x = x.permute(2, 0, 1) or np.transpose(x, (2, 0, 1))
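
Issues 1 and 4 often appear together when images arrive as NumPy arrays. The following sketch assumes a `uint8` image in `(H, W, C)` layout and integer labels, both randomly generated here as stand-ins for real data.

```python
import numpy as np
import torch

# Stand-in for an image loaded from disk: (H, W, C), uint8 in [0, 255]
image_hwc = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
labels = np.array([3, 1, 7])

# Reorder to (C, H, W), convert to float32, and scale to [0, 1]
x = torch.from_numpy(image_hwc).permute(2, 0, 1).float() / 255.0
y = torch.from_numpy(labels).long()   # class indices for CrossEntropyLoss

print(f"x: shape={tuple(x.shape)}, dtype={x.dtype}")
print(f"y: shape={tuple(y.shape)}, dtype={y.dtype}")
```

For a batch of images in `(N, H, W, C)` layout, the corresponding call would be `permute(0, 3, 1, 2)`.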


---

# Part 7: Training Loop Debugging

---

In [55]:
def debug_training_step(model, criterion, optimizer, x, y):
    """
    A single training step with full debugging.
    """
    print("=" * 60)
    print("TRAINING STEP DEBUG")
    print("=" * 60)

    # 1. Input check
    print(f"\n1. INPUT CHECK")
    print(f"   x.shape: {x.shape}, dtype: {x.dtype}, device: {x.device}")
    print(f"   y.shape: {y.shape}, dtype: {y.dtype}, device: {y.device}")
    print(f"   x range: [{x.min():.2f}, {x.max():.2f}]")

    # 2. Forward pass
    print(f"\n2. FORWARD PASS")
    output = model(x)
    print(f"   output.shape: {output.shape}")
    print(f"   output range: [{output.min():.4f}, {output.max():.4f}]")
    print(f"   output has NaN: {torch.isnan(output).any().item()}")

    # 3. Loss computation
    print(f"\n3. LOSS COMPUTATION")
    loss = criterion(output, y)
    print(f"   loss value: {loss.item():.6f}")
    print(f"   loss is NaN: {torch.isnan(loss).item()}")
    print(f"   loss is Inf: {torch.isinf(loss).item()}")

    # 4. Backward pass
    print(f"\n4. BACKWARD PASS")
    optimizer.zero_grad()
    loss.backward()

    # Check gradients
    grad_norms = []
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm().item()
            grad_norms.append((name, grad_norm))
            if torch.isnan(param.grad).any():
                print(f"   WARNING: NaN gradient in {name}")

    # Print largest gradients
    grad_norms.sort(key=lambda x: x[1], reverse=True)
    print(f"   Top 3 gradient norms:")
    for name, norm in grad_norms[:3]:
        print(f"      {name}: {norm:.6f}")

    # 5. Optimizer step
    print(f"\n5. OPTIMIZER STEP")
    optimizer.step()
    print(f"   Step completed successfully")

    return loss.item()

# Test it
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 5)
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

x = torch.randn(4, 10)
y = torch.randint(0, 5, (4,))

loss = debug_training_step(model, criterion, optimizer, x, y)

TRAINING STEP DEBUG

1. INPUT CHECK
   x.shape: torch.Size([4, 10]), dtype: torch.float32, device: cpu
   y.shape: torch.Size([4]), dtype: torch.int64, device: cpu
   x range: [-1.76, 2.57]

2. FORWARD PASS
   output.shape: torch.Size([4, 5])
   output range: [-0.2530, 0.4381]
   output has NaN: False

3. LOSS COMPUTATION
   loss value: 1.590008
   loss is NaN: False
   loss is Inf: False

4. BACKWARD PASS
   Top 3 gradient norms:
      2.weight: 1.116569
      0.weight: 0.578876
      2.bias: 0.428131

5. OPTIMIZER STEP
   Step completed successfully


---

# Part 8: Debugging Checklist

---

In [56]:
def run_debug_checklist(model, x, y, criterion):
    """
    Run a comprehensive debug checklist.
    """
    issues = []

    print("🔍 RUNNING DEBUG CHECKLIST...\n")

    # 1. Model checks
    print("1. Model Checks:")

    # Check if model has parameters
    num_params = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"   ✓ Total params: {num_params:,}, Trainable: {trainable:,}")

    if trainable == 0:
        issues.append("⚠️ No trainable parameters!")

    # 2. Data checks
    print("\n2. Data Checks:")
    print(f"   ✓ Input shape: {x.shape}")
    print(f"   ✓ Label shape: {y.shape}")

    if torch.isnan(x).any():
        issues.append("⚠️ NaN in input data!")
    else:
        print("   ✓ No NaN in input")

    # 3. Forward pass
    print("\n3. Forward Pass:")
    try:
        with torch.no_grad():
            out = model(x)
        print(f"   ✓ Output shape: {out.shape}")

        if torch.isnan(out).any():
            issues.append("⚠️ NaN in model output!")
        else:
            print("   ✓ No NaN in output")
    except Exception as e:
        issues.append(f"⚠️ Forward pass failed: {e}")

    # 4. Loss computation
    print("\n4. Loss Check:")
    loss = None  # stays None if the loss cannot be computed
    try:
        out = model(x)
        loss = criterion(out, y)
        print(f"   ✓ Loss value: {loss.item():.6f}")

        if torch.isnan(loss):
            issues.append("⚠️ Loss is NaN!")
        if torch.isinf(loss):
            issues.append("⚠️ Loss is Inf!")
    except Exception as e:
        issues.append(f"⚠️ Loss computation failed: {e}")

    # 5. Backward pass
    print("\n5. Backward Pass:")
    try:
        if loss is None:
            raise RuntimeError("no loss available from step 4")
        model.zero_grad()  # clear gradients left over from earlier calls
        loss.backward()
        print("   ✓ Backward pass successful")

        # Check for vanishing gradients (counted per parameter tensor)
        small_grads = 0
        for name, param in model.named_parameters():
            if param.grad is not None:
                if param.grad.abs().mean() < 1e-7:
                    small_grads += 1

        if small_grads > 0:
            issues.append(f"⚠️ {small_grads} parameter tensors have very small gradients (vanishing?)")
        else:
            print("   ✓ All gradients look healthy")
    except Exception as e:
        issues.append(f"⚠️ Backward pass failed: {e}")

    # Summary
    print("\n" + "=" * 50)
    if issues:
        print("❌ ISSUES FOUND:")
        for issue in issues:
            print(f"   {issue}")
    else:
        print("✅ ALL CHECKS PASSED!")

    return issues

# Test it
model = nn.Linear(10, 5)
x = torch.randn(4, 10)
y = torch.randint(0, 5, (4,))
criterion = nn.CrossEntropyLoss()

issues = run_debug_checklist(model, x, y, criterion)

🔍 RUNNING DEBUG CHECKLIST...

1. Model Checks:
   ✓ Total params: 55, Trainable: 55

2. Data Checks:
   ✓ Input shape: torch.Size([4, 10])
   ✓ Label shape: torch.Size([4])
   ✓ No NaN in input

3. Forward Pass:
   ✓ Output shape: torch.Size([4, 5])
   ✓ No NaN in output

4. Loss Check:
   ✓ Loss value: 1.437672

5. Backward Pass:
   ✓ Backward pass successful
   ✓ All gradients look healthy

✅ ALL CHECKS PASSED!


---

# Summary: PyTorch Debugging Toolkit

---

| Issue | Debugging Tool |
|-------|---------------|
| **Model architecture** | `print(model)`, `model.named_modules()` |
| **Shape problems** | Shape debugging hooks, systematic tracing |
| **Gradient issues** | `tensor.grad`, backward hooks |
| **NaN/Inf values** | `torch.isnan()`, `torch.isinf()` |
| **Memory leaks** | `torch.cuda.memory_allocated()` |
| **Device errors** | Check `.device` attribute |
| **Data pipeline** | Inspect first batch manually |

---

## Quick Reference Functions

```python
# Print model summary
print(model)

# Count parameters
sum(p.numel() for p in model.parameters())

# Check for NaN
torch.isnan(tensor).any()

# Get tensor device
tensor.device

# Release unused cached GPU memory (does not free tensors still in use)
torch.cuda.empty_cache()

# Enable autograd anomaly detection (slow; use only while debugging)
torch.autograd.set_detect_anomaly(True)
```

---

# Interview Tips

---

**Q: How would you debug a model that produces NaN loss?**

A:
1. Check the input data and labels for NaN/Inf
2. Use `torch.autograd.set_detect_anomaly(True)` to find the operation where NaN first appears
3. Check for numerical instability (log of zero, division by zero, overflow in `exp`)
4. Lower the learning rate
5. Add gradient clipping (e.g., `torch.nn.utils.clip_grad_norm_`)
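
The steps above can be sketched in a few lines. This is a minimal illustration rather than a complete recipe: the helper `find_nonfinite`, the toy `nn.Linear` model, and the `max_norm=1.0` threshold are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

def find_nonfinite(named_tensors):
    """Return the names of tensors that contain NaN or Inf (hypothetical helper)."""
    return [name for name, t in named_tensors if not torch.isfinite(t).all()]

torch.manual_seed(0)
model = nn.Linear(10, 5)  # toy model, for illustration only
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 10)
y = torch.randint(0, 5, (4,))

# Step 1: check the inputs before blaming the model.
bad_inputs = find_nonfinite([("x", x)])

# Step 2: with anomaly detection on, backward() raises at the op that produced NaN.
# It slows training noticeably, so enable it only while debugging.
with torch.autograd.detect_anomaly():
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()

# Step 3: a classic instability is 0 * log(0), which evaluates to NaN.
p = torch.tensor([0.0, 0.5])
unsafe = p * torch.log(p)                    # first entry is NaN
safe = p * torch.log(p.clamp_min(1e-12))     # clamping keeps the log finite

# Step 5: clip the global gradient norm before the optimizer step.
# max_norm=1.0 is an arbitrary example value.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

bad_grads = find_nonfinite((n, prm.grad) for n, prm in model.named_parameters())
print(f"non-finite inputs: {bad_inputs}, non-finite grads: {bad_grads}")
print(f"gradient norm before clipping: {total_norm.item():.4f}")
```

`clip_grad_norm_` returns the total norm measured before clipping, so logging it each step is a cheap way to spot gradients that are about to explode.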

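**Q: The model trains, but accuracy stays at random chance. What's wrong?**

A:
1. Verify data-label alignment (each label still belongs to its sample, e.g., after shuffling or augmentation)
2. Check that the loss function matches the task (`nn.CrossEntropyLoss` for classification; it expects raw logits, not softmax outputs)
3. Check that gradients are flowing (no vanishing or missing gradients)
4. Verify the model is in train mode (`model.train()`) while training and in eval mode (`model.eval()`) while evaluating
5. Check data normalization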
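
Items 2 to 5 can be checked programmatically; item 1 usually needs a manual look at a few samples. The sketch below is one possible way to do it, using a toy `nn.Sequential` model and random data as stand-ins for a real setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 5
# Toy model and random data, used only to illustrate the checks.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, num_classes))
criterion = nn.CrossEntropyLoss()
x = torch.randn(64, 10)
y = torch.randint(0, num_classes, (64,))

# Item 4: dropout and batch norm behave differently in eval mode.
model.train()
in_train_mode = model.training

# Item 2: CrossEntropyLoss takes raw logits of shape (N, C)
# and integer class indices of shape (N,).
logits = model(x)
shapes_ok = logits.shape == (x.size(0), num_classes) and y.dtype == torch.long

# Item 3: after one backward pass, every trainable parameter
# should have a gradient that is not identically zero.
model.zero_grad()
criterion(logits, y).backward()
no_grad_flow = [
    name for name, p in model.named_parameters()
    if p.requires_grad and (p.grad is None or p.grad.abs().sum().item() == 0)
]

# Item 5: per-feature statistics should be roughly zero-mean, unit-variance
# if the inputs were standardized.
feature_mean = x.mean(dim=0)
feature_std = x.std(dim=0)

print(f"train mode: {in_train_mode}, shapes ok: {shapes_ok}")
print(f"parameters without gradient flow: {no_grad_flow}")
print(f"feature mean range: [{feature_mean.min():.2f}, {feature_mean.max():.2f}]")
print(f"feature std range:  [{feature_std.min():.2f}, {feature_std.max():.2f}]")
```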

**Q: How do you find which layer is causing issues?**

A:
1. Register forward/backward hooks on all layers
2. Track activation statistics (mean, std, NaN count)
3. Visualize gradient magnitudes per layer
4. Test progressively smaller parts of the model in isolation until the failing layer is found
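
As a rough sketch of items 1 to 3, the snippet below registers a forward hook on every leaf layer and records activation statistics, then collects per-parameter gradient norms after a backward pass. The helper name `attach_activation_stats` and the toy model are assumptions for illustration, and the hook assumes each layer returns a single tensor.

```python
import torch
import torch.nn as nn

def attach_activation_stats(model, stats):
    """Register forward hooks on leaf modules; fill `stats` with output statistics."""
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) > 0:
            continue  # skip containers, hook only leaf layers

        # Bind `name` as a default argument so each hook keeps its own layer name.
        def hook(mod, inputs, output, name=name):
            out = output.detach().float()  # assumes a single-tensor output
            stats[name] = {
                "mean": out.mean().item(),
                "std": out.std().item(),
                "nan_count": int(torch.isnan(out).sum().item()),
            }

        handles.append(module.register_forward_hook(hook))
    return handles

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 5))  # toy model
stats = {}
handles = attach_activation_stats(model, stats)

out = model(torch.randn(4, 10))
out.sum().backward()  # dummy scalar objective, just to populate gradients

for h in handles:
    h.remove()  # remove hooks once debugging is done

# Gradient magnitude per parameter tensor
grad_norms = {n: p.grad.norm().item() for n, p in model.named_parameters()}

for name, s in stats.items():
    print(f"layer {name}: mean={s['mean']:.4f}, std={s['std']:.4f}, NaNs={s['nan_count']}")
for name, g in grad_norms.items():
    print(f"grad norm {name}: {g:.4f}")
```

Keeping the returned handles and calling `remove()` on them matters: hooks left attached keep running on every forward pass and can slow training or hold references to tensors.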

---

## Back to: [README](../README.md)