# Debugging Strategies: PyTorch vs TensorFlow

**Learning Objectives:**
- Master debugging techniques specific to each framework
- Learn to identify and fix common deep learning issues
- Understand profiling and optimization strategies
- Develop systematic approaches to troubleshooting

**Prerequisites:** Computational graphs, gradients, tensor operations

**Estimated Time:** 40 minutes

---

Debugging deep learning models is often harder than debugging traditional software: a buggy model frequently still runs and simply trains poorly instead of raising an error. This notebook covers:
- **Framework-specific debugging tools**
- **Common issues and their solutions**
- **Performance profiling techniques**
- **Best practices for systematic debugging**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import time
import sys
import os
import warnings

# Add src to path for our utilities
sys.path.append(os.path.join('..', '..', 'src'))

from utils.comparison_tools import FrameworkComparison, create_side_by_side_comparison

# Try to import frameworks
try:
    import torch
    import torch.nn as nn
    import torch.optim as optim
    PYTORCH_AVAILABLE = True
    print(f"✅ PyTorch {torch.__version__} available")
except ImportError:
    PYTORCH_AVAILABLE = False
    print("❌ PyTorch not available")

try:
    import tensorflow as tf
    TENSORFLOW_AVAILABLE = True
    print(f"✅ TensorFlow {tf.__version__} available")
except ImportError:
    TENSORFLOW_AVAILABLE = False
    print("❌ TensorFlow not available")

# Set random seeds
np.random.seed(42)
if PYTORCH_AVAILABLE:
    torch.manual_seed(42)
if TENSORFLOW_AVAILABLE:
    tf.random.set_seed(42)

## 1. Common Deep Learning Issues

Let's start by identifying common problems and their symptoms.

In [None]:
print("=" * 60)
print("COMMON DEEP LEARNING ISSUES")
print("=" * 60)

print("""
Common Issues and Symptoms:

1. 🔥 EXPLODING GRADIENTS
   • Loss becomes NaN or infinity
   • Gradients have very large magnitudes
   • Training becomes unstable

2. 🌊 VANISHING GRADIENTS
   • Loss stops decreasing
   • Gradients become very small
   • Early layers don't learn

3. 💀 DEAD NEURONS
   • ReLU neurons output zero for every input
   • Gradients are zero
   • Network capacity reduced

4. 🐌 SLOW CONVERGENCE
   • Loss decreases very slowly
   • Poor learning rate choice
   • Bad initialization

5. 📈 OVERFITTING
   • Training loss << validation loss
   • Model memorizes training data
   • Poor generalization
""")

# Demonstrate exploding gradients
print("\n1. Exploding Gradients Example:")

if PYTORCH_AVAILABLE:
    print("\n🔥 PyTorch Exploding Gradients:")
    
    def demonstrate_exploding_gradients():
        # Create a deep network with poor initialization
        layers = []
        for i in range(10):  # Very deep network
            layer = nn.Linear(50, 50)
            # Bad initialization - too large weights
            nn.init.normal_(layer.weight, mean=0, std=2.0)  # Large std
            layers.append(layer)
            layers.append(nn.ReLU())
        
        model = nn.Sequential(*layers)
        
        # Forward pass
        x = torch.randn(32, 50)
        target = torch.randn(32, 50)
        
        prediction = model(x)
        loss = nn.MSELoss()(prediction, target)
        
        print(f"Loss: {loss.item():.4f}")
        
        # Check for NaN/Inf
        if torch.isnan(loss) or torch.isinf(loss):
            print("⚠️  Loss is NaN or Inf - likely exploding gradients!")
            return
        
        # Backward pass
        loss.backward()
        
        # Check gradient magnitudes
        total_norm = 0
        for param in model.parameters():
            if param.grad is not None:
                param_norm = param.grad.detach().norm(2)  # .data is discouraged; detach() is the safe equivalent
                total_norm += param_norm.item() ** 2
        total_norm = total_norm ** (1. / 2)
        
        print(f"Total gradient norm: {total_norm:.4f}")
        
        if total_norm > 100:
            print("⚠️  Large gradient norm - exploding gradients detected!")
            print("💡 Solutions: Gradient clipping, better initialization, skip connections")
        
        return total_norm
    
    try:
        grad_norm = demonstrate_exploding_gradients()
    except Exception as e:
        print(f"Error during training: {e}")
        print("This might be due to exploding gradients!")

# Demonstrate vanishing gradients
print("\n2. Vanishing Gradients Example:")

if PYTORCH_AVAILABLE:
    print("\n🔥 PyTorch Vanishing Gradients:")
    
    def demonstrate_vanishing_gradients():
        # Deep network with sigmoid activations (prone to vanishing gradients)
        layers = []
        for i in range(8):  # Deep network
            layers.append(nn.Linear(20, 20))
            layers.append(nn.Sigmoid())  # Sigmoid's derivative is at most 0.25, so gradients shrink layer by layer
        
        model = nn.Sequential(*layers)
        
        # Forward pass
        x = torch.randn(16, 20)
        target = torch.randn(16, 20)
        
        prediction = model(x)
        loss = nn.MSELoss()(prediction, target)
        
        # Backward pass
        loss.backward()
        
        # Analyze gradient magnitudes by layer
        print("Gradient magnitudes by layer:")
        layer_idx = 0
        for name, param in model.named_parameters():
            if 'weight' in name and param.grad is not None:
                grad_norm = param.grad.norm().item()
                print(f"Layer {layer_idx}: {grad_norm:.8f}")
                
                if grad_norm < 1e-6:
                    print(f"  ⚠️  Very small gradients - vanishing gradient problem!")
                
                layer_idx += 1
        
        print("💡 Solutions: ReLU activations, residual connections, better initialization")
    
    demonstrate_vanishing_gradients()

# Dead ReLU neurons
print("\n3. Dead ReLU Neurons:")

if PYTORCH_AVAILABLE:
    print("\n🔥 PyTorch Dead ReLU Detection:")
    
    def detect_dead_neurons():
        model = nn.Sequential(
            nn.Linear(10, 100),
            nn.ReLU(),
            nn.Linear(100, 50),
            nn.ReLU(),
            nn.Linear(50, 1)
        )
        
        # Initialize with large negative bias (causes dead ReLUs)
        for layer in model:
            if isinstance(layer, nn.Linear):
                if layer.bias is not None:
                    nn.init.constant_(layer.bias, -10)  # Large negative bias
        
        x = torch.randn(100, 10)
        target = torch.randn(100, 1)
        
        # Forward pass with hooks to capture activations
        activations = {}
        
        def hook_fn(name):
            def hook(module, input, output):
                activations[name] = output.detach()
            return hook
        
        # Register hooks
        model[1].register_forward_hook(hook_fn('relu1'))
        model[3].register_forward_hook(hook_fn('relu2'))
        
        prediction = model(x)
        loss = nn.MSELoss()(prediction, target)
        loss.backward()
        
        # Check for dead neurons
        for name, activation in activations.items():
            # Count neurons that are always zero
            always_zero = (activation == 0).all(dim=0)
            dead_count = always_zero.sum().item()
            total_neurons = activation.shape[1]
            
            print(f"{name}: {dead_count}/{total_neurons} dead neurons ({dead_count/total_neurons:.1%})")
            
            if dead_count > total_neurons * 0.1:  # More than 10% dead
                print(f"  ⚠️  High percentage of dead neurons!")
                print(f"  💡 Solutions: Lower learning rate, better initialization, Leaky ReLU")
    
    detect_dead_neurons()

## 2. Framework-Specific Debugging Tools

Each framework provides specific tools for debugging.

In [None]:
print("\n" + "=" * 60)
print("FRAMEWORK-SPECIFIC DEBUGGING TOOLS")
print("=" * 60)

# PyTorch debugging tools
if PYTORCH_AVAILABLE:
    print("\n🔥 PyTorch Debugging Tools:")
    
    # 1. Tensor inspection
    print("\n1. Tensor Inspection:")
    
    def pytorch_tensor_inspection():
        x = torch.randn(5, 3)
        
        print(f"Tensor shape: {x.shape}")
        print(f"Tensor dtype: {x.dtype}")
        print(f"Tensor device: {x.device}")
        print(f"Requires grad: {x.requires_grad}")
        print(f"Is leaf: {x.is_leaf}")
        print(f"Is contiguous: {x.is_contiguous()}")
        
        # Check for problematic values
        print(f"Contains NaN: {torch.isnan(x).any()}")
        print(f"Contains Inf: {torch.isinf(x).any()}")
        print(f"Min value: {x.min().item():.4f}")
        print(f"Max value: {x.max().item():.4f}")
        print(f"Mean: {x.mean().item():.4f}")
        print(f"Std: {x.std().item():.4f}")
    
    pytorch_tensor_inspection()
    
    # 2. Model inspection
    print("\n2. Model Inspection:")
    
    def pytorch_model_inspection():
        model = nn.Sequential(
            nn.Linear(10, 20),
            nn.ReLU(),
            nn.Linear(20, 1)
        )
        
        print("Model architecture:")
        print(model)
        
        print("\nModel parameters:")
        total_params = 0
        for name, param in model.named_parameters():
            print(f"{name}: {param.shape} ({param.numel()} parameters)")
            total_params += param.numel()
        
        print(f"Total parameters: {total_params}")
        
        # Check parameter statistics
        print("\nParameter statistics:")
        for name, param in model.named_parameters():
            print(f"{name}:")
            print(f"  Mean: {param.data.mean().item():.6f}")
            print(f"  Std: {param.data.std().item():.6f}")
            print(f"  Min: {param.data.min().item():.6f}")
            print(f"  Max: {param.data.max().item():.6f}")
    
    pytorch_model_inspection()
    
    # 3. Gradient inspection
    print("\n3. Gradient Inspection:")
    
    def pytorch_gradient_inspection():
        model = nn.Linear(5, 1)
        x = torch.randn(10, 5)
        target = torch.randn(10, 1)
        
        prediction = model(x)
        loss = nn.MSELoss()(prediction, target)
        loss.backward()
        
        print("Gradient inspection:")
        for name, param in model.named_parameters():
            if param.grad is not None:
                grad = param.grad
                print(f"{name} gradient:")
                print(f"  Shape: {grad.shape}")
                print(f"  Norm: {grad.norm().item():.6f}")
                print(f"  Mean: {grad.mean().item():.6f}")
                print(f"  Contains NaN: {torch.isnan(grad).any()}")
                print(f"  Contains Inf: {torch.isinf(grad).any()}")
            else:
                print(f"{name}: No gradient computed")
    
    pytorch_gradient_inspection()

# TensorFlow debugging tools
if TENSORFLOW_AVAILABLE:
    print("\n🟠 TensorFlow Debugging Tools:")
    
    # 1. Tensor inspection
    print("\n1. Tensor Inspection:")
    
    def tensorflow_tensor_inspection():
        x = tf.random.normal((5, 3))
        
        print(f"Tensor shape: {x.shape}")
        print(f"Tensor dtype: {x.dtype}")
        print(f"Tensor device: {x.device}")
        
        # Check for problematic values
        print(f"Contains NaN: {tf.reduce_any(tf.math.is_nan(x))}")
        print(f"Contains Inf: {tf.reduce_any(tf.math.is_inf(x))}")
        print(f"Min value: {tf.reduce_min(x).numpy():.4f}")
        print(f"Max value: {tf.reduce_max(x).numpy():.4f}")
        print(f"Mean: {tf.reduce_mean(x).numpy():.4f}")
        print(f"Std: {tf.math.reduce_std(x).numpy():.4f}")
    
    tensorflow_tensor_inspection()
    
    # 2. Model inspection
    print("\n2. Model Inspection:")
    
    def tensorflow_model_inspection():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(20, activation='relu', input_shape=(10,)),
            tf.keras.layers.Dense(1)
        ])
        
        # Build the model
        model.build((None, 10))
        
        print("Model summary:")
        model.summary()
        
        print("\nModel weights:")
        for i, layer in enumerate(model.layers):
            if layer.weights:
                print(f"Layer {i} ({layer.name}):")
                for j, weight in enumerate(layer.weights):
                    print(f"  Weight {j}: {weight.shape}")
                    print(f"    Mean: {tf.reduce_mean(weight).numpy():.6f}")
                    print(f"    Std: {tf.math.reduce_std(weight).numpy():.6f}")
    
    tensorflow_model_inspection()
    
    # 3. Debugging with tf.debugging
    print("\n3. TensorFlow Debugging Utilities:")
    
    def tensorflow_debugging_utilities():
        # Create some problematic data
        x = tf.constant([1.0, 2.0, float('nan'), 4.0])
        
        print("Debugging utilities:")
        
        # Check for NaN/Inf
        try:
            tf.debugging.check_numerics(x, "Input contains NaN or Inf")
        except tf.errors.InvalidArgumentError as e:
            print(f"Caught error: {e}")
        
        # Assert operations
        y = tf.constant([1.0, 2.0, 3.0, 4.0])
        
        # This will pass
        tf.debugging.assert_all_finite(y, "y should be finite")
        print("All finite assertion passed")
        
        # Assert shapes
        tf.debugging.assert_equal(tf.shape(y), [4], "Shape should be [4]")
        print("Shape assertion passed")
        
        # Assert ranges
        tf.debugging.assert_greater_equal(y, 0.0, "All values should be >= 0")
        print("Range assertion passed")
    
    tensorflow_debugging_utilities()

# Side-by-side debugging comparison
pytorch_debug_code = """
# PyTorch debugging workflow
import torch
import torch.nn as nn

# 1. Check tensor properties
x = torch.randn(5, 3)
print(f"Shape: {x.shape}, dtype: {x.dtype}")
print(f"NaN: {torch.isnan(x).any()}")
print(f"Inf: {torch.isinf(x).any()}")

# 2. Model inspection
model = nn.Linear(3, 1)
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")

# 3. Gradient checking
loss = nn.MSELoss()(model(x), torch.randn(5, 1))
loss.backward()

for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name} grad norm: {param.grad.norm()}")

# 4. Hook for intermediate values
def hook_fn(module, input, output):
    print(f"Output shape: {output.shape}")

model.register_forward_hook(hook_fn)
"""

tensorflow_debug_code = """
# TensorFlow debugging workflow
import tensorflow as tf

# 1. Check tensor properties
x = tf.random.normal((5, 3))
print(f"Shape: {x.shape}, dtype: {x.dtype}")
print(f"NaN: {tf.reduce_any(tf.math.is_nan(x))}")
print(f"Inf: {tf.reduce_any(tf.math.is_inf(x))}")

# 2. Model inspection (summary() is a Model method, so wrap the layer)
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build((None, 3))
model.summary()

# 3. Gradient checking
with tf.GradientTape() as tape:
    prediction = model(x)
    loss = tf.reduce_mean(tf.square(prediction - tf.random.normal((5, 1))))

gradients = tape.gradient(loss, model.trainable_variables)
for i, grad in enumerate(gradients):
    print(f"Gradient {i} norm: {tf.norm(grad)}")

# 4. Debugging assertions
tf.debugging.check_numerics(x, "Input check")
tf.debugging.assert_all_finite(prediction, "Prediction check")

# 5. Print debugging in graph mode
@tf.function
def debug_function(x):
    tf.print("Debug info:", tf.shape(x))
    return tf.reduce_mean(x)
"""

print(create_side_by_side_comparison(
    pytorch_debug_code, tensorflow_debug_code, "Debugging Workflows"
))
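
The PyTorch workflow above registers a forward hook but never removes it, so the hook keeps firing on every later forward pass. A minimal sketch of the cleanup pattern follows; `captured` and `save_output` are hypothetical names chosen for illustration, while the handle returned by `register_forward_hook` and its `remove()` method are part of PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))
captured = {}  # hypothetical container for recorded activations

def save_output(name):
    # Build a hook that stores a detached copy of the module output.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# register_forward_hook returns a handle; keep it so the hook can be removed.
handle = model[1].register_forward_hook(save_output("relu"))

model(torch.randn(5, 3))
print(captured["relu"].shape)  # torch.Size([5, 4])

handle.remove()   # detach the hook once the inspection is done
captured.clear()
model(torch.randn(5, 3))
print(len(captured))  # 0: the hook no longer fires
```

Removing hooks matters most in notebooks, where re-running a cell would otherwise stack a new hook on top of the old one.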

## 3. Performance Profiling

Understanding where your model spends time and memory.

In [None]:
print("\n" + "=" * 60)
print("PERFORMANCE PROFILING")
print("=" * 60)

# PyTorch profiling
if PYTORCH_AVAILABLE:
    print("\n🔥 PyTorch Profiling:")
    
    def pytorch_profiling_example():
        # Create a model for profiling
        model = nn.Sequential(
            nn.Linear(100, 200),
            nn.ReLU(),
            nn.Linear(200, 100),
            nn.ReLU(),
            nn.Linear(100, 10)
        )
        
        x = torch.randn(32, 100)
        target = torch.randint(0, 10, (32,))
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(model.parameters())
        
        # Simple timing
        print("1. Simple Timing:")
        
        # Warm up
        for _ in range(5):
            prediction = model(x)
            loss = criterion(prediction, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        
        # Time forward pass
        start_time = time.time()
        for _ in range(100):
            prediction = model(x)
        forward_time = (time.time() - start_time) / 100
        
        # Time forward + loss + backward, then subtract the forward-only average below
        start_time = time.time()
        for _ in range(100):
            prediction = model(x)
            loss = criterion(prediction, target)
            optimizer.zero_grad()
            loss.backward()
        backward_time = (time.time() - start_time) / 100 - forward_time
        
        print(f"Forward pass: {forward_time*1000:.2f} ms")
        print(f"Backward pass (estimated): {backward_time*1000:.2f} ms")
        
        # Memory usage
        print("\n2. Memory Usage:")
        
        def get_memory_usage():
            if torch.cuda.is_available():
                return torch.cuda.memory_allocated() / 1024**2  # MB
            else:
                # CPU fallback: parameter storage only, so this value will not change across a forward pass
                total_params = sum(p.numel() * p.element_size() for p in model.parameters())
                return total_params / 1024**2
        
        memory_before = get_memory_usage()
        prediction = model(x)
        memory_after = get_memory_usage()
        
        print(f"Memory before forward: {memory_before:.2f} MB")
        print(f"Memory after forward: {memory_after:.2f} MB")
        print(f"Memory increase: {memory_after - memory_before:.2f} MB")
        
        # Model size analysis
        print("\n3. Model Analysis:")
        
        total_params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        model_size = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2
        
        print(f"Total parameters: {total_params:,}")
        print(f"Trainable parameters: {trainable_params:,}")
        print(f"Model size: {model_size:.2f} MB")
        
        # FLOPs estimation (rough)
        def estimate_flops():
            flops = 0
            for layer in model:
                if isinstance(layer, nn.Linear):
                    # Matrix multiplication: 2 * input_size * output_size * batch_size
                    flops += 2 * layer.in_features * layer.out_features * x.size(0)
            return flops
        
        estimated_flops = estimate_flops()
        print(f"Estimated FLOPs per forward pass: {estimated_flops:,}")
        print(f"Estimated FLOP/s: {estimated_flops / forward_time / 1e9:.2f} GFLOP/s")
    
    pytorch_profiling_example()

# TensorFlow profiling
if TENSORFLOW_AVAILABLE:
    print("\n🟠 TensorFlow Profiling:")
    
    def tensorflow_profiling_example():
        # Create a model for profiling
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(200, activation='relu', input_shape=(100,)),
            tf.keras.layers.Dense(100, activation='relu'),
            tf.keras.layers.Dense(10, activation='softmax')
        ])
        
        x = tf.random.normal((32, 100))
        target = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
        target_onehot = tf.one_hot(target, 10)
        
        optimizer = tf.keras.optimizers.Adam()
        loss_fn = tf.keras.losses.CategoricalCrossentropy()
        
        # Compile for better performance
        @tf.function
        def train_step(x, y):
            with tf.GradientTape() as tape:
                prediction = model(x, training=True)
                loss = loss_fn(y, prediction)
            
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))
            return loss
        
        # Warm up
        for _ in range(5):
            train_step(x, target_onehot)
        
        print("1. Simple Timing:")
        
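        # Time forward pass with a compiled function so it is comparable to train_step
        forward_fn = tf.function(lambda inputs: model(inputs, training=False))
        forward_fn(x)  # trace once before timing
        start_time = time.time()
        for _ in range(100):
            prediction = forward_fn(x)
        forward_time = (time.time() - start_time) / 100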
        
        # Time full training step
        start_time = time.time()
        for _ in range(100):
            train_step(x, target_onehot)
        train_time = (time.time() - start_time) / 100
        
        print(f"Forward pass: {forward_time*1000:.2f} ms")
        print(f"Full training step: {train_time*1000:.2f} ms")
        print(f"Backward pass (estimated): {(train_time - forward_time)*1000:.2f} ms")
        
        # Model analysis
        print("\n2. Model Analysis:")
        
        total_params = model.count_params()
        model_size = sum([tf.size(var).numpy() * 4 for var in model.trainable_variables]) / 1024**2  # Assume float32
        
        print(f"Total parameters: {total_params:,}")
        print(f"Model size: {model_size:.2f} MB")
        
        # Layer-wise analysis
        print("\n3. Layer-wise Analysis:")
        for i, layer in enumerate(model.layers):
            if hasattr(layer, 'count_params'):
                layer_params = layer.count_params()
                print(f"Layer {i} ({layer.name}): {layer_params:,} parameters")
    
    tensorflow_profiling_example()

# Profiling best practices
print("\n📋 Profiling Best Practices:")
profiling_tips = [
    "🔥 Always warm up before timing (JIT compilation, caching)",
    "📊 Profile both forward and backward passes separately",
    "💾 Monitor memory usage, especially for large models",
    "🎯 Profile on target hardware (CPU vs GPU)",
    "📈 Use built-in profilers for detailed analysis",
    "⚡ Compare eager vs compiled execution (TensorFlow)",
    "🔍 Profile individual layers to find bottlenecks",
    "📏 Measure actual throughput (samples/second)",
    "🎛️ Profile different batch sizes",
    "🔄 Profile data loading pipeline separately"
]

for tip in profiling_tips:
    print(f"  {tip}")
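
The tip about built-in profilers can be illustrated with `torch.profiler`, which breaks the runtime down per operator instead of giving one wall-clock number. This is a minimal CPU-only sketch; the model and batch sizes are arbitrary choices for illustration, and on a GPU one would typically add `ProfilerActivity.CUDA` to the activities.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(nn.Linear(100, 200), nn.ReLU(), nn.Linear(200, 10))
x = torch.randn(32, 100)

# Warm up so one-time setup costs do not dominate the trace.
for _ in range(3):
    model(x)

# Record CPU operator events for a handful of forward passes.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(10):
        model(x)

# Aggregate events by operator name and show the most expensive ones.
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

The resulting table lists operators such as the matrix multiplications behind `nn.Linear`, which is usually a better starting point for finding bottlenecks than manual timing.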

## 4. Systematic Debugging Approach

A step-by-step methodology for debugging deep learning issues.

In [None]:
print("\n" + "=" * 60)
print("SYSTEMATIC DEBUGGING APPROACH")
print("=" * 60)

print("""
🔍 SYSTEMATIC DEBUGGING METHODOLOGY

1. 📋 REPRODUCE THE ISSUE
   • Create minimal reproducible example
   • Fix random seeds for consistency
   • Document exact environment and versions

2. 🎯 ISOLATE THE PROBLEM
   • Test individual components
   • Use simple synthetic data
   • Start with smallest possible model

3. 📊 GATHER INFORMATION
   • Check tensor shapes and dtypes
   • Monitor loss curves
   • Inspect gradients and activations
   • Profile performance

4. 🔬 FORM HYPOTHESES
   • Based on symptoms, what could be wrong?
   • List possible causes in order of likelihood
   • Consider common issues first

5. 🧪 TEST HYPOTHESES
   • Test one hypothesis at a time
   • Make minimal changes
   • Document what works and what doesn't

6. ✅ VERIFY THE FIX
   • Test on original problem
   • Ensure no regression
   • Add tests to prevent future issues
""")

# Debugging checklist
print("\n📝 DEBUGGING CHECKLIST:")

debugging_checklist = {
    "Data Issues": [
        "✓ Check data shapes and types",
        "✓ Verify data preprocessing",
        "✓ Look for NaN/Inf in inputs",
        "✓ Check data distribution",
        "✓ Verify labels are correct"
    ],
    "Model Issues": [
        "✓ Verify model architecture",
        "✓ Check parameter initialization",
        "✓ Ensure proper activation functions",
        "✓ Verify loss function choice",
        "✓ Check for parameter updates"
    ],
    "Training Issues": [
        "✓ Check learning rate",
        "✓ Verify optimizer settings",
        "✓ Monitor gradient magnitudes",
        "✓ Check for gradient clipping",
        "✓ Verify batch size effects"
    ],
    "Implementation Issues": [
        "✓ Check tensor operations",
        "✓ Verify device placement",
        "✓ Check memory usage",
        "✓ Verify random seed setting",
        "✓ Check framework versions"
    ]
}

for category, items in debugging_checklist.items():
    print(f"\n{category}:")
    for item in items:
        print(f"  {item}")

# Common solutions
print("\n💡 COMMON SOLUTIONS:")

common_solutions = {
    "Loss not decreasing": [
        "• Check learning rate (try 1e-3, 1e-4)",
        "• Verify data preprocessing",
        "• Check model capacity",
        "• Try different optimizer",
        "• Check for label issues"
    ],
    "Loss becomes NaN": [
        "• Reduce learning rate",
        "• Add gradient clipping",
        "• Check for division by zero",
        "• Use more stable loss function",
        "• Check input data for NaN/Inf"
    ],
    "Training too slow": [
        "• Increase batch size",
        "• Use GPU acceleration",
        "• Optimize data loading",
        "• Use mixed precision training",
        "• Profile and optimize bottlenecks"
    ],
    "Memory issues": [
        "• Reduce batch size",
        "• Use gradient accumulation",
        "• Clear unused variables",
        "• Use gradient checkpointing",
        "• Optimize model architecture"
    ]
}

for problem, solutions in common_solutions.items():
    print(f"\n{problem}:")
    for solution in solutions:
        print(f"  {solution}")

# Framework-specific debugging commands
print("\n🛠️ QUICK DEBUGGING COMMANDS:")

pytorch_debug_commands = """
# PyTorch Quick Debug Commands

# Check tensor properties
print(f"Shape: {tensor.shape}, dtype: {tensor.dtype}")
print(f"Device: {tensor.device}, requires_grad: {tensor.requires_grad}")
print(f"NaN: {torch.isnan(tensor).any()}, Inf: {torch.isinf(tensor).any()}")

# Check gradients
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: grad_norm={param.grad.norm():.6f}")

# Memory debugging
print(f"CUDA memory: {torch.cuda.memory_allocated()/1e6:.1f}MB")

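# Gradient checking (expects float64 inputs with requires_grad=True, passed as a tuple)
torch.autograd.gradcheck(model.double(), (input_tensor.double().requires_grad_(),))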

# Hook for intermediate values
def debug_hook(module, input, output):
    print(f"{module.__class__.__name__}: {output.shape}")
handle = model.register_forward_hook(debug_hook)  # call handle.remove() when done
"""

tensorflow_debug_commands = """
# TensorFlow Quick Debug Commands

# Check tensor properties
print(f"Shape: {tensor.shape}, dtype: {tensor.dtype}")
print(f"Device: {tensor.device}")
print(f"NaN: {tf.reduce_any(tf.math.is_nan(tensor))}")
print(f"Inf: {tf.reduce_any(tf.math.is_inf(tensor))}")

# Debugging assertions
tf.debugging.check_numerics(tensor, "Tensor check")
tf.debugging.assert_all_finite(tensor, "Finite check")

# Print in graph mode
tf.print("Debug:", tensor)

# Model summary
model.summary()

# Check gradients (the tape should record a scalar loss, not the raw model output)
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))
grads = tape.gradient(loss, model.trainable_variables)
for i, grad in enumerate(grads):
    tf.print(f"Grad {i} norm:", tf.norm(grad))
"""

print(create_side_by_side_comparison(
    pytorch_debug_commands, tensorflow_debug_commands, "Quick Debug Commands"
))
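
`torch.autograd.gradcheck` compares analytical gradients against finite differences, so it is sensitive to precision. The sketch below assumes a small hypothetical function, `scaled_tanh`, and uses float64 inputs with `requires_grad=True`; the tolerance values are illustrative rather than recommended settings.

```python
import torch

torch.manual_seed(0)

def scaled_tanh(x, w):
    # Hypothetical function under test; any differentiable callable works.
    return torch.tanh(x @ w).sum(dim=1)

# Finite differences are too noisy in float32, so use float64 inputs
# that track gradients.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, dtype=torch.double, requires_grad=True)

# Inputs are passed as a tuple; the call returns True when both gradients agree.
ok = torch.autograd.gradcheck(scaled_tanh, (x, w), eps=1e-6, atol=1e-4)
print(ok)
```

To check an `nn.Module`, one option is to convert it with `model.double()` first so that its parameters match the input precision.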

## Summary and Key Takeaways

**What we've learned:**

1. **Common Issues**: Exploding/vanishing gradients, dead neurons, slow convergence, overfitting
2. **Framework Tools**: PyTorch's eager tensor, gradient, and hook inspection vs TensorFlow's `tf.debugging` assertions and `tf.print`
3. **Profiling**: Performance analysis and bottleneck identification
4. **Systematic Approach**: Methodical debugging workflow
5. **Best Practices**: Checklists and common solutions

**Framework-Specific Debugging:**

| Aspect | PyTorch | TensorFlow |
|--------|---------|------------|
| **Tensor Inspection** | `.shape`, `.dtype`, `torch.isnan()` | `.shape`, `.dtype`, `tf.math.is_nan()` |
| **Model Inspection** | `.named_parameters()`, hooks | `.summary()`, `.trainable_variables` |
| **Gradient Checking** | `torch.autograd.gradcheck()` | `tf.test.compute_gradient()` or manual numerical checking |
| **Assertions** | Python `assert` | `tf.debugging.assert_*()` |
| **Print Debugging** | Standard `print()` | `tf.print()` for graphs |
| **Profiling** | `torch.profiler`, manual timing | `tf.profiler`, manual timing |

**Debugging Workflow:**

1. **Reproduce** → Create minimal example
2. **Isolate** → Test components separately
3. **Gather** → Collect diagnostic information
4. **Hypothesize** → Form theories about the issue
5. **Test** → Verify hypotheses systematically
6. **Verify** → Ensure the fix works

**Common Issue Patterns:**

**Exploding Gradients:**
- Symptoms: NaN loss, very large gradient norms
- Solutions: Gradient clipping, lower learning rate, better initialization
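
Gradient clipping can be sketched with `torch.nn.utils.clip_grad_norm_`, which rescales all gradients in place so that their combined L2 norm does not exceed a threshold. The network, the oversized initialization, and `max_norm = 1.0` are assumptions chosen to provoke large gradients, not recommended settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(50, 50), nn.ReLU(), nn.Linear(50, 50))
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.normal_(layer.weight, mean=0.0, std=2.0)  # deliberately too large

loss = nn.MSELoss()(model(torch.randn(32, 50)), torch.randn(32, 50))
loss.backward()

max_norm = 1.0  # assumed threshold; tune per model
# clip_grad_norm_ rescales gradients in place and returns the norm before clipping.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)

# Recompute the global norm to confirm the gradients were rescaled.
norm_after = torch.sqrt(sum(p.grad.norm(2) ** 2 for p in model.parameters()))
print(f"before: {norm_before.item():.2f}, after: {norm_after.item():.2f}")
```

In a training loop the clipping call goes between `loss.backward()` and `optimizer.step()`.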

**Vanishing Gradients:**
- Symptoms: Loss plateaus, small gradients in early layers
- Solutions: ReLU activations, residual connections, proper initialization
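
A short calculation shows why sigmoid stacks are prone to this. The sigmoid derivative never exceeds 0.25, so the product of activation derivatives along a path through eight sigmoid layers is bounded by 0.25 to the eighth power. The sketch below uses plain Python and considers only the activation factor; weight matrices also scale the gradient, so this is a rough bound rather than an exact prediction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)), maximal at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

max_slope = sigmoid_grad(0.0)   # 0.25
depth = 8                       # same depth as the sigmoid demo network

# Upper bound on the product of activation derivatives along one path.
bound = max_slope ** depth
print(f"max sigmoid slope: {max_slope}")
print(f"bound after {depth} layers: {bound:.2e}")   # 1.53e-05
```

ReLU avoids this particular shrinkage because its derivative is exactly 1 for positive inputs.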

**Dead Neurons:**
- Symptoms: Many ReLU units output zero for every input, so their gradients are zero
- Solutions: Lower learning rate, Leaky ReLU, better initialization
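
The effect of swapping ReLU for Leaky ReLU can be checked directly on the gradients. The sketch below reuses the large negative bias from the dead-neuron demo; `first_layer_grad_norm` is a hypothetical helper, and the claim that every pre-activation is negative holds for this input scale and default weight initialization, not in general.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(activation):
    # Same seed for both runs so only the activation differs.
    torch.manual_seed(0)
    layer = nn.Linear(10, 100)
    nn.init.constant_(layer.bias, -10.0)  # pushes pre-activations below zero here
    model = nn.Sequential(layer, activation, nn.Linear(100, 1))
    loss = nn.MSELoss()(model(torch.randn(100, 10)), torch.randn(100, 1))
    loss.backward()
    return layer.weight.grad.norm().item()

relu_norm = first_layer_grad_norm(nn.ReLU())
leaky_norm = first_layer_grad_norm(nn.LeakyReLU(negative_slope=0.01))
print(f"ReLU: {relu_norm:.6f}, LeakyReLU: {leaky_norm:.6f}")
```

With ReLU the first layer receives no gradient and cannot recover, while the small negative slope of Leaky ReLU keeps a nonzero signal flowing.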

**Performance Issues:**
- Symptoms: Slow training, high memory usage
- Solutions: Profiling, batch size optimization, GPU utilization

**Best Practices:**

**Prevention:**
- Use proper initialization (Xavier, He)
- Choose appropriate learning rates
- Monitor training metrics continuously
- Use gradient clipping for RNNs
- Validate data preprocessing
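
As an illustration of the initialization advice, the sketch below compares the oversized `std=2.0` initialization from the exploding-gradient demo with He (Kaiming) initialization on a ten-layer ReLU stack. `final_activation_std` is a hypothetical helper, and the depth, width, and batch size are arbitrary choices.

```python
import torch
import torch.nn as nn

def final_activation_std(init_fn, depth=10, width=50):
    # Build a ReLU stack, initialize every weight with init_fn,
    # and report the spread of the final activations.
    torch.manual_seed(0)
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width)
        init_fn(linear.weight)
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.ReLU()]
    with torch.no_grad():
        return nn.Sequential(*layers)(torch.randn(256, width)).std().item()

naive_std = final_activation_std(lambda w: nn.init.normal_(w, mean=0.0, std=2.0))
he_std = final_activation_std(lambda w: nn.init.kaiming_normal_(w, nonlinearity="relu"))
print(f"std=2.0 init: {naive_std:.3e}")
print(f"He init:      {he_std:.3e}")
```

The oversized initialization multiplies the activation scale at every layer, whereas He initialization is designed to keep it roughly constant for ReLU networks.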

**Debugging:**
- Start with simple cases
- Check one thing at a time
- Use version control for experiments
- Document findings and solutions
- Create reproducible examples

**Tools and Resources:**

**PyTorch:**
- TensorBoard for visualization
- `torch.autograd.gradcheck()` for gradient verification
- Hooks for intermediate inspection
- `torch.profiler` for detailed profiling

**TensorFlow:**
- TensorBoard for visualization
- `tf.debugging.*` for assertions
- `tf.profiler` for performance analysis
- Eager execution for easier debugging
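
On the TensorFlow side, the `tf.debugging` checks combine naturally with gradient clipping inside a `tf.GradientTape` step. The sketch below is a minimal illustration under assumed settings: a single weight matrix with deliberately large values and `clip_norm=1.0`, neither of which comes from the notebook.

```python
import tensorflow as tf

tf.random.set_seed(0)

w = tf.Variable(tf.random.normal((50, 50), stddev=2.0))  # deliberately large weights
x = tf.random.normal((32, 50))
target = tf.random.normal((32, 50))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(tf.nn.relu(x @ w) - target))

grads = tape.gradient(loss, [w])

# Fail fast if any gradient already contains NaN or Inf.
for g in grads:
    tf.debugging.check_numerics(g, "gradient check")

# clip_by_global_norm returns the clipped list and the norm before clipping.
clipped, norm_before = tf.clip_by_global_norm(grads, clip_norm=1.0)
norm_after = tf.linalg.global_norm(clipped)
print(f"before: {norm_before.numpy():.2f}, after: {norm_after.numpy():.2f}")
```

In a full training step the clipped list, not the raw gradients, would be passed to `optimizer.apply_gradients`.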

**Next Steps:**
- Practice debugging on real projects
- Learn advanced profiling techniques
- Study framework-specific optimization
- Explore distributed training debugging

Effective debugging is a crucial skill for deep learning practitioners. Both PyTorch and TensorFlow provide powerful tools, but the key is developing a systematic approach to problem-solving and knowing which tools to use when.