[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ruliana/pytorch-katas/blob/main/dan_2/kata_01_temple_scroll_authenticity_guardian_unrevised.ipynb)

## 🏮 The Ancient Scroll Unfurls 🏮

# THE FORBIDDEN SCROLL GUARDIAN: MASTERING THE DEEPER MYSTERIES

**Dan Level: 2 (Temple Guardian) | Time: 45 minutes | Sacred Arts: Multi-layer Networks, Regularization, Validation**

## 📜 THE CHALLENGE

*The temple library's most ancient chamber stands before you, its heavy doors carved with warnings in languages long forgotten. Master Pai-Torch approaches with uncharacteristic gravity.*

**Master Pai-Torch**: "Grasshopper, you have mastered the simple linear arts, but now you seek the deeper mysteries. The temple archives contain scrolls of immense power - some authentic treasures from the founding masters, others clever forgeries created by those who would steal our sacred knowledge."

*The ancient master's eyes glow with inner wisdom as mysterious symbols appear floating in the air around them.*

**Master Pai-Torch**: "The Art of Deep Authentication requires multiple layers of understanding - surface patterns, hidden meanings, and the subtle energies that flow between them. But beware! This knowledge is forbidden to those who lack proper discipline. Without the Sacred Safeguards, even the wisest student falls to the Curse of Overfitting."

*From the shadows, Master Ao-Tougrad materializes with an ethereal whisper.*

**Master Ao-Tougrad**: "I have walked the gradient paths of the deep networks for centuries. The untrained mind seeks complexity without wisdom, creating models that memorize rather than understand. Learn well the arts of Dropout and Validation, young one, for they are your only protection against the madness that comes with forbidden power."

### 🎯 THE SACRED OBJECTIVES

Your quest requires mastering these deeper mysteries:

- [ ] **Deep Network Architecture**: Create a multi-layer neural network with hidden layers
- [ ] **Regularization Mastery**: Implement dropout to prevent overfitting
- [ ] **Validation Wisdom**: Split data properly and monitor validation loss
- [ ] **Optimization Arts**: Use advanced optimizers beyond simple SGD
- [ ] **Guardian's Vigilance**: Detect and prevent overfitting through early stopping
- [ ] **Threshold Mastery**: Optimize decision boundaries for classification

**Master Pai-Torch**: "Remember, young guardian - a model that achieves perfect accuracy on training data but fails on new scrolls is no guardian at all. True wisdom lies in generalization, not memorization."
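
The last objective, threshold mastery, is the easiest to overlook because a cutoff of 0.5 feels like a natural default. A minimal sketch of one way to choose a cutoff is shown here: scan candidate thresholds on validation predictions and keep the one with the highest F1 score. The helper name `find_best_threshold` and the toy probabilities are illustrative assumptions, not part of the kata.

```python
from typing import Optional, Tuple

import torch


def find_best_threshold(probs: torch.Tensor, labels: torch.Tensor,
                        candidates: Optional[torch.Tensor] = None) -> Tuple[float, float]:
    """Hypothetical helper: return (threshold, f1) maximizing F1 over candidate cutoffs.

    `probs` are sigmoid outputs in [0, 1]; `labels` are 0/1 floats of the same shape.
    """
    if candidates is None:
        candidates = torch.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    best_threshold, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = (probs > t).float()
        tp = ((preds == 1) & (labels == 1)).sum().item()
        fp = ((preds == 1) & (labels == 0)).sum().item()
        fn = ((preds == 0) & (labels == 1)).sum().item()
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:  # strict comparison keeps the lowest of tied thresholds
            best_threshold, best_f1 = t.item(), f1
    return best_threshold, best_f1


# Toy validation outputs (made up): authentic scrolls score 0.42 and above
val_probs = torch.tensor([0.12, 0.22, 0.31, 0.42, 0.47, 0.81, 0.93])
val_labels = torch.tensor([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
threshold, f1 = find_best_threshold(val_probs, val_labels)
print(f"best threshold={threshold:.2f}, F1={f1:.2f}")
```

In this toy case a cutoff of 0.5 would miss two of the four authentic scrolls, while a lower cutoff separates the classes cleanly. The threshold should be tuned on the validation set only, so the test set stays untouched.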

In [None]:
# 🔮 THE SACRED IMPORTS

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple, List, Dict
import warnings
warnings.filterwarnings('ignore')

# Set the sacred seed for reproducible mystical results
torch.manual_seed(42)
np.random.seed(42)

print("🏮 The sacred libraries have been summoned...")
print(f"🧙 PyTorch version: {torch.__version__}")
print("⚡ The gradient spirits await your command!")

## 📚 THE SACRED DATA GENERATION SCROLL

*Master Pai-Torch produces an ancient scroll covered in complex diagrams and mathematical symbols.*

**Master Pai-Torch**: "The authentication of sacred scrolls requires understanding multiple layers of complexity. Surface features like ink density and paper texture are but the beginning. True authentication depends on hidden relationships - the flow of meaning, the rhythm of brush strokes, the subtle harmonies that only genuine masters can create."

**Master Ao-Tougrad**: "These patterns exist in higher dimensions than the simple linear world you have known. Observe how each scroll contains multiple measurements - some obvious, others hidden, all interconnected through the Deep Mystery."

In [None]:
# 📜 THE SACRED SCROLL AUTHENTICATION DATA

def generate_scroll_authentication_data(n_scrolls: int = 1000, 
                                       complexity_level: float = 0.3,
                                       sacred_seed: int = 42) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Generate data for authenticating ancient temple scrolls.
    
    Ancient wisdom suggests that scroll authenticity depends on multiple factors:
    - Surface features: ink_density, paper_texture, age_marks
    - Hidden features: brush_flow, spiritual_resonance, master_signature_energy
    - Deep relationships: Non-linear interactions between these features
    
    Authentic scrolls follow complex patterns that cannot be captured by simple linear models.
    
    Args:
        n_scrolls: Number of scrolls to analyze
        complexity_level: Scales the label noise and the feature-interaction term (0.0 = none, 1.0 = strong)
        sacred_seed: Ensures consistent mystical randomness
        
    Returns:
        Tuple of (scroll_features, authenticity_labels)
        scroll_features: shape (n_scrolls, 6) - six measurements per scroll
        authenticity_labels: shape (n_scrolls, 1) - 1 for authentic, 0 for forgery
    """
    torch.manual_seed(sacred_seed)
    
    # Generate the six sacred measurements for each scroll
    # Surface features (these are somewhat predictable)
    ink_density = torch.rand(n_scrolls, 1) * 100  # 0-100 density units
    paper_texture = torch.rand(n_scrolls, 1) * 50  # 0-50 roughness units
    age_marks = torch.rand(n_scrolls, 1) * 20  # 0-20 age indicators
    
    # Hidden features (these contain the deeper mysteries)
    brush_flow = torch.rand(n_scrolls, 1) * 30  # 0-30 flow harmony units
    spiritual_resonance = torch.rand(n_scrolls, 1) * 40  # 0-40 spiritual energy
    master_signature = torch.rand(n_scrolls, 1) * 60  # 0-60 master's unique energy
    
    # Combine all features
    scroll_features = torch.cat([
        ink_density, paper_texture, age_marks,
        brush_flow, spiritual_resonance, master_signature
    ], dim=1)
    
    # The Sacred Formula for Authenticity (complex non-linear relationships)
    # This requires deep networks to learn properly!
    
    # First layer of hidden patterns
    surface_harmony = (ink_density.squeeze() * 0.3 + 
                      paper_texture.squeeze() * 0.4 + 
                      age_marks.squeeze() * 0.2)
    
    hidden_wisdom = (brush_flow.squeeze() * 0.5 + 
                    spiritual_resonance.squeeze() * 0.3 +
                    master_signature.squeeze() * 0.4)
    
    # Second layer: Non-linear interactions (this is why we need deep networks!)
    deep_pattern = torch.tanh(surface_harmony / 20) * torch.sigmoid(hidden_wisdom / 15)
    
    # Final authentication score with complexity
    base_score = (surface_harmony * 0.3 + hidden_wisdom * 0.4 + deep_pattern * 50)
    
    # Add complexity-based noise and non-linearity
    complexity_noise = torch.randn(n_scrolls) * complexity_level * base_score.std()
    final_score = base_score + complexity_noise
    
    # Add some additional non-linear terms to make it truly challenging
    interaction_term = (ink_density.squeeze() * brush_flow.squeeze() / 1000 + 
                       paper_texture.squeeze() * spiritual_resonance.squeeze() / 800)
    
    final_score = final_score + interaction_term * complexity_level * 10
    
    # Convert to binary authentication labels
    # Authentic scrolls have scores above the 60th percentile, so about 40% of scrolls are labeled authentic
    threshold = torch.quantile(final_score, 0.6)
    authenticity_labels = (final_score > threshold).float().unsqueeze(1)
    
    return scroll_features, authenticity_labels

def visualize_scroll_mysteries(features: torch.Tensor, labels: torch.Tensor):
    """
    Reveal the hidden patterns in the scroll authentication data.
    """
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    feature_names = ['Ink Density', 'Paper Texture', 'Age Marks', 
                    'Brush Flow', 'Spiritual Resonance', 'Master Signature']
    
    authentic_mask = labels.squeeze() == 1
    forgery_mask = labels.squeeze() == 0
    
    for i, (ax, feature_name) in enumerate(zip(axes.flat, feature_names)):
        # Plot authentic scrolls
        ax.hist(features[authentic_mask, i].numpy(), bins=20, alpha=0.7, 
                color='gold', label='Authentic Scrolls', density=True)
        
        # Plot forgeries
        ax.hist(features[forgery_mask, i].numpy(), bins=20, alpha=0.7, 
                color='red', label='Forgeries', density=True)
        
        ax.set_xlabel(feature_name)
        ax.set_ylabel('Density')
        ax.set_title(f'📜 {feature_name} Distribution')
        ax.legend()
        ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.suptitle('🔮 The Six Sacred Measurements of Scroll Authentication', 
                 fontsize=16, y=1.02)
    plt.show()
    
    # Show correlation mysteries
    plt.figure(figsize=(10, 8))
    correlation_matrix = torch.corrcoef(features.T)
    im = plt.imshow(correlation_matrix.numpy(), cmap='RdBu', vmin=-1, vmax=1)
    
    plt.colorbar(im, label='Correlation Strength')
    plt.xticks(range(6), feature_names, rotation=45)
    plt.yticks(range(6), feature_names)
    plt.title('🧙 The Hidden Connections Between Sacred Measurements')
    
    # Add correlation values as text
    for i in range(6):
        for j in range(6):
            plt.text(j, i, f'{correlation_matrix[i, j]:.2f}', 
                    ha='center', va='center', 
                    color='white' if abs(correlation_matrix[i, j]) > 0.5 else 'black')
    
    plt.tight_layout()
    plt.show()
    
    print(f"📊 Total scrolls analyzed: {len(features)}")
    print(f"✨ Authentic scrolls: {authentic_mask.sum().item()} ({authentic_mask.float().mean()*100:.1f}%)")
    print(f"🔴 Forgeries detected: {forgery_mask.sum().item()} ({forgery_mask.float().mean()*100:.1f}%)")
    print("\n🧙 Master Pai-Torch whispers: 'Notice how the patterns interweave... linear models cannot capture such complexity.'")

# Test the sacred data generation
print("🔮 Generating sacred scroll authentication data...")
scroll_features, authenticity_labels = generate_scroll_authentication_data(n_scrolls=800, complexity_level=0.3)

print(f"📜 Generated {len(scroll_features)} scroll measurements")
print(f"🎯 Feature shape: {scroll_features.shape}")
print(f"🎯 Label shape: {authenticity_labels.shape}")

visualize_scroll_mysteries(scroll_features, authenticity_labels)

## 🛡️ THE TEMPLE GUARDIAN'S NEURAL ARCHITECTURE

*Master Pai-Torch's expression grows serious as the forbidden knowledge chamber opens before you.*

**Master Pai-Torch**: "The linear arts you have mastered are but the foundation, young guardian. To authenticate these ancient scrolls, you must learn the Deep Architecture - networks with hidden layers that can perceive patterns invisible to simpler models."

**Master Ao-Tougrad**: "But beware the Curse of Overfitting! Without proper discipline, your deep network will memorize every scroll in the training chamber but fail completely when faced with new ones. The Sacred Dropout technique must be your constant companion."

*The masters gesture toward a complex training apparatus with multiple levels and protective barriers.*

**Master Pai-Torch**: "This is the Guardian's Trial. You must create a network with multiple hidden layers, each protected by the Dropout Shields. Only through this disciplined approach can you achieve true authentication wisdom."
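
To make the Dropout Shields less mysterious, here is a small standalone sketch of how `nn.Dropout` behaves in the two module modes. It uses a toy tensor of ones rather than the kata's network. In training mode each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.3)
x = torch.ones(1000)  # toy input, not scroll data

dropout.train()       # training mode: random masking is active
train_out = dropout(x)

dropout.eval()        # evaluation mode: dropout is the identity
eval_out = dropout(x)

kept = train_out != 0
print(f"fraction kept in train mode: {kept.float().mean().item():.2f}")
print(f"value of kept units: {train_out[kept][0].item():.4f}")  # scaled by 1 / (1 - 0.3)
print(f"eval output unchanged: {torch.equal(eval_out, x)}")
```

Because the layer switches behavior on its own, calling `model.train()` and `model.eval()` on the parent module is enough; no manual check of the mode is needed inside `forward`.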

In [None]:
# 🛡️ THE GUARDIAN'S DEEP NETWORK ARCHITECTURE

class ScrollAuthenticationGuardian(nn.Module):
    """
    A deep neural network guardian for authenticating ancient temple scrolls.
    
    This network uses multiple hidden layers to detect complex patterns,
    with dropout regularization to prevent overfitting.
    """
    
    def __init__(self, input_features: int = 6, dropout_rate: float = 0.3):
        super(ScrollAuthenticationGuardian, self).__init__()
        
        # TODO: Create the first hidden layer
        # Hint: Transform 6 input features to 128 hidden neurons
        # This layer learns to detect basic patterns in the six measurements
        self.hidden1 = None
        
        # TODO: Create the first dropout layer
        # Hint: Use nn.Dropout with the specified dropout_rate
        # This prevents overfitting by randomly "forgetting" some neurons during training
        self.dropout1 = None
        
        # TODO: Create the second hidden layer
        # Hint: Transform 128 features to 64 hidden neurons
        # This layer learns to combine basic patterns into more complex ones
        self.hidden2 = None
        
        # TODO: Create the second dropout layer
        # Hint: Use the same dropout_rate as the first layer
        self.dropout2 = None
        
        # TODO: Create the third hidden layer
        # Hint: Transform 64 features to 32 hidden neurons
        # This layer learns the most refined patterns
        self.hidden3 = None
        
        # TODO: Create the third dropout layer
        self.dropout3 = None
        
        # TODO: Create the output layer
        # Hint: Transform 32 features to 1 output (authenticity score)
        # This layer produces the final authentication decision
        self.output = None
        
    def forward(self, features: torch.Tensor) -> torch.Tensor:
        """
        Channel the scroll features through the deep authentication network.
        
        The network follows this pattern:
        Input → Hidden Layer 1 → ReLU → Dropout → Hidden Layer 2 → ReLU → Dropout → 
        Hidden Layer 3 → ReLU → Dropout → Output Layer → Sigmoid
        """
        # TODO: Pass input through first hidden layer and apply ReLU activation
        # Hint: Use F.relu() activation function
        x = None
        
        # TODO: Apply first dropout layer
        # Hint: Call the nn.Dropout layer directly; it is only active in training mode (model.train())
        x = None
        
        # TODO: Pass through second hidden layer with ReLU activation
        x = None
        
        # TODO: Apply second dropout layer
        x = None
        
        # TODO: Pass through third hidden layer with ReLU activation
        x = None
        
        # TODO: Apply third dropout layer
        x = None
        
        # TODO: Pass through output layer and apply sigmoid activation
        # Hint: Use torch.sigmoid() to get values between 0 and 1
        # This gives us the probability that the scroll is authentic
        output = None
        
        return output

def split_sacred_data(features: torch.Tensor, labels: torch.Tensor, 
                     train_ratio: float = 0.7, val_ratio: float = 0.2) -> Tuple:
    """
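    Split the sacred scroll data into training, validation, and test sets.
    
    This is crucial for the Guardian's training - we must test our model
    on scrolls it has never seen before! The test set receives whatever remains
    after the training and validation ratios (10% with the defaults). The scrolls
    are sliced in order, which is safe here because they were generated in random order.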
    """
    n_samples = len(features)
    
    # TODO: Calculate the split indices
    # Hint: train_end = int(n_samples * train_ratio)
    # Hint: val_end = int(n_samples * (train_ratio + val_ratio))
    train_end = None
    val_end = None
    
    # TODO: Split the features and labels
    # Hint: Use tensor slicing like features[:train_end]
    train_features = None
    train_labels = None
    
    val_features = None
    val_labels = None
    
    test_features = None
    test_labels = None
    
    return (train_features, train_labels, val_features, val_labels, test_features, test_labels)

def train_guardian(model: nn.Module, train_features: torch.Tensor, train_labels: torch.Tensor,
                  val_features: torch.Tensor, val_labels: torch.Tensor,
                  epochs: int = 1000, learning_rate: float = 0.001, 
                  patience: int = 50) -> Dict[str, List[float]]:
    """
    Train the Guardian network with proper validation and early stopping.
    
    Returns:
        Dictionary containing training history
    """
    # TODO: Choose the loss function
    # Hint: For binary classification, use nn.BCELoss() (Binary Cross Entropy)
    criterion = None
    
    # TODO: Choose the optimizer
    # Hint: Try optim.Adam() with the specified learning_rate
    # Adam is more sophisticated than SGD and works well with deep networks
    optimizer = None
    
    # Initialize tracking variables
    train_losses = []
    val_losses = []
    train_accuracies = []
    val_accuracies = []
    
    best_val_loss = float('inf')
    patience_counter = 0
    
    for epoch in range(epochs):
        # Training phase
        model.train()  # Enable dropout
        
        # TODO: Clear gradients
        # Hint: The gradient spirits must be banished before each training cycle
        
        # TODO: Forward pass
        train_predictions = None
        
        # TODO: Compute training loss
        train_loss = None
        
        # TODO: Backward pass
        
        # TODO: Update parameters
        
        # Validation phase
        model.eval()  # Disable dropout
        with torch.no_grad():
            val_predictions = model(val_features)
            val_loss = criterion(val_predictions, val_labels)
        
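        # Calculate accuracies (training accuracy is measured with dropout active,
        # so it can read slightly lower than validation accuracy)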
        train_acc = ((train_predictions > 0.5) == train_labels).float().mean()
        val_acc = ((val_predictions > 0.5) == val_labels).float().mean()
        
        # Store metrics
        train_losses.append(train_loss.item())
        val_losses.append(val_loss.item())
        train_accuracies.append(train_acc.item())
        val_accuracies.append(val_acc.item())
        
        # Early stopping check
        if val_loss.item() < best_val_loss:
            best_val_loss = val_loss.item()  # store a plain float, not a tensor
            patience_counter = 0
        else:
            patience_counter += 1
            
        if patience_counter >= patience:
            print(f"🛡️ Early stopping at epoch {epoch+1} - validation loss stopped improving")
            break
        
        # Report progress
        if (epoch + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{epochs}]')
            print(f'  Train Loss: {train_loss.item():.4f}, Train Acc: {train_acc:.4f}')
            print(f'  Val Loss: {val_loss.item():.4f}, Val Acc: {val_acc:.4f}')
            
            if val_acc > 0.85:
                print("  ✨ The Guardian grows stronger!")
            elif val_loss < train_loss * 1.1:
                print("  🛡️ Training remains disciplined!")
            else:
                print("  ⚠️ Beware the whispers of overfitting...")
    
    return {
        'train_losses': train_losses,
        'val_losses': val_losses,
        'train_accuracies': train_accuracies,
        'val_accuracies': val_accuracies
    }

## ⚡ THE TRIALS OF MASTERY

*Master Pai-Torch watches as you prepare for the Guardian's Trial, while Master Ao-Tougrad observes from the shadows.*

**Master Pai-Torch**: "The true test of a Guardian is not perfect performance on known scrolls, but wise discernment of unknown ones. Your network must learn the essence of authenticity, not mere memorization."

**Master Ao-Tougrad**: "The validation loss is your guide through the darkness. When it begins to rise while training loss falls, you approach the dangerous territory of overfitting. The wise Guardian knows when to stop."
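
Master Ao-Tougrad's rule can be sketched as a small patience counter. The `EarlyStopper` class below is a hypothetical helper, not part of the kata, and the loss values are simulated. It also keeps a copy of the best weights so they can be restored after stopping, which is an extra step this sketch adds as an assumption about what a Guardian might want.

```python
import copy

import torch
import torch.nn as nn


class EarlyStopper:
    """Hypothetical helper: tracks the best validation loss and the weights that produced it."""

    def __init__(self, patience: int = 50, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss: float, model: nn.Module) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(model.state_dict())
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run: validation loss improves for three epochs, then starts to rise
model = nn.Linear(6, 1)
stopper = EarlyStopper(patience=2)
simulated_val_losses = [0.70, 0.60, 0.55, 0.58, 0.61, 0.65]
stopped_at = None
for epoch, loss in enumerate(simulated_val_losses, start=1):
    if stopper.step(loss, model):
        stopped_at = epoch
        break

model.load_state_dict(stopper.best_state)  # restore the best checkpoint
print(f"stopped at epoch {stopped_at}, best val loss {stopper.best_loss:.2f}")
```

A larger `patience` tolerates noisy validation curves at the cost of a few wasted epochs; `min_delta` ignores improvements too small to matter.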

In [None]:
# ⚡ THE TRIALS OF MASTERY

## Trial 1: The Guardian's Architecture
# - [ ] Network has 3 hidden layers with proper dimensions (128, 64, 32)
# - [ ] Dropout layers are properly implemented with 0.3 dropout rate
# - [ ] Forward pass follows the correct architecture pattern
# - [ ] Output uses sigmoid activation for binary classification

## Trial 2: The Sacred Data Split
# - [ ] Data is properly split into train/validation/test sets (70%/20%/10%)
# - [ ] Validation set is used for early stopping
# - [ ] Test set remains untouched until final evaluation

## Trial 3: Training Discipline
# - [ ] Uses Adam optimizer with learning rate 0.001
# - [ ] Implements early stopping with patience=50
# - [ ] Validation accuracy reaches at least 80%
# - [ ] Gap between training and validation loss remains reasonable (<0.1)

def test_guardian_wisdom(model: nn.Module, test_features: torch.Tensor, test_labels: torch.Tensor):
    """
    The ultimate test of the Guardian's wisdom on completely unseen scrolls.
    """
    model.eval()
    with torch.no_grad():
        test_predictions = model(test_features)
        test_binary_predictions = (test_predictions > 0.5).float()
        
        # Calculate comprehensive metrics
        accuracy = (test_binary_predictions == test_labels).float().mean()
        
        # True/False Positives and Negatives
        true_positives = ((test_binary_predictions == 1) & (test_labels == 1)).sum().item()
        false_positives = ((test_binary_predictions == 1) & (test_labels == 0)).sum().item()
        true_negatives = ((test_binary_predictions == 0) & (test_labels == 0)).sum().item()
        false_negatives = ((test_binary_predictions == 0) & (test_labels == 1)).sum().item()
        
        # Precision and Recall
        precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
        recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
        f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        
        print("🏮 THE GUARDIAN'S FINAL EVALUATION 🏮")
        print(f"📊 Test Accuracy: {accuracy:.4f}")
        print(f"🎯 Precision: {precision:.4f} (When model says 'authentic', how often is it right?)")
        print(f"🔍 Recall: {recall:.4f} (Of all authentic scrolls, how many did we find?)")
        print(f"⚖️ F1 Score: {f1_score:.4f} (Balanced measure of performance)")
        
        print("\n📈 Detailed Results:")
        print(f"✅ True Positives: {true_positives} (Correctly identified authentic scrolls)")
        print(f"❌ False Positives: {false_positives} (Mistakenly identified forgeries as authentic)")
        print(f"✅ True Negatives: {true_negatives} (Correctly identified forgeries)")
        print(f"❌ False Negatives: {false_negatives} (Mistakenly identified authentic as forgeries)")
        
        # Guardian's Assessment
        if accuracy >= 0.85 and precision >= 0.80 and recall >= 0.80:
            print("\n🏆 Master Pai-Torch nods with deep approval:")
            print("    'You have achieved the wisdom of a true Guardian. Your network")
            print("     discerns authentic scrolls with both precision and recall.'")
        elif accuracy >= 0.80:
            print("\n🛡️ Master Pai-Torch speaks with measured approval:")
            print("    'Your Guardian skills are solid, but strive for greater balance")
            print("     between precision and recall in your future training.'")
        else:
            print("\n⚠️ Master Pai-Torch's expression grows serious:")
            print("    'More training is needed, young Guardian. The sacred scrolls")
            print("     require deeper understanding to authenticate properly.'")
        
        # Check for overfitting signs
        print("\n🔮 Master Ao-Tougrad's Overfitting Assessment:")
        if hasattr(model, 'training_history'):
            final_train_loss = model.training_history['train_losses'][-1]
            final_val_loss = model.training_history['val_losses'][-1]
            gap = final_val_loss - final_train_loss
            
            if gap < 0.05:
                print("    'Your training shows excellent discipline. The validation")
                print("     and training losses remain in harmony.'")
            elif gap < 0.15:
                print("    'Acceptable generalization. Some overfitting is present but")
                print("     within reasonable bounds.'")
            else:
                print("    'Beware! Significant overfitting detected. Your model memorizes")
                print("     rather than understands. Consider more dropout or early stopping.'")
        
        return {
            'accuracy': accuracy.item(),
            'precision': precision,
            'recall': recall,
            'f1_score': f1_score,
            'confusion_matrix': {
                'true_positives': true_positives,
                'false_positives': false_positives,
                'true_negatives': true_negatives,
                'false_negatives': false_negatives
            }
        }

def visualize_guardian_training(history: Dict[str, List[float]]):
    """
    Visualize the Guardian's training progress.
    """
    epochs = range(1, len(history['train_losses']) + 1)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot losses
    ax1.plot(epochs, history['train_losses'], 'b-', label='Training Loss', linewidth=2)
    ax1.plot(epochs, history['val_losses'], 'r-', label='Validation Loss', linewidth=2)
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('🛡️ Guardian Training: Loss Curves')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Plot accuracies
    ax2.plot(epochs, history['train_accuracies'], 'b-', label='Training Accuracy', linewidth=2)
    ax2.plot(epochs, history['val_accuracies'], 'r-', label='Validation Accuracy', linewidth=2)
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.set_title('🎯 Guardian Training: Accuracy Curves')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Check for overfitting
    final_train_loss = history['train_losses'][-1]
    final_val_loss = history['val_losses'][-1]
    
    if final_val_loss > final_train_loss * 1.2:
        print("⚠️ Master Ao-Tougrad observes: 'The validation loss diverges from training loss - overfitting may be present.'")
    else:
        print("✅ Master Ao-Tougrad nods approvingly: 'The training remains disciplined and well-generalized.'")

# Execute the Guardian's Trial
print("🛡️ Beginning the Guardian's Trial...")
print("📜 Preparing the sacred scroll data...")

# Split the data
train_features, train_labels, val_features, val_labels, test_features, test_labels = split_sacred_data(
    scroll_features, authenticity_labels
)

print(f"⚔️ Training set: {len(train_features)} scrolls")
print(f"🛡️ Validation set: {len(val_features)} scrolls")
print(f"🔍 Test set: {len(test_features)} scrolls")

# Create and train the guardian
guardian = ScrollAuthenticationGuardian(input_features=6, dropout_rate=0.3)
print(f"\n🏗️ Guardian architecture created with {sum(p.numel() for p in guardian.parameters())} parameters")

# Train the guardian
print("\n🏋️ Beginning Guardian training...")
training_history = train_guardian(guardian, train_features, train_labels, 
                                val_features, val_labels, 
                                epochs=1000, learning_rate=0.001, patience=50)

# Store training history for later analysis
guardian.training_history = training_history

# Visualize training progress
visualize_guardian_training(training_history)

# Final evaluation
final_results = test_guardian_wisdom(guardian, test_features, test_labels)

## 🌸 THE FOUR PATHS OF MASTERY: PROGRESSIVE EXTENSIONS

*With the basic Guardian skills mastered, the ancient masters reveal deeper mysteries that separate novice guardians from true masters.*

**Master Pai-Torch**: "Your foundation is solid, young Guardian, but true mastery requires exploring the deeper chambers of knowledge. Each path ahead will challenge you in new ways."

### Extension 1: Cook Oh-Pai-Timizer's Batch Recipe Mastery
**"Just as a master chef optimizes cooking for many portions, so must a Guardian optimize training for many scrolls!"**

*Cook Oh-Pai-Timizer appears with a massive cauldron and hundreds of ingredients.*

**Cook Oh-Pai-Timizer**: "Ah, young Guardian! I see you've learned to authenticate scrolls one by one, but what happens when thousands of scrolls arrive at once? In my kitchen, we don't cook each grain of rice individually - we batch them for efficiency!"

**NEW CONCEPTS**: Mini-batch training, batch normalization, training efficiency optimization  
**DIFFICULTY**: +15% (still Dan 2, but with advanced training techniques)
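
Before attempting the trial, it may help to see the two new ingredients on their own. The sketch below uses toy random tensors as a stand-in for the scroll data (an assumption for illustration). It shows how `DataLoader` carves a dataset into mini-batches and how `nn.BatchNorm1d` normalizes each feature with batch statistics in training mode and with running averages in evaluation mode.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.rand(100, 6) * 50           # toy stand-in for scroll features
labels = (torch.rand(100, 1) > 0.5).float()  # toy 0/1 labels

loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)
batch_sizes = [len(batch_features) for batch_features, _ in loader]
print(f"batch sizes in one epoch: {batch_sizes}")  # the last batch holds the remainder

batch_norm = nn.BatchNorm1d(6)
batch_norm.train()                           # normalize with the current batch's statistics
batch_features, _ = next(iter(loader))
normalized = batch_norm(batch_features).detach()
print(f"per-feature mean after BatchNorm: {normalized.mean(dim=0)}")
print(f"per-feature std after BatchNorm:  {normalized.std(dim=0, unbiased=False)}")

batch_norm.eval()                            # switch to the running averages
eval_out = batch_norm(features[:1])          # a single scroll is fine in eval mode
print(f"eval output shape: {tuple(eval_out.shape)}")
```

One practical caveat: in training mode `nn.BatchNorm1d` cannot normalize a batch that contains a single sample, so a batch size that leaves a remainder of one is worth avoiding (or `drop_last=True` can be passed to `DataLoader`).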

In [None]:
# 🍜 COOK OH-PAI-TIMIZER'S BATCH RECIPE MASTERY

def create_data_loader(features: torch.Tensor, labels: torch.Tensor, batch_size: int = 32, shuffle: bool = True):
    """
    Cook Oh-Pai-Timizer's recipe for serving data in perfect batches.
    
    Args:
        features: The scroll features tensor
        labels: The authenticity labels tensor
        batch_size: How many scrolls to process at once
        shuffle: Whether to mix up the order (like tossing a salad!)
    
    Returns:
        DataLoader that serves perfectly sized batches
    """
    # TODO: Create a TensorDataset combining features and labels
    # Hint: Use torch.utils.data.TensorDataset
    dataset = None
    
    # TODO: Create a DataLoader with the specified batch_size and shuffle
    # Hint: Use torch.utils.data.DataLoader
    dataloader = None
    
    return dataloader

class BatchNormalizedGuardian(nn.Module):
    """
    An enhanced Guardian that uses batch normalization for more stable training.
    """
    
    def __init__(self, input_features: int = 6, dropout_rate: float = 0.3):
        super(BatchNormalizedGuardian, self).__init__()
        
        self.hidden1 = nn.Linear(input_features, 128)
        # TODO: Add batch normalization after the first layer
        # Hint: Use nn.BatchNorm1d with 128 features
        self.bn1 = None
        self.dropout1 = nn.Dropout(dropout_rate)
        
        self.hidden2 = nn.Linear(128, 64)
        # TODO: Add batch normalization after the second layer
        self.bn2 = None
        self.dropout2 = nn.Dropout(dropout_rate)
        
        self.hidden3 = nn.Linear(64, 32)
        # TODO: Add batch normalization after the third layer
        self.bn3 = None
        self.dropout3 = nn.Dropout(dropout_rate)
        
        self.output = nn.Linear(32, 1)
        
    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # TODO: Implement forward pass with batch normalization
        # Pattern: Linear → BatchNorm → ReLU → Dropout
        # Hint: Apply batch normalization before the activation function
        
        x = self.hidden1(features)
        x = self.bn1(x)  # Normalize the batch
        x = F.relu(x)
        x = self.dropout1(x)
        
        # TODO: Continue the pattern for remaining layers
        x = None
        
        # Final output layer (no batch norm here)
        output = torch.sigmoid(self.output(x))
        return output

def train_with_batches(model: nn.Module, train_loader, val_loader, 
                      epochs: int = 100, learning_rate: float = 0.001):
    """
    Train the Guardian using Cook Oh-Pai-Timizer's batch cooking method.
    """
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    train_losses = []
    val_losses = []
    
    for epoch in range(epochs):
        # Training phase
        model.train()
        epoch_train_loss = 0
        
        for batch_features, batch_labels in train_loader:
            # One batch training step:
            # clear gradients, forward pass, backward pass, update parameters
            optimizer.zero_grad()
            predictions = model(batch_features)
            loss = criterion(predictions, batch_labels)
            loss.backward()
            optimizer.step()
            
            epoch_train_loss += loss.item()
        
        # Validation phase
        model.eval()
        epoch_val_loss = 0
        
        with torch.no_grad():
            for batch_features, batch_labels in val_loader:
                predictions = model(batch_features)
                loss = criterion(predictions, batch_labels)
                epoch_val_loss += loss.item()
        
        # Average losses
        avg_train_loss = epoch_train_loss / len(train_loader)
        avg_val_loss = epoch_val_loss / len(val_loader)
        
        train_losses.append(avg_train_loss)
        val_losses.append(avg_val_loss)
        
        if (epoch + 1) % 20 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}')
    
    return {'train_losses': train_losses, 'val_losses': val_losses}

# TRIAL: Implement batch training with batch normalization
print("🍜 Cook Oh-Pai-Timizer's Batch Training Trial")
print("TODO: Implement the batch training system and compare with full-batch training")

# SUCCESS: Mini-batch training converges faster and more stably than full-batch training
# MASTERY: Understanding how batch normalization stabilizes training

### Extension 2: He-Ao-World's Messy Archive Challenge
**"Oh dear! I'm afraid I've made quite the mess of the scroll archive..."**

*He-Ao-World shuffles over with an apologetic expression, surrounded by scrolls covered in tea stains and torn edges.*

**He-Ao-World**: "So sorry, young Guardian! While organizing the ancient archive, I had a little... accident. Some scrolls got damaged, others have missing measurements, and a few might have been mislabeled. The real world is never as clean as our training chambers, I'm afraid."

**Master Pai-Torch**: "This is actually a valuable lesson. Real authentication data is never perfect - you must learn to handle noise, missing values, and mislabeled examples."

**NEW CONCEPTS**: Noise robustness, missing data handling, label noise, data augmentation  
**DIFFICULTY**: +25% (still Dan 2, but with real-world data challenges)
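
The three kinds of damage He-Ao-World describes (measurement noise, missing values, flipped labels) can be sketched on a small toy tensor. This is only an illustration under stated assumptions, not the trial's reference solution: the helper name `corrupt_toy_batch`, the toy shapes, and the fixed seed are all invented for the example.

```python
import torch

def corrupt_toy_batch(features, labels, noise_level=0.2,
                      missing_rate=0.1, label_noise_rate=0.05, seed=0):
    """Hypothetical helper: return corrupted copies of features and labels."""
    gen = torch.Generator().manual_seed(seed)
    noisy = features.clone()
    flipped = labels.clone()

    # Gaussian noise, scaled per feature column by that column's std
    col_std = noisy.std(dim=0, keepdim=True)
    noisy = noisy + torch.randn(noisy.shape, generator=gen) * noise_level * col_std

    # "Missing" measurements: zero out a random subset of entries
    missing_mask = torch.rand(noisy.shape, generator=gen) < missing_rate
    noisy[missing_mask] = 0.0

    # Label noise: flip a random subset of binary labels (1 -> 0, 0 -> 1)
    flip_mask = torch.rand(flipped.shape, generator=gen) < label_noise_rate
    flipped[flip_mask] = 1.0 - flipped[flip_mask]

    return noisy, flipped

# Toy data standing in for scroll measurements (assumed shapes: 1000 x 6 and 1000 x 1)
features = torch.randn(1000, 6)
labels = (torch.rand(1000, 1) > 0.5).float()
messy_features, messy_labels = corrupt_toy_batch(features, labels)

print("fraction zeroed: ", (messy_features == 0).float().mean().item())
print("fraction flipped:", (messy_labels != labels).float().mean().item())
```

Working on clones keeps the clean tensors intact, so the same data can later serve as the clean test set for comparison.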

In [None]:
# 🧹 HE-AO-WORLD'S MESSY ARCHIVE CHALLENGE

def create_messy_archive_data(features: torch.Tensor, labels: torch.Tensor, 
                            noise_level: float = 0.2, missing_rate: float = 0.1, 
                            label_noise_rate: float = 0.05) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    He-Ao-World's "accidental" data corruption that makes training more realistic.
    
    Args:
        features: Original clean scroll features
        labels: Original clean authenticity labels
        noise_level: How much measurement noise to add (0.0 = clean, 1.0 = very noisy)
        missing_rate: Fraction of measurements that are "missing" (set to 0)
        label_noise_rate: Fraction of labels that are incorrectly flipped
    
    Returns:
        Tuple of (messy_features, messy_labels)
    """
    messy_features = features.clone()
    messy_labels = labels.clone()
    
    # TODO: Add measurement noise to features
    # Hint: Add Gaussian noise scaled by noise_level and each feature's standard deviation
    # Example: messy_features += torch.randn_like(messy_features) * noise_level * messy_features.std(dim=0, keepdim=True)
    
    # TODO: Simulate missing data by randomly setting some features to 0
    # Hint: Create a random mask and set those positions to 0
    # Example: missing_mask = torch.rand_like(messy_features) < missing_rate
    
    # TODO: Add label noise by randomly flipping some labels
    # Hint: Create a random mask and flip labels (1 becomes 0, 0 becomes 1)
    
    return messy_features, messy_labels

class RobustGuardian(nn.Module):
    """
    A Guardian trained to handle messy, real-world data.
    """
    
    def __init__(self, input_features: int = 6, dropout_rate: float = 0.4):
        super(RobustGuardian, self).__init__()
        
        # TODO: Design a more robust architecture
        # Hint: Use higher dropout rate and consider skip connections
        # Hint: Skip connections help with training stability
        
        self.hidden1 = nn.Linear(input_features, 128)
        self.dropout1 = nn.Dropout(dropout_rate)
        
        self.hidden2 = nn.Linear(128, 128)  # Same size for skip connection
        self.dropout2 = nn.Dropout(dropout_rate)
        
        self.hidden3 = nn.Linear(128, 64)
        self.dropout3 = nn.Dropout(dropout_rate)
        
        self.output = nn.Linear(64, 1)
        
    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # TODO: Implement forward pass with skip connection
        # Hint: Save the output of hidden1, then add it to the output of hidden2
        
        x1 = F.relu(self.hidden1(features))
        x1 = self.dropout1(x1)
        
        x2 = F.relu(self.hidden2(x1))
        x2 = self.dropout2(x2)
        
        # Skip connection: add the input to this layer's output
        x2 = x2 + x1  # This helps with gradient flow and training stability
        
        x3 = F.relu(self.hidden3(x2))
        x3 = self.dropout3(x3)
        
        output = torch.sigmoid(self.output(x3))
        return output

def train_robust_guardian(clean_features: torch.Tensor, clean_labels: torch.Tensor):
    """
    Train a Guardian that can handle He-Ao-World's messy data.
    """
    print("🧹 He-Ao-World apologizes: 'Let me show you how messy real data can be...'")
    
    # Create messy training data
    messy_features, messy_labels = create_messy_archive_data(
        clean_features, clean_labels, 
        noise_level=0.2, missing_rate=0.1, label_noise_rate=0.05
    )
    
    # TODO: Compare clean vs messy data visually
    # TODO: Train RobustGuardian on messy data
    # TODO: Test on both clean and messy test sets
    
    print("TODO: Implement robust training with noisy data")
    
# TRIAL: Train on messy data and test on both clean and messy test sets
print("🧹 He-Ao-World's Messy Archive Challenge")
print("TODO: Compare performance on clean vs messy data")

# SUCCESS: Robust model maintains >75% accuracy even on messy data
# MASTERY: Understanding the trade-off between robustness and accuracy

### Extension 3: Master Pai-Torch's Learning Rate Wisdom
**"The path to enlightenment is not walked at constant speed, young Guardian."**

*Master Pai-Torch sits in deep meditation, surrounded by floating symbols that pulse with varying intensity.*

**Master Pai-Torch**: "Observe the rhythm of your heartbeat, the cycle of seasons, the ebb and flow of tides. All wisdom follows patterns of change. So too must your learning rate adapt to the journey of training."

**Master Pai-Torch**: "In the beginning, bold steps are needed to escape local valleys. As wisdom grows, gentler steps preserve the knowledge gained. The master knows when to leap and when to tread carefully."

**NEW CONCEPTS**: Learning rate scheduling, adaptive learning rates, cyclical training, warm restarts  
**DIFFICULTY**: +35% (still Dan 2, but with advanced optimization techniques)
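
To see how the schedules named in this trial differ before training anything, one can step each scheduler on a dummy parameter and record the learning rate per epoch. The sketch below is a minimal illustration: the helper name `lr_trajectory`, the decay factors (`gamma=0.1` for step, `gamma=0.97` for exponential), and the 100-epoch horizon are assumptions chosen for the example, not values prescribed by the kata.

```python
import torch
from torch import nn, optim
from torch.optim import lr_scheduler

def lr_trajectory(make_scheduler, base_lr=0.01, epochs=100):
    """Hypothetical helper: record the learning rate used in each epoch."""
    param = nn.Parameter(torch.zeros(1))      # dummy parameter, nothing is trained
    optimizer = optim.SGD([param], lr=base_lr)
    scheduler = make_scheduler(optimizer)
    rates = []
    for _ in range(epochs):
        rates.append(optimizer.param_groups[0]["lr"])  # rate in effect this epoch
        optimizer.step()                               # optimizer steps before scheduler
        scheduler.step()
    return rates

schedules = {
    "cosine": lambda opt: lr_scheduler.CosineAnnealingLR(opt, T_max=100),
    "step": lambda opt: lr_scheduler.StepLR(opt, step_size=30, gamma=0.1),
    "exponential": lambda opt: lr_scheduler.ExponentialLR(opt, gamma=0.97),
}
trajectories = {name: lr_trajectory(make) for name, make in schedules.items()}

for name, rates in trajectories.items():
    print(f"{name:12s} start={rates[0]:.5f} mid={rates[50]:.5f} end={rates[-1]:.6f}")
```

Cosine annealing decays smoothly toward zero, the step schedule drops by a factor of ten every 30 epochs, and exponential decay shrinks the rate by a constant ratio each epoch.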

In [None]:
# 🧙 MASTER PAI-TORCH'S LEARNING RATE WISDOM

def create_learning_rate_scheduler(optimizer, schedule_type: str = 'cosine', 
                                 warmup_epochs: int = 10, max_epochs: int = 100):
    """
    Master Pai-Torch's wisdom on adapting learning rates during training.
    
    Args:
        optimizer: The optimizer to schedule
        schedule_type: Type of scheduling ('cosine', 'step', 'exponential')
        warmup_epochs: Number of epochs to warm up the learning rate (unused by the schedulers hinted below unless you add a warmup phase)
        max_epochs: Total number of training epochs
    
    Returns:
        Learning rate scheduler
    """
    if schedule_type == 'cosine':
        # TODO: Implement cosine annealing scheduler
        # Hint: Use torch.optim.lr_scheduler.CosineAnnealingLR
        scheduler = None
    elif schedule_type == 'step':
        # TODO: Implement step scheduler (reduce LR every 30 epochs)
        # Hint: Use torch.optim.lr_scheduler.StepLR
        scheduler = None
    elif schedule_type == 'exponential':
        # TODO: Implement exponential decay scheduler
        # Hint: Use torch.optim.lr_scheduler.ExponentialLR
        scheduler = None
    else:
        raise ValueError(f"Unknown schedule type: {schedule_type}")
    
    return scheduler

def train_with_adaptive_learning(model: nn.Module, train_features: torch.Tensor, 
                               train_labels: torch.Tensor, val_features: torch.Tensor, 
                               val_labels: torch.Tensor, schedule_type: str = 'cosine'):
    """
    Train the Guardian using Master Pai-Torch's adaptive learning rate wisdom.
    """
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)  # Start with higher learning rate
    
    scheduler = create_learning_rate_scheduler(optimizer, schedule_type, 
                                             warmup_epochs=20, max_epochs=200)
    
    train_losses = []
    val_losses = []
    learning_rates = []
    
    for epoch in range(200):
        # Training phase
        model.train()
        optimizer.zero_grad()
        
        predictions = model(train_features)
        train_loss = criterion(predictions, train_labels)
        train_loss.backward()
        optimizer.step()
        
        # Validation phase
        model.eval()
        with torch.no_grad():
            val_predictions = model(val_features)
            val_loss = criterion(val_predictions, val_labels)
        
        # Record metrics (log the learning rate that was actually used this epoch)
        train_losses.append(train_loss.item())
        val_losses.append(val_loss.item())
        learning_rates.append(optimizer.param_groups[0]['lr'])
        
        # Update learning rate for the next epoch
        scheduler.step()
        
        if (epoch + 1) % 40 == 0:
            current_lr = learning_rates[-1]
            print(f'Epoch [{epoch+1}/200], Train Loss: {train_loss.item():.4f}, '
                  f'Val Loss: {val_loss.item():.4f}, LR: {current_lr:.6f}')
    
    return {
        'train_losses': train_losses,
        'val_losses': val_losses,
        'learning_rates': learning_rates
    }

def visualize_learning_rate_effect(histories: Dict[str, Dict]):
    """
    Visualize the effect of different learning rate schedules.
    """
    # Two panels side by side: learning rate schedules and validation losses
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    
    # Plot learning rates
    for schedule_name, history in histories.items():
        epochs = range(len(history['learning_rates']))
        axes[0].plot(epochs, history['learning_rates'], label=schedule_name, linewidth=2)
    
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Learning Rate')
    axes[0].set_title('🧙 Learning Rate Schedules')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Plot validation losses
    for schedule_name, history in histories.items():
        epochs = range(len(history['val_losses']))
        axes[1].plot(epochs, history['val_losses'], label=schedule_name, linewidth=2)
    
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Validation Loss')
    axes[1].set_title('🎯 Validation Loss Comparison')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# TODO: Implement comparison of different learning rate schedules
print("🧙 Master Pai-Torch's Learning Rate Wisdom")
print("TODO: Compare cosine, step, and exponential learning rate schedules")

# TRIAL: Compare different learning rate schedules
# SUCCESS: Understand how different schedules affect convergence speed and final performance
# MASTERY: Choose the right schedule based on the problem characteristics

### Extension 4: Master Ao-Tougrad's Gradient Flow Mastery
**"The deepest networks require the most careful cultivation of gradient flow."**

*Master Ao-Tougrad emerges from the shadows, ethereal gradients flowing around their form like mystical rivers.*

**Master Ao-Tougrad**: "You have learned to build deep networks, young Guardian, but do you understand the sacred flow of gradients through these depths? As networks grow deeper, the gradient spirits grow weaker, sometimes vanishing entirely before reaching the early layers."

**Master Ao-Tougrad**: "The ancient masters discovered techniques to preserve gradient strength: the Residual Paths, the Gradient Clipping, and the careful Weight Initialization. Master these arts, and you shall build networks deeper than the temple itself."

**NEW CONCEPTS**: Gradient clipping, residual connections, weight initialization, gradient flow analysis  
**DIFFICULTY**: +45% (still Dan 2, but approaching Dan 3 complexity)
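
Of the techniques Master Ao-Tougrad names, gradient clipping is the easiest to observe in isolation. The sketch below is a minimal illustration on a toy two-layer network: the exaggerated input scale, the helper `total_grad_norm`, and the choice of `max_norm = 1.0` are assumptions made for the example. It relies on `torch.nn.utils.clip_grad_norm_` rescaling gradients in place and returning the total norm measured before clipping.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))

# Exaggerated inputs (assumption) so the gradients come out large
inputs = torch.randn(16, 6) * 100.0
targets = torch.randn(16, 1)

loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

def total_grad_norm(module):
    """L2 norm over all parameter gradients, treated as one long vector."""
    grads = [p.grad.flatten() for p in module.parameters() if p.grad is not None]
    return torch.cat(grads).norm().item()

max_norm = 1.0
# Returns the norm BEFORE clipping; gradients are rescaled in place
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm).item()
norm_after = total_grad_norm(model)

print(f"norm before clipping: {norm_before:.2f}, after: {norm_after:.4f}")
```

Clipping preserves the direction of the gradient vector and only shrinks its length, which is why it tames exploding updates without changing where the optimizer is heading.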

In [None]:
# ⚡ MASTER AO-TOUGRAD'S GRADIENT FLOW MASTERY

class DeepResidualGuardian(nn.Module):
    """
    A very deep Guardian network with residual connections to preserve gradient flow.
    """
    
    def __init__(self, input_features: int = 6, dropout_rate: float = 0.3):
        super(DeepResidualGuardian, self).__init__()
        
        # Input projection
        self.input_proj = nn.Linear(input_features, 128)
        
        # TODO: Create multiple residual blocks
        # Each block should have: Linear -> BatchNorm -> ReLU -> Dropout -> Linear
        # with a skip connection around the entire block
        
        self.res_block1_1 = nn.Linear(128, 128)
        self.res_block1_2 = nn.Linear(128, 128)
        self.bn1 = nn.BatchNorm1d(128)
        self.dropout1 = nn.Dropout(dropout_rate)
        
        # TODO: Add more residual blocks
        self.res_block2_1 = nn.Linear(128, 128)
        self.res_block2_2 = nn.Linear(128, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.dropout2 = nn.Dropout(dropout_rate)
        
        # Output layers
        self.output_proj = nn.Linear(128, 64)
        self.output = nn.Linear(64, 1)
        
        # TODO: Initialize weights using Xavier/He initialization
        self._initialize_weights()
    
    def _initialize_weights(self):
        """
        Master Ao-Tougrad's wisdom on proper weight initialization.
        """
        for module in self.modules():
            if isinstance(module, nn.Linear):
                # TODO: Initialize weights with Xavier/He initialization
                # Hint: Use nn.init.xavier_uniform_ or nn.init.kaiming_uniform_
                #       (He/Kaiming initialization is designed for ReLU activations)
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0)
    
    def forward(self, features: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.input_proj(features))
        
        # TODO: Implement residual block 1
        # Pattern: identity = x, then transform x, then add identity back
        identity1 = x
        x = self.res_block1_1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.res_block1_2(x)
        x = x + identity1  # Skip connection!
        x = F.relu(x)
        
        # TODO: Implement residual block 2
        identity2 = x
        # TODO: Complete the second residual block
        
        # Output
        x = F.relu(self.output_proj(x))
        output = torch.sigmoid(self.output(x))
        
        return output

def train_with_gradient_clipping(model: nn.Module, train_features: torch.Tensor, 
                               train_labels: torch.Tensor, val_features: torch.Tensor, 
                               val_labels: torch.Tensor, clip_value: float = 1.0):
    """
    Train with gradient clipping to prevent gradient explosion.
    """
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    train_losses = []
    val_losses = []
    gradient_norms = []
    
    for epoch in range(200):
        model.train()
        optimizer.zero_grad()
        
        predictions = model(train_features)
        train_loss = criterion(predictions, train_labels)
        train_loss.backward()
        
        # TODO: Calculate gradient norm before clipping
        # Hint: torch.nn.utils.clip_grad_norm_ rescales gradients in place and
        #       returns the total norm measured before clipping
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
        gradient_norms.append(grad_norm.item())
        
        optimizer.step()
        
        # Validation
        model.eval()
        with torch.no_grad():
            val_predictions = model(val_features)
            val_loss = criterion(val_predictions, val_labels)
        
        train_losses.append(train_loss.item())
        val_losses.append(val_loss.item())
        
        if (epoch + 1) % 50 == 0:
            print(f'Epoch [{epoch+1}/200], Train Loss: {train_loss.item():.4f}, '
                  f'Val Loss: {val_loss.item():.4f}, Grad Norm: {grad_norm:.4f}')
    
    return {
        'train_losses': train_losses,
        'val_losses': val_losses,
        'gradient_norms': gradient_norms
    }

def analyze_gradient_flow(model: nn.Module, sample_input: torch.Tensor):
    """
    Analyze how gradients flow through the deep network.
    """
    model.eval()
    
    # TODO: Perform forward pass and compute gradients
    # TODO: Analyze gradient magnitudes at different layers
    # TODO: Visualize gradient flow
    
    print("🔍 Master Ao-Tougrad's Gradient Flow Analysis")
    print("TODO: Implement gradient flow analysis")
    
    # Show which layers have strong vs weak gradients
    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_magnitude = param.grad.abs().mean().item()
            print(f"{name}: Gradient magnitude = {grad_magnitude:.6f}")

# TODO: Implement deep residual network training
print("⚡ Master Ao-Tougrad's Gradient Flow Mastery")
print("TODO: Compare shallow vs deep networks, with and without residual connections")

# TRIAL: Train very deep networks (6+ layers) with and without residual connections
# SUCCESS: Residual connections enable training of much deeper networks
# MASTERY: Understanding gradient flow and why deep networks are hard to train

## 🔥 CORRECTING YOUR FORM: A STANCE IMBALANCE

*Master Pai-Torch observes your training ritual with a careful eye, while Master Ao-Tougrad materializes from the shadows with a knowing expression.*

**Master Pai-Torch**: "Your eager mind races ahead of your disciplined form, grasshopper. I sense a disturbance in your validation discipline - your network memorizes when it should generalize."

**Master Ao-Tougrad**: "The gradient flow speaks of poor technique. Without proper regularization, even the most skilled Guardian falls to the curse of overfitting. Observe this flawed training ritual left by a previous disciple."

*The masters gesture toward a complex training apparatus that pulses with unstable energy.*

**Master Pai-Torch**: "Can you restore balance to this chaotic training? The errors are subtle but deadly - your form needs correction."
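
Before inspecting the ritual, it may help to recall how PyTorch treats gradients across repeated backward passes: they are added to whatever is already stored in `.grad` unless it is cleared first. The sketch below is a tiny stand-alone illustration with a single made-up weight and a made-up loss, not a piece of the flawed ritual itself.

```python
import torch

weight = torch.tensor([2.0], requires_grad=True)

def loss_fn(w):
    return (w ** 2).sum()        # d(loss)/dw = 2w, which is 4.0 at w = 2

# First backward pass: .grad holds the true gradient
loss_fn(weight).backward()
first_grad = weight.grad.clone()

# Second backward pass WITHOUT clearing: the new gradient is added to the old one
loss_fn(weight).backward()
accumulated_grad = weight.grad.clone()

# Clearing first restores the true gradient
weight.grad.zero_()
loss_fn(weight).backward()
fresh_grad = weight.grad.clone()

print(first_grad.item(), accumulated_grad.item(), fresh_grad.item())  # 4.0 8.0 4.0
```

In a training loop, `optimizer.zero_grad()` plays the role of `weight.grad.zero_()` for every parameter the optimizer manages.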

In [None]:
# 🔥 CORRECTING YOUR FORM: A STANCE IMBALANCE

class FlawedGuardian(nn.Module):
    """
    A Guardian network with subtle but critical flaws in its training discipline.
    Can you identify and correct the stance imbalances?
    """
    
    def __init__(self, input_features: int = 6):
        super(FlawedGuardian, self).__init__()
        
        # The architecture looks correct...
        self.hidden1 = nn.Linear(input_features, 256)  # Suspiciously large for this problem
        self.hidden2 = nn.Linear(256, 256)
        self.hidden3 = nn.Linear(256, 256)
        self.hidden4 = nn.Linear(256, 128)
        self.hidden5 = nn.Linear(128, 64)
        self.output = nn.Linear(64, 1)
        
        # No dropout layers - could this be the issue?
        
    def forward(self, features: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.hidden1(features))
        x = F.relu(self.hidden2(x))
        x = F.relu(self.hidden3(x))
        x = F.relu(self.hidden4(x))
        x = F.relu(self.hidden5(x))
        output = torch.sigmoid(self.output(x))
        return output

def flawed_training_ritual(model: nn.Module, features: torch.Tensor, labels: torch.Tensor):
    """
    This training ritual has lost its balance - your form needs correction! 🥋
    
    Multiple critical errors lurk within this seemingly correct training loop.
    """
    criterion = nn.BCELoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1)  # Learning rate seems high...
    
    # No train/validation split - training on all data!
    train_losses = []
    
    for epoch in range(2000):  # Very long training...
        model.train()
        
        # Forward pass
        predictions = model(features)
        loss = criterion(predictions, labels)
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        # No gradient clearing between epochs!
        
        train_losses.append(loss.item())
        
        if epoch % 200 == 0:
            accuracy = ((predictions > 0.5) == labels).float().mean()
            print(f'Epoch {epoch}: Loss = {loss.item():.4f}, Accuracy = {accuracy:.4f}')
            
            if accuracy > 0.95:  # Suspiciously high accuracy
                print("🎉 Perfect accuracy achieved! Training complete!")
                break
    
    return model, train_losses

def test_flawed_guardian():
    """
    Test the flawed Guardian and reveal the problems.
    """
    print("🔥 Testing the Flawed Guardian's Training Ritual...")
    print("\n⚠️ WARNING: This code contains multiple critical errors!")
    
    # Create model and data
    flawed_model = FlawedGuardian()
    features, labels = generate_scroll_authentication_data(n_scrolls=500, complexity_level=0.2)
    
    # Train with flawed method
    trained_model, losses = flawed_training_ritual(flawed_model, features, labels)
    
    # Test on the SAME data (another error!)
    trained_model.eval()
    with torch.no_grad():
        test_predictions = trained_model(features)
        test_accuracy = ((test_predictions > 0.5) == labels).float().mean()
    
    print(f"\n📊 Final Training Accuracy: {test_accuracy:.4f}")
    print(f"📊 Final Training Loss: {losses[-1]:.4f}")
    
    # Now test on NEW data (the real test)
    new_features, new_labels = generate_scroll_authentication_data(n_scrolls=200, complexity_level=0.2, sacred_seed=123)
    
    with torch.no_grad():
        new_predictions = trained_model(new_features)
        new_accuracy = ((new_predictions > 0.5) == new_labels).float().mean()
    
    print(f"\n🚨 REAL TEST (New Data) Accuracy: {new_accuracy:.4f}")
    print(f"📉 Performance Drop: {(test_accuracy - new_accuracy):.4f}")
    
    if new_accuracy < test_accuracy * 0.8:
        print("\n💥 CRITICAL OVERFITTING DETECTED!")
        print("🧙 Master Pai-Torch speaks: 'Your network memorizes, it does not understand!'")
        print("⚡ Master Ao-Tougrad whispers: 'The gradient discipline has been abandoned...'")
    
    return trained_model, losses

# DEBUGGING CHALLENGE: Can you identify ALL the critical errors?
print("🔍 DEBUGGING CHALLENGE: Find and fix the flawed training ritual!")
print("\nErrors to identify:")
print("1. Missing gradient clearing (optimizer.zero_grad())")
print("2. No train/validation split")
print("3. Testing on training data")
print("4. Network too large for the problem (overfitting)")
print("5. No regularization (dropout)")
print("6. No early stopping")
print("7. Learning rate too high")
print("8. Training too long without validation")

# MASTER'S WISDOM: "The undisciplined mind accumulates old thoughts,
# just as the untrained gradient accumulates old directions."

# HINT: The most critical error causes gradient accumulation across epochs!
# HINT: Perfect training accuracy with poor test accuracy signals overfitting!
# HINT: A true Guardian tests on data they've never seen before!

# Run the flawed training to see the problems
test_flawed_guardian()

## 🏆 THE GUARDIAN'S MASTERY ACHIEVED

*As your training completes, both Master Pai-Torch and Master Ao-Tougrad approach with expressions of deep approval.*

**Master Pai-Torch**: "Rise, Guardian. You have learned to balance the complexity of deep networks with the wisdom of regularization. Your models now generalize rather than memorize, a distinction that separates true masters from mere practitioners."

**Master Ao-Tougrad**: "The gradient flow bends to your will, yet you have learned restraint. Dropout shields your networks from overfitting, while validation guides your training with wisdom. You understand that the path to mastery lies not in perfect training accuracy, but in the harmony between learning and generalization."

**Master Pai-Torch**: "You have earned the title of Temple Guardian. Your next challenge awaits in the weapons chamber, where specialized architectures forge tools for specific battles. But that is a trial for another day."

*The masters bow deeply as the sacred scroll of mastery appears before you, glowing with the wisdom of the deep networks.*

### 🎓 Sacred Knowledge Gained:
- **Deep Architecture Mastery**: Multi-layer networks with proper structure
- **Regularization Wisdom**: Dropout and early stopping to prevent overfitting
- **Validation Discipline**: Proper train/validation/test splits
- **Optimization Arts**: Adam optimizer and learning rate scheduling
- **Gradient Flow Understanding**: Residual connections and gradient clipping
- **Real-World Robustness**: Handling noisy and imperfect data
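
Two of the disciplines listed above, the train/validation/test split and early stopping, can be sketched together in a few lines. This is a minimal illustration under assumptions: the `EarlyStopper` class, the 70/15/15 split sizes, and the simulated validation curve are all hypothetical and stand in for a real training run.

```python
import torch
from torch.utils.data import TensorDataset, random_split

class EarlyStopper:
    """Hypothetical helper: signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss    # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1         # no improvement this epoch
        return self.bad_epochs >= self.patience

# 70/15/15 split of a toy dataset (sizes are an assumption)
dataset = TensorDataset(torch.randn(200, 6), torch.randint(0, 2, (200, 1)).float())
train_set, val_set, test_set = random_split(
    dataset, [140, 30, 30], generator=torch.Generator().manual_seed(42)
)

# Simulated validation losses standing in for a real training loop
stopper = EarlyStopper(patience=3)
val_curve = [0.70, 0.55, 0.48, 0.47, 0.49, 0.50, 0.52, 0.60]
stopped_at = None
for epoch, val_loss in enumerate(val_curve):
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best val loss {stopper.best_loss:.2f}")
```

The test split is never consulted while deciding when to stop; it is held back for one final, honest measurement.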

**The Guardian's Oath**: *"I swear to validate before I deploy, to regularize before I optimize, and to generalize beyond mere memorization. May my networks serve the temple with wisdom and restraint."*

---

**🌟 Next Quest**: Dan 3 - The Weapon Master awaits to teach you the specialized architectures: CNNs, RNNs, and the mystical attention mechanisms that can perceive patterns across space and time...