[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ruliana/pytorch-katas/blob/main/dan_1/kata_01_temple_cat_feeding_predictor_unrevised.ipynb)

## 🏮 The Ancient Scroll Unfurls 🏮

**THE MYSTERIES OF SUKI'S APPETITE: A LINEAR REVELATION**

Dan Level: 1 (Temple Sweeper) | Time: 45 minutes | Sacred Arts: Linear Regression, Gradient Descent, Loss Functions

## 📜 THE CHALLENGE

Master Pai-Torch sits in contemplative silence beside the temple's sacred feeding bowl, watching Suki the temple cat's daily rituals. "Young grasshopper," the master begins, "observe how Suki appears at this bowl with mysterious precision. Yet beneath this feline mystery lies a pattern as ancient as the mountains themselves. The wise temple keepers have long known that a cat's hunger grows steadily with each passing hour, following a sacred mathematical harmony."

"Your first trial as Temple Sweeper is to decode this sacred relationship," Master Pai-Torch continues, stroking their chin thoughtfully. "Through the mystical arts of linear regression, you must learn to predict when Suki's hunger will compel her to appear at the feeding bowl. Master this simple relationship, and you will have taken your first step toward understanding the deeper mysteries of neural networks. But beware - even the most basic patterns require disciplined practice to master."

## 🎯 THE SACRED OBJECTIVES

- [ ] **Tensor Mastery**: Create and manipulate PyTorch tensors for cat feeding data
- [ ] **Linear Wisdom**: Implement a neural network with a single linear layer
- [ ] **Gradient Discipline**: Master the sacred training loop with proper gradient management
- [ ] **Loss Understanding**: Use Mean Squared Error to measure prediction accuracy
- [ ] **Pattern Recognition**: Discover the hidden relationship between time and hunger
- [ ] **Convergence Patience**: Train until your model achieves temple-worthy accuracy
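
Once a line has been fitted, predicting when Suki appears is the threshold equation solved for hours. The sketch below assumes the relationship and threshold stated later in this kata (`hunger = 2.5 * hours + 20`, threshold 70). `hours_until_feeding` is a hypothetical helper for illustration, not part of the kata.

```python
def hours_until_feeding(weight: float, bias: float, threshold: float = 70.0) -> float:
    """Solve threshold = weight * hours + bias for hours."""
    if weight <= 0:
        raise ValueError("hunger must grow with time for the threshold to be reached")
    return (threshold - bias) / weight

# With the kata's stated relationship: (70 - 20) / 2.5
hours = hours_until_feeding(weight=2.5, bias=20.0)
print(f"Suki should appear about {hours:.1f} hours after her last meal")
```

With a trained model, the learned weight and bias would take the place of 2.5 and 20.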

In [69]:
# 📦 ALL IMPORTS AND CONFIGURATION
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
from typing import Tuple

# Global configuration constants
DEFAULT_CHAOS_LEVEL = 0.1
FEEDING_THRESHOLD = 70  # Hunger level at which Suki appears

print("🏮 The Temple of Neural Networks welcomes you, Grasshopper!")
print(f"PyTorch version: {torch.__version__}")
print("🐱 Suki stirs from her afternoon nap, sensing the approach of learning...")

🏮 The Temple of Neural Networks welcomes you, Grasshopper!
PyTorch version: 2.6.0+cu124
🐱 Suki stirs from her afternoon nap, sensing the approach of learning...


## 🐱 THE SACRED DATA GENERATION SCROLL

*Master Pai-Torch gestures toward the feeding bowl*

"Before you can understand the cat, you must first understand the data."

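The data cell that follows returns both tensors with shape `(n, 1)`: one row per observation, one column per feature. The sketch below uses toy values rather than the kata's data. It shows how `squeeze` and `unsqueeze` move between `(n, 1)` and `(n,)`, and why mixing the two shapes in arithmetic quietly broadcasts to an `(n, n)` matrix.

```python
import torch

n = 4
hours = torch.rand(n, 1) * 30                 # shape (4, 1): one feature column
hunger_flat = 2.5 * hours.squeeze(1) + 20     # shape (4,): column dimension dropped
hunger = hunger_flat.unsqueeze(1)             # shape (4, 1): column dimension restored

# Mixing (4, 1) with (4,) broadcasts to (4, 4) instead of raising an error
wrong = hunger - hunger_flat
# Matching shapes gives the intended element-wise difference
right = hunger - hunger_flat.unsqueeze(1)

print(wrong.shape, right.shape)
```

Printing `.shape` before computing a loss or residual is a cheap way to catch this kind of mismatch.
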
In [None]:
def generate_cat_feeding_data(n_observations: int = 100, chaos_level: float = DEFAULT_CHAOS_LEVEL,
                              sacred_seed: int = 42) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Generate observations of Suki's feeding patterns.

    Ancient wisdom suggests: hunger_level = 2.5 * hours_since_last_meal + 20
    When hunger_level > 70, Suki appears at the food bowl.

    Args:
        n_observations: Number of Suki sightings to simulate
        chaos_level: Amount of feline unpredictability (0.0 = perfectly predictable cat, 1.0 = pure chaos)
        sacred_seed: Seed for the random generator, so repeated calls give the same data

    Returns:
        Tuple of (hours_since_last_meal, hunger_level) as sacred tensors, each of shape (n_observations, 1)
    """
    # Seed the generator so the observations are reproducible
    torch.manual_seed(sacred_seed)

    # Suki can go 0-30 hours between meals (she's very dramatic)
    hours_since_meal = torch.rand(n_observations, 1) * 30

    # The sacred relationship known to ancient cat scholars
    base_hunger = 20
    hunger_per_hour = 2.5

    hunger_levels = hunger_per_hour * hours_since_meal.squeeze() + base_hunger

    # Add feline chaos (cats are unpredictable creatures)
    chaos = torch.randn(n_observations) * chaos_level * hunger_levels.std()
    hunger_levels = hunger_levels + chaos

    # Even mystical cats have limits
    hunger_levels = torch.clamp(hunger_levels, 0, 100)

    return hours_since_meal, hunger_levels.unsqueeze(1)

def visualize_cat_wisdom(hours: torch.Tensor, hunger: torch.Tensor, predictions: torch.Tensor = None):
    """Display the sacred patterns of Suki's appetite."""
    plt.figure(figsize=(12, 7))
    plt.scatter(hours.numpy(), hunger.numpy(), alpha=0.6, color='purple',
                label='Suki\'s Actual Hunger Levels')

    if predictions is not None:
        sorted_indices = torch.argsort(hours.squeeze())
        sorted_hours = hours[sorted_indices]
        sorted_predictions = predictions[sorted_indices]
        plt.plot(sorted_hours.numpy(), sorted_predictions.detach().numpy(),
                'gold', linewidth=3, label='Your Mystical Predictions')

    plt.axhline(y=FEEDING_THRESHOLD, color='red', linestyle='--', alpha=0.7,
                label='Sacred Feeding Threshold (Suki Appears!)')
    plt.xlabel('Hours Since Last Meal (feature)')
    plt.ylabel('Suki\'s Hunger Level (target)')
    plt.title('The Mysteries of Temple Cat Appetite')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.ylim(0, 100)
    plt.show()

# Generate the sacred data
hours_since_meal, hunger_levels = generate_cat_feeding_data(n_observations=100)

print(f"📊 Generated {len(hours_since_meal)} observations of Suki's feeding patterns")
print(f"⏰ Hours since meal range: {hours_since_meal.min():.1f} to {hours_since_meal.max():.1f}")
print(f"🍽️ Hunger levels range: {hunger_levels.min():.1f} to {hunger_levels.max():.1f}")

# Visualize the sacred patterns
visualize_cat_wisdom(hours_since_meal, hunger_levels)

## 💃 FIRST MOVEMENTS: THE NEURAL NETWORK FOUNDATION

*Master Pai-Torch nods approvingly*

"Now that you have witnessed the sacred data, it is time to craft your first neural network. Though simple in form, this linear layer contains the essence of all deeper mysteries. Complete the missing sacred techniques below."

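A linear layer computes `x @ weight.T + bias`, nothing more. The sketch below deliberately uses different sizes from the kata (3 inputs, 2 outputs), so it illustrates the mechanics without filling in the TODO for you.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=3, out_features=2)   # weight: (2, 3), bias: (2,)
x = torch.randn(4, 3)                               # 4 observations, 3 features each

out = layer(x)                                      # shape (4, 2)
manual = x @ layer.weight.T + layer.bias            # the same computation by hand

print(out.shape)
print(torch.allclose(out, manual, atol=1e-6))
```

For the cat kata there is one feature (hours) and one prediction (hunger), so the weight reduces to a single slope and the bias to a single intercept.
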
In [None]:
class CatHungerPredictor(nn.Module):
    """A mystical artifact for understanding feline appetite patterns."""

    def __init__(self, input_features: int = 1):
        super(CatHungerPredictor, self).__init__()
        # TODO: Create the Linear layer
        # Hint: torch.nn.Linear transforms input energy into output wisdom
        # It needs input_features and output_features (how many predictions?)
        self.linear = None

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        """Channel your understanding through the mystical network."""
        # TODO: Pass the input through your Linear layer
        # Remember: even cats follow mathematical laws
        return None

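def train_cat_predictor(model: nn.Module, features: torch.Tensor, target: torch.Tensor,
                        epochs: int = 4_000, learning_rate: float = 0.002) -> list:
    """
    Train the cat hunger prediction model.

    Args:
        model: The network to train
        features: Hours since last meal, shape (n, 1)
        target: Hunger levels, shape (n, 1)
        epochs: Number of full passes over the data
        learning_rate: Step size for the optimizer. On the raw 0-30 hour scale,
            plain SGD is only stable for small values (roughly below 0.003).

    Returns:
        List of loss values during training
    """
    # TODO: Choose your loss calculation method
    # Hint: Mean Squared Error is favored by the ancient masters
    criterion = None

    # TODO: Choose your parameter updating method
    # Hint: SGD (Stochastic Gradient Descent) is the traditional path; pass it learning_rate
    optimizer = None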

    losses = []

    for epoch in range(epochs):
        # TODO: CRITICAL - Clear the gradient spirits from previous cycle
        # Hint: The spirits accumulate if not banished properly
        # This is the most common mistake in PyTorch training!

        # TODO: Forward pass - get predictions
        predictions = None

        # TODO: Compute the loss
        loss = None

        # TODO: Backward pass - compute gradients
        # Hint: Loss knows how to compute its own gradients

        # TODO: Update parameters
        # Hint: The optimizer knows how to update using the gradients

        losses.append(loss.item())

        # Report progress to Master Pai-Torch
        if (epoch + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
            if loss.item() < 10:
                print("💫 The Gradient Spirits smile upon your progress!")

    return losses

# Create your first neural network
model = CatHungerPredictor(input_features=1)
print("🧠 Your neural network has been born!")
print(f"Model structure: {model}")
print(f"Initial parameters: Weight={model.linear.weight.item():.3f}, Bias={model.linear.bias.item():.3f}")

## 🎯 UNLEASH THE TRAINING RITUAL

*Master Pai-Torch places a weathered hand on your shoulder*

"Now comes the sacred moment, grasshopper. Train your network with the feeding data and witness the emergence of wisdom from randomness."

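The training loop asks you to clear gradients at the start of every cycle. The sketch below, on a single toy parameter rather than the kata's model, shows why: `backward()` adds to `.grad` instead of overwriting it.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(6.0)

def squared_error(weight: torch.Tensor) -> torch.Tensor:
    return (weight * x - y) ** 2

# First backward pass: d/dw (w*x - y)^2 = 2 * x * (w*x - y) = 2 * 3 * (3 - 6) = -18
squared_error(w).backward()
first_grad = w.grad.item()

# Second backward pass without clearing: the new gradient is ADDED to the old one
squared_error(w).backward()
accumulated_grad = w.grad.item()

# Clearing first restores the correct value
w.grad.zero_()
squared_error(w).backward()
fresh_grad = w.grad.item()

print(first_grad, accumulated_grad, fresh_grad)   # -18.0 -36.0 -18.0
```

In a training loop, `optimizer.zero_grad()` performs this clearing for every parameter the optimizer manages.
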
In [None]:
# Begin the training ritual
print("🔥 Beginning the sacred training ritual...")
print("Master Pai-Torch whispers: 'Watch the loss decrease, young one. This is the dance of learning.'")

# TODO: Train your model using the function above
# Use: hours_since_meal, hunger_levels, and appropriate epochs/learning_rate
loss_history = None

# Visualize the training progress
plt.figure(figsize=(10, 6))
plt.plot(loss_history)
plt.title('The Sacred Dance of Loss Reduction')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True, alpha=0.3)
plt.show()

# Examine the learned parameters
learned_weight = model.linear.weight.item()
learned_bias = model.linear.bias.item()
print(f"\n🎊 Training Complete! 🎊")
print(f"Learned relationship: hunger = {learned_weight:.3f} × hours + {learned_bias:.3f}")
print(f"True relationship: hunger = 2.500 × hours + 20.000")
print(f"Weight accuracy: {abs(learned_weight - 2.5):.3f} away from true value")
print(f"Bias accuracy: {abs(learned_bias - 20):.3f} away from true value")

# Generate predictions and visualize
with torch.no_grad():
    predictions = model(hours_since_meal)

print("\n🔮 Visualizing your mystical predictions...")
visualize_cat_wisdom(hours_since_meal, hunger_levels, predictions)

## ⚡ THE TRIALS OF MASTERY

*Master Pai-Torch examines your work with ancient eyes*

"Your first steps show promise, grasshopper. But true mastery must be tested through sacred trials."

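Trial 1 scores the fit with R², the fraction of the target's variance that the predictions explain. The sketch below uses toy tensors and a hypothetical `r_squared` helper to show the two reference points: perfect predictions score 1, and always predicting the mean scores 0.

```python
import torch

def r_squared(targets: torch.Tensor, predictions: torch.Tensor) -> float:
    """R^2 = 1 - SS_res / SS_tot, for tensors of identical shape."""
    ss_res = ((targets - predictions) ** 2).sum()
    ss_tot = ((targets - targets.mean()) ** 2).sum()
    return (1 - ss_res / ss_tot).item()

targets = torch.tensor([[30.0], [45.0], [70.0]])

perfect = r_squared(targets, targets.clone())
baseline = r_squared(targets, torch.full_like(targets, targets.mean().item()))

print(f"perfect predictions: {perfect:.2f}, mean-only predictions: {baseline:.2f}")
```

A value above 0.8, as the trial requires, means the line explains most of the spread in hunger levels.
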
In [None]:
# TRIALS OF MASTERY
print("⚡ TRIAL 1: BASIC MASTERY")

# Check your progress
final_loss = loss_history[-1] if loss_history else float('inf')
weight_accuracy = abs(learned_weight - 2.5) < 0.5
bias_accuracy = abs(learned_bias - 20) < 5

# Check that the loss keeps falling: the final loss must be at least 10% below the loss at epoch 100
loss_decreases = len(loss_history) > 100 and loss_history[-1] < loss_history[99] * 0.9

# Check if predictions form a clean line (R² > 0.8)
with torch.no_grad():
    predictions = model(hours_since_meal)
    y_mean = hunger_levels.mean()
    ss_tot = ((hunger_levels - y_mean) ** 2).sum()
    ss_res = ((hunger_levels - predictions) ** 2).sum()
    r_squared = 1 - (ss_res / ss_tot)
    clean_line = r_squared > 0.8

# Trial 1 checkboxes
loss_check = "✅" if loss_decreases else "❌"
weight_bias_check = "✅" if (weight_accuracy and bias_accuracy) else "❌"
line_check = "✅" if clean_line else "❌"

print(f"- {loss_check} Loss decreases consistently (no angry Gradient Spirits)")
print(f"- {weight_bias_check} Model weight approximately 2.5 (±0.5), bias around 20 (±5)")
print(f"- {line_check} Predictions form a clean line through the scattered data")

# Trial 2: Understanding Test
print("\n⚡ TRIAL 2: UNDERSTANDING TEST")

# Test prediction shapes
test_features = torch.tensor([[5.0], [10.0], [20.0]])
with torch.no_grad():
    test_predictions = model(test_features)

shapes_correct = test_predictions.shape == (3, 1)
weight_reasonable = 2.0 <= learned_weight <= 3.0
bias_reasonable = 15 <= learned_bias <= 25

# Test prediction reasonableness
test_pred_values = test_predictions.squeeze().tolist()
expected_approx = [2.5 * 5 + 20, 2.5 * 10 + 20, 2.5 * 20 + 20]  # [32.5, 45, 70]
predictions_reasonable = all(abs(pred - exp) <= 10 for pred, exp in zip(test_pred_values, expected_approx))

# Trial 2 checkboxes
shapes_check = "✅" if shapes_correct else "❌"
weight_param_check = "✅" if weight_reasonable else "❌"
bias_param_check = "✅" if bias_reasonable else "❌"
pred_check = "✅" if predictions_reasonable else "❌"

print(f"- {shapes_check} Tensor shapes align with the sacred geometry")
print(f"- {weight_param_check} Weight parameter reflects feline wisdom")
print(f"- {bias_param_check} Bias parameter captures base hunger levels")
print(f"- {pred_check} Predictions are reasonable for test inputs")

# Your Performance section
print(f"\n📊 Your Performance:")
print(f"- Weight accuracy: {learned_weight:.3f} {'(PASS)' if weight_accuracy else '(FAIL)'}")
print(f"- Bias accuracy: {learned_bias:.3f} {'(PASS)' if bias_accuracy else '(FAIL)'}")

# Overall success check
trial1_passed = loss_decreases and weight_accuracy and bias_accuracy and clean_line
trial2_passed = shapes_correct and weight_reasonable and bias_reasonable and predictions_reasonable

if trial1_passed and trial2_passed:
    print("\n🎉 Master Pai-Torch nods with approval - your understanding grows!")
    print("\n🏆 Congratulations! You have passed the basic trials of the Temple Sweeper!")
    print("🐱 Suki purrs approvingly - your neural network has learned her sacred patterns.")
else:
    print("\n🤔 The path to mastery requires more practice. Consider adjusting your training parameters.")
    print("💡 Hint: Try different learning rates, more epochs, or review your code for errors.")

## 🌸 THE FOUR PATHS OF MASTERY: PROGRESSIVE EXTENSIONS

*Master Pai-Torch gestures toward four different pathways leading deeper into the temple*

"You have learned the fundamental way, grasshopper. But mastery comes through exploring the branching paths. Each extension will teach you new aspects of the neural arts while building upon your foundation."

### Extension 1: Cook Oh-Pai-Timizer's Portion Control
*"A good cook knows that batch size affects the final dish!"*

*Cook Oh-Pai-Timizer bustles over, wooden spoon in hand*

"Ah, grasshopper! I see you've mastered feeding one cat at a time. But what happens when you need to predict hunger for multiple cats simultaneously? In my kitchen, efficiency comes from preparing multiple servings at once! Your neural network can learn the same wisdom - processing many observations together."

**NEW CONCEPTS**: Batch processing, tensor shapes, vectorized operations  
**DIFFICULTY**: +15% (still Dan 1, but with batches)
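
Batching works because a linear layer treats the first dimension as "one row per observation". The sketch below uses a stand-in `nn.Linear(1, 1)` rather than your trained model. It stacks two cats' observations with `torch.cat` and checks that one batched call matches row-by-row calls.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
stand_in = nn.Linear(1, 1)   # stand-in for a trained predictor (assumption for this sketch)

cat_a = torch.tensor([[2.0], [8.0]])             # 2 observations
cat_b = torch.tensor([[5.0], [12.0], [20.0]])    # 3 observations
batch = torch.cat([cat_a, cat_b], dim=0)         # shape (5, 1)

with torch.no_grad():
    batched = stand_in(batch)                                            # one call
    one_by_one = torch.cat([stand_in(row.unsqueeze(0)) for row in batch], dim=0)

print(batch.shape, batched.shape)
print(torch.allclose(batched, one_by_one))
```

The results agree; the batched call simply avoids the Python loop.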

In [None]:
def generate_multi_cat_data(n_cats: int = 5, observations_per_cat: int = 50,
                           chaos_level: float = 0.15):
    """
    Generate feeding data for multiple temple cats at once.

    Returns:
        Tuple of (batch_hours, batch_hunger_levels)
        Shape: (n_cats * observations_per_cat, 1) for both tensors
    """
    # TODO: Create batched data that your model can process all at once
    # Hint: Use torch.cat() to combine data from multiple cats
    # Hint: Each cat follows the same hunger pattern but with different chaos

    all_hours = []
    all_hunger = []

    for cat_id in range(n_cats):
        # Generate data for this cat (slightly different chaos per cat)
        cat_hours, cat_hunger = generate_cat_feeding_data(
            n_observations=observations_per_cat,
            chaos_level=chaos_level * (1 + cat_id * 0.1),  # Each cat is slightly more chaotic
            sacred_seed=42 + cat_id
        )
        all_hours.append(cat_hours)
        all_hunger.append(cat_hunger)

    # TODO: Combine all cat data into batched tensors
    batch_hours = torch.cat(all_hours, dim=0)
    batch_hunger = torch.cat(all_hunger, dim=0)

    return batch_hours, batch_hunger

# Generate multi-cat data
multi_hours, multi_hunger = generate_multi_cat_data(n_cats=5, observations_per_cat=40)

print(f"🐱🐱🐱 Generated data for multiple cats:")
print(f"   Total observations: {len(multi_hours)}")
print(f"   Hours tensor shape: {multi_hours.shape}")
print(f"   Hunger tensor shape: {multi_hunger.shape}")

# TRIAL: Feed batched data to your existing model
print("\n🍽️ Testing batch processing with your trained model...")
with torch.no_grad():
    batch_predictions = model(multi_hours)

print(f"✅ Your model successfully processed {len(multi_hours)} observations at once!")
print(f"   Predictions shape: {batch_predictions.shape}")

# Visualize the multi-cat results
visualize_cat_wisdom(multi_hours, multi_hunger, batch_predictions)

# SUCCESS: Model processes multiple cats simultaneously, same accuracy
batch_loss = nn.MSELoss()(batch_predictions, multi_hunger)
print(f"\n📊 Batch processing loss: {batch_loss.item():.2f}")
print("🎉 Cook Oh-Pai-Timizer beams: 'Efficiency and accuracy - the hallmarks of a master chef!'")

### Extension 2: He-Ao-World's Measurement Mix-up
*"These old eyes sometimes read the measuring scrolls incorrectly..."*

*He-Ao-World shuffles over, looking apologetic*

"Oh dear! I was recording Suki's feeding times and... well, I might have mixed up some of the measurements. Some are in minutes instead of hours, and others might be twice what they should be. The data looks a bit... chaotic now. Perhaps you could help clean it up? I've heard that normalizing data can make neural networks much happier!"

**NEW CONCEPTS**: Data normalization, feature scaling, handling inconsistent units  
**DIFFICULTY**: +25% (still Dan 1, but messier data)
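
A model trained on standardized data learns a slope and intercept in standardized units. To read them in hours and hunger points, substitute the normalization back in: `weight = w_n * y_std / x_std` and `bias = y_mean + y_std * b_n - weight * x_mean`. The sketch below checks that algebra on clean toy data, with a closed-form fit standing in for training.

```python
import torch

hours = torch.tensor([[0.0], [10.0], [20.0], [30.0]])
hunger = 2.5 * hours + 20                        # exact line, no noise

x_mean, x_std = hours.mean(), hours.std()
y_mean, y_std = hunger.mean(), hunger.std()
norm_hours = (hours - x_mean) / x_std
norm_hunger = (hunger - y_mean) / y_std

# Closed-form simple regression in normalized units (stand-in for a trained model)
xc = norm_hours - norm_hours.mean()
yc = norm_hunger - norm_hunger.mean()
w_n = (xc * yc).sum() / (xc ** 2).sum()
b_n = norm_hunger.mean() - w_n * norm_hours.mean()

# Map the normalized parameters back to the original units
weight = (w_n * y_std / x_std).item()
bias = (y_mean + y_std * b_n - weight * x_mean).item()

print(f"hunger = {weight:.3f} * hours + {bias:.3f}")
```

For exactly linear data, the normalized fit has slope 1 and intercept 0, and the conversion recovers the original 2.5 and 20.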

In [None]:
def create_messy_data(clean_hours: torch.Tensor, clean_hunger: torch.Tensor):
    """
    Simulate He-Ao-World's measurement accidents.

    Returns:
        Tuple of (messy_hours, messy_hunger) with inconsistent scales
    """
    messy_hours = clean_hours.clone()
    messy_hunger = clean_hunger.clone()

    # 30% of hours accidentally recorded in minutes instead of hours
    minute_mask = torch.rand(len(messy_hours)) < 0.3
    messy_hours[minute_mask] = messy_hours[minute_mask] * 60

    # 20% of hunger levels accidentally doubled
    double_mask = torch.rand(len(messy_hunger)) < 0.2
    messy_hunger[double_mask] = messy_hunger[double_mask] * 2

    # 10% of the hour measurements also get a random scaling error
    error_mask = torch.rand(len(messy_hours)) < 0.1
    random_scales = torch.rand(error_mask.sum()) * 3 + 0.5  # Random scale 0.5-3.5
    messy_hours[error_mask] = messy_hours[error_mask] * random_scales.unsqueeze(1)

    return messy_hours, messy_hunger

def normalize_feeding_data(hours_since_meal: torch.Tensor, hunger_levels: torch.Tensor):
    """
    Clean and normalize the feeding data to handle measurement inconsistencies.

    Returns:
        Tuple of (normalized_hours, normalized_hunger, normalization_params)
    """
    # TODO: Implement data normalization
    # Hint: (data - mean) / std is a common normalization approach
    # Remember: Store the normalization parameters for later use!

    # Calculate normalization parameters
    hours_mean = hours_since_meal.mean()
    hours_std = hours_since_meal.std()
    hunger_mean = hunger_levels.mean()
    hunger_std = hunger_levels.std()

    # Normalize the data
    normalized_hours = (hours_since_meal - hours_mean) / (hours_std + 1e-8)  # Small epsilon to avoid division by zero
    normalized_hunger = (hunger_levels - hunger_mean) / (hunger_std + 1e-8)

    # Store parameters for later denormalization
    normalization_params = {
        'hours_mean': hours_mean,
        'hours_std': hours_std,
        'hunger_mean': hunger_mean,
        'hunger_std': hunger_std
    }

    return normalized_hours, normalized_hunger, normalization_params

# Create messy data
messy_hours, messy_hunger = create_messy_data(hours_since_meal, hunger_levels)

print("🌪️ He-Ao-World's measurement chaos:")
print(f"   Original hours range: {hours_since_meal.min():.1f} to {hours_since_meal.max():.1f}")
print(f"   Messy hours range: {messy_hours.min():.1f} to {messy_hours.max():.1f}")
print(f"   Original hunger range: {hunger_levels.min():.1f} to {hunger_levels.max():.1f}")
print(f"   Messy hunger range: {messy_hunger.min():.1f} to {messy_hunger.max():.1f}")

# Normalize the messy data
norm_hours, norm_hunger, norm_params = normalize_feeding_data(messy_hours, messy_hunger)

print("\n🧹 After normalization:")
print(f"   Normalized hours: mean={norm_hours.mean():.3f}, std={norm_hours.std():.3f}")
print(f"   Normalized hunger: mean={norm_hunger.mean():.3f}, std={norm_hunger.std():.3f}")

# Train a new model on normalized data
print("\n🎯 Training on normalized data...")
normalized_model = CatHungerPredictor(input_features=1)
normalized_losses = train_cat_predictor(normalized_model, norm_hours, norm_hunger,
                                      epochs=800, learning_rate=0.05)

# Compare training curves
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(loss_history, label='Original Data')
plt.plot(normalized_losses, label='Normalized Data')
plt.title('Training Loss Comparison')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
with torch.no_grad():
    norm_predictions = normalized_model(norm_hours)
    # Denormalize predictions for visualization
    denorm_predictions = norm_predictions * norm_params['hunger_std'] + norm_params['hunger_mean']
    denorm_hours = norm_hours * norm_params['hours_std'] + norm_params['hours_mean']

plt.scatter(denorm_hours.numpy(), messy_hunger.numpy(), alpha=0.5, label='Messy Data')
sorted_indices = torch.argsort(denorm_hours.squeeze())
sorted_hours = denorm_hours[sorted_indices]
sorted_predictions = denorm_predictions[sorted_indices]
plt.plot(sorted_hours.numpy(), sorted_predictions.numpy(), 'r-', linewidth=2, label='Normalized Model')
plt.title('Normalized Model Performance')
plt.xlabel('Hours Since Meal')
plt.ylabel('Hunger Level')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📊 Final normalized loss: {normalized_losses[-1]:.4f}")
print("🎉 He-Ao-World bows gratefully: 'Your wisdom has turned my chaos into clarity!'")

### Extension 3: Master Pai-Torch's Patience Teaching
*"The eager student trains too quickly and learns too little."*

*Master Pai-Torch sits in contemplative silence*

"Young grasshopper, I observe your training ritual rushes like a mountain stream. But wisdom comes to those who vary their pace. Sometimes we must step boldly, sometimes cautiously, sometimes we must rest entirely. Learn the arts of patience - early stopping when progress ceases, and learning rate scheduling to train with growing wisdom."

**NEW CONCEPTS**: Learning rate scheduling, early stopping, training patience  
**DIFFICULTY**: +35% (still Dan 1, but smarter training)
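
PyTorch ships a scheduler for the "halve the rate every N epochs" pattern, `torch.optim.lr_scheduler.StepLR`, which can replace manual edits to `param_groups`. The sketch below pairs it with a simple patience counter. It runs on standardized toy data, not the kata's raw hours, and its values for `step_size`, `patience`, and the improvement threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
features = torch.randn(64, 1)          # toy inputs, already on a standardized scale
targets = 3.0 * features + 0.5         # noise-free toy relationship

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by 0.5 after every 50 scheduler steps
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

best_loss, patience, stale_epochs = float("inf"), 20, 0
epochs_run = 0
for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(features), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()                   # always after optimizer.step()
    epochs_run = epoch + 1

    # Early stopping: count epochs without a meaningful improvement
    if loss.item() < best_loss - 1e-6:
        best_loss, stale_epochs = loss.item(), 0
    else:
        stale_epochs += 1
    if stale_epochs >= patience:
        break

final_lr = optimizer.param_groups[0]["lr"]
print(f"stopped after {epochs_run} epochs, best loss {best_loss:.2e}, lr {final_lr}")
```

The hand-written schedule in the next cell does the same thing explicitly, which makes the mechanics easier to see.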

In [None]:
def patient_training_ritual(model: nn.Module, features: torch.Tensor, target: torch.Tensor,
                          epochs: int = 2000, initial_lr: float = 0.1, patience: int = 100):
    """
    Train with patience and adaptive learning rate.

    Args:
        patience: Stop training if loss doesn't improve for this many epochs
        initial_lr: Starting learning rate

    Returns:
        Tuple of (trained_model, loss_history, stopped_early, learning_rates)
    """
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=initial_lr)

    # TODO: Implement patient training with learning rate decay
    # Hint: Start with lr=0.1, reduce by half every 500 epochs
    # Hint: Keep track of best loss and stop if no improvement

    losses = []
    learning_rates = []
    best_loss = float('inf')
    patience_counter = 0
    stopped_early = False

    for epoch in range(epochs):
        # Learning rate scheduling: reduce by half every 500 epochs
        if epoch > 0 and epoch % 500 == 0:
            current_lr = optimizer.param_groups[0]['lr']
            new_lr = current_lr * 0.5
            optimizer.param_groups[0]['lr'] = new_lr
            print(f"📉 Epoch {epoch}: Reducing learning rate to {new_lr:.6f}")

        # Training step
        optimizer.zero_grad()
        predictions = model(features)
        loss = criterion(predictions, target)
        loss.backward()
        optimizer.step()

        current_loss = loss.item()
        losses.append(current_loss)
        learning_rates.append(optimizer.param_groups[0]['lr'])

        # Early stopping logic
        if current_loss < best_loss - 1e-6:  # Small threshold for improvement
            best_loss = current_loss
            patience_counter = 0
        else:
            patience_counter += 1

        # Check if we should stop early
        if patience_counter >= patience:
            print(f"🛑 Early stopping at epoch {epoch+1}: No improvement for {patience} epochs")
            stopped_early = True
            break

        # Progress reporting
        if (epoch + 1) % 200 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {current_loss:.4f}, LR: {optimizer.param_groups[0]["lr"]:.6f}')

    return model, losses, stopped_early, learning_rates

# Compare patient training vs. rushed training
# Note: on raw hours (0-30), plain SGD diverges to NaN once the learning rate
# exceeds roughly 0.003, so both runs start from a step size below that bound.
print("🏃 First, let's see rushed training (fixed learning rate, no patience):")
rushed_model = CatHungerPredictor(input_features=1)
rushed_losses = train_cat_predictor(rushed_model, hours_since_meal, hunger_levels,
                                  epochs=1000, learning_rate=0.002)

print("\n🧘 Now, let's try patient training:")
patient_model = CatHungerPredictor(input_features=1)
patient_model, patient_losses, stopped_early, learning_rates = patient_training_ritual(
    patient_model, hours_since_meal, hunger_levels,
    epochs=2000, initial_lr=0.002, patience=150
)

# Visualize the comparison
plt.figure(figsize=(15, 5))

# Loss comparison
plt.subplot(1, 3, 1)
plt.plot(rushed_losses, label='Rushed Training', alpha=0.7)
plt.plot(patient_losses, label='Patient Training', alpha=0.7)
plt.title('Loss Comparison')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

# Learning rate schedule
plt.subplot(1, 3, 2)
plt.plot(learning_rates)
plt.title('Learning Rate Schedule')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.yscale('log')
plt.grid(True, alpha=0.3)

# Final predictions comparison
plt.subplot(1, 3, 3)
with torch.no_grad():
    rushed_pred = rushed_model(hours_since_meal)
    patient_pred = patient_model(hours_since_meal)

plt.scatter(hours_since_meal.numpy(), hunger_levels.numpy(), alpha=0.5, label='Data')
sorted_indices = torch.argsort(hours_since_meal.squeeze())
sorted_hours = hours_since_meal[sorted_indices]
plt.plot(sorted_hours.numpy(), rushed_pred[sorted_indices].numpy(), 'r--', label='Rushed', linewidth=2)
plt.plot(sorted_hours.numpy(), patient_pred[sorted_indices].numpy(), 'g-', label='Patient', linewidth=2)
plt.title('Final Predictions Comparison')
plt.xlabel('Hours Since Meal')
plt.ylabel('Hunger Level')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Results summary
print(f"\n📊 TRAINING COMPARISON:")
print(f"   Rushed final loss: {rushed_losses[-1]:.4f}")
print(f"   Patient final loss: {patient_losses[-1]:.4f}")
print(f"   Patient training epochs: {len(patient_losses)}")
print(f"   Stopped early: {stopped_early}")
print(f"   Improvement: {((rushed_losses[-1] - patient_losses[-1]) / rushed_losses[-1] * 100):.1f}%")

print("\n🎉 Master Pai-Torch smiles: 'The patient student achieves deeper understanding with less effort.'")

### Extension 4: He-Ao-World's Mysterious Incidents
*"Some days, the most unexpected things happen in the temple..."*

*He-Ao-World approaches nervously, wringing hands*

"Oh dear, grasshopper... I must confess something troubling. Some days, the most unusual incidents occur during my cleaning rounds. A loud crash from the kitchen sends Suki running to hide, despite being hungry. Or a visiting monk drops treats, making her appear even when she's not particularly hungry. These... outlying events... they don't follow the normal pattern, and I fear they may confuse your linear wisdom."

**NEW CONCEPTS**: Outliers, robust training, linear regression limitations  
**DIFFICULTY**: +45% (still Dan 1, but revealing model weaknesses)
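
Squared error punishes large residuals quadratically, so one extreme point can pull the whole line. The sketch below uses a closed-form least-squares fit via `torch.linalg.lstsq` in place of gradient training, on toy data with a single "kitchen crash" reading. `fit_line` is a hypothetical helper for illustration.

```python
import torch

def fit_line(hours: torch.Tensor, hunger: torch.Tensor):
    """Closed-form least squares; returns (slope, intercept)."""
    design = torch.cat([hours, torch.ones_like(hours)], dim=1)    # columns: hours, 1
    solution = torch.linalg.lstsq(design, hunger).solution        # shape (2, 1)
    return solution[0].item(), solution[1].item()

# 31 clean observations on the exact line hunger = 2.5 * hours + 20
hours = torch.linspace(0, 30, 31, dtype=torch.float64).unsqueeze(1)
hunger = 2.5 * hours + 20
clean_slope, clean_intercept = fit_line(hours, hunger)

# One incident: at hour 30 the reading is 10 instead of 95
hunger_with_incident = hunger.clone()
hunger_with_incident[-1] = 10.0
outlier_slope, outlier_intercept = fit_line(hours, hunger_with_incident)

print(f"clean:   hunger = {clean_slope:.3f} * hours + {clean_intercept:.3f}")
print(f"outlier: hunger = {outlier_slope:.3f} * hours + {outlier_intercept:.3f}")
```

A single bad reading out of 31 drops the slope from 2.5 to roughly 2.0. Losses that grow linearly for large residuals, such as `nn.HuberLoss`, are one common way to reduce this sensitivity.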

In [None]:
def generate_data_with_outliers(n_observations: int = 200, outlier_fraction: float = 0.1,
                               chaos_level: float = 0.1, sacred_seed: int = 42):
    """
    Generate feeding data with mysterious outlying incidents.

    Args:
        n_observations: Total number of observations
        outlier_fraction: Fraction of data that will be outliers (0.0 to 1.0)
        chaos_level: Normal noise level for regular data
        sacred_seed: For reproducible randomness

    Returns:
        Tuple of (hours, hunger_levels, is_outlier_mask)
    """
    torch.manual_seed(sacred_seed)

    # Generate normal data first
    hours, hunger = generate_cat_feeding_data(n_observations, chaos_level, sacred_seed)

    # Determine which points will be outliers
    n_outliers = int(n_observations * outlier_fraction)
    outlier_indices = torch.randperm(n_observations)[:n_outliers]
    is_outlier = torch.zeros(n_observations, dtype=torch.bool)
    is_outlier[outlier_indices] = True

    # Create outliers: dramatic deviations from normal pattern
    for i in outlier_indices:
        incident_type = torch.randint(0, 3, (1,)).item()

        if incident_type == 0:
            # Kitchen crash: Suki hides despite hunger
            # High hunger but behavior like low hunger
            hunger[i] = torch.rand(1) * 30 + 10  # Force low hunger reading

        elif incident_type == 1:
            # Monk drops treats: Suki appears despite not being hungry
            # Low hunger but behavior like high hunger
            hunger[i] = torch.rand(1) * 30 + 70  # Force high hunger reading

        else:
            # Temple bell rings: Completely random behavior
            # Completely random hunger level
            hunger[i] = torch.rand(1) * 100

    return hours, hunger, is_outlier

def visualize_outlier_impact(hours: torch.Tensor, hunger: torch.Tensor, is_outlier: torch.Tensor,
                           clean_model: nn.Module, outlier_model: nn.Module):
    """
    Visualize how outliers affect linear regression predictions.
    """
    plt.figure(figsize=(15, 10))

    # Plot 1: Data with outliers highlighted
    plt.subplot(2, 2, 1)
    normal_mask = ~is_outlier
    plt.scatter(hours[normal_mask].numpy(), hunger[normal_mask].numpy(),
               alpha=0.6, color='purple', label='Normal Behavior', s=50)
    plt.scatter(hours[is_outlier].numpy(), hunger[is_outlier].numpy(),
               alpha=0.8, color='red', label='Mysterious Incidents', s=100, marker='x')
    plt.xlabel('Hours Since Last Meal')
    plt.ylabel('Hunger Level')
    plt.title('Temple Data: Normal vs Outlying Incidents')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Plot 2: Model comparisons
    plt.subplot(2, 2, 2)
    with torch.no_grad():
        clean_pred = clean_model(hours)
        outlier_pred = outlier_model(hours)

    plt.scatter(hours[normal_mask].numpy(), hunger[normal_mask].numpy(),
               alpha=0.4, color='purple', label='Normal Data')
    plt.scatter(hours[is_outlier].numpy(), hunger[is_outlier].numpy(),
               alpha=0.8, color='red', label='Outliers', s=100, marker='x')

    # Plot prediction lines
    sorted_indices = torch.argsort(hours.squeeze())
    sorted_hours = hours[sorted_indices]
    plt.plot(sorted_hours.numpy(), clean_pred[sorted_indices].numpy(),
             'g-', linewidth=3, label='Clean Data Model')
    plt.plot(sorted_hours.numpy(), outlier_pred[sorted_indices].numpy(),
             'r--', linewidth=3, label='Outlier-Affected Model')

    plt.xlabel('Hours Since Last Meal')
    plt.ylabel('Hunger Level')
    plt.title('Model Predictions: Clean vs Outlier-Affected')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Plot 3: Residual analysis
    plt.subplot(2, 2, 3)
    # Subtract while both tensors are (n, 1), then flatten; subtracting an (n,)
    # tensor from an (n, 1) tensor would broadcast to an (n, n) matrix
    clean_residuals = (hunger - clean_pred).squeeze(1)
    outlier_residuals = (hunger - outlier_pred).squeeze(1)

    plt.scatter(hours[normal_mask].numpy(), clean_residuals[normal_mask].numpy(),
               alpha=0.6, color='green', label='Clean Model Residuals')
    plt.scatter(hours[is_outlier].numpy(), clean_residuals[is_outlier].numpy(),
               alpha=0.8, color='red', label='Outlier Residuals', s=100, marker='x')
    plt.axhline(y=0, color='black', linestyle='-', alpha=0.5)
    plt.xlabel('Hours Since Last Meal')
    plt.ylabel('Residual (Actual - Predicted)')
    plt.title('Residual Analysis: Clean Model')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Plot 4: Parameter comparison
    plt.subplot(2, 2, 4)
    models = ['Clean Data', 'With Outliers']
    weights = [clean_model.linear.weight.item(), outlier_model.linear.weight.item()]
    biases = [clean_model.linear.bias.item(), outlier_model.linear.bias.item()]

    x = range(len(models))
    width = 0.35
    plt.bar([i - width/2 for i in x], weights, width, label='Weight (slope)', alpha=0.7)
    plt.bar([i + width/2 for i in x], [b/10 for b in biases], width, label='Bias/10 (intercept)', alpha=0.7)
    plt.axhline(y=2.5, color='red', linestyle='--', alpha=0.7, label='True Weight=2.5')
    plt.axhline(y=2.0, color='orange', linestyle='--', alpha=0.7, label='True Bias/10=2.0')
    plt.xlabel('Model Type')
    plt.ylabel('Parameter Value')
    plt.title('Parameter Comparison')
    plt.xticks(x, models)
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

def analyze_outlier_impact(clean_model: nn.Module, outlier_model: nn.Module,
                          clean_hours: torch.Tensor, clean_hunger: torch.Tensor,
                          outlier_hours: torch.Tensor, outlier_hunger: torch.Tensor,
                          is_outlier: torch.Tensor):
    """
    Quantify how outliers affect model performance and parameters.
    """
    print("\n📊 OUTLIER IMPACT ANALYSIS:")
    print(f"{'='*50}")

    # Parameter comparison
    clean_weight = clean_model.linear.weight.item()
    clean_bias = clean_model.linear.bias.item()
    outlier_weight = outlier_model.linear.weight.item()
    outlier_bias = outlier_model.linear.bias.item()

    print("\n🎯 PARAMETER COMPARISON:")
    print(f"   True relationship: hunger = 2.5 × hours + 20")
    print(f"   Clean model:      hunger = {clean_weight:.3f} × hours + {clean_bias:.3f}")
    print(f"   Outlier model:    hunger = {outlier_weight:.3f} × hours + {outlier_bias:.3f}")
    print(f"   Weight shift:     {abs(outlier_weight - clean_weight):.3f}")
    print(f"   Bias shift:       {abs(outlier_bias - clean_bias):.3f}")

    # Performance on clean data
    with torch.no_grad():
        clean_pred_clean = clean_model(clean_hours)
        outlier_pred_clean = outlier_model(clean_hours)

        clean_mse = nn.MSELoss()(clean_pred_clean, clean_hunger).item()
        outlier_mse = nn.MSELoss()(outlier_pred_clean, clean_hunger).item()

    print("\n🎯 PERFORMANCE ON CLEAN DATA:")
    print(f"   Clean model MSE:   {clean_mse:.2f}")
    print(f"   Outlier model MSE: {outlier_mse:.2f}")
    print(f"   Performance drop:  {((outlier_mse - clean_mse) / clean_mse * 100):.1f}%")

    # Outlier influence
    normal_mask = ~is_outlier
    outlier_fraction = is_outlier.sum().item() / len(is_outlier)

    print("\n🚨 OUTLIER INFLUENCE:")
    print(f"   Outlier fraction: {outlier_fraction:.1%}")
    print(f"   Normal points: {normal_mask.sum().item()}")
    print(f"   Outlier points: {is_outlier.sum().item()}")

    # Residual analysis
    with torch.no_grad():
        clean_residuals = (clean_hunger - clean_model(clean_hours)).abs()
        outlier_residuals = (outlier_hunger - outlier_model(outlier_hours)).abs()

        normal_residuals = outlier_residuals[normal_mask]
        outlier_point_residuals = outlier_residuals[is_outlier]

    print("\n📈 RESIDUAL ANALYSIS:")
    print(f"   Normal points avg residual: {normal_residuals.mean().item():.2f}")
    print(f"   Outlier points avg residual: {outlier_point_residuals.mean().item():.2f}")
    print(f"   Outlier residuals {(outlier_point_residuals.mean() / normal_residuals.mean()).item():.1f}x larger")

# Generate clean data for comparison
print("🧘 First, let's establish a baseline with clean data:")
clean_hours, clean_hunger = generate_cat_feeding_data(n_observations=200, chaos_level=0.1)

# Train model on clean data
clean_model = CatHungerPredictor(input_features=1)
clean_losses = train_cat_predictor(clean_model, clean_hours, clean_hunger,
                                 epochs=1000, learning_rate=0.01)

print("\n🌪️ Now, let's see what happens with He-Ao-World's mysterious incidents:")

# Generate data with different outlier levels
outlier_levels = [0.05, 0.1, 0.2]  # 5%, 10%, 20% outliers
outlier_models = []
outlier_data = []

for outlier_frac in outlier_levels:
    print(f"\n🎭 Testing with {outlier_frac:.0%} outliers...")

    # Generate outlier data
    outlier_hours, outlier_hunger, is_outlier = generate_data_with_outliers(
        n_observations=200, outlier_fraction=outlier_frac, chaos_level=0.1
    )

    # Train model on outlier data
    outlier_model = CatHungerPredictor(input_features=1)
    outlier_losses = train_cat_predictor(outlier_model, outlier_hours, outlier_hunger,
                                       epochs=1000, learning_rate=0.01)

    outlier_models.append(outlier_model)
    outlier_data.append((outlier_hours, outlier_hunger, is_outlier))

    print(f"   Outlier incidents: {is_outlier.sum().item()}")
    print(f"   Final training loss: {outlier_losses[-1]:.2f}")

# Detailed analysis for 10% outliers
print("\n🔍 DETAILED ANALYSIS: 10% Outliers")
selected_model = outlier_models[1]  # 10% outliers
selected_hours, selected_hunger, selected_outliers = outlier_data[1]

# Visualize the impact
visualize_outlier_impact(selected_hours, selected_hunger, selected_outliers,
                        clean_model, selected_model)

# Quantitative analysis
analyze_outlier_impact(clean_model, selected_model,
                      clean_hours, clean_hunger,
                      selected_hours, selected_hunger, selected_outliers)

print("\n🧠 PHILOSOPHICAL REFLECTION:")
print("He-Ao-World sighs: 'I see now that my clumsiness reveals a deeper truth...'")
print("\n💡 KEY INSIGHTS:")
print("   • Least-squares regression is sensitive to outliers - squared error magnifies large residuals, so they can drastically shift the line")
print("   • A few extreme points can dominate many normal observations")
print("   • Real-world data always contains unexpected events")
print("   • Simple models struggle with data that doesn't follow their assumptions")
print("\n🎭 MASTER'S WISDOM:")
print("Master Pai-Torch nods gravely: 'The straight path works well in a perfect world,")
print("but life is full of unexpected turns. This limitation will guide you toward")
print("more robust methods in your future training.'")

# SUCCESS CRITERIA
print("\n⚡ MASTERY TRIAL:")
print("🎯 Can you identify which incidents are outliers by looking at the residuals?")
print("🎯 Do you understand why the outlier model performs worse on clean data?")
print("🎯 Can you explain why a squared-error loss makes linear regression so sensitive to outliers?")

print("\n✨ Understanding these limitations is the first step toward mastery!")
print("🚀 In future dans, you'll learn robust techniques that handle outliers better.")
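
To see the outlier effect in isolation, the following is a minimal sketch, not part of the kata's own helpers. It assumes ten noise-free points on the line hunger = 2.5 × hours + 20 and a single hypothetical incident that adds 50 to the last observation. The helper `fit_line` is an illustrative name; it uses the closed-form least-squares solution instead of the training loop.

```python
import torch

def fit_line(x: torch.Tensor, y: torch.Tensor):
    """Closed-form least-squares fit of y ≈ w * x + b for 1-D tensors (illustrative helper)."""
    x_mean, y_mean = x.mean(), y.mean()
    w = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    b = y_mean - w * x_mean
    return w.item(), b.item()

# Assumption: noise-free points on the kata's line, hunger = 2.5 * hours + 20
hours = torch.arange(0.0, 10.0)  # 0, 1, ..., 9
hunger = 2.5 * hours + 20.0

clean_w, clean_b = fit_line(hours, hunger)

# Assumption: one hypothetical incident inflates the last observation by 50
corrupted = hunger.clone()
corrupted[9] += 50.0
out_w, out_b = fit_line(hours, corrupted)

# Absolute residuals against the outlier-affected line; the largest one marks the suspect
residuals = (corrupted - (out_w * hours + out_b)).abs()
suspect = residuals.argmax().item()

print(f"Clean fit:   hunger = {clean_w:.3f} × hours + {clean_b:.3f}")
print(f"Outlier fit: hunger = {out_w:.3f} × hours + {out_b:.3f}")
print(f"Largest residual at index {suspect}")
```

One corrupted point out of ten is enough to roughly double the fitted slope in this setup, and the corrupted point still shows the largest residual, which is why residual plots are a useful first check.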

## 🔥 CORRECTING YOUR FORM: A STANCE IMBALANCE

*Master Pai-Torch observes your training ritual with ancient eyes*

"Your eager mind races ahead of your disciplined form, grasshopper. See how your gradient flow stance wavers? A previous disciple left this flawed training ritual behind. The form has become unsteady - can you restore proper technique?"

*The master gestures toward a scroll containing corrupted training code*

"Study this broken ritual carefully. The Gradient Spirits grow angry when not properly dismissed, and the loss may never converge. Your task is to identify the flaw and restore harmony to the training process."

In [None]:
def unsteady_training(model, features, target, epochs=1000):
    """This training stance has lost its balance - your form needs correction! 🥋"""
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01)

    losses = []

    for epoch in range(epochs):
        # Forward pass
        predictions = model(features)
        loss = criterion(predictions, target)

        # Backward pass
        loss.backward()
        optimizer.step()

        losses.append(loss.item())

        if epoch % 100 == 0:
            print(f'Epoch {epoch}: Loss = {loss.item():.4f}')

    return model, losses

# Test the unsteady training
print("🥋 Testing the unsteady training ritual...")
print("Master Pai-Torch warns: 'Watch carefully - something is amiss with this technique!'")

broken_model = CatHungerPredictor(input_features=1)
broken_model, broken_losses = unsteady_training(broken_model, hours_since_meal, hunger_levels, epochs=500)

# Compare with proper training
print("\n🧘 Now witness the proper training stance:")
proper_model = CatHungerPredictor(input_features=1)
proper_losses = train_cat_predictor(proper_model, hours_since_meal, hunger_levels, epochs=500)

# Visualize the difference
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.plot(broken_losses, label='Unsteady Training', color='red', alpha=0.7)
plt.plot(proper_losses, label='Proper Training', color='green', alpha=0.7)
plt.title('Training Loss Comparison')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)
plt.yscale('log')

plt.subplot(1, 3, 2)
plt.plot(broken_losses[-100:], label='Unsteady (last 100 epochs)', color='red')
plt.plot(proper_losses[-100:], label='Proper (last 100 epochs)', color='green')
plt.title('Final Training Behavior')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 3, 3)
with torch.no_grad():
    broken_pred = broken_model(hours_since_meal)
    proper_pred = proper_model(hours_since_meal)

plt.scatter(hours_since_meal.numpy(), hunger_levels.numpy(), alpha=0.5, label='True Data')
sorted_indices = torch.argsort(hours_since_meal.squeeze())
sorted_hours = hours_since_meal[sorted_indices]
plt.plot(sorted_hours.numpy(), broken_pred[sorted_indices].numpy(), 'r--',
         label='Unsteady Predictions', linewidth=2)
plt.plot(sorted_hours.numpy(), proper_pred[sorted_indices].numpy(), 'g-',
         label='Proper Predictions', linewidth=2)
plt.title('Final Predictions')
plt.xlabel('Hours Since Meal')
plt.ylabel('Hunger Level')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📊 DEBUGGING RESULTS:")
print(f"   Unsteady final loss: {broken_losses[-1]:.4f}")
print(f"   Proper final loss: {proper_losses[-1]:.4f}")
print(f"   Loss difference: {broken_losses[-1] - proper_losses[-1]:.4f}")

# DEBUGGING CHALLENGE: Can you spot the critical error?
print("\n🔍 DEBUGGING CHALLENGE:")
print("❓ Can you identify the critical error in the unsteady_training function?")
print("💡 HINT: The Gradient Spirits are not being properly dismissed between cycles")
print("🧠 MASTER'S WISDOM: 'The undisciplined mind accumulates old thoughts,'")
print("   'just as the untrained gradient accumulates old directions.'")
print("\n🎯 SOLUTION: The missing line is optimizer.zero_grad() before loss.backward()")
print("   Without it, PyTorch adds each new gradient to the stored .grad, so every step uses the sum of all past gradients and training becomes unstable!")
print("\n🎉 Understanding this fundamental mistake is your first step toward PyTorch mastery!")
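
The accumulation itself can be observed directly. The sketch below is a small standalone illustration, assuming a fresh `nn.Linear(1, 1)` layer and three made-up data points rather than the kata's dataset. It calls `backward()` twice without resetting and compares the stored gradient with the one obtained after `optimizer.zero_grad()`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Assumption: a tiny standalone layer and three illustrative points
layer = nn.Linear(1, 1)
optimizer = optim.SGD(layer.parameters(), lr=0.01)
criterion = nn.MSELoss()
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[22.5], [25.0], [27.5]])

# First backward pass: .grad holds the gradient of this loss
criterion(layer(x), y).backward()
first_grad = layer.weight.grad.clone()

# Second backward pass without zero_grad(): the new gradient is added to the old one
criterion(layer(x), y).backward()
accumulated_grad = layer.weight.grad.clone()

# After zero_grad(), a backward pass yields the plain gradient again
optimizer.zero_grad()
criterion(layer(x), y).backward()
fresh_grad = layer.weight.grad.clone()

print(f"First gradient:       {first_grad.item():.4f}")
print(f"Accumulated gradient: {accumulated_grad.item():.4f}")
print(f"Fresh gradient:       {fresh_grad.item():.4f}")
```

Because no `optimizer.step()` is taken here, the parameters stay fixed, so the accumulated gradient is exactly twice the first one. In a real loop the parameters move between steps, but the stored gradient still grows as a running sum in the same way.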

## 🏆 COMPLETION CEREMONY

*Master Pai-Torch rises and bows respectfully*

"Congratulations, young grasshopper. You have successfully completed your first kata in the Temple of Neural Networks. Through Suki's simple feeding patterns, you have learned the fundamental mysteries that underlie all neural arts:

**Sacred Knowledge Acquired:**
- **Tensor Mastery**: You can create and manipulate PyTorch tensors with confidence
- **Linear Wisdom**: You understand how neural networks transform input to output
- **Gradient Discipline**: You have mastered the sacred training loop and gradient management
- **Loss Understanding**: You can measure and minimize prediction errors
- **Convergence Patience**: You know how to train models with various strategies
- **Decision Boundaries**: You understand the difference between prediction and action

**The Path Forward:**
Your journey has just begun. The linear relationship you mastered here is the foundation for all deeper mysteries. As you progress to Dan 2, you will learn to handle more complex patterns, multiple layers, and sophisticated techniques.

**Final Wisdom:**
Remember always: every complex neural network, no matter how sophisticated, is built upon the simple principles you practiced here. The gradient flows, the loss decreases, and wisdom emerges from the dance between prediction and reality.

🐱 *Suki purrs approvingly from her perch, as if to say: "You are ready for greater challenges, young neural warrior."*

🏮 **May your gradients flow smoothly and your losses converge swiftly!** 🏮"