# **ADALINE: Code Walkthrough**
## *From Theory to Implementation - Understanding the Delta Rule in Practice*

---

### **📚 Learning Objectives**

By the end of this notebook, you will understand:

1. **Implementation Details**: How ADALINE is coded in PyTorch
2. **Delta Rule in Practice**: How the algorithm runs, step by step
3. **Training Process**: How the complete learning workflow fits together
4. **Comparison Code**: How our implementation differs from the Perceptron
5. **Educational Insights**: What the code teaches us about learning from a continuous error signal

---

### **🎯 Overview**

This notebook walks through our ADALINE implementation, explaining each component and how it implements the Delta Rule. We'll see how theory translates into working code.

**What We'll Cover**:
- Model architecture and initialization
- The Delta Rule training loop
- Comparison with the Perceptron implementation
- Key design decisions and their implications
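
For reference, the Delta Rule (also known as the Widrow-Hoff or LMS rule) that the rest of this notebook traces can be stated compactly. The symbols are notation chosen for this summary rather than names from the source code: $\mathbf{w}$ for the weights, $b$ for the bias, $\eta$ for the learning rate, $y$ for the target, and $\hat{y}$ for the raw linear output.

$$
\begin{aligned}
\hat{y} &= \mathbf{w}^\top \mathbf{x} + b, \qquad e = y - \hat{y} \\
\mathbf{w} &\leftarrow \mathbf{w} + \eta \, e \, \mathbf{x}, \qquad b \leftarrow b + \eta \, e
\end{aligned}
$$

Because $e$ is a real number, the size of each update scales with how far the output is from the target.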


In [None]:
# Let's start by examining our ADALINE implementation
# First, let's look at the model architecture

import sys
from pathlib import Path
import torch
import torch.nn as nn

# Add the src directory to our path
sys.path.append('../src')

# Import our ADALINE implementation
from model import ADALINE
from config import ADALINEConfig

# Create a simple configuration for demonstration
config = ADALINEConfig(
    name="code_demo",
    description="Code walkthrough demonstration",
    input_size=2,
    output_size=1,
    learning_rate=0.01,
    epochs=10
)

print("🏗️ ADALINE Configuration:")
print(f"  - Input size: {config.input_size}")
print(f"  - Output size: {config.output_size}")
print(f"  - Learning rate: {config.learning_rate}")
print("  - Architecture: Single linear layer (no activation)")
print("  - Learning rule: Delta Rule (continuous updates)")

# Create the model
model = ADALINE(config)
print("\n📊 Model Information:")
model_info = model.get_model_info()
for key, value in model_info.items():
    if key != 'config':
        print(f"  - {key}: {value}")

print("\n🔢 Model Parameters:")
print(f"  - Total parameters: {sum(p.numel() for p in model.parameters())}")
print(f"  - Weights: {model.linear.weight.shape}")
print(f"  - Bias: {model.linear.bias.shape}")
print(f"  - Initial weights: {model.linear.weight.data}")
print(f"  - Initial bias: {model.linear.bias.data}")


## **🏗️ Architecture Analysis**

### **Key Design Decisions**

Our ADALINE implementation makes several important design choices:

#### **1. Linear Layer Only**
```python
self.linear = nn.Linear(config.input_size, config.output_size, bias=True)
```
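- **No activation function during training** - the error is computed on the raw linear output, which is what separates ADALINE from the Perceptron
- Uses PyTorch's `nn.Linear` layer, which computes `x @ W.T + b` in a single call
- Includes a bias term so the decision boundary does not have to pass through the origin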
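
The split between a raw training output and a thresholded prediction can be sketched as follows. `TinyAdaline`, its `predict` method, and the 0.5 threshold are hypothetical names and values chosen for illustration; the actual `ADALINE` class in `src/model.py` may be organized differently.

```python
import torch
import torch.nn as nn


class TinyAdaline(nn.Module):
    """Hypothetical stand-in for the notebook's ADALINE class (illustration only)."""

    def __init__(self, input_size: int, output_size: int = 1):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raw linear output: no sigmoid and no step function.
        return self.linear(x)

    def predict(self, x: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        # Thresholding happens only here; the 0.5 cut-off assumes 0/1 targets.
        with torch.no_grad():
            return (self.forward(x) >= threshold).float()


tiny = TinyAdaline(input_size=2)
with torch.no_grad():
    # Fixed weights so the printed values are deterministic.
    tiny.linear.weight.copy_(torch.tensor([[0.5, 0.25]]))
    tiny.linear.bias.fill_(0.0)

x = torch.tensor([[1.0, 1.0], [1.0, -1.0]])
raw = tiny(x)             # continuous values that drive the error signal
labels = tiny.predict(x)  # discrete values used only for classification

print(raw.detach().flatten().tolist())  # [0.75, 0.25]
print(labels.flatten().tolist())        # [1.0, 0.0]
```

The error signal is built from `raw`, so a sample that is classified correctly (0.75 versus a target of 1) still contributes a non-zero update.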

#### **2. Small Random Weight Initialization**
```python
def _initialize_weights(self):
    with torch.no_grad():
        self.linear.weight.normal_(0, 0.1)  # Small random weights
        self.linear.bias.zero_()            # Zero bias
```
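- **Why small weights?** Large initial weights produce large initial outputs and errors, so the first updates can overshoot and make learning unstable
- **Why zero bias?** Zero is a neutral starting point; the Delta Rule learns the appropriate offset from the data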
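
To make the effect of initialization scale concrete, here is a minimal sketch that compares the starting error and the size of the first update for two weight magnitudes. The data, the fixed `direction` vector, the scales, and the batch-averaged update are assumptions made for this illustration, not values taken from the implementation.

```python
import torch

# Four bipolar inputs with 0/1 targets (illustrative data).
x = torch.tensor([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

# One fixed direction scaled two ways, so only the weight magnitude differs.
direction = torch.tensor([[1.0, -0.5]])  # hypothetical, chosen for illustration
lr = 0.01


def first_step_stats(scale: float):
    """Return (initial MSE, norm of the first batch-averaged weight update)."""
    w = scale * direction
    out = x @ w.T                 # bias is zero, matching the initializer above
    err = y - out
    mse = (err ** 2).mean().item()
    step = (lr * (err * x).mean(dim=0)).norm().item()
    return mse, step


mse_small, step_small = first_step_stats(0.1)
mse_large, step_large = first_step_stats(10.0)

print(f"scale 0.1 : initial MSE = {mse_small:.4f}, first update norm = {step_small:.4f}")
print(f"scale 10.0: initial MSE = {mse_large:.4f}, first update norm = {step_large:.4f}")
```

This does not prove that large weights diverge. It only shows that they start far from the targets and trigger much larger first updates, which is where instability tends to begin when the learning rate is not reduced to match.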

#### **3. Training History Tracking**
```python
self.training_history = {
    "loss": [],      # MSE at each epoch
    "mse": [],       # Same as loss (for clarity)
    "epochs_trained": 0
}
```
- Essential for visualizing learning progress
- Enables comparison with other algorithms
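
As a rough sketch of how such a history dictionary might be filled, the loop below runs per-sample Delta Rule updates on plain tensors and records the MSE after each epoch. The data, the learning rate of 0.1, the zero starting weights, and the loop structure are assumptions for this example; the real training method in `src/model.py` may differ.

```python
import torch

# Illustrative data: bipolar inputs with 0/1 AND-style targets.
x = torch.tensor([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

w = torch.zeros(1, 2)  # start from zero so the run is deterministic
b = torch.zeros(1)
lr = 0.1               # illustrative value, not the notebook's config

history = {"loss": [], "mse": [], "epochs_trained": 0}

for epoch in range(20):
    for xi, yi in zip(x, y):
        err = yi - (xi @ w.T + b)  # per-sample error on the raw output
        w += lr * err * xi         # Delta Rule weight update
        b += lr * err              # Delta Rule bias update

    # Record the full-dataset MSE once per epoch.
    mse = ((y - (x @ w.T + b)) ** 2).mean().item()
    history["loss"].append(mse)
    history["mse"].append(mse)
    history["epochs_trained"] += 1

print(f"MSE after epoch 1:  {history['mse'][0]:.4f}")
print(f"MSE after epoch 20: {history['mse'][-1]:.4f}")
```

Storing one value per epoch keeps the history small and gives a curve that can be plotted directly against another algorithm's per-epoch curve.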


In [None]:
# Let's examine the Delta Rule implementation step by step
print("🔍 DELTA RULE IMPLEMENTATION WALKTHROUGH")
print("="*50)

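# Create some sample data to trace through the algorithm.
# Note: the model was initialized in the previous cell, so this seed only affects
# random operations from here on, not the initial weights printed above.
torch.manual_seed(42)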

# Simple 2D data: 4 points
x_data = torch.tensor([
    [1.0, 1.0],   # Point 1
    [1.0, -1.0],  # Point 2
    [-1.0, 1.0],  # Point 3
    [-1.0, -1.0]  # Point 4
], dtype=torch.float32)

# Target: AND logic with bipolar (-1/+1) inputs and 0/1 targets (1 only when both inputs are +1)
y_target = torch.tensor([
    [1.0],   # 1 AND 1 = 1
    [0.0],   # 1 AND -1 = 0
    [0.0],   # -1 AND 1 = 0
    [0.0]    # -1 AND -1 = 0
], dtype=torch.float32)

print("📊 Training Data:")
for i in range(len(x_data)):
    print(f"  x{i+1}: {x_data[i].numpy()} → target: {y_target[i].item()}")

print("\n⚙️ Initial Model State:")
print(f"  - Weights: {model.linear.weight.data.numpy()}")
print(f"  - Bias: {model.linear.bias.data.item():.4f}")

# Let's trace through one training step manually
print("\n🔄 MANUAL DELTA RULE STEP:")
print("-" * 30)

# Take the first data point
x_sample = x_data[0:1]  # Keep batch dimension
y_sample = y_target[0:1]

print(f"Input: {x_sample.numpy().flatten()}")
print(f"Target: {y_sample.item()}")

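# Forward pass. We compute the update by hand, so autograd is not needed here;
# disabling it also lets us call .numpy() on the derived tensors below.
with torch.no_grad():
    linear_output = model(x_sample)
print(f"Linear output: {linear_output.item():.4f}")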

# Calculate error
error = y_sample - linear_output
print(f"Error: {error.item():.4f}")

# Show the Delta Rule calculation (manually)
learning_rate = model.config.learning_rate
weight_delta = learning_rate * error * x_sample
bias_delta = learning_rate * error

print("\nDelta Rule Calculations:")
print(f"  - Learning rate: {learning_rate}")
print(f"  - Weight delta: η × error × input = {learning_rate} × {error.item():.4f} × {x_sample.numpy().flatten()}")
print(f"    = {weight_delta.numpy().flatten()}")
print(f"  - Bias delta: η × error = {learning_rate} × {error.item():.4f} = {bias_delta.item():.4f}")

print("\n📈 This demonstrates the CONTINUOUS nature of ADALINE:")
print(f"  - Error magnitude: {abs(error.item()):.4f} (a real number, not just 0 or ±1 as in the Perceptron)")
print("  - Proportional update: Larger errors → Larger weight changes")
print("  - Every sample with a non-zero error updates the weights (even if the prediction is 'close')")
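
One way to check the manual calculation above is to compare it with what autograd produces for half the squared error, since the Delta Rule is gradient descent on that loss. The sketch below uses a standalone `nn.Linear` layer as a stand-in for `model.linear`; the seed, input, and learning rate are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 1)  # stand-in for model.linear
x = torch.tensor([[1.0, 1.0]])
y = torch.tensor([[1.0]])
lr = 0.01

# Manual Delta Rule deltas: eta * error * input and eta * error.
with torch.no_grad():
    error = y - layer(x)
    manual_w = lr * error * x
    manual_b = lr * error.flatten()

# The same quantities from autograd: a gradient descent step on 0.5 * error^2.
loss = 0.5 * ((y - layer(x)) ** 2).sum()
loss.backward()
auto_w = -lr * layer.weight.grad
auto_b = -lr * layer.bias.grad

print("Weight deltas match:", torch.allclose(manual_w, auto_w, atol=1e-6))
print("Bias deltas match:  ", torch.allclose(manual_b, auto_b, atol=1e-6))
```

The factor of 0.5 in the loss cancels the 2 that comes from differentiating the square, which is why the gradient step reduces exactly to `lr * error * x`.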
