# Linear Regression with Gradient Descent

This notebook demonstrates how gradient descent is used to train linear regression models.

## Contents
1. Linear Regression Theory
2. Generate Synthetic Data
3. Train Linear Regression Model
4. Comparison: Batch vs SGD vs Mini-batch
5. Effect of Learning Rate
6. Interactive Demo

In [None]:
import sys
sys.path.append('../src')

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, FloatSlider, IntSlider
import warnings
warnings.filterwarnings('ignore')

from linear_regression import (
    LinearRegressionGD,
    generate_linear_data,
    plot_regression_line,
    plot_cost_convergence,
    compare_gd_methods
)

%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 6)

## 1. Linear Regression Theory

**Model** (with the convention $x_0 = 1$, so that $\theta_0$ acts as the intercept):
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$$

**Cost Function (Mean Squared Error, halved so the factor of 2 from differentiating the square cancels):**
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$

**Gradient** (for $j = 0, 1, \dots, n$, with $x_0^{(i)} = 1$):
$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$$
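One way to confirm that this gradient really is the derivative of $J(\theta)$ is a finite-difference check: nudge each $\theta_j$ by $\pm\varepsilon$ and compare the resulting slope of $J$ with the analytic formula. The sketch below is a minimal NumPy version on random data; the helper names (`cost`, `gradient`, `numerical_gradient`) and the data are hypothetical and not part of the notebook's `linear_regression` module.

```python
import numpy as np

def cost(theta, X_b, y):
    """J(theta) = 1/(2m) * sum of squared residuals; X_b already contains the x_0 = 1 column."""
    m = len(y)
    residuals = X_b @ theta - y
    return residuals @ residuals / (2 * m)

def gradient(theta, X_b, y):
    """Analytic gradient from the formula above, vectorized: (1/m) * X_b^T (X_b theta - y)."""
    m = len(y)
    return X_b.T @ (X_b @ theta - y) / m

def numerical_gradient(theta, X_b, y, eps=1e-6):
    """Central finite differences, one coordinate theta_j at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X_b, y) - cost(theta - step, X_b, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_gc = rng.normal(size=(50, 3))
X_b_gc = np.hstack([np.ones((50, 1)), X_gc])   # prepend the x_0 = 1 column
y_gc = rng.normal(size=50)
theta_gc = rng.normal(size=4)

max_diff = np.max(np.abs(gradient(theta_gc, X_b_gc, y_gc) - numerical_gradient(theta_gc, X_b_gc, y_gc)))
print(f"max |analytic - numerical| = {max_diff:.2e}")
```

Because $J$ is quadratic in $\theta$, central differences are exact apart from floating-point rounding, so the reported difference should be tiny.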

**Update Rule** (applied to every $\theta_j$ simultaneously, with learning rate $\alpha > 0$):
$$\theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}$$
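Stacking the samples into a design matrix $X_b$ (shape $m \times (n+1)$, first column all ones) turns the per-coordinate gradient into $\frac{1}{m} X_b^T (X_b\theta - y)$, so the whole update rule becomes one vector operation. The sketch below is a from-scratch batch gradient descent under that assumption; `batch_gradient_descent` and the synthetic data are illustrative and not the notebook's `LinearRegressionGD` class.

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.1, n_iters=1000):
    """Minimize J(theta) = 1/(2m) ||X_b theta - y||^2 with full-batch updates.

    X has shape (m, n); a column of ones is prepended so theta[0] is the intercept.
    Returns the final theta and the cost before each update plus after the last one.
    """
    m = X.shape[0]
    X_b = np.hstack([np.ones((m, 1)), X])
    theta = np.zeros(X_b.shape[1])
    cost_history = []
    for _ in range(n_iters):
        residuals = X_b @ theta - y
        cost_history.append(residuals @ residuals / (2 * m))
        grad = X_b.T @ residuals / m              # every partial derivative at once
        theta = theta - learning_rate * grad      # simultaneous update of all theta_j
    residuals = X_b @ theta - y
    cost_history.append(residuals @ residuals / (2 * m))
    return theta, cost_history

rng = np.random.default_rng(42)
X_demo = rng.uniform(-1, 1, size=(200, 1))
y_demo = 4.0 + 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

theta_hat, history = batch_gradient_descent(X_demo, y_demo, learning_rate=0.5, n_iters=2000)
print("learned [intercept, slope]:", theta_hat)   # should be close to [4, 3]
```

Computing the residuals once per iteration and reusing them for both the cost and the gradient keeps each step at two matrix-vector products.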

## 2. Generate Synthetic Data

In [None]:
# Generate synthetic linear data
X, y, true_theta = generate_linear_data(
    n_samples=100,
    n_features=1,
    noise=10.0,
    random_state=42
)

print(f"Generated {len(X)} samples")
print(f"True parameters (intercept, slope): {true_theta}")

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.6, s=50)
plt.xlabel('X', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Generated Linear Data with Noise', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()

## 3. Train Linear Regression Model

In [None]:
# Initialize and train the model
model = LinearRegressionGD(
    learning_rate=0.1,
    max_iter=100,
    method='batch'
)

model.fit(X, y, verbose=True)

print(f"\nLearned parameters: {model.theta}")
print(f"True parameters: {true_theta}")
print(f"\nR² score: {model.score(X, y):.4f}")

In [None]:
# Plot the regression line
plot_regression_line(X, y, model, title="Linear Regression Fit")

In [None]:
# Plot cost convergence
plt.figure(figsize=(10, 6))
plt.plot(model.cost_history, 'b-', linewidth=2)
plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Cost (MSE)', fontsize=12)
plt.title('Cost Function Convergence', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.yscale('log')
plt.show()

print(f"Initial cost: {model.cost_history[0]:.4f}")
print(f"Final cost: {model.cost_history[-1]:.4f}")
print(f"Cost reduction: {(1 - model.cost_history[-1]/model.cost_history[0])*100:.2f}%")
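Because this cost is convex and its minimizer has a closed form (the least-squares solution), a gradient-descent result can be sanity-checked against it. The sketch below does this on its own synthetic data (hypothetical names `X_check`, `y_check`) with `np.linalg.lstsq`, rather than on the notebook's model, whose internals are not shown. For the notebook you could compare `model.theta` against the `lstsq` solution on `X, y`, assuming `model.theta` is ordered `[intercept, slope]` as the printed output suggests.

```python
import numpy as np

rng = np.random.default_rng(7)
X_check = rng.normal(size=(100, 1))
y_check = 2.0 - 1.5 * X_check[:, 0] + rng.normal(scale=0.5, size=100)

# Design matrix with the x_0 = 1 column, matching theta = [intercept, slope]
X_b = np.hstack([np.ones((len(X_check), 1)), X_check])

# Closed-form least-squares solution (numerically safer than inverting X_b^T X_b)
theta_exact, *_ = np.linalg.lstsq(X_b, y_check, rcond=None)

# Plain batch gradient descent on the same cost J(theta) = 1/(2m) ||X_b theta - y||^2
theta_gd = np.zeros(2)
for _ in range(500):
    grad = X_b.T @ (X_b @ theta_gd - y_check) / len(y_check)
    theta_gd -= 0.1 * grad

print("closed form:     ", theta_exact)
print("gradient descent:", theta_gd)
print("max abs difference:", np.max(np.abs(theta_exact - theta_gd)))
```

If the two disagree noticeably, the usual suspects are too few iterations or a learning rate that is too small for the feature scale.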

## 4. Comparison: Batch vs SGD vs Mini-batch GD

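Gradient descent comes in three common variants, which differ only in how many samples feed each parameter update:

1. **Batch Gradient Descent**: Uses all $m$ training samples for every update
   - Pros: Stable convergence along a smooth path, since each step follows the exact gradient
   - Cons: Each update requires a full pass over the data, which is slow for large datasets

2. **Stochastic Gradient Descent (SGD)**: Uses one randomly chosen sample per update
   - Pros: Cheap, frequent updates; on non-convex objectives the noise can help escape shallow local minima (the MSE cost here is convex, so this advantage does not apply)
   - Cons: Noisy convergence; with a constant learning rate the parameters keep fluctuating around the minimum

3. **Mini-batch Gradient Descent**: Uses small random batches of samples per update
   - Pros: Balances speed and stability, and batches vectorize efficiently
   - Cons: Batch size becomes an extra hyperparameter to tune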
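Since the three variants differ only in how many samples each update averages over, a single loop with a `batch_size` parameter can express all of them. The sketch below is a hypothetical stand-alone helper (`minibatch_gd`, with its own synthetic data), not the notebook's `compare_gd_methods`, whose internals are not shown; it assumes the data are reshuffled every epoch and the learning rate stays constant.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, learning_rate=0.1, n_epochs=100, seed=0):
    """Gradient descent on J(theta) = 1/(2m) ||X_b theta - y||^2.

    batch_size == m  -> batch GD (one update per epoch)
    batch_size == 1  -> stochastic GD (m updates per epoch)
    otherwise        -> mini-batch GD
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    X_b = np.hstack([np.ones((m, 1)), X])
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        order = rng.permutation(m)                    # reshuffle every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X_b[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)  # average over this batch only
            theta -= learning_rate * grad
    return theta

rng = np.random.default_rng(1)
X_cmp = rng.normal(size=(1000, 1))
y_cmp = 1.0 + 2.0 * X_cmp[:, 0] + rng.normal(scale=0.3, size=1000)

results = {}
for label, bs in [("batch", len(X_cmp)), ("SGD", 1), ("mini-batch", 32)]:
    results[label] = minibatch_gd(X_cmp, y_cmp, batch_size=bs)
    print(f"{label:>10}: {results[label]}")
```

With the same number of epochs, SGD performs $m$ updates per pass versus one for batch GD, so it moves quickly at first but, with a constant learning rate, ends up jittering around the least-squares solution instead of settling on it.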

In [None]:
# Generate larger dataset for comparison
X_large, y_large, _ = generate_linear_data(
    n_samples=1000,
    n_features=1,
    noise=15.0,
    random_state=42
)

print(f"Dataset size: {len(X_large)} samples")

In [None]:
# Compare the three methods
models = compare_gd_methods(
    X_large, 
    y_large,
    learning_rate=0.1,
    max_iter=50
)

In [None]:
# Detailed comparison
print("\n" + "="*60)
print("DETAILED COMPARISON")
print("="*60)

for name, model in models.items():
    print(f"\n{name}:")
    print(f"  Learned parameters: {model.theta}")
    print(f"  R² score: {model.score(X_large, y_large):.4f}")
    print(f"  Final cost: {model.cost_history[-1]:.4f}")
    print(f"  Iterations: {len(model.cost_history)-1}")

## 5. Effect of Learning Rate

In [None]:
# Test different learning rates
learning_rates = [0.01, 0.05, 0.1, 0.2, 0.3]
models_lr = {}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
colors = plt.cm.rainbow(np.linspace(0, 1, len(learning_rates)))

for lr, color in zip(learning_rates, colors):
    model = LinearRegressionGD(learning_rate=lr, max_iter=50, method='batch')
    model.fit(X, y, verbose=False)
    models_lr[f'α={lr}'] = model
    
    # Plot cost history
    ax1.plot(model.cost_history, color=color, linewidth=2, 
            label=f'α={lr}', alpha=0.8)

ax1.set_xlabel('Iteration', fontsize=11)
ax1.set_ylabel('Cost (MSE)', fontsize=11)
ax1.set_title('Cost Convergence with Different Learning Rates', 
             fontsize=12, fontweight='bold')
ax1.legend(loc='best')
ax1.grid(True, alpha=0.3)
ax1.set_yscale('log')

# Plot R² scores
names = list(models_lr.keys())
scores = [model.score(X, y) for model in models_lr.values()]
bars = ax2.bar(names, scores, color=colors, alpha=0.7, edgecolor='black')

for bar, score in zip(bars, scores):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
            f'{score:.4f}', ha='center', va='bottom', fontsize=10)

ax2.set_ylabel('R² Score', fontsize=11)
ax2.set_title('Model Performance', fontsize=12, fontweight='bold')
ax2.set_ylim([0, 1.1])
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
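The behavior in the learning-rate sweep follows from one line of algebra. Because $J$ is quadratic, its Hessian $H = X_b^T X_b / m$ is constant, and each batch update satisfies $\theta_{k+1} - \theta^* = (I - \alpha H)(\theta_k - \theta^*)$. The error shrinks only if $|1 - \alpha\lambda| < 1$ for every eigenvalue $\lambda$ of $H$, i.e. $0 < \alpha < 2/\lambda_{\max}(H)$. The sketch below computes this bound for its own unscaled synthetic feature (hypothetical names `run_batch_gd`, `x_raw`, `y_lr`) and runs batch GD just below and just above it. It assumes the same $\frac{1}{2m}$ cost as Section 1; whether `LinearRegressionGD` rescales features internally is not shown, so its exact threshold may differ.

```python
import numpy as np

def run_batch_gd(X_b, y, learning_rate, n_iters=500):
    """Batch GD on J(theta) = 1/(2m) ||X_b theta - y||^2; returns the final cost."""
    m = len(y)
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        grad = X_b.T @ (X_b @ theta - y) / m
        theta -= learning_rate * grad
    residuals = X_b @ theta - y
    return residuals @ residuals / (2 * m)

rng = np.random.default_rng(3)
x_raw = rng.uniform(0, 10, size=100)              # deliberately unscaled feature
X_b = np.column_stack([np.ones_like(x_raw), x_raw])
y_lr = 5.0 + 2.0 * x_raw + rng.normal(size=100)

# Constant Hessian of J; GD is stable iff alpha < 2 / lambda_max(H)
H = X_b.T @ X_b / len(y_lr)
alpha_max = 2.0 / np.linalg.eigvalsh(H).max()
print(f"largest stable learning rate ~ {alpha_max:.4f}")

final_costs = {}
for factor in (0.5, 0.99, 1.01):
    alpha = factor * alpha_max
    final_costs[factor] = run_batch_gd(X_b, y_lr, alpha)
    print(f"alpha = {alpha:.4f} ({factor} x bound): final cost = {final_costs[factor]:.3e}")
```

Since $\lambda_{\max}$ grows with the squared scale of the features, the stable range of $\alpha$ shrinks as the features get larger, which is one practical reason to standardize inputs before running gradient descent.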

## 6. Interactive Demo

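Use the sliders to vary the number of samples, the noise level, the learning rate, and the iteration budget, and watch how the fitted line and the cost curve respond.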

In [None]:
def interactive_linear_regression(n_samples=100, noise=10.0, learning_rate=0.1, max_iter=50):
    """Interactive linear regression demo"""
    
    # Generate data
    X, y, true_theta = generate_linear_data(
        n_samples=n_samples,
        n_features=1,
        noise=noise,
        random_state=42
    )
    
    # Train model
    model = LinearRegressionGD(
        learning_rate=learning_rate,
        max_iter=max_iter,
        method='batch'
    )
    model.fit(X, y, verbose=False)
    
    # Create visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Left plot: Data and regression line
    ax1.scatter(X, y, alpha=0.5, s=50, label='Data')
    
    X_line = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
    y_pred = model.predict(X_line)
    ax1.plot(X_line, y_pred, 'r-', linewidth=3, label='Regression line')
    
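    # Add equation; format the sign explicitly so a negative slope reads "- 2.00x", not "+ -2.00x"
    sign = '+' if model.theta[1] >= 0 else '-'
    equation = f'y = {model.theta[0]:.2f} {sign} {abs(model.theta[1]):.2f}x'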
    ax1.text(0.05, 0.95, equation, transform=ax1.transAxes,
            fontsize=12, verticalalignment='top',
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
    
    r2 = model.score(X, y)
    ax1.text(0.05, 0.85, f'R² = {r2:.4f}', transform=ax1.transAxes,
            fontsize=12, verticalalignment='top',
            bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8))
    
    ax1.set_xlabel('X', fontsize=12)
    ax1.set_ylabel('y', fontsize=12)
    ax1.set_title('Linear Regression Fit', fontsize=13, fontweight='bold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Right plot: Cost history
    ax2.plot(model.cost_history, 'b-', linewidth=2, marker='o', markersize=4)
    ax2.set_xlabel('Iteration', fontsize=12)
    ax2.set_ylabel('Cost (MSE)', fontsize=12)
    ax2.set_title('Cost Convergence', fontsize=13, fontweight='bold')
    ax2.grid(True, alpha=0.3)
    ax2.set_yscale('log')
    
    # Add statistics
    stats_text = f'Samples: {n_samples}\n'
    stats_text += f'Noise: {noise:.1f}\n'
    stats_text += f'Iterations: {len(model.cost_history)-1}\n'
    stats_text += f'Final cost: {model.cost_history[-1]:.2f}'
    
    ax2.text(0.05, 0.95, stats_text, transform=ax2.transAxes,
            fontsize=10, verticalalignment='top',
            bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8))
    
    plt.tight_layout()
    plt.show()

# Create interactive widget
interact(
    interactive_linear_regression,
    n_samples=IntSlider(min=50, max=500, step=50, value=100, description='Samples:'),
    noise=FloatSlider(min=1, max=30, step=1, value=10, description='Noise:'),
    learning_rate=FloatSlider(min=0.01, max=0.5, step=0.01, value=0.1, description='Learning Rate:'),
    max_iter=IntSlider(min=10, max=100, step=10, value=50, description='Max Iterations:')
);

## Key Takeaways

1. **Linear regression can be solved using gradient descent**
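2. **The cost function (MSE) is convex**: every local minimum is a global minimum, so batch gradient descent with a small enough learning rate converges to it
3. **Three variants**: Batch (stable), SGD (cheap but noisy updates), Mini-batch (balanced)
4. **The learning rate controls convergence**: too small converges slowly, too large oscillates or diverges, so it needs tuning
5. **R² score measures goodness of fit**: 1.0 is a perfect fit, 0.0 matches always predicting the mean of y, and negative values are worse than that baseline
6. **More data generally yields more accurate parameter estimates**, because the noise averages out over more samples (given a suitable learning rate and iteration budget)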
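The R² reference points above come from its standard definition, $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$. The sketch below implements that definition with a hypothetical helper `r2_score`; that `LinearRegressionGD.score` computes exactly this is an assumption, though it matches the 1.0 and 0.0 reference points listed.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
print(r2_score(y_true, y_true))                               # prints 1.0 (perfect fit)
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))  # prints 0.0 (always predict the mean)
print(r2_score(y_true, y_true[::-1]))                         # prints -3.0 (worse than the mean)
```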