# 🤖 Application 1: Linear Regression From Scratch

> *"The best way to learn is to do. The best way to build a model is to build it from scratch."*

Welcome to the first application notebook! Here, we will tie together everything we've learned—Linear Algebra, Calculus, and Optimization—to build one of the most fundamental machine learning models: **Linear Regression**.

Instead of using a library like Scikit-learn, we will build it from scratch using only NumPy (with Matplotlib for plotting). This will give you a deep understanding of what's happening under the hood.

## 🎯 How the Math Comes Together

- **Linear Algebra**: We'll use vector and matrix operations to make predictions efficiently. The core prediction equation, `y_hat = X @ theta`, is a direct application of matrix-vector multiplication.
- **Calculus**: We will define a loss function (Mean Squared Error) and then calculate its gradient with respect to the model parameters. This gradient tells us how to update our parameters to improve the model.
- **Optimization**: We will use the Gradient Descent algorithm we learned about to iteratively update our model's parameters (`theta`) and minimize the loss function.

## 📚 Import Essential Libraries

In [None]:
# Core libraries
import numpy as np
import matplotlib.pyplot as plt

# Plotting style
plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

# Create a synthetic dataset
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) # y = 4 + 3x + noise

print("🤖 Libraries and synthetic data loaded!")

# Visualize the data
plt.scatter(X, y)
plt.xlabel('X (Feature)')
plt.ylabel('y (Target)')
plt.title('Synthetic Data for Linear Regression')
plt.show()

---

# 📝 Step 1: The Model and Loss Function

### The Linear Model (Linear Algebra)
Our hypothesis $h_\theta(x)$ is a linear function of the single input feature $x_1$:
$$ h_\theta(x) = \theta_0 + \theta_1 x_1 $$
In matrix form, this becomes much cleaner:
$$ \hat{y} = X_b \cdot \theta $$
Where:
- $\hat{y}$ is the vector of predictions.
- $X_b$ is the design matrix with a column of ones added for the bias term $\theta_0$.
- $\theta$ is the parameter vector $[\theta_0, \theta_1]$.
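
To see that the matrix form really is the same as the per-sample formula, here is a minimal sketch. The three inputs and the parameter values are hypothetical and chosen only for illustration; they are not the notebook's dataset.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration.
x = np.array([[0.0], [1.0], [2.0]])   # three samples, one feature
theta = np.array([[4.0], [3.0]])      # [theta_0, theta_1] as a column vector

# Design matrix: the leading column of ones multiplies theta_0 (the bias).
X_b = np.c_[np.ones((x.shape[0], 1)), x]

# Matrix form: a single product yields every prediction at once.
y_hat_matrix = X_b @ theta

# Per-sample form: theta_0 + theta_1 * x, written out explicitly.
y_hat_scalar = theta[0, 0] + theta[1, 0] * x

print(y_hat_matrix.ravel())                      # predictions 4, 7 and 10
print(np.allclose(y_hat_matrix, y_hat_scalar))   # True
```

The column of ones is what lets the bias be handled by the same matrix product as the weight, so no separate addition step is needed.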

### The Loss Function (Calculus & Statistics)
We'll use the Mean Squared Error (MSE), which measures the average squared difference between predictions and actual values.
$$ J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2 $$
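
The formula translates almost directly into NumPy. The sketch below uses a hypothetical helper name, `mse_loss`, and hypothetical toy data that lie exactly on $y = 4 + 3x$, so the loss values are easy to verify by hand.

```python
import numpy as np

def mse_loss(X_b, y, theta):
    """Mean squared error J(theta) for the predictions X_b @ theta."""
    residuals = X_b @ theta - y          # shape (m, 1)
    return float(np.mean(residuals ** 2))

# Hypothetical toy data lying exactly on y = 4 + 3x.
X_b = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[4.0], [7.0], [10.0]])

perfect = np.array([[4.0], [3.0]])       # the parameters that generated y
off_by_one = np.array([[5.0], [3.0]])    # bias too large by 1

print(mse_loss(X_b, y, perfect))         # 0.0
print(mse_loss(X_b, y, off_by_one))      # 1.0
```

With the bias off by 1, every residual equals 1, so the mean of the squared residuals is also 1.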

In [None]:
# Add a column of ones to X for the bias term (theta_0)
# This creates the design matrix X_b
X_b = np.c_[np.ones((X.shape[0], 1)), X]  # use X's row count instead of hard-coding 100

print("Original X shape:", X.shape)
print("Design Matrix X_b shape:", X_b.shape)
print("First 5 rows of X_b:\n", X_b[:5])

---

# 📉 Step 2: The Gradient (Calculus)

To use Gradient Descent, we need the partial derivative of the loss function `J(θ)` with respect to each parameter `θ_j`. The gradient vector is:

$$ \nabla_\theta J(\theta) = \begin{pmatrix} \frac{\partial J}{\partial \theta_0} \\ \frac{\partial J}{\partial \theta_1} \\ \vdots \\ \frac{\partial J}{\partial \theta_n} \end{pmatrix} = \frac{2}{m} X_b^T \cdot (X_b \cdot \theta - y) $$

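This single matrix expression gives the gradient for all parameters at once. The factor $\frac{2}{m}$ comes from differentiating the squared term in the MSE with the chain rule. The gradient points in the direction of steepest ascent on the loss surface, so we will move in the opposite direction to reduce the loss.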
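
One way to gain confidence in a hand-derived gradient is to compare it against a finite-difference approximation. The sketch below is not part of the training code; the helper names (`mse_loss`, `mse_gradient`, `numerical_gradient`), the step size `eps`, and the small random dataset are all assumptions made for illustration.

```python
import numpy as np

def mse_loss(X_b, y, theta):
    """Mean squared error for the predictions X_b @ theta."""
    return float(np.mean((X_b @ theta - y) ** 2))

def mse_gradient(X_b, y, theta):
    """Analytic gradient: (2/m) * X_b^T (X_b theta - y)."""
    m = X_b.shape[0]
    return (2.0 / m) * X_b.T @ (X_b @ theta - y)

def numerical_gradient(X_b, y, theta, eps=1e-6):
    """Central-difference approximation, one parameter at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        loss_plus = mse_loss(X_b, y, theta + step)
        loss_minus = mse_loss(X_b, y, theta - step)
        grad[j, 0] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Hypothetical data, generated the same way as the notebook's but smaller.
rng = np.random.default_rng(0)
X = 2 * rng.random((50, 1))
X_b = np.c_[np.ones((50, 1)), X]
y = 4 + 3 * X + rng.standard_normal((50, 1))
theta = rng.standard_normal((2, 1))

analytic = mse_gradient(X_b, y, theta)
numeric = numerical_gradient(X_b, y, theta)
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Because the MSE is quadratic in `theta`, central differences agree with the analytic gradient up to floating-point rounding, which makes this a convenient check before training.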

---

# ⚙️ Step 3: The Training (Optimization)

Now we'll implement the Gradient Descent algorithm using the gradient we just defined.

**The Process**:
1. Initialize parameters `θ` randomly.
2. Repeat for a number of iterations:
   a. Calculate the gradient of the loss function using the formula above.
   b. Update the parameters `θ` by taking a step in the opposite direction of the gradient: `θ = θ - η * gradient`, where `η` is the learning rate.
3. Keep track of the loss at each step to watch it decrease.

In [None]:
def train_linear_regression(X_b, y, learning_rate=0.1, n_iterations=100):
    """
    Trains a linear regression model using Batch Gradient Descent.
    """
    m, n = X_b.shape
    
    # 1. Initialize parameters randomly
    theta = np.random.randn(n, 1)
    
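    # Keep track of loss and theta history, starting from the initial state
    # so that loss_history[i] is the loss at theta_history[i]
    loss_history = [np.mean((X_b.dot(theta) - y)**2)]
    theta_history = [theta]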
    
    # 2. Repeat for n_iterations
    for iteration in range(n_iterations):
        # a. Calculate the gradient
        gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
        
        # b. Update parameters
        theta = theta - learning_rate * gradients
        
        # Calculate and store loss and theta
        predictions = X_b.dot(theta)
        loss = np.mean((predictions - y)**2)
        loss_history.append(loss)
        theta_history.append(theta)
        
    return theta, loss_history, theta_history

# Train the model
final_theta, loss_history, theta_history = train_linear_regression(X_b, y, learning_rate=0.1, n_iterations=100)

print("--- Training Complete ---")
print("True parameters were [4, 3]")
print(f"Final learned parameters (theta):\n  θ₀ (bias): {final_theta[0][0]:.4f}\n  θ₁ (weight): {final_theta[1][0]:.4f}")
print(f"Initial Loss: {loss_history[0]:.4f}")
print(f"Final Loss: {loss_history[-1]:.4f}")
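
As a sanity check, the parameters found by gradient descent can be compared with a direct least-squares solve. The sketch below swaps in `np.linalg.lstsq` purely as a reference solution; it is not the method this notebook builds. It recreates the synthetic data with the same seed and formula, and it assumes 1000 iterations (more than the 100 used above) so that gradient descent has time to settle.

```python
import numpy as np

# Recreate the notebook's synthetic data (same seed and formula).
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Direct least-squares solution, used here only as a reference.
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Plain batch gradient descent with the same update rule as above.
# Assumption: 1000 iterations instead of 100, to get close to the minimum.
theta = np.random.randn(2, 1)
m = X_b.shape[0]
for _ in range(1000):
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta = theta - 0.1 * gradients

print("lstsq:           ", theta_lstsq.ravel())
print("gradient descent:", theta.ravel())
print(np.allclose(theta, theta_lstsq, atol=1e-3))  # True
```

Both estimates differ somewhat from the true values `[4, 3]` because of the noise added to `y`; what the check shows is that gradient descent reaches the same minimizer of the MSE as the direct solve.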

---

# 📊 Step 4: Visualization and Analysis

Now let's visualize the results to see how our model learned.

In [None]:
def visualize_training(loss_history, theta_history, X, X_b, y):
    """
    Visualize the training process: loss curve and regression lines.
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 8))
    
    # 1. Plot the loss curve
    ax1.plot(loss_history)
    ax1.set_xlabel('Iteration')
    ax1.set_ylabel('Mean Squared Error (Loss)')
    ax1.set_title('Training Loss Over Time', fontsize=16, weight='bold')
    ax1.grid(True)
    
    # 2. Plot the regression lines at different stages
    ax2.scatter(X, y, alpha=0.6, label='Data')
    
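    # theta_history[0] is the random initialization; the last entry is the final fit
    final_idx = len(theta_history) - 1
    plot_indices = [0, 10, 50, final_idx]
    colors = ['r', 'orange', 'yellow', 'g']
    
    for i, idx in enumerate(plot_indices):
        theta = theta_history[idx]
        y_predict = X_b.dot(theta)
        label = f'Iteration {idx}' if idx != final_idx else f'Final Fit (Iter {idx})'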
        ax2.plot(X, y_predict, color=colors[i], linewidth=2, label=label)
    
    ax2.set_xlabel('X (Feature)')
    ax2.set_ylabel('y (Target)')
    ax2.set_title('Learned Regression Line Over Time', fontsize=16, weight='bold')
    ax2.legend()
    ax2.grid(True)
    
    plt.show()

visualize_training(loss_history, theta_history, X, X_b, y)

### Analysis

- **Loss Curve (Left)**: The loss drops rapidly during the first iterations and then flattens out as the parameters approach the values that minimize the MSE. A smoothly decreasing curve like this suggests that the learning rate is small enough for gradient descent to converge on this problem.
- **Regression Line (Right)**: The plot shows how the model's prediction line starts from a random position and gradually moves to fit the data better and better with each iteration, finally settling on a line that closely matches the underlying pattern.

Congratulations! You have successfully built and trained a machine learning model from scratch, seeing firsthand how linear algebra, calculus, and optimization work in harmony.