<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/Linear%20Regression/Linear%20Regression%20Code%20Walk%20Through.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear Regression: Code Walk Through

This notebook walks through the **computational steps** of the Linear Regression algorithm from scratch.

## What We'll Cover:
1. **Visualize the data** - understand the dataset
2. **Add bias term** - transform data to include intercept
3. **Find best fit line** - compute optimal weights using the closed-form solution
4. **Make predictions** - use learned weights to predict new values

We'll show **both manual calculation** (to understand the logic) and **vectorized matrix operations** (for efficiency).

### Key Concept:
- Linear regression finds the **best fit line** through the data
- Uses a **closed-form solution** (no iterative training needed!)
- Formula: **y = w₁x + w₀** where w₀ is intercept and w₁ is slope

## Step 1: Import Libraries

We need:
- **NumPy** for numerical operations and matrix calculations
- **Matplotlib** for visualization

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Step 2: Create Training Data

We have:
- **180 training points** with **x values from -9.5 up to, but not including, 8.5** (step size 0.1)
- **Continuous target values** that follow a linear relationship with added Gaussian noise
- Our goal: find the line that best fits this data

The data follows the relationship: **y = x + 1 + noise**, where noise ~ N(0, 2²), i.e. Gaussian with mean 0 and standard deviation 2

In [None]:
# Sample data (matching lecture slides)
# Generate X values from -9.5 up to 8.5 with step size 0.1 (np.arange excludes the stop value)
X_train = np.arange(-9.5, 8.5, 0.1)

# Generate y values: y = x + 1 + noise
# where noise follows a normal distribution with mean 0 and std 2
np.random.seed(42)  # For reproducibility
data_points = X_train
y_train = data_points + 1 + np.random.normal(0, 2, len(data_points))

# Reshape X_train to be a column vector for matrix operations
X_train = X_train.reshape(-1, 1)

# Test point
X_test = np.array([5])

print("Training data shape:", X_train.shape)  # (180, 1) = 180 points, 1 feature
print("Target values shape:", y_train.shape)   # (180,) = 180 target values
print("\nFirst few training points:")
print(X_train[:3].ravel())
print("\nCorresponding target values:")
print(y_train[:3])
print(f"\nTarget value range: [{y_train.min():.3f}, {y_train.max():.3f}]")
print(f"\nTest point: x = {X_test[0]}")

## Step 3: Visualize the Data

Let's plot our training data to see the relationship between x and y.

We can see the points roughly follow a **linear trend** - perfect for linear regression!

In [None]:
# Scatter plot of training data
plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train,
           c='lightblue',
           label='Training data')
plt.xlabel('x', fontsize=14)
plt.ylabel('y', fontsize=14)
plt.title('Training Data: Looking for Linear Relationship', fontsize=16)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()

print(f"We have {len(X_train)} training points")
print("Goal: Find the line y = w₁x + w₀ that best fits this data")

## Step 4: Add Column of Ones (Bias Term)

To include an **intercept term** (bias) in our model, we need to add a column of 1s to our data.

**Why?**
- Our model is: **y = w₁x + w₀**
- We can rewrite this as: **y = w₀(1) + w₁x**
- In matrix form: **y = [1, x] × [w₀, w₁]ᵀ**

**Transformation:**
- Original: **[x⁽¹⁾, x⁽²⁾, ..., x⁽ᴺ⁾]**
- With bias: **[[1, x⁽¹⁾], [1, x⁽²⁾], ..., [1, x⁽ᴺ⁾]]**

This matrix is called **Φ** (Phi) or the **design matrix**.
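
Written out for all N training points, the matrix form above stacks into a single equation. This is only the notation from this section expanded; the symbol $\hat{\mathbf{y}}$ for the vector of model outputs is an assumption here, since the notebook does not introduce it until the prediction step.

$$
\hat{\mathbf{y}} = \Phi \mathbf{w} =
\begin{bmatrix} 1 & x^{(1)} \\ 1 & x^{(2)} \\ \vdots & \vdots \\ 1 & x^{(N)} \end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
=
\begin{bmatrix} w_0 + w_1 x^{(1)} \\ w_0 + w_1 x^{(2)} \\ \vdots \\ w_0 + w_1 x^{(N)} \end{bmatrix}
$$

Each row of Φ produces one output, so the column of 1s is what carries the intercept w₀ into every row.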

### Approach 1: Manual Construction (Understanding the Logic)

Let's manually build the design matrix step by step.

In [None]:
# Create column of ones manually
N = len(X_train)  # Number of training points
ones_column = np.ones((N, 1))  # Column of 1s with shape (180, 1)

print("Original X_train shape:", X_train.shape)
print("Ones column shape:", ones_column.shape)
print("\nOnes column:")
print(ones_column[:3])  # Show first 3

# Concatenate: [ones column | X_train]
Phi_manual = np.concatenate([ones_column, X_train], axis=1)

print("\nDesign matrix Φ (Phi) shape:", Phi_manual.shape)  # (180, 2)
print("\nFirst few rows of Φ:")
print(Phi_manual[:5])
print("\nEach row is now: [1, x]")

### Approach 2: Using `np.c_[]` (Efficient)

`np.c_[]` is a convenient way to concatenate arrays column-wise.

This is more concise than manual concatenation.

In [None]:
# Add column of ones using np.c_[]
Phi = np.c_[np.ones(len(X_train)), X_train]

print("Design matrix Φ shape:", Phi.shape)  # (180, 2)
print("\nFirst few rows of Φ:")
print(Phi[:5])
print()

# Verify both approaches match
print("Manual and np.c_[] results match:", np.allclose(Phi_manual, Phi))
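
As an aside to the `np.c_[]` approach, the same design matrix can be built with other NumPy stacking functions. The sketch below uses a small made-up column vector (`X_demo` is illustrative, not the notebook's data) and checks that `np.hstack` and `np.column_stack` give the same result.

```python
import numpy as np

# Small illustrative column vector (hypothetical values, not the notebook's data)
X_demo = np.array([[-1.0], [0.0], [2.0]])    # shape (3, 1)
ones_2d = np.ones((len(X_demo), 1))          # shape (3, 1)

# np.c_ treats the 1-D ones array as a column, as in the notebook
Phi_c = np.c_[np.ones(len(X_demo)), X_demo]

# np.hstack needs both inputs to already be 2-D columns
Phi_hstack = np.hstack([ones_2d, X_demo])

# np.column_stack turns 1-D inputs into columns and keeps 2-D inputs as they are
Phi_colstack = np.column_stack([np.ones(len(X_demo)), X_demo])

print(Phi_c)
print("hstack matches:      ", np.array_equal(Phi_c, Phi_hstack))
print("column_stack matches:", np.array_equal(Phi_c, Phi_colstack))
```

Which one to use is mostly a matter of taste; `np.c_[]` is the most compact, while `np.hstack` makes the required shapes explicit.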

## Step 5: Find Best Fit Line (Closed-Form Solution)

Now we compute the optimal weights using the **normal equation**:

$$\mathbf{w} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{y}$$

Where:
- **Φ** is our design matrix (with bias column)
- **y** is our target values
- **w** = [w₀, w₁] are the weights (intercept and slope)

This gives us the **exact solution** in one computation (no iterative training!).
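
For readers wondering where the normal equation comes from, here is a brief sketch. It assumes the usual least-squares objective (the sum of squared errors, written $E(\mathbf{w})$ here), which this notebook does not state explicitly, and it assumes that $\Phi^T \Phi$ is invertible.

$$
\begin{aligned}
E(\mathbf{w}) &= \lVert \mathbf{y} - \Phi \mathbf{w} \rVert^2 = (\mathbf{y} - \Phi \mathbf{w})^T (\mathbf{y} - \Phi \mathbf{w}) \\
\nabla_{\mathbf{w}} E &= -2\, \Phi^T (\mathbf{y} - \Phi \mathbf{w}) = \mathbf{0} \\
\Phi^T \Phi\, \mathbf{w} &= \Phi^T \mathbf{y} \\
\mathbf{w} &= (\Phi^T \Phi)^{-1} \Phi^T \mathbf{y}
\end{aligned}
$$

Setting the gradient to zero gives a linear system in **w**, which is why no iteration is needed.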

### Breaking Down the Formula Step by Step

Let's compute each part of the formula manually to understand what's happening.

In [None]:
print("Step 1: Compute Φᵀ (Phi transpose)")
print("="*50)
Phi_T = Phi.T
print(f"Φ shape: {Phi.shape}")
print(f"Φᵀ shape: {Phi_T.shape}")
print("\nΦᵀ:")
print(Phi_T)
print()

In [None]:
print("Step 2: Compute Φᵀ Φ (matrix multiplication)")
print("="*50)
Phi_T_Phi = Phi_T @ Phi  # Using @ for matrix multiplication
print(f"Φᵀ shape: {Phi_T.shape}")
print(f"Φ shape: {Phi.shape}")
print(f"Φᵀ Φ shape: {Phi_T_Phi.shape}")
print("\nΦᵀ Φ:")
print(Phi_T_Phi)
print("\nThis is a 2×2 symmetric matrix")
print()

In [None]:
print("Step 3: Compute (Φᵀ Φ)⁻¹ (matrix inverse)")
print("="*50)
Phi_T_Phi_inv = np.linalg.inv(Phi_T_Phi)
print(f"(Φᵀ Φ)⁻¹ shape: {Phi_T_Phi_inv.shape}")
print("\n(Φᵀ Φ)⁻¹:")
print(Phi_T_Phi_inv)
print()

# Verify it's actually the inverse
identity = Phi_T_Phi @ Phi_T_Phi_inv
print("Verification: Φᵀ Φ × (Φᵀ Φ)⁻¹ should equal identity matrix:")
print(np.round(identity, 10))  # Round to avoid floating point display issues
print()
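
Because ΦᵀΦ is only 2×2 here, its inverse can also be written by hand with the standard 2×2 formula. The sketch below rebuilds a design matrix on the same x grid (targets are not needed for this step) and compares the hand-computed inverse with `np.linalg.inv`. The variable names are illustrative.

```python
import numpy as np

# Rebuild a design matrix on the same x grid as the notebook
x = np.arange(-9.5, 8.5, 0.1)
Phi = np.c_[np.ones(len(x)), x]

# For a bias column plus one feature: A = [[N, sum(x)], [sum(x), sum(x^2)]]
A = Phi.T @ Phi
a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]

# 2x2 inverse: swap the diagonal, negate the off-diagonal, divide by the determinant
det = a * d - b * c          # nonzero as long as the x values are not all identical
A_inv_by_hand = np.array([[d, -b], [-c, a]]) / det

print("N =", a, " sum(x) =", b, " sum(x^2) =", d)
print("Matches np.linalg.inv:", np.allclose(A_inv_by_hand, np.linalg.inv(A)))
```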

In [None]:
print("Step 4: Compute Φᵀ y (matrix-vector multiplication)")
print("="*50)
Phi_T_y = Phi_T @ y_train
print(f"Φᵀ shape: {Phi_T.shape}")
print(f"y shape: {y_train.shape}")
print(f"Φᵀ y shape: {Phi_T_y.shape}")
print("\nΦᵀ y:")
print(Phi_T_y)
print()

In [None]:
print("Step 5: Compute weights w = (Φᵀ Φ)⁻¹ Φᵀ y")
print("="*50)
weights_manual = Phi_T_Phi_inv @ Phi_T_y
print(f"Weights shape: {weights_manual.shape}")
print("\nWeights [w₀, w₁]:")
print(weights_manual)
print()
print(f"Intercept (w₀): {weights_manual[0]:.6f}")
print(f"Slope (w₁):     {weights_manual[1]:.6f}")
print()
print(f"Our linear model: y = {weights_manual[1]:.3f}x + {weights_manual[0]:.3f}")
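
With a single feature, the matrix formula reduces to two familiar scalar expressions: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below checks this on data of the same form as the notebook's. It uses `np.random.default_rng(0)`, an arbitrary choice, so the noise draw differs from the notebook's `np.random.seed(42)` data.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed for this sketch
x = np.arange(-9.5, 8.5, 0.1)
y = x + 1 + rng.normal(0, 2, len(x))      # y = x + 1 + noise, std 2

# Scalar formulas for one feature
x_mean, y_mean = x.mean(), y.mean()
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
w0 = y_mean - w1 * x_mean

# Normal equation on the same data
Phi = np.c_[np.ones(len(x)), x]
w_normal = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ y

print("Scalar formulas [w0, w1]:", np.array([w0, w1]))
print("Normal equation [w0, w1]:", w_normal)
print("Match:", np.allclose([w0, w1], w_normal))
```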

### All-in-One Computation (Efficient)

Now let's compute the weights in one line using the full formula.

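This compact form mirrors the formula directly. For larger or ill-conditioned problems, `np.linalg.solve` or `np.linalg.lstsq` is generally preferred over forming an explicit inverse.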

In [None]:
# Compute weights using the closed-form solution
weights = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ y_train

print("Weights computed in one line:")
print(weights)
print()
print(f"Intercept (w₀): {weights[0]:.6f}")
print(f"Slope (w₁):     {weights[1]:.6f}")
print()

# Verify both approaches match
print("Manual and one-line results match:", np.allclose(weights_manual, weights))
print()
print(f"Final linear model: y = {weights[1]:.3f}x + {weights[0]:.3f}")
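
The explicit inverse is fine for a 2×2 system, but the same weights can be obtained without it. The sketch below compares three routes on data of the same form as the notebook's (the seed is an arbitrary choice, so the numbers differ from the cells above): the explicit inverse, `np.linalg.solve` on the normal equations, and `np.linalg.lstsq` applied to Φ directly.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed for this sketch
x = np.arange(-9.5, 8.5, 0.1)
y = x + 1 + rng.normal(0, 2, len(x))
Phi = np.c_[np.ones(len(x)), x]

# 1) Explicit inverse, as in the notebook
w_inv = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ y

# 2) Solve the linear system (Phi^T Phi) w = Phi^T y without forming the inverse
w_solve = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# 3) Least-squares solver working on Phi itself
w_lstsq, residual_ss, rank, singular_values = np.linalg.lstsq(Phi, y, rcond=None)

print("inv:  ", w_inv)
print("solve:", w_solve)
print("lstsq:", w_lstsq)
```

All three agree here. The difference only starts to matter when ΦᵀΦ is close to singular, for example with nearly duplicated features.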

## Step 6: Visualize the Best Fit Line

Let's plot our learned line along with the training data to see how well it fits!

In [None]:
# Create x values for plotting the line
x_line = np.linspace(X_train.min(), X_train.max(), 100)

# Compute y values using our learned weights: y = w₁x + w₀
y_line = weights[1] * x_line + weights[0]

# Plot
plt.figure(figsize=(10, 6))

# Training data
plt.scatter(X_train, y_train,
           c='steelblue', s=150, alpha=0.7,
           edgecolors='black', linewidths=2,
           label='Training data', zorder=3)

# Best fit line
plt.plot(x_line, y_line,
        'r-', linewidth=3, alpha=0.8,
        label=f'Best fit line: y = {weights[1]:.3f}x + {weights[0]:.3f}')

plt.xlabel('x', fontsize=14)
plt.ylabel('y', fontsize=14)
plt.title('Linear Regression: Best Fit Line', fontsize=16)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Model equation: y = {weights[1]:.3f}x + {weights[0]:.3f}")
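
The plot gives a visual impression of the fit; two common numbers that summarize it are the mean squared error (MSE) and the coefficient of determination (R²). Neither is computed in the notebook itself, so treat this as an optional sketch. It regenerates data of the same form with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed for this sketch
x = np.arange(-9.5, 8.5, 0.1)
y = x + 1 + rng.normal(0, 2, len(x))
Phi = np.c_[np.ones(len(x)), x]
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Residuals: observed minus fitted values
y_hat = Phi @ w
residuals = y - y_hat

mse = np.mean(residuals ** 2)             # average squared error
ss_res = np.sum(residuals ** 2)           # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)      # total variation around the mean
r2 = 1 - ss_res / ss_tot                  # fraction of variation explained

print(f"MSE: {mse:.3f}")
print(f"R^2: {r2:.3f}")
```

Since the noise has standard deviation 2, an MSE in the neighborhood of 4 would suggest the line has captured most of the structure in the data.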

## Step 7: Make Predictions on Test Data

Now let's use our trained model to predict a new value.

**Example:** What is the predicted y value when x = 5?

### Step 7a: Add Bias Term to Test Data

Just like with training data, we need to add a column of 1s to our test data.

**Transformation:** [5] → [1, 5]

In [None]:
# Test point (defined earlier: x = 5)
X_test_reshaped = X_test.reshape(-1, 1)  # Reshape for matrix operations

print("Original test point:", X_test)
print("Reshaped test point:", X_test_reshaped)
print("Shape:", X_test_reshaped.shape)
print()

# Add bias term
X_test_with_bias = np.c_[np.ones(len(X_test_reshaped)), X_test_reshaped]

print("Test point with bias:", X_test_with_bias[0])
print("Shape:", X_test_with_bias.shape)
print()
print("Now it has the form [1, x] to match our weights [w₀, w₁]")

### Step 7b: Compute Prediction

Prediction is simply a **dot product**:

$$
\hat{y} =
\begin{bmatrix}1 & x\end{bmatrix} \times
\begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
= w_0 + x \times w_1.
$$


In [None]:
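# Manual calculation: ŷ = w₀ + w₁x
x_value = X_test_reshaped[0, 0]  # x = 5
w0 = weights[0]  # Intercept
w1 = weights[1]  # Slope
prediction_manual = w0 + w1 * x_value

print("Manual approach:")
print("="*50)
print(f"ŷ = {w0:.3f} + {w1:.3f} × {x_value} = {prediction_manual:.3f}")
print()

# Prediction using matrix multiplication
prediction = X_test_with_bias @ weights

print("Matrix multiplication approach:")
print("="*50)
print(f"X_test with bias: {X_test_with_bias[0]}")
print(f"Weights:          {weights}")
print(f"Prediction:       {prediction[0]:.3f}")
print()
print("Manual and matrix results match:", np.isclose(prediction_manual, prediction[0]))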
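
The two prediction steps (add the bias column, then multiply by the weights) can be wrapped in one small function. `predict` is a hypothetical helper, not part of the notebook, and `w_demo` holds made-up weights chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def predict(x_new, weights):
    """Return w0 + w1 * x for each value in x_new (hypothetical helper)."""
    x_new = np.asarray(x_new, dtype=float).reshape(-1, 1)   # column vector
    Phi_new = np.c_[np.ones(len(x_new)), x_new]             # rows of [1, x]
    return Phi_new @ weights

w_demo = np.array([1.0, 2.0])        # illustrative weights: intercept 1, slope 2

print(predict(5, w_demo))            # → [11.]
print(predict([0, 1, 2], w_demo))    # → [1. 3. 5.]
```

The same function handles a single value or a list, because both are reshaped into a column before the bias term is added.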

## Step 8: Visualize the Prediction

Let's visualize our prediction on the fitted line.

In [None]:
# Create x values for plotting the line
x_line = np.linspace(X_train.min(), X_train.max(), 100)
y_line = weights[1] * x_line + weights[0]

# Plot
plt.figure(figsize=(10, 6))

# Training data
plt.scatter(X_train, y_train,
           c='lightblue',
           label='Training data', zorder=3)

# Best fit line
plt.plot(x_line, y_line,
        'k-', linewidth=2, alpha=0.8,
        label=f'Best fit line: y = {weights[1]:.3f}x + {weights[0]:.3f}')

# Test point and prediction
plt.scatter(X_test_reshaped, prediction,
           c='red', s=100, marker='o',
           edgecolors='black', linewidths=2,
           label=f'Prediction: x={X_test_reshaped[0,0]:.1f}, ŷ={prediction[0]:.3f}',
           zorder=4)

# Draw dashed guide lines from the axes to the predicted point
plt.plot([X_train.min(), X_test_reshaped[0,0]], [prediction[0], prediction[0]], 'k--')
plt.plot([X_test_reshaped[0,0], X_test_reshaped[0,0]], [y_train.min(), prediction[0]], 'k--')

plt.axis([X_train.min(), X_train.max(), y_train.min(), y_train.max()])
plt.xlabel('x', fontsize=14)
plt.ylabel('y', fontsize=14)
plt.title('Linear Regression: Making a Prediction', fontsize=16)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

print(f"\nFor test point x = {X_test_reshaped[0,0]:.1f}:")
print(f"Predicted value ŷ = {prediction[0]:.3f}")
print(f"\nCalculation: ŷ = {weights[1]:.3f} × {X_test_reshaped[0,0]:.1f} + {weights[0]:.3f} = {prediction[0]:.3f}")

## Step 9: Make Predictions for Multiple Test Points

Let's predict for several test points at once to see how vectorization works.

In [None]:
# Multiple test points
X_test_multiple = np.array([[1.0], [2.5], [4.0], [5.5], [6.0]])

print("Test points:")
print(X_test_multiple.ravel())
print()

# Add bias term
X_test_multiple_with_bias = np.c_[np.ones(len(X_test_multiple)), X_test_multiple]

print("Test points with bias:")
print(X_test_multiple_with_bias)
print()

# Make predictions (vectorized operation!)
predictions_multiple = X_test_multiple_with_bias @ weights

print("Predictions:")
for x, y_pred in zip(X_test_multiple, predictions_multiple):
    print(f"  x = {x[0]:.1f}  →  ŷ = {y_pred:.3f}")
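
To make the benefit of vectorization concrete, the sketch below computes the same predictions twice: once with an explicit Python loop and once with a single matrix-vector product. It uses made-up weights (`w_demo`, intercept 1 and slope 2) so the results can be checked by hand.

```python
import numpy as np

w_demo = np.array([1.0, 2.0])                              # illustrative weights
X_new = np.array([[1.0], [2.5], [4.0], [5.5], [6.0]])      # same test points as above

# Loop version: one prediction at a time
preds_loop = np.array([w_demo[0] + w_demo[1] * row[0] for row in X_new])

# Vectorized version: all predictions in one matrix-vector product
preds_vec = np.c_[np.ones(len(X_new)), X_new] @ w_demo

print("Loop:      ", preds_loop)
print("Vectorized:", preds_vec)
print("Match:", np.allclose(preds_loop, preds_vec))
```

For five points the difference is negligible, but the vectorized form scales to large arrays without a Python-level loop.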

## Summary

We've walked through all the computational steps of Linear Regression:

1. ✅ **Visualized data** - saw training points showing linear trend
2. ✅ **Added bias term** - transformed data by adding a column of 1s to create the design matrix Φ
3. ✅ **Found best fit line** - used the closed-form solution **w = (ΦᵀΦ)⁻¹Φᵀy** to compute optimal weights
4. ✅ **Made predictions** - computed ŷ = [1, x] · [w₀, w₁] for new points

### Key Linear Regression Concepts:

| Concept | Description |
|---------|-------------|
| **Design Matrix (Φ)** | Training data with bias column: [[1, x⁽¹⁾], [1, x⁽²⁾], ...] |
| **Weights (w)** | [w₀, w₁] where w₀ = intercept, w₁ = slope |
| **Normal Equation** | w = (ΦᵀΦ)⁻¹Φᵀy gives optimal solution directly |
| **Prediction** | ŷ = w₀ + w₁x (or [1,x] · w in matrix form) |

### Key NumPy Operations Used:

- **`np.c_[ones, X]`** - concatenate columns to add bias term
- **`.T`** - transpose matrix (Φ → Φᵀ)
- **`@`** - matrix multiplication operator
- **`np.linalg.inv()`** - compute matrix inverse
- **`Phi.T @ Phi`** - compute ΦᵀΦ (Gram matrix)
- **`X @ w`** - compute predictions via matrix-vector multiplication

### Linear Regression vs KNN Regression:

| Aspect | Linear Regression | KNN Regression |
|--------|------------------|----------------|
| **Model** | Parametric (learns fixed weights) | Non-parametric (uses training data directly) |
| **Training** | Closed-form solution (one linear-algebra computation) | No training step (just stores the data) |
| **Prediction** | Fast (just w₀ + w₁x) | Slower (must find K neighbors) |
| **Assumes** | Linear relationship | Local similarity |
| **Memory** | Only stores weights | Must store all training data |
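
To make the comparison in the table concrete, here is a minimal sketch of KNN regression next to the closed-form linear fit. The helper `knn_predict`, the choice `k=5`, and the seed are all assumptions made for illustration; the notebook does not define a KNN implementation.

```python
import numpy as np

def knn_predict(x_query, x_train, y_train, k=5):
    """Average the targets of the k training points closest to x_query (1-D inputs)."""
    distances = np.abs(x_train - x_query)     # distance to every stored training point
    nearest = np.argsort(distances)[:k]       # indices of the k smallest distances
    return y_train[nearest].mean()

rng = np.random.default_rng(0)                # arbitrary seed for this sketch
x = np.arange(-9.5, 8.5, 0.1)
y = x + 1 + rng.normal(0, 2, len(x))

# Linear regression keeps only two numbers after fitting
Phi = np.c_[np.ones(len(x)), x]
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

x_query = 5.0
print("Linear regression:", w[0] + w[1] * x_query)
print("KNN (k=5):        ", knn_predict(x_query, x, y, k=5))
```

Note that `knn_predict` needs the full `x` and `y` arrays at prediction time, while the linear model needs only `w`.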
