## 📘 Summary: Gradient Descent for Linear Regression

This notebook implements **gradient descent** to optimize the parameters (`theta`) of a **linear regression** model by minimizing a cost function.

---

### 🔢 Objective

We aim to find the best-fitting line (or hyperplane) by minimizing the **cost function**, which measures how far the model's predictions are from the actual target values.

---

### 💡 Cost Function

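The cost function used is **Mean Squared Error (MSE)**, scaled by `1/2`:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $h_\theta(x^{(i)})$ is the model's prediction for the $i$-th example and $m$ is the number of examples.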

This function:
- Computes the model's predictions.
- Calculates the squared differences from the actual values.
- Averages them and halves the result to get the cost (error).

The `1/2` factor cancels the 2 produced by differentiating the square, which simplifies the gradient.
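
The effect of the `1/2` factor can be seen in a short derivation of the partial derivative with respect to a single parameter $\theta_j$. This is a sketch that assumes the linear hypothesis $h_\theta(x) = \theta^\top x$ with $x_0 = 1$ for the intercept; the notation $x_j^{(i)}$ (feature $j$ of example $i$) is introduced here for illustration only.

$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} \left[ \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \right]
&= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}
$$

The 2 from the power rule cancels the `1/2`, leaving the average of each error multiplied by its feature value.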

---

### 📥 Key Variables

- `X`: Input features (shape: m × n)
- `X_bias`: `X` with a column of ones for the intercept term (`theta_0`)
- `Y`: Target/output values (shape: m × 1)
- `theta`: Model parameters (shape: (n+1) × 1)
- `alpha`: Learning rate — controls how big each update step is
- `iterations`: Number of steps (updates) for gradient descent
- `cost_history`: List that tracks the cost at each iteration
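
The shapes listed above can be checked with a small sketch. The toy data here is hypothetical (4 examples, 1 feature) and uses explicit column vectors to match the stated shapes; the notebook's own cells store `X`, `Y`, and `theta` as 1-D arrays, which works the same way for a single feature.

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, n = 1 feature
X = np.array([[1.0], [2.0], [3.0], [4.0]])      # shape (m, n)
Y = np.array([[52.0], [54.0], [56.0], [58.0]])  # shape (m, 1)

m, n = X.shape
X_bias = np.hstack([np.ones((m, 1)), X])        # shape (m, n + 1): ones column first
theta = np.zeros((n + 1, 1))                    # shape (n + 1, 1)

predictions = X_bias @ theta                    # shape (m, 1), one prediction per example
print(X_bias.shape, theta.shape, predictions.shape)
# (4, 2) (2, 1) (4, 1)
```

Because the column of ones comes first, `theta[0]` plays the role of the intercept and the remaining entries are the feature weights.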

---

### 🚀 Gradient Descent

At each iteration, the algorithm:
1. Computes predictions using current `theta`
2. Calculates the error
3. Computes the gradient (direction to move)
4. Updates `theta` to reduce the cost
5. Stores the cost for visualization/convergence check

The process repeats for the specified number of iterations.
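
The five steps above can be sketched as a single loop. This is a minimal illustration, not necessarily the notebook's exact implementation: the function name `gradient_descent_sketch` is hypothetical, and it assumes `Y` and `theta` are 1-D NumPy arrays.

```python
import numpy as np

def gradient_descent_sketch(X_bias, Y, theta, alpha, iterations):
    """Hypothetical helper: batch gradient descent for linear regression."""
    m = len(Y)
    cost_history = []
    for _ in range(iterations):
        predictions = X_bias @ theta            # 1. predictions with current theta
        errors = predictions - Y                # 2. error for each example
        gradient = (X_bias.T @ errors) / m      # 3. gradient of the cost
        theta = theta - alpha * gradient        # 4. step against the gradient
        # 5. record the cost measured before this iteration's update
        cost_history.append((errors @ errors) / (2 * m))
    return theta, cost_history

# Usage with the study-hours data and settings from this notebook
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Y = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90, 95])
X_bias = np.c_[np.ones(len(Y)), X]

theta_fit, cost_history = gradient_descent_sketch(
    X_bias, Y, np.zeros(2), alpha=0.01, iterations=1000
)
print(theta_fit)
print(cost_history[0], cost_history[-1])
```

Comparing the first and last entries of `cost_history` is a quick convergence check: the cost should shrink steadily, and a cost that grows usually means `alpha` is too large.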

---


In [2]:
import numpy as np
import matplotlib.pyplot as plt

In [3]:
# Config
DISP_WIDTH = 30

In [4]:
# Data: Study hours and corresponding exam scores
# Print a banner announcing that the dataset is being loaded
print("*" * DISP_WIDTH)
print("LOADING DATASET ...".center(DISP_WIDTH))
print("*" * DISP_WIDTH)

******************************
     LOADING DATASET ...      
******************************


In [5]:
X = np.array([1,2,3,4,5,6,7,8,9,10])
Y = np.array([50,55,60,65,70,75,80,85,90,95])

In [None]:
m = len(Y)  # number of training examples
m

10


In [None]:
# Prepend a column of ones so that theta[0] acts as the intercept
X_bias = np.c_[np.ones(m), X]

In [None]:
theta = np.zeros(2)  # [intercept, slope], both initialized to zero

In [None]:
alpha = 0.01       # learning rate
iterations = 1000  # number of gradient descent updates

In [10]:
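# Cost function (Mean Squared Error, scaled by 1/2)
def compute_cost(X_bias, Y, theta):
    m = len(Y)  # number of examples, taken from Y rather than a global
    predictions = X_bias.dot(theta)
    errors = predictions - Y
    # Sum of squared errors divided by 2m
    cost = (1 / (2 * m)) * np.dot(errors.T, errors)
    return cost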

In [11]:
# Gradient descent algorithm
def gradient_descent(X_bias, Y, theta, alpha, iterations):
    cost_history = []
    