```{contents}
```

## OLS

* **OLS (Ordinary Least Squares)** is the most common method to estimate the parameters (coefficients) of a linear regression model.
* It finds the **best-fit line** by **minimizing the sum of squared errors (residuals)** between actual and predicted values.

$$
\text{Residual (error)} = y_i - \hat{y}_i
$$

$$
\hat{y}_i = \beta_0 + \beta_1x_i
$$

---

### Objective

OLS minimizes:

$$
SSE = \sum_{i=1}^n (y_i - (\beta_0 + \beta_1x_i))^2
$$

Where:

* $y_i$ = actual value
* $x_i$ = feature value for observation $i$
* $\hat{y}_i = \beta_0 + \beta_1x_i$ = predicted value
* $n$ = number of observations

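Minimizing SSE picks the line with the smallest total squared vertical distance to the data points. Squaring keeps positive and negative residuals from cancelling and penalizes large errors more heavily than small ones.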
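
To make the objective concrete, here is a minimal sketch that evaluates SSE for two candidate lines. The toy data and the helper name `sse` are assumptions made up for illustration, not part of any library.

```python
import numpy as np

def sse(beta0, beta1, x, y):
    """Sum of squared errors for the line y_hat = beta0 + beta1 * x."""
    residuals = y - (beta0 + beta1 * x)
    return float(np.sum(residuals ** 2))

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

sse_flat = sse(4.0, 0.0, x, y)  # horizontal line at the mean of y
sse_fit = sse(2.2, 0.6, x, y)   # the OLS line for this data

print(f"flat line SSE = {sse_flat:.2f}, fitted line SSE = {sse_fit:.2f}")
```

The fitted line has the lower SSE of the two; OLS finds the pair $(\beta_0, \beta_1)$ for which no other line does better.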

---

### Derivation (Simple Linear Regression)

We solve for parameters $\beta_0$ (intercept) and $\beta_1$ (slope) using calculus:

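1. Take the partial derivatives of SSE with respect to $\beta_0$ and $\beta_1$.
2. Set both derivatives to 0. SSE is a convex quadratic in the parameters, so this stationary point is the minimum.
3. Solve the resulting pair of equations, known as the **normal equations**.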
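
Written out, the steps look like this (a sketch of the standard derivation, using the same symbols as the SSE formula):

$$
\frac{\partial SSE}{\partial \beta_0} = -2\sum_{i=1}^n (y_i - \beta_0 - \beta_1x_i) = 0
$$

$$
\frac{\partial SSE}{\partial \beta_1} = -2\sum_{i=1}^n x_i(y_i - \beta_0 - \beta_1x_i) = 0
$$

Rearranging gives the two normal equations:

$$
\sum_{i=1}^n y_i = n\beta_0 + \beta_1\sum_{i=1}^n x_i
\qquad
\sum_{i=1}^n x_iy_i = \beta_0\sum_{i=1}^n x_i + \beta_1\sum_{i=1}^n x_i^2
$$

Dividing the first equation by $n$ yields $\beta_0 = \bar{y} - \beta_1\bar{x}$, and substituting that into the second yields the slope.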

Final formulas:

* **Slope**:

$$
\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
$$

* **Intercept**:

$$
\beta_0 = \bar{y} - \beta_1 \bar{x}
$$
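
The two formulas translate almost line for line into NumPy. The sketch below uses a hypothetical helper `fit_simple_ols` and made-up toy data, and cross-checks the result against `np.polyfit` with degree 1, which also performs a least-squares fit.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (beta0, beta1) for simple linear regression via the closed-form formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the line through the point (x_bar, y_bar).
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

beta0, beta1 = fit_simple_ols(x, y)

# Cross-check: polyfit returns coefficients from the highest power down.
slope_np, intercept_np = np.polyfit(x, y, deg=1)

print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")
```

Note that the slope formula divides by $\sum (x_i - \bar{x})^2$, so it is undefined when every $x_i$ is identical.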

---

### Multiple Linear Regression (Matrix Form)

For multiple features:

$$
Y = X\beta + \epsilon
$$

OLS solution:

$$
\hat{\beta} = (X^TX)^{-1}X^TY
$$

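Where:

* $X$ = feature matrix (one row per observation, with a leading column of ones if the model has an intercept)
* $Y$ = target vector
* $\beta$ = coefficient vector
* $\epsilon$ = error vector

The formula requires $X^TX$ to be invertible, which fails when features are perfectly collinear.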
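
As a rough illustration of the matrix form, the sketch below builds synthetic data (the coefficients, noise level, and seed are all assumptions picked for the example) and solves the normal equations. It uses `np.linalg.solve` rather than forming the inverse explicitly, which is generally the more numerically stable choice, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 observations, 2 features (all values are illustrative).
n = 50
features = rng.normal(size=(n, 2))
true_beta = np.array([1.0, 2.0, -3.0])  # intercept, then one slope per feature

# Prepend a column of ones so the first coefficient acts as the intercept.
X = np.column_stack([np.ones(n), features])
Y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Solve (X^T X) beta = X^T Y, equivalent to beta = (X^T X)^{-1} X^T Y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Same least-squares problem solved by NumPy's SVD-based routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("normal equations:", np.round(beta_normal, 3))
print("lstsq:           ", np.round(beta_lstsq, 3))
```

Both approaches should agree to numerical precision here; `lstsq` is the safer option when features are nearly collinear and $X^TX$ is close to singular.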

---

### Why OLS?

* ✅ Simple and widely used
* ✅ Has a closed-form solution (no iterations needed, unlike Gradient Descent)
* ✅ Works well when the data assumptions hold (linearity, independent errors, homoscedasticity); normality of errors matters mainly for confidence intervals and hypothesis tests

---

👉 In short:
OLS gives us a **mathematical way** to find the regression line by minimizing squared differences between actual and predicted values.
