# Linear Regression Closed Form Solution Derivation

## Introduction
The objective of linear regression is to find the best-fitting line or model that minimizes the difference between the predicted and actual values. This report presents the derivation of the closed form solution for linear regression, which provides an explicit expression for the optimal coefficient vector.

## Cost Function Formulation
The cost function in linear regression is typically expressed as the Sum of Squared Errors (SSE). It is the sum, over all data points, of the squared differences between the actual values ($Y_i$) and the predicted values ($\hat{Y}_i$). Mathematically:

$$J = \sum_{i=1}^{N}{(Y_i - \hat{Y}_i)^2}$$
where $N$ is the number of data points, $Y_i$ is the actual value for data point $i$, and $\hat{Y}_i$ is the predicted value for data point $i$. In linear regression the prediction is a linear combination of the features, $\hat{Y}_i = x_i^T \theta$, where $x_i$ is the feature vector of data point $i$ and $\theta$ is the coefficient vector.
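As a minimal sketch, the SSE can be computed directly from its definition in plain Python. The function name `sse` and the toy values below are hypothetical illustrations, not data from this report.

```python
def sse(y, y_hat):
    """Sum of squared errors J = sum_i (Y_i - Yhat_i)^2."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))


# Hypothetical actual values Y_i and predictions Yhat_i for N = 3 points.
y = [1.0, 2.0, 2.0]
y_hat = [1.5, 2.0, 2.5]
print(sse(y, y_hat))  # 0.5, i.e. 0.25 + 0.0 + 0.25
```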

## Derivation of the Closed Form Solution

To minimize the cost function $J$, we take its derivative with respect to the coefficient vector $\theta$, set it equal to zero, and solve for $\theta$. Let's proceed with the derivation.

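Stack the $N$ actual values into a column vector $Y \in \mathbb{R}^{N}$ and the features into the design matrix $X \in \mathbb{R}^{N \times p}$, whose $i$-th row holds the $p$ features of data point $i$. The predictions are then $\hat{Y} = X\theta$ for a coefficient vector $\theta \in \mathbb{R}^{p}$, and the cost function can be expressed in matrix form as: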
$$J = (Y - X\theta)^T (Y - X\theta)$$

(This holds because, for any column vector $a$, the squared Euclidean norm is $\|a\|^2 = a^T a = \sum_i a_i^2$; here $a = Y - X\theta$ is the vector of residuals $Y_i - \hat{Y}_i$.)
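The matrix form can be checked on a tiny example. The sketch below assumes a hypothetical design matrix whose first column is all ones (an intercept term) and uses plain Python lists; the helper names `mat_vec` and `dot` are illustrative. The list `residuals` plays the role of $Y - X\theta$, and its inner product with itself is the same sum of squares as the original $J$.

```python
def mat_vec(X, theta):
    """Matrix-vector product X theta for a list-of-rows matrix."""
    return [sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) for row in X]


def dot(a, b):
    """Inner product a^T b of two equal-length vectors."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


# Hypothetical toy data: the first column of X is all ones (intercept),
# the second holds a single feature x_i = 1, 2, 3.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 2.0, 2.0]
theta = [0.5, 0.5]  # an arbitrary (not optimal) coefficient vector

residuals = [y_i - p_i for y_i, p_i in zip(Y, mat_vec(X, theta))]  # Y - X theta
J = dot(residuals, residuals)  # (Y - X theta)^T (Y - X theta)
print(J)  # 0.25
```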

Using the transpose property $(A - B)^T = A^T - B^T$, we distribute the transpose over the first factor:

$$J = (Y^T - (X\theta)^T) (Y - X\theta)$$

Furthermore, utilizing the transpose property $(AB)^T = B^T A^T$, we simplify the expression:

$$J = (Y^T - \theta^TX^T) (Y - X\theta)$$
 
Expanding the equation:
$$J = Y^T Y - Y^T (X\theta) - (\theta^T X^T) Y + (\theta^T X^T) (X\theta)$$

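Removing the parentheses:
$$J = Y^T Y - Y^T X \theta - \theta^T X^T Y + \theta^T X^T X \theta$$

Each term is a scalar. In particular, the two middle terms are equal, because a scalar equals its own transpose and $(Y^T X \theta)^T = \theta^T X^T Y$; we keep them separate and differentiate each one below.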
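The four-term expansion can be sanity-checked numerically. The sketch below uses hypothetical toy data and an arbitrary $\theta$, with exact `fractions.Fraction` arithmetic so that the direct form $(Y - X\theta)^T(Y - X\theta)$ and the expanded form can be compared with `==` without floating-point rounding. The last term, the scalar $\theta^T X^T X \theta$, is computed as $(X\theta)^T(X\theta)$.

```python
from fractions import Fraction as F


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def mat_vec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(a, b):
    """Inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy data: intercept column of ones plus one feature.
X = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
Y = [F(1), F(2), F(2)]
theta = [F(1, 3), F(3, 4)]  # arbitrary, not the minimizer

X_theta = mat_vec(X, theta)      # X theta
Xt_Y = mat_vec(transpose(X), Y)  # X^T Y

# Direct form: (Y - X theta)^T (Y - X theta)
r = [y - p for y, p in zip(Y, X_theta)]
J_direct = dot(r, r)

# Expanded form, term by term
term1 = dot(Y, Y)              # Y^T Y
term2 = dot(Y, X_theta)        # Y^T X theta
term3 = dot(theta, Xt_Y)       # theta^T X^T Y
term4 = dot(X_theta, X_theta)  # theta^T X^T X theta = (X theta)^T (X theta)
J_expanded = term1 - term2 - term3 + term4

print(J_direct, J_expanded)  # 3/8 3/8
print(term2 == term3)        # True: the two cross terms are the same scalar
```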

To minimize $J$, we take the derivative with respect to $\theta$ and set it equal to zero:
$$ \frac{\partial{J}}{\partial{\theta}} = 0$$

Let's go term by term:

$$\frac{\partial{(Y^T Y)}}{\partial{\theta}} = 0$$
$$\frac{\partial{(Y^T X \theta)}}{\partial{\theta}} = (Y^TX)^T$$
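Here, we used the fact that $\frac{\partial (A\theta)}{\partial \theta} = A^T$ for a constant row vector $A$ (here $A = Y^T X$); gradients are written as column vectors throughout.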


$$\frac{\partial{(\theta^T X^T Y)}}{\partial{\theta}} = X^T Y$$
Here, we used the fact that $\frac{\partial (\theta^T A)}{\partial \theta} = A$ for a constant column vector $A$ (here $A = X^T Y$).
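Both linear identities can be verified component by component. Writing $a$ for a constant column vector (a symbol introduced here only for illustration) and $\theta = (\theta_1, \dots, \theta_p)^T$:

$$a^T \theta = \theta^T a = \sum_{j=1}^{p} a_j \theta_j
\quad\Longrightarrow\quad
\frac{\partial (a^T \theta)}{\partial \theta_k} = a_k
\quad\Longrightarrow\quad
\frac{\partial (a^T \theta)}{\partial \theta} = a$$

With $a = X^T Y$ (so that $a^T = Y^T X$), both cross terms $Y^T X \theta$ and $\theta^T X^T Y$ therefore have the same gradient, $X^T Y$.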


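$$\frac{\partial{(\theta^T X^T X \theta)}}{\partial{\theta}} = 2 X^T X \theta$$
Here, we used the fact that $\frac{\partial (\theta^T A \theta)}{\partial \theta} = (A + A^T)\theta$, which equals $2A\theta$ when $A$ is symmetric; $A = X^T X$ is symmetric because $(X^T X)^T = X^T X$.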
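To see where the factor of 2 comes from, expand the quadratic form in components, with $A = X^T X$:

$$\theta^T A \theta = \sum_{j=1}^{p}\sum_{l=1}^{p} \theta_j A_{jl} \theta_l$$

$$\frac{\partial (\theta^T A \theta)}{\partial \theta_k} = \sum_{l=1}^{p} A_{kl}\theta_l + \sum_{j=1}^{p} \theta_j A_{jk} = (A\theta)_k + (A^T\theta)_k$$

Stacking over $k$ gives $(A + A^T)\theta$, which is $2X^T X\theta$ since $A = X^T X$ is symmetric.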

Combining all four terms, with the signs they carry in $J$:

$$ \frac{\partial{J}}{\partial{\theta}} 
=-(Y^TX)^T - X^T Y + 2 X^T X \theta
= 0$$


Since $(AB)^T = B^T A^T$ and $(A^T)^T = A$, the first term simplifies to $(Y^T X)^T = X^T (Y^T)^T = X^T Y$, so we can write:

$$ \frac{\partial{J}}{\partial{\theta}} 
=-X^T Y - X^T Y + 2 X^T X \theta
= 0$$


$$ \frac{\partial{J}}{\partial{\theta}} 
=- 2 X^T Y + 2 X^T X \theta
= 0$$
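The gradient $-2X^T Y + 2X^T X\theta$ can be compared against a central finite difference of $J$. This is a sketch on hypothetical toy data, and the helper names `cost`, `analytic_grad` and `central_diff_grad` are illustrative. Because $J$ is quadratic in $\theta$, the central difference $\frac{J(\theta + h e_k) - J(\theta - h e_k)}{2h}$ equals the exact partial derivative for any step $h$, so with `Fraction` arithmetic the two gradients agree exactly.

```python
from fractions import Fraction as F


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def mat_vec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def cost(X, Y, theta):
    """J = (Y - X theta)^T (Y - X theta)."""
    r = [y - p for y, p in zip(Y, mat_vec(X, theta))]
    return sum(r_i * r_i for r_i in r)


def analytic_grad(X, Y, theta):
    """-2 X^T Y + 2 X^T X theta, computed as 2 X^T (X theta - Y)."""
    diff = [p - y for p, y in zip(mat_vec(X, theta), Y)]
    return [2 * g for g in mat_vec(transpose(X), diff)]


def central_diff_grad(X, Y, theta, h=F(1, 1000)):
    """Central finite difference of J along each coordinate direction e_k."""
    grad = []
    for k in range(len(theta)):
        plus = list(theta)
        plus[k] += h
        minus = list(theta)
        minus[k] -= h
        grad.append((cost(X, Y, plus) - cost(X, Y, minus)) / (2 * h))
    return grad


X = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]  # hypothetical toy data
Y = [F(1), F(2), F(2)]
theta = [F(1, 3), F(3, 4)]  # arbitrary, not the minimizer

print(analytic_grad(X, Y, theta))      # [Fraction(1, 1), Fraction(3, 1)]
print(central_diff_grad(X, Y, theta))  # [Fraction(1, 1), Fraction(3, 1)]
```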

Moving the $X^T Y$ term to the right-hand side:
$$ 2 X^T X \theta = 2 X^T Y$$

Dividing by 2:

$$ X^T X \theta = X^T Y$$
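In code, the normal equations are typically solved as a linear system rather than by forming an explicit inverse. The sketch below is a hypothetical, dependency-free illustration: `fit_normal_equations` builds $X^T X$ and $X^T Y$ and solves the system by Gauss-Jordan elimination with exact `Fraction` arithmetic. A floating-point implementation would more commonly rely on a library routine (for example a Cholesky or QR based solver).

```python
from fractions import Fraction as F


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def mat_vec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square; raises if singular)."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Find a row at or below `col` with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular (X may lack full column rank)")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [m / p for m in M[col]]  # scale the pivot row so the pivot is 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                # Subtract a multiple of the pivot row to clear this column.
                M[r] = [m - factor * q for m, q in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fit_normal_equations(X, Y):
    """Hypothetical helper: solve X^T X theta = X^T Y for theta."""
    Xt = transpose(X)
    return solve(mat_mul(Xt, X), mat_vec(Xt, Y))


# Hypothetical toy data: intercept column of ones plus one feature x = 1, 2, 3.
X = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
Y = [F(1), F(2), F(2)]
theta = fit_normal_equations(X, Y)
print(theta)  # [Fraction(2, 3), Fraction(1, 2)]
```

At the solution, the residual vector $Y - X\theta$ is orthogonal to every column of $X$, which is exactly what $X^T X \theta = X^T Y$ states.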

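To obtain the optimal $\theta$, we isolate it by multiplying both sides on the left by $(X^T X)^{-1}$. This inverse exists if and only if the columns of $X$ are linearly independent (that is, $X$ has full column rank, which requires at least as many data points as coefficients):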

$$ (X^T X)^{-1} X^T X \theta = (X^T X)^{-1} X^T Y$$

Since $(X^T X)^{-1} X^T X = I$, the left-hand side reduces to $\theta$, and we arrive at the closed form solution for linear regression:

$$\theta = (X^T X)^{-1} X^T Y$$
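As a worked example, take three hypothetical points $(x_i, Y_i) = (1, 1), (2, 2), (3, 2)$ and a model with an intercept, so each row of $X$ is $(1, x_i)$:

$$X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}, \quad
Y = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \quad
X^T X = \begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix}, \quad
X^T Y = \begin{pmatrix} 5 \\ 11 \end{pmatrix}$$

$$(X^T X)^{-1} = \frac{1}{3 \cdot 14 - 6 \cdot 6}\begin{pmatrix} 14 & -6 \\ -6 & 3 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 14 & -6 \\ -6 & 3 \end{pmatrix}$$

$$\theta = (X^T X)^{-1} X^T Y = \frac{1}{6}\begin{pmatrix} 14 \cdot 5 - 6 \cdot 11 \\ -6 \cdot 5 + 3 \cdot 11 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 4 \\ 3 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 1/2 \end{pmatrix}$$

The fitted line is $\hat{Y} = \tfrac{2}{3} + \tfrac{1}{2}x$. Its residuals $(-\tfrac{1}{6}, \tfrac{1}{3}, -\tfrac{1}{6})$ satisfy $X^T(Y - X\theta) = 0$, as the normal equations require.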

## Conclusion
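The closed form solution for linear regression provides an explicit expression for the optimal coefficient vector $\theta$. Setting the gradient of the SSE cost function to zero yields the normal equations $X^T X \theta = X^T Y$, whose solution is $\theta = (X^T X)^{-1} X^T Y$ whenever $X^T X$ is invertible. Because the Hessian of $J$, namely $2X^T X$, is positive semidefinite, $J$ is convex, so this stationary point is a global minimum (and the unique one when $X^T X$ is invertible). The formula lets us compute the optimal coefficients directly, without an iterative optimization procedure such as gradient descent.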