#### Using the **Normal Equation** to solve Linear Regression:
$$\theta = (X^{T}X)^{-1}X^{T}y$$

##### Linear Regression Model
The linear regression model can be represented as $h_{\theta}(x) = \theta^T x$, where $h_{\theta}(x)$ is the model's prediction, $\theta$ represents the model parameters (including intercept and slope), and $x$ is the feature vector.
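As a small illustration of the prediction rule, the sketch below evaluates $h_{\theta}(x) = \theta^T x$ with NumPy. The function name `predict` and the numeric values are made up for this example, and the leading 1 in `x` is an assumed intercept term.

```python
import numpy as np

def predict(theta: np.ndarray, x: np.ndarray) -> float:
    """Return h_theta(x) = theta^T x for a single feature vector x."""
    return float(theta @ x)

# Hypothetical parameters: intercept 0.5, slope 2.0
theta = np.array([0.5, 2.0])
# Feature vector with a leading 1 so that theta[0] acts as the intercept
x = np.array([1.0, 3.0])

prediction = predict(theta, x)  # 0.5 * 1 + 2.0 * 3 = 6.5
```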

##### Cost Function
The goal of linear regression is to find the parameters $\theta$ that minimize the cost function $J(\theta)$. The cost function is half the mean of the squared errors between predicted and actual values:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2$$
Here, $m$ is the number of training samples, $x^{(i)}$ is the feature vector of the $i^{th}$ sample, and $y^{(i)}$ is the actual target value of the $i^{th}$ sample.
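The sum can be written out directly as a loop. The following is a minimal sketch, assuming the three-sample data used later in this notebook; the helper name `cost` is hypothetical.

```python
import numpy as np

def cost(theta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2, as an explicit loop."""
    m = len(y)
    total = 0.0
    for i in range(m):
        error = theta @ X[i] - y[i]  # h_theta(x^(i)) - y^(i)
        total += error ** 2
    return total / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

cost_at_zero = cost(np.zeros(2), X, y)          # (1 + 4 + 9) / 6
cost_at_fit = cost(np.array([0.0, 1.0]), X, y)  # predictions match y exactly
```

With all parameters at zero every prediction is 0, so the cost is $14/6$; with intercept 0 and slope 1 the predictions equal the targets and the cost is 0.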

##### Matrix Representation of the Cost Function
Representing the cost function in matrix form simplifies computation. Define the matrix $X$ whose rows are the feature vectors of the training samples (the first column is usually all ones, representing the intercept), and the vector $y$ containing the target values of all training samples. The cost function can then be expressed as:
$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$

##### Derivative of the Cost Function
To find the $\theta$ that minimizes the cost function, we differentiate with respect to $\theta$ and set the gradient to zero. The gradient of $J(\theta)$ is:
$$\frac{\partial J}{\partial \theta} = \frac{1}{m} X^T(X\theta - y)$$
Setting the gradient to zero (the constant factor $\frac{1}{m}$ drops out), we find:
$$X^T(X\theta - y) = 0$$
$$X^TX\theta - X^Ty = 0$$
$$X^TX\theta = X^Ty$$
Provided $X^TX$ is invertible, multiplying both sides by $(X^TX)^{-1}$ gives the normal equation:
$$\theta = (X^TX)^{-1}X^Ty$$
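The linear system $X^TX\theta = X^Ty$ can be checked numerically. The sketch below is one way to do it, assuming the same three-sample data as the notebook's example; it uses `np.linalg.solve` on the system rather than forming an inverse, and the variable names are illustrative.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

# Solve the linear system (X^T X) theta = X^T y without forming an inverse
A = X.T @ X  # [[3, 6], [6, 14]]
b = X.T @ y  # [6, 14]
theta = np.linalg.solve(A, b)

# At the solution, the residual X theta - y is orthogonal to every column of X
stationarity = X.T @ (X @ theta - y)
```

Here `theta` comes out as approximately `[0, 1]` and `stationarity` as approximately the zero vector, which is the condition $X^T(X\theta - y) = 0$ from the derivation.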


In [2]:
import numpy as np

def linear_regression_normal_equation(X: list[list[float]], y: list[float]) -> list[float]:
    # Solve theta = (X^T X)^{-1} X^T y and round the result to 4 decimals
    x = np.array(X)            # shape (m, n); (3, 2) in the example below
    x_trans = np.transpose(x)  # shape (n, m)
    y = np.array(y)            # shape (m,)
    theta = np.linalg.inv(x_trans.dot(x)).dot(x_trans).dot(y)
    # Rounding removes floating-point noise and tolist() matches the return type
    return np.round(theta, 4).tolist()

input_x = [[1, 1], [1, 2], [1, 3]]
input_y = [1, 2, 3]
print(linear_regression_normal_equation(input_x, input_y))

[0.0, 1.0]
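The explicit inverse only exists when $X^TX$ is invertible. As a hedged side note, the sketch below shows one common fallback, `np.linalg.lstsq`, on a made-up design matrix with a redundant column. This swaps in a different solver and is not the approach used above.

```python
import numpy as np

# Hypothetical design matrix whose third column is twice the second,
# so X^T X is singular and has no inverse
X = np.array([[1.0, 1.0, 2.0], [1.0, 2.0, 4.0], [1.0, 3.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

rank = np.linalg.matrix_rank(X)  # 2, fewer than the 3 columns

# lstsq returns a least-squares solution
# (the minimum-norm one when X is rank-deficient)
theta, residuals, lstsq_rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
predictions = X @ theta
```

The parameters are not unique in this case, but the predictions still reproduce `y`.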


#### Using **Gradient Descent**

1. **Hypothesis Function**:
    - The hypothesis for linear regression is given by:
      $$h_{\theta}(X) = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \ldots + \theta_n X_n$$
      where $X_1, \ldots, X_n$ are the input features and $\theta$ are the parameters (weights).

2. **Cost Function**:
    - The cost function (half the mean squared error) measures how well the hypothesis fits the training data:
      $$
      J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(X^{(i)}) - y^{(i)} \right)^2
      $$
      where $m$ is the number of training examples. Its gradient, in matrix form, is:
      $$\nabla J(\theta) = \frac{1}{m} X^{T} (X\theta - y)$$

3. **Gradient Descent Algorithm**:
    - Gradient descent updates the parameters $\theta$ iteratively, with learning rate $\alpha$:
      $$
      \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(X^{(i)}) - y^{(i)} \right) \cdot X_j^{(i)}
      $$
      or, in vectorized form:
      $$\theta := \theta - \alpha \nabla J(\theta)$$
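The per-parameter update and the matrix update are the same computation. The sketch below performs a single step both ways, assuming the notebook's three-sample data, a zero starting point, and $\alpha = 0.01$; the names `theta_loop` and `theta_vec` are illustrative.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
alpha = 0.01
m, n = X.shape
theta = np.zeros(n)

# Errors h_theta(x^(i)) - y^(i) for every sample, computed once from the old theta
errors = X @ theta - y

# Component-wise update: one parameter theta_j at a time
theta_loop = theta.copy()
for j in range(n):
    grad_j = sum(errors[i] * X[i, j] for i in range(m)) / m
    theta_loop[j] = theta[j] - alpha * grad_j

# Vectorized update: all parameters at once
theta_vec = theta - alpha * (X.T @ errors) / m
```

Both versions use the errors from the old `theta` for every component, so the update is simultaneous and the two results agree.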


In [7]:
import numpy as np

def linear_regression_gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float, iterations: int) -> np.ndarray:
    m, n = X.shape
    theta = np.zeros((n, 1))   # parameters as a column vector, shape (n, 1)
    y_col = y.reshape(-1, 1)   # targets as a column vector, shape (m, 1)
    for _ in range(iterations):
        pred = X @ theta                           # predictions, shape (m, 1)
        gradient = (1 / m) * X.T @ (pred - y_col)  # gradient of J, shape (n, 1)
        theta -= alpha * gradient
    # Flatten and round to 4 decimals to match the expected output format
    return np.round(theta.flatten(), 4)

input_x = np.array([[1, 1], [1, 2], [1, 3]])
input_y = np.array([1, 2, 3])
alpha = 0.01
iterations = 1000
print(linear_regression_gradient_descent(input_x, input_y, alpha, iterations))
# output: np.array([0.1107, 0.9513])

[0.1107 0.9513]
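After 1000 iterations the result is still some distance from the exact least-squares solution `[0, 1]`. The sketch below records the cost at each step and runs longer, to illustrate how the iterates approach that solution. It assumes the same data and learning rate; the helper `gradient_descent_with_history` is hypothetical.

```python
import numpy as np

def gradient_descent_with_history(X, y, alpha, iterations):
    """Batch gradient descent that also records the cost before each step."""
    m, n = X.shape
    theta = np.zeros(n)
    costs = []
    for _ in range(iterations):
        errors = X @ theta - y
        costs.append(float(errors @ errors) / (2 * m))
        theta = theta - alpha * (X.T @ errors) / m
    return theta, costs

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

theta_1k, costs_1k = gradient_descent_with_history(X, y, alpha=0.01, iterations=1000)
theta_20k, _ = gradient_descent_with_history(X, y, alpha=0.01, iterations=20000)
```

On this data the recorded cost decreases at every step, and `theta_20k` is much closer to `[0, 1]` than `theta_1k`.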


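- Use `*` for element-wise multiplication.
- Use `@` for matrix multiplication (preferred for readability).
- Use `np.dot` for the inner product of 1-D arrays. For 2-D arrays it matches `@`, but for higher-dimensional arrays the two behave differently (`@` broadcasts over the leading axes), so check shapes before swapping one for the other.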
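A quick comparison of the three operators on small arrays, with values chosen only for illustration:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

elementwise = A * B   # [[ 5, 12], [21, 32]]
matrix_prod = A @ B   # [[19, 22], [43, 50]]
inner = np.dot(u, v)  # 1*4 + 2*5 + 3*6 = 32
same_in_2d = np.array_equal(np.dot(A, B), A @ B)  # True for 2-D arrays
```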