# Plain Gradient Descent with Fixed Learning Rate and Momentum

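We consider the Mean Squared Error (MSE) cost function for Ordinary Least Squares (OLS) regression, where \( \mathbf{X} \) is the design matrix with \( n \) rows (one per sample), \( \mathbf{y} \) is the vector of targets, and \( \beta \) is the parameter vector: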

$$ C(\beta) = \frac{1}{n}(\mathbf{X}\beta - \mathbf{y})^T(\mathbf{X}\beta - \mathbf{y}) $$

The analytical gradient is given by:

$$ \nabla C(\beta) = \frac{2}{n}\mathbf{X}^T(\mathbf{X}\beta - \mathbf{y}) $$
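As a sanity check on the two formulas above, the following minimal sketch implements the cost and its gradient with NumPy and compares the analytical gradient against a central finite difference. The function names `mse_cost` and `mse_gradient`, as well as the synthetic data, are assumptions made for illustration only.

```python
import numpy as np

def mse_cost(X, y, beta):
    """C(beta) = (1/n) * (X beta - y)^T (X beta - y)."""
    residual = X @ beta - y
    return residual @ residual / len(y)

def mse_gradient(X, y, beta):
    """Analytical gradient (2/n) * X^T (X beta - y)."""
    return 2.0 / len(y) * X.T @ (X @ beta - y)

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])  # intercept column plus one feature
y = 4.0 + 3.0 * x + 0.1 * rng.standard_normal(n)

beta = np.zeros(2)
eps = 1e-6

# Central finite difference along each coordinate direction
numeric_grad = np.array([
    (mse_cost(X, y, beta + eps * e) - mse_cost(X, y, beta - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
analytic_grad = mse_gradient(X, y, beta)

print("numeric :", numeric_grad)
print("analytic:", analytic_grad)
```

The two printed vectors should agree closely; a mismatch would usually point to a sign or scaling error in the gradient.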

The parameter update rule for plain gradient descent is:

$$ \beta = \beta - \eta \nabla C(\beta) $$

where \( \eta \) is the learning rate.
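The update rule can be sketched as a short loop. This is a minimal illustration, not a tuned implementation: the function name `gradient_descent`, the synthetic data, the learning rate `eta=0.1`, and the iteration count are all assumptions chosen so that the example converges.

```python
import numpy as np

def gradient_descent(X, y, eta, n_iter):
    """Plain gradient descent on the MSE cost with a fixed learning rate."""
    n, p = X.shape
    beta = np.zeros(p)  # start from the zero vector (an arbitrary choice)
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ beta - y)
        beta = beta - eta * grad
    return beta

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 4.0 + 3.0 * x + 0.1 * rng.standard_normal(n)

beta_gd = gradient_descent(X, y, eta=0.1, n_iter=5000)

# Reference least-squares solution for comparison
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("gradient descent:", beta_gd)
print("least squares   :", beta_ols)
```

For this quadratic cost the iteration is stable only when \( \eta \) is smaller than 2 divided by the largest eigenvalue of \( \frac{2}{n}\mathbf{X}^T\mathbf{X} \); larger values make the iterates diverge.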

To accelerate convergence, we can add **momentum** to the gradient descent algorithm. Momentum carries over a fraction of the previous update, so each step combines the current gradient with the accumulated direction of earlier steps. This damps oscillations between successive updates and can lead to faster convergence.

The momentum update equations are:

$$ v_t = \gamma v_{t-1} + \eta \nabla C(\beta) $$
$$ \beta = \beta - v_t $$

where:

- \( v_t \) is the velocity vector at iteration \( t \).
- \( \gamma \) is the momentum coefficient (typically between 0 and 1).
- \( \eta \) is the learning rate.
- \( \nabla C(\beta) \) is the gradient of the cost function.
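The two momentum equations translate almost line by line into code. The sketch below assumes a zero initial velocity and zero initial parameters; the function name `gradient_descent_momentum`, the synthetic data, and the values `eta=0.1` and `gamma=0.9` are illustrative assumptions.

```python
import numpy as np

def gradient_descent_momentum(X, y, eta, gamma, n_iter):
    """Gradient descent with momentum on the MSE cost."""
    n, p = X.shape
    beta = np.zeros(p)
    v = np.zeros(p)  # velocity starts at zero (an assumption)
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ beta - y)
        v = gamma * v + eta * grad  # v_t = gamma * v_{t-1} + eta * grad
        beta = beta - v             # beta = beta - v_t
    return beta

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 4.0 + 3.0 * x + 0.1 * rng.standard_normal(n)

beta_mom = gradient_descent_momentum(X, y, eta=0.1, gamma=0.9, n_iter=5000)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("momentum GD  :", beta_mom)
print("least squares:", beta_ols)
```

Setting `gamma=0` makes the velocity equal to \( \eta \nabla C(\beta) \), which recovers plain gradient descent.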

Our goal is to find the parameters \( \beta \) that minimize the cost function using gradient descent with a fixed learning rate, and to compare the convergence with and without momentum.
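One possible way to set up the comparison is to count the iterations each variant needs before the gradient norm falls below a tolerance. The sketch below is only one such setup: the helper `run_gd`, the stopping criterion, the tolerance, the synthetic data, and the hyperparameter values are all assumptions, and the outcome depends on them.

```python
import numpy as np

def run_gd(X, y, eta, gamma=0.0, tol=1e-8, max_iter=10_000):
    """Run (momentum) gradient descent until the gradient norm drops below tol.

    gamma=0.0 gives plain gradient descent. Returns (beta, iterations used).
    """
    n, p = X.shape
    beta = np.zeros(p)
    v = np.zeros(p)
    for it in range(max_iter):
        grad = 2.0 / n * X.T @ (X @ beta - y)
        if np.linalg.norm(grad) < tol:
            return beta, it
        v = gamma * v + eta * grad
        beta = beta - v
    return beta, max_iter

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 4.0 + 3.0 * x + 0.1 * rng.standard_normal(n)

beta_plain, iters_plain = run_gd(X, y, eta=0.1, gamma=0.0)
beta_momentum, iters_momentum = run_gd(X, y, eta=0.1, gamma=0.9)

print(f"plain GD   : {iters_plain} iterations, beta = {beta_plain}")
print(f"momentum GD: {iters_momentum} iterations, beta = {beta_momentum}")
```

In this setup the intercept column and the feature column are strongly correlated, so the problem is poorly conditioned and momentum would be expected to need fewer iterations. With a different \( \eta \), \( \gamma \), or dataset the ranking can change.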

# Newton's Method Using the Hessian Matrix

While gradient descent uses first-order derivatives to find the minimum of a function, Newton's method leverages second-order derivatives (the Hessian matrix) for potentially faster convergence.

The update rule for Newton's method is:

$$ \beta = \beta - H^{-1}(\beta) \nabla C(\beta) $$

where:

- $\nabla C(\beta)$ is the gradient vector.
- $H(\beta)$ is the Hessian matrix of second derivatives.

For the MSE cost function in OLS regression, the Hessian matrix is given by:

$$ H(\beta) = \frac{2}{n} \mathbf{X}^T \mathbf{X} $$

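For general smooth cost functions, Newton's method can achieve quadratic convergence near the optimum. The OLS cost is exactly quadratic and its Hessian does not depend on $\beta$, so a single Newton step reaches the minimizer from any starting point, provided $\mathbf{X}^T \mathbf{X}$ is invertible. The price is forming the Hessian and solving a linear system with it, which costs roughly $O(np^2 + p^3)$ operations for $n$ samples and $p$ parameters and becomes expensive when the number of parameters is large.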
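As a numerical check of the update rule for the OLS case, the following sketch takes one Newton step from an arbitrary starting point and compares the result with the least-squares solution returned by `np.linalg.lstsq`. It solves the linear system with `np.linalg.solve` rather than forming $H^{-1}$ explicitly, which is a choice made here for numerical stability. The function name `newton_step`, the starting point, and the synthetic data are assumptions.

```python
import numpy as np

def newton_step(X, y, beta):
    """One Newton update for the MSE cost: beta - H^{-1} grad."""
    n = X.shape[0]
    grad = 2.0 / n * X.T @ (X @ beta - y)
    H = 2.0 / n * X.T @ X  # constant Hessian of the MSE cost
    # Solve H d = grad instead of computing the inverse of H
    return beta - np.linalg.solve(H, grad)

# Synthetic data (assumed for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 4.0 + 3.0 * x + 0.1 * rng.standard_normal(n)

beta_start = np.array([-10.0, 25.0])  # arbitrary starting point
beta_newton = newton_step(X, y, beta_start)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("after one Newton step:", beta_newton)
print("least squares        :", beta_ols)
```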