---
title: Simplifying the normal equation with Gram-Schmidt  
date: 2020-07-27   
comments: false  
tags: maths, linear algebra, python  
keywords: python, data science, linear algebra, linear regression, normal equation, least squares, numpy

---

In the [last post]({filename}2020-07-13-linear-algebra-ols-regression.md) I talked about how to find the coefficients that give us the line of best fit for an OLS regression problem using the normal equation. The core of this approach is the equation:

$$
X^TXb = X^Ty
$$

The way we solved this in the previous post was to pull out the system of simultaneous equations and solve these for $b$. However, a more straightforward way is to simply rewrite the equation by "dividing" both sides by $X^TX$, so that we are directly solving for $b$. If you remember back to [this post]({filename}2020-06-15-matrix-inversion.md), we need to multiply both sides of the equation by the inverse of $X^TX$ in order to isolate $b$:

$$
\begin{aligned}
(X^TX)^{-1}(X^TX)b &= (X^TX)^{-1}X^Ty \\
Ib &= (X^TX)^{-1}X^Ty \\
b &= (X^TX)^{-1}X^Ty
\end{aligned}
$$
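Here is a minimal NumPy sketch of this formula. The values in `X` and `y` are a small, hypothetical dataset chosen only for illustration, and the first column of ones in `X` is the intercept term. For comparison, it also solves $X^TXb = X^Ty$ directly as a linear system, which is closer in spirit to the simultaneous-equations approach from the last post.

```python
import numpy as np

# Hypothetical data: a column of ones (intercept) plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([6.0, 5.0, 7.0, 10.0])

# b = (X^T X)^{-1} X^T y, computed literally with an explicit inverse
b_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Solving the linear system X^T X b = X^T y gives the same coefficients
b_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(b_inv)                       # roughly [3.5, 1.4]
print(np.allclose(b_inv, b_solve)) # True
```

For this toy data the fitted line is $y = 3.5 + 1.4x$.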

The issue is that this equation now involves taking the inverse of a matrix, which is computationally expensive. Since $X^TX$ has one row and one column per feature, the cost of inverting it grows quickly as the number of features increases. In fact, this step is so costly that Andrew Ng recommends not using the normal equation to calculate OLS regression if you have more than 10,000 features! Luckily, there is a way to get rid of this inversion step entirely by using a special type of matrix called an orthonormal matrix.

## The advantage of orthonormal vectors

An orthonormal matrix is one whose columns are all orthogonal unit vectors. We have seen orthogonal vectors before: this simply means that the dot product of every pair of distinct column vectors in the matrix is $0$. Unit vectors are similarly straightforward: they are just vectors with a length of $1$.

Why are orthonormal matrices so helpful for finding $b$? Because multiplying an orthonormal matrix's transpose by the matrix itself gives the identity matrix, or $Q^TQ = I$, where $Q$ is the orthonormal matrix. Does the left-hand side of this equation look familiar? It has exactly the same form as the $X^TX$ part of $X^TXb = X^Ty$ that we tried to get rid of earlier by multiplying by the inverse. If we were able to use the orthonormal form of $X$, the equation $Q^TQb = Q^Ty$ would reduce to $b = Q^Ty$, so we could isolate $b$ without having to calculate the inverse at all!
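To see this shortcut in action, here is a sketch using a hand-picked matrix `Q` whose two columns are orthonormal (both `Q` and `y` are hypothetical values, not derived from any real dataset). Solving the normal equation the long way and simply computing $Q^Ty$ give the same coefficients.

```python
import numpy as np

# Hypothetical matrix with orthonormal columns: each column has length 1,
# and the two columns are orthogonal to each other
Q = np.array([[0.5,  0.5],
              [0.5, -0.5],
              [0.5,  0.5],
              [0.5, -0.5]])
y = np.array([6.0, 5.0, 7.0, 10.0])

# The long way: solve Q^T Q b = Q^T y as a linear system
b_normal = np.linalg.solve(Q.T @ Q, Q.T @ y)

# The shortcut: since Q^T Q = I, the equation reduces to b = Q^T y
b_shortcut = Q.T @ y

print(b_shortcut)                          # [14, -1]
print(np.allclose(b_normal, b_shortcut))   # True
```

No inverse and no linear solve is needed for the shortcut: it is a single matrix-vector product.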

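Before we get into how to convert our matrix $X$ into $Q$, let's have a look at why $Q^TQ = I$.

* When we take the dot product of a unit vector with itself, we get the sum of its squared components, which is its length squared, so the result is $1$
* When we take the dot product of two different columns of the matrix, the result is $0$ because they are orthogonal
* Entry $(i, j)$ of $Q^TQ$ is the dot product of column $i$ with column $j$, so a column is only multiplied by itself when $i = j$, which is only on the diagonal. The diagonal is therefore all $1$s and every other entry is $0$, which is exactly the identity matrix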
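The reasoning in these bullet points can be checked directly by building $Q^TQ$ one entry at a time from column dot products. The sketch below uses a hypothetical $4 \times 3$ matrix with orthonormal columns and compares the result with the identity matrix.

```python
import numpy as np

# Hypothetical 4x3 matrix with orthonormal columns
Q = np.array([[0.5,  0.5,  0.5],
              [0.5, -0.5,  0.5],
              [0.5,  0.5, -0.5],
              [0.5, -0.5, -0.5]])

n_cols = Q.shape[1]
QtQ = np.zeros((n_cols, n_cols))

# Entry (i, j) of Q^T Q is the dot product of column i with column j:
# 1 when i == j (unit length), 0 otherwise (orthogonal columns)
for i in range(n_cols):
    for j in range(n_cols):
        QtQ[i, j] = Q[:, i] @ Q[:, j]

print(QtQ)
print(np.allclose(QtQ, np.eye(n_cols)))  # True
```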

## Turning a matrix into an orthonormal matrix

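To keep the arithmetic tidy, it helps to pick column vectors whose lengths are whole numbers, i.e. whose squared components add up to a perfect square. (This can only be guaranteed for the first vector; the later vectors are modified before they are normalized, so their lengths will generally not be whole numbers.) For example, the two columns of this matrix have lengths $5$ and $6$:

$$
X = \begin{bmatrix} 2 & 1 \\ 1 & 3 \\ 4 & 1 \\ 2 & 5 \end{bmatrix}
$$

Checking the squared lengths of a few candidate vectors:

```python
3**2 + 2**2 + 1**2 + 1**2 + 1**2  # 16, so a length of 4
4**2 + 2**2 + 2**2 + 1**2         # 25, so a length of 5
5**2 + 3**2 + 1**2 + 1**2         # 36, so a length of 6
```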
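As a rough preview of where this is heading, here is a minimal sketch of classical Gram-Schmidt in NumPy, applied to the $X$ above. The function name `gram_schmidt` is a hypothetical helper written for this example, not a NumPy function. Each column has its projections onto the earlier orthonormal columns subtracted and is then divided by its length.

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt (hypothetical helper): return Q whose columns
    are orthonormal and span the same space as the columns of X.
    Assumes the columns of X are linearly independent."""
    X = np.asarray(X, dtype=float)
    Q = np.zeros_like(X)
    for j in range(X.shape[1]):
        v = X[:, j].copy()
        # Subtract the component of column j along each earlier q_k
        for k in range(j):
            v -= (Q[:, k] @ X[:, j]) * Q[:, k]
        # Scale what is left to unit length
        Q[:, j] = v / np.linalg.norm(v)
    return Q

X = np.array([[2, 1],
              [1, 3],
              [4, 1],
              [2, 5]])

Q = gram_schmidt(X)
print(Q[:, 0])                          # first column is X[:, 0] / 5
print(np.allclose(Q.T @ Q, np.eye(2)))  # True
```

This also shows why only the first vector can have a tidy length: the first column is just divided by its length of $5$, but the second column first has its projection onto $q_1$ removed. Since $q_1 \cdot x_2 = 19/5 = 3.8$, the remaining vector has length $\sqrt{36 - 3.8^2} = \sqrt{21.56} \approx 4.64$.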