# Of Matrix Inverses and Pseudo-Inverses

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

## Matrix Inverses

Just as every nonzero number has a multiplicative inverse, so some matrices have matrix inverses. The multiplicative inverse of a nonzero number $x$, written $x^{-1}$, satisfies the following:

$xx^{-1} = 1$.

In [None]:
x = np.random.rand(1)

In [None]:
x * x**(-1)

Similarly, an inverse for a matrix $M$, $M^{-1}$, satisfies:

- $MM^{-1} = I$
- $M^{-1}M = I$,

where $I$ is the identity matrix with 1's down the main diagonal and 0's everywhere else:

$I = \begin{bmatrix}1 & 0 & ... & 0 & 0 \\
0 & 1 & ... & 0 & 0 \\
... & ... & ... & ... & ... \\
0 & 0 & ... & 1 & 0 \\
0 & 0 & ... & 0 & 1 \end{bmatrix}$.

In [None]:
M = np.random.rand(10, 10)
np.round(M.dot(np.linalg.inv(M)))
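
Rounding hides small floating-point errors, and the cell above only checks the product in one order. As a minimal sketch (the names `M_inv`, `left_ok`, and `right_ok` are ours, and the seed is arbitrary), both defining properties can be checked with `np.allclose`, which compares up to a small tolerance:

```python
import numpy as np

# Arbitrary seed so the sketch is reproducible.
rng = np.random.default_rng(0)
M = rng.random((10, 10))
M_inv = np.linalg.inv(M)
I = np.eye(10)

# Check both defining properties of the inverse, up to floating-point tolerance.
left_ok = np.allclose(M @ M_inv, I)   # M M^{-1} = I
right_ok = np.allclose(M_inv @ M, I)  # M^{-1} M = I
print(left_ok, right_ok)
```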

Inverses let us solve equations. If we have a scalar equation like $ax = b$ (with $a \neq 0$), we can represent the solution to this equation as:

$x = a^{-1}b$.

In [None]:
np.random.seed(42)
a = np.random.rand(1)
x = np.random.rand(1)
b = a*x
x

In [None]:
a**(-1) * b

And if we have a matrix equation like $A\vec{x} = \vec{b}$, we can represent the solution to this equation as:

$\vec{x} = A^{-1}\vec{b}$.

In [None]:
np.random.seed(42)
A = np.random.rand(10, 10)
x = np.random.rand(10)
b = A.dot(x)
x

In [None]:
np.linalg.inv(A).dot(b)
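
In practice it is usually preferable not to form $A^{-1}$ explicitly. The sketch below repeats the setup above and compares the explicit inverse with `np.linalg.solve`, which solves $A\vec{x} = \vec{b}$ directly; the variable names `x_inv` and `x_solve` are ours.

```python
import numpy as np

np.random.seed(42)
A = np.random.rand(10, 10)
x = np.random.rand(10)
b = A.dot(x)

# Solution via the explicit inverse, as in the cell above.
x_inv = np.linalg.inv(A).dot(b)

# Solution via a linear solver, which never forms the inverse.
x_solve = np.linalg.solve(A, b)

# Both should recover the original x up to floating-point error.
print(np.allclose(x_inv, x), np.allclose(x_solve, x))
```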

## Pseudo-Inverses

A matrix can have an inverse *only if it is square* (and even then only if its columns are linearly independent), and therefore an equation like $\vec{x} = A^{-1}\vec{b}$ corresponds to a situation where we have exactly as many rows as we have columns in our dataset. The vector $\vec{x}$, moreover, represents an exact solution to the system of equations represented by $A$ and $\vec{b}$.
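
Being square is necessary for an inverse but not sufficient. As a small illustration, the hypothetical matrix `S` below is square, but its second row is twice its first, so it has rank 1 and no inverse. We expect `np.linalg.inv` to raise `LinAlgError` for it, and the sketch catches that error rather than assuming it.

```python
import numpy as np

# A square matrix whose second row is twice its first: rank 1, not invertible.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

rank = np.linalg.matrix_rank(S)

try:
    np.linalg.inv(S)
    inverted = True
except np.linalg.LinAlgError:
    # NumPy reports singular matrices with LinAlgError.
    inverted = False

print(rank, inverted)
```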

But of course in the typical situation in data science we have more rows than columns (more observations than features), and so we are in the realm not of exact solutions but of approximations: $A\vec{x}\approx\vec{b}$, for some *non-square* $A$. And the least-squares regression hyperplane, for example, provides exactly that: It's the hyperplane with coefficients $\vec{x}$ that minimizes the sum of squared differences between $\vec{b}$ (our target) and $A\vec{x}$ (our target estimates).

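How can we express our optimizing betas $\vec{x}$ in terms of $A$ and $\vec{b}$? With non-square $A$ the equation below generally has no exact solution, but let's start from it anyway:

$A\vec{x} = \vec{b}$

Multiplying both sides on the left by $A^T$ gives:

$A^TA\vec{x} = A^T\vec{b}$.

These are the *normal equations*. Unlike the original system, they always have a solution, and any $\vec{x}$ that solves them minimizes the sum of squared differences between $A\vec{x}$ and $\vec{b}$.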

Now *this* matrix, $A^TA$, *is* square. It has an inverse exactly when the columns of $A$ are linearly independent, which we'll assume here.

Thus we write:

$(A^TA)^{-1}(A^TA)\vec{x} = (A^TA)^{-1}A^T\vec{b}$, i.e.

$\vec{x} = (A^TA)^{-1}A^T\vec{b}$.
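
Why should the solution of $A^TA\vec{x} = A^T\vec{b}$ be the least-squares solution? A short calculus sketch, in the notation above, writes the sum of squared differences as a function $f$ (a name introduced here for illustration) and sets its gradient to zero:

$f(\vec{x}) = \|A\vec{x} - \vec{b}\|^2 = \vec{x}^TA^TA\vec{x} - 2\vec{b}^TA\vec{x} + \vec{b}^T\vec{b}$

$\nabla f(\vec{x}) = 2A^TA\vec{x} - 2A^T\vec{b}$

$\nabla f(\vec{x}) = \vec{0} \iff A^TA\vec{x} = A^T\vec{b}$.

Since $f$ is convex, a point where the gradient vanishes is a minimizer.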

By analogy with the situation with square $A$, this matrix $(A^TA)^{-1}A^T$ is called the ***pseudo-inverse*** of $A$ (more precisely, the Moore-Penrose pseudo-inverse of a matrix with linearly independent columns). Just as $A^{-1}\vec{b}$ provides an exact solution for $\vec{x}$ in the equation $A\vec{x} = \vec{b}$ (square $A$), so $(A^TA)^{-1}A^T\vec{b}$ provides the solution to the least-squares optimization problem $A\vec{x}\approx\vec{b}$ (non-square $A$).
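
The analogy with a true inverse only goes so far. The sketch below uses an arbitrary seed, a random target `b` that is not an exact linear function of the columns, and names of our own choosing. It checks three things: $(A^TA)^{-1}A^T$ undoes $A$ when applied on the left, it does not do so on the right, and the least-squares residual is orthogonal to the columns of $A$.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((100, 10))
b = rng.random(100)  # arbitrary target, generally not exactly A @ x for any x

# Pseudo-inverse built from the formula; shape (10, 100).
A_pinv = np.linalg.inv(A.T @ A) @ A.T
x_hat = A_pinv @ b

# Left inverse: A_pinv @ A is the 10 x 10 identity.
left_ok = np.allclose(A_pinv @ A, np.eye(10))

# Not a right inverse: A @ A_pinv is 100 x 100 but has rank at most 10.
right_ok = np.allclose(A @ A_pinv, np.eye(100))

# The residual is orthogonal to every column of A (the normal equations).
residual = b - A @ x_hat
orthogonal = np.allclose(A.T @ residual, 0)

print(left_ok, right_ok, orthogonal)
```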

In [None]:
np.random.seed(42)
A = np.random.rand(100, 10)
x = np.random.rand(10)
b = A.dot(x)
x

In [None]:
np.linalg.inv(A.T.dot(A)).dot(A.T).dot(b)

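NumPy has a shortcut for this, `np.linalg.pinv`, which computes the pseudo-inverse from the singular value decomposition rather than by inverting $A^TA$: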

In [None]:
np.linalg.pinv(A).dot(b)
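
NumPy also provides `np.linalg.lstsq`, which solves the least-squares problem directly without returning a pseudo-inverse. As a sketch on the same data (the names `x_normal`, `x_pinv`, and `x_lstsq` are ours), all three routes should agree:

```python
import numpy as np

np.random.seed(42)
A = np.random.rand(100, 10)
x = np.random.rand(10)
b = A.dot(x)

# Route 1: the explicit formula (A^T A)^{-1} A^T b.
x_normal = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(b)

# Route 2: NumPy's SVD-based pseudo-inverse.
x_pinv = np.linalg.pinv(A).dot(b)

# Route 3: a dedicated least-squares solver.
x_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)

# b was built as exactly A @ x, so each route should recover x.
print(np.allclose(x_normal, x), np.allclose(x_pinv, x), np.allclose(x_lstsq, x))
```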

Let's compare this matrix calculation with the betas found by `sklearn.linear_model.LinearRegression()`. We pass `fit_intercept=False` because our matrix calculation has no intercept term:

In [None]:
LinearRegression(fit_intercept=False).fit(A, b).coef_
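
By default, `LinearRegression` fits an intercept as well. One common way to get the same effect from the pseudo-inverse is to append a column of ones to $A$, so that the last beta plays the role of the intercept. The sketch below uses made-up coefficients and an arbitrary seed, with NumPy only:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 3))

# Hypothetical "true" parameters, chosen only for illustration.
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
b = A @ true_coef + true_intercept

# Append a column of ones so the last beta acts as the intercept.
A1 = np.column_stack([A, np.ones(len(A))])
beta = np.linalg.pinv(A1) @ b

coef, intercept = beta[:-1], beta[-1]
print(coef, intercept)
```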