# Moore-Penrose Pseudo-Inverse

The pseudo-inverse, denoted by $A^\dagger$, is a fundamental concept in linear algebra and its applications. It shares many properties with the traditional matrix inverse without necessarily being the inverse of the matrix in question. Given a matrix $A$, the pseudo-inverse $A^\dagger$ can be computed using the singular value decomposition (SVD).

$A_{m \times n} = U \Sigma V^T$
 
In many applications, such as signal processing and machine learning, the pseudo-inverse plays a crucial role in solving ill-conditioned systems and finding optimal solutions. For example, the pseudo-inverse provides the minimum-norm least squares solution to the system $A\vec{x}=\vec{b}$, even when $A$ is not invertible, which is why it gets the name "pseudo-inverse". The fact that an inverse-like matrix can be constructed and used in much the same way as the traditional inverse is what ultimately gives the pseudo-inverse its value. Additionally, if the matrix is in fact invertible, then its pseudo-inverse is its inverse (i.e. $A^\dagger = A^{-1}$). The best place to begin understanding the pseudo-inverse is to look at how one can find the pseudo-inverse of a matrix.
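
Both claims can be checked numerically before any derivation. The following is a minimal sketch that uses NumPy's built-in `np.linalg.pinv`; the two small matrices are made up for illustration and do not appear elsewhere in this document.

```python
import numpy as np

# Hypothetical invertible 2x2 matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_pinv = np.linalg.pinv(A)
A_inv = np.linalg.inv(A)

# For an invertible matrix the pseudo-inverse and the inverse should agree
print("pinv equals inv:", np.allclose(A_pinv, A_inv))

# Hypothetical singular matrix (second row is twice the first):
# np.linalg.inv would fail here, but the pseudo-inverse still exists
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
S_pinv = np.linalg.pinv(S)
print("pseudo-inverse of a singular matrix:\n", S_pinv)
```

The second matrix has no traditional inverse, yet `np.linalg.pinv` still returns a matrix that can be applied to a right-hand side in the same way.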

## Finding the Pseudo-Inverse

Suppose we have some linear system (because of course, we always tie things back to solving a linear system):

$A\vec{x}=\vec{b}$

What is the easiest way, analytically speaking, to solve this linear system in one operation?

$A^{-1} A\vec{x}=A^{-1}\vec{b}$

$I\vec{x}=A^{-1}\vec{b}$

$\vec{x}=A^{-1}\vec{b}$

But $A$ is not always invertible, which is why we are covering the pseudo-inverse. So let us do the same thing, but with a decomposition of $A$.

Every matrix has a singular value decomposition (SVD), so we can rewrite our linear system as follows:

$U \Sigma V^T \vec{x}=\vec{b}$

Now invert each factor of the decomposition in turn (remember that $U$ and $V$ are orthogonal, and for now assume that $\Sigma$ is square and diagonal with nonzero entries, so that it can be inverted).

First move over $U$.

$U^T U \Sigma V^T \vec{x}= U^T\vec{b}$

$\Sigma V^T \vec{x}= U^T\vec{b}$

Then move over $\Sigma$.

$\Sigma^{-1} \Sigma V^T \vec{x}= \Sigma^{-1} U^T\vec{b}$

$V^T \vec{x}= \Sigma^{-1} U^T\vec{b}$

Last, move over $V^T$.

$V V^T \vec{x}= V \Sigma^{-1} U^T\vec{b}$

$\vec{x}= V \Sigma^{-1} U^T\vec{b}$

This matrix product on the right hand side (not including $\vec{b}$) is the pseudo-inverse $A^{\dagger}$.

$A^{\dagger} = V \Sigma^{-1} U^T$
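
When $\Sigma$ is not square, $\Sigma^{-1}$ in this formula is read as $\Sigma^{\dagger}$: transpose the shape and invert only the nonzero singular values. As a small worked example (the numbers are made up for illustration), take a $3 \times 2$ matrix of singular values:

$\Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}, \quad \Sigma^{\dagger} = \begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{2} & 0 \end{bmatrix}, \quad \Sigma^{\dagger} \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$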

With the SVD in hand this is easy to compute, since we only transpose matrices ($U$ and $V^T$) and invert the singular values along the diagonal of $\Sigma$. The same recipe works for any $A_{m \times n}$, provided that only the nonzero singular values are inverted and any zero singular values are left as zero. We can test it out with the following code.

In [34]:
import numpy as np

def pseudo_inverse(A):
    """
    Compute the Moore-Penrose pseudo-inverse of a matrix A using SVD.
    """
    A = np.array(A, dtype=float)

    # Compute the reduced SVD of A, so that A = u @ diag(sig) @ vt
    u, sig, vt = np.linalg.svd(A, full_matrices=False)
    print(f"Singular Values: {sig}\n")

    # Invert only the singular values that are numerically nonzero, so that
    # rank-deficient matrices do not cause a division by zero.
    # The tolerance below is one common choice, not the only possible one.
    tol = max(A.shape) * np.finfo(float).eps * sig.max()
    sigma_inv = np.zeros_like(sig)
    sigma_inv[sig > tol] = 1 / sig[sig > tol]
    print(f"Inverse of singular values: {sigma_inv}\n")

    # Create a diagonal matrix with the inverse of singular values
    inv_Sigma = np.diag(sigma_inv)

    # Compute the pseudo-inverse (Moore-Penrose inverse) using the formula:
    # pinv_A = V * inv_Sigma * U^T
    pinv_A = vt.T @ inv_Sigma @ u.T
    return pinv_A


A = np.random.randint(-1000, 1001, size=(5, 5))
A_dag = pseudo_inverse(A)
print(f"A:\n {A}\n")
print(f"A Dagger:\n {A_dag}\n")

# Compute A_dag @ A, which should be close to the identity
AinvA = np.round(np.dot(A_dag, A))
print(f"A Dagger A: \n{AinvA}")

Singular Values: [2500.97078779 1660.16788082 1438.97551827  720.80997463  133.52589418]

Inverse of singular values: [0.00039984 0.00060235 0.00069494 0.00138733 0.00748918]

A:
 [[ 676  457  906  992  480]
 [-726 -735  124  891  577]
 [-234 -704  463  318 -387]
 [-931  882  788 -946 -835]
 [ 855  150  185  682  763]]

A Dagger:
 [[-2.53698093e-04 -6.73228369e-04  5.58746087e-04 -1.22680332e-04
   8.17856504e-04]
 [ 1.49631249e-03 -4.99018838e-05 -1.36821512e-03 -4.24490406e-04
  -2.06210400e-03]
 [-1.09372304e-03  7.45789136e-05  1.24152531e-03  1.18479160e-03
   2.55796371e-03]
 [ 2.40879601e-03  9.27173480e-05 -9.97131809e-04 -1.35827769e-03
  -3.57768267e-03]
 [-1.89776556e-03  6.63257153e-04  2.33114121e-04  1.14773811e-03
   3.37720129e-03]]

A Dagger A: 
[[ 1. -0. -0. -0. -0.]
 [-0.  1. -0. -0.  0.]
 [ 0. -0.  1.  0. -0.]
 [-0.  0. -0.  1.  0.]
 [ 0.  0.  0.  0.  1.]]
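
As a cross-check, an SVD-based pseudo-inverse can be compared against NumPy's built-in `np.linalg.pinv`. The sketch below is self-contained: `pinv_svd` is a hypothetical helper name that mirrors the approach of `pseudo_inverse` above, and the seed and cutoff are arbitrary choices made for this illustration.

```python
import numpy as np

def pinv_svd(M):
    """SVD-based pseudo-inverse (illustrative helper, not a library function)."""
    M = np.asarray(M, dtype=float)
    u, s, vt = np.linalg.svd(M, full_matrices=False)
    # Assumed cutoff: treat singular values below this as zero
    cutoff = max(M.shape) * np.finfo(float).eps * s.max()
    s_inv = np.zeros_like(s)
    s_inv[s > cutoff] = 1.0 / s[s > cutoff]
    # (vt.T * s_inv) scales the columns of V, i.e. V @ diag(s_inv)
    return (vt.T * s_inv) @ u.T

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
results = []
for shape in [(5, 5), (6, 4), (3, 5)]:
    M = rng.integers(-1000, 1001, size=shape)
    agrees = np.allclose(pinv_svd(M), np.linalg.pinv(M))
    results.append(agrees)
    print(shape, "matches np.linalg.pinv:", agrees)
```

In practice `np.linalg.pinv` is the more convenient choice; writing the SVD version by hand is mainly useful for seeing where each factor of $V \Sigma^{-1} U^T$ comes from.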


In [35]:
# The overdetermined system
B = np.random.randint(-1000, 1001, size=(6, 4))
B_dag = pseudo_inverse(B)
print(f"B:\n {B}\n")
print(f"B Dagger:\n {B_dag}\n")

# Compute B_dag @ B, which should be close to the identity
BinvB = np.round(np.dot(B_dag, B))
print(f"B Dagger B: \n{BinvB}")

Singular Values: [2017.30118406 1282.77367783  817.07317577  423.63669539]

Inverse of singular values: [0.00049571 0.00077956 0.00122388 0.00236051]

B:
 [[-125 -181  221  558]
 [-290 -449 -395  221]
 [ 525 -824  823  244]
 [ 929 -412  130 -484]
 [-275  469  220  862]
 [ 688 -353  646 -765]]

B Dagger:
 [[-3.73722748e-05 -2.59700017e-04 -1.10031054e-04  1.60339205e-03
   5.61089952e-04 -5.19578069e-04]
 [-1.99929612e-04 -7.90491137e-04 -5.02744409e-04  1.84697405e-04
   5.72304770e-04 -6.53036959e-06]
 [ 1.14630810e-04 -5.00925685e-04  4.40166475e-04 -9.49305864e-04
   1.04163580e-04  7.97272000e-04]
 [ 3.13723704e-04  1.07210219e-04  2.04272197e-04  6.35058641e-04
   6.50290778e-04 -6.51273242e-04]]

B Dagger B: 
[[ 1.  0. -0.  0.]
 [ 0.  1. -0.  0.]
 [ 0.  0.  1. -0.]
 [ 0.  0.  0.  1.]]


In [36]:
# The underdetermined system
C = np.random.randint(-1000, 1001, size=(3, 5))
C_dag = pseudo_inverse(C)
print(f"C:\n {C}\n")
print(f"C Dagger:\n {C_dag}\n")

# Compute C_dag @ C
CinvC = np.round(np.dot(C_dag, C))
# C_dag @ C is 5x5 but only has rank 3, so it cannot be the identity;
# np.round hides its fractional entries in the printout below
print(f"C Dagger C: \n{CinvC}")

Singular Values: [1755.85994291 1000.57051684  819.00396928]

Inverse of singular values: [0.00056952 0.00099943 0.001221  ]

C:
 [[ -35 -758  449  483  161]
 [-536  -71 -822  340   -9]
 [ 868  697 -909  456 -601]]

C Dagger:
 [[ 3.10059840e-04 -6.46742779e-04  5.25156867e-04]
 [-6.99533979e-04 -1.76090336e-04  6.68889464e-05]
 [ 1.99195211e-04 -6.68080784e-04 -1.81621691e-04]
 [ 8.23242741e-04  2.71723926e-04  3.96077661e-04]
 [-4.01214617e-05  7.83425077e-05 -2.52640121e-04]]

C Dagger C: 
[[ 1.  0.  0.  0. -0.]
 [ 0.  1. -0. -0. -0.]
 [ 0. -0.  1. -0.  0.]
 [ 0. -0. -0.  1. -0.]
 [-0. -0.  0. -0.  0.]]


In the square and overdetermined examples above, computing $A^\dagger A$ (or the same product for $B$) returns the identity matrix, so $A^\dagger$ behaves like $A^{-1}$ even though a non-square matrix has no inverse in the traditional sense. The underdetermined case is different: $C^\dagger C$ is a $5 \times 5$ matrix of rank 3, so it cannot be the identity. It is instead the orthogonal projection onto the row space of $C$, and the tidy ones and zeros in the printout are a side effect of `np.round` hiding fractional entries (a rank-3 projection has trace 3, while the rounded diagonal sums to 4). For a matrix with full row rank such as $C$, the product in the other order recovers the identity, $C C^\dagger = I$. Because $A^\dagger$ acts as an inverse only from one side in these cases, we call it a "pseudo" inverse rather than a traditional inverse.
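
The two products for the underdetermined matrix can also be inspected without `np.round`. This is a minimal sketch that copies $C$ from the example above and, as an assumption made for brevity, uses NumPy's built-in `np.linalg.pinv` in place of `pseudo_inverse`.

```python
import numpy as np

# C is copied from the underdetermined example above
C = np.array([[ -35, -758,  449, 483,  161],
              [-536,  -71, -822, 340,   -9],
              [ 868,  697, -909, 456, -601]], dtype=float)
C_dag = np.linalg.pinv(C)

P = C_dag @ C  # the 5x5 product from the example, without rounding

print("C C_dag is the identity:", np.allclose(C @ C_dag, np.eye(3)))
print("C_dag C is the identity:", np.allclose(P, np.eye(5)))
print("C_dag C is a symmetric projection:",
      np.allclose(P @ P, P) and np.allclose(P, P.T))
print("trace of C_dag C:", np.trace(P))

# Two of the defining Penrose conditions, which hold for any matrix
print("C C_dag C = C:", np.allclose(C @ C_dag @ C, C))
print("C_dag C C_dag = C_dag:", np.allclose(C_dag @ C @ C_dag, C_dag))
```

The trace of a projection equals its rank, which gives a quick sanity check on the unrounded product.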

## Testing in linear systems

Naturally this provides us with another way of finding a solution or least squares solution for a linear system. We can test that out with the following code, again for square, overdetermined, and underdetermined matrices.

In [37]:
# Square linear system
b1 = np.random.randint(-1000, 1001, size=(len(A), 1))

print(f"b vector: \n {b1}\n")

# "Solve" (approximate solution) linear system
x1 = np.dot(A_dag, b1)
# Confirm approx solution
check = b1 - np.dot(A, x1)
print(f"b - Ax: \n{check}")

b vector: 
 [[-937]
 [ 368]
 [-646]
 [-646]
 [-779]]

b - Ax: 
[[-1.13686838e-13]
 [ 3.97903932e-13]
 [ 2.27373675e-13]
 [ 7.95807864e-13]
 [-3.41060513e-13]]


In [38]:
# Underdetermined system
b3 = np.random.randint(-1000, 1001, size=(len(C), 1))

print(f"b vector: \n {b3}\n")

# "Solve" (approximate solution) linear system
x3 = np.dot(C_dag, b3)
# Confirm approx solution
check = b3 - np.dot(C, x3)
print(f"b - Cx: \n{check}")

b vector: 
 [[  84]
 [-141]
 [-799]]

b - Cx: 
[[ 2.98427949e-13]
 [ 5.68434189e-14]
 [-5.68434189e-13]]
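
An underdetermined system has infinitely many solutions, and the pseudo-inverse picks the one with the smallest norm. The sketch below illustrates this with $C$ and the right-hand side copied from the example above; using `np.linalg.pinv`, taking a null-space direction from the full SVD, and the scale factor of 100 are choices made for this illustration.

```python
import numpy as np

# C and b are copied from the underdetermined example above
C = np.array([[ -35, -758,  449, 483,  161],
              [-536,  -71, -822, 340,   -9],
              [ 868,  697, -909, 456, -601]], dtype=float)
b = np.array([84.0, -141.0, -799.0])

# Solution given by the pseudo-inverse
x_min = np.linalg.pinv(C) @ b

# With the full SVD, the rows of vt beyond the rank of C (here the last two)
# span the null space of C, so adding one of them leaves C @ x unchanged
_, _, vt = np.linalg.svd(C)
null_vec = vt[-1]
x_other = x_min + 100.0 * null_vec

print("both solve the system:",
      np.allclose(C @ x_min, b), np.allclose(C @ x_other, b))
print("norm of pseudo-inverse solution:", np.linalg.norm(x_min))
print("norm of the other solution:     ", np.linalg.norm(x_other))
```

Because the pseudo-inverse solution lies in the row space of $C$, it is orthogonal to every null-space direction, so adding one can only increase the norm.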


In [39]:
# Overdetermined system (remember this usually won't have a perfect solution)
b2 = np.random.randint(-1000, 1001, size=(len(B), 1))

print(f"b vector: \n {b2}\n")

# "Solve" (approximate solution) linear system
x2 = np.dot(B_dag, b2)

# Use least squares to get a least squares solution
x2_lsq = np.linalg.lstsq(B, b2, rcond=-1)

# Confirm least-squares solution
check = x2 - x2_lsq[0]
print(f"x(pseudo-inverse) - x(least-squares): \n{check}")

b vector: 
 [[ 656]
 [ 158]
 [-984]
 [-947]
 [ 865]
 [-346]]

x(pseudo-inverse) - x(least-squares): 
[[ 3.33066907e-16]
 [ 0.00000000e+00]
 [-3.33066907e-16]
 [ 3.60822483e-16]]


As the overdetermined example shows, when no exact solution exists the pseudo-inverse returns the least squares solution, the $\vec{x}$ that minimizes $\|B\vec{x} - \vec{b}\|_2$, rather than an exact solution to the linear system.
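
One way to see why this is the least squares solution is to look at the residual $\vec{b} - B\vec{x}$, which should be orthogonal to the columns of $B$ (the normal equations). The following is a minimal sketch, with $B$ and $\vec{b}$ copied from the example above and `np.linalg.pinv` used as a stand-in for `pseudo_inverse`; the tolerance and the perturbation size are arbitrary choices.

```python
import numpy as np

# B and b are copied from the overdetermined example above
B = np.array([[-125, -181,  221,  558],
              [-290, -449, -395,  221],
              [ 525, -824,  823,  244],
              [ 929, -412,  130, -484],
              [-275,  469,  220,  862],
              [ 688, -353,  646, -765]], dtype=float)
b = np.array([656.0, 158.0, -984.0, -947.0, 865.0, -346.0])

x = np.linalg.pinv(B) @ b
residual = b - B @ x

# Normal equations: B^T (b - Bx) = 0 at the least squares solution
print("residual orthogonal to columns of B:",
      np.allclose(B.T @ residual, 0.0, atol=1e-6))

# Solving the normal equations directly should give the same x,
# since B has full column rank
x_normal = np.linalg.solve(B.T @ B, B.T @ b)
print("matches normal-equation solution:", np.allclose(x, x_normal))

# Any perturbation of x should not reduce the residual norm
x_perturbed = x + 0.01
print("residual norms:", np.linalg.norm(residual),
      np.linalg.norm(b - B @ x_perturbed))
```

Forming $B^T B$ squares the condition number, so the pseudo-inverse or `np.linalg.lstsq` is generally the safer route for poorly conditioned problems.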


### References

https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse

https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/0550c89b69c99e97dcbf52074e293308_MIT18_06SCF11_Ses3.8sum.pdf