In [43]:
import math
import numpy as np

In [81]:
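def inv_safe(mat1):
    """Invert a diagonal matrix, mapping near-zero diagonal entries to 0.

    Only the diagonal of mat1 is read; off-diagonal entries are ignored.
    """
    matinv = np.zeros_like(mat1)
    for ix in range(mat1.shape[0]):
        elem = mat1[ix, ix]
        # Absolute cutoff: treat |elem| <= 1e-5 as zero instead of dividing by it
        if abs(elem) > 1e-5:
            matinv[ix, ix] = 1. / elem
        else:
            matinv[ix, ix] = 0
    return matinv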

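def validate_inv(a, ai):
    """Return True if a @ ai is within 1e-5 (summed absolute error) of the identity."""
    eye_a = a @ ai
    # Sum of absolute entrywise deviations from the identity matrix
    diff = np.abs(eye_a - np.eye(a.shape[0])).sum()
    print(diff)
    return diff < 1e-5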

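def total_unimodularize(mat1):
    """Quantize entries of mat1 to {-1, 0, +1}, modifying mat1 in place.

    Entries in {-1, 0, +1} are necessary for total unimodularity but do not guarantee it.
    """
    mat1[mat1 < 0.33] = -1
    mat1[np.logical_and(mat1 >= 0.33, mat1 < 0.66)] = 0
    mat1[mat1 >= 0.66] = +1
    return mat1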



In [82]:
zz = total_unimodularize(np.random.rand(3,5))
print('zz\n',zz)

zzz = zz @ zz.T
print('zzz\n',zzz)

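# np.linalg.svd returns the third factor already transposed,
# so here zzz == u @ np.diag(s) @ v (v plays the role of V^T)
u, s, v = np.linalg.svd(zzz)

# Pseudo-inverse from the SVD; inv_safe zeroes singular values at or below 1e-5
zzzi = v.T @ inv_safe(np.diag(s)) @ u.T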
print('zzzi\n', zzzi)

validate_inv(zzz, zzzi)

zz
 [[ 0. -1.  1. -1.  1.]
 [-1.  0.  0.  1. -1.]
 [-1. -1.  0. -1.  0.]]
zzz
 [[ 4. -2.  2.]
 [-2.  3.  0.]
 [ 2.  0.  3.]]
zzzi
 [[ 0.75        0.5        -0.5       ]
 [ 0.5         0.66666667 -0.33333333]
 [-0.5        -0.33333333  0.66666667]]
5.3290705182e-15


True

In [50]:
u[0,0]

-0.66810109918390403

# Gaussian Elimination

## Computational efficiency
The number of arithmetic operations required to perform row reduction is one way of measuring the algorithm's computational efficiency. For example, solving a system of $n$ equations in $n$ unknowns by row-reducing the matrix to echelon form and then solving for each unknown in reverse order requires $n(n+1)/2$ divisions, $(2n^3 + 3n^2 - 5n)/6$ multiplications, and $(2n^3 + 3n^2 - 5n)/6$ subtractions,[8] for a total of approximately $2n^3/3$ operations. Thus it has arithmetic complexity $O(n^3)$.

This arithmetic complexity is a good measure of the time needed for the whole computation when the time for each arithmetic operation is approximately constant. This is the case when the coefficients are represented by floating-point numbers or when they belong to a finite field. If the coefficients are integers or rational numbers represented exactly, the intermediate entries can grow exponentially large, so the bit complexity is exponential.[9] However, a variant of Gaussian elimination called the Bareiss algorithm avoids this exponential growth of the intermediate entries and, with the same arithmetic complexity of $O(n^3)$, has a bit complexity of $O(n^5)$.
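The operation counts quoted above can be checked empirically. The following is a minimal sketch, assuming a textbook elimination without pivoting on a diagonally dominant test matrix; the function name `solve_counting` and the test data are illustrative and not part of the source.

```python
import numpy as np

def solve_counting(A, b):
    """Solve A x = b by Gaussian elimination (no pivoting) and back substitution.

    Returns the solution and a dict counting divisions, multiplications
    and subtractions. Hypothetical helper, written only to check the formulas.
    """
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    counts = {"div": 0, "mul": 0, "sub": 0}

    # Forward elimination to echelon form
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]
            counts["div"] += 1
            for j in range(k + 1, n):
                A[i, j] -= factor * A[k, j]
                counts["mul"] += 1
                counts["sub"] += 1
            A[i, k] = 0.0  # eliminated by construction, no arithmetic needed
            b[i] -= factor * b[k]
            counts["mul"] += 1
            counts["sub"] += 1

    # Back substitution, solving for the unknowns in reverse order
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        acc = b[i]
        for j in range(i + 1, n):
            acc -= A[i, j] * x[j]
            counts["mul"] += 1
            counts["sub"] += 1
        x[i] = acc / A[i, i]
        counts["div"] += 1
    return x, counts

n = 6
rng = np.random.default_rng(0)
A = rng.random((n, n)) + n * np.eye(n)  # diagonally dominant, so no pivoting is needed
b = rng.random(n)
x, counts = solve_counting(A, b)
print(counts)
print("formulas:", n * (n + 1) // 2, (2 * n**3 + 3 * n**2 - 5 * n) // 6)
```

The counters should agree with $n(n+1)/2$ divisions and $(2n^3 + 3n^2 - 5n)/6$ multiplications and subtractions for any $n$, provided no pivot is zero.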

This algorithm can be used on a computer for systems with thousands of equations and unknowns. However, the cost becomes prohibitive for systems with millions of equations. These large systems are generally solved using iterative methods. Specific methods exist for systems whose coefficients follow a regular pattern (see system of linear equations).

Putting an $n \times n$ matrix into reduced echelon form by row operations requires about $n^3$ arithmetic operations, approximately 50% more than the $2n^3/3$ needed for echelon form followed by back substitution.[10]

One possible problem is numerical instability, caused by the possibility of dividing by very small numbers. If, for example, the leading coefficient of one of the rows is very close to zero, then row-reducing the matrix requires dividing by that number to make the leading coefficient 1. Any error in that near-zero number is then amplified. Gaussian elimination is numerically stable for diagonally dominant or positive-definite matrices. For general matrices, Gaussian elimination with partial pivoting is usually considered stable, even though there are matrices for which it is unstable.[11]

src: [wiki](https://www.wikiwand.com/en/Gaussian_elimination#/Computational_efficiency)
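To illustrate the small-pivot problem and the partial-pivoting remedy, here is a minimal sketch. The function name `solve_partial_pivoting` and the 2-by-2 test system are assumptions made for illustration; in practice `np.linalg.solve` would normally be used instead.

```python
import numpy as np

def solve_partial_pivoting(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (illustrative sketch)."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    for k in range(n - 1):
        # Pick the row at or below k with the largest |entry| in column k
        p = k + int(np.argmax(np.abs(A[k:, k])))
        if A[p, k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        # Eliminate column k from all rows below the pivot row
        factors = A[k + 1:, k] / A[k, k]
        A[k + 1:, k:] -= np.outer(factors, A[k, k:])
        b[k + 1:] -= factors * b[k]
    if A[n - 1, n - 1] == 0.0:
        raise ValueError("matrix is singular")
    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# A tiny leading coefficient: dividing by 1e-20 without a row swap loses the answer,
# whose exact value is very close to (1, 1).
A_small = np.array([[1e-20, 1.0], [1.0, 1.0]])
b_small = np.array([1.0, 2.0])
print(solve_partial_pivoting(A_small, b_small))
```

Swapping rows first means every multiplier has magnitude at most 1, which is why the rounding error in this example is not amplified.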

# SVD

The SVD of a matrix $M$ is typically computed by a two-step procedure. In the first step, the matrix is reduced to a bidiagonal matrix. This takes $O(mn^2)$ floating-point operations (flops), assuming that $m \ge n$. The second step is to compute the SVD of the bidiagonal matrix. This step can only be done with an iterative method (as with eigenvalue algorithms).

[wiki](https://www.wikiwand.com/en/Singular_value_decomposition#/Calculating_the_SVD)
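The first step can be sketched with Householder reflections applied alternately from the left and the right. This is only an illustration under the assumption $m \ge n$; the names `householder_vector` and `bidiagonalize` are hypothetical, and the orthogonal factors are discarded, so only the singular values can be compared afterwards.

```python
import numpy as np

def householder_vector(x):
    """Return a unit vector v such that (I - 2 v v^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v  # zero vector: the reflection degenerates to the identity
    v[0] += np.copysign(norm_x, v[0])
    return v / np.linalg.norm(v)

def bidiagonalize(M):
    """Reduce an m-by-n matrix (m >= n) to upper bidiagonal form."""
    B = np.array(M, dtype=float)
    m, n = B.shape
    for k in range(n):
        # Left reflection: zero the entries below the diagonal in column k
        v = householder_vector(B[k:, k])
        B[k:, k:] -= 2.0 * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            # Right reflection: zero the entries right of the superdiagonal in row k
            w = householder_vector(B[k, k + 1:])
            B[k:, k + 1:] -= 2.0 * np.outer(B[k:, k + 1:] @ w, w)
    return B

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))
B = bidiagonalize(M)
print(np.round(B, 3))
print(np.linalg.svd(M, compute_uv=False))
print(np.linalg.svd(B, compute_uv=False))
```

Because only orthogonal transformations are applied, `B` has the same singular values as `M`; the iterative second step would then operate on `B` alone.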

# QR Decomposition


The QR decomposition via Givens rotations is the most involved to implement, as the ordering of the rows required to fully exploit the algorithm is not trivial to determine. However, it has a significant advantage in that each new zero element affects only the row with the element to be zeroed and the row above. This makes the Givens rotation algorithm more bandwidth-efficient and parallelisable than the Householder reflection technique.

[wiki](https://www.wikiwand.com/en/QR_decomposition#/Computing_the_QR_decomposition)
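A minimal sketch of QR via Givens rotations follows, assuming the simplest row ordering (each column is cleared from the bottom up). The function name `qr_givens` is hypothetical, and this ordering makes no attempt at the bandwidth or parallelism optimizations mentioned above.

```python
import numpy as np

def qr_givens(A):
    """Return (Q, R) with A = Q @ R, Q orthogonal and R upper triangular."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        # Work upward so each rotation touches only row i and the row above it
        for i in range(m - 1, j, -1):
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue  # both entries are already zero
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])  # maps (a, b) to (r, 0)
            R[i - 1:i + 1, :] = G @ R[i - 1:i + 1, :]
            # Accumulate the inverse rotation so that Q @ R stays equal to A
            Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T
    return Q, R

rng = np.random.default_rng(0)
A_demo = rng.standard_normal((5, 3))
Q_demo, R_demo = qr_givens(A_demo)
print(np.round(R_demo, 3))
```

Each rotation reads and writes exactly two rows of `R`, which is the locality property the paragraph above refers to.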

# Gram Schmidt

The cost of this algorithm is asymptotically $2nk^2$ floating-point operations, where $n$ is the dimensionality of the vectors and $k$ is the number of vectors (Golub & Van Loan 1996, §5.2.8).
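As a rough illustration of where the $2nk^2$ figure comes from, here is a sketch of the modified Gram-Schmidt variant (an assumption; the quoted text does not say which variant is meant). The function name `gram_schmidt` and the tolerance `1e-12` are illustrative choices.

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the k columns of an n-by-k matrix (modified Gram-Schmidt)."""
    Q = np.array(V, dtype=float)
    n, k = Q.shape
    for j in range(k):
        for i in range(j):
            # One dot product plus one vector update: about 4n flops per (i, j) pair
            Q[:, j] -= (Q[:, i] @ Q[:, j]) * Q[:, i]
        norm = np.linalg.norm(Q[:, j])
        if norm < 1e-12:
            raise ValueError("columns are (numerically) linearly dependent")
        Q[:, j] /= norm
    return Q

rng = np.random.default_rng(0)
V_demo = rng.standard_normal((6, 4))
Q_gs = gram_schmidt(V_demo)
print(np.round(Q_gs.T @ Q_gs, 12))
```

There are about $k^2/2$ pairs $(i, j)$, each costing about $4n$ flops, which gives the $2nk^2$ total.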



# Cholesky

There are various methods for calculating the Cholesky decomposition. The computational complexity of commonly used algorithms is $O(n^3)$ in general. The algorithms described in the source article all involve about $n^3/3$ FLOPs, where $n$ is the size of the matrix $A$. Hence, they are half the cost of the LU decomposition, which uses $2n^3/3$ FLOPs (see Trefethen and Bau 1997).

Which of those algorithms is faster depends on the details of the implementation. Generally, the first algorithm will be slightly slower because it accesses the data in a less regular manner.

[wiki](https://www.wikiwand.com/en/Cholesky_decomposition#/Computation)
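For concreteness, here is a minimal column-by-column sketch of the factorization $A = LL^T$, applied to the `zzz` matrix printed in the notebook output above. The function name `cholesky_lower` is hypothetical and this is not necessarily one of the algorithms the source article describes; `np.linalg.cholesky` is the practical choice.

```python
import numpy as np

def cholesky_lower(A):
    """Return lower-triangular L with A = L @ L.T for symmetric positive-definite A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        # Diagonal entry: what remains of A[j, j] after the earlier columns
        d = A[j, j] - L[j, :j] @ L[j, :j]
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(d)
        # Entries below the diagonal in column j
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
    return L

# Same values as the zzz matrix shown in the notebook output
A_spd = np.array([[4.0, -2.0, 2.0],
                  [-2.0, 3.0, 0.0],
                  [2.0, 0.0, 3.0]])
L_demo = cholesky_lower(A_spd)
print(L_demo)
```

Only the lower triangle of `A` is read, which reflects why the factorization needs roughly half the work of an LU decomposition.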