In [1]:
import warnings; warnings.filterwarnings('ignore')
import numpy as np
import oxyba as ox; from importlib import reload; reload(ox);
from time import perf_counter, process_time

### Load Demo Dataset

In [2]:
# Note: load_boston was removed in scikit-learn 1.2; this cell needs an older version.
from sklearn.datasets import load_boston
tmp = load_boston()
num_obs = len(tmp.target);
y = tmp.target
X = np.c_[ np.ones(shape=(num_obs,1)), tmp.data[:,[5,12]] ];

### pinv What?
The `p` stands for **p**seudoinverse or Moore-**P**enrose inverse.
(Just use the mnemonic that fits your brain.)

The pseudoinverse is an inverse computed via Singular Value Decomposition (SVD), with a correction for ill-conditioned matrices (in [the docs of linreg_ols_svd](http://oxyba.de/docs/linreg_ols_svd/) it is the `func2_...` test implementation).

The task is to estimate

$$
\hat{\beta} = (X^T X)^{-1} (X^T y)
$$

which requires the inverse $A^{-1} = (X^T X)^{-1}$ of the matrix $A = X^T X$.

1. Conduct SVD: $A = U S V^T$
2. Compute the inverse: $A^{-1} = V S^{-1} U^T$

If $A = X^T X$ is ill-conditioned or singular, then some of the diagonal elements of $S = diag(s_1, s_2, \ldots, s_n)$ are close to zero, and computing $S^{-1}$ runs into division-by-zero problems.
The pseudoinverse approach simply sets $\frac{1}{s_i}=0$ whenever an element's value falls below a certain threshold, $s_i<tol$.
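
The two SVD steps plus the thresholding rule can be sketched in a few lines of NumPy. This is only an illustration, not the oxyba implementation: the helper name `pinv_svd` and the toy matrix are made up for this example, and the threshold is assumed to be relative, `tol = rcond * max(s)`, which mirrors the convention of `numpy.linalg.pinv`.

```python
import numpy as np

def pinv_svd(A, rcond=1e-15):
    """Hypothetical helper: pseudoinverse of A via SVD with thresholding."""
    # Step 1: A = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Assumption: relative threshold, as in numpy.linalg.pinv
    tol = rcond * s.max()
    # Set 1/s_i = 0 for singular values at or below the threshold
    s_inv = np.zeros_like(s)
    keep = s > tol
    s_inv[keep] = 1.0 / s[keep]
    # Step 2: A^+ = V @ diag(1/s) @ U^T
    return (Vt.T * s_inv) @ U.T

# Toy design matrix with a duplicated column, so X^T X is singular
rng = np.random.default_rng(0)
X_toy = np.c_[np.ones(20), rng.normal(size=(20, 2))]
X_toy = np.c_[X_toy, X_toy[:, 1]]
A = X_toy.T @ X_toy

A_pinv = pinv_svd(A, rcond=1e-10)
```

On this singular toy matrix a plain inverse would be unreliable, while the thresholded version should still satisfy the Moore-Penrose property $A A^{+} A = A$ up to rounding.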

### When to apply pinv?
* Ideally not at all, because you are not supposed to run linear regression on ill-conditioned matrices (e.g. multicollinearity).
* When there is nothing you can do about $X$ (e.g. your customer wants exactly these variables included, it's the best data you've got, ...).
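
One quick way to see whether a design matrix falls into the ill-conditioned category is the condition number of $X^T X$. The sketch below uses made-up toy data (`X_ok`, `X_bad` are hypothetical names) and `numpy.linalg.cond`. What counts as "too large" is a judgment call and is not fixed here.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)

# Well-behaved design: intercept plus two independent regressors
X_ok = np.c_[np.ones(100), x1, x2]
# Nearly collinear design: the third column is almost a copy of the second
X_bad = np.c_[np.ones(100), x1, x1 + 1e-6 * x2]

# Ratio of largest to smallest singular value of X^T X
cond_ok = np.linalg.cond(X_ok.T @ X_ok)
cond_bad = np.linalg.cond(X_bad.T @ X_bad)

print('cond_ok  = {0:.3e}'.format(cond_ok))
print('cond_bad = {0:.3e}'.format(cond_bad))
```

The nearly collinear design yields a condition number many orders of magnitude larger than the well-behaved one.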

### Test Implementations
The first implementation applies NumPy's [pinv](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.pinv.html).
`func1_pretty` is the more readable variant; `func1_faster` is the compact variant intended to be faster.

In [3]:
def func1_pretty(y,X):
    from numpy import dot
    from numpy.linalg import pinv
    x2inv = pinv(dot(X.T, X));
    return dot(x2inv, dot(X.T,y));


In [4]:
def func1_faster(y, X, rcond=1e-15):
    import numpy as np
    return np.dot( np.linalg.pinv( np.dot(X.T, X), rcond=rcond), np.dot(X.T,y));

The statsmodels [OLS.fit](https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.fit.html) method has the option to use `pinv`.

In [5]:
def func2_pretty(y, X):
    from statsmodels.api import OLS
    estim = OLS(y,X).fit(method='pinv')
    return estim.params;

In [6]:
def func2_faster(y, X):
    import statsmodels.api as sm
    return sm.OLS(y,X).fit(method='pinv').params;

### Benchmarking 

In [7]:
trials = 50000
funcnames = [func1_pretty, func1_faster, 
             func2_pretty, func2_faster,
             ox.linreg_ols_pinv,
             ox.linreg_ols_lu]

In [8]:
for func in funcnames:
    beta = func(y,X);
    print(func.__name__, beta);

func1_pretty [-1.35827281  5.09478798 -0.64235833]
func1_faster [-1.35827281  5.09478798 -0.64235833]
func2_pretty [-1.35827281  5.09478798 -0.64235833]
func2_faster [-1.35827281  5.09478798 -0.64235833]
linreg_ols_pinv [-1.35827281  5.09478798 -0.64235833]
linreg_ols_lu [-1.35827281  5.09478798 -0.64235833]


Looks good.
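
Instead of comparing printed coefficients by eye, the agreement can also be checked numerically. The sketch below is self-contained and uses synthetic data (`X_demo`, `y_demo` are made up for this example, not the dataset above). It compares the pinv-based estimate against `numpy.linalg.lstsq`.

```python
import numpy as np

# Synthetic, well-conditioned regression problem (illustration only)
rng = np.random.default_rng(42)
X_demo = np.c_[np.ones(200), rng.normal(size=(200, 2))]
y_demo = X_demo @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=200)

# Estimate via the pseudoinverse of X^T X
beta_pinv = np.linalg.pinv(X_demo.T @ X_demo) @ (X_demo.T @ y_demo)
# Reference estimate via NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

agree = np.allclose(beta_pinv, beta_lstsq)
print('estimates agree:', agree)
```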

In [9]:
print('{0:6s} {1:6s} {2:s}'.format('Clock', 'CPU', 'function name'))
for func in funcnames:
    sh,sc = perf_counter(), process_time();
    for i in range(trials):
        beta = func(y,X); 
        if beta is None: print('error solving')
        beta = None;
    eh,ec = perf_counter(), process_time()
    print('{0:.4f} {1:.4f} {2:s}'.format(eh-sh, ec-sc, func.__name__))

Clock  CPU    function name
5.7880 5.6634 func1_pretty
5.7412 5.6840 func1_faster
25.8798 25.7423 func2_pretty
29.1202 27.6430 func2_faster
5.8252 5.7584 linreg_ols_pinv
2.3903 2.3451 linreg_ols_lu


I was kind of shocked by how slow the statsmodels implementation is, but on reflection it is not very surprising, because statsmodels computes a plethora of metrics for each estimation. The strength of statsmodels is computing all kinds of metrics and tests, not speed.