In [1]:
import warnings; warnings.filterwarnings('ignore')
import numpy as np
import oxyba as ox; from importlib import reload; reload(ox);
from time import perf_counter, process_time

### Load Demo Dataset

In [2]:
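# Note: load_boston was removed in scikit-learn 1.2; this cell needs an older version.
from sklearn.datasets import load_boston
tmp = load_boston()
num_obs = len(tmp.target)
y = tmp.target
# Design matrix: intercept column plus columns 5 (RM) and 12 (LSTAT)
X = np.c_[np.ones(shape=(num_obs, 1)), tmp.data[:, [5, 12]]]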

### Some other Contenders
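As an additional contender I use sklearn's [LinearRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html), with `fit_intercept=False` because `X` already contains a column of ones.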

In [3]:
def func1(y, X):
    from sklearn.linear_model import LinearRegression
    #lr = LinearRegression(fit_intercept=False, copy_X=False, n_jobs=-1)  # doesn't make a difference
    lr = LinearRegression(fit_intercept=False)
    lr.fit(X=X, y=y)  # use the X argument, not the global X
    return lr.coef_

### Benchmarking

In [4]:
trials = 50000
funcnames = [ox.linreg_ols_lu,
             ox.linreg_ols_pinv,
             ox.linreg_ols_svd,
             ox.linreg_ols_qr,
             func1]

In [5]:
for func in funcnames:
    beta = func(y,X);
    print(func.__name__, beta);

linreg_ols_lu [-1.35827281  5.09478798 -0.64235833]
linreg_ols_pinv [-1.35827281  5.09478798 -0.64235833]
linreg_ols_svd [-1.35827281  5.09478798 -0.64235833]
linreg_ols_qr [-1.35827281  5.09478798 -0.64235833]
func1 [-1.35827281  5.09478798 -0.64235833]


Yep, looks good: all five functions return the same coefficients.
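The comparison above is done by eye. A minimal sketch of a programmatic check with `np.allclose` might look like the following. The two solvers `ols_pinv` and `ols_qr`, the helper `all_agree`, and the synthetic `X_demo`/`y_demo` are hypothetical stand-ins so that the cell runs on its own; they are not the `oxyba` implementations.

```python
import numpy as np

def ols_pinv(y, X):
    # Hypothetical stand-in: coefficients via the Moore-Penrose pseudoinverse.
    return np.linalg.pinv(X) @ y

def ols_qr(y, X):
    # Hypothetical stand-in: factor X = QR, then solve R beta = Q'y.
    Q, R = np.linalg.qr(X)
    return np.linalg.solve(R, Q.T @ y)

def all_agree(funcs, y, X, rtol=1e-8):
    # Compare every solver's coefficients against the first solver's result.
    betas = [f(y, X) for f in funcs]
    return all(np.allclose(betas[0], b, rtol=rtol) for b in betas[1:])

# Synthetic data (an assumption): intercept column plus two random regressors.
rng = np.random.default_rng(0)
X_demo = np.c_[np.ones(200), rng.normal(size=(200, 2))]
y_demo = X_demo @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=200)

print(all_agree([ols_pinv, ols_qr], y_demo, X_demo))
```

In the notebook itself the same helper could be called as `all_agree(funcnames, y, X)`.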

In [6]:
print('{0:6s} {1:6s} {2:s}'.format('Clock', 'CPU', 'function name'))
for func in funcnames:
    sh,sc = perf_counter(), process_time();
    for i in range(trials):
        beta = func(y,X); 
        if beta is None: print('error solving')
        beta = None;
    eh,ec = perf_counter(), process_time()
    print('{0:.4f} {1:.4f} {2:s}'.format(eh-sh, ec-sc, func.__name__))

Clock  CPU    function name
3.4387 3.0257 linreg_ols_lu
8.3896 8.1435 linreg_ols_pinv
8.9878 8.8945 linreg_ols_svd
10.1030 10.0522 linreg_ols_qr
30.2831 29.9103 func1


We have a winner: `linreg_ols_lu`

Surprisingly, sklearn's `LinearRegression` class is almost 9x slower than `linreg_ols_lu`, which relies on Numpy's/LAPACK's `solve` and its LU decomposition. It seems that [LinearRegression](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/base.py) calls `scipy.linalg.lstsq`, which, like `numpy.linalg.lstsq`, defaults to an SVD-based LAPACK driver. In other words, sklearn's `LinearRegression` is based on Singular Value Decomposition. However, it is still more than 3x slower than `linreg_ols_svd`, which applies `numpy.linalg.lstsq`. The remaining gap is presumably sklearn's own overhead.
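To make the two routes discussed above concrete, here is a minimal sketch, assuming the LU variant solves the normal equations with `np.linalg.solve` and the SVD variant calls `np.linalg.lstsq`. The names `ols_normal_eq` and `ols_lstsq` and the synthetic data are my own illustration, not the `oxyba` source, and the timings will differ from the table above.

```python
import numpy as np
from timeit import timeit

def ols_normal_eq(y, X):
    # Solve (X'X) beta = X'y; np.linalg.solve factors the 3x3 matrix via LU.
    return np.linalg.solve(X.T @ X, X.T @ y)

def ols_lstsq(y, X):
    # np.linalg.lstsq uses an SVD-based LAPACK least-squares driver.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Synthetic data (an assumption) with the same shape as the demo design matrix.
rng = np.random.default_rng(42)
X_demo = np.c_[np.ones(506), rng.normal(size=(506, 2))]
y_demo = X_demo @ np.array([-1.0, 5.0, -0.6]) + rng.normal(size=506)

# Both routes should give the same coefficients on a well-conditioned problem.
assert np.allclose(ols_normal_eq(y_demo, X_demo), ols_lstsq(y_demo, X_demo))

# Rough wall-clock comparison; absolute numbers depend on the machine.
for f in (ols_normal_eq, ols_lstsq):
    secs = timeit(lambda: f(y_demo, X_demo), number=2000)
    print('{0:.4f} {1:s}'.format(secs, f.__name__))
```

The normal-equation route only has to factor a small k-by-k matrix (here 3x3), which is one plausible reason it comes out ahead when the number of regressors is small.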