# Computing vectorized OLS
This notebook implements univariate simple OLS for multiple responses Y and multiple predictors X without explicit Python loops, using Einstein summation (`numpy.einsum`) instead.

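The idea is to vectorize the calculation as much as possible. For $N$ samples, OLS is fitted separately for each of $J$ effects (columns $x_j$ of $X$) and $R$ conditions (columns $y_r$ of $Y$). With $x_j$ and $y_r$ centered, so that no intercept term is needed: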

$$\hat{\beta}_{jr} = (x_j^Tx_j)^{-1}x_j^Ty_r$$
$$s(\hat{\beta}_{jr}) = \sqrt{\frac{(x_j^Tx_j)^{-1} (y_r - x_j\hat{\beta}_{jr})^T(y_r - x_j\hat{\beta}_{jr})}{N-2}}$$

The computation can be vectorized, because:

- Each $x_j^Ty_r$ is the $(j, r)$ element of the $J\times R$ matrix $X^TY$, which can be computed up front.
- $x_j^Tx_j$ for all $j$ is the diagonal of $X^TX$; the loop over $j$ can be replaced by Einstein summation in `numpy`.
- $(y_r - x_j\hat{\beta}_{jr})$ is an $N$-vector; the loop over $r$ can again be replaced by Einstein summation.
- These residuals are needed for every $j$ as well. Einstein summation expands them into a 3D array of shape $J\times N\times R$ without the need to loop.
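The points above can be sketched on toy data. The block below is only an illustration under assumed inputs: the coefficient matrix `B` is arbitrary (not an OLS estimate) and the names `fitted`, `resid` and `resid_loop` are made up for this example. It compares the two `einsum` patterns with their explicit-loop equivalents.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, R = 10, 3, 2
X = rng.random((N, J))
Y = rng.random((N, R))
B = rng.random((J, R))  # arbitrary coefficients, only used to illustrate shapes

# Sum of squares of each column of X, i.e. the diagonal of X.T @ X
xtx_einsum = np.einsum('ji,ji->i', X, X)
xtx_loop = np.array([X[:, j] @ X[:, j] for j in range(J)])

# Fitted values x_j * B[j, r] for every (j, r) pair, stacked as (J, N, R)
fitted = np.einsum('ij,jk->jik', X, B)
resid = Y - fitted  # Y of shape (N, R) broadcasts over the leading J axis

# Same residuals built with explicit loops, for comparison
resid_loop = np.empty((J, N, R))
for j in range(J):
    for r in range(R):
        resid_loop[j, :, r] = Y[:, r] - X[:, j] * B[j, r]

print(resid.shape)
print(np.allclose(xtx_einsum, xtx_loop), np.allclose(resid, resid_loop))
```

The 3D array holds $J \times N \times R$ values, so memory use grows with all three dimensions at once.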

## Implementation

In [13]:
import numpy as np
N = 10
R = 2
J = 3
X = np.random.rand(N, J)
Y = np.random.rand(N, R)

### Expected results

In [14]:
from scipy.stats import linregress
from sklearn.linear_model import LinearRegression

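def univariate_simple_regression(X, y, Z=None):
    if Z is not None:
        # Regress the covariates Z out of y first
        model = LinearRegression()
        model.fit(Z, y)
        y = y - model.predict(Z)
    # One simple regression per column of X; keep slope, intercept and stderr
    return np.vstack([linregress(x, y) for x in X.T])[:,[0,1,4]]

def get_summary_stats(X,Y):
    # Reference implementation: explicit loop over the columns of Y
    B = np.zeros((X.shape[1], Y.shape[1]))
    S = np.zeros((X.shape[1], Y.shape[1]))
    for r, y in enumerate(Y.T):
        # Columns 0 and 2 are the slope and its standard error
        B[:,r], S[:,r] = univariate_simple_regression(X, y)[:,[0,2]].T
    return B, S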

res = get_summary_stats(X, Y)

### Compute $\hat{\beta}$

In [3]:
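# Center X and Y in place so that no intercept is needed
X -= np.mean(X, axis=0, keepdims=True)
Y -= np.mean(Y, axis=0, keepdims=True)
# All x_j^T y_r at once, shape (J, R)
XtY = X.T @ Y
# x_j^T x_j for every column j, shape (J,)
XtX_vec = np.einsum('ji,ji->i', X, X)
Bhat = XtY / XtX_vec[:,np.newaxis]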

In [4]:
Bhat

array([[ 0.26552071, -0.46715414],
       [-0.24884064, -0.10294156],
       [ 0.0422749 , -0.05770318]])

In [5]:
res[0]

array([[ 0.26552071, -0.46715414],
       [-0.24884064, -0.10294156],
       [ 0.0422749 , -0.05770318]])
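Rather than comparing the printed arrays by eye, the agreement can be checked numerically. The following is a minimal, self-contained sketch on its own random data. As an assumption for illustration, it uses `np.polyfit` as the per-pair reference instead of `linregress`, so that it only needs NumPy. `B_ref`, `Xc` and `Yc` are names introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, R = 10, 3, 2
X = rng.random((N, J))
Y = rng.random((N, R))

# Reference: one straight-line fit (slope and intercept) per (j, r) pair
B_ref = np.empty((J, R))
for j in range(J):
    for r in range(R):
        slope, intercept = np.polyfit(X[:, j], Y[:, r], deg=1)
        B_ref[j, r] = slope

# Vectorized version on centered copies (the inputs are left untouched)
Xc = X - X.mean(axis=0, keepdims=True)
Yc = Y - Y.mean(axis=0, keepdims=True)
Bhat = (Xc.T @ Yc) / np.einsum('ji,ji->i', Xc, Xc)[:, np.newaxis]

print(np.allclose(Bhat, B_ref))
```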

### Compute $s(\hat{\beta})$

In [9]:
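# Residuals y_r - x_j * Bhat[j, r] for every (j, r) pair, shape (J, N, R)
Xr = Y - np.einsum('ij,jk->jik', X, Bhat)
# Residual sum of squares over the N samples, shape (J, R)
Re = np.einsum('ijk,ijk->ik', Xr, Xr)
# Standard error of each slope, with N - 2 degrees of freedom
S = np.sqrt(Re / XtX_vec[:,np.newaxis] / (X.shape[0] - 2))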

In [10]:
S

array([[ 0.48999972,  0.41774897],
       [ 0.29072673,  0.27105792],
       [ 0.39868505,  0.35864399]])

In [11]:
res[1]

array([[ 0.48999972,  0.41774897],
       [ 0.29072673,  0.27105792],
       [ 0.39868505,  0.35864399]])
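When $J \times N \times R$ is too large to hold in memory, the residual sum of squares can possibly be obtained without the 3D array. For centered data, $x_j^Ty_r = \hat{\beta}_{jr}\,x_j^Tx_j$, so expanding the quadratic form gives $(y_r - x_j\hat{\beta}_{jr})^T(y_r - x_j\hat{\beta}_{jr}) = y_r^Ty_r - \hat{\beta}_{jr}^2\,x_j^Tx_j$. The sketch below is an alternative to the computation in this notebook, not a replacement for it. The names `rss_2d` and `rss_3d` are introduced here for illustration, and the subtraction may lose precision when the fit is nearly perfect.

```python
import numpy as np

rng = np.random.default_rng(2)
N, J, R = 10, 3, 2
X = rng.random((N, J))
Y = rng.random((N, R))
Xc = X - X.mean(axis=0, keepdims=True)
Yc = Y - Y.mean(axis=0, keepdims=True)

xtx = np.einsum('ji,ji->i', Xc, Xc)          # shape (J,)
yty = np.einsum('ji,ji->i', Yc, Yc)          # shape (R,)
Bhat = (Xc.T @ Yc) / xtx[:, np.newaxis]      # shape (J, R)

# Residual sum of squares via the 3D array of residuals
resid = Yc - np.einsum('ij,jk->jik', Xc, Bhat)
rss_3d = np.einsum('ijk,ijk->ik', resid, resid)

# Same quantity from the identity RSS_jr = y_r'y_r - Bhat_jr^2 * x_j'x_j
rss_2d = yty[np.newaxis, :] - Bhat**2 * xtx[:, np.newaxis]

S = np.sqrt(rss_2d / xtx[:, np.newaxis] / (N - 2))
print(np.allclose(rss_2d, rss_3d))
```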

## Putting it all together


In [17]:
from sklearn.linear_model import LinearRegression

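def get_summary_stats(X, Y, Z=None):
    # Work on float copies so the caller's arrays are not modified in place
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float)
    if Z is not None:
        # Regress the covariates Z out of each column of Y
        model = LinearRegression()
        for j in range(Y.shape[1]):
            model.fit(Z, Y[:,j])
            Y[:,j] = Y[:,j] - model.predict(Z)
    # Center X and Y so that no intercept is needed
    X -= np.mean(X, axis=0, keepdims=True)
    Y -= np.mean(Y, axis=0, keepdims=True)
    # Compute Bhat
    XtY = X.T @ Y
    XtX_vec = np.einsum('ji,ji->i', X, X)
    Bhat = XtY / XtX_vec[:,np.newaxis]
    # Residuals for every (j, r) pair, shape (J, N, R), then standard errors
    Xr = Y - np.einsum('ij,jk->jik', X, Bhat)
    Re = np.einsum('ijk,ijk->ik', Xr, Xr)
    S = np.sqrt(Re / XtX_vec[:,np.newaxis] / (X.shape[0] - 2))
    return Bhat, S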
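The step that regresses `Z` out of `Y` still visits the columns of `Y` one at a time. One possible way to avoid that loop is sketched below, using `np.linalg.lstsq` with an explicit intercept column. This is assumed to be equivalent to `LinearRegression` with its default `fit_intercept=True`. The helper `residualize` and the covariate count `K` are hypothetical names introduced for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
N, R, K = 10, 2, 2            # K covariates; K is a name made up for this sketch
Y = rng.random((N, R))
Z = rng.random((N, K))

def residualize(Y, Z):
    """Remove from every column of Y what an intercept plus Z explains linearly."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])   # explicit intercept column
    coef, *_ = np.linalg.lstsq(Z1, Y, rcond=None)    # shape (K + 1, R)
    return Y - Z1 @ coef

Y_res = residualize(Y, Z)

# The residuals are orthogonal to the intercept and to every column of Z
Z1 = np.column_stack([np.ones(N), Z])
print(np.allclose(Z1.T @ Y_res, 0))
```

Because the residuals are orthogonal to the intercept column, each column of `Y_res` already has mean zero, so the later centering of `Y` leaves it essentially unchanged.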