In [1]:
import warnings; warnings.filterwarnings('ignore')
import numpy as np
import oxyba as ox
from importlib import reload; reload(ox);

We will check whether a Linear Regression model outperforms the baseline model in terms of cross-validation results.

### Create Wrapper Functions for the Model/Algorithm
The `oxyba.crossvalidation_loop` function requires a fit/estimation function that returns coefficients, and an evaluation function that returns some fitness score. 

In `myfit` the Linear Regression model based on LU decomposition `oxyba.linreg_ols_lu` is wrapped.

In [2]:
def myfit(data):
    import oxyba as ox
    return ox.linreg_ols_lu(data[:,0], data[:,1:]);

`myeval` wraps the RMSE calculation for Linear Regressions, `oxyba.linreg_rmse`.

In [3]:
def myeval(data, coeffs):
    import oxyba as ox
    return ox.linreg_rmse(data[:,0], data[:,1:], coeffs);
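
To make the expected interface concrete, here is a minimal NumPy-only sketch of the same fit/evaluate pair. The names `myfit_np` and `myeval_np` are hypothetical and not part of `oxyba`; the sketch assumes a plain least-squares fit and an RMSE that divides by the number of observations, which may differ in detail from what `oxyba.linreg_ols_lu` and `oxyba.linreg_rmse` do.

```python
import numpy as np

def myfit_np(data):
    """Hypothetical stand-in for myfit: least-squares coefficients."""
    y, X = data[:, 0], data[:, 1:]          # first column is the target
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def myeval_np(data, coeffs):
    """Hypothetical stand-in for myeval: root mean squared error."""
    y, X = data[:, 0], data[:, 1:]
    resid = y - X @ coeffs
    return np.sqrt(np.mean(resid ** 2))

# Noise-free toy data: y = 2*x1 - 3*x2, so the fit should recover (2, -3)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
toy = np.c_[X_toy @ np.array([2.0, -3.0]), X_toy]
beta = myfit_np(toy)
```

Both functions take the same `data` layout (target in column 0, regressors in the remaining columns), which is what lets a generic loop call them on arbitrary row subsets.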

### Load the Demo dataset
As usual the Boston demo dataset is used. 

The Linear Regression model does not use an intercept and is specified as

$$
y = \beta_1 x_{RM} + \beta_2 x_{LSTAT} + \epsilon
$$


In [4]:
from sklearn.datasets import load_boston
tmp = load_boston()
y = tmp.target
varnames = tmp.feature_names[[5,12]]
X = np.c_[tmp.data[:,[5,12]] ];

20% of the dataset is reserved for testing later on.

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, shuffle=False)
N=len(y_train)
print('Number of observations in the training set: {:d}'.format(N))

Number of observations in the training set: 404
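
The figure of 404 can be reproduced by hand. The sketch below assumes that `train_test_split` rounds the test share up and, with `shuffle=False`, assigns the leading rows to the training set and the trailing rows to the test set; the variable names are illustrative only.

```python
import math

n_samples = 506          # number of rows in the Boston dataset
test_size = 0.2

# Assumption: the test share is rounded up, the remainder is used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Without shuffling, the split is a simple cut along the row order
train_idx = range(0, n_train)
test_idx = range(n_train, n_samples)
print(n_train, n_test)
```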


### Number of CV Blocks
Cross-validation is often conducted with `K=10` or `K=5` blocks. 
However, running a t-test on just 5 or 10 CV errors is statistically questionable. 
To run a proper t-test later on, the number of blocks should be `K=30`.
However, our training set has just `N=404` observations, which would leave only `int(N/K)=13` observations per validation block to compute the fitness score.

With `N=404`, a reasonable trade-off would be `K=20`, which implies a block size of `20`. 
However, this notebook is also meant to demonstrate the t-test, so `K=30` is used.

In [6]:
K=30
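
The block sizes discussed above follow from integer division. A small sketch (the names `n_obs`, `block_sizes` and `leftover` are illustrative; how `oxyba` treats the leftover observations is not shown in this notebook):

```python
n_obs = 404   # training observations, as printed above

# Validation block size and unassigned remainder for common choices of K
block_sizes = {k: n_obs // k for k in (5, 10, 20, 30)}
leftover = {k: n_obs - k * size for k, size in block_sizes.items()}

for k, size in block_sizes.items():
    print('K={:2d}: block size {:2d}, leftover {:2d}'.format(k, size, leftover[k]))
```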

### The Linear Regression Model
We put the target variable `y` and design matrix `X` into the array `data`.

In [7]:
data = np.c_[y_train, X_train]

First, run a Jackknife test and check whether the estimated coefficients are significant.
They are: both p-values are effectively zero.

In [8]:
coeff_subs, coeff_full = ox.jackknife_loop(myfit, data, d=1);
jk_pval, jk_tstat, jk_coeff, jk_se, _ = ox.jackknife_stats(coeff_subs, coeff_full, d=1)
ox.jackknife_print(jk_pval, jk_tstat, jk_coeff, jk_se, d=1)


Delete-1 Jackknife
                                   Var0       Var1    
                      p-Values:    0.00000    0.00000 
                      t-Scores:   49.96298   -12.80183 
 Jackknife Standard Error (SE):    0.09926    0.04928 
   Jackknife Estimates (theta):    4.959     -0.631   
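
For readers unfamiliar with the procedure, the following is a minimal sketch of a delete-1 Jackknife using the textbook standard error formula $SE = \sqrt{\frac{n-1}{n}\sum_i (\hat\theta_{(i)} - \bar\theta)^2}$. The function `jackknife_delete1` is hypothetical; `oxyba.jackknife_loop` and `oxyba.jackknife_stats` may differ, for example in bias correction. In the table above the t-scores match the estimate divided by its standard error (e.g. 4.959 / 0.09926 ≈ 49.96).

```python
import numpy as np

def jackknife_delete1(estimator, sample):
    """Delete-1 Jackknife: re-estimate with each observation left out once."""
    n = len(sample)
    theta_full = estimator(sample)
    theta_subs = np.array(
        [estimator(np.delete(sample, i, axis=0)) for i in range(n)])
    theta_mean = theta_subs.mean(axis=0)
    # Textbook Jackknife standard error
    se = np.sqrt((n - 1) / n * np.sum((theta_subs - theta_mean) ** 2, axis=0))
    return theta_full, theta_subs, se

# Toy example: Jackknife of the sample mean
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=40)
theta_full, theta_subs, se = jackknife_delete1(np.mean, x)
```

For the sample mean, this standard error coincides with the usual `x.std(ddof=1) / sqrt(n)`, which makes for a convenient sanity check.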


Second, run the cross-validation procedure. 

In [9]:
errors1, coeffs1, idxmat = ox.crossvalidation_loop(
    myfit, myeval, data, K=K, random_state=42)

print('{2:>30s}: {0:.4f}\n{3:>30s}: {1:.4f}'.format(
    *ox.crossvalidation_error(errors1), 
    'CV error of the linear model', 
    'CV std dev'))

  CV error of the linear model: 5.2147
                    CV std dev: 1.6908
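
Conceptually, a K-block cross-validation loop looks like the sketch below. It is an assumption-laden illustration, not the `oxyba.crossvalidation_loop` implementation: `cv_loop_sketch`, `fit_ls` and `eval_rmse` are hypothetical names, and the blocks are built with `np.array_split`, so block sizes may differ by one.

```python
import numpy as np

def fit_ls(data):
    # Least-squares coefficients; target in column 0
    return np.linalg.lstsq(data[:, 1:], data[:, 0], rcond=None)[0]

def eval_rmse(data, coeffs):
    resid = data[:, 0] - data[:, 1:] @ coeffs
    return np.sqrt(np.mean(resid ** 2))

def cv_loop_sketch(fit, evaluate, data, K, random_state=None):
    """Shuffle rows, split into K blocks, validate on each block in turn."""
    rng = np.random.default_rng(random_state)
    blocks = np.array_split(rng.permutation(len(data)), K)
    errors = []
    for k in range(K):
        train_idx = np.concatenate(
            [b for j, b in enumerate(blocks) if j != k])
        coeffs = fit(data[train_idx])                      # fit on K-1 blocks
        errors.append(evaluate(data[blocks[k]], coeffs))   # score on held-out block
    return np.array(errors), blocks

# Toy data: y = 1.5*x + noise
rng = np.random.default_rng(2)
x_toy = rng.normal(size=(60, 1))
toy = np.c_[1.5 * x_toy[:, 0] + rng.normal(scale=0.1, size=60), x_toy]
errors, blocks = cv_loop_sketch(fit_ls, eval_rmse, toy, K=5, random_state=42)
```

Returning the blocks is what makes it possible to evaluate a second model on exactly the same partition.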


A single cross-validation result means little unless it is compared to another model, e.g. the same model with different hyperparameters, another model class, or a baseline model. 
A baseline model is a simple, naive prediction. 

### The Baseline Model
In our case, the baseline model is the mean

$$
y = \mu + \epsilon
$$

The `data` variable contains the target variable `y` and the intercept term (a column of ones) as `x`.

In [10]:
data = np.c_[y_train, np.ones(shape=(N,1))]
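
Regressing on a column of ones is equivalent to estimating the mean, since the normal equations reduce to $\hat\mu = (\mathbf{1}^\top \mathbf{1})^{-1} \mathbf{1}^\top y = \frac{1}{N}\sum_i y_i$. A quick check with made-up numbers (`y_toy` is illustrative, not taken from the dataset):

```python
import numpy as np

y_toy = np.array([21.0, 24.5, 19.0, 30.5])   # made-up target values
ones = np.ones((len(y_toy), 1))              # intercept-only design matrix

# Least-squares fit with the intercept as the only regressor
mu_hat, *_ = np.linalg.lstsq(ones, y_toy, rcond=None)
print(mu_hat[0], y_toy.mean())
```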

First, let's check the Jackknife estimation on the training set. The mean $\mu$ turns out to be a statistically significant predictor of the target variable (see the Jackknife p-value below).

In [11]:
coeff_subs, coeff_full = ox.jackknife_loop(myfit, data, d=1);
jk_pval, jk_tstat, jk_coeff, jk_se, _ = ox.jackknife_stats(coeff_subs, coeff_full, d=1)
ox.jackknife_print(jk_pval, jk_tstat, jk_coeff, jk_se, d=1)


Delete-1 Jackknife
                                   Var0    
                      p-Values:    0.00000 
                      t-Scores:   52.47001 
 Jackknife Standard Error (SE):    0.46075 
   Jackknife Estimates (theta):   24.176   


Second, conduct the cross-validation procedure. We will use exactly the same reshuffled blocks `idxmat` that have been used above.

In [12]:
errors2, coeffs2, _ = ox.crossvalidation_loop(
    myfit, myeval, data, idxmat=idxmat)

print('{2:>30s}: {0:.4f}\n{3:>30s}: {1:.4f}'.format(
    *ox.crossvalidation_error(errors2), 
    'CV error of the baseline model', 
    'CV std dev'))

CV error of the baseline model: 8.9902
                    CV std dev: 2.1711


Just by eyeballing, we might conclude that the Linear Regression model is better than the baseline model.
However, the difference in CV errors is not always this large, and a wrong conclusion is then easy to reach.

### Is the difference significant?
Whether you accept or reject the model depends on your significance threshold.
(In the social sciences, `p < 0.05` is conventionally treated as significant.)

In [13]:
pvalue, tscore, se, mu = ox.crossvalidation_stats(errors1, errors2)
print('P-Value of CV error differences: {:.4f}'.format(pvalue))

P-Value of CV error differences: 0.0309
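
Because both models were validated on the same blocks, the natural comparison is a paired t-test on the per-block error differences. The sketch below assumes that this is roughly what `oxyba.crossvalidation_stats` computes; `paired_t` is a hypothetical helper and the error values are made up for illustration.

```python
import numpy as np

def paired_t(errors_a, errors_b):
    """t-statistic for the mean of paired differences errors_a - errors_b."""
    d = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    k = len(d)
    mu = d.mean()                          # mean difference
    se = d.std(ddof=1) / np.sqrt(k)        # standard error of the mean difference
    return mu / se, se, mu, k - 1          # t-score, SE, mean, degrees of freedom

# Made-up CV errors for two models on the same five blocks
err_a = np.array([5.1, 4.8, 5.6, 5.0, 4.9])
err_b = np.array([8.7, 9.2, 9.0, 8.5, 9.4])
tscore, se, mu, dof = paired_t(err_a, err_b)
```

A two-sided p-value would then follow from Student's t distribution with `dof` degrees of freedom, e.g. `2 * scipy.stats.t.sf(abs(tscore), dof)`.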
