<div style="text-align:center"><span style="font-size:2em; font-weight: bold;"> Lecture 7—Training</span></div>

# Programming: Quadratic Programming

When performing optimization, it is sometimes possible to reduce a given problem to a linear or quadratic programming problem. Dedicated solvers handle these problems very efficiently, which we can use to speed up our code. The downside is that we have to do a little math to put the problem in the solver's standard form.

## Linear Programming

$$
\begin{align}
\min_{x,s}\;&c'x \\
\text{subject to}\;& Gx+s=h \\
& Ax=b \\
& s\geq0
\end{align}
$$
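
As a small illustration of the standard form above, the slack vector $s$ turns the inequality $Gx\leq h$ into the equality $Gx+s=h$ with $s\geq0$. The sketch below checks this on a made-up two-variable problem; the numbers in `G` and `h`, the test points, and the helper names `slack` and `is_feasible` are hypothetical, and only NumPy is used.

```python
import numpy as np

# Hypothetical constraints: x1 + x2 <= 4, x1 <= 3, and x >= 0 written as -x <= 0
G = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
h = np.array([4.0, 3.0, 0.0, 0.0])

def slack(x):
    """Return the slack vector s that satisfies G x + s = h."""
    return h - G @ x

def is_feasible(x, tol=1e-12):
    """x satisfies G x <= h exactly when every slack is non-negative."""
    return bool(np.all(slack(x) >= -tol))

x_inside = np.array([1.0, 2.0])    # satisfies every constraint
x_outside = np.array([3.5, 1.0])   # violates x1 + x2 <= 4 and x1 <= 3

print(slack(x_inside), is_feasible(x_inside))
print(slack(x_outside), is_feasible(x_outside))
```

A negative entry in the slack vector identifies which constraint is violated.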

## Quadratic Programming

$$
\begin{align}
\min_{x}\;&(1/2)x'Px+q'x \\
\text{subject to}\;& Gx\leq h \\
& Ax=b
\end{align}
$$

$$
\min_\beta e'e = \min_\beta (y-X\beta)'(y-X\beta) = \min_\beta y'y-2y'X\beta+\beta'X'X\beta
$$
$$
\min_\beta \beta'X'X\beta-2y'X\beta
$$
$$
\min_\beta (1/2)\beta'X'X\beta-y'X\beta
$$
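
The chain of equalities above shows that least squares is a quadratic program with $P=X'X$ and $q=-X'y$ and no constraints. A minimal sketch on simulated data (the sample size, coefficients, and seed are arbitrary assumptions): because the problem is unconstrained, the minimizer satisfies $P\beta=-q$, so we can check it against NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 3
X = np.hstack((np.ones((n, 1)), rng.normal(size=(n, r - 1))))
beta_true = np.array([1.0, -2.0, 0.5])   # arbitrary illustrative coefficients
y = X @ beta_true + rng.normal(size=n)

# Quadratic programming ingredients for least squares
P = X.T @ X
q = -X.T @ y

def qp_objective(b):
    """(1/2) b'Pb + q'b, the objective a QP solver would minimize."""
    return 0.5 * b @ P @ b + q @ b

# With no constraints, the first-order condition is P b = -q
beta_qp = np.linalg.solve(P, -q)
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# The residual sum of squares differs from the QP objective only by y'y and a factor of 2
rss = np.sum((y - X @ beta_qp) ** 2)
print(np.allclose(beta_qp, beta_ls))
print(np.isclose(rss, y @ y + 2 * qp_objective(beta_qp)))
```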

# Data Science

## $L_1$ Regularization

Given a model:
$$y=\mathbf X\beta+e$$
The $L_1$ regularization of the model is defined by:
$$
\begin{align}
\min_\beta \;& e'e \\
\text{subject to}\;&\Vert\beta\Vert_1 \leq T
\end{align}
$$
Which is equivalent to the summation notation version:
$$
\begin{align}
\min_\beta \;& \sum_{i=1}^n e_i^2\\
\text{subject to}\;&\sum_{j=1}^r\vert\beta_j\vert \leq T
\end{align}
$$
Expanding $e'e$, dropping the constant $y'y$, and scaling by $1/2$, we can write the problem in matrix form:
$$
\begin{align}
\min_\beta \;& (1/2)\beta'\mathbf X'\mathbf X \beta-y'\mathbf X \beta\\
\text{subject to}\;&\mathbf 1_r'\vert\beta\vert \leq T
\end{align}
$$ 
To put this in the quadratic programming format, we need to split the betas into their positive and negative components, $\beta=\beta_+-\beta_-$ with $\beta_+\geq0$ and $\beta_-\geq0$. We then stack the two parts and define $\beta_\pm = [\beta_+',\beta_-']'$:
$$
\begin{align}
\min_{\beta_{\pm}} \;& (1/2)\beta_\pm'\left(
\begin{bmatrix}
1 & -1 \\
-1 & 1 
\end{bmatrix}
\otimes\mathbf X'\mathbf X \right)\beta_\pm-\left(
\begin{bmatrix}
1 & -1
\end{bmatrix}
\otimes y'\mathbf X \right) \beta_{\pm}\\
\text{subject to}\;&\mathbf 1_{2r}'\beta_\pm \leq T\\
& -\mathbf I_{2r} \beta_{\pm}\leq 0
\end{align}
$$
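
The Kronecker-product form can be checked numerically. The sketch below, on random data with an arbitrary seed (both assumptions), builds $P$ and $q$ for the stacked vector and confirms that the objective in $\beta_\pm$ equals the original objective in $\beta$. It also confirms that $\mathbf 1_{2r}'\beta_\pm=\Vert\beta\Vert_1$ when the split has no overlap, that is, when at most one of $\beta_{+,j}$ and $\beta_{-,j}$ is nonzero for each $j$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 50, 4
X = rng.normal(size=(n, r))
y = rng.normal(size=n)

# Stacked quadratic and linear terms for beta_pm = [beta_plus', beta_minus']'
P = np.kron(np.array([[1, -1], [-1, 1]]), X.T @ X)   # shape (2r, 2r)
q = -np.kron(np.array([1, -1]), X.T @ y)              # shape (2r,)

# Split an arbitrary beta into non-overlapping positive and negative parts
beta = rng.normal(size=r)
beta_plus = np.maximum(beta, 0)
beta_minus = np.maximum(-beta, 0)
beta_pm = np.concatenate((beta_plus, beta_minus))

obj_split = 0.5 * beta_pm @ P @ beta_pm + q @ beta_pm
obj_orig = 0.5 * beta @ X.T @ X @ beta - y @ X @ beta

print(np.isclose(obj_split, obj_orig))
print(np.isclose(beta_pm.sum(), np.abs(beta).sum()))
```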

In [2]:
!pip install --user cvxopt



In [2]:
import numpy as np
import pandas as pd
import cvxopt as cvx
import cvxopt.solvers as solv
from scipy.stats import zscore

df = pd.read_csv('BWGHT.csv')
npx = df[['cigs','faminc','male','white']].values
npy = df['bwght'].values
ones = np.ones((npx.shape[0],1))
npx = np.hstack((ones,npx))

In [3]:
thresh = 100
def solve_lasso(x,y,thresh):
    n,r = x.shape
    P = np.kron(np.array([[1,-1],[-1,1]]),x.T@x)
    q = -np.kron(np.array([[1],[-1]]),x.T@y.reshape(-1,1))
    G_1 = -np.eye(2*r)
    h_1 = np.zeros((2*r,1))
    G_2 = np.ones((1,2*r))
    h_2 = np.array([[thresh]])
    G = np.vstack((G_1,G_2))
    h = np.vstack((h_1,h_2))
    opt = solv.qp(cvx.matrix(P),cvx.matrix(q),cvx.matrix(G),cvx.matrix(h))
    opt = np.array(opt['x'])
    return opt[:r,0]-opt[r:,0]
np.abs(solve_lasso(npx,npy,thresh)).sum()

     pcost       dcost       gap    pres   dres
 0: -9.7915e+06 -9.7909e+06  3e+04  2e+00  5e-16
 1: -9.7915e+06 -9.7908e+06  2e+03  1e-01  1e-15
 2: -9.7915e+06 -9.7891e+06  7e+02  7e-02  4e-16
 3: -9.7841e+06 -9.7591e+06  7e+03  5e-02  5e-16
 4: -9.7706e+06 -9.7445e+06  1e+04  3e-02  4e-16
 5: -9.7305e+06 -9.7361e+06  3e+04  1e-02  3e-16
 6: -9.7259e+06 -9.7349e+06  3e+04  9e-03  9e-17
 7: -9.7289e+06 -9.7317e+06  7e+03  2e-03  1e-16
 8: -9.7257e+06 -9.7323e+06  9e+03  1e-03  9e-17
 9: -9.7285e+06 -9.7318e+06  4e+03  5e-04  9e-17
10: -9.7260e+06 -9.7317e+06  6e+03  2e-04  7e-17
11: -9.7294e+06 -9.7313e+06  2e+03  5e-05  4e-17
12: -9.7270e+06 -9.7318e+06  5e+03  2e-05  4e-17
13: -9.7290e+06 -9.7316e+06  3e+03  1e-05  1e-16
14: -9.7268e+06 -9.7315e+06  5e+03  2e-06  7e-17
15: -9.7295e+06 -9.7312e+06  2e+03  8e-07  8e-17
16: -9.7273e+06 -9.7317e+06  4e+03  3e-07  4e-17
17: -9.7290e+06 -9.7316e+06  3e+03  1e-07  2e-16
18: -9.7270e+06 -9.7315e+06  4e+03  2e-08  1e-16
19: -9.7295e+06 -9.73

99.99977369045132

In [4]:
from cleands import *
least_squares_regressor(npx,npy).params

array([ 1.12065256e+02, -4.74159926e-01,  6.00548455e-02,  3.14523963e+00,
        5.40726154e+00])

In [5]:
solve_lasso(npx,npy,thresh)

     pcost       dcost       gap    pres   dres
 0: -9.7915e+06 -9.7909e+06  3e+04  2e+00  5e-16
 1: -9.7915e+06 -9.7908e+06  2e+03  1e-01  1e-15
 2: -9.7915e+06 -9.7891e+06  7e+02  7e-02  4e-16
 3: -9.7841e+06 -9.7591e+06  7e+03  5e-02  5e-16
 4: -9.7706e+06 -9.7445e+06  1e+04  3e-02  4e-16
 5: -9.7305e+06 -9.7361e+06  3e+04  1e-02  3e-16
 6: -9.7259e+06 -9.7349e+06  3e+04  9e-03  9e-17
 7: -9.7289e+06 -9.7317e+06  7e+03  2e-03  1e-16
 8: -9.7257e+06 -9.7323e+06  9e+03  1e-03  9e-17
 9: -9.7285e+06 -9.7318e+06  4e+03  5e-04  9e-17
10: -9.7260e+06 -9.7317e+06  6e+03  2e-04  7e-17
11: -9.7294e+06 -9.7313e+06  2e+03  5e-05  4e-17
12: -9.7270e+06 -9.7318e+06  5e+03  2e-05  4e-17
13: -9.7290e+06 -9.7316e+06  3e+03  1e-05  1e-16
14: -9.7268e+06 -9.7315e+06  5e+03  2e-06  7e-17
15: -9.7295e+06 -9.7312e+06  2e+03  8e-07  8e-17
16: -9.7273e+06 -9.7317e+06  4e+03  3e-07  4e-17
17: -9.7290e+06 -9.7316e+06  3e+03  1e-07  2e-16
18: -9.7270e+06 -9.7315e+06  4e+03  2e-08  1e-16
19: -9.7295e+06 -9.73

array([ 9.95019244e+01, -9.20345888e-07,  4.97724354e-01,  5.62908348e-05,
        6.77729885e-05])

In [6]:
np.abs(least_squares_regressor(npx,npy).params).sum()

121.15197211648324

In [7]:
thresh = 130
np.abs(solve_lasso(npx,npy,thresh)).sum()

     pcost       dcost       gap    pres   dres
 0: -9.7915e+06 -9.7926e+06  3e+04  1e+00  4e-16
 1: -9.7915e+06 -9.7924e+06  1e+03  2e-02  5e-16
 2: -9.7915e+06 -9.7916e+06  1e+02  3e-04  2e-16
 3: -9.7915e+06 -9.7915e+06  1e+00  3e-06  2e-16
 4: -9.7915e+06 -9.7915e+06  1e-02  3e-08  2e-17
Optimal solution found.


121.15196701975871

In [8]:
from cleands import *

class l1_regularization_regressor(least_squares_regressor):
    def __init__(self,x,y,thresh:float,*args,**kwargs):
        super(l1_regularization_regressor,self).__init__(x,y,thresh=thresh,*args,**kwargs)
        self.threshold=thresh
    def __fit__(self,x,y,thresh:float,*args,**kwargs):
        if x[:,0].var()==0:
            dx = x[:,1:]-x[:,1:].mean(0)
            dy = y-y.mean(0)
            outp = solve_lasso(dx,dy,thresh)
            intc = y.mean(0)-x[:,1:].mean(0)@outp.reshape(-1,1)
            return np.concatenate([intc,outp])
        else:
            return solve_lasso(x,y,thresh)
        
params = l1_regularization_regressor(npx,npy,thresh=5).params
print(params)
print(np.abs(params[1:]).sum())

     pcost       dcost       gap    pres   dres
 0: -1.3346e+04 -1.3345e+04  1e+02  2e+00  6e-16
 1: -1.3346e+04 -1.3333e+04  1e+01  3e-01  6e-16
 2: -1.3345e+04 -1.3282e+04  1e+01  3e-01  7e-16
 3: -1.3150e+04 -1.2618e+04  3e+02  2e-01  1e-15
 4: -1.2211e+04 -1.2253e+04  4e+01  2e-16  4e-16
 5: -1.2252e+04 -1.2252e+04  4e-01  4e-16  2e-16
 6: -1.2252e+04 -1.2252e+04  4e-03  4e-16  2e-16
Optimal solution found.
[ 1.14402171e+02 -4.58683832e-01  7.46625357e-02  1.58104713e+00
  2.88559886e+00]
4.999992358655065


In [9]:
least_squares_regressor(npx,npy).params

array([ 1.12065256e+02, -4.74159926e-01,  6.00548455e-02,  3.14523963e+00,
        5.40726154e+00])

In [10]:
least_squares_regressor(npx,npy).params[1:].sum()

8.138396096447336
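
Note that the $L_1$ norm constrained by the threshold is the sum of absolute values, which differs from the plain sum whenever a coefficient is negative. A quick check using the slope estimates printed above (copied by hand, so only accurate to the printed digits):

```python
import numpy as np

# Least squares slope estimates from the output above (intercept excluded)
slopes = np.array([-4.74159926e-01, 6.00548455e-02,
                   3.14523963e+00, 5.40726154e+00])

signed_sum = slopes.sum()        # what .params[1:].sum() computes
l1_norm = np.abs(slopes).sum()   # the quantity the lasso threshold bounds

print(signed_sum)
print(l1_norm)
```

The two differ by twice the magnitude of the single negative coefficient, so the threshold `thresh=5` should be compared with the second number.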

## Cross-validation

In [16]:
def mean_squared_error(model,x,y):
    return ((y-model.predict(x))**2).mean()
def k_fold_cross_validation(model,x,y,
                            folds:int=5,
                            seed=None,
                            statistic=mean_squared_error):
    n,r = x.shape
    deck = np.arange(n)
    outp = []
    if seed is not None: np.random.seed(seed)
    np.random.shuffle(deck)
    for i in range(folds):
        test = deck[int(i*n/folds):int((i+1)*n/folds)]
        train_lower = deck[:int(i*n/folds)]
        train_upper = deck[int((i+1)*n/folds):]
        train = np.concatenate((train_lower,train_upper))
        modl = model(x[train],y[train])
        mspe = statistic(modl,x[test],y[test])
        outp += [mspe]
    return np.array(outp)
k_fold_cross_validation(least_squares_regressor,npx,npy,folds=5,seed=90210).mean()

398.62277614025425
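
The index slicing inside `k_fold_cross_validation` can be checked in isolation. The sketch below uses a hypothetical helper, `fold_indices` (not part of the lecture code), that reproduces the same arithmetic and confirms that the test folds partition the sample and never overlap the training indices. As an assumption for the example, it draws the permutation from a NumPy `Generator` rather than the global seed used above.

```python
import numpy as np

def fold_indices(n, folds=5, seed=None):
    """Yield (train, test) index arrays using the same slicing as the function above."""
    rng = np.random.default_rng(seed)
    deck = rng.permutation(n)
    for i in range(folds):
        lo, hi = int(i * n / folds), int((i + 1) * n / folds)
        test = deck[lo:hi]
        train = np.concatenate((deck[:lo], deck[hi:]))
        yield train, test

n = 23  # deliberately not divisible by the number of folds
splits = list(fold_indices(n, folds=5, seed=0))

# Every observation appears in exactly one test fold
all_test = np.concatenate([test for _, test in splits])
print(sorted(all_test.tolist()) == list(range(n)))

# Training and test indices never overlap within a fold
print(all(len(np.intersect1d(train, test)) == 0 for train, test in splits))

# Fold sizes differ by at most one when n is not a multiple of folds
print([len(test) for _, test in splits])
```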

In [17]:
npy.var()

413.98538886212833

In [18]:
(413.98538886212833-398.62277614025425)/413.98538886212833

0.037109069873454824
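
The ratio above compares the cross-validated mean squared prediction error with the variance of $y$, so it can be read as an out-of-sample analogue of $R^2$ (the symbols $R^2_{\text{oos}}$ and $\widehat{\text{MSPE}}_{\text{CV}}$ are our notation, not the lecture's):

$$
R^2_{\text{oos}} = 1-\frac{\widehat{\text{MSPE}}_{\text{CV}}}{\widehat{\text{Var}}(y)} = \frac{413.985-398.623}{413.985}\approx 0.0371
$$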

In [19]:
model = lambda x,y: l1_regularization_regressor(x,y,thresh=9)
k_fold_cross_validation(model,npx,npy,seed=90210).mean()

     pcost       dcost       gap    pres   dres
 0: -1.1663e+04 -1.1674e+04  1e+02  1e+00  1e-15
 1: -1.1663e+04 -1.1663e+04  1e+01  1e-01  1e-15
 2: -1.1663e+04 -1.1656e+04  4e+00  5e-02  1e-15
 3: -1.1636e+04 -1.1595e+04  3e+01  4e-02  8e-16
 4: -1.1579e+04 -1.1577e+04  7e-01  9e-04  1e-16
 5: -1.1576e+04 -1.1576e+04  8e-03  9e-06  2e-16
 6: -1.1576e+04 -1.1576e+04  8e-05  9e-08  2e-16
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -1.0225e+04 -1.0241e+04  9e+01  8e-01  7e-16
 1: -1.0225e+04 -1.0231e+04  7e+00  1e-02  2e-15
 2: -1.0225e+04 -1.0225e+04  2e-01  1e-04  2e-17
 3: -1.0225e+04 -1.0225e+04  2e-03  1e-06  2e-16
 4: -1.0225e+04 -1.0225e+04  2e-05  1e-08  1e-16
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -9.9615e+03 -9.9740e+03  9e+01  8e-01  4e-15
 1: -9.9615e+03 -9.9641e+03  5e+00  3e-02  5e-15
 2: -9.9615e+03 -9.9619e+03  4e-01  8e-04  4e-16
 3: -9.9616e+03 -9.9616e+03  3e-02  5e-06  1e-16
 4: -9.9616e+03 -9.9616e

398.3159403129158

In [8]:
print(np.sqrt(398.95848731234884))
print(np.sqrt(npy.var()))

19.97394521150864
20.346630897082896


In [20]:
class l1_cross_validation_regressor(l1_regularization_regressor):
    def __init__(self,x,y,max_thresh=None,folds:int=5,statistic=mean_squared_error,seed=None,*args,**kwargs):
        default_state = solv.options.get('show_progress',True)
        solv.options['show_progress'] = False
        if max_thresh is None: max_thresh = np.abs(least_squares_regressor(x,y).params[1:]).sum()
        outp = []
        for lam in np.linspace(0,1,100):
            model = lambda x,y: l1_regularization_regressor(x,y,thresh=lam*max_thresh)
            mse = k_fold_cross_validation(model,x,y,folds=folds,statistic=statistic,seed=seed).mean()
            outp += [(mse,lam)]
        outp = np.array(outp)
        lam = outp[outp[:,0].argmin(),1]
        thresh = lam*max_thresh
        solv.options['show_progress'] = default_state
        super(l1_cross_validation_regressor,self).__init__(x,y,thresh=thresh,*args,**kwargs)
        self.statistic = outp[outp[:,0].argmin(),0]
        self.max_threshold = max_thresh
        self.lambda_value = lam
model = l1_cross_validation_regressor(npx,npy,seed=90210)
print(model.params)
print(model.threshold,model.lambda_value,model.statistic)

     pcost       dcost       gap    pres   dres
 0: -1.3346e+04 -1.3355e+04  1e+02  1e+00  4e-16
 1: -1.3346e+04 -1.3345e+04  9e+00  1e-01  2e-15
 2: -1.3345e+04 -1.3340e+04  4e+00  5e-02  6e-16
 3: -1.3316e+04 -1.3272e+04  3e+01  4e-02  6e-16
 4: -1.3255e+04 -1.3253e+04  3e+00  1e-03  2e-16
 5: -1.3252e+04 -1.3252e+04  3e-02  1e-05  1e-16
 6: -1.3252e+04 -1.3252e+04  3e-04  1e-07  2e-16
 7: -1.3252e+04 -1.3252e+04  3e-06  1e-09  1e-16
Optimal solution found.
[ 1.12747568e+02 -4.69641354e-01  6.43198686e-02  2.68854065e+00
  4.67100896e+00]
7.893510824141454 0.8686868686868687 398.1962698819594


In [21]:
(398.62277614025425-398.1962698819594)/398.62277614025425

0.0010699495458453168

In [14]:
least_squares_regressor(npx,npy).params

array([ 1.12065256e+02, -4.74159926e-01,  6.00548455e-02,  3.14523963e+00,
        5.40726154e+00])

# Programming challenges

##  Tree Simulation

Write a Monte Carlo simulation comparing the accuracy of tree-based models to that of regression for categorical data.

## Recursive partitioning until non-rejection

Modify our recursive partitioning code to test the null hypothesis that the two groups are equal, and stop splitting when the null cannot be rejected. Bonus points if you use the Bonferroni correction when making the decision to split.
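
For reference, the Bonferroni correction for $m$ simultaneous tests at overall level $\alpha$ tests each hypothesis at level $\alpha/m$. One way to apply it to a tree, and a design choice rather than the only option, is to let $m$ grow with the depth $d$ of the node, for example $m=2^{d+1}$. With a two-sided $t$ test on $n$ observations in the node, the rule then reads:

$$
\text{split if}\quad -\vert t\vert < t_{n-2}^{-1}\left(\frac{\alpha}{2\cdot 2^{d+1}}\right)
$$

Here $t_{n-2}^{-1}$ denotes the quantile function of the $t$ distribution with $n-2$ degrees of freedom, and $\alpha=0.05$ corresponds to `sign_level=0.95` in the code below.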

In [None]:
import numpy as np
import pandas as pd
import scipy.stats as sps

from cleands import *

In [None]:
class rpart(prediction_model):
    def __init__(self,x,y,sign_level=0.95,max_level=None,level=''):
        super(rpart,self).__init__(x,y)
        self.max_level = max_level
        self.level = level
        if max_level!=None and len(level)+1==max_level:
            self.RSS = np.sum((y-y.mean())**2)
            self.split_var = None
            self.split_value = None
            self.left = None
            self.right = None
            return
        xvars = np.arange(self.n_feat)
        outp = []
        for i in xvars:
            outp += [self.__calc_RSS_and_split__(x[:,i])]
        outp = np.array(outp)
        var = outp[:,0].argmin()
        self.RSS = outp[var,0]
        self.split_var = var
        self.split_value = outp[var,1]
        if max_level==None:
            xvar = (x[:,var]>self.split_value).astype(int)
            xvar = np.hstack((np.ones((self.n_obs,1)),xvar.reshape(-1,1)))
            try:
                model = least_squares_regressor(xvar,y)
            except Exception:
                self.RSS = np.sum((y-y.mean())**2)
                self.split_var = None
                self.split_value = None
                self.left = None
                self.right = None
                return
            tstat = model.params/np.sqrt(np.diag(model.vcov_params))
            tstat = -np.abs(tstat[1])
            critv = sps.t.ppf((1-sign_level)/2/2**(len(level)+1),df=self.n_obs-2)
            if tstat>=critv:
                self.RSS = np.sum((y-y.mean())**2)
                self.split_var = None
                self.split_value = None
                self.left = None
                self.right = None
                return
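        # Pass sign_level down so child nodes test at the same level as the root
        self.left = rpart(x[x[:,var]<=self.split_value,:],y[x[:,var]<=self.split_value],sign_level=sign_level,max_level=max_level,level=level+'L')
        self.right = rpart(x[x[:,var]>self.split_value,:],y[x[:,var]>self.split_value],sign_level=sign_level,max_level=max_level,level=level+'R')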
    def __calc_RSS_and_split__(self,var):
        vmin = var.min()
        vmax = var.max()
        width = (vmax-vmin)/50
        outp = []
        for split in np.linspace(vmin+width,vmax-width,48):
            left = self.y[var<=split]
            right = self.y[var>split]
            rssleft = ((left-left.mean())**2).sum() if left.shape[0]>0 else 0
            rssright = ((right-right.mean())**2).sum() if right.shape[0]>0 else 0
            outp += [(rssleft+rssright,split)]
        outp = np.array(outp)
        return outp[outp[:,0].argmin(),:]
    def __str__(self):
        if self.left==None and self.right==None:
            outp = '{0} RSS: {1}; Prediction: {2}\n'.format(self.level,self.RSS,self.y.mean())
        else:
            outp = '{0} Variable: {1}; Split: {2}; RSS: {3}\n'.format(self.level,self.split_var,self.split_value,self.RSS)
            outp += str(self.left)
            outp += str(self.right)
        return outp
    def predict(self,newx):
        n = newx.shape[0]
        if self.left==None and self.right==None:
            return np.full(shape=(n,),fill_value=self.y.mean())
        outp = np.zeros((n,))
        outp[newx[:,self.split_var]<=self.split_value] = self.left.predict(newx[newx[:,self.split_var]<=self.split_value,:])
        outp[newx[:,self.split_var]>self.split_value] = self.right.predict(newx[newx[:,self.split_var]>self.split_value,:])
        return outp

In [None]:
n = 1000
x = np.random.normal(size=(n,3))
y = np.random.normal(size=(n,))
y += x@np.random.uniform(size=(3,))

In [None]:
model = rpart(x,y)
print(model)