First of all we are going to reserve this first chunk of code to import all the libraries that we will be using along this assignment

In [50]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import time 
import random
from numpy.linalg import inv
from scipy.optimize import minimize

Once we have the libraries, we are going to generate a random data sample for a regression model, considering the contraints proposed in the problem:

 - 0 $\leq \beta_{i} \leq 5 \hspace{0.5cm}$ i = 0,...,K for K at least 1000 independent variables and n=5000 observations

In [48]:

K=1000

n_train=50000
n_test=2000
n=n_train+n_test
nvars=1000

randombeta = np.random.randint(0,6,size=([nvars+1,1]))
randomerror = np.random.normal(0,1,(n,1))

X0 = np.ones([n,1]) # the first column has all values equal to one for the coefficients of beta_0
X1 = np.random.uniform(0,10,([n,nvars]))
randomX = np.concatenate([X0, X1],axis=1)

y = np.dot(randomX,randombeta) + randomerror

X = randomX[0:n_train,:]
Y = y[0:n_train]

X_test = randomX[(n_train+1):n,:]
Y_test = y[(n_train+1):n]



## a) (0.5 points) Estimate the value of the regression coefficients by implementing the analytical
solution. Use this solution as a benchmark for the following sections.

We have to minimize:

\begin{align*}
  \text{minimize}_\beta \quad & \|y-X\beta\|^2_2 + \rho\|\beta\|^2_2
\end{align*}

We can rewrite the formula as:

\begin{align*}
  (y-X\beta)^T(y-X\beta) + \rho(\beta^T\beta)
\end{align*}

\begin{align*}
  Y^TY - Y^TX\beta - \beta^TX^TY + \beta^TX^T\beta X + \rho (\beta^T \beta)
\end{align*}

Thus, for minimizing we have to take the partial derivative and equal it to 0.

\begin{align*}
  \frac{\partial F}{\partial\beta} = 0  
\end{align*}

\begin{align*}
  \frac{\partial F}{\partial\beta} = -2X^TY + 2X^TX\beta +  2\rho \beta  
\end{align*}

\begin{align*}
  -2X^TY + 2X^TX\beta + 2\rho \beta = 0
\end{align*}

We can also calculate the second derivative which corresponds to the hessian

\begin{align*}
  \frac{\partial^2 F}{\partial\beta} =  2X^TX +  2\rho  
\end{align*}
 
Now taking the variable $\beta$ to one of the sides of the equality from the first derivative we obtain the analytical solution of the preceding problem, which is: 

\begin{align*}
  \beta_{ls}=(X^T X + pI)^{-1}X^T Y
\end{align*}

We can now implement this solution in python and obtain what would be the exact solution of the prolem. This solution will be use as a beenchmark for the results that we will be obtaining later on other optimization model.

In [51]:
time_start_exact = time.process_time()

beta_ls_exact = np.dot(np.dot(inv(np.dot((X.T),X)+(np.identity(np.dot(X.T,X).shape[0]))),X.T),Y)

time_elapsed_exact = (time.process_time() - time_start_exact)

print('Values of the (exact) least squares coefficients:')
for i in range(nvars+1):
    print('beta %3d %7.3f' %(i,beta_ls_exact[i]))
print('Elapsed time = %8.5f' %(time_elapsed_exact))


Values of the (exact) least squares coefficients:
beta   0   3.680
beta   1   0.000
beta   2   0.999
beta   3   1.000
beta   4   1.998
beta   5   4.001
beta   6   4.001
beta   7   3.999
beta   8   2.000
beta   9   4.000
beta  10  -0.002
beta  11   3.997
beta  12   2.000
beta  13   3.999
beta  14   4.001
beta  15   1.000
beta  16   1.000
beta  17   1.000
beta  18   3.001
beta  19   1.999
beta  20   1.002
beta  21  -0.001
beta  22   3.000
beta  23   1.000
beta  24   0.998
beta  25   2.001
beta  26   2.001
beta  27   3.999
beta  28   3.001
beta  29   0.001
beta  30   1.002
beta  31   0.999
beta  32   2.999
beta  33   2.000
beta  34   2.998
beta  35   0.003
beta  36   4.002
beta  37   2.001
beta  38   1.000
beta  39   1.998
beta  40   2.002
beta  41   1.001
beta  42   3.000
beta  43   1.999
beta  44   3.005
beta  45   0.999
beta  46   0.999
beta  47   0.999
beta  48   0.999
beta  49   4.000
beta  50   1.000
beta  51   1.998
beta  52   2.001
beta  53   1.999
beta  54   1.998
beta  55   0.00

We can see that at prior the betas obtained are reasonable with the betas generated for the dataset. Indeed we can see that none of them are above 5 and practically not very much under 0, which were the constrains impossed while generating the dataset for the betas.

In [54]:
print("Max: ",max(beta_ls_exact),"Min: ",min(beta_ls_exact))

Max:  [4.00506423] Min:  [-0.0037359]


Within this output we can clearly see that the boundaries impossed are fulfilled almost perfectly.


## b) (1 points) Estimate the value of the regression coefficients by using the function minimize from the Python module Scipy.optimize. Try at least four available solvers and compare their performance in terms of number of iterations, number of function, gradient and hessian evaluations as well as total computational time.

To do this we are going to define different equations from our objective function that will be neccesary to implement different algorithms and to compare the time taken and the results obtained.

Therefore we will implement:

- The objective function defined in the main problem. 
- The first derivative correspondant to the gradient
- The second derivative or the hessian

All these derivations have been calculated above, so we will now implement them in python as functions.

In [55]:

def ridreg(beta_ls, X, Y,p):
    beta_ls = np.matrix(beta_ls)
    z = Y - np.dot(X,beta_ls.T)
    a=np.dot(z.T,z)
    b=np.dot(beta_ls,beta_ls.T)
    return (a+p*b)

def ridreg_der(beta_ls,X,Y,p):
    beta_ls = np.matrix(beta_ls)
    pp = -2*np.dot((Y-np.dot(X,(beta_ls).T)).T,X) + 2*p*beta_ls
    aa = np.squeeze(np.asarray(pp))
    return aa

def ridreg_hess(beta_ls,X,Y,p):
    ss = 2*np.dot(np.transpose(X),X) + 2*p
    return ss


First of all we are going to implement an algorithm using directly the objective function with the "Nelder-Mead" method. This method only takes the function to minimize.

In [241]:
beta_ls0 = np.zeros(nvars+1)

time_start_Nelder = time.process_time()

res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='Nelder-Mead', options={'disp': True,'xtol': 1e-1})

time_elapsed_Nelder = (time.process_time() - time_start_Nelder)

print('\nValues of the least squares coefficients obtained with Nelder-Mead:')
for i in range(nvars+1):
    print('beta %3d %7.3f' %(i,res.x[i]))

err_val_1 = np.linalg.norm(beta_ls_exact.T-res.x,ord=2)/np.linalg.norm(beta_ls_exact.T,ord=2)
print('\nError in values of coefficients = %8.4f' %err_val_1)

print('Elapsed time = %8.5f' %(time_elapsed_Nelder))

KeyboardInterrupt: 

We are now going to include the gradient and hessian evaluation. First we use the Newtwon-CG method, which uses both evaluations, Hessian and Gradient.

In [57]:
beta_ls0=np.zeros(nvars+1)

time_start_Newton = time.process_time()

res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='Newton-CG', jac=ridreg_der, hess=ridreg_hess, options={'disp': True})

time_elapsed_Newton = (time.process_time() - time_start_Newton)

print('\nValues of the least squares coefficients obtained with Nelder-Mead:')
for i in range(nvars+1):
    print('beta %3d %7.3f' %(i,res.x[i]))

err_val_2 = np.linalg.norm(beta_ls_exact.T-res.x,ord=2)/np.linalg.norm(beta_ls_exact.T,ord=2)
print('\nError in values of coefficients = %8.4f' %err_val_2)

print('Elapsed time = %8.5f' %(time_elapsed_Newton))

Optimization terminated successfully.
         Current function value: 77515.315446
         Iterations: 10
         Function evaluations: 11
         Gradient evaluations: 11
         Hessian evaluations: 10

Values of the least squares coefficients obtained with Nelder-Mead:
beta   0   0.393
beta   1   0.001
beta   2   0.999
beta   3   1.001
beta   4   1.999
beta   5   4.002
beta   6   4.002
beta   7   4.000
beta   8   2.001
beta   9   4.000
beta  10  -0.001
beta  11   3.998
beta  12   2.001
beta  13   4.000
beta  14   4.002
beta  15   1.001
beta  16   1.001
beta  17   1.001
beta  18   3.002
beta  19   2.000
beta  20   1.003
beta  21  -0.000
beta  22   3.001
beta  23   1.001
beta  24   0.999
beta  25   2.001
beta  26   2.002
beta  27   3.999
beta  28   3.001
beta  29   0.002
beta  30   1.002
beta  31   1.000
beta  32   3.000
beta  33   2.001
beta  34   2.998
beta  35   0.003
beta  36   4.003
beta  37   2.001
beta  38   1.001
beta  39   1.999
beta  40   2.003
beta  41   1.002
beta  42

From this method we have obtained pretty good results in terms of computation efficiency and time consumed. We can see that it has only 10 iteractions, 11 evaluations of the function and gradient and just 10 with the hessian. Therefore we can say that the convergence of this method was very fast and the solution obtained is also pretty close to what we are considering as the exact solution.

We are now going to try with the BFGS method, which does not use the Hessian information.

In [61]:
beta_ls0=np.zeros(nvars+1)

time_start_BFGS = time.process_time()

res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='BFGS', jac=ridreg_der, options={'disp': True,'xtol': 1e-10})
#res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='BFGS', jac=ridreg_der)
time_elapsed_BFGS = (time.process_time() - time_start_BFGS)

print('\nValues of the least squares coefficients obtained with Nelder-Mead:')
for i in range(nvars+1):
    print('beta %3d %7.3f' %(i,res.x[i]))

err_val_3 = np.linalg.norm(beta_ls_exact.T-res.x,ord=2)/np.linalg.norm(beta_ls_exact.T,ord=2)
print('\nError in values of coefficients = %8.4f' %err_val_2)

print("Time taken for the BFGS method:", time_elapsed_BFGS)


  res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='BFGS', jac=ridreg_der, options={'disp': True,'xtol': 1e-10})


         Current function value: 77364.746278
         Iterations: 421
         Function evaluations: 953
         Gradient evaluations: 941

Values of the least squares coefficients obtained with Nelder-Mead:
beta   0   3.059
beta   1   0.001
beta   2   0.999
beta   3   1.000
beta   4   1.998
beta   5   4.001
beta   6   4.001
beta   7   3.999
beta   8   2.001
beta   9   4.000
beta  10  -0.002
beta  11   3.997
beta  12   2.001
beta  13   3.999
beta  14   4.001
beta  15   1.000
beta  16   1.001
beta  17   1.000
beta  18   3.002
beta  19   2.000
beta  20   1.002
beta  21  -0.001
beta  22   3.000
beta  23   1.000
beta  24   0.998
beta  25   2.001
beta  26   2.001
beta  27   3.999
beta  28   3.001
beta  29   0.001
beta  30   1.002
beta  31   1.000
beta  32   2.999
beta  33   2.000
beta  34   2.998
beta  35   0.003
beta  36   4.002
beta  37   2.001
beta  38   1.000
beta  39   1.998
beta  40   2.002
beta  41   1.001
beta  42   3.000
beta  43   1.999
beta  44   3.005
beta  45   0.999
beta  46

We can see that this case was way less efficient than the one obtained before. This method has done 421 iterations and has evaluated both function and gradient almost 950 times each. The convergence was slow, but the results obtained are also very good, in fact they are apparently as good as the ones obtained in the case before. However, this method took over 16 times to compute more than the "Newton-CG".

Finally we will be using "trust-constr", which does not use the gradient.

In [62]:
time_start_TC = time.process_time()

res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='trust-constr', hess=ridreg_hess, options={'disp': True,'xtol': 1e-10})
time_elapsed_TC = (time.process_time() - time_start_TC)

print('\nValues of the least squares coefficients obtained with CG:')
for i in range(nvars+1):
    print('beta %3d %7.3f' %(i,res.x[i]))

err_val_4 = np.linalg.norm(beta_ls_exact.T-res.x,ord=2)/np.linalg.norm(beta_ls_exact.T,ord=2)
print('\nError in values of coefficients = %8.4f' %err_val_4)

print("Time taken for the BFGS method:", time_elapsed_TC)

`xtol` termination condition is satisfied.
Number of iterations: 46, function evaluations: 26064, CG iterations: 118, optimality: 1.01e-01, constraint violation: 0.00e+00, execution time: 5.7e+02 s.

Values of the least squares coefficients obtained with CG:
beta   0   3.062
beta   1   0.001
beta   2   0.999
beta   3   1.000
beta   4   1.998
beta   5   4.001
beta   6   4.001
beta   7   3.999
beta   8   2.001
beta   9   4.000
beta  10  -0.002
beta  11   3.997
beta  12   2.001
beta  13   3.999
beta  14   4.001
beta  15   1.000
beta  16   1.001
beta  17   1.000
beta  18   3.002
beta  19   2.000
beta  20   1.002
beta  21  -0.001
beta  22   3.000
beta  23   1.000
beta  24   0.998
beta  25   2.001
beta  26   2.001
beta  27   3.999
beta  28   3.001
beta  29   0.001
beta  30   1.002
beta  31   1.000
beta  32   2.999
beta  33   2.000
beta  34   2.998
beta  35   0.003
beta  36   4.002
beta  37   2.001
beta  38   1.000
beta  39   1.998
beta  40   2.002
beta  41   1.001
beta  42   3.000
beta  43  


## c) (1 points) Modify the preceding optimization model by adding (lower and upper) bounds on the values of the β coefficients. Solve it again with the module Scipy.optimize a by using (at least) two different solvers, which should accept the introduction of bounds on the variables. Compare these methods and briefly comment on possible interpretations of the values of the coefficients.

Following the same methodology that we have been using we are going to try now different optimizations methods from scipy that allows to include boundaries on the betas selected for minimizing our function. At prior this should help to converge faster and obtain better solutions, as we know for a fact that none of the betas should be over 5 and lower 0.

In [64]:
time_start_TNC_bounds = time.process_time()

res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='TNC', bounds=(((0,5),)*len(beta_ls0)), jac=ridreg_der, hess=ridreg_hess, options={'disp': True})
time_elapsed_TNC_bounds = (time.process_time() - time_start_TNC_bounds)

print('\nValues of the least squares coefficients obtained with SLSQP for the Lasso problem:')
for i in range(nvars+1):
    print('beta %3d %7.3f' %(i,res.x[i]))

err_val_5 = np.linalg.norm(beta_ls_exact.T-res.x,ord=2)/np.linalg.norm(beta_ls_exact.T,ord=2)

print('\nError in values of coefficients = %8.4f' %err_val_5)

print('Elapsed time = %8.5f' %(time_elapsed_TNC_bounds))

  warn('Method %s does not use Hessian information (hess).' % method,
  NIT   NF   F                       GTG
    0    1  4.744796718479768E+12   2.37303679E+22
tnc: fscale = 1.29831e-12
tnc: stepmx = 1000
    1    4  1.688509891577473E+11   8.40489860E+20
    2    7  1.072150466288664E+10   5.32663088E+19
tnc: fscale = 2.74034e-11
    3   10  8.399049128537127E+09   4.16852311E+19
    4   13  4.678084041177627E+09   2.31923942E+19
    5   16  4.394774613144771E+09   2.17660300E+19
    6   19  2.410516454037132E+09   1.19257572E+19
    7   22  2.264374956499546E+09   1.11913683E+19
    8   25  1.292407649266452E+09   6.38054039E+18
    9   28  1.197986900833031E+09   5.90837065E+18
   10   31  9.180966085177099E+08   4.52328947E+18
   11   34  6.999906305109761E+08   3.44506928E+18
   12   37  5.249617460707458E+08   2.58086550E+18
   13   40  3.863101622384903E+08   1.89715850E+18
   14   43  3.098033425184357E+08   1.51980753E+18
   15   46  3.032219631137396E+08   1.48599079E+18
  


Values of the least squares coefficients obtained with SLSQP for the Lasso problem:
beta   0   2.665
beta   1   0.000
beta   2   0.999
beta   3   1.000
beta   4   1.998
beta   5   4.001
beta   6   4.001
beta   7   3.999
beta   8   2.001
beta   9   4.000
beta  10   0.000
beta  11   3.997
beta  12   2.000
beta  13   3.999
beta  14   4.001
beta  15   1.000
beta  16   1.001
beta  17   1.000
beta  18   3.002
beta  19   2.000
beta  20   1.002
beta  21   0.000
beta  22   3.000
beta  23   1.000
beta  24   0.998
beta  25   2.001
beta  26   2.002
beta  27   3.999
beta  28   3.001
beta  29   0.001
beta  30   1.002
beta  31   1.000
beta  32   2.999
beta  33   2.000
beta  34   2.998
beta  35   0.003
beta  36   4.002
beta  37   2.001
beta  38   1.000
beta  39   1.998
beta  40   2.002
beta  41   1.001
beta  42   3.000
beta  43   1.999
beta  44   3.005
beta  45   0.999
beta  46   0.999
beta  47   0.999
beta  48   0.999
beta  49   4.000
beta  50   1.000
beta  51   1.998
beta  52   2.001
beta  53   1.9

tnc: |fn-fn-1] = 9.02602e-10 -> convergence
  141  547  7.743948538353262E+04   5.45316850E-03
tnc: Converged (|f_n-f_(n-1)| ~= 0)


In [65]:
time_start_tk = time.process_time()

res = minimize(ridreg, beta_ls0, args=(X, Y,5), method='trust-krylov', bounds=(((0,5),)*len(beta_ls0)), jac=ridreg_der, hess=ridreg_hess, options={'disp': True})
time_elapsed_tk = (time.process_time() - time_start_BFGS)

## Print results
print('\nValues of the least squares coefficients obtained with SLSQP for the Lasso problem:')
for i in range(nvars+1):
    print('beta %3d %7.3f' %(i,res.x[i]))

err_val_5 = np.linalg.norm(beta_ls_exact.T-res.x,ord=2)/np.linalg.norm(beta_ls_exact.T,ord=2)
print('\nError in values of coefficients = %8.4f' %err_val_5)

print('Elapsed time = %8.5f' %(time_elapsed_tk))

 TR Solving trust region problem, radius 1.000000e+00; starting on first irreducible block
 TR Coldstart. Seeking suitable initial Î»â, starting with 0
 TR Starting Newton iteration for Î»â with initial choice 0.000000e+00
 TR  iter        Î»            dÎ»       âhâ(Î»)â-radius
 TR      1  1.515455e+11  1.515455e+11  0.000000e+00


 iter inewton type    objective     Î³áµ¢ââ|háµ¢|      leftmost         Î»             Î³             Î´             Î±             Î²       

     0     1  cg_b -1.527961e+11  6.076739e+05  0.000000e+00  1.515455e+11  1.540466e+11  2.501119e+09  3.998210e-10  5.902995e-08


 TR Solving trust region problem, radius 2.000000e+00; starting on first irreducible block
 TR Coldstart. Seeking suitable initial Î»â, starting with 0
 TR Starting Newton iteration for Î»â with initial choice 0.000000e+00
 TR  iter        Î»            dÎ»       âhâ(Î»)â-radius
 TR      1  7.327165e+10  7.327165e+10  0.000000e+00


 iter inewton type    objectiv


## d) Estimate the value of the regression coefficients of (1) by implementing the:


###     i. Gradient Method

In [42]:
(a,b) = X.shape

sigma = 0.1
alpha = 1
delta = 0.1
n_iter = 2000 # Maximum number of iterations
epsilon = 1e-5
tol = 10000
beta=0.2 #Set beta for the armijo rule

beta_lsg = np.zeros(b) # initial value for beta

OF_iter = np.zeros(n_iter)
tol_iter = np.zeros(n_iter)
alpha_iter = np.zeros(n_iter)

time_start_GM = time.process_time()

i = 0

while (i <= n_iter-2) and (tol > epsilon):
    i = i + 1
    grad = ridreg_der(beta_ls0,X,Y,5) # Gradient vector
    ddirect =  -grad
    j=0
    while(ridreg(beta_ls0-(alpha*grad),X,Y,5)>=ridreg(beta_ls0,X,Y,5)+sigma*alpha*np.dot(-grad.T,grad)):
        alpha=alpha*beta

    beta_ls0 = beta_ls0 + alpha*ddirect
    
    OF_iter[i] = ridreg(beta_ls0, X, Y,5)
    tol = np.linalg.norm(grad,ord=2)
    tol_iter[i] = tol
    alpha_iter[i] = alpha
    
time_elapsed_GM = (time.process_time() - time_start_GM)

print('Elapsed time = %8.5f' %(time_elapsed_GM))
print('\nNumber of iterations = %5.0f' %i)
print('Objective function   = %11.5f' %OF_iter[i])
print('Optimality tolerance = %11.5f' %tol)

print('\nValues of the least squares coefficients - gradient method:')
print('beta %-9s %7.3f' %('intercept',beta_ls0[0]))
for ii in np.arange(1,b):
    print('beta %-9s %7.3f' %(X[ii,],beta_ls0[ii]))

beta_err = np.linalg.norm(np.transpose(beta_ls_exact)-beta_ls0,ord=2)/np.linalg.norm(beta_ls0,ord=2)
print('\nBeta coefficient error = %10.5f' %beta_err)

Elapsed time = 877.86700

Number of iterations =  1999
Objective function   = 493514435.10627
Optimality tolerance = 28758888.84741

Values of the least squares coefficients - gradient method:
beta intercept   0.000
beta [1.         6.95156902 1.31095523 ... 0.53261018 0.89682905 7.37561869]   1.524
beta [1.         0.28874302 1.08201495 ... 9.94612795 8.08969039 7.50473645]   2.241
beta [1.         2.35197348 4.10021336 ... 9.66438867 3.1245721  5.2876411 ]   2.258
beta [1.         1.17395462 3.82925796 ... 1.03312205 2.72445407 2.43575667]   1.479
beta [1.         4.78431138 5.67255827 ... 8.51184538 8.86374104 9.77132494]   2.485
beta [1.         1.10642235 4.67211535 ... 6.18232081 1.5910582  0.687414  ]   2.550
beta [1.         8.44738237 7.10591938 ... 5.36541678 0.70631418 6.81942125]   1.972
beta [1.         4.85005075 0.05113374 ... 5.58935821 4.20184039 4.64454292]   1.560
beta [1.         9.99975553 0.02019504 ... 9.72457466 4.08051211 1.69732461]   1.523
beta [1.         4.

### ii. Newton Method

In [66]:
# Implementation of Newton's method
(a,b) = X.shape

## Parameters for the algorithm

alpha = 1
n_iter = 200 # Maximum number of iterations
epsilon = 1e-5
tol = 10000
sigma = 0.1
delta = 0.1
beta=0.2

## Initial values for the variables and data containers

#beta_lsnm = np.zeros(b) # initial value for beta
beta_lsnm=np.zeros((n_iter,b))

OF_iter = np.zeros(n_iter)
tol_iter = np.zeros(n_iter)
alpha_iter = np.zeros(n_iter)

# Implement Newton's method

time_start = time.process_time()

i = 0

while (i <= n_iter-2) and (tol > epsilon):

    grad = ridreg_der(beta_lsnm[i,],X,Y,5)
    hess = ridreg_hess(beta_lsnm[i,],X,Y,5)
    ddirect = np.dot(-np.linalg.inv(hess),grad)
    while(ridreg(beta_lsnm[i,]-alpha*grad,X,Y,5)>ridreg(beta_lsnm[i,],X,Y,5)+sigma*alpha*np.dot(-grad.T,grad)):
       alpha=alpha*beta
    beta_lsnm[i+1,] = beta_lsnm[i,] + alpha*ddirect
    OF_iter[i] = ridreg(beta_lsnm[i,], X, Y,5)
    tol = np.linalg.norm(grad,ord=2)
    tol_iter[i] = tol
    alpha_iter[i] = alpha
    i = i + 1

time_elapsed = (time.process_time() - time_start)

## Print the results

print('Elapsed time = %8.5f' %(time_elapsed))
print('\nNumber of iterations = %5.0f' %i)
print('Objective function   = %11.5f' %OF_iter[i])
print('Optimality tolerance = %11.5f' %tol)

print('\nValues of the least squares coefficients - gradient method:')
print('beta %-9s %7.3f' %('intercept',beta_lsnm[i,0]))
for ii in np.arange(1,b):
    print('beta %-9s %7.3f' %(X[ii],beta_lsnm[ii]))

beta_err = np.linalg.norm(np.transpose(beta_ls_exact)-beta_lsnm[i,],ord=2)/np.linalg.norm(beta_lsnm[i,],ord=2)
print('\nBeta coefficient error = %10.5f' %beta_err)


Elapsed time = 621.84432

Number of iterations =   199
Objective function   =     0.00000
Optimality tolerance = 154046636947.05249

Values of the least squares coefficients - gradient method:


TypeError: only size-1 arrays can be converted to Python scalars

In [71]:
beta_lsnm[i,]

array([-1.30036210e-05,  2.76616104e-09,  3.51889017e-08, ...,
        6.84439506e-08,  1.01006566e-07,  2.82969197e-09])

### iii. Quasi-newton method

In [278]:

alpha = 1e-4
n_iter = 2000 # Maximum number of iterations
epsilon = 1e-5
tol = 10000
sigma = 0.1
delta = 0.1
beta=0.25

beta_ls0 = np.zeros(b) # initial value for beta

## Initial values for the variables and data containers

OF_iter = np.zeros(n_iter)
tol_iter = np.zeros(n_iter)
alpha_iter = np.zeros(n_iter)

# Implement Newton's method

time_start = time.process_time()

i = 0

bk = least_sq_ridreg_hess(beta_ls0,X,Y,5)
print(bk)

#yk=np.zeros(b)[np.newaxis]
#sk=np.zeros(b)[np.newaxis]
#xkmas1=beta_ls0+0.01

while (i <= n_iter-2) and (tol > epsilon):
    i=i+1
    grad=least_sq_ridreg_der(beta_ls0,X,Y,5)
    ddirect=np.dot(-np.linalg.inv(bk),grad)

    beta_ls1=beta_ls0+alpha*ddirect
    xkmas1=beta_ls1+alpha*ddirect

    yk=least_sq_ridreg_der(xkmas1,X,Y,5) - grad
    sk=xkmas1-beta_ls0
    sk=sk[np.newaxis]
    yk=yk[np.newaxis]

    bk=bk - (np.dot(np.dot(sk,bk).T,np.dot(sk,bk)))/np.asscalar(np.dot(np.dot(sk,bk),sk.T)) + np.dot(yk,yk.T)/np.asscalar(np.dot(yk,sk.T))
    



[[ 100010.          499835.04776097  500478.15308092 ...  497380.07502379
   497342.0624821   499802.71204146]
 [ 499835.04776097 3336163.86775877 2502833.41468945 ... 2486224.10266724
  2490245.27555858 2500498.44265578]
 [ 500478.15308092 2502833.41468945 3336409.74319732 ... 2489187.62528998
  2488056.35840391 2495430.58353448]
 ...
 [ 497380.07502379 2486224.10266724 2489187.62528998 ... 3310015.46497208
  2476978.5167301  2484328.14336766]
 [ 497342.0624821  2490245.27555858 2488056.35840391 ... 2476978.5167301
  3306808.93859246 2487441.21692144]
 [ 499802.71204146 2500498.44265578 2495430.58353448 ... 2484328.14336766
  2487441.21692144 3327035.98523806]]


  bk=bk - (np.dot(np.dot(sk,bk).T,np.dot(sk,bk)))/np.asscalar(np.dot(np.dot(sk,bk),sk.T)) + np.dot(yk,yk.T)/np.asscalar(np.dot(yk,sk.T))


In [279]:
beta_ls1

array([-3.20388656e+08,  5.80959469e+05,  5.77444615e+05, ...,
        4.62265232e+05,  5.82391453e+05,  4.55448206e+05])


# e) Estimate the value of the regression coefficients of (1) by implementing the:

 - Coordinate descent method

In [76]:
(a, b) = X.shape

def least_sq_ridreg_der_coord(beta_ls,index,X,Y,p):
    pp=np.array(-2*np.dot((Y-np.dot(X,beta_ls)).T,X[:,index])) + 2*p*beta_ls[index]
    aa=np.zeros([b,1])
    aa[index]=pp
    return aa.T

beta_ls=np.zeros(b)

niter = 100000
epsilon = 1e-9
i = 0
alpha = 1e-5
OF_iter = np.zeros(niter)
tol_iter = np.zeros(niter)
error_coord_iter = []
tol = 10
beta_coord = np.zeros([b,1])

time_start = time.time()

while (i < niter) and (tol > epsilon):
    k = np.random.randint(b)
    gradk = least_sq_ridreg_der_coord(beta_coord,k,X,Y,5).T
    beta_coord = beta_coord - alpha*gradk
    tol = np.linalg.norm(gradk, ord = 2)
    OF_iter[i]  = least_sq_ridreg(beta_coord.T, X, Y,5)
    tol_iter[i] = tol
    error_coord_iter.append(np.linalg.norm(np.transpose(beta_ls_exact) - beta_coord.T, ord = 2)/np.linalg.norm(beta_ls_exact, ord = 2))
    i +=1
    
    
time_elapsed = (time.time() - time_start)
print('time elapsed =',time_elapsed)
print('betas coord =',beta_coord)
print('betas exact =', beta_ls_exact.T)
print('number iterations =',i)
print('tolerance=',tol) 
print('error', np.linalg.norm(np.transpose(beta_ls_exact) - beta_coord.T, ord = 2)/np.linalg.norm(beta_ls_exact, ord = 2)) 

time elapsed = 10.717936038970947
betas coord = [[0.00000000e+000]
 [0.00000000e+000]
 [0.00000000e+000]
 ...
 [1.83373142e+148]
 [0.00000000e+000]
 [0.00000000e+000]]
betas exact = [[2.29483459 2.00134888 2.99953027 ... 0.99946899 1.00206974 2.00065238]]
number iterations = 218
tolerance= nan
error nan


- Stochastic Gradient

In [43]:
import random

(a, b) = X.shape
niter = 100000
epsilon = 1e-9
i = 0
alpha = 1e-3
OF_iter = np.zeros(niter)
tol_iter = np.zeros(niter)
error_stoch_iter = np.zeros(niter)
tol = 10
beta_stoch = np.zeros(b)


time_start = time.time()

while (i < niter) and (tol > epsilon):
    k=random.choice(range(a))
    beta_stoch = beta_stoch - 2*alpha*(np.dot(beta_stoch,X[k])-Y[k])*X[k]  
    OF_iter[i]  = least_sq_ridreg(beta_stoch, X, Y,5)
    if i>0:
        tol = np.abs((OF_iter[i]-OF_iter[i-1])/OF_iter[i-1])
    tol_iter[i] = tol
    error_stoch_iter[i] = np.linalg.norm(np.transpose(beta_ls_exact) - beta_stoch, ord = 2)/np.linalg.norm(beta_ls_exact, ord = 2)
    i +=1
    
    
time_elapsed = (time.time() - time_start)
print('time elapsed =',time_elapsed)
print('betas stoch =',beta_stoch)
print('betas exact =', beta_ls_exact.T)
print('number iterations =',i)
print('tolerance=',tol) 
print('final error =', np.linalg.norm(np.transpose(beta_ls_exact) - beta_stoch, ord = 2)/np.linalg.norm(beta_ls_exact, ord = 2)) 

time elapsed = 2.3825461864471436
betas stoch = [1.69505199e+150 1.19123913e+151 5.16826695e+150 ... 9.79271218e+150
 1.45262656e+151 1.33450175e+151]
betas exact = [[2.29483459 2.00134888 2.99953027 ... 0.99946899 1.00206974 2.00065238]]
number iterations = 89
tolerance= nan
final error = 4.0991232679657335e+150


  tol = np.abs((OF_iter[i]-OF_iter[i-1])/OF_iter[i-1])


- Other three techniques

# Mini Batch gradient descent

In [221]:
(a,b)=X.shape
beta_lsg=np.zeros(b) #initial value for beta
alpha=1e-4 
n_iter=100000#maximim number iteration
OF_iter=np.zeros(n_iter)
tol_iter=np.zeros(n_iter)
alpha_iter=np.zeros(n_iter)
error_minib_iter=np.zeros(n_iter)
i=0;
tol=100000;
epsilon=1e-9

#### Number of samples to take into consideration in each iteration
subsetsize = 10
#### Calculate one set of subsetsize random index to choose radomly some samples 
subsets = np.random.choice([x for x in range(0,a)],n_iter*subsetsize)
subsets.resize(n_iter,subsetsize)

time_start = time.time()

while (i <= n_iter-2) and (tol>epsilon):

    vector=np.zeros(b)

    for j in range(subsetsize):
        vector=vector+least_sq_ridreg_der(beta_lsg,X[subsets[j],:],Y[subsets[j],:],5)

    beta_lsg = beta_lsg - (alpha/(subsetsize))*vector

    OF_iter[i] = least_sq_ridreg(beta_lsg, X, Y,5)
    if i>0:
        tol = np.abs((OF_iter[i]-OF_iter[i-1])/OF_iter[i-1])
    tol_iter[i] = tol
    error_minib_iter[i] = np.linalg.norm(np.transpose(beta_ls_exact)-beta_lsg,ord=2)/np.linalg.norm(beta_lsg,ord=2)
    i=i+1
    
time_elapsed = (time.time() - time_start) 

print('time elapsed =',time_elapsed)
print('iterations =',i)
print('Objective Function value =', OF_iter[i])
print('Betas =',beta_lsg)
print('Tolerance=',tol)
print('error=',np.linalg.norm(np.transpose(beta_ls_exact)-beta_lsg,ord=2)/np.linalg.norm(beta_lsg,ord=2))



time elapsed = 2.7450928688049316
iterations = 89
Objective Function value = 0.0
Betas = [1.22551207e+150 6.06991289e+150 6.56619720e+150 ... 6.33701626e+150
 5.81007815e+150 5.61601394e+150]
Tolerance= nan
error= 0.9999999999999999


  tol = np.abs((OF_iter[i]-OF_iter[i-1])/OF_iter[i-1])


In [201]:
subsetsize = 10
#### Calculate one set of subsetsize random index to choose radomly some samples 
subsets = np.random.choice([x for x in range(0,a)],n_iter*subsetsize)
subsets.resize(n_iter,subsetsize)

In [215]:
subsets[1]


array([11851, 34357, 45058, 27512, 12909,  7879, 43285, 41007, 19493,
       39018])

# Second order Stochastic Quasi newton

In [229]:
alpha = 1e-4
n_iter = 200 # Maximum number of iterations
epsilon = 1e-5
tol = 10000
sigma = 0.1
delta = 0.1
beta=0.25

beta_ls0 = np.zeros(b) # initial value for beta

## Initial values for the variables and data containers

OF_iter = np.zeros(n_iter)
tol_iter = np.zeros(n_iter)
alpha_iter = np.zeros(n_iter)

# Implement Newton's method

time_start = time.process_time()

i = 0

Hk=np.linalg.inv(least_sq_ridreg_hess(beta_ls0,X,Y,5))


while (i <= n_iter-2) and (tol > epsilon):
    i+=1
    grad=least_sq_ridreg_der(beta_ls0,X,Y,5)
    wkmas1 = beta_ls0 - alpha*np.dot(Hk,grad)
    sk=wkmas1 - beta_ls0
    sk=sk[np.newaxis]
    vk = least_sq_ridreg_der(wkmas1,X,Y,5) + least_sq_ridreg_der(beta_ls0,X,Y,5)
    vk=vk[np.newaxis]

    Hk= np.dot(np.dot((np.identity(b)) - (np.dot(vk.T,sk)/np.asscalar(np.dot(vk,sk.T))).T,Hk),
    (np.identity(b)) - (np.dot(vk.T,sk)/np.asscalar(np.dot(vk,sk.T)))) + (np.dot(sk.T,sk)/np.dot(sk,vk.T))

    beta_ls0=wkmas1

    
 

    



  Hk= np.dot(np.dot((np.identity(b)) - (np.dot(vk.T,sk)/np.asscalar(np.dot(vk,sk.T))).T,Hk),
  (np.identity(b)) - (np.dot(vk.T,sk)/np.asscalar(np.dot(vk,sk.T)))) + (np.dot(sk.T,sk)/np.dot(sk,vk.T))


In [231]:
beta_ls0[70:300]

array([8.56418045e-06, 1.07861418e-04, 3.07373364e-04, 4.07866802e-04,
       7.15394976e-06, 9.98421621e-06, 2.06332258e-04, 2.06976337e-04,
       1.08874346e-04, 8.71588321e-06, 3.08687215e-04, 7.94506702e-06,
       3.09258127e-04, 4.10504832e-04, 2.09705137e-04, 3.08720855e-04,
       2.08102861e-04, 3.07572343e-04, 3.09265700e-04, 8.78754246e-06,
       6.73228042e-06, 3.06225612e-04, 2.07847948e-04, 7.39641751e-06,
       5.46747039e-06, 2.07408044e-04, 2.06698556e-04, 4.07092728e-04,
       1.06934363e-04, 3.07844252e-04, 7.83756619e-06, 7.68035824e-06,
       3.09364938e-04, 1.08198324e-04, 3.07853719e-04, 8.00000538e-06,
       7.44025451e-06, 2.08111006e-04, 4.07284122e-04, 2.08296838e-04,
       2.08197689e-04, 1.07250149e-04, 4.07566452e-04, 1.09216084e-04,
       2.10050950e-04, 1.07797446e-04, 3.08613727e-04, 2.10076186e-04,
       6.87827499e-06, 4.05504804e-04, 2.08522344e-04, 1.08863677e-04,
       1.08696453e-04, 3.07855232e-04, 4.08514939e-04, 3.06616043e-04,
      

# Other methods

In [292]:
alpha = 1e-4
n_iter = 2000 # Maximum number of iterations
epsilon = 1e-5
tol = 10000
sigma = 0.1
delta = 0.1
beta=0.5

beta_ls0 = np.zeros(b) # initial value for beta
beta_lsmas1 = beta_ls0

## Initial values for the variables and data containers

OF_iter = np.zeros(n_iter)
tol_iter = np.zeros(n_iter)
alpha_iter = np.zeros(n_iter)

# Implement Newton's method

time_start = time.process_time()

i = 0

while (i <= n_iter-2) and (tol > epsilon):
    i+=1
    gradd=least_sq_ridreg_der(beta_lsmas1,X,Y,5)
    beta_ls2= beta_ls1 - alpha*gradd + beta*(beta_ls1 - beta_ls0)
    beta_ls0=beta_ls1
    beta_ls1=beta_ls2

    tol = np.linalg.norm(gradd, ord = 2)
    OF_iter[i]  = least_sq_ridreg(beta_ls2, X, Y,5)
    tol_iter[i] = tol


In [293]:
beta_ls2

array([5.62099551e+09, 3.68284531e+10, 3.68772247e+10, ...,
       3.66404591e+10, 3.66444545e+10, 3.68198390e+10])

f) (2 points) Consider the constrained problem:

In [34]:
import time

def f(beta,X,Y,mu):
    beta = np.matrix(beta)
    z = Y - np.dot(X,beta.T)
    zz=np.dot(z.T,z)
    sumatorio=np.sum(beta)
    aa=mu*sumatorio
    return(zz-aa)

def f_der(beta,X,Y,mu):
    beta = np.matrix(beta)
    pp = -2*np.dot((Y-np.dot(X,(beta).T)).T,X)
    aa = np.squeeze(np.asarray(pp))
    return (aa-(beta.shape[1]*mu))

def f_hess(beta_ls,X,Y):
    ss = 2*np.dot(np.transpose(X),X)
    return ss

(a, b) = X.shape
beta=np.zeros((n_iter,b))

n_iter = 200 # Maximum number of iterations
epsilon = 1e-5
tol = 10000


OF_iter = np.zeros(n_iter)
tol_iter = np.zeros(n_iter)
alpha_iter = np.zeros(n_iter)

# Implement Newton's method

time_start = time.process_time()

i = 1
t=np.zeros(200)


while (i <= n_iter-2) and (tol > epsilon):
    
    beta[i,]=beta[i-1]-np.dot(np.linalg.inv(f_hess(beta[i-1],X,Y)),f_der(beta[1-1],X,Y,0.1))
    t[i]=f(beta[i,],X,Y,0.1)

    tol=np.linalg.norm(beta[i]-beta[i-1],ord=2)
    i+=1
    

In [36]:
print(i)
print(sum(beta[i-1,]))

199
392460.1499442518


In [35]:
np.dot(np.linalg.inv(f_hess(beta,X,Y)),f_der(beta,X,Y,5)).shape

ValueError: shapes (1001,1001) and (200,1001) not aligned: 1001 (dim 1) != 200 (dim 0)