# Closed-form solution vs. iterative approaches

In this assignment we have two data sets, and we compare the results and the training time on each of them using different approaches. We first show how the closed-form solution is computed.

Let X be the matrix whose columns are the N samples, and y the vector of the corresponding targets:

\begin{equation*}
X=[x_1|x_2|\dots|x_N]
\end{equation*}
\begin{equation*}
y=\begin{bmatrix}
    y_{1}  \\
    y_{2}  \\
    \vdots  \\
    y_{N} 
\end{bmatrix}
\end{equation*}

We then want to minimize the regularized squared error

\begin{equation*}
\frac{1}{N} \sum_{i=1}^{N}||w^Tx_i-y_i||^2 + \lambda ||w||^2
\end{equation*}

which can be written in matrix form as

\begin{equation*}
f(w)=\frac{1}{N}||X^Tw-y||^2+\lambda||w||^2
\end{equation*}

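Next, we compute the gradient of this function with respect to w and set it to zero:

\begin{equation*}
\nabla f(w)=\frac{2}{N}X(X^Tw-y)+2\lambda w=0
\end{equation*}

Solving for w gives

\begin{equation*}
(XX^T+N\lambda I)w=Xy
\end{equation*}

Absorbing the constant factor N into \lambda, which is what the code below does, the closed-form solution for w is

\begin{equation*}
w=(XX^T+\lambda I)^{-1}Xy
\end{equation*}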
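
As a small illustration of the formula above, the following sketch solves the same linear system on synthetic data and checks that the gradient vanishes at the result. The function name `ridge_closed_form` and the random data are assumptions made for this example only, and it uses `np.linalg.solve` rather than an explicit matrix inverse, which is a substitution for the `np.linalg.inv` call used in the cells below.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X X^T + lam * I) w = X y, with X of shape (features, samples)."""
    d = X.shape[0]
    A = X @ X.T + lam * np.eye(d)
    b = X @ y
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(A, b)

# Synthetic data (hypothetical): 5 features, 200 samples, small noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 200))
w_true = rng.normal(size=(5, 1))
y_demo = X_demo.T @ w_true + 0.01 * rng.normal(size=(200, 1))

lam_demo = 0.1
w_hat = ridge_closed_form(X_demo, y_demo, lam_demo)

# Gradient of ||X^T w - y||^2 + lam * ||w||^2 (N absorbed into lam); ~0 at w_hat.
grad = 2 * X_demo @ (X_demo.T @ w_hat - y_demo) + 2 * lam_demo * w_hat
print(np.abs(grad).max())
```

The printed value should be at the level of floating-point round-off, which confirms that `w_hat` is a stationary point of the objective.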

Using the closed-form expression for w, we can compute the solution directly and compare it with iterative approaches.
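
To make the comparison concrete, here is a minimal sketch of one possible iterative approach, plain gradient descent on f(w), checked against the direct solution on synthetic data. It is not the solver scikit-learn uses in the cells below. The function name `ridge_gradient_descent`, the step-size rule, and the iteration count are all assumptions chosen for this example.

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, n_iters=200):
    """Minimize (1/N) * ||X^T w - y||^2 + lam * ||w||^2 by gradient descent."""
    d, N = X.shape
    w = np.zeros((d, 1))
    # Step size 1/L, where L is the Lipschitz constant of the gradient:
    # L = 2 * (largest eigenvalue of X X^T / N + lam).
    L = 2.0 * (np.linalg.eigvalsh(X @ X.T)[-1] / N + lam)
    step = 1.0 / L
    for _ in range(n_iters):
        grad = (2.0 / N) * X @ (X.T @ w - y) + 2.0 * lam * w
        w = w - step * grad
    return w

# Synthetic data (hypothetical): 5 features, 200 samples.
rng = np.random.default_rng(0)
X_gd = rng.normal(size=(5, 200))
y_gd = X_gd.T @ rng.normal(size=(5, 1)) + 0.01 * rng.normal(size=(200, 1))
lam_gd = 0.1

w_gd = ridge_gradient_descent(X_gd, y_gd, lam_gd)

# Direct solution of the same objective: (X X^T + N * lam * I) w = X y.
N_gd = X_gd.shape[1]
w_direct = np.linalg.solve(X_gd @ X_gd.T + N_gd * lam_gd * np.eye(5), X_gd @ y_gd)

print(np.abs(w_gd - w_direct).max())
```

On a well-conditioned problem like this one, the two solutions agree closely after a few hundred iterations. On poorly scaled real data, gradient descent may need many more iterations or feature scaling.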

In [5]:
import pandas as pd
import numpy as np
import time

np.random.seed(0)

data = pd.read_csv("crimedata.csv")

X = data.iloc[:,5:127]
Y = data.iloc[:,127]

print(X.shape)
print(Y.shape)

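# '?' marks missing values; replace them with 0 and store samples as columns (features x samples)
X_ = np.array(X.replace({'?':0})).T
X_ = X_.astype(float)
Y_ = np.array(Y).reshape(Y.shape[0],1)
lambda_ = 0.0  # no regularization

start = time.time()
I = np.identity(X_.shape[0])  # one row/column per feature (122 here)
p1 = np.matmul(X_,X_.T)       # X X^T
p2 = lambda_*I                # lambda I
inv = np.linalg.inv(p1+p2)
# w = (X X^T + lambda I)^{-1} X y
close_form_solution = np.matmul(np.matmul(inv,X_),Y_)
end = time.time()

print(close_form_solution)

print("Time for closed-form solution: ", end-start)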

(1993, 122)
(1993,)
[[  1.86841381e-01]
 [ -1.81301156e-02]
 [  2.08988338e-01]
 [ -1.73607984e-02]
 [ -2.59207016e-03]
 [  9.18384505e-02]
 [  1.77308022e-01]
 [ -1.08707971e-01]
 [ -2.52713992e-01]
 [  1.61610695e-01]
 [ -3.05185977e-01]
 [  4.97815570e-02]
 [ -1.66694523e-01]
 [ -1.13773037e-01]
 [  4.61140257e-02]
 [ -1.42346680e-01]
 [  1.30431948e-01]
 [  2.97300719e-02]
 [ -7.56978181e-02]
 [  2.74386165e-01]
 [  1.06393535e-01]
 [ -3.40757791e-01]
 [ -3.24417526e-02]
 [ -3.29397350e-02]
 [  2.33918732e-02]
 [  4.42826208e-02]
 [  3.11352774e-02]
 [  7.26533053e-02]
 [ -1.35766065e-01]
 [ -1.12611173e-01]
 [  8.34553564e-02]
 [  6.64390324e-02]
 [  2.23905044e-02]
 [  2.88552461e-01]
 [ -6.96529250e-02]
 [ -1.90295138e-02]
 [  8.09864866e-02]
 [  1.29450176e-01]
 [  4.88855301e-01]
 [  2.54113339e-01]
 [  2.03479358e-01]
 [ -5.67966995e-01]
 [ -1.21516830e-01]
 [  4.06234403e-02]
 [ -2.58798689e-01]
 [ -2.61295744e-02]
 [  4.87091765e-04]
 [  6.46779400e-02]
 [ -1.90241430e-01]


In [6]:
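from sklearn import linear_model
# alpha=0 matches lambda_ = 0.0 above. Ridge fits an intercept by default
# (fit_intercept=True), so its coefficients need not equal the closed-form w.
reg = linear_model.Ridge(alpha=0)
start = time.time()
reg.fit(X_.T, Y_)  # scikit-learn expects samples as rows
end = time.time()

# Dump the arrays to text files
np.savetxt("X.txt",X_,delimiter=",")
np.savetxt("Y.txt",Y_)

print(reg.coef_)
print("Time for Ridge fit: ", end-start)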

[[  2.09469330e-01  -2.29729870e-02   1.76774248e-01  -5.68913755e-02
   -2.88219184e-02   4.77458789e-02   1.05566409e-01  -2.48397831e-01
   -1.28673554e-01   4.66459505e-02  -3.28016438e-01   4.87995653e-02
   -1.87094657e-01  -1.91040354e-01   4.48574635e-02  -1.86670553e-01
    9.00706417e-02   8.71734709e-03  -9.67016587e-02   3.01277556e-01
    1.14371820e-01  -3.69858227e-01  -3.53261681e-02  -3.43168296e-02
    2.17805526e-02   4.51623135e-02   3.17627494e-02   1.04490100e-01
   -1.81497041e-01  -1.06946065e-01   6.37319894e-02   5.45773696e-02
    6.81822301e-03   2.47388941e-01  -6.52548270e-02  -1.75555623e-02
    7.09479308e-02   1.10337002e-01   4.28713734e-01   2.30267521e-01
    1.34862113e-01  -5.27227370e-01  -1.04013109e-01   3.77276275e-02
   -3.50899528e-01  -3.35955452e-02  -1.91153125e-03   6.09637575e-02
   -1.85364810e-01  -1.54120303e-01   1.17675288e-01  -2.18102390e-01
    2.27972823e-02   2.98963880e-02  -8.15743813e-02   4.76534745e-02
   -4.79596914e-02  

In [7]:
data = pd.read_csv("household_power_consumption.txt", sep=";")

print(data.shape)
print(data.head(3))
print(data.tail(3))

X = data.iloc[:,2:6]
Y = data.iloc[:,7]

print(X.shape)
print(Y.shape)

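# '?' marks missing values; replace them with 0 and store samples as columns (features x samples)
X_ = np.array(X.replace({'?':0})).T
X_ = X_.astype(float)
Y_ = np.array(Y.replace({'?':0})).reshape(Y.shape[0],1).astype(float)
lambda_ = 0.1
start = time.time()
I = np.identity(X_.shape[0])  # one row/column per feature (4 here)
p1 = np.matmul(X_,X_.T)       # X X^T
p2 = lambda_*I                # lambda I
inv = np.linalg.inv(p1+p2)
# w = (X X^T + lambda I)^{-1} X y
close_form_solution = np.matmul(np.matmul(inv,X_),Y_)
end = time.time()
print(close_form_solution)
print("Time for closed-form solution: ", end-start)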




(2075259, 9)
         Date      Time Global_active_power Global_reactive_power  Voltage  \
0  16/12/2006  17:24:00               4.216                 0.418  234.840   
1  16/12/2006  17:25:00               5.360                 0.436  233.630   
2  16/12/2006  17:26:00               5.374                 0.498  233.290   

  Global_intensity Sub_metering_1 Sub_metering_2  Sub_metering_3  
0           18.400          0.000          1.000            17.0  
1           23.000          0.000          1.000            16.0  
2           23.000          0.000          2.000            17.0  
               Date      Time Global_active_power Global_reactive_power  \
2075256  26/11/2010  21:00:00               0.938                     0   
2075257  26/11/2010  21:01:00               0.934                     0   
2075258  26/11/2010  21:02:00               0.932                     0   

        Voltage Global_intensity Sub_metering_1 Sub_metering_2  Sub_metering_3  
2075256  239.82         

In [8]:
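from sklearn import linear_model
# Note: alpha=0 here, whereas the closed-form cell above used lambda_ = 0.1.
# Ridge also fits an intercept by default (fit_intercept=True).
reg = linear_model.Ridge(alpha=0)
start = time.time()
reg.fit(X_.T, Y_)  # scikit-learn expects samples as rows
end = time.time()
print(reg.coef_)
print("Time for Ridge fit: ", end-start)

[[ -1.44193773e+01  -1.53988814e+00  -4.38485021e-03   4.01156840e+00]]
Time for Ridge fit:  0.20752978324890137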


# Results

Comparing the results, the training time for the second data set is 6 to 10 times larger than for the first, for both the closed-form and the iterative approach. However, the second data set has about 1,000 times more samples than the first (2,075,259 rows versus 1,993). The point is that the number of features matters a great deal, both for the closed-form solution and for the iterative algorithm: the first data set has 122 features and the second only 4, and the closed-form solution has to build and invert a matrix whose side equals the number of features. The library implementations are vectorized, so data sets with many samples are comparatively easy to handle, but the trouble starts when the number of features grows. In such cases people usually apply feature reduction methods, such as univariate feature selection or PCA. Interestingly, for the second data set the closed-form solution and the iterative model give more similar results.
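
The role of the feature count can be illustrated with a rough operation count for the closed-form computation. The sketch below assumes a crude cost model that is not measured anywhere in this notebook: about N·d² multiply-adds to form XX^T and about d³ to invert it. It ignores constants, memory traffic, and data loading. The function name `closed_form_cost` is hypothetical.

```python
def closed_form_cost(n_features, n_samples):
    """Rough multiply-add count for w = (X X^T + lambda I)^{-1} X y."""
    gram = n_samples * n_features ** 2   # forming X X^T
    inverse = n_features ** 3            # inverting a (d x d) matrix
    return gram + inverse

# Shapes taken from the printed outputs above.
crime_cost = closed_form_cost(n_features=122, n_samples=1993)
power_cost = closed_form_cost(n_features=4, n_samples=2075259)

print(crime_cost, power_cost, power_cost / crime_cost)
```

Under this simplified model the two data sets land in the same order of magnitude, even though the second has roughly a thousand times more samples. This is consistent with the observation that the training times differ far less than the sample counts do.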