# Linear regression example

### Demonstrating how the linear regressor works on a randomly generated dataset

January 2019

In [16]:
import pandas as pd
import numpy as np

Random dataset to work with<br>
**The goal is to reproduce the input coefficients and intercept**
<br>Note - x contains the x0 column of values 1
<br><br> **training set**

In [56]:
size = 1000

coefficients = [6.2,-1.4,2.1,-3,11,-8]
x = np.ones((size,len(coefficients)))
for i in range(1,len(coefficients)):
    x[:,i]=np.random.rand(size)
y = (x*coefficients).sum(axis=1) + np.random.normal(size=size)

**test set**

In [60]:
t_x = np.ones((size,len(coefficients)))
for i in range(1,len(coefficients)):
    t_x[:,i]=np.random.rand(size)
t_y = (t_x*coefficients).sum(axis=1) + np.random.normal(size=size)
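
The two cells above can be folded into one helper. This is only a sketch and not part of the original notebook: `make_dataset`, `noise_std` and the seeded `rng` are names introduced here for illustration, and because of the seed the numbers will not match the outputs shown in the rest of this notebook.

```python
import numpy as np

def make_dataset(coefficients, size, rng, noise_std=1.0):
    """Return (x, y) where x has a leading column of ones (x0) and
    y = x @ coefficients + Gaussian noise."""
    coefficients = np.asarray(coefficients, dtype=float)
    x = np.ones((size, len(coefficients)))
    # Columns 1.. hold uniform random features in [0, 1); column 0 stays 1.
    x[:, 1:] = rng.random((size, len(coefficients) - 1))
    y = x @ coefficients + rng.normal(scale=noise_std, size=size)
    return x, y

rng = np.random.default_rng(0)  # fixed seed so reruns give the same data
true_coefficients = [6.2, -1.4, 2.1, -3, 11, -8]
x_train, y_train = make_dataset(true_coefficients, 1000, rng)
x_test, y_test = make_dataset(true_coefficients, 1000, rng)
```

Drawing both sets from the same generator keeps them independent of each other while still being reproducible.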

### Linear Regressor 

<br>
Note: the intercept is the first term of the coefficients vector
<br>
Note 2: the data is not scaled. Gradient descent converges faster on scaled data, but the fitted coefficients then refer to the scaled features and differ from the input coefficients
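
To make Note 2 concrete, here is a small sketch showing that a fit on standardized features gives different coefficients, and how those map back to the original scale. It uses `np.linalg.lstsq` instead of `linear_regressor`, and the toy dataset and every name in it are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Two toy features on very different scales.
features = rng.random((n, 2)) * np.array([1.0, 100.0])
target = (3.0 + 2.0 * features[:, 0] - 0.05 * features[:, 1]
          + rng.normal(scale=0.1, size=n))

def fit_with_intercept(f, t):
    """Least-squares fit with an added x0 column of ones."""
    design = np.column_stack([np.ones(len(f)), f])
    coeff, *_ = np.linalg.lstsq(design, t, rcond=None)
    return coeff

mu = features.mean(axis=0)
sigma = features.std(axis=0)
scaled = (features - mu) / sigma  # standardize each feature

raw_coeff = fit_with_intercept(features, target)
scaled_coeff = fit_with_intercept(scaled, target)

# Undo the scaling: slope_j = scaled_slope_j / sigma_j, and the intercept
# absorbs the shift by the feature means.
slopes = scaled_coeff[1:] / sigma
intercept = scaled_coeff[0] - np.sum(scaled_coeff[1:] * mu / sigma)
recovered = np.concatenate([[intercept], slopes])

print("raw fit:        ", raw_coeff)
print("scaled fit:     ", scaled_coeff)
print("mapped back:    ", recovered)
```

The scaled fit describes the same model, so after mapping back the coefficients agree with the raw fit up to floating-point error.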

In [57]:
from linear_regressor import Linear_regressor

**normal equation**
<br>other input parameters (such as learning_rate and epochs) are ignored for the normal equation

In [58]:
lr = Linear_regressor()
lr.fit(x,y,method='normal')
lr.coeff

array([ 6.24591401, -1.28621158,  2.0903117 , -3.13137998, 10.98638789,
       -8.01492882])

The coefficients do not match the input values exactly because of the noise added to y (above)<br>
Predictions on the test set are very close to the actual values (below)

In [72]:
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 0.04499974026170541 1.0066648232304667
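
For reference, the normal equation can be written in a few lines of NumPy. This is a sketch under the assumption that `method='normal'` solves the standard least-squares system $(X^T X)\theta = X^T y$; the internals of `linear_regressor` are not shown in this notebook, and `normal_equation`, `x_demo` and `y_demo` are names invented here.

```python
import numpy as np

def normal_equation(x, y):
    """Solve (X^T X) theta = X^T y for theta.
    x is expected to already contain the x0 column of ones."""
    return np.linalg.solve(x.T @ x, x.T @ y)

rng = np.random.default_rng(1)
true_theta = np.array([6.2, -1.4, 2.1])
x_demo = np.column_stack([np.ones(200), rng.random((200, 2))])
y_demo = x_demo @ true_theta + rng.normal(scale=0.1, size=200)

theta = normal_equation(x_demo, y_demo)
print(theta)
```

Using `np.linalg.solve` avoids forming the inverse explicitly. When columns of x are nearly collinear, `np.linalg.lstsq` is usually the safer choice.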


This also works if the x0 column is omitted (it is added by default):

In [69]:
lr.fit(x[:,1:],y,method='normal')
lr.coeff

array([ 6.24591401, -1.28621158,  2.0903117 , -3.13137998, 10.98638789,
       -8.01492882])

This, however, gives wrong coefficients, because with add_x0=False the model has no intercept term:

In [75]:
lr.fit(x[:,1:],y,method='normal',add_x0=False)
lr.coeff

array([ 0.98865848,  4.26941362, -0.76978368, 13.40170904, -5.5292853 ])
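
The effect is easy to reproduce outside the class. The sketch below uses `np.linalg.lstsq` on a toy dataset (all names are made up for this example): without a column of ones the slopes have to absorb the intercept, so every one of them is pushed away from its true value.

```python
import numpy as np

rng = np.random.default_rng(2)
features = rng.random((1000, 2))
target = (6.2 - 1.4 * features[:, 0] + 2.1 * features[:, 1]
          + rng.normal(scale=0.1, size=1000))

# Fit with the x0 column of ones: intercept and slopes are recovered.
with_x0 = np.column_stack([np.ones(len(features)), features])
coeff_with, *_ = np.linalg.lstsq(with_x0, target, rcond=None)

# Fit without it: the model is forced through the origin.
coeff_without, *_ = np.linalg.lstsq(features, target, rcond=None)

print("with x0:   ", coeff_with)
print("without x0:", coeff_without)
```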

It also works with pandas DataFrames (the coefficients then come back as a column vector, hence the transpose):

In [71]:
lr.fit(pd.DataFrame(x),pd.DataFrame(y),method='normal')
lr.coeff.T

array([[ 6.24591401, -1.28621158,  2.0903117 , -3.13137998, 10.98638789,
        -8.01492882]])

**batch gradient descent**


In [77]:
lr = Linear_regressor()
lr.fit(x,y,method='batch',learning_rate=0.001,epochs=1000)
lr.coeff

array([ 6.24591364, -1.28621144,  2.09031183, -3.13137983, 10.98638804,
       -8.01492866])

In [78]:
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 0.044999732675320064 1.0066648243722716
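
As a point of comparison, plain batch gradient descent on the squared-error cost looks roughly like this. It is a minimal sketch with a fixed learning rate, not the implementation behind `method='batch'` (which also adjusts the learning rate, as shown further down); `batch_gradient_descent` and the demo data are assumptions introduced here.

```python
import numpy as np

def batch_gradient_descent(x, y, learning_rate=0.5, epochs=5000):
    """Minimize the mean squared error with a constant learning rate.
    Every epoch uses the gradient over the full dataset."""
    n_samples, n_features = x.shape
    coeff = np.zeros(n_features)  # start from all-zero coefficients
    for _ in range(epochs):
        residual = x @ coeff - y             # h - y for every sample
        gradient = x.T @ residual / n_samples
        coeff -= learning_rate * gradient
    return coeff

rng = np.random.default_rng(3)
x_demo = np.column_stack([np.ones(500), rng.random((500, 2))])
y_demo = x_demo @ np.array([6.2, -1.4, 2.1]) + rng.normal(scale=0.1, size=500)

gd_coeff = batch_gradient_descent(x_demo, y_demo)
print(gd_coeff)
```

With a fixed rate the value has to be chosen by hand: too large and the iteration diverges, too small and it barely moves. The sketch therefore uses a larger rate than the cell above.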


The learning rate corrects itself automatically if the fit is diverging (note that enough epochs are needed for the correction to take effect)

In [83]:
lr = Linear_regressor()
lr.fit(x,y,method='batch',learning_rate=10,epochs=10)
print('not enough epochs.. diverging')
print(lr.coeff)

print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

not enough epochs.. diverging
[110792.39463514  56700.38493855  56470.63592843  57577.43638834
  57387.94263325  58393.89372375]
y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 256331.64110505104 36691.69530708807


In [84]:
lr = Linear_regressor()
lr.fit(x,y,method='batch',learning_rate=10,epochs=1000)
print('divergence is self corrected')
print(lr.coeff)
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

divergence is self corrected
[ 6.24591322 -1.28621128  2.09031197 -3.13137967 10.98638821 -8.01492849]
y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 0.04499972432868693 1.0066648256288628


The learning rate is increased automatically if the improvement is too small
<br> tolerance is the relative improvement cutoff below which the learning rate is increased
<br> default is 1% (i.e. if the cost function improves by less than 1%, the learning rate is increased)
<br> note: the increase and decrease of the learning rate (see above) will self-regulate until convergence is achieved

In [86]:
lr = Linear_regressor()
lr.fit(x,y,method='batch',learning_rate=0.000000001,epochs=1000,tolerance=0)
print('learning rate too slow')
print(lr.coeff)
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

learning rate too slow
[1.29574479e-05 6.22760153e-06 6.68182566e-06 5.94412380e-06
 8.19604885e-06 5.29657516e-06]
y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: -6.4331154991408654 4.099024613763339


In [88]:
lr = Linear_regressor()
lr.fit(x,y,method='batch',learning_rate=0.000000001,epochs=1000,tolerance=0.01)
print('slow learning rate self corrected')
print(lr.coeff)
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

slow learning rate self corrected
[ 6.24591342 -1.28621136  2.0903119  -3.13137975 10.98638813 -8.01492857]
y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 0.044999728444160295 1.0066648250093386
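
One way such a self-regulating learning rate could work is sketched below. The actual rule used by `Linear_regressor` is not shown in this notebook, so everything here is an assumption: a step that raises the cost is discarded and the rate is multiplied by a hypothetical `shrink` factor, and a step that improves the cost by less than `tolerance` (as a fraction) multiplies the rate by a hypothetical `grow` factor.

```python
import numpy as np

def cost(x, y, coeff):
    """Half mean squared error."""
    residual = x @ coeff - y
    return residual @ residual / (2 * len(y))

def adaptive_batch_gd(x, y, learning_rate, epochs, tolerance=0.01,
                      shrink=0.5, grow=2.0):
    """Batch gradient descent with a self-adjusting learning rate.
    shrink and grow are hypothetical factors chosen for this sketch."""
    coeff = np.zeros(x.shape[1])
    current = cost(x, y, coeff)
    for _ in range(epochs):
        gradient = x.T @ (x @ coeff - y) / len(y)
        candidate = coeff - learning_rate * gradient
        new = cost(x, y, candidate)
        if new > current:
            # Diverging: discard the step and retry with a smaller rate.
            learning_rate *= shrink
            continue
        improvement = (current - new) / current if current > 0 else 0.0
        coeff, current = candidate, new
        if improvement < tolerance:
            # Progress is too slow: take bigger steps from now on.
            learning_rate *= grow
    return coeff

rng = np.random.default_rng(4)
x_demo = np.column_stack([np.ones(500), rng.random((500, 2))])
y_demo = x_demo @ np.array([6.2, -1.4, 2.1]) + rng.normal(scale=0.1, size=500)
reference, *_ = np.linalg.lstsq(x_demo, y_demo, rcond=None)

too_large = adaptive_batch_gd(x_demo, y_demo, learning_rate=10, epochs=5000)
too_small = adaptive_batch_gd(x_demo, y_demo, learning_rate=1e-9, epochs=5000)
stuck = adaptive_batch_gd(x_demo, y_demo, learning_rate=1e-9, epochs=1000,
                          tolerance=0)

print("start at 10:             ", np.abs(too_large - reference).max())
print("start at 1e-9:           ", np.abs(too_small - reference).max())
print("start at 1e-9, no growth:", np.abs(stuck).max())
```

With `tolerance=0` no improvement ever counts as too small, so the rate is never increased and the coefficients stay close to zero, which mirrors the "learning rate too slow" cell above.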


Starting coefficients can be provided to continue improving an earlier fit (rather than starting from the initial coefficients of 0)
<br> Notice that the first 100 epochs are not enough, but a second pass of 100 epochs provides a better model (this is not the same as running 200 epochs to begin with, because of the self-correcting learning rate)

In [91]:
lr = Linear_regressor()
lr.fit(x,y,method='batch',epochs=100)
print('not enough epochs')
print(lr.coeff)
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

not enough epochs
[1.57337471 0.73551515 0.81492666 0.68194538 1.06939357 0.56595778]
y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: -2.8977576979298245 4.0135220699145435


In [92]:
lr.fit(x,y,method='batch',epochs=100,starting_coeff=lr.coeff)
print('continue from the same point')
print(lr.coeff)
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

continue from the same point
[ 3.099861    1.08796721  1.65599573  0.66719538  3.34485373 -0.19194288]
y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: -0.01717568318022522 3.4625878062029667


In [94]:
lr = Linear_regressor()
lr.fit(x,y,method='batch',epochs=200)
print(lr.coeff)
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

[ 3.37520379  0.82634999  1.828507    0.09358271  4.75337794 -1.42320025]
y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: -0.001866339189034683 2.939901509232615
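
The difference between 100 + 100 and 200 epochs comes from the learning-rate adjustment, not from restarting as such. With a constant learning rate the two are identical, as this sketch shows (`fixed_rate_gd` and the demo data are assumptions made up for the example, not the class used above):

```python
import numpy as np

def fixed_rate_gd(x, y, learning_rate, epochs, starting_coeff=None):
    """Plain batch gradient descent with a constant learning rate."""
    if starting_coeff is None:
        coeff = np.zeros(x.shape[1])
    else:
        coeff = np.array(starting_coeff, dtype=float)  # copy, do not mutate input
    for _ in range(epochs):
        gradient = x.T @ (x @ coeff - y) / len(y)
        coeff = coeff - learning_rate * gradient
    return coeff

rng = np.random.default_rng(5)
x_demo = np.column_stack([np.ones(500), rng.random((500, 2))])
y_demo = x_demo @ np.array([6.2, -1.4, 2.1]) + rng.normal(scale=0.1, size=500)

first_pass = fixed_rate_gd(x_demo, y_demo, learning_rate=0.1, epochs=100)
two_passes = fixed_rate_gd(x_demo, y_demo, learning_rate=0.1, epochs=100,
                           starting_coeff=first_pass)
one_pass = fixed_rate_gd(x_demo, y_demo, learning_rate=0.1, epochs=200)

print(two_passes)
print(one_pass)
```

Any state that the optimizer carries besides the coefficients, such as an adjusted learning rate, is lost when a new fit starts, which would explain why the two results above differ.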


**mini-batch gradient descent**
<br>Note: for stochastic gradient descent, set bin_size to 1
<br>Note 2: setting bin_size to the dataset size will **not** produce the same results as method='batch', because the stochastic method doesn't use the self-correcting learning rate.

In [101]:
lr = Linear_regressor()
lr.fit(x,y,method='stochastic',learning_rate=0.001,epochs=2000,bin_size=20)
lr.coeff

array([ 5.68928159, -1.00478809,  2.29575836, -2.87378283, 10.96070908,
       -7.635959  ])

In [102]:
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 0.04888470983294069 1.0213740914946345


Smaller bins tend to provide faster convergence (notice that the number of epochs is halved, but the predictions are comparable)

In [103]:
lr = Linear_regressor()
lr.fit(x,y,method='stochastic',learning_rate=0.001,epochs=1000,bin_size=1)
lr.coeff

array([ 6.1971739 , -1.25876523,  2.17759276, -3.12043281, 10.97782032,
       -7.99787993])

In [104]:
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 0.06662657879586524 1.0071728611797406
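
The role of bin_size can be sketched as follows. This is an illustration under assumptions, not the code behind `method='stochastic'`: it shuffles the rows once per epoch, uses a fixed learning rate, and `mini_batch_gd` and the demo data are names invented here.

```python
import numpy as np

def mini_batch_gd(x, y, learning_rate, epochs, bin_size, rng):
    """Gradient descent that updates the coefficients once per bin.
    bin_size=1 is stochastic gradient descent."""
    coeff = np.zeros(x.shape[1])
    n_samples = len(y)
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # assumed: reshuffle every epoch
        for start in range(0, n_samples, bin_size):
            rows = order[start:start + bin_size]
            residual = x[rows] @ coeff - y[rows]
            gradient = x[rows].T @ residual / len(rows)
            coeff -= learning_rate * gradient
    return coeff

rng = np.random.default_rng(6)
x_demo = np.column_stack([np.ones(500), rng.random((500, 2))])
y_demo = x_demo @ np.array([6.2, -1.4, 2.1]) + rng.normal(scale=0.1, size=500)

mini = mini_batch_gd(x_demo, y_demo, learning_rate=0.02, epochs=400,
                     bin_size=20, rng=rng)
stochastic = mini_batch_gd(x_demo, y_demo, learning_rate=0.02, epochs=50,
                           bin_size=1, rng=rng)
print(mini)
print(stochastic)
```

A smaller bin means more coefficient updates per epoch (here 500 instead of 25), which is why fewer epochs are needed; the price is noisier gradients, so the result keeps fluctuating around the optimum.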


### Conclusion

The normal equation works best, but if gradient descent is still wanted, start with stochastic (bin_size=1), which converges faster, then pass the resulting coefficients as the starting point for batch, where the gradients are more precise.

In [107]:
lr = Linear_regressor()
lr.fit(x,y,method='stochastic',learning_rate=0.001,epochs=800,bin_size=1)
lr.fit(x,y,method='batch',learning_rate=0.001,epochs=200,starting_coeff=lr.coeff)
lr.coeff

array([ 6.24589382, -1.28620412,  2.0903186 , -3.1313721 , 10.98639612,
       -8.01492029])

In [108]:
print("y mean and std:  ",y.mean(),y.std())
print("h-y mean and std:",(lr.predict(t_x)-t_y).mean(),(lr.predict(t_x)-t_y).std())

y mean and std:   6.478738468499331 4.119476341055892
h-y mean and std: 0.044999335538243625 1.0066648841413905
