# Lesson 2: testing the first example on a linear model

## Understanding the model

The first example given in the notebook uses gradient descent to find the parameters of the equation that computes the output (Y) from the input (X).

$Y = X \cdot \begin{pmatrix} 2 \\ 3 \end{pmatrix} + 1$

This example uses a single layer (`Dense`) with an input of size `30x2` and an output of size `30x1`, i.e. one value per input row.
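
To make the shapes concrete, here is a minimal NumPy sketch of how the data is built. The fixed seed and the names `rng`, `w_true` and `b_true` are my own additions for illustration; the notebook itself calls `random((30,2))` without a seed.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed (assumption) so the run is repeatable
x = rng.random_sample((30, 2))   # 30 samples, 2 features each

w_true = np.array([2.0, 3.0])    # the vector X is multiplied by
b_true = 1.0                     # the constant added afterwards

y = x.dot(w_true) + b_true       # same expression as in the notebook

print(x.shape)                   # (30, 2)
print(y.shape)                   # (30,)   one target per row of x
print(y.reshape(-1, 1).shape)    # (30, 1) the shape a Dense(1) layer predicts
```

Each entry of `y` is simply `2 * x[i, 0] + 3 * x[i, 1] + 1` for the matching row of `x`.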

Below I shortened the `import` section as much as I could to only retain the necessary modules.

In [1]:
import os, sys
current_dir = os.getcwd()
LESSON_HOME_DIR = current_dir
# Allow relative imports to directories above the
# lesson folders to get access to utils.py
sys.path.insert(1, os.path.join(sys.path[0], '..'))

# Rather than importing everything manually, we'll make things easy
# and load them all in utils.py, and just import them from there.
%matplotlib inline
import utils; reload(utils)
from utils import *

Using Theano backend.


In [2]:
import numpy as np
np.set_printoptions(precision=4, linewidth=100)

from numpy.random import random
from utils import plots, get_batches, plot_confusion_matrix, get_data

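import keras
from keras.models import Sequential
# Dense and SGD are used below; import them explicitly
# instead of relying on `from utils import *`.
from keras.layers import Dense
from keras.optimizers import SGD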

In [3]:
x = random((30,2))

In [4]:
x[:5]

array([[ 0.3869,  0.6284],
       [ 0.2361,  0.1706],
       [ 0.8893,  0.6298],
       [ 0.6004,  0.3427],
       [ 0.035 ,  0.9636]])

In [5]:
y = x.dot([2,3])+1

In [6]:
y

array([ 3.659 ,  1.9842,  4.6681,  3.2289,  3.9609,  5.1587,  3.2847,  2.5479,  2.4693,  3.0463,
        3.645 ,  3.4343,  1.9311,  4.3539,  2.926 ,  4.2794,  1.933 ,  3.0446,  4.6799,  4.7203,
        2.622 ,  1.8016,  4.3305,  2.0346,  3.1238,  4.102 ,  3.9062,  3.9996,  3.4122,  3.0689])

In [7]:
print(x.mean())
print(x.var())

0.469913473886
0.0644278676374


So we essentially have a `30x2` matrix as input (X), which gets multiplied by a set of weights, i.e. a `2x1` vector. Those weights will eventually need to match, or get very close to, the vector we used to multiply X in order to get Y.

The dot product of the matrix and the weight vector then needs to be shifted by a constant: this is the bias, which should converge to the `+ 1` of our equation.
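
As a rough illustration of what a single linear layer computes, the sketch below does the dot product and the shift by hand in NumPy. The helper `dense_forward` and the two sample rows are hypothetical; this is not how Keras implements `Dense` internally, only the same arithmetic.

```python
import numpy as np

def dense_forward(x, w, b):
    """Linear layer without activation: one dot product plus a bias."""
    return x.dot(w) + b

x = np.array([[0.5, 1.0],
              [1.0, 0.0]])       # two made-up input rows
w = np.array([[2.0], [3.0]])     # 2x1 weight vector
b = np.array([1.0])              # bias, broadcast over every row

pred = dense_forward(x, w, b)
print(pred)                      # [[5.], [3.]]
print(w.size + b.size)           # 3 trainable values: 2 weights + 1 bias
```

With the weights set to `2` and `3` and the bias set to `1`, the layer reproduces the target equation exactly, which is what training is supposed to discover.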

### Building the model

So we create a linear model (via `Sequential`) that contains a single layer: `Dense`.

In [8]:
lm = Sequential()
lm.add(Dense(1,input_shape=(2,)))
lm.compile(optimizer=SGD(lr=0.1),loss="mse",)

In [9]:
lm.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
dense_1 (Dense)                  (None, 1)             3           dense_input_1[0][0]              
Total params: 3
Trainable params: 3
Non-trainable params: 0
____________________________________________________________________________________________________


In [10]:
lm.get_weights()

[array([[-0.4885],
        [-0.5648]], dtype=float32), array([ 0.], dtype=float32)]

The two weights are initialized randomly and the bias starts at 0. So, compared to our data set, the loss should be really high:

In [11]:
lm.evaluate(x,y,verbose=1)



16.270622253417969
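
To see where such a large number comes from, the mean squared error can be computed by hand. The sketch below uses only the first five rows of `x` printed earlier and the initial weights returned by `lm.get_weights()`, so its value will not equal the one above, which covers all 30 rows. The helper name `mse` is my own.

```python
import numpy as np

# First five rows of x as printed earlier in the notebook.
x = np.array([[0.3869, 0.6284],
              [0.2361, 0.1706],
              [0.8893, 0.6298],
              [0.6004, 0.3427],
              [0.035,  0.9636]])
y = x.dot([2, 3]) + 1

def mse(x, y, w, b):
    """Mean of the squared differences between predictions and targets."""
    pred = x.dot(w) + b
    return np.mean((pred - y) ** 2)

w_init = np.array([-0.4885, -0.5648])   # initial weights from lm.get_weights()
b_init = 0.0                            # initial bias

mse_initial = mse(x, y, w_init, b_init)
mse_true = mse(x, y, np.array([2.0, 3.0]), 1.0)
print(mse_initial, mse_true)
```

The initial weights are negative while the targets are all positive, so every prediction is far off; with the true parameters the error is zero.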

### Adding the input and output

Let's now add the input / output and try to fit the model to get better weights.

In [12]:
lm.fit(x,y,nb_epoch=10,batch_size=1,verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x115462e90>

Based on the value of the loss function (`mse`), which gets smaller and smaller as the number of epochs (passes) increases, we should be getting closer to our original numbers: `2`, `3` and `1`.
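
For intuition, here is a minimal NumPy sketch of what fitting with `batch_size=1` amounts to: one gradient step per sample, repeated for 10 passes over the data. It is plain stochastic gradient descent under my own assumptions (fixed seed, uniform random starting weights, hand-written gradient), not the exact Keras code path.

```python
import numpy as np

rng = np.random.RandomState(0)          # fixed seed (assumption)
x = rng.random_sample((30, 2))
y = x.dot([2.0, 3.0]) + 1.0

w = rng.uniform(-0.5, 0.5, size=2)      # hypothetical random starting weights
b = 0.0                                 # bias starts at zero
lr = 0.1                                # same learning rate as SGD(lr=0.1)

def mse(w, b):
    return np.mean((x.dot(w) + b - y) ** 2)

losses = [mse(w, b)]
for epoch in range(10):
    for i in rng.permutation(len(x)):   # batch_size=1: one sample per update
        err = x[i].dot(w) + b - y[i]    # prediction error on this sample
        w -= lr * 2 * err * x[i]        # gradient of err**2 w.r.t. the weights
        b -= lr * 2 * err               # gradient of err**2 w.r.t. the bias
    losses.append(mse(w, b))

print(losses[0], losses[-1])
print(w, b)
```

Printing `losses` shows the error after each pass, which is the number Keras reports next to each `Epoch` line.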


In [13]:
lm.evaluate(x,y,verbose=1)



0.0016251878114417195

In [14]:
lm.get_weights()

[array([[ 1.9066],
        [ 2.882 ]], dtype=float32), array([ 1.1116], dtype=float32)]

### Is more better?

What happens if we do another round of fitting with 20 more epochs? Since `fit` continues from the current weights, that makes 30 epochs in total. Will the values get that much better?

In [15]:
lm.fit(x,y,nb_epoch=20,batch_size=1,verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x115071a50>

In [16]:
lm.evaluate(x,y,verbose=1)



2.3474388655131406e-08

In [17]:
lm.get_weights()

[array([[ 1.9996],
        [ 2.9997]], dtype=float32), array([ 1.0004], dtype=float32)]

The extra 20 epochs took the loss from about `1.6e-3` down to about `2.3e-8`. Considering how fast (cheap) the model gets to a really close approximation of our initial values, it actually makes sense to run a good number of epochs to get that much closer to our expected values.

This might not hold for much more complex models, where training time may force us to decide how good an approximation we actually need.
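
One way to picture that trade-off is to stop training as soon as the loss falls below a chosen tolerance and count how many epochs it took. The sketch below does this with the same hand-written SGD idea in NumPy; `fit_until`, `tol` and `max_epochs` are hypothetical names of mine, not part of Keras or of the notebook.

```python
import numpy as np

def fit_until(x, y, tol, lr=0.1, max_epochs=200, seed=0):
    """Run per-sample SGD until the MSE drops below tol (or max_epochs is hit)."""
    rng = np.random.RandomState(seed)
    w = np.zeros(x.shape[1])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        for i in rng.permutation(len(x)):
            err = x[i].dot(w) + b - y[i]
            w -= lr * 2 * err * x[i]
            b -= lr * 2 * err
        loss = np.mean((x.dot(w) + b - y) ** 2)
        if loss < tol:
            break
    return w, b, loss, epoch

rng = np.random.RandomState(1)          # fixed seed (assumption)
x = rng.random_sample((30, 2))
y = x.dot([2.0, 3.0]) + 1.0

results = []
for tol in (1e-2, 1e-4, 1e-8):
    w, b, loss, epochs = fit_until(x, y, tol)
    results.append((tol, epochs, loss))
    print(tol, epochs, loss)
```

On a toy problem like this one the tighter tolerances cost only a few more epochs; on a large model each extra epoch is expensive, so the tolerance becomes a real design decision.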