<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Libraries" data-toc-modified-id="Libraries-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Libraries</a></span></li><li><span><a href="#Dataset" data-toc-modified-id="Dataset-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Dataset</a></span></li><li><span><a href="#Modelling" data-toc-modified-id="Modelling-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Modelling</a></span><ul class="toc-item"><li><span><a href="#Hand-written-rules" data-toc-modified-id="Hand-written-rules-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Hand-written rules</a></span></li><li><span><a href="#Neural-Network" data-toc-modified-id="Neural-Network-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Neural Network</a></span><ul class="toc-item"><li><span><a href="#Configure-the-neural-network" data-toc-modified-id="Configure-the-neural-network-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Configure the neural network</a></span></li><li><span><a href="#Train-the-neural-network" data-toc-modified-id="Train-the-neural-network-3.2.2"><span class="toc-item-num">3.2.2&nbsp;&nbsp;</span>Train the neural network</a></span></li></ul></li></ul></li><li><span><a href="#Notes" data-toc-modified-id="Notes-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Notes</a></span><ul class="toc-item"><li><span><a href="#How-to-optimize-the-loss-function" data-toc-modified-id="How-to-optimize-the-loss-function-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>How to optimize the loss function</a></span></li></ul></li></ul></div>

# Libraries

Import libraries:

In [1]:
import tensorflow as tf

from tensorflow.keras.layers import Dense  # public Keras API path

import pandas as pd
import numpy as np

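Start an interactive session, which installs itself as the default session so that `.eval()` and `.run()` can be called without passing a session explicitly: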

In [2]:
sess = tf.InteractiveSession()

# Dataset

Construct the XOR table:

In [3]:
data_df = pd.DataFrame({'$x_1$': [0, 0, 1, 1],
                        '$x_2$': [0, 1, 0, 1],
                        '$y$':  [0, 1, 1, 0]})

data_df

| | $x_1$ | $x_2$ | $y$ |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 2 | 1 | 0 | 1 |
| 3 | 1 | 1 | 0 |


The goal is to predict $y$ given $x_1$ and $x_2$.

# Modelling

## Hand-written rules

Write rules that encode the desired outcome:

In [4]:
def hand_written_classifier(x1, x2):
    if (x1 == 0  and x2 == 0) or (x1 == 1 and x2 == 1):
        return 0
    else:
        return 1

Test the hand-written rule:

In [5]:
# Raw string avoids the invalid '\h' escape; .iloc makes the positional access explicit
data_df[r'$\hat{y}$'] = data_df[['$x_1$', '$x_2$']].apply(lambda r: hand_written_classifier(r.iloc[0], r.iloc[1]), axis=1)
data_df

| | $x_1$ | $x_2$ | $y$ | $\hat{y}$ |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 |
| 2 | 1 | 0 | 1 | 1 |
| 3 | 1 | 1 | 0 | 0 |
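The hand-written rule is the logical XOR of the two inputs, so it can also be written as a single inequality test. The sketch below is only an illustration; `xor_classifier` and `truth_table` are names introduced here, not part of the notebook.

```python
def xor_classifier(x1, x2):
    """Return 1 when exactly one of the two binary inputs is 1, else 0."""
    return int(x1 != x2)

# Rows of the XOR table as (x1, x2, y)
truth_table = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

xor_predictions = [xor_classifier(x1, x2) for x1, x2, _ in truth_table]
print(xor_predictions)
```

For binary inputs this agrees with Python's bitwise `x1 ^ x2`.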


## Neural Network

### Configure the neural network

Specify the inputs, $X$, and the output, $Y$ (the extra leading axis turns the four rows into a single batch):

In [6]:
# .values replaces the deprecated DataFrame.as_matrix()
data_matrix = tf.constant(data_df.iloc[:, :3].values, tf.float32)
data_matrix = tf.expand_dims(data_matrix, axis=0)

x = data_matrix[:, :, :2]
y = data_matrix[:, :, 2, tf.newaxis]
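To make the slicing easier to follow, here is a NumPy sketch of the same indexing, assuming NumPy's slicing and `newaxis` semantics match TensorFlow's for this case. The names `data_np`, `batched_np`, `x_np` and `y_np` are introduced only for this illustration.

```python
import numpy as np

# The XOR table as a (4, 3) array: columns are x1, x2, y
data_np = np.array([[0, 0, 0],
                    [0, 1, 1],
                    [1, 0, 1],
                    [1, 1, 0]], dtype=np.float32)

batched_np = np.expand_dims(data_np, axis=0)  # add a leading batch axis
x_np = batched_np[:, :, :2]                   # first two columns: the inputs
y_np = batched_np[:, :, 2, np.newaxis]        # third column, kept as a trailing axis

print(batched_np.shape, x_np.shape, y_np.shape)  # (1, 4, 3) (1, 4, 2) (1, 4, 1)
```

Keeping the trailing axis on `y_np` gives the target the same shape as the output of a one-node layer, so the two can be compared element by element.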

The inputs will be passed to a hidden layer with 3 nodes:

In [7]:
h1 = Dense(3, activation = 'sigmoid')(x)

The results from the hidden layer will be passed to the output layer, which consists of only 1 node (the prediction, $\hat{y}$):

In [8]:
y_hat = Dense(1, activation='sigmoid')(h1)
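As a rough picture of what the two `Dense` layers compute, the NumPy sketch below runs a forward pass through a 2-3-1 sigmoid network. The weights are drawn from a normal distribution purely for illustration (an assumption; Keras uses its own initializer), and every name in the block is introduced here rather than taken from the notebook.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed, illustrative only

inputs_np = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)

# Hidden layer: 2 inputs -> 3 nodes; output layer: 3 nodes -> 1 node
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

hidden_np = sigmoid(inputs_np @ W1 + b1)  # shape (4, 3)
output_np = sigmoid(hidden_np @ W2 + b2)  # shape (4, 1)

n_params = W1.size + b1.size + W2.size + b2.size
print(hidden_np.shape, output_np.shape, n_params)
```

The network has 13 trainable parameters in total: 2 × 3 weights plus 3 biases in the hidden layer, and 3 × 1 weights plus 1 bias in the output layer.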

Specify a loss function (mean squared error):

In [9]:
loss = tf.losses.mean_squared_error(y, y_hat)
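For reference, with the default settings and the $N = 4$ rows of the XOR table, the mean squared error should reduce to the average of the squared differences between targets and predictions (the index $i$ and the symbol $L$ are notation introduced here):

$$L = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$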

Define the training step (how to optimize the loss function):

In [10]:
train_step = tf.train.AdamOptimizer().minimize(loss)

### Train the neural network

Initialize the layers' variables (weights and biases) in the session before evaluating anything:

In [11]:
init = tf.global_variables_initializer()
sess.run(init)

Let's look at the model's loss and predictions before training:

In [12]:
print(f'loss\t\t\t:{loss.eval():.8f}')
print(f'predictions (raw)\t:{tf.squeeze(y_hat).eval()}')
print(f'predictions (rounded)\t:{np.round(tf.squeeze(y_hat).eval())}')

loss			:0.30564791
predictions (raw)	:[ 0.79264569  0.73112535  0.72314888  0.66735625]
predictions (rounded)	:[ 1.  1.  1.  1.]


Let's see how training the network for 10,000 iterations affects the loss and predictions:

In [14]:
for _ in range(10000):
    train_step.run()

print(f'loss\t\t\t:{loss.eval():.8f}')
print(f'predictions (raw)\t:{tf.squeeze(y_hat).eval()}')
print(f'predictions (rounded)\t:{np.round(tf.squeeze(y_hat).eval())}')

loss			:0.00001505
predictions (raw)	:[ 0.00410306  0.99480224  0.998546    0.00377528]
predictions (rounded)	:[ 0.  1.  1.  0.]
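The reported losses can be cross-checked by hand from the raw predictions printed above. The NumPy sketch below does this; `mse` is a helper introduced here, and small differences in the last digits are expected because the printed predictions are rounded.

```python
import numpy as np

def mse(targets, predictions):
    """Mean of the squared differences between targets and predictions."""
    targets = np.asarray(targets, dtype=np.float64)
    predictions = np.asarray(predictions, dtype=np.float64)
    return float(np.mean((targets - predictions) ** 2))

targets = [0, 1, 1, 0]

# Raw predictions copied from the outputs above
before_training = [0.79264569, 0.73112535, 0.72314888, 0.66735625]
after_training = [0.00410306, 0.99480224, 0.998546, 0.00377528]

loss_before = mse(targets, before_training)
loss_after = mse(targets, after_training)

print(f'{loss_before:.6f}')
print(f'{loss_after:.8f}')
```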


# Notes

## How to optimize the loss function

Enable eager execution (this requires restarting the notebook, because eager mode has to be enabled before any other TensorFlow operation runs):

In [1]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()

Let $x=2$ and $y=11$.

Suppose we want to find $w$ such that $$11 = 2\, w + 1$$

Note that the solution is $w=5$.

Express this problem in Python:

In [2]:
def model_prediction(w):
    return w * 2 + 1

def loss(w):
    return (11 - model_prediction(w))**2
    

Let's say we randomly assign to $w$ the value of 0.1:

In [3]:
w = 0.1
w

0.1

In [4]:
loss(w)

96.04000000000002

How much would the loss change if we changed $w$ by a very small amount? The gradient of the loss with respect to $w$ answers this:

In [5]:
loss_val, gradients = tfe.value_and_gradients_function(loss)(w)
grad_w = gradients[0]
print(grad_w.numpy())

-39.2
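The gradient can be double-checked without TensorFlow. Differentiating $(11 - (2w + 1))^2$ by hand gives $-4\,(10 - 2w)$, and a central finite difference should agree with it. In the sketch below, `analytic_gradient`, `numeric_gradient`, `w_check` and the step `eps` are names and values introduced only for this check.

```python
def model_prediction(w):
    return w * 2 + 1

def loss(w):
    return (11 - model_prediction(w)) ** 2

def analytic_gradient(w):
    # d/dw (11 - (2w + 1))**2 = 2 * (10 - 2w) * (-2) = -4 * (10 - 2w)
    return -4 * (10 - 2 * w)

def numeric_gradient(f, w, eps=1e-6):
    # Central difference: slope of f over a tiny interval around w
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w_check = 0.1
grad_analytic = analytic_gradient(w_check)
grad_numeric = numeric_gradient(loss, w_check)
print(f'{grad_analytic:.4f} {grad_numeric:.4f}')
```

The negative sign means that increasing $w$ slightly decreases the loss, which is why the update step moves $w$ in the direction opposite to the gradient.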


Hint:

<img src='images/optimization_hint.jpg'>

Define a function to update $w$ based on its gradient:

In [6]:
def update_w(w, grad_w):
    return w - 0.001 * grad_w

So, the new value of $w$ is:

In [7]:
w = update_w(w, grad_w).numpy()
w

0.1392
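Written out, the update is one step of gradient descent with step size 0.001. Here the step size is denoted $\eta$ and the loss $L$, both symbols introduced for this note:

$$w_{\text{new}} = w - \eta \, \frac{\partial L}{\partial w} = 0.1 - 0.001 \times (-39.2) = 0.1392$$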

The value of the loss with this new $w$ is:

In [8]:
loss(w)

94.509506483975827

If we repeat this process many times, $w$ converges to 5:

In [9]:
for i in range(0, 1501):
    grad_w = tfe.gradients_function(loss)(w)[0]
    w = update_w(w, grad_w)
    
    if i % 100 == 0:
        print(f'Value of w after {i:>4} iterations: {w:.4f}\n')

Value of w after    0 iterations: 0.1781

Value of w after  100 iterations: 2.8403

Value of w after  200 iterations: 4.0327

Value of w after  300 iterations: 4.5668

Value of w after  400 iterations: 4.8060

Value of w after  500 iterations: 4.9131

Value of w after  600 iterations: 4.9611

Value of w after  700 iterations: 4.9826

Value of w after  800 iterations: 4.9922

Value of w after  900 iterations: 4.9965

Value of w after 1000 iterations: 4.9984

Value of w after 1100 iterations: 4.9993

Value of w after 1200 iterations: 4.9997

Value of w after 1300 iterations: 4.9999

Value of w after 1400 iterations: 4.9999

Value of w after 1500 iterations: 5.0000
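The same loop can be sketched in plain Python by using the hand-derived gradient $-4\,(10 - 2w)$ in place of `tfe.gradients_function`. This is an illustration under that assumption, not the notebook's method; `loss_gradient`, `gradient_descent` and `w_history` are names introduced here, and the starting point 0.1392 is the value of $w$ after the first manual update.

```python
def loss_gradient(w):
    # Analytic derivative of (11 - (2w + 1))**2 with respect to w
    return -4.0 * (10.0 - 2.0 * w)

def gradient_descent(w_start, learning_rate=0.001, n_steps=1501):
    """Apply the update rule n_steps times and record w after each step."""
    w_current = w_start
    history = []
    for _ in range(n_steps):
        w_current = w_current - learning_rate * loss_gradient(w_current)
        history.append(w_current)
    return history

w_history = gradient_descent(0.1392)

for step in (0, 100, 1500):
    print(f'Value of w after {step:>4} iterations: {w_history[step]:.4f}')
```

With this step size, each update moves $w$ by $0.008\,(5 - w)$, so the distance to the solution shrinks by a factor of 0.992 per iteration. That is why progress is fast at first and slows as $w$ approaches 5.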

