<div id="reminder" style="border-radius: 5px; background-color:#f5f5f5; padding: 15px 5px; " >
<p>Use this notebook to follow along with the lab tutorial.</p>
</div>

# <font color="blue">Lesson 9 - Deep Learning</font>

# Simple RNN for Adding Binary Bits
This notebook trains a simple RNN to add two 8-bit binary numbers, one bit position at a time. It shows how simply a network can take in sequential data and carry information from step to step to generate an output.

NOTE: the bits are processed in reverse order (right to left), so we start with the last pair of bits, then add the next pair, and so on, until we reach the first pair and the final solution.

A special shout out to <a href="https://iamtrask.github.io/">**I Am Trask**</a> who wrote the original version of this notebook.

In [1]:
import copy, numpy as np
np.random.seed(0)

### Define the Activation Function and the Backpropagation
We are creating the "brain" of the RNN: the S-shaped activation function (the logistic sigmoid, which squashes any value into the range 0 to 1) and its derivative, computed directly from the "squashed" output. Backpropagation through time (BPTT) uses this derivative to update the weights.

In [2]:
# compute the sigmoid (S-shaped) activation function
def sigmoid(x):
    output = 1/(1+np.exp(-x))
    return output

# compute the derivative of the sigmoid from its output (used by BPTT)
def sigmoid_output_to_derivative(output):
    return output*(1-output)
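The derivative helper takes the sigmoid's *output*, not its input, which is easy to get wrong. As a rough sanity check (a sketch, not part of the original notebook), the snippet below compares it against a central finite-difference estimate; the names `eps`, `numeric`, `analytic` and `max_gap` are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    # expects the already-squashed value sigmoid(x), not x itself
    return output * (1 - output)

x = np.linspace(-4, 4, 9)   # a few sample inputs
eps = 1e-6                  # step size for the finite difference (assumed value)

# slope estimated numerically from the function itself
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

# slope computed from the output, as the notebook does
analytic = sigmoid_output_to_derivative(sigmoid(x))

max_gap = np.max(np.abs(numeric - analytic))
print("largest difference:", max_gap)
```

The two estimates should agree to many decimal places; the slope is largest (0.25) at x = 0 and shrinks toward zero as the sigmoid saturates.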

### Generate 8-Bits Training Dataset
This section generates a dictionary that maps each integer from 0 to 255 to its 8-bit binary representation, for our training and testing use. It also sets the binary dimension, which will become relevant in the training phase when we cycle through individual bits.

In [3]:
# training dataset generation
int2binary = {}
binary_dim = 8 # NOTE this will be used later for the training phase

largest_number = pow(2,binary_dim)
binary = np.unpackbits(
    np.array([range(largest_number)],dtype=np.uint8).T,axis=1)
for i in range(largest_number):
    int2binary[i] = binary[i]

print(type(binary))
print("Number of numbers in our numpy array =", len(binary))
print("Largest value =", largest_number)
print(binary[:5,:])

<class 'numpy.ndarray'>
Number of numbers in our numpy array = 256
Largest value = 256
[[0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 1 1]
 [0 0 0 0 0 1 0 0]]
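To see what `np.unpackbits` does for a single value, here is a small round-trip sketch. The variable names `value`, `bits` and `restored` are illustrative; `np.packbits` is NumPy's inverse operation and is not used elsewhere in this notebook.

```python
import numpy as np

value = 81

# one uint8 becomes 8 bits, most significant bit first
bits = np.unpackbits(np.array([value], dtype=np.uint8))
print(bits)        # [0 1 0 1 0 0 0 1]

# packing the 8 bits again recovers the original integer
restored = int(np.packbits(bits)[0])
print(restored)    # 81
```

Because the most significant bit comes first, the training loop has to index from the right-hand end of each array to start with the least significant bit.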


### Set the RNN's hyperparameters
This section is where we can set (and test other configurations of) the RNN's hyperparameters:

* alpha -- the learning rate. Try different values here to find the one that works best for the problem.
* input_dim -- width of the input vector at each time step. In other problems this would be the size of the word, character, etc. that you feed in at each step. In this example we feed in one bit from each of the two numbers being added, so we need just 2.
* hidden_dim -- the size of the hidden layer, which carries the network's memory from one time step to the next. This, like alpha, is a hyperparameter to modify/test for improved accuracy. In general it is larger than the input vector, but not arbitrarily large given the effect on computational performance.
* output_dim -- the size of the output vector, which in this case is 1 (one predicted bit of the sum per time step)
* epochs -- the number of training iterations. In this notebook each iteration trains on one randomly generated addition problem.

In [11]:
alpha = 0.1       # learning rate
input_dim = 2     # input dimension -- one bit from each of the two numbers
hidden_dim = 16   # width of the hidden layer -- 2x the size of our max length of inputs
output_dim = 1    # size of what we want to return
epochs = 100000    # number of training iterations

### Initialize the RNN Synapses
This is where we initialize the weights between the input, hidden and output layers (the memory of the network) to random numbers. We multiply each random number (a float between 0 and 1) by 2 and subtract 1 so that the weights range between -1 and 1 and are centered on zero.
* synapse_0 -- weights between the input and hidden layer; size is input by hidden
* synapse_1 -- weights between the hidden and output layer; size is hidden by output
* synapse_h -- weights between the hidden layer and the previous time step -- the "loop back"; size is hidden by hidden

Because we are remembering through time, we need to store the updates that accumulate for these weights across the time steps. We initialize these to zero, with the same shapes as the synapses, because they are added to the synapse weights at the end of each training iteration.

In [12]:
# initialize neural network weights
synapse_0 = 2*np.random.random((input_dim,hidden_dim)) - 1
synapse_1 = 2*np.random.random((hidden_dim,output_dim)) - 1
synapse_h = 2*np.random.random((hidden_dim,hidden_dim)) - 1

synapse_0_update = np.zeros_like(synapse_0)
synapse_1_update = np.zeros_like(synapse_1)
synapse_h_update = np.zeros_like(synapse_h)

print(synapse_0)

[[-0.75780298  0.59710521  0.22229318 -0.07349527 -0.64707463 -0.0919588
  -0.88342587 -0.58128288  0.48741388  0.71434493  0.11874954  0.15565286
   0.48968942 -0.86073366 -0.75763869  0.56551254]
 [-0.05589995 -0.9589726   0.50834333 -0.88035476  0.27569852  0.63611723
  -0.1044514   0.37645432  0.74216247 -0.07997547 -0.31793472 -0.41183776
   0.93767094 -0.31749587  0.75762828  0.05324879]]
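To make the three weight shapes concrete, the sketch below runs one forward time step with freshly drawn weights. It is illustrative only: the names `w_in`, `w_out`, `w_rec` and `prev_hidden` are hypothetical stand-ins for the notebook's synapses, and it uses its own random generator so the notebook's seeded state is left untouched.

```python
import numpy as np

rng = np.random.default_rng(0)   # separate generator (assumption, not the notebook's seed)
input_dim, hidden_dim, output_dim = 2, 16, 1

# same initialization scheme as above: uniform in [-1, 1)
w_in = 2 * rng.random((input_dim, hidden_dim)) - 1     # input  -> hidden
w_out = 2 * rng.random((hidden_dim, output_dim)) - 1   # hidden -> output
w_rec = 2 * rng.random((hidden_dim, hidden_dim)) - 1   # hidden -> hidden (loop back)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[1, 0]])                    # one bit from each number
prev_hidden = np.zeros((1, hidden_dim))   # no memory yet at the first step

hidden = sigmoid(X @ w_in + prev_hidden @ w_rec)   # shape (1, 16)
output = sigmoid(hidden @ w_out)                   # shape (1, 1)

print(hidden.shape, output.shape)
```

The hidden vector produced here is what gets fed back in as `prev_hidden` on the next time step, which is how a carry can be remembered from one bit position to the next.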


Before we go through training the RNN, let's understand the problem a bit better.

The first thing we need is a "new" number. We also need the binary representation of that number. So we generate a random number below half of the largest value. The reason we halve the range is that we are adding two binary numbers, and if each one is not half or less, their sum could exceed 8 bits.

The second thing we do is lookup the binary value using int2binary. Now we have our two numbers to add together. 

Below I'm showing an example "num" which will become variables "a" and "b" in the loop.

In [13]:
# generate a simple addition problem (a + b = c)
num_int = np.random.randint(largest_number//2) # int version (integer upper bound)
print(num_int)
num = int2binary[num_int] # binary encoding
print(num)

81
[0 1 0 1 0 0 0 1]


The "right answer" is the addition of two values. In the loop this is "c". In the example I'm using "sum" and I'll simply add "num" to itself. "c" is the *Y* or actual value. We also need "d" the *Y-hat*, or predicted value. 

In [14]:
sum_int = num_int + num_int
print(sum_int)
sum = int2binary[sum_int]
print(sum)

162
[1 0 1 0 0 0 1 0]
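What the RNN has to learn is essentially grade-school addition with a carry. As a reference point, here is a plain-Python sketch of that procedure (the function name `add_bits` is hypothetical and not part of the original notebook). It walks right to left, exactly like the training loop, and keeps the carry as its only "memory".

```python
def add_bits(a_bits, b_bits):
    """Add two equal-length bit lists (most significant bit first)."""
    carry = 0
    result = [0] * len(a_bits)
    # walk from the rightmost (least significant) position to the left
    for pos in range(len(a_bits) - 1, -1, -1):
        total = a_bits[pos] + b_bits[pos] + carry
        result[pos] = total % 2   # bit written at this position
        carry = total // 2        # carried into the next position
    return result

num = [0, 1, 0, 1, 0, 0, 0, 1]    # 81
print(add_bits(num, num))         # [1, 0, 1, 0, 0, 0, 1, 0], i.e. 162
```

The hidden layer of the RNN plays the role of `carry` here: it is the only thing passed from one bit position to the next.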


### Training Loop
This is where we generate the problem and train the RNN on how to solve it. 

NOTE: instead of setting the problem values in the outer loop, "j", we could have set up an array of values and used the inner loops to cycle through the array's dimensions, but including it in the outer loop is more parsimonious. As a result we have to reinitialize the storage variables on each iteration.

In [15]:
for j in range(epochs):
    
    '''Initialization'''
    # generate a simple addition problem (a + b = c)
    a_int = np.random.randint(largest_number//2) # int version
    a = int2binary[a_int] # binary encoding

    b_int = np.random.randint(largest_number//2) # int version
    b = int2binary[b_int] # binary encoding

    # true answer - the "Y"
    c_int = a_int + b_int
    c = int2binary[c_int] 
    
    # where we'll store our best guess (binary encoded) - the "Y-hat" of predicted values
    d = np.zeros_like(c) # initialize to zero Y-hat array of predicted values

    overallError = 0 # initialize error value for each iteration to monitor the convergence
    
    # initialize lists used to keep track of the layer 2 derivatives and layer 1 values at each time step
    layer_2_deltas = list()                     # derivatives from priors - layer 2
    layer_1_values = list()                     # values from layer 1
    layer_1_values.append(np.zeros(hidden_dim)) # initial hidden state (all zeros) for the first time step
    '''End Initialization'''
    
    # moving along the positions in the binary encoding -- right to left
    for position in range(binary_dim):
        
        # generate input and output
        # X list of a and b (in binary), indexed with the farthest right as zero
        # y is the correct answer (in binary), indexed the same 
        X = np.array([[a[binary_dim - position - 1],b[binary_dim - position - 1]]])
        y = np.array([[c[binary_dim - position - 1]]]).T # transpose the array

        '''Construct hidden layer'''
        # propagate input to the hidden layer (X,synapse_0)
        # propagate *previous* hidden layer to the current hidden layer(prev_layer_1, synapse_h)
        # sum these two vectors
        layer_1 = sigmoid(np.dot(X,synapse_0) + np.dot(layer_1_values[-1],synapse_h))

        '''Construct output layer'''
        # propagate hidden layer to the output --> make a prediction
        layer_2 = sigmoid(np.dot(layer_1,synapse_1))

        '''Verify results'''
        # determine how far predicted is from actual, store the derivative and calculate error
        layer_2_error = y - layer_2 # comparison
        layer_2_deltas.append((layer_2_error)*sigmoid_output_to_derivative(layer_2)) # store derivative at this timestep
        # save this to show it at the end
        overallError += np.abs(layer_2_error[0]) # calculate sum of errors
    
        '''For logging progress'''
        # decode estimate so we can print it out at the end
        d[binary_dim - position - 1] = np.round(layer_2[0][0])
        
        '''Set up for next pass'''
        # store hidden layer so we can use it in the next timestep
        layer_1_values.append(copy.deepcopy(layer_1))
        
    # initialize future_layer -- reset 
    future_layer_1_delta = np.zeros(hidden_dim) 
    
    '''Backpropagation through time (BPTT) loop -- walks the time steps in reverse'''
    for position in range(binary_dim):
        
        # rebuild the input (X) for this time step; position 0 is the leftmost bit, i.e. the last forward step
        X = np.array([[a[position],b[position]]])
        
        '''Access current (time) hidden layer'''
        layer_1 = layer_1_values[-position-1]
        
        '''Access previous hidden layer'''
        prev_layer_1 = layer_1_values[-position-2]
        
        '''Get the output error'''
        layer_2_delta = layer_2_deltas[-position-1]
        
        '''Generate derivative for BPTT'''
        # compute error at current hidden layer given future layer and current output layer
        layer_1_delta = (future_layer_1_delta.dot(synapse_h.T) + layer_2_delta.dot(synapse_1.T)) * sigmoid_output_to_derivative(layer_1)

        '''BPTT--update the weights'''
        # Accumulate the weight updates for the output, recurrent and input synapses
        synapse_1_update += np.atleast_2d(layer_1).T.dot(layer_2_delta)
        synapse_h_update += np.atleast_2d(prev_layer_1).T.dot(layer_1_delta)
        synapse_0_update += X.T.dot(layer_1_delta)
        
        '''Store the future layer difference'''
        # Keep this hidden-layer delta as the "future" delta, an input to the backpropagation step for the next (earlier) time step
        future_layer_1_delta = layer_1_delta
    
    # Update the Weights
    synapse_0 += synapse_0_update * alpha
    synapse_1 += synapse_1_update * alpha
    synapse_h += synapse_h_update * alpha    

    # Reset the update variables to zero
    synapse_0_update *= 0
    synapse_1_update *= 0
    synapse_h_update *= 0
    
    # print out progress
    if(j % 1000 == 0):
        print("Error:" + str(overallError))
        print("Pred:" + str(d))
        print("True:" + str(c))
        out = 0
        for index,x in enumerate(reversed(d)):
            out += x*pow(2,index)
        print(str(a_int) + " + " + str(b_int) + " = " + str(out))
        print("")

Error:[5.7938749]
Pred:[0 0 0 0 0 0 0 0]
True:[1 1 0 1 1 0 1 1]
126 + 93 = 0

Error:[3.92043515]
Pred:[1 1 1 1 1 1 1 1]
True:[0 1 0 1 0 1 1 0]
42 + 44 = 255

Error:[3.65726486]
Pred:[0 0 1 0 0 0 1 1]
True:[0 1 1 0 0 0 1 0]
17 + 81 = 35

Error:[3.21084771]
Pred:[0 0 1 1 1 0 1 0]
True:[0 0 1 0 1 0 1 0]
30 + 12 = 58

Error:[3.41703455]
Pred:[1 1 1 1 1 1 1 0]
True:[1 1 0 0 1 1 1 0]
95 + 111 = 254

Error:[2.8969541]
Pred:[1 0 0 0 1 0 0 0]
True:[1 0 0 0 1 1 0 0]
127 + 13 = 136

Error:[1.05546854]
Pred:[1 1 0 0 0 0 1 0]
True:[1 1 0 0 0 0 1 0]
121 + 73 = 194

Error:[0.65233652]
Pred:[0 1 0 1 0 1 0 0]
True:[0 1 0 1 0 1 0 0]
1 + 83 = 84

Error:[0.43479605]
Pred:[0 1 0 1 0 1 0 1]
True:[0 1 0 1 0 1 0 1]
8 + 77 = 85

Error:[0.40241507]
Pred:[0 1 0 0 1 1 1 0]
True:[0 1 0 0 1 1 1 0]
48 + 30 = 78

Error:[0.38126451]
Pred:[0 1 0 0 1 1 0 1]
True:[0 1 0 0 1 1 0 1]
29 + 48 = 77

Error:[0.34108019]
Pred:[1 0 0 0 1 0 1 1]
True:[1 0 0 0 1 0 1 1]
106 + 33 = 139

Error:[0.28256375]
Pred:[0 1 0 1 1 1 1 1]
True:
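
The progress printout turns the predicted bit array `d` back into an integer by weighting each bit with a power of two, starting from the right. A minimal standalone sketch of that decoding step follows; the helper name `bits_to_int` is hypothetical and not part of the original notebook.

```python
def bits_to_int(bits):
    """Decode a bit sequence (most significant bit first) into an integer."""
    value = 0
    # reversed() puts the least significant bit at index 0
    for index, bit in enumerate(reversed(bits)):
        value += int(bit) * 2 ** index
    return value

pred = [0, 1, 0, 1, 0, 1, 0, 0]   # one of the predictions printed above
print(bits_to_int(pred))          # 84
```

Comparing `bits_to_int(d)` with `a_int + b_int` gives a single pass/fail check per example, which is easier to scan than comparing the bit arrays by eye.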

<div id="reminder" style="border-radius: 5px; background-color:#f5f5f5; padding: 15px 5px; " >
<p>Please see the Wine Neural Net notebook for your opportunity to try for yourself.</p>
</div>