from: <i><b style='color:red;'>grokking</b> <b>Deep Learning</b></i>
<p>by Andrew W. Trask</p>

In [None]:
<p><b>prerequisites</b></p>
$$f(\mathbf{x}) = \mathbf{w}^{*}\mathbf{x} + b^{*},$$
where <b>w*</b> and b* are the optimal values for the parameters <b>w</b> and b.
<p></p>
<p>perceptron</p>
<p>gradient descent</p>
<p>backpropagation</p>
<p><b>"The interface for the neural network is simple: it accepts an <i>input</i> variable as information and a <i>weights</i> variable as knowledge, and it outputs a prediction."</b></p>
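<p>A minimal sketch of that interface, assuming a single-layer network whose prediction is a weighted sum. The name <code>neural_network</code> and the weight values are illustrative assumptions, not taken from the notebook.</p>

```python
import numpy as np

def neural_network(inputs, weights):
    # The prediction is the weighted sum of the inputs.
    return np.dot(inputs, weights)

inputs = np.array([1.0, 0.0, 1.0])      # information: one streetlight pattern
weights = np.array([0.5, 0.25, 0.25])   # knowledge: arbitrary illustrative values
prediction = neural_network(inputs, weights)
print(prediction)
```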
<p><b>"Measuring error simplifies the problem of training neural networks to make correct predictions."</b></p>
<p><b>"Different ways of measuring the error prioritize error differently."</b></p>
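<p>As a rough illustration, squared error shrinks gaps smaller than 1 and amplifies gaps larger than 1, while absolute error weighs every gap in proportion to its size. The gap values below are made up for the comparison.</p>

```python
import numpy as np

# Hypothetical gaps between prediction and target.
gaps = np.array([0.1, 0.5, 2.0])

squared = gaps ** 2       # de-emphasizes small gaps, emphasizes large ones
absolute = np.abs(gaps)   # weighs each gap linearly

for gap, sq, ab in zip(gaps, squared, absolute):
    print(f"gap={gap:.1f}  squared={sq:.2f}  absolute={ab:.1f}")
```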
<p>Error is calculated and applied to modify the weights during each iteration of the training.</p>
<p><b>"<i>alpha</i> is the simplest way to prevent overcorrecting weight updates."</b></p>
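<p>A small sketch of overcorrection with a single weight. The function <code>train</code> and its default values are assumptions chosen for illustration: with an input of 2, an undamped update (alpha = 1) overshoots the target further on every step, while alpha = 0.2 shrinks the error.</p>

```python
def train(alpha, steps=10, x=2.0, target=0.8, weight=0.5):
    # Repeatedly nudge one weight toward the target and
    # return the squared error that remains afterwards.
    for _ in range(steps):
        prediction = x * weight
        delta = prediction - target
        weight -= alpha * delta * x   # correction scaled by alpha
    return (x * weight - target) ** 2

initial_error = (2.0 * 0.5 - 0.8) ** 2
print("initial error:", initial_error)
print("alpha=1.0:", train(alpha=1.0))   # grows: each step overshoots
print("alpha=0.2:", train(alpha=0.2))   # shrinks toward zero
```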

In [1]:
import numpy as np

In [34]:
# initialization, functions

def gradient_descent(prediction, target):
    ''' Squared error between a prediction and its target.
        Despite the name, this measures the error;
        it does not perform a descent step.
    '''
    return (prediction - target)**2

def gradient_descent_deriv(weights):
    ''' Rescales values from [0, 1) to [-1, 1).
        Despite the name, this is not a derivative;
        it is used below only to center the random
        initial weights around 0.
    '''
    return (2 * weights - 1)
    
def relu(x):
    ''' Returns x iff x > 0; otherwise, returns 0
    '''
    return (x > 0) * x

def relu2deriv(output):
    ''' Returns True (1) where output > 0; otherwise, returns False (0)
    '''
    return output > 0
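<p>A quick check of the two activation helpers on a sample array. The definitions are repeated so the snippet runs on its own, and the input values are arbitrary.</p>

```python
import numpy as np

def relu(x):
    return (x > 0) * x

def relu2deriv(output):
    return output > 0

sample = np.array([-2.0, 0.0, 3.0])
activated = relu(sample)       # negatives and zero become 0, positives pass through
slope = relu2deriv(sample)     # boolean mask; acts as 0/1 when multiplied
print(activated)
print(slope)
print(slope * np.array([10.0, 10.0, 10.0]))   # the mask zeroes out inactive units
```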

In [35]:
# input and target
streetlights = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1]).reshape(4, 3) # 4 patterns x 3 lights; each row is one layer 0 input
walk_v_stop = np.array([1, 1, 0, 0]) # target values, one per streetlight pattern

In [36]:
# hyperparameters
alpha = 0.2 # scale down correction to prevent overcorrection
hidden_size = 4

In [78]:
# initialization, weights
weights_0_1 = gradient_descent_deriv(np.random.random((3, hidden_size)))
weights_1_2 = gradient_descent_deriv(np.random.random((hidden_size, 1)))
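<p>The expression <code>2 * r - 1</code> applied to the output of <code>np.random.random</code> maps values from [0, 1) to [-1, 1), so the initial weights are centered around 0. A short sketch of that mapping, using a seeded generator (an assumption made here for reproducibility, not something the notebook does):</p>

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator: an assumption for repeatable output
raw = rng.random((3, 4))         # uniform samples in [0, 1)
scaled = 2 * raw - 1             # same samples rescaled to [-1, 1)

print(raw.min() >= 0.0, raw.max() < 1.0)
print(scaled.min() >= -1.0, scaled.max() < 1.0)
```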

In [79]:
# training
for iteration in range(300):
    # supervised learning: each iteration is one pass over all four examples
    # reset the accumulated error at the start of each pass
    layer_2_error = 0

    for index, values in enumerate(streetlights):
        layer_0 = streetlights[index:index+1] # rename input
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)

        # accumulate the squared error between the layer 2 prediction and the target value
        layer_2_error += np.sum(gradient_descent(layer_2, walk_v_stop[index:index+1]))

        # calculate the correction
        layer_2_delta = (layer_2 - walk_v_stop[index:index+1])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)

        # apply the correction: weights are updated in place after every example
        # alpha scales each correction down to prevent overcorrection
        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)

    if (iteration % 10 == 9):
        print(f"Error: {layer_2_error:.25f}")

Error: 0.4594603197266258209907619
Error: 0.0150584876951698632546739
Error: 0.0000520447361432251450397
Error: 0.0000001151011976940527282
Error: 0.0000000002842043991852161
Error: 0.0000000000009335346305984
Error: 0.0000000000000043588528056
Error: 0.0000000000000000257453389
Error: 0.0000000000000000001678922
Error: 0.0000000000000000000011316
Error: 0.0000000000000000000000077
Error: 0.0000000000000000000000001
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.00000000000
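
<p>The updates in the training loop can be summarized as follows, writing \(\ell_0, \ell_1, \ell_2\) for <code>layer_0</code>, <code>layer_1</code>, <code>layer_2</code>, \(W_{01}, W_{12}\) for the two weight matrices, \(t\) for the target, and \(\odot\) for element-wise multiplication. This notation is introduced here for illustration only.</p>

$$\ell_1 = \mathrm{relu}(\ell_0 W_{01}), \qquad \ell_2 = \ell_1 W_{12}$$
$$\delta_2 = \ell_2 - t, \qquad \delta_1 = \left(\delta_2 W_{12}^{\top}\right) \odot \mathrm{relu}'(\ell_1)$$
$$W_{12} \leftarrow W_{12} - \alpha\, \ell_1^{\top} \delta_2, \qquad W_{01} \leftarrow W_{01} - \alpha\, \ell_0^{\top} \delta_1$$

<p>The derivative of the squared error \((\ell_2 - t)^2\) with respect to \(\ell_2\) is \(2(\ell_2 - t)\). The code uses \(\ell_2 - t\) without the factor of 2, so that constant is effectively folded into <code>alpha</code>.</p>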

In [71]:
%whos

Variable                 Type        Data/Info
----------------------------------------------
alpha                    float       0.2
gradient_descent         function    <function gradient_descent at 0x72c91998c720>
gradient_descent_deriv   function    <function gradient_descen<...>_deriv at 0x72c8bcd47ec0>
hidden_size              int         4
index                    int         3
iteration                int         149
layer_0                  ndarray     1x3: 3 elems, type `int64`, 24 bytes
layer_1                  ndarray     1x4: 4 elems, type `float64`, 32 bytes
layer_1_delta            ndarray     1x4: 4 elems, type `float64`, 32 bytes
layer_2                  ndarray     1x1: 1 elems, type `float64`, 8 bytes
layer_2_delta            ndarray     1x1: 1 elems, type `float64`, 8 bytes
layer_2_error            float64     4.905728754343167e-30
np                       module      <module 'numpy' from '/ho<...>kages/numpy/__init__.py'>
relu                     function    <func

In [90]:
import os
import pickle

os.makedirs('pickle', exist_ok=True) # open() fails if the directory is missing
with open('pickle/weights_0_1.pickle.bin', 'wb') as out_file:
    pickle.dump(weights_0_1, out_file)
with open('pickle/weights_1_2.pickle.bin', 'wb') as out_file:
    pickle.dump(weights_1_2, out_file)
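
<p>Saved weights can be read back with <code>pickle.load</code>. The sketch below round-trips a stand-in array through a temporary directory; the file name and the array values are hypothetical, not the notebook's trained weights.</p>

```python
import os
import pickle
import tempfile

import numpy as np

weights = np.array([[0.1, -0.2], [0.3, 0.4]])   # stand-in for trained weights

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pickle.bin")   # hypothetical file name
    with open(path, "wb") as out_file:
        pickle.dump(weights, out_file)
    with open(path, "rb") as in_file:
        restored = pickle.load(in_file)

print(np.array_equal(weights, restored))
```

<p>Only unpickle files from a trusted source, since <code>pickle.load</code> can execute arbitrary code while loading.</p>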

Signature:
pickle.dump(
    obj,
    file,
    protocol=None,
    *,
    fix_imports=True,
    buffer_callback=None,
)
Docstring:
Write a pickled representation of obj to the open file object file.

This is equivalent to ``Pickler(file, protocol).dump(obj)``, but may
be more efficient.

The optional *protocol* argument tells the pickler to use the given
protocol; supported protocols are 0, 1, 2, 3, 4 and 5.  The default
protocol is 4. It was introduced in Python 3.4, and is incompatible
with previous versions.

Specifying a negative protocol version selects th

In [87]:
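layer_0 = np.array([1, 1, 1]) # input to predict; same pattern as the fourth training example (target 0)
# uses the trained weights_0_1 and weights_1_2
layer_1 = relu(np.dot(layer_0, weights_0_1))
layer_2 = np.dot(layer_1, weights_1_2)
print(int(round(layer_2[0]))) # round to the nearest label; int() alone truncates toward zero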

0


[1.48709859e-16]
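
<p>Converting a raw prediction to a 0/1 label with <code>int()</code> truncates toward zero, whereas rounding picks the nearest label. A small sketch of the difference, using one value just below 1 (made up for illustration) and the raw output shown above:</p>

```python
raw_outputs = [0.9999999, 1.48709859e-16]

truncated = [int(value) for value in raw_outputs]         # drops the fractional part
rounded = [int(round(value)) for value in raw_outputs]    # nearest integer label

print(truncated)
print(rounded)
```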
