neural network example from: <i><b style='color:red;'>grokking</b> <b>Deep Learning</b></i>
<p>by Andrew W. Trask</p>

In [1]:
from IPython.display import Image
Image("images/chollet_change_of_paradigm.jpg", width=400)

<IPython.core.display.Image object>

<p><i>Machine learning: A New Programming Paradigm</i></p>
<p>from <i>Chollet, François</i>. <b>Deep Learning With Python</b>. pg 4. Manning Publications, 2021.</p>

In [2]:
Image("images/traffic_light_problem.jpg", width=400)

<IPython.core.display.Image object>

<p><b>prerequisites</b></p>
$$f(x) = \mathbf{w^{*}x} + b^{*},$$
where <b>w*</b> and b* are optimal values for parameters <b>w</b> and b
<p></p>
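<p>To make the formula concrete, here is a minimal sketch that evaluates f(x) for one input. The values of w, b and x are hypothetical and chosen only for illustration; they are not optimal values learned from data.</p>

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])   # weight vector w
b = 0.25                         # bias b

def f(x):
    """Linear model: dot product of weights and input, plus bias."""
    return np.dot(w, x) + b

x = np.array([1.0, 0.0, 1.0])    # hypothetical input
y = f(x)                         # 0.5*1 + (-1.0)*0 + 2.0*1 + 0.25
print(y)
```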
<p>layer: a function that transforms the data it receives into a new representation.</p>
<p>weight: a value, held in a linear algebra object, that adjusts the strength of a signal as it passes through a network</p>
<p>optimizer: used to update the weights throughout the neural network. Here it is gradient descent, applied to a squared-error measure of the form y=x**2</p>
<p><b>used for training</b></p>
<p>activation functions: ReLU (Rectified Linear Unit), a nonlinear function that allows a two-layer neural network to be a universal function approximator</p>
<p>backpropagation: passing the prediction error backward through the network during training, to determine how much each weight should be updated.</p>
<p>both the error measure and the activation functions must be differentiable, because the weight updates are computed from their derivatives.</p>
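<p>As a rough illustration of gradient descent on y = x**2, the sketch below repeatedly steps against the derivative 2*x. The starting point and step size are assumptions made for this example, not values from the book.</p>

```python
def loss(x):
    """The curve being minimized: y = x**2."""
    return x ** 2

def loss_deriv(x):
    """Derivative of x**2 with respect to x."""
    return 2 * x

x = 3.0        # hypothetical starting point
alpha = 0.2    # hypothetical step size
history = [x]
for _ in range(10):
    # Move against the slope; each step multiplies x by (1 - 2 * alpha).
    x -= alpha * loss_deriv(x)
    history.append(x)

print(history[:4])
print(loss(history[0]), loss(history[-1]))
```

<p>Each step shrinks x toward 0, the minimum of the curve, so the loss falls on every iteration.</p>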
<p></p>
<p><b>"The interface for the neural network is simple: it accepts an <i>input</i> variable as information and a <i>weights</i> variable as knowledge, and it outputs a prediction."</b></p>
<p><b>"Measuring error simplifies the problem of training neural networks to make correct predictions."</b></p>
<p><b>"Different ways of measuring the error prioritize error differently."</b></p>
<p>Error is calculated and applied to modify the weights during each iteration of the training.</p>
<p><b>"<i>alpha</i> is the simplest way to prevent overcorrecting weight updates."</b></p>
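<p>The effect of alpha can be seen on a network with a single weight. The sketch below is a simplified setup with hypothetical values for the input, goal and starting weight: with a large input, the unscaled update overshoots further on every step, while a small alpha lets the error shrink.</p>

```python
def train(alpha, steps=20, weight=0.5, inp=2.0, goal=0.8):
    """Single-weight network: prediction = inp * weight.
    Returns the squared error after the given number of updates."""
    for _ in range(steps):
        prediction = inp * weight
        delta = prediction - goal          # amount and direction of the miss
        weight -= alpha * delta * inp      # correction, scaled by alpha
    return (inp * weight - goal) ** 2

error_no_alpha = train(alpha=1.0)    # correction applied at full strength
error_with_alpha = train(alpha=0.1)  # correction dampened

print(error_no_alpha, error_with_alpha)
```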

In [3]:
%%html
<iframe src="https://numpy.org/doc/stable/reference/generated/numpy.dot.html#numpy.dot" width="800" height="565"></iframe>
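<p>A small sketch of how numpy.dot combines the shapes used later in this notebook. The weight values here are placeholders (0 through 11), not trained weights.</p>

```python
import numpy as np

layer_0 = np.array([[1, 0, 1]])          # shape (1, 3): one sample, three inputs
weights = np.arange(12).reshape(3, 4)    # shape (3, 4): placeholder values 0..11

# (1, 3) dot (3, 4) gives (1, 4): the inner dimensions must match.
out = np.dot(layer_0, weights)
print(out.shape)
print(out)
```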

In [1]:
from numpy import array, random, dot, sum

In [5]:
# initialization, functions

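def gradient_descent(prediction, target):
    ''' Squared error between prediction and target.
        One method for calculating error.
        Used during training.
        Note: despite the name, this measures the error;
        it does not perform the gradient descent update.
    '''
    return (prediction - target)**2

def gradient_descent_deriv(weights):
    ''' Rescales values from [0, 1) to [-1, 1).
        Used to center the random initial weights on 0.
        Note: despite the name, this is not called
        as a derivative in the training loop.
    '''
    return (2 * weights - 1)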
    
def relu(x):
    ''' Activation function.
        Returns x if x > 0; otherwise, returns 0
    '''
    return (x > 0) * x

def relu2deriv(output):
    ''' Derivative of relu, given the output of relu:
        True (1) where output > 0; otherwise, False (0)
    '''
    return output > 0
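<p>A quick check of the two functions on a small array. The input values are hypothetical; the functions are repeated so that the block runs on its own.</p>

```python
import numpy as np

def relu(x):
    return (x > 0) * x

def relu2deriv(output):
    return output > 0

x = np.array([-2, 0, 3])          # hypothetical pre-activation values
activated = relu(x)               # negatives and zero become 0
slope = relu2deriv(activated)     # True only where the unit is active

print(activated)
print(slope)
```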

In [6]:
# input and target, streetlight array 2
streetlights = array([1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1]).reshape(4, 3) # layer 0 (input): 4 samples, 3 lights each
walk_v_stop = array([1, 1, 0, 0]) # labels (desired output) to train the model on, one per row of streetlights
hidden_size = 4
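<p>It can help to look at the data before training. The sketch below pairs each row of lights with its label and checks two things: whether any single light matches the labels on every row, and whether the labels equal the XOR of the first two lights. The XOR check is an observation about this particular array, offered here as a possible reason a hidden layer is useful; it is not a claim taken from the source.</p>

```python
import numpy as np

streetlights = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1]).reshape(4, 3)
walk_v_stop = np.array([1, 1, 0, 0])

for lights, label in zip(streetlights, walk_v_stop):
    print(lights, "->", label)

# Does any single light, on its own, match the label on every row?
single_light_matches = [
    bool((streetlights[:, col] == walk_v_stop).all()) for col in range(3)
]

# XOR of the first two lights, compared against the labels.
xor_first_two = streetlights[:, 0] ^ streetlights[:, 1]

print(single_light_matches)
print(xor_first_two)
```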

In [7]:
from IPython.display import Image
Image("images/chollet_data_moving_through_a_neural_network.jpg", width=300)

<IPython.core.display.Image object>

<p><i>Data representations learned by a digit-classification model</i>: As data moves through a neural network (NN), it is transformed as it combines with weights between the layers.</p>
<p>Hence, the data of input layer 0 is combined with the weights between layers 0 and 1, to become the data in layer 1.</p>
<p>This combining of data with weights proceeds through the neural network, until the transformed data reaches the output layer.</p>
<p>Training a neural network, simplified: the network's output is compared to the desired output, the difference is quantified, and that quantity is used to update the values of the weights throughout the network. This is repeated until the output matches the desired output closely enough, at which point the network is considered trained.</p>
<p>from <i>Chollet, François</i>. <b>Deep Learning With Python</b>. pg 8. Manning Publications, 2021.</p>
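<p>The forward movement of data described above can be sketched for a single input row. The weight values below are hypothetical, fixed by hand so the arithmetic can be followed; they are not the weights this notebook trains.</p>

```python
import numpy as np

def relu(x):
    return (x > 0) * x

layer_0 = np.array([[1.0, 0.0, 1.0]])                 # input, shape (1, 3)

# Hypothetical weights between layers 0 and 1, shape (3, 4).
weights_0_1 = np.array([[ 0.5, -0.3,  0.8, -0.6],
                        [ 0.1,  0.4, -0.2,  0.7],
                        [-0.2,  0.9,  0.3, -0.1]])
# Hypothetical weights between layers 1 and 2, shape (4, 1).
weights_1_2 = np.array([[0.4], [-0.5], [0.2], [0.9]])

layer_1 = relu(np.dot(layer_0, weights_0_1))          # data in layer 1, shape (1, 4)
layer_2 = np.dot(layer_1, weights_1_2)                # output, shape (1, 1)

print(layer_1)
print(layer_2)
```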

In [8]:
# hyperparameters
alpha = 0.2 # scale down correction to prevent overcorrection
hidden_size = 4 # sizing the linear algebra objects holding the weights

In [9]:
# initialization, creation of linalg objects holding weights in the network
# note that in this neural network,
# we set initial values of the weights to random values,
# rescaled from [0, 1) to [-1, 1) by gradient_descent_deriv (2 * r - 1)
weights_0_1 = gradient_descent_deriv(random.random((3, hidden_size)))
weights_1_2 = gradient_descent_deriv(random.random((hidden_size, 1)))
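<p>A short sketch of what this initialization produces: random.random draws from [0, 1), and 2 * r - 1 maps those draws onto [-1, 1), so the starting weights are spread on both sides of 0. The array shape matches weights_0_1; the variable names are used only in this example.</p>

```python
import numpy as np

draws = np.random.random((3, 4))   # uniform samples in [0, 1)
weights = 2 * draws - 1            # rescaled to [-1, 1)

print(draws.min(), draws.max())
print(weights.min(), weights.max())
```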

In [10]:
from IPython.display import Image
Image("images/chollet_understanding_how_deep_learning_works_fig_3.jpg", width=400)

<IPython.core.display.Image object>

<p><i>Understanding How Deep Learning Works</i>: A deep neural network (DNN) is made up of representation or transformation layers. Modifiable weights tune layer values. Input data is propagated forward through the network to a final layer, where it yields a prediction.</p>
<p>In a learning cycle, the error of the prediction is quantified by a loss function; an optimizer then generates updates that are backpropagated through the network to the weights, tuning them.</p>
<p>The cycle is repeated: input propagates forward through the network, yielding a new prediction; its error is quantified; and new updates to the weights in the network are backpropagated.</p>
<p>from <i>Chollet, François</i>. <b>Deep Learning With Python</b>. pp 9-10. Manning Publications, 2021.</p>

In [15]:
# training
for iteration in range(150):
    ''' supervised learning
        streetlights are the input; walk_v_stop are the labels (desired output)
    '''
    # reset layer_2_error to 0
    layer_2_error = 0

    for index, values in enumerate(streetlights):
        # because layer_1 & 2 are calculated using the weights,
        # and the weights get modified each iteration,
        # the layers must be recalculated
        layer_0 = streetlights[index:index+1] # INPUT
        layer_1 = relu(dot(layer_0, weights_0_1)) # layer_0: (1, 3), weights_0_1: (3, 4); output: (1, 4)
        layer_2 = dot(layer_1, weights_1_2) # layer_1: (1, 4), weights_1_2: (4, 1); MODEL OUTPUT: (1, 1)
        ''' the layers are interconnected.
            algebraically, we can restate the last two lines as:
            layer_2 = relu(layer_0.dot(weights_0_1)).dot(weights_1_2)
        '''

        # squared difference between layer 2 output (the prediction) and the target value,
        # accumulated for human consumption, to be printed at the end of the iteration
        layer_2_error += sum(gradient_descent(layer_2, walk_v_stop[index:index+1]))

        # calculate the correction
        layer_2_delta = (layer_2 - walk_v_stop[index:index+1])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1) # backpropagation

        # apply the correction --- note that corrected weights are running sums
        # alpha is a fractional value to dampen correction, preventing overcorrection
        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)

    if (iteration % 10 == 9):
        print(f"Error: {layer_2_error:.25f}")

Error: 0.0000000000003043824283452
Error: 0.0000000000000172920001376
Error: 0.0000000000000009823592775
Error: 0.0000000000000000558078584
Error: 0.0000000000000000031704455
Error: 0.0000000000000000001801132
Error: 0.0000000000000000000102322
Error: 0.0000000000000000000005813
Error: 0.0000000000000000000000330
Error: 0.0000000000000000000000019
Error: 0.0000000000000000000000001
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
Error: 0.0000000000000000000000000
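<p>One way to gain confidence in the backpropagation lines of the training loop is a numerical gradient check. The sketch below compares the deltas computed as in the loop against central finite differences, using hypothetical hand-picked weights and a single sample. Because layer_2_delta is (prediction - target) without a factor of 2, the quantities in the loop correspond to the gradient of half the squared error, so that is the function checked here. The helper names half_squared_error and numeric_grad are introduced only for this example.</p>

```python
import numpy as np

def relu(x):
    return (x > 0) * x

def relu2deriv(output):
    return output > 0

def half_squared_error(w01, w12, x, target):
    pred = np.dot(relu(np.dot(x, w01)), w12)
    return 0.5 * float(np.sum((pred - target) ** 2))

def numeric_grad(param, loss_fn, eps=1e-6):
    """Central finite differences, perturbing one entry of param at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        plus = loss_fn()
        param[idx] = original - eps
        minus = loss_fn()
        param[idx] = original          # restore the entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# Hypothetical sample and weights, fixed by hand.
x = np.array([[1.0, 0.0, 1.0]])
target = np.array([[1.0]])
w01 = np.array([[ 0.5, -0.3,  0.8, -0.6],
                [ 0.1,  0.4, -0.2,  0.7],
                [-0.2,  0.9,  0.3, -0.1]])
w12 = np.array([[0.4], [-0.5], [0.2], [0.9]])

# Analytic gradients, written as in the training loop.
layer_1 = relu(np.dot(x, w01))
layer_2 = np.dot(layer_1, w12)
layer_2_delta = layer_2 - target
layer_1_delta = layer_2_delta.dot(w12.T) * relu2deriv(layer_1)
grad_w12 = layer_1.T.dot(layer_2_delta)
grad_w01 = x.T.dot(layer_1_delta)

# Numerical gradients of the same loss.
num_w01 = numeric_grad(w01, lambda: half_squared_error(w01, w12, x, target))
num_w12 = numeric_grad(w12, lambda: half_squared_error(w01, w12, x, target))

print(np.allclose(grad_w01, num_w01, atol=1e-6))
print(np.allclose(grad_w12, num_w12, atol=1e-6))
```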


In [12]:
%whos

Variable                 Type                          Data/Info
----------------------------------------------------------------
Image                    type                          <class 'IPython.core.display.Image'>
alpha                    float                         0.2
array                    builtin_function_or_method    <built-in function array>
dot                      _ArrayFunctionDispatcher      <built-in function dot>
gradient_descent         function                      <function gradient_descent at 0x782e3f51b240>
gradient_descent_deriv   function                      <function gradient_descen<...>_deriv at 0x782e3f51b2e0>
hidden_size              int                           4
index                    int                           3
iteration                int                           99
layer_0                  ndarray                       1x3: 3 elems, type `int64`, 24 bytes
layer_1                  ndarray                       1x4: 4 elems, type `float64`

In [13]:
dot(layer_0, weights_0_1).shape

(1, 4)

In [14]:
import pickle
with open('pickle/weights_0_1.pickle.bin', 'wb') as out_file:
    pickle.dump(weights_0_1, out_file)
with open('pickle/weights_1_2.pickle.bin', 'wb') as out_file:
    pickle.dump(weights_1_2, out_file)
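<p>Saved weights can be read back with pickle.load. The sketch below round-trips a small placeholder array through a temporary directory so that it runs on its own; the file name and the array values are hypothetical, not the trained weights saved above.</p>

```python
import os
import pickle
import tempfile

import numpy as np

weights = np.array([[0.1, -0.2], [0.3, 0.4]])   # placeholder, not trained values

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights_demo.pickle.bin")   # hypothetical file name
    with open(path, "wb") as out_file:
        pickle.dump(weights, out_file)
    with open(path, "rb") as in_file:
        restored = pickle.load(in_file)

print(np.array_equal(weights, restored))
```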