# Collecting All Weights Into a Vector

## Creating the Weight Vector

To use general-purpose optimization functions, such as the SGD, AdamW, and SCG functions discussed last time, we must create a one-dimensional vector containing all of the weights in our network.  This vector is passed to the optimization function as the parameters to be modified.

Can we just convert our list of weight matrices (`self.Ws`) into a list of flattened matrices?  Let's try that.

In [3]:
import numpy as np

Imagine we have a network with 4 inputs, 2 units in a single hidden layer, and 10 units in the output layer.  We would instantiate this with

        nnet = NeuralNetwork(4, [2], 10)

This network will contain two weight matrices of shapes $5 \times 2$ and $3 \times 10$, right?

In [4]:
W1shape = (5, 2)
W2shape = (3, 10)

In [5]:
n_weights_W1 = np.prod(W1shape)
n_weights_W2 = np.prod(W2shape)
n_weights_W1, n_weights_W2

(10, 30)

In [6]:
W1 = np.arange(0, n_weights_W1).reshape(W1shape)
W2 = (-np.arange(0, n_weights_W2)).reshape(W2shape)

In [7]:
W1, W2

(array([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]),
 array([[  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9],
        [-10, -11, -12, -13, -14, -15, -16, -17, -18, -19],
        [-20, -21, -22, -23, -24, -25, -26, -27, -28, -29]]))

There are several ways to "flatten" a numpy array.

In [8]:
W1.flatten()

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
W1.ravel()

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]:
W1.reshape(-1)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
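The three calls print the same values, but they differ in whether the result shares memory with the original array.  Here is a small sketch of that difference using `np.shares_memory`; the array `W` is just an illustrative stand-in for `W1`.

```python
import numpy as np

W = np.arange(10).reshape(5, 2)

flat_copy = W.flatten()        # always returns a new array
flat_ravel = W.ravel()         # returns a view when the data is contiguous
flat_reshape = W.reshape(-1)   # likewise returns a view when possible

print(np.shares_memory(W, flat_copy))     # False
print(np.shares_memory(W, flat_ravel))    # True
print(np.shares_memory(W, flat_reshape))  # True

# Writing through a view changes the original matrix.
flat_ravel[0] = 9999
print(W[0, 0])  # 9999
```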

Let's use `flatten`, the method with the most understandable name.  We can collect these into a one-dimensional vector using `np.hstack`.

In [11]:
all_weights = np.hstack((W1.flatten(), W2.flatten()))
all_weights

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   0,  -1,  -2,
        -3,  -4,  -5,  -6,  -7,  -8,  -9, -10, -11, -12, -13, -14, -15,
       -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28,
       -29])

When our optimization function updates these values, we want to check that our original weight matrices `W1` and `W2` are also modified.  This is because our `_forward` and `_gradient` functions will still be using `W1` and `W2`, which are stored in our `self.Ws` list.

In [12]:
all_weights[0] = 9999
all_weights

array([9999,    1,    2,    3,    4,    5,    6,    7,    8,    9,    0,
         -1,   -2,   -3,   -4,   -5,   -6,   -7,   -8,   -9,  -10,  -11,
        -12,  -13,  -14,  -15,  -16,  -17,  -18,  -19,  -20,  -21,  -22,
        -23,  -24,  -25,  -26,  -27,  -28,  -29])

In [13]:
W1, W2

(array([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]),
 array([[  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9],
        [-10, -11, -12, -13, -14, -15, -16, -17, -18, -19],
        [-20, -21, -22, -23, -24, -25, -26, -27, -28, -29]]))

Rats!  The vector `all_weights` does not occupy the same memory as `W1` and `W2`.

This is because `flatten` returns a copy, and `np.hstack` always allocates a new contiguous block of memory and copies its arguments into it.  `W1` and `W2` are separate arrays that are not adjacent in memory, so no single `numpy` array can cover both of them without copying.
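To confirm that the copy comes from stacking itself and not only from `flatten`, the sketch below passes views (made with `ravel`) to `np.hstack` and checks the result with `np.shares_memory`.  The name `stacked` is used here only for illustration.

```python
import numpy as np

W1 = np.arange(10).reshape(5, 2)
W2 = -np.arange(30).reshape(3, 10)

# Even when the pieces are views, hstack copies them into a freshly allocated array.
stacked = np.hstack((W1.ravel(), W2.ravel()))

print(np.shares_memory(stacked, W1))  # False
print(np.shares_memory(stacked, W2))  # False

# So a change to the stacked vector never reaches the matrices.
stacked[0] = 9999
print(W1[0, 0])  # 0
```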

What to do, what to do......

## Making the Weight Vector and All Weight Matrices Share the Same Memory Locations

Maybe we can allocate `all_weights` first, then assign `W1` and `W2` to refer to memory locations within `all_weights`.  We can do this by indexing into `all_weights`, starting at the correct index and extending for as many weights as are in `W1` and `W2`.

In [15]:
W1 = all_weights[0:n_weights_W1]
W2 = all_weights[n_weights_W1:n_weights_W1 + n_weights_W2]
W1, W2

(array([9999,    1,    2,    3,    4,    5,    6,    7,    8,    9]),
 array([  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9, -10, -11, -12,
        -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25,
        -26, -27, -28, -29]))

Now, hopefully, if we reshape them the way we want them, they will still refer to the same memory locations.  We can test this by reshaping each weight matrix, then assigning new values to `all_weights` and seeing if `W1` and `W2` have the changed values.

In [16]:
W1 = W1.reshape(W1shape)
W2 = W2.reshape(W2shape)
W1, W2

(array([[9999,    1],
        [   2,    3],
        [   4,    5],
        [   6,    7],
        [   8,    9]]),
 array([[  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9],
        [-10, -11, -12, -13, -14, -15, -16, -17, -18, -19],
        [-20, -21, -22, -23, -24, -25, -26, -27, -28, -29]]))

In [17]:
all_weights *= 2
all_weights

array([19998,     2,     4,     6,     8,    10,    12,    14,    16,
          18,     0,    -2,    -4,    -6,    -8,   -10,   -12,   -14,
         -16,   -18,   -20,   -22,   -24,   -26,   -28,   -30,   -32,
         -34,   -36,   -38,   -40,   -42,   -44,   -46,   -48,   -50,
         -52,   -54,   -56,   -58])

In [None]:
W1, W2

Yippee! This works!
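Rather than comparing printed values by eye, we can ask `numpy` directly whether two arrays overlap in memory.  This is a minimal sketch on a small float vector, assuming the same slice-then-reshape construction as above.

```python
import numpy as np

all_weights = np.arange(40.0)
W1 = all_weights[0:10].reshape(5, 2)
W2 = all_weights[10:40].reshape(3, 10)

print(np.shares_memory(all_weights, W1))  # True
print(np.shares_memory(all_weights, W2))  # True
print(np.shares_memory(W1, W2))           # False: the two slices do not overlap

# Writes propagate in both directions.
all_weights[0] = -5.0
W2[0, 0] = 7.0
print(W1[0, 0], all_weights[10])          # -5.0 7.0
```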

## Assign Changes to Weight Vector Without Creating a Copy

But, watch out.  We must make sure the optimization steps are updating `all_weights` in place, and not making a new version of `all_weights`, which would break the correspondence between `all_weights` and `W1` and `W2`.

Using `*=` worked.  Here is another way of assigning changes that also works.

In [18]:
all_weights[:] = all_weights * -4

In [20]:
all_weights

array([-79992,     -8,    -16,    -24,    -32,    -40,    -48,    -56,
          -64,    -72,      0,      8,     16,     24,     32,     40,
           48,     56,     64,     72,     80,     88,     96,    104,
          112,    120,    128,    136,    144,    152,    160,    168,
          176,    184,    192,    200,    208,    216,    224,    232])

In [19]:
W1, W2

(array([[-79992,     -8],
        [   -16,    -24],
        [   -32,    -40],
        [   -48,    -56],
        [   -64,    -72]]),
 array([[  0,   8,  16,  24,  32,  40,  48,  56,  64,  72],
        [ 80,  88,  96, 104, 112, 120, 128, 136, 144, 152],
        [160, 168, 176, 184, 192, 200, 208, 216, 224, 232]]))

But here is one form that does make a new copy of `all_weights`.

In [22]:
all_weights = all_weights + 0.1

In [23]:
all_weights

array([-7.99918e+04, -7.80000e+00, -1.58000e+01, -2.38000e+01,
       -3.18000e+01, -3.98000e+01, -4.78000e+01, -5.58000e+01,
       -6.38000e+01, -7.18000e+01,  2.00000e-01,  8.20000e+00,
        1.62000e+01,  2.42000e+01,  3.22000e+01,  4.02000e+01,
        4.82000e+01,  5.62000e+01,  6.42000e+01,  7.22000e+01,
        8.02000e+01,  8.82000e+01,  9.62000e+01,  1.04200e+02,
        1.12200e+02,  1.20200e+02,  1.28200e+02,  1.36200e+02,
        1.44200e+02,  1.52200e+02,  1.60200e+02,  1.68200e+02,
        1.76200e+02,  1.84200e+02,  1.92200e+02,  2.00200e+02,
        2.08200e+02,  2.16200e+02,  2.24200e+02,  2.32200e+02])

In [None]:
W1, W2

The lesson here is, if you are assigning changes to `all_weights`, or to `W1` and `W2`, always use `[:]` on the left side of the assignment!!
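As a sketch of what this means for an optimizer step, the example below applies a made-up gradient-descent update in three in-place forms and then in the rebinding form.  The names `gradient` and `learning_rate` and their values are assumptions for illustration only.

```python
import numpy as np

all_weights = np.linspace(-1.0, 1.0, 6)
W = all_weights[:6].reshape(2, 3)
gradient = np.ones_like(all_weights)   # made-up gradient for illustration
learning_rate = 0.1

# In-place forms: W keeps tracking all_weights.
all_weights -= learning_rate * gradient
all_weights[:] = all_weights - learning_rate * gradient
np.subtract(all_weights, learning_rate * gradient, out=all_weights)
print(np.shares_memory(all_weights, W))   # True

# Rebinding form: the name all_weights now refers to a new array.
all_weights = all_weights - learning_rate * gradient
print(np.shares_memory(all_weights, W))   # False
```

Note that the sketch uses floating-point weights.  With the integer arrays from the earlier cells, `all_weights += 0.1` would raise a casting error, and `all_weights[:] = all_weights + 0.1` would silently truncate the result back to integers.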

## Creating Weights for Any Neural Network

So, how should we create our weights in our `NeuralNetwork` class, to handle any number of layers?  Let's say we are in the constructor for `NeuralNetwork` and have `n_inputs`, `n_hiddens_each_layer` and `n_outputs` available. First, define all of the weight matrices' shapes.


In [24]:
n_inputs = 2
n_hiddens_each_layer = [5, 5, 4]
n_outputs = 1

In [25]:
ni = n_inputs
Wshapes = []
for nh in n_hiddens_each_layer:
    Wshapes.append((1 + ni, nh))
    ni = nh
Wshapes.append((1 + ni, n_outputs))

Wshapes    

[(3, 5), (6, 5), (6, 4), (5, 1)]

Now we can create the vector of all weights after adding up the number of weights in each layer.

In [26]:
[np.prod(Wshape) for Wshape in Wshapes]

[15, 30, 24, 5]

In [27]:
n_weights = np.sum([np.prod(Wshape) for Wshape in Wshapes])
n_weights

74
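The shape and count calculations can be collected into one helper.  This is only a sketch: the function name `make_weight_shapes` is hypothetical, and the extra row in each shape is assumed to hold the bias weights.

```python
import numpy as np

def make_weight_shapes(n_inputs, n_hiddens_each_layer, n_outputs):
    """Return one (1 + n_in, n_out) shape per layer.

    The extra row is assumed to hold the bias weights.
    """
    layer_sizes = [n_inputs] + list(n_hiddens_each_layer) + [n_outputs]
    return [(1 + n_in, n_out)
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

Wshapes = make_weight_shapes(2, [5, 5, 4], 1)
n_weights = sum(int(np.prod(shape)) for shape in Wshapes)

print(Wshapes)    # [(3, 5), (6, 5), (6, 4), (5, 1)]
print(n_weights)  # 74
```

An empty list of hidden layers gives a single weight matrix of shape `(1 + n_inputs, n_outputs)`.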

In [28]:
all_weights = np.random.uniform(-1, 1, n_weights)
all_weights

array([-0.65087874, -0.79507308,  0.45242363, -0.12001258, -0.12695171,
       -0.88881351,  0.95602824,  0.90609769, -0.22648106, -0.05848763,
        0.88202565, -0.25044262, -0.39985151,  0.97396287, -0.52498817,
       -0.29327888,  0.20102603,  0.27018959, -0.28339695, -0.26588622,
       -0.9099878 , -0.86398287,  0.48332026,  0.12466235, -0.57145465,
        0.35344478,  0.01286574,  0.95561807, -0.60697185, -0.75745204,
       -0.36580154,  0.25832087, -0.78852112,  0.84095877, -0.18104839,
       -0.73727915,  0.6467106 , -0.83570924, -0.37132305, -0.5759031 ,
       -0.88331472,  0.72855873, -0.01505841,  0.77150906,  0.05207972,
       -0.54023862, -0.61616603, -0.99772161, -0.84624556,  0.80151901,
       -0.11507971, -0.94227705, -0.21688885, -0.95792591, -0.95400767,
        0.08563548,  0.46232305,  0.75560665, -0.38138449,  0.59447186,
       -0.96878661, -0.64114824, -0.18845752,  0.66176745, -0.66572959,
       -0.12297388, -0.52605101,  0.24270048, -0.95363181,  0.88

Now we need to define our list of weight matrices for each layer as views into this vector.

In [29]:
Ws = []
first_index = 0
for Wshape in Wshapes:
    last_index = first_index + np.prod(Wshape)
    nin = Wshape[0]
    W = all_weights[first_index:last_index].reshape(Wshape) / np.sqrt(nin)
    Ws.append(W)
    first_index = last_index

Ws

[array([[-0.37578502, -0.45903566,  0.2612069 , -0.0692893 , -0.0732956 ],
        [-0.51315672,  0.55196316,  0.52313575, -0.1307589 , -0.03376785],
        [ 0.50923775, -0.14459311, -0.23085438,  0.56231773, -0.30310206]]),
 array([[-0.1197306 ,  0.08206853,  0.11030444, -0.11569632, -0.10854759],
        [-0.37150096, -0.35271953,  0.19731467,  0.05089319, -0.23329538],
        [ 0.14429323,  0.00525242,  0.39012944, -0.24779522, -0.3092285 ],
        [-0.14933785,  0.10545905, -0.3219124 ,  0.34331998, -0.0739127 ],
        [-0.30099295,  0.2640185 , -0.34117687, -0.151592  , -0.23511145],
        [-0.36061172,  0.29743285, -0.00614757,  0.31496726,  0.02126146]]),
 array([[-0.22055149, -0.25154873, -0.40731814, -0.3454783 ],
        [ 0.32721877, -0.04698109, -0.38468299, -0.0885445 ],
        [-0.39107162, -0.389472  ,  0.03496054,  0.1887426 ],
        [ 0.30847512, -0.15569957,  0.24269212, -0.39550548],
        [-0.26174767, -0.07693746,  0.27016543, -0.27178297],
        [-0

In [30]:
all_weights

array([-0.65087874, -0.79507308,  0.45242363, -0.12001258, -0.12695171,
       -0.88881351,  0.95602824,  0.90609769, -0.22648106, -0.05848763,
        0.88202565, -0.25044262, -0.39985151,  0.97396287, -0.52498817,
       -0.29327888,  0.20102603,  0.27018959, -0.28339695, -0.26588622,
       -0.9099878 , -0.86398287,  0.48332026,  0.12466235, -0.57145465,
        0.35344478,  0.01286574,  0.95561807, -0.60697185, -0.75745204,
       -0.36580154,  0.25832087, -0.78852112,  0.84095877, -0.18104839,
       -0.73727915,  0.6467106 , -0.83570924, -0.37132305, -0.5759031 ,
       -0.88331472,  0.72855873, -0.01505841,  0.77150906,  0.05207972,
       -0.54023862, -0.61616603, -0.99772161, -0.84624556,  0.80151901,
       -0.11507971, -0.94227705, -0.21688885, -0.95792591, -0.95400767,
        0.08563548,  0.46232305,  0.75560665, -0.38138449,  0.59447186,
       -0.96878661, -0.64114824, -0.18845752,  0.66176745, -0.66572959,
       -0.12297388, -0.52605101,  0.24270048, -0.95363181,  0.88

Uh oh.  Not the same values!!!  What happened?

Right.  Somewhere we performed an operation that caused a copy to be made.

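Must be the division.  The expression `... / np.sqrt(nin)` builds a new array to hold its result, so `W` no longer refers to memory inside `all_weights`.  Let's rewrite that code using `/=`, which divides in place.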

In [47]:
all_weights = np.random.uniform(-1, 1, n_weights)

Ws = []
first_index = 0
for Wshape in Wshapes:
    last_index = first_index + np.prod(Wshape)
    nin = Wshape[0]
    W = all_weights[first_index:last_index].reshape(Wshape)
    W /= np.sqrt(nin)
    Ws.append(W)
    first_index = last_index

Ws

[array([[-0.40969216,  0.04133855, -0.56128029,  0.45462498,  0.56030044],
        [ 0.46000466,  0.37721153, -0.10279191, -0.44368135,  0.21354961],
        [ 0.53388287,  0.35757322,  0.06017105, -0.44949076, -0.14913744]]),
 array([[ 0.36442311,  0.20195073,  0.09952844,  0.24651935,  0.08488891],
        [-0.03340747, -0.10892462,  0.08592047, -0.25973948, -0.13652333],
        [ 0.29521669, -0.30356495,  0.25090528,  0.03745543,  0.27744282],
        [-0.05828996, -0.22619273, -0.22286967, -0.38864965, -0.33984051],
        [ 0.24527944, -0.23858636, -0.20241847,  0.26209233,  0.16592285],
        [-0.0692501 , -0.24102665, -0.3293993 , -0.36205287,  0.1472432 ]]),
 array([[-0.35581485,  0.23450013, -0.29436977, -0.25344666],
        [ 0.31894569, -0.40030664,  0.21805523, -0.35270905],
        [ 0.39866461, -0.13211033, -0.13464466, -0.04075297],
        [ 0.0939239 , -0.1034243 , -0.16778267,  0.1276572 ],
        [-0.3174877 , -0.11936896, -0.08304427, -0.20484837],
        [-0

In [48]:
all_weights

array([-0.40969216,  0.04133855, -0.56128029,  0.45462498,  0.56030044,
        0.46000466,  0.37721153, -0.10279191, -0.44368135,  0.21354961,
        0.53388287,  0.35757322,  0.06017105, -0.44949076, -0.14913744,
        0.36442311,  0.20195073,  0.09952844,  0.24651935,  0.08488891,
       -0.03340747, -0.10892462,  0.08592047, -0.25973948, -0.13652333,
        0.29521669, -0.30356495,  0.25090528,  0.03745543,  0.27744282,
       -0.05828996, -0.22619273, -0.22286967, -0.38864965, -0.33984051,
        0.24527944, -0.23858636, -0.20241847,  0.26209233,  0.16592285,
       -0.0692501 , -0.24102665, -0.3293993 , -0.36205287,  0.1472432 ,
       -0.35581485,  0.23450013, -0.29436977, -0.25344666,  0.31894569,
       -0.40030664,  0.21805523, -0.35270905,  0.39866461, -0.13211033,
       -0.13464466, -0.04075297,  0.0939239 , -0.1034243 , -0.16778267,
        0.1276572 , -0.3174877 , -0.11936896, -0.08304427, -0.20484837,
       -0.39216118, -0.18717822, -0.33681754, -0.15565863,  0.02
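The difference between the two versions comes down to the binary operator `/` versus the augmented assignment `/=`.  Here is a small sketch isolating just that difference; the names `view` and `scaled_copy` are illustrative.

```python
import numpy as np

all_weights = np.ones(6)
view = all_weights[0:6].reshape(2, 3)

scaled_copy = view / 2.0     # binary operator: the result lives in a new array
print(np.shares_memory(all_weights, scaled_copy))  # False
print(all_weights[0])                              # 1.0

view /= 2.0                  # augmented assignment: writes into the existing memory
print(np.shares_memory(all_weights, view))         # True
print(all_weights[0])                              # 0.5
```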

In [49]:
all_weights[:] = all_weights * 1000
all_weights

array([-409.69215648,   41.33855436, -561.28029172,  454.62498048,
        560.30043656,  460.00465695,  377.21152706, -102.79190882,
       -443.68134716,  213.54961215,  533.8828659 ,  357.57322322,
         60.17105201, -449.49075751, -149.13743997,  364.42311011,
        201.95073497,   99.52844231,  246.51934829,   84.88891371,
        -33.40747358, -108.92461655,   85.92047304, -259.73947833,
       -136.52333234,  295.21668576, -303.5649511 ,  250.90527968,
         37.4554331 ,  277.44281766,  -58.28996436, -226.19272531,
       -222.86967345, -388.6496502 , -339.8405063 ,  245.27943533,
       -238.58636447, -202.41846879,  262.09232613,  165.92285224,
        -69.25009901, -241.02665242, -329.39930408, -362.05287199,
        147.24319503, -355.81485225,  234.50013269, -294.36977466,
       -253.44666309,  318.94568958, -400.3066394 ,  218.0552266 ,
       -352.70904963,  398.66460516, -132.1103318 , -134.64465567,
        -40.75296678,   93.92389691, -103.42430448, -167.78266

In [50]:
Ws

[array([[-409.69215648,   41.33855436, -561.28029172,  454.62498048,
          560.30043656],
        [ 460.00465695,  377.21152706, -102.79190882, -443.68134716,
          213.54961215],
        [ 533.8828659 ,  357.57322322,   60.17105201, -449.49075751,
         -149.13743997]]),
 array([[ 364.42311011,  201.95073497,   99.52844231,  246.51934829,
           84.88891371],
        [ -33.40747358, -108.92461655,   85.92047304, -259.73947833,
         -136.52333234],
        [ 295.21668576, -303.5649511 ,  250.90527968,   37.4554331 ,
          277.44281766],
        [ -58.28996436, -226.19272531, -222.86967345, -388.6496502 ,
         -339.8405063 ],
        [ 245.27943533, -238.58636447, -202.41846879,  262.09232613,
          165.92285224],
        [ -69.25009901, -241.02665242, -329.39930408, -362.05287199,
          147.24319503]]),
 array([[-355.81485225,  234.50013269, -294.36977466, -253.44666309],
        [ 318.94568958, -400.3066394 ,  218.0552266 , -352.70904963],
        [ 

In [51]:
Ws[-1][:] = np.zeros_like(Ws[-1])
Ws

[array([[-409.69215648,   41.33855436, -561.28029172,  454.62498048,
          560.30043656],
        [ 460.00465695,  377.21152706, -102.79190882, -443.68134716,
          213.54961215],
        [ 533.8828659 ,  357.57322322,   60.17105201, -449.49075751,
         -149.13743997]]),
 array([[ 364.42311011,  201.95073497,   99.52844231,  246.51934829,
           84.88891371],
        [ -33.40747358, -108.92461655,   85.92047304, -259.73947833,
         -136.52333234],
        [ 295.21668576, -303.5649511 ,  250.90527968,   37.4554331 ,
          277.44281766],
        [ -58.28996436, -226.19272531, -222.86967345, -388.6496502 ,
         -339.8405063 ],
        [ 245.27943533, -238.58636447, -202.41846879,  262.09232613,
          165.92285224],
        [ -69.25009901, -241.02665242, -329.39930408, -362.05287199,
          147.24319503]]),
 array([[-355.81485225,  234.50013269, -294.36977466, -253.44666309],
        [ 318.94568958, -400.3066394 ,  218.0552266 , -352.70904963],
        [ 

In [52]:
all_weights

array([-409.69215648,   41.33855436, -561.28029172,  454.62498048,
        560.30043656,  460.00465695,  377.21152706, -102.79190882,
       -443.68134716,  213.54961215,  533.8828659 ,  357.57322322,
         60.17105201, -449.49075751, -149.13743997,  364.42311011,
        201.95073497,   99.52844231,  246.51934829,   84.88891371,
        -33.40747358, -108.92461655,   85.92047304, -259.73947833,
       -136.52333234,  295.21668576, -303.5649511 ,  250.90527968,
         37.4554331 ,  277.44281766,  -58.28996436, -226.19272531,
       -222.86967345, -388.6496502 , -339.8405063 ,  245.27943533,
       -238.58636447, -202.41846879,  262.09232613,  165.92285224,
        -69.25009901, -241.02665242, -329.39930408, -362.05287199,
        147.24319503, -355.81485225,  234.50013269, -294.36977466,
       -253.44666309,  318.94568958, -400.3066394 ,  218.0552266 ,
       -352.70904963,  398.66460516, -132.1103318 , -134.64465567,
        -40.75296678,   93.92389691, -103.42430448, -167.78266

In [None]:
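if 'all_weights' in globals():  # `if all_weights:` raises ValueError for a multi-element array
    all_gradients = np.zeros_like(all_weights)
    Grads = []  # one gradient view per layer, laid out like the weight views in Ws
    first_index = 0
    for Wshape in Wshapes:
        last_index = first_index + np.prod(Wshape)
        Grads.append(all_gradients[first_index:last_index].reshape(Wshape))
        first_index = last_index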

In [None]:
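# `if not all_weights:` fails for arrays (ambiguous truth value) and for
# undefined names (NameError), so test for the name itself.
if 'all_weights' not in globals():
    all_weights = np.random.uniform(-1, 1, n_weights)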

Ws = []
first_index = 0
for Wshape in Wshapes:
    last_index = first_index + np.prod(Wshape)
    nin = Wshape[0]
    W = all_weights[first_index:last_index].reshape(Wshape)
    W /= np.sqrt(nin)
    Ws.append(W)
    first_index = last_index
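Putting the pieces together, the following sketch wraps the allocation and view construction in one function and checks that every matrix shares memory with the vector.  The function name `make_weights` and the optional `rng` argument are assumptions for illustration, not part of the `NeuralNetwork` class.

```python
import numpy as np

def make_weights(Wshapes, rng=None):
    """Allocate one weight vector and return it with a list of per-layer views."""
    rng = np.random.default_rng() if rng is None else rng
    n_weights = sum(int(np.prod(shape)) for shape in Wshapes)
    all_weights = rng.uniform(-1, 1, n_weights)

    Ws = []
    first_index = 0
    for shape in Wshapes:
        last_index = first_index + int(np.prod(shape))
        W = all_weights[first_index:last_index].reshape(shape)
        W /= np.sqrt(shape[0])   # in-place scaling keeps W a view
        Ws.append(W)
        first_index = last_index
    return all_weights, Ws

all_weights, Ws = make_weights([(3, 5), (6, 5), (6, 4), (5, 1)],
                               np.random.default_rng(0))
print(all(np.shares_memory(all_weights, W) for W in Ws))  # True

# An in-place update of the vector is visible in every matrix.
all_weights[:] = 0
print(all(np.all(W == 0) for W in Ws))  # True
```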

In [55]:
if not all_test:
    print("test doesn't exist")
else:
    print("test exists")

NameError: name 'all_test' is not defined