# Collecting All Weights Into a Vector

In [2]:
%load_ext autoreload
%autoreload 2

topics = ['Creating the Weight Vector',
          'Create Weight Vector and All Weight Matrices in Same Memory Locations',
          'Assign Changes to Weight Vector Without Creating a Copy',
          'Creating Weights for a Neural Network']



## Creating the Weight Vector

To use general-purpose optimization functions, like the SGD, AdamW, and SCG functions discussed last time, we must create a vector (one-dimensional) containing all of the weights in our network.  This vector will be passed as an argument to optimization functions as the parameters to be modified.

Can we just flatten each matrix in our list of weight matrices (`self.Ws`) and join the results into one vector?  Let's try that.

In [3]:
import numpy as np

Imagine we have a network with 4 inputs, 2 units in a single hidden layer, and 10 units in the output layer.  We would instantiate this with

        nnet = NeuralNetwork(4, [2], 10)

This network will contain two weight matrices of shapes $5 \times 2$ and $3 \times 10$, right?  Each matrix has one more row than the number of inputs to its layer, to hold the bias weights.

In [4]:
W1shape = (5, 2)
W2shape = (3, 10)

In [5]:
n_weights_W1 = np.prod(W1shape)
n_weights_W2 = np.prod(W2shape)
n_weights_W1, n_weights_W2

(10, 30)

In [6]:
W1 = np.arange(0, n_weights_W1).reshape(W1shape)
W2 = (-np.arange(0, n_weights_W2)).reshape(W2shape)

In [7]:
W1, W2

(array([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]),
 array([[  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9],
        [-10, -11, -12, -13, -14, -15, -16, -17, -18, -19],
        [-20, -21, -22, -23, -24, -25, -26, -27, -28, -29]]))

There are several ways to "flatten" a numpy array.

In [8]:
W1.flatten()

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
W1.ravel()

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]:
W1.reshape(-1)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
W1.flatten?

Docstring:
a.flatten(order='C')

Return a copy of the array collapsed into one dimension.

Parameters
----------
order : {'C', 'F', 'A', 'K'}, optional
    'C' means to flatten in row-major (C-style) order.
    'F' means to flatten in column-major (Fortran-
    style) order. 'A' means to flatten in column-major
    order if `a` is Fortran *contiguous* in memory,
    row-major order otherwise. 'K' means to flatten
    `a` in the order the elements occur in memory.
    The default is 'C'.

Returns
-------
y : ndarray
    A copy of the input array, flattened to one dimension.

See Also
--------
ravel : Return a flattened array.
flat : A 1-D flat iterator over the array.

Examples
--------
>>> a = np.array([[1,2], [3,4]])
>>> a.flatten()
array([1, 2, 3, 4])
>>> a.flatten('F')
array([1, 3, 2, 4])
Type:      builtin_function_or_method

In [13]:
W1.ravel?

Docstring:
a.ravel([order])

Return a flattened array.

Refer to `numpy.ravel` for full documentation.

See Also
--------
numpy.ravel : equivalent function

ndarray.flat : a flat iterator on the array.
Type:      builtin_function_or_method

Let's use `flatten`, the method with the most understandable name.  We can collect these into a one-dimensional vector using `np.hstack`.

In [15]:
all_weights = np.hstack((W1.flatten(), W2.flatten()))
all_weights

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   0,  -1,  -2,
        -3,  -4,  -5,  -6,  -7,  -8,  -9, -10, -11, -12, -13, -14, -15,
       -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28,
       -29])

When our optimization function updates these values, our original weight matrices `W1` and `W2` must change too, because our `_forward` and `_gradient` functions will still be using `W1` and `W2`, which are stored in our `self.Ws` list.  Let's check whether they do.

In [16]:
all_weights[0] = 9999
all_weights

array([9999,    1,    2,    3,    4,    5,    6,    7,    8,    9,    0,
         -1,   -2,   -3,   -4,   -5,   -6,   -7,   -8,   -9,  -10,  -11,
        -12,  -13,  -14,  -15,  -16,  -17,  -18,  -19,  -20,  -21,  -22,
        -23,  -24,  -25,  -26,  -27,  -28,  -29])

In [17]:
W1, W2

(array([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]),
 array([[  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9],
        [-10, -11, -12, -13, -14, -15, -16, -17, -18, -19],
        [-20, -21, -22, -23, -24, -25, -26, -27, -28, -29]]))

Rats!  The vector `all_weights` does not occupy the same memory locations as `W1` and `W2`.

This is because `flatten` returns a copy of each matrix, and `np.hstack` then builds a new `numpy` array that stores all of its elements in one contiguous block of memory. `W1` and `W2` were allocated separately, so they are not next to each other in memory, and the new array cannot share their memory locations.
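A quick way to confirm where the copies happen is `np.shares_memory`, which reports whether two arrays overlap in memory. The following is a minimal sketch using the same shapes as above; the names `flat_copy`, `flat_view`, `stacked`, and `stacked_from_views` are introduced here only for illustration.

```python
import numpy as np

W1 = np.arange(10).reshape(5, 2)
W2 = (-np.arange(30)).reshape(3, 10)

flat_copy = W1.flatten()  # flatten always returns a copy
flat_view = W1.ravel()    # ravel returns a view here, because W1 is C-contiguous

# hstack allocates a new array, whether it is given copies or views
stacked = np.hstack((W1.flatten(), W2.flatten()))
stacked_from_views = np.hstack((W1.ravel(), W2.ravel()))

print(np.shares_memory(W1, flat_copy))           # False
print(np.shares_memory(W1, flat_view))           # True
print(np.shares_memory(W1, stacked))             # False
print(np.shares_memory(W1, stacked_from_views))  # False
```

So switching from `flatten` to `ravel` would not help on its own: the joined vector is a new array either way.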

What to do, what to do......

## Create Weight Vector and All Weight Matrices in Same Memory Locations

Maybe we can allocate `all_weights` first, then assign `W1` and `W2` to refer to memory locations within `all_weights`.  We can do this by indexing into `all_weights`, starting at the correct index and extending for as many weights as are in `W1` and `W2`.

In [18]:
# Basic slicing of a numpy array returns a view, not a copy, so W1 and W2
# now refer to memory locations inside all_weights.
W1 = all_weights[0:n_weights_W1]
W2 = all_weights[n_weights_W1:n_weights_W1 + n_weights_W2]
W1, W2

(array([9999,    1,    2,    3,    4,    5,    6,    7,    8,    9]),
 array([  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9, -10, -11, -12,
        -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25,
        -26, -27, -28, -29]))

Now, hopefully, if we reshape them the way we want them, they still refer to the same memory locations.  We can test this by reshaping each weight matrix, then assigning a new value to `all_weights` and seeing if `W1` and `W2` have the changed values.

In [19]:
W1 = W1.reshape(W1shape)
W2 = W2.reshape(W2shape)
W1, W2

(array([[9999,    1],
        [   2,    3],
        [   4,    5],
        [   6,    7],
        [   8,    9]]),
 array([[  0,  -1,  -2,  -3,  -4,  -5,  -6,  -7,  -8,  -9],
        [-10, -11, -12, -13, -14, -15, -16, -17, -18, -19],
        [-20, -21, -22, -23, -24, -25, -26, -27, -28, -29]]))

In [20]:
all_weights *= 2
all_weights

array([19998,     2,     4,     6,     8,    10,    12,    14,    16,
          18,     0,    -2,    -4,    -6,    -8,   -10,   -12,   -14,
         -16,   -18,   -20,   -22,   -24,   -26,   -28,   -30,   -32,
         -34,   -36,   -38,   -40,   -42,   -44,   -46,   -48,   -50,
         -52,   -54,   -56,   -58])

In [21]:
W1, W2

(array([[19998,     2],
        [    4,     6],
        [    8,    10],
        [   12,    14],
        [   16,    18]]),
 array([[  0,  -2,  -4,  -6,  -8, -10, -12, -14, -16, -18],
        [-20, -22, -24, -26, -28, -30, -32, -34, -36, -38],
        [-40, -42, -44, -46, -48, -50, -52, -54, -56, -58]]))

Yippee! This works!
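The same conclusion can be checked without comparing printed values. This is a small sketch, assuming the layout used above (10 weights for `W1` followed by 30 for `W2`); it relies only on `np.shares_memory` and the `flags.owndata` attribute.

```python
import numpy as np

all_weights = np.arange(40)
W1 = all_weights[0:10].reshape(5, 2)
W2 = all_weights[10:40].reshape(3, 10)

# Both matrices are views: they overlap all_weights and own no data themselves
print(np.shares_memory(all_weights, W1))  # True
print(np.shares_memory(all_weights, W2))  # True
print(W1.flags.owndata, W2.flags.owndata)  # False False

# Changes are visible in both directions
all_weights[0] = 9999
W2[2, 9] = 7
print(W1[0, 0])         # 9999
print(all_weights[-1])  # 7
```

Note that `W1` and `W2` do not overlap each other, since their slices cover disjoint ranges of `all_weights`.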

## Assign Changes to Weight Vector Without Creating a Copy

But, watch out.  We must make sure the optimization steps are updating `all_weights` in place, and not making a new version of `all_weights`, which would break the correspondence between `all_weights` and `W1` and `W2`.

Using `*=` worked.  Here is another way of assigning changes that also works.

In [22]:
# Assigning to the slice `[:]` writes the new values into the existing all_weights array
# instead of binding the name all_weights to a new array

all_weights[:] = all_weights * -4

In [23]:
all_weights

array([-79992,     -8,    -16,    -24,    -32,    -40,    -48,    -56,
          -64,    -72,      0,      8,     16,     24,     32,     40,
           48,     56,     64,     72,     80,     88,     96,    104,
          112,    120,    128,    136,    144,    152,    160,    168,
          176,    184,    192,    200,    208,    216,    224,    232])

In [24]:
W1, W2

(array([[-79992,     -8],
        [   -16,    -24],
        [   -32,    -40],
        [   -48,    -56],
        [   -64,    -72]]),
 array([[  0,   8,  16,  24,  32,  40,  48,  56,  64,  72],
        [ 80,  88,  96, 104, 112, 120, 128, 136, 144, 152],
        [160, 168, 176, 184, 192, 200, 208, 216, 224, 232]]))

But here is one form that does make a new copy of `all_weights`.

In [25]:
# Plain assignment binds the name all_weights to the NEW array produced by `+`.
# The old array, which W1 and W2 still view, is left unchanged.

all_weights = all_weights + 0.1

In [26]:
all_weights

array([-7.99919e+04, -7.90000e+00, -1.59000e+01, -2.39000e+01,
       -3.19000e+01, -3.99000e+01, -4.79000e+01, -5.59000e+01,
       -6.39000e+01, -7.19000e+01,  1.00000e-01,  8.10000e+00,
        1.61000e+01,  2.41000e+01,  3.21000e+01,  4.01000e+01,
        4.81000e+01,  5.61000e+01,  6.41000e+01,  7.21000e+01,
        8.01000e+01,  8.81000e+01,  9.61000e+01,  1.04100e+02,
        1.12100e+02,  1.20100e+02,  1.28100e+02,  1.36100e+02,
        1.44100e+02,  1.52100e+02,  1.60100e+02,  1.68100e+02,
        1.76100e+02,  1.84100e+02,  1.92100e+02,  2.00100e+02,
        2.08100e+02,  2.16100e+02,  2.24100e+02,  2.32100e+02])

In [27]:
W1, W2

(array([[-79992,     -8],
        [   -16,    -24],
        [   -32,    -40],
        [   -48,    -56],
        [   -64,    -72]]),
 array([[  0,   8,  16,  24,  32,  40,  48,  56,  64,  72],
        [ 80,  88,  96, 104, 112, 120, 128, 136, 144, 152],
        [160, 168, 176, 184, 192, 200, 208, 216, 224, 232]]))

The lesson here is, if you are assigning changes to `all_weights`, or to `W1` and `W2`, always update in place: use an operator like `*=`, or use `[:]` on the left side of the assignment!!
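To make the lesson concrete, here is a sketch of three in-place update forms followed by the rebinding form that breaks the link. The `gradient` and `learning_rate` values are made up for illustration, and the update rule is a plain gradient step, not the optimizers from the earlier notes.

```python
import numpy as np

all_weights = np.linspace(-1.0, 1.0, 6)
W = all_weights[:].reshape(2, 3)   # a view into all_weights

gradient = np.ones(6)              # hypothetical gradient values
learning_rate = 0.1                # hypothetical step size

# Three in-place forms: each writes into the existing array
all_weights -= learning_rate * gradient
all_weights[:] = all_weights - learning_rate * gradient
np.subtract(all_weights, learning_rate * gradient, out=all_weights)
print(np.shares_memory(all_weights, W))  # True

# Rebinding form: the name now refers to a new array, and W is left behind
all_weights = all_weights - learning_rate * gradient
print(np.shares_memory(all_weights, W))  # False
```

After the last line, `W` still holds the values from the third update, while `all_weights` holds the values from the fourth.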

## Creating Weights for a Neural Network

So, how should we create our weights in our `NeuralNetwork` class, to handle any number of layers?  Let's say we are in the constructor for `NeuralNetwork` and have `n_inputs`, `n_hiddens_each_layer` and `n_outputs` available. First, define all of the weight matrices' shapes.


In [28]:
n_inputs = 2
n_hiddens_each_layer = [5, 5, 4]
n_outputs = 1

In [29]:
ni = n_inputs
Wshapes = []
for nh in n_hiddens_each_layer:
    Wshapes.append((1 + ni, nh))
    ni = nh
Wshapes.append((1 + ni, n_outputs))

Wshapes

[(3, 5), (6, 5), (6, 4), (5, 1)]

Now we can create the vector of all weights after adding up the number of weights in each layer.

In [30]:
[np.prod(Wshape) for Wshape in Wshapes]

[15, 30, 24, 5]

In [31]:
n_weights = np.sum([np.prod(Wshape) for Wshape in Wshapes])
n_weights

74

In [32]:
all_weights = np.random.uniform(-1, 1, n_weights)
all_weights

array([-0.15654574,  0.96120866, -0.25053158, -0.03625408,  0.96800706,
        0.03285966,  0.93657656, -0.40201123,  0.20601533, -0.66031644,
        0.56540737,  0.75327219,  0.96051533, -0.88015158, -0.55036538,
        0.28530437, -0.73984187, -0.33085502, -0.96054889, -0.26604115,
        0.00393425,  0.82922681,  0.19813682,  0.99696659,  0.82435952,
        0.30602155,  0.85149758, -0.04595768,  0.86259388,  0.14264346,
       -0.94903976, -0.13988279, -0.06436105, -0.89970127, -0.77246432,
        0.98438315, -0.42963167, -0.12062856, -0.62122816, -0.91923901,
        0.32361139,  0.30342279, -0.45426632, -0.59299987, -0.61042332,
       -0.49448293, -0.68416294,  0.4593618 ,  0.55820322, -0.25462592,
       -0.27116073,  0.69489367, -0.44807672, -0.63664485,  0.13730019,
       -0.43905114, -0.00466345, -0.03382306, -0.50644977, -0.09369954,
        0.23068636, -0.4794545 , -0.87695789,  0.36214994, -0.70508442,
       -0.59714267, -0.58459673,  0.99346046, -0.97779107,  0.88

Now we need to define our list of weight matrices for each layer as views into this vector.

In [33]:
Ws = []
first_index = 0
for Wshape in Wshapes:
    last_index = first_index + np.prod(Wshape)
    nin = Wshape[0]
    W = all_weights[first_index:last_index].reshape(Wshape) / np.sqrt(nin)
    Ws.append(W)
    first_index = last_index

Ws

[array([[-0.09038172,  0.55495408, -0.14464447, -0.0209313 ,  0.55887914],
        [ 0.01897153,  0.54073273, -0.23210129,  0.118943  , -0.38123387],
        [ 0.3264381 ,  0.4349019 ,  0.55455379, -0.50815575, -0.3177536 ]]),
 array([[ 0.11647502, -0.30203918, -0.135071  , -0.39214244, -0.10861084],
        [ 0.00160615,  0.33853043,  0.08088902,  0.40700991,  0.33654337],
        [ 0.12493277,  0.34762243, -0.01876214,  0.35215248,  0.05823395],
        [-0.38744386, -0.05710691, -0.02627529, -0.36730151, -0.31535724],
        [ 0.40187274, -0.17539639, -0.0492464 , -0.25361533, -0.37527776],
        [ 0.1321138 ,  0.12387183, -0.18545345, -0.24209118, -0.24920428]]),
 array([[-0.20187181, -0.27930835,  0.18753367,  0.22788551],
        [-0.1039506 , -0.11070091,  0.28368915, -0.18292656],
        [-0.25990917,  0.05605257, -0.17924188, -0.00190385],
        [-0.01380821, -0.20675725, -0.03825268,  0.09417731],
        [-0.19573648, -0.35801656,  0.14784709, -0.28784951],
        [-0

In [34]:
all_weights

array([-0.15654574,  0.96120866, -0.25053158, -0.03625408,  0.96800706,
        0.03285966,  0.93657656, -0.40201123,  0.20601533, -0.66031644,
        0.56540737,  0.75327219,  0.96051533, -0.88015158, -0.55036538,
        0.28530437, -0.73984187, -0.33085502, -0.96054889, -0.26604115,
        0.00393425,  0.82922681,  0.19813682,  0.99696659,  0.82435952,
        0.30602155,  0.85149758, -0.04595768,  0.86259388,  0.14264346,
       -0.94903976, -0.13988279, -0.06436105, -0.89970127, -0.77246432,
        0.98438315, -0.42963167, -0.12062856, -0.62122816, -0.91923901,
        0.32361139,  0.30342279, -0.45426632, -0.59299987, -0.61042332,
       -0.49448293, -0.68416294,  0.4593618 ,  0.55820322, -0.25462592,
       -0.27116073,  0.69489367, -0.44807672, -0.63664485,  0.13730019,
       -0.43905114, -0.00466345, -0.03382306, -0.50644977, -0.09369954,
        0.23068636, -0.4794545 , -0.87695789,  0.36214994, -0.70508442,
       -0.59714267, -0.58459673,  0.99346046, -0.97779107,  0.88

Uh oh.  Not the same values!!!  What happened?

Right.  Somewhere we performed an operation that caused a copy to be made.
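One way to locate the copy is to test each intermediate result with `np.shares_memory`. The sketch below repeats the steps of the loop for the first layer only; the names `view` and `scaled` are introduced here for illustration.

```python
import numpy as np

all_weights = np.random.uniform(-1, 1, 15)

view = all_weights[0:15].reshape(3, 5)  # slicing and reshaping keep a view
scaled = view / np.sqrt(3)              # `/` computes its result into a new array

print(np.shares_memory(all_weights, view))    # True
print(np.shares_memory(all_weights, scaled))  # False
```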

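Must be the division.  The expression `... / np.sqrt(nin)` builds a new array, so `W` is no longer a view into `all_weights`.  Let's rewrite that code using `/=`.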

In [35]:
all_weights = np.random.uniform(-1, 1, n_weights)

Ws = []
first_index = 0
for Wshape in Wshapes:
    last_index = first_index + np.prod(Wshape)
    nin = Wshape[0]
    W = all_weights[first_index:last_index].reshape(Wshape)
    W /= np.sqrt(nin)
    Ws.append(W)
    first_index = last_index

Ws

[array([[ 0.55482472,  0.21196814, -0.04102957,  0.54717722,  0.38325743],
        [-0.13055778, -0.04129885,  0.51764554, -0.36024548,  0.1189937 ],
        [ 0.07805396, -0.20025587,  0.39225332, -0.38397451, -0.47386045]]),
 array([[-0.13324139, -0.04271728,  0.17417263,  0.39537936,  0.29157843],
        [-0.09444314, -0.32704493,  0.20960864,  0.23566403, -0.07523794],
        [ 0.05485057,  0.24930347,  0.19740761, -0.21045541, -0.38031441],
        [-0.01896536,  0.26676653, -0.17888851,  0.30758503,  0.39504638],
        [-0.27033355,  0.10681996,  0.38040939,  0.18304744, -0.18299257],
        [ 0.0471133 ,  0.28594631,  0.39256469, -0.15323063, -0.37682797]]),
 array([[ 0.280223  , -0.36887087,  0.0558062 , -0.01357118],
        [ 0.0921761 ,  0.00204856, -0.32634632,  0.36224726],
        [ 0.24847209,  0.30070079, -0.29244924, -0.0287169 ],
        [-0.27657221,  0.1874864 , -0.26777313,  0.39359312],
        [-0.28939249,  0.00515269,  0.28330201, -0.2056803 ],
        [-0

In [36]:
all_weights

array([ 0.55482472,  0.21196814, -0.04102957,  0.54717722,  0.38325743,
       -0.13055778, -0.04129885,  0.51764554, -0.36024548,  0.1189937 ,
        0.07805396, -0.20025587,  0.39225332, -0.38397451, -0.47386045,
       -0.13324139, -0.04271728,  0.17417263,  0.39537936,  0.29157843,
       -0.09444314, -0.32704493,  0.20960864,  0.23566403, -0.07523794,
        0.05485057,  0.24930347,  0.19740761, -0.21045541, -0.38031441,
       -0.01896536,  0.26676653, -0.17888851,  0.30758503,  0.39504638,
       -0.27033355,  0.10681996,  0.38040939,  0.18304744, -0.18299257,
        0.0471133 ,  0.28594631,  0.39256469, -0.15323063, -0.37682797,
        0.280223  , -0.36887087,  0.0558062 , -0.01357118,  0.0921761 ,
        0.00204856, -0.32634632,  0.36224726,  0.24847209,  0.30070079,
       -0.29244924, -0.0287169 , -0.27657221,  0.1874864 , -0.26777313,
        0.39359312, -0.28939249,  0.00515269,  0.28330201, -0.2056803 ,
       -0.04581822,  0.26251636,  0.2342125 , -0.1866267 ,  0.03
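The steps of this section can be collected into one helper. This is only a sketch: the function name `make_weight_views` is hypothetical and is not part of the `NeuralNetwork` class described here, though the body follows the cells above.

```python
import numpy as np

def make_weight_views(n_inputs, n_hiddens_each_layer, n_outputs):
    """Return (all_weights, Ws), where each matrix in Ws is a view into all_weights."""
    # Shapes of the weight matrices, with one extra row per layer as in the cells above
    shapes = []
    ni = n_inputs
    for nh in n_hiddens_each_layer:
        shapes.append((1 + ni, nh))
        ni = nh
    shapes.append((1 + ni, n_outputs))

    n_weights = sum(rows * cols for rows, cols in shapes)
    all_weights = np.random.uniform(-1, 1, n_weights)

    Ws = []
    first_index = 0
    for rows, cols in shapes:
        last_index = first_index + rows * cols
        W = all_weights[first_index:last_index].reshape(rows, cols)
        W /= np.sqrt(rows)  # in-place scaling keeps W a view
        Ws.append(W)
        first_index = last_index
    return all_weights, Ws

all_weights, Ws = make_weight_views(2, [5, 5, 4], 1)
print(all_weights.shape)          # (74,)
print([W.shape for W in Ws])      # [(3, 5), (6, 5), (6, 4), (5, 1)]
print(all(np.shares_memory(all_weights, W) for W in Ws))  # True
```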

In [37]:
all_weights[:] = all_weights * 1000
all_weights

array([ 554.82471734,  211.96813858,  -41.02956511,  547.17721921,
        383.25743166, -130.55778403,  -41.29884547,  517.64553959,
       -360.24547836,  118.99369767,   78.05396359, -200.25587384,
        392.25331556, -383.97451335, -473.86044704, -133.24138574,
        -42.71727963,  174.17263225,  395.37935719,  291.57843443,
        -94.44314318, -327.04493061,  209.60864004,  235.66402873,
        -75.23794409,   54.85056571,  249.30346563,  197.40761087,
       -210.45541211, -380.31440707,  -18.96536183,  266.76652964,
       -178.88851426,  307.58502824,  395.04638442, -270.33355285,
        106.8199555 ,  380.40939417,  183.04744261, -182.99257047,
         47.11329784,  285.94631212,  392.56469254, -153.23063299,
       -376.8279704 ,  280.22300432, -368.87086977,   55.80619869,
        -13.57118226,   92.17609881,    2.04856444, -326.34631733,
        362.24725959,  248.47209014,  300.70078912, -292.4492415 ,
        -28.71690399, -276.57220698,  187.48640303, -267.77312

In [38]:
Ws

[array([[ 554.82471734,  211.96813858,  -41.02956511,  547.17721921,
          383.25743166],
        [-130.55778403,  -41.29884547,  517.64553959, -360.24547836,
          118.99369767],
        [  78.05396359, -200.25587384,  392.25331556, -383.97451335,
         -473.86044704]]),
 array([[-133.24138574,  -42.71727963,  174.17263225,  395.37935719,
          291.57843443],
        [ -94.44314318, -327.04493061,  209.60864004,  235.66402873,
          -75.23794409],
        [  54.85056571,  249.30346563,  197.40761087, -210.45541211,
         -380.31440707],
        [ -18.96536183,  266.76652964, -178.88851426,  307.58502824,
          395.04638442],
        [-270.33355285,  106.8199555 ,  380.40939417,  183.04744261,
         -182.99257047],
        [  47.11329784,  285.94631212,  392.56469254, -153.23063299,
         -376.8279704 ]]),
 array([[ 280.22300432, -368.87086977,   55.80619869,  -13.57118226],
        [  92.17609881,    2.04856444, -326.34631733,  362.24725959],
        [ 