# Neural Nets
Version 0.3, in `nn`

To do: work through [Working efficiently with jupyter lab](https://florianwilhelm.info/2018/11/working_efficiently_with_jupyter_lab/)

When this was a notebook with integrated tests, we did: \
`
%load_ext autoreload
%autoreload 2
%matplotlib widget
#%matplotlib inline`

In [1]:
import numpy as np

A network built of components which:
1. accept an ordered set of reals (we'll use `numpy.array` and call them vectors) at the input port and produce another at the output port. This is forward propagation. ${\displaystyle f\colon \mathbf {R} ^{n}\to \mathbf {R} ^{m}}$
1. accept an ordered set of reals at the output port, representing the gradient of the loss function at the output, and produce the gradient of the loss function at the input port. This is back propagation, aka backprop. ${\displaystyle b\colon \mathbf {R} ^{m}\to \mathbf {R} ^{n}}$
1. from the gradient of the loss function at the output, calculate the partial of the loss function w.r.t. the internal parameters ${\displaystyle \frac{\partial E}{\partial w} }$
1. accept a scalar $\eta$ to control the adjustment of internal parameters. In this version that is effected by scaling the loss gradient by $\eta$ before passing it to `backprop`.
1. update internal parameters ${\displaystyle w \leftarrow w - \eta \frac{\partial E}{\partial w} }$
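
As an illustration of the five duties listed above, here is a minimal sketch of a one-parameter component. `ScaleLayer` is a hypothetical name that does not appear elsewhere in this notebook; it assumes a 1-D input and assumes the caller has already multiplied the loss gradient by $\eta$.

```python
import numpy as np

class ScaleLayer:
    """Hypothetical one-parameter layer, y = w * x (illustration only)."""
    def __init__(self, w):
        self.w = float(w)

    def __call__(self, x):
        # Forward propagation; remember the input for backprop
        self.input = np.asarray(x, dtype=float)
        return self.w * self.input

    def backprop(self, output_delE):
        # Gradient at the input port, computed with the pre-update w
        input_delE = self.w * output_delE
        # dE/dw = sum_i dE/dy_i * x_i; output_delE is assumed pre-scaled by eta
        self.w -= float(np.dot(output_delE, self.input))
        return input_delE

eta = 0.1
layer = ScaleLayer(2.0)
x = np.array([1.0, 2.0])
target = np.array([3.0, 6.0])      # reachable with w = 3
y = layer(x)                       # forward pass
e = y - target                     # gradient of loss = e.e/2 w.r.t. y
input_grad = layer.backprop(eta * e)
print(layer.w, input_grad)
```

One step moves `w` from 2.0 toward 3.0, and the returned gradient is what an upstream component would receive.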


In [2]:
class Layer:
    def __init__(self):
        pass
    
    def __call__(self, x):
        """Compute response to input"""
        raise NotImplementedError
        
    def backprop(self, output_delE):
        """Use output error gradient to adjust internal parameters, return gradient of error at input"""
        raise NotImplementedError
        
    def state_vector(self):
        """Provide the layer's learnable state as a vector"""
        raise NotImplementedError

    def set_state_from_vector(self, sv):
        """Set the layer's learnable state from a vector"""
        raise NotImplementedError

A network built of a cascade of layers:

In [3]:
class Network:
    def __init__(self):
        self.layers = []
        self.eta = 0.1 #FIXME
        
    def extend(self, net):
        self.layers.append(net)
        
    def __call__(self, input):
        v = input
        for net in self.layers:
            v = net(v)
        return v
    
    def learn(self, facts, eta=None):
        self.eta = eta or self.eta
        for (x, expected) in facts:
            y = self(x)
            e = y - expected
            #loss = float(e.dot(e.T))/2.0
            loss = np.einsum('ij,ij', e, e)/2.0
            egrad = e * self.eta
            for net in reversed(self.layers):
                egrad = net.backprop(egrad)
        return loss

    def state_vector(self):
        """Provide the network's learnable state as a vector"""
        return np.concatenate([layer.state_vector() for layer in self.layers])
    
    def set_state_from_vector(self, sv):
        """Set the network's learnable state from a vector"""
        i = 0
        for layer in self.layers:
            lsvlen = len(layer.state_vector())
            layer.set_state_from_vector(sv[i:i+lsvlen])
            i += lsvlen
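
The state-vector bookkeeping in `Network` can be seen in isolation with a small sketch. The per-layer arrays below are made-up stand-ins for what each layer's `state_vector()` might return; only the concatenate-then-slice pattern is taken from the class above.

```python
import numpy as np

# Hypothetical per-layer states: a layer with 9 parameters, a parameterless
# layer (empty state), and a layer with 4 parameters.
layer_states = [np.arange(9.0), np.array([]), np.arange(4.0) + 100]

# Pack: one flat vector for the whole network
sv = np.concatenate(layer_states)

# Unpack: walk the flat vector, slicing off each layer's share in order
i = 0
recovered = []
for state in layer_states:
    n = len(state)
    recovered.append(sv[i:i + n])
    i += n

print(sv.shape, [r.size for r in recovered])
```

Parameterless layers contribute an empty slice, so they need no special handling.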

___

## Useful Layers

### Identity

In [4]:
class IdentityLayer(Layer):
    def __call__(self, x):
        return x
    
    def backprop(self, output_delE):
        return output_delE

    def state_vector(self):
        return np.array([])
    
    def set_state_from_vector(self, sv):
        pass

### Affine
A layer that does an [affine transformation](https://mathworld.wolfram.com/AffineTransformation.html) aka affinity, which is the classic fully-connected layer with output offsets.

$$ \mathbf{M} \mathbf{x} + \mathbf{b} = \mathbf{y} $$
where
$$
\mathbf{x} = \sum_{j=1}^{n} x_j \mathbf{\hat{x}}_j \\
\mathbf{b} = \sum_{i=1}^{m} b_i \mathbf{\hat{y}}_i \\
\mathbf{y} = \sum_{i=1}^{m} y_i \mathbf{\hat{y}}_i
$$
and $\mathbf{M}$ can be written
$$
\begin{bmatrix}
    m_{1,1} & \dots & m_{1,n} \\
    \vdots & \ddots & \vdots \\
    m_{m,1} & \dots & m_{m,n}
\end{bmatrix} \\
$$

#### Error gradient back-propagation
$$ 
\begin{align}
 \frac{\partial loss}{\partial\mathbf{x}}
  &= \frac{\partial loss}{\partial\mathbf{y}} \frac{\partial\mathbf{y}}{\partial\mathbf{x}} \\
  &= \mathbf{M}^\mathsf{T}\frac{\partial loss}{\partial\mathbf{y}}
\end{align}
$$
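
A quick numerical sanity check of the back-propagation formula, as a sketch. It uses the column-vector convention of the equations above, a squared-error loss like the one in `Network.learn`, and random values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 2))      # m = 3 outputs, n = 2 inputs
b = rng.normal(size=3)
x = rng.normal(size=2)
ideal = rng.normal(size=3)

def loss_of_x(x):
    e = M @ x + b - ideal
    return e @ e / 2.0

grad_y = M @ x + b - ideal       # d loss / d y for this particular loss
analytic = M.T @ grad_y          # the formula above

# Central differences along each input axis
eps = 1e-6
numeric = np.array([
    (loss_of_x(x + eps * d) - loss_of_x(x - eps * d)) / (2 * eps)
    for d in np.eye(2)
])
print(analytic, numeric)
```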

#### Parameter adjustment
$$
 \frac{\partial loss}{\partial\mathbf{M}}
 = \frac{\partial loss}{\partial\mathbf{y}} \frac{\partial\mathbf{y}}{\partial\mathbf{M}}
 = \frac{\partial loss}{\partial\mathbf{y}} \mathbf{x}^\mathsf{T} \\
 \frac{\partial loss}{\partial\mathbf{b}}
 = \frac{\partial loss}{\partial\mathbf{y}} \frac{\partial\mathbf{y}}{\partial\mathbf{b}}
 = \frac{\partial loss}{\partial\mathbf{y}}
$$
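
The parameter gradients can be checked the same way. This sketch takes $\partial loss / \partial\mathbf{M}$ to be the outer product of $\partial loss / \partial\mathbf{y}$ with $\mathbf{x}$, so that it has the same shape as $\mathbf{M}$, and compares it against central differences. The squared-error loss and the random values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 2))      # m = 3 outputs, n = 2 inputs
b = rng.normal(size=3)
x = rng.normal(size=2)
ideal = rng.normal(size=3)

def loss(M, b):
    e = M @ x + b - ideal
    return e @ e / 2.0

grad_y = M @ x + b - ideal       # d loss / d y
grad_M = np.outer(grad_y, x)     # shape (3, 2), same as M
grad_b = grad_y

# Central differences, one parameter at a time
eps = 1e-6
num_M = np.zeros_like(M)
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        d = np.zeros_like(M)
        d[i, j] = eps
        num_M[i, j] = (loss(M + d, b) - loss(M - d, b)) / (2 * eps)
num_b = np.array([
    (loss(M, b + eps * d) - loss(M, b - eps * d)) / (2 * eps)
    for d in np.eye(3)
])
print(np.abs(grad_M - num_M).max(), np.abs(grad_b - num_b).max())
```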

#### Adapting to `numpy`

In `numpy` it is more convenient to use row vectors, particularly for calculating the transform on multiple inputs in one operation. We use the identity $ \mathbf{M} \mathbf{x} = (\mathbf{x} \mathbf{M}^\mathsf{T})^\mathsf{T}.$ To avoid cluttering names, we will use `M` in the code below to hold $\mathbf{M}^\mathsf{T}$.
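
A small sketch of that identity, using the same parameter values as the tests further down. `Mt` is a name used only in this example for the transposed matrix that the layer code stores as `M`.

```python
import numpy as np

Mt = np.arange(6.0).reshape(2, 3)  # shape (n, m): what the code calls `M`
M = Mt.T                           # shape (m, n): the M of the equations
b = np.array([6.0, 7.0, 8.0])
X = np.array([[2.0, 1.0],
              [0.0, 1.0]])         # k = 2 inputs, one per row

# Column-vector convention, one input at a time
one_by_one = np.stack([M @ x + b for x in X])

# Row-vector convention, all inputs in one operation
batched = X @ Mt + b

print(batched)
```

Broadcasting adds `b` to every row, which is why the offsets need no special treatment in the batched form.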

In [5]:
class AffineLayer(Layer):
    """An affine transformation, which is the classic fully-connected layer with offsets.
    
    The layer has n inputs and m outputs, which numbers must be supplied
    upon creation. The inputs and outputs are marshalled in numpy arrays, 1-D
    in the case of a single calculation, and 2-D when calculating the outputs
    of multiple inputs in one call.
    If called with a 1-D array of shape (n,), e.g. numpy.arange(n), it will
    return a 1-D numpy array of shape (m,).
    If called with a 2-D numpy array, input shall have shape (k,n) and will return
    a 2-D numpy array of shape (k,m), suitable as input to a subsequent layer
    that has input width m.
    """
    def __init__(self, n, m):
        self.M = np.empty((n, m))
        self.b = np.empty(m)
        self.randomize()
        
    def randomize(self):
        self.M[:] = np.random.randn(*self.M.shape)
        self.b[:] = np.random.randn(*self.b.shape)
        
    def __call__(self, x):
        self.input = x
        self.output = x @ self.M + self.b
        return self.output
    
    def backprop(self, output_delE):
        input_delE = output_delE @ self.M.T
        o_delE = np.atleast_2d(output_delE)
        self.M -= np.einsum('ki,kj->ji', o_delE, np.atleast_2d(self.input))
        self.b -= np.sum(o_delE, 0)       
        return input_delE

    def state_vector(self):
        return np.concatenate((self.M.ravel(), self.b.ravel()))
    
    def set_state_from_vector(self, sv):
        """Set the layer's learnable state from a vector"""
        l_M = len(self.M.ravel())
        l_b = len(self.b.ravel())
        self.M[:] = sv[:l_M].reshape(self.M.shape)
        self.b[:] = sv[l_M : l_M + l_b].reshape(self.b.shape)

### Map
Maps a scalar function over the inputs element-wise, e.g. for activation layers.

In [6]:
class MapLayer(Layer):
    """Map a scalar function on the input taken element-wise"""
    def __init__(self, fun, dfundx):
        self.vfun = np.vectorize(fun)
        self.vdfundx = np.vectorize(dfundx)

    def __call__(self, x):
        self.input = x
        return self.vfun(x)
    
    def backprop(self, output_delE):
        input_delE = self.vdfundx(self.input) * output_delE
        return input_delE

    def state_vector(self):
        return np.array([])
    
    def set_state_from_vector(self, sv):
        pass
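
For an activation layer, the two arguments would be a function and its derivative. The sketch below uses `tanh` as an assumed example (it is not used elsewhere in this notebook) and checks the derivative numerically. It works on plain arrays rather than through `MapLayer`, so that it runs on its own.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def dtanh(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

x = np.array([-1.0, 0.0, 0.5])
output_delE = np.array([0.3, -0.2, 0.1])   # made-up gradient arriving at the output

# The same product MapLayer.backprop forms: f'(x) * output gradient, element-wise
input_delE = dtanh(x) * output_delE

# Central-difference check of the derivative
eps = 1e-6
numeric_d = (tanh(x + eps) - tanh(x - eps)) / (2 * eps)
print(input_delE, numeric_d)
```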

---

# Tests
*Dangerously incomplete.* \
Most testing is done with `unittest` against the `.py` version in a separate test script; see `test-nn_v3.py`.

Make a few test arrays:

In [7]:
if __name__ == '__main__':
    two_wide = np.arange(2*4).reshape(-1,2)
    print(f"two_wide is:\n{two_wide}")
    three_wide = np.arange(3*4).reshape(-1,3)
    print(f"three_wide is:\n{three_wide}\n")

two_wide is:
[[0 1]
 [2 3]
 [4 5]
 [6 7]]
three_wide is:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]



## Tooling

In [8]:
if __name__ == '__main__':
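    class VC():
        """Vector-calculus helpers for the tests"""
        @staticmethod
        def grad(f, x, eps=1e-6):
            # Central difference: each row of epsihat perturbs one component of x
            epsihat = np.eye(x.size) * eps
            yp = np.apply_along_axis(f, 1, x + epsihat)
            ym = np.apply_along_axis(f, 1, x - epsihat)
            return (yp - ym)/(2 * eps)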
        
    def closenuf(a, b, places=4):
        return (np.around(a, places) == np.around(b, places)).all()
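
The same two ideas can be written as a standalone sketch. `numerical_grad` is a hypothetical name for a central-difference helper along the lines of `VC.grad`, and `np.allclose` is shown as one possible alternative to comparing rounded values, which can disagree for two nearly equal numbers that straddle a rounding boundary.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at 1-D x.

    For a vector-valued f, row i of the result holds d f / d x_i.
    """
    x = np.asarray(x, dtype=float)
    steps = np.eye(x.size) * eps           # one perturbation per component
    plus = np.array([f(x + s) for s in steps])
    minus = np.array([f(x - s) for s in steps])
    return (plus - minus) / (2 * eps)

ideal = np.array([2.0, 3.0, 5.0])
loss = lambda v: (v - ideal).dot(v - ideal) / 2.0
y = np.array([1.0, 4.0, 4.0])

g = numerical_grad(loss, y)
print(g, np.allclose(g, y - ideal, atol=1e-6))
```

For this loss the exact gradient is `y - ideal`, so the estimate can be compared against it directly.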

## Identity layer

In [9]:
if __name__ == '__main__':
    iL = IdentityLayer()
    
    # It's transparent from input to output
    assert np.equal(iL(np.arange(5)), np.arange(5)).all()
    
    # It back-propagates the loss gradient without alteration
    assert np.equal(iL.backprop(np.arange(7)), np.arange(7)).all()
    assert np.equal(iL(two_wide), two_wide).all()

## Map layer

In [10]:
if __name__ == '__main__':
    mL = MapLayer(lambda x:x**2, lambda d:2*d)
    x = np.array([1,2,2])
    y = mL(x)
    
    # It applies the forward transformation
    assert np.equal(y, np.array([1, 4, 4])).all()
    
    # It back-propagates the loss gradient
    ideal = np.array([2,3,5])
    loss = lambda v: (v - ideal).dot(v - ideal) / 2.0
    loss_at_y = loss(y)
    print(f"x = {x}, y = {y}, loss at y = {loss_at_y}")
    grad_y = VC.grad(loss, y)
    print(f"∇𝑙𝑜𝑠𝑠(𝑦) = {grad_y}")
    grad_x = VC.grad(lambda x:loss(mL(x)), x)
    print(f"∇𝑙𝑜𝑠𝑠(𝑥) = {grad_x}")
    
    # See if the backprop does the same
    _ = mL(x) # Make sure the last x is in the right place
    in_delE = mL.backprop(grad_y)
    print(f"backprop({grad_y}) = {in_delE}")
    assert closenuf(in_delE, grad_x)

x = [1 2 2], y = [1 4 4], loss at y = 1.5
∇𝑙𝑜𝑠𝑠(𝑦) = [-1.  1. -1.]
∇𝑙𝑜𝑠𝑠(𝑥) = [-2.  4. -4.]
backprop([-1.  1. -1.]) = [-2.  4. -4.]


## Affine layer

In [11]:
if __name__ == '__main__':
    # Affine
    a = AffineLayer(2,3)
    
    # Input and output widths
    assert a(np.arange(2)).shape == (3,) 

    # Can set internal state
    a.set_state_from_vector(np.arange(9))
    assert np.equal(a.M, np.array([[0, 1, 2],
                                   [3, 4, 5]])).all()
    assert np.equal(a.b, np.array([6, 7, 8])).all()

    # Let's find the internal state using numerical gradient
    x = np.random.rand(2)
    y = a(x)
    dydx = VC.grad(a, x)
    b = y - x.dot(dydx)
    #print(dydx, b)
    #print(dydx, np.arange(6).reshape(2,-1))
    assert closenuf(dydx, np.arange(6).reshape(2, -1))
    #print(b, np.arange(6, 9))
    assert closenuf(b, np.arange(6, 9))
    
    # Single-input calculation
    x = np.array([2, 1])
    y = a(x)
    #print(f"a.M is:\n{a.M}\na.b is {a.b}\nx is: {x}\ny is: {y}\n")
    assert np.equal(y, np.array([9, 13, 17])).all()

In [12]:
if __name__ == '__main__':
    # Affine
    a = AffineLayer(2,3)
    a.set_state_from_vector(np.arange(9))

    # Single-input calculation
    x = np.array([2, 1])
    y = a(x)
    assert np.equal(y, np.array([9, 13, 17])).all()
    #print(f"a.M is:\n{a.M}\na.b is {a.b}\nx is: {x}\ny is: {y}\n")

    # It back-propagates the loss gradient
    ideal = np.array([11,12,10])
    loss = lambda v: (v - ideal).dot(v - ideal) / 2.0
    loss_at_y = loss(y)
    print(f"x = {x}, y = {y}, loss at y = {loss_at_y}")
    grad_y = VC.grad(loss, y)
    print(f"∇𝑙𝑜𝑠𝑠(𝑦) = {grad_y}")
    grad_x = VC.grad(lambda x:loss(a(x)), x)
    print(f"∇𝑙𝑜𝑠𝑠(𝑥) = {grad_x}")
    
    # See if the backprop does the same
    _ = a(x) # Make sure the last x is in the right place
    out_delE = grad_y * 0.1
    in_delE = a.backprop(out_delE)
    print(f"backprop({out_delE}) = {in_delE}")
    assert closenuf(in_delE / 0.1, grad_x)
    
    # And how did the learning affect the layer?
    print(f"Now a({x}) = {a(x)}, loss = {loss(a(x))}")
    print(f"state_vector is {a.state_vector()}")

x = [2 1], y = [ 9. 13. 17.], loss at y = 27.0
∇𝑙𝑜𝑠𝑠(𝑦) = [-2.          1.          7.00000001]
∇𝑙𝑜𝑠𝑠(𝑥) = [15.00000002 33.00000001]
backprop([-0.2  0.1  0.7]) = [1.5 3.3]
Now a([2 1]) = [10.2 12.4 12.8], loss = 4.319999988809314
state_vector is [0.4 0.8 0.6 3.2 3.9 4.3 6.2 6.9 7.3]


In [13]:
if __name__ == '__main__':
    # Affine
    a = AffineLayer(2,3)
    a.set_state_from_vector(np.arange(9))
    
    print(f"a(two_wide) is:\n{a(two_wide)}")
    bp = a.backprop(three_wide * 0.001)
    print(f"bp is:\n{bp}")

    a.set_state_from_vector(np.arange(9))
    x = np.array([[0, 1],
                  [2, 3],
                  [4, 5],
                  [6, 7]])
    assert np.equal(a(x), np.array([[ 9, 11, 13],
                                    [15, 21, 27],
                                    [21, 31, 41],
                                    [27, 41, 55]])).all()
    
    
    # Note: y is not recomputed here; it still holds the single-input result from the previous cell
    print(f"a.M is:\n{a.M}\na.b is {a.b}\nx is: {x}\ny is: {y}")
    
    
    a.set_state_from_vector(np.array([ 2,  3,  5,  7, 11, 13, 17, 19, 23]))
    x = np.array([[29, 31]])
    y = a(x)
    print(f"x is: {x}\ny is: {y}")
    
    # AffineLayer has parameters that learn
    out_grad = np.array([4, 2, 7]) * 0.001
    in_grad = a.backprop(out_grad)
    print(f"in_grad is:\n{in_grad}")
    print(f"a(two_wide) is:\n{a(two_wide)}")
    bp = a.backprop(three_wide * 0.001)
    print(f"bp is:\n{bp}")

a(two_wide) is:
[[ 9. 11. 13.]
 [15. 21. 27.]
 [21. 31. 41.]
 [27. 41. 55.]]
bp is:
[[0.005 0.014]
 [0.014 0.05 ]
 [0.023 0.086]
 [0.032 0.122]]
a.M is:
[[0. 1. 2.]
 [3. 4. 5.]]
a.b is [6. 7. 8.]
x is: [[0 1]
 [2 3]
 [4 5]
 [6 7]]
y is: [ 9. 13. 17.]
x is: [[29 31]]
y is: [[292. 447. 571.]]
in_grad is:
[0.049 0.141]
a(two_wide) is:
[[ 23.872  29.936  35.776]
 [ 41.392  57.696  70.936]
 [ 58.912  85.456 106.096]
 [ 76.432 113.216 141.256]]
bp is:
[[0.012536 0.036504]
 [0.041405 0.128295]
 [0.070274 0.220086]
 [0.099143 0.311877]]


In [14]:
if False and __name__ == '__main__':
    # Affine
    a = AffineLayer(2,3)
    
    # Input and output widths
    assert a(np.arange(2)).shape == (3,) 

    # Can set internal state
    a.set_state_from_vector(np.arange(9))
    assert np.equal(a.M, np.array([[0, 1, 2],
                                   [3, 4, 5]])).all()
    assert np.equal(a.b, np.array([6, 7, 8])).all()

    # Single-input calculation
    x = np.array([2, 1])
    y = a(x)
    assert np.equal(y, np.array([9, 13, 17])).all()
    print(f"a.M is:\n{a.M}\na.b is {a.b}\nx is: {x}\ny is: {y}")

    # AffineLayer has parameters that learn
    out_grad = np.array([5, 3, 2]) * 0.001
    in_grad = a.backprop(out_grad)
    print(f"out_grad is: {out_grad}, in_grad is:{in_grad}")
    print(f"a.M is:\n{a.M}\na.b is {a.b}\nx is: {x}\ny is: {y}")
    
    
    print(f"a(two_wide) is:\n{a(two_wide)}")
    bp = a.backprop(three_wide * 0.001)
    print(f"bp is:\n{bp}")

    a.set_state_from_vector(np.arange(9))
    x = np.array([[0, 1],
                  [2, 3],
                  [4, 5],
                  [6, 7]])
    assert np.equal(a(x), np.array([[ 9, 11, 13],
                                    [15, 21, 27],
                                    [21, 31, 41],
                                    [27, 41, 55]])).all()
    
    
    print(f"a.M is:\n{a.M}\na.b is {a.b}\nx is: {x}\ny is: {y}")
    
    
    a.set_state_from_vector(np.array([ 2,  3,  5,  7, 11, 13, 17, 19, 23]))
    x = np.array([[29, 31]])
    y = a(x)
    print(f"x is: {x}\ny is: {y}")
    
    # AffineLayer has parameters that learn
    out_grad = np.array([4, 2, 7]) * 0.001
    in_grad = a.backprop(out_grad)
    print(f"in_grad is:\n{in_grad}")
    print(f"a(two_wide) is:\n{a(two_wide)}")
    bp = a.backprop(three_wide * 0.001)
    print(f"bp is:\n{bp}")

---

To produce an importable `nn.py`:
1. Save this notebook
1. Uncomment the `jupyter nbconvert` line below
1. Execute it.
1. Comment out the convert again
1. Save the notebook again in that form

In [14]:
###!jupyter nbconvert --to script nn.ipynb

[NbConvertApp] Converting notebook nn.ipynb to script
[NbConvertApp] Writing 9337 bytes to nn.py
