# Neural Nets v2
`nn_v2`

To do: work through [Working efficiently with jupyter lab](https://florianwilhelm.info/2018/11/working_efficiently_with_jupyter_lab/)

When this was a notebook with integrated tests, we did: \
`
%load_ext autoreload
%autoreload 2
%matplotlib widget
#%matplotlib inline`

In [4]:
# import Importing_Notebooks
import numpy as np

A network built of components which:
1. accept an ordered set of reals (we'll use `numpy.array`, and call them vectors) at the input port and produce another at the output port - this is forward propagation. ${\displaystyle f\colon \mathbf {R} ^{n}\to \mathbf {R} ^{m}}$
1. accept an ordered set of reals at the output port, representing the gradient of the loss function at the output, and produce the gradient of the loss function at the input port - this is back propagation, aka backprop. ${\displaystyle b\colon \mathbf {R} ^{m}\to \mathbf {R} ^{n}}$
1. from the gradient of the loss function at the output, calculate the partial of the loss function w.r.t. the internal parameters ${\displaystyle \frac{\partial E}{\partial w} }$
1. accept a scalar $\eta$ to control the adjustment of internal parameters. _Or is this effected by scaling the loss gradient before passing??_
1. update internal parameters ${\displaystyle w \leftarrow w - \eta \frac{\partial E}{\partial w} }$
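The update rule in the last item can be sketched on the smallest possible case. This assumes a hypothetical one-parameter component $y = w x$ and a squared-error loss $E = (y - t)^2 / 2$; neither the function name `sgd_step` nor the numbers come from this notebook.

```python
# Hypothetical one-parameter component: y = w * x, with loss E = (y - t)**2 / 2.
def sgd_step(w, x, t, eta):
    y = w * x               # forward propagation
    dE_dy = y - t           # gradient of the loss at the output
    dE_dw = dE_dy * x       # partial of the loss w.r.t. the parameter
    return w - eta * dE_dw  # w <- w - eta * dE/dw

w = 0.0
for _ in range(50):
    w = sgd_step(w, x=2.0, t=6.0, eta=0.1)
print(w)  # approaches 3.0, since 3.0 * 2.0 == 6.0
```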


In [5]:
class Layer:
    def __init__(self):
        pass
    
    def __call__(self, x):
        """Computes response to input"""
        raise NotImplementedError
        
    def backprop(self, output_delE):
        """Uses output error gradient to adjust internal parameters, and returns gradient of error at input"""
        raise NotImplementedError

A network built of a cascade of layers:

In [6]:
class Network:
    def __init__(self):
        self.layers = []
        self.eta = 0.1 #FIXME
        
    def extend(self, net):
        self.layers.append(net)
        
    def __call__(self, input):
        v = input
        for net in self.layers:
            v = net(v)
        return v
    
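    def learn(self, facts):
        for (x, expected) in facts:
            y = self(x)
            e = y - expected
            # Half the sum of squared errors; works for 1-D and column-vector outputs
            loss = float(np.sum(e * e))/2.0
            # Scale by eta here so that each layer's backprop applies the learning rate
            egrad = e * self.eta
            for net in reversed(self.layers):
                egrad = net.backprop(egrad)
        return loss  # loss of the last fact only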
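Scaling the error by `eta` once, before back-propagation, works because the gradients are linear in the incoming error gradient. A minimal sketch of that property, assuming a hypothetical chain of two bias-free linear stages (`M1`, `M2` and `param_grads` are illustrative names, not part of this notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
M1 = rng.normal(size=(3, 2))   # first stage: R^2 -> R^3
M2 = rng.normal(size=(1, 3))   # second stage: R^3 -> R^1
x = rng.normal(size=2)
e = rng.normal(size=1)         # stand-in for y - expected
eta = 0.1

def param_grads(egrad):
    """Back-propagate egrad through y = M2 @ (M1 @ x); return both weight gradients."""
    h = M1 @ x
    g2 = np.outer(egrad, h)          # gradient w.r.t. M2
    g1 = np.outer(M2.T @ egrad, x)   # gradient w.r.t. M1
    return g1, g2

scaled_first = param_grads(eta * e)                # scale the error, then propagate
scaled_after = [eta * g for g in param_grads(e)]   # propagate, then scale each gradient
print(all(np.allclose(a, b) for a, b in zip(scaled_first, scaled_after)))  # True
```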

___

## Useful Layers

### Identity

In [7]:
class IdentityLayer(Layer):
    def __call__(self, x):
        return x
    
    def backprop(self, output_delE):
        return output_delE

### Affine
A layer that does an [affine transformation](https://mathworld.wolfram.com/AffineTransformation.html) aka affinity, which is the classic fully-connected layer with output offsets.

$$ \mathbf{M} \mathbf{x} + \mathbf{b} = \mathbf{y} $$
where
$$
\mathbf{x} = \sum_{j=1}^{n} x_j \mathbf{\hat{x}}_j \\
\mathbf{b} = \sum_{i=1}^{m} b_i \mathbf{\hat{y}}_i \\
\mathbf{y} = \sum_{i=1}^{m} y_i \mathbf{\hat{y}}_i
$$
and $\mathbf{M}$ can be written
$$
\begin{bmatrix}
    m_{1,1} & \dots & m_{1,n} \\
    \vdots & \ddots & \vdots \\
    m_{m,1} & \dots & m_{m,n}
\end{bmatrix} \\
$$
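A small numeric instance of $\mathbf{M}\mathbf{x} + \mathbf{b} = \mathbf{y}$ may help fix the shapes. The values below are arbitrary illustrations with $n = 2$ inputs and $m = 3$ outputs.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 0.0]])        # m = 3 rows, n = 2 columns
b = np.array([0.5, -1.0, 2.0])   # one offset per output
x = np.array([1.0, 2.0])         # one value per input

y = M @ x + b
print(y)  # y == [5.5, 1.0, 5.0]
```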

#### Error gradient back-propagation
$$ 
\begin{align}
 \frac{\partial loss}{\partial\mathbf{x}}
  &= \frac{\partial loss}{\partial\mathbf{y}} \frac{\partial\mathbf{y}}{\partial\mathbf{x}} \\
  &= \mathbf{M}^\mathsf{T}\frac{\partial loss}{\partial\mathbf{y}}
\end{align}
$$
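The input-gradient formula can be checked against central finite differences. This sketch assumes the squared-error loss $\frac{1}{2}\lVert\mathbf{y} - \mathbf{t}\rVert^2$ with a hypothetical target `t`; all values are random.

```python
import numpy as np

rng = np.random.default_rng(1)
M, b = rng.normal(size=(3, 2)), rng.normal(size=3)
x, t = rng.normal(size=2), rng.normal(size=3)

def loss(x):
    e = M @ x + b - t
    return 0.5 * e @ e

dloss_dy = M @ x + b - t       # gradient of the loss at the output
analytic = M.T @ dloss_dy      # gradient at the input, per the formula above

h = 1e-6
numeric = np.array([(loss(x + h * d) - loss(x - h * d)) / (2 * h)
                    for d in np.eye(2)])   # one unit direction per input
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```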

#### Parameter adjustment
$$
 \frac{\partial loss}{\partial\mathbf{M}}
 = \frac{\partial loss}{\partial\mathbf{y}} \frac{\partial\mathbf{y}}{\partial\mathbf{M}}
 = \frac{\partial loss}{\partial\mathbf{y}} \mathbf{x}^\mathsf{T} \\
 \frac{\partial loss}{\partial\mathbf{b}}
 = \frac{\partial loss}{\partial\mathbf{y}} \frac{\partial\mathbf{y}}{\partial\mathbf{b}}
 = \frac{\partial loss}{\partial\mathbf{y}}
$$
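Element-wise, $\frac{\partial loss}{\partial m_{i,j}} = \frac{\partial loss}{\partial y_i} x_j$, so the gradient w.r.t. $\mathbf{M}$ is an outer product with the same shape as $\mathbf{M}$. A small sketch with arbitrary illustrative values:

```python
import numpy as np

dloss_dy = np.array([1.0, -2.0])    # gradient at the output, m = 2
x = np.array([3.0, 4.0, 5.0])       # layer input, n = 3

dloss_dM = np.outer(dloss_dy, x)    # shape (m, n), same as M
dloss_db = dloss_dy                 # shape (m,), one entry per offset

print(dloss_dM)  # entries: [[3, 4, 5], [-6, -8, -10]]
```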

In [8]:
def column_vecify(m):
    return m.reshape((len(m),1))

class AffinityLayer(Layer):
    """An affine transformation, which is the classic fully-connected layer with offsets"""
    def __init__(self, n, m):
        self.M = np.empty((m, n))
        self.b = np.empty((m, 1))
        self.randomize()
        
    def randomize(self):
        self.M[:] = np.random.randn(*self.M.shape)
        self.b[:] = np.random.randn(*self.b.shape)
        
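    def __call__(self, x):
        self.input = x
        # b is stored as an (m, 1) column; flatten it for 1-D input so that
        # broadcasting does not turn the (m,) product into an (m, m) matrix
        b = self.b if np.ndim(x) == 2 else self.b.ravel()
        self.output = self.M @ x + b
        return self.output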
    
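    def backprop(self, output_delE):
        # Compute the input gradient first, using M as it was in the forward pass
        input_delE = self.M.T @ output_delE
        if len(output_delE.shape) == 2:
            # Columns are samples: sum the per-sample outer products
            self.M -= np.einsum('ik,jk', output_delE, self.input)
            self.b -= column_vecify(np.sum(output_delE, axis=1))
        else:
            self.M -= np.outer(output_delE, self.input)
            # b is an (m, 1) column, so reshape the 1-D gradient to match
            self.b -= column_vecify(output_delE)
        return input_delE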
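For a batch stored as columns, the `np.einsum('ik,jk', ...)` call sums the per-sample outer products. A quick check of that equivalence, using random stand-in arrays `D` (output gradients) and `X` (inputs) that are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(3, 4))   # output gradients: m = 3 rows, 4 samples as columns
X = rng.normal(size=(2, 4))   # inputs: n = 2 rows, the same 4 samples as columns

batched = np.einsum('ik,jk', D, X)   # implicit output 'ij', summed over k
summed = sum(np.outer(D[:, k], X[:, k]) for k in range(D.shape[1]))

print(np.allclose(batched, summed), np.allclose(batched, D @ X.T))  # True True
```

The same quantity can therefore be written as `D @ X.T`, which some readers may find easier to scan than the einsum subscripts.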

### Map
Maps a scalar function over the inputs element-wise, e.g. for activation layers.

In [13]:
class MapLayer(Layer):
    """Map a scalar function on the input taken element-wise"""
    def __init__(self, fun, dfundx):
        self.vfun = np.vectorize(fun)
        self.vdfundx = np.vectorize(dfundx)

    def __call__(self, x):
        self.input = x
        return self.vfun(x)
    
    def backprop(self, output_delE):
        input_delE = self.vdfundx(self.input) * output_delE
        return input_delE
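As an illustration of a function pair that could be passed to `MapLayer`, here is a sketch using the logistic sigmoid and its derivative. The choice of sigmoid is an assumption; the notebook does not name a particular activation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The same vectorization that MapLayer.__init__ applies to its arguments
vfun, vdfundx = np.vectorize(sigmoid), np.vectorize(dsigmoid)

x = np.array([-1.0, 0.0, 2.0])
output_delE = np.array([0.5, -1.0, 2.0])

y = vfun(x)                              # forward pass, element-wise
input_delE = vdfundx(x) * output_delE    # same expression as MapLayer.backprop
print(vfun(0.0), vdfundx(0.0))  # 0.5 0.25
```

With the class above, the equivalent layer would be constructed as `MapLayer(sigmoid, dsigmoid)`.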

___

To produce an importable `nn_v2.py`:
1. Save this notebook.
1. Uncomment the `jupyter nbconvert` line below.
1. Execute it.
1. Comment out the `jupyter nbconvert` line again.
1. Save the notebook again in that form.

In [9]:
### !jupyter nbconvert --to script nn_v2.ipynb

[NbConvertApp] Converting notebook nn_v2.ipynb to script
[NbConvertApp] Writing 5566 bytes to nn_v2.py
