# Notebook 07 - Inverted Dropout

We now have a fully functioning library that can build multi-layer perceptrons - neural nets made up of a series of Dense layers.

One of the big problems deep neural networks face is overfitting: because they can be highly overparametrized, they may fit the training data closely yet generalize poorly to unseen data. In this final "tutorial" notebook of the series, I want to introduce and implement a technique used to combat this issue - __dropout__.

The idea behind dropout is actually quite simple, but there are some nuances to cover and intuitions to develop.

Dropout works by randomly "dropping out" (lol) a fraction of the neurons in a neural network during each training iteration, at a user-defined rate. When a neuron is dropped out, the neuron and its connections are not used in the forward or backward pass. This means that in each training iteration a different set of neurons is active, which forces the network to be more robust and prevents over-reliance on any one group of neurons.
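
As a rough illustration of the idea (this is not KaiTorch code; `drop_mask` and its arguments are names made up for this sketch), here is how a fresh random subset of neurons could be picked on every training iteration:

```python
import random

def drop_mask(n_neurons, dropout_rate, rng):
    """Return one boolean per neuron: True means the neuron stays active."""
    # rng.random() is uniform on [0, 1), so each neuron is kept
    # with probability 1 - dropout_rate
    return [rng.random() >= dropout_rate for _ in range(n_neurons)]

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Three training iterations over a layer with 8 neurons
masks = [drop_mask(8, dropout_rate=0.5, rng=rng) for _ in range(3)]
for i, mask in enumerate(masks, start=1):
    # "x" = active neuron, "." = dropped neuron
    print(f"iteration {i}:", "".join("x" if keep else "." for keep in mask))
```

Each iteration draws its own mask, so the network never gets to rely on the same set of neurons for long.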

Conceptually, some like to develop the intuition that using dropout approximates training a large number of neural networks with slightly different architectures at the same time.

[explain and add diagrams]

# Dropout Node

The only hyperparameter we'll implement in our `Dropout` layer is `dropout_rate` - the probability that any given neuron is dropped.

We'll then define a parameter `p` in each node, which is the `p`robability that a neuron will stay active. This is calculated as `1 - dropout_rate`.

In [10]:
# kaitorch/layers.py

class Node:

    def __init__(self, dropout_rate: float):
        self.p = 1 - dropout_rate

    def parameters(self):
        return []

Then, we generate a random float between 0 and 1 - there is a `p` chance that this number is less than `p`, and a `1-p` chance that it is greater than or equal to `p`. Therefore, if the number is less than `p`, we allow the neuron to activate; if not, it is masked and set to 0.

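However, let's say we have a `dropout_rate` of 0.5, implying that only about half of the nodes will be active - this causes the magnitude of the values being propagated forward in the model to approximately halve. That is a problem because dropout is switched off at prediction time: with every neuron active again, the next layer would receive inputs roughly twice as large as anything it saw during training.

So, to offset this unwanted effect of dropout, we multiply the values of the active neurons by `1/self.p`. This preserves the expected magnitude of the values being propagated forward, so the model sees inputs on the same scale during training and prediction. Doing this rescaling at training time is what makes it "inverted" dropout.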

In [11]:
import random

def __call__(self, x, train):
    # random.random() is uniform on [0, 1), so `< self.p` is true with probability p
    return x * (1/self.p) if random.random() < self.p else 0
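
To convince myself that the `1/p` factor does what it should, here is a small simulation on plain floats. `plain_dropout` and `inverted_dropout` are hypothetical helpers written just for this check, not part of KaiTorch. In expectation, the scaled version gives `p * (x/p) + (1-p) * 0 = x`, while the unscaled one gives `p * x`.

```python
import random

def plain_dropout(x, p, rng):
    """Keep x with probability p, without any rescaling."""
    return x if rng.random() < p else 0.0

def inverted_dropout(x, p, rng):
    """Keep x with probability p and scale the survivors by 1/p."""
    return x / p if rng.random() < p else 0.0

rng = random.Random(42)  # seeded only so the sketch is repeatable
x, p, trials = 2.0, 0.5, 100_000

# Average output over many independent draws
mean_plain = sum(plain_dropout(x, p, rng) for _ in range(trials)) / trials
mean_inverted = sum(inverted_dropout(x, p, rng) for _ in range(trials)) / trials

print(f"input: {x}, plain: {mean_plain:.3f}, inverted: {mean_inverted:.3f}")
```

The plain average should land near `x * p` (about 1.0 here), and the inverted average near the original input (about 2.0).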

And finally, we only want dropout to be applied during model training, since that is where the regularization effect comes from. When we use the model to make predictions, we want every neuron to contribute, so dropout is switched off and the inputs pass through unchanged. Because the kept values were already scaled by `1/p` during training, no extra rescaling is needed at prediction time.

In [12]:
# kaitorch/layers.py

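import random

def __call__(self, x, train):
    if train:
        # Keep the input with probability p and scale it by 1/p, otherwise mask it
        return x * (1/self.p) if random.random() < self.p else 0
    else:
        # At prediction time the input passes through unchanged
        return x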

Node.__call__ = __call__
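
Here is a quick sanity check of the two modes. To keep it runnable on its own, it uses `DropoutNodeSketch`, a stand-in I wrote for this example that works on plain floats instead of KaiTorch `Scalar`s.

```python
import random

class DropoutNodeSketch:
    """Stand-in for the Node above, operating on plain floats."""

    def __init__(self, dropout_rate):
        self.p = 1 - dropout_rate  # probability of keeping the input

    def __call__(self, x, train):
        if not train:
            return x  # prediction: pass the input through untouched
        return x * (1 / self.p) if random.random() < self.p else 0

random.seed(0)  # seeded only so the sketch is repeatable
node = DropoutNodeSketch(dropout_rate=0.2)

train_outputs = [node(1.0, train=True) for _ in range(10)]
eval_outputs = [node(1.0, train=False) for _ in range(10)]

print(train_outputs)  # each entry is either 0 or about 1.25 (that is, 1.0 / 0.8)
print(eval_outputs)   # always 1.0
```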

# Dropout Layer

As mentioned earlier, the only hyperparameter we need to tune is `dropout_rate`. As with our Dense layer, we also need to define `nins` (number of inputs), `nouts` (number of outputs), and `nodes` (list of dropout nodes).

`nins` and `nouts` are trivial to define because they are always identical (each input gets its own dropout `Node`, so the dropout layer is the same size as the previous layer's output), but we'll add them for consistency and clarity.

In [18]:
# kaitorch/layers.py

from kaitorch.core import Module

class Dropout(Module):

    def __init__(self, dropout_rate: float=0.5):

        self.nins = None
        self.nouts = None
        self.nodes = None
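        # Note: on the layer, `p` holds the dropout rate itself;
        # each Node converts it to a keep probability (1 - dropout_rate)
        self.p = dropout_rate

        if self.p < 0 or self.p > 1:
            raise ValueError("dropout_rate must be a probability between 0 and 1")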
            
    def parameters(self):
        return [p for node in self.nodes for p in node.parameters()]

    def __repr__(self):
        return f'Dropout(dropout_rate={self.p})'
    
Dropout.Node = Node

Since `nins` is entirely dependent on the output size of the previous layer, we will initialize this as `None`, and fill in this value after we define the architecture of our model and `__build__` it.

In [19]:
# kaitorch/layers.py

def __build__(self, nins):
    self.nins = nins
    self.nouts = nins
    self.nodes = [self.Node(self.p) for _ in range(self.nins)]
    
Dropout.__build__ = __build__

The last thing to do is implement the `__call__` method, which takes the output of the previous layer as its input and passes each value through its own dropout node.

Unlike the Dense layer, we also need to pass in the boolean `train` parameter, because dropout behaves differently during training (values are randomly masked) and prediction (values pass through unchanged).

In [20]:
# kaitorch/layers.py

def __call__(self, xs, train):
    # Pair each input with its own dropout node
    outs = [n(x, train) for n, x in zip(self.nodes, xs)]
    return unwrap(outs)

Dropout.__call__ = __call__

# Finishing our Sequential Class

We're almost done building KaiTorch! There are two things left on our agenda before we wrap up.

1. Update our methods to incorporate the `Dropout` layer
2. Build out methods similar to Keras for interfacing with the model (`fit`, `predict`, `evaluate`)

In [30]:
from kaitorch.models import Sequential

As mentioned above, unlike the Dense layer, calling the Dropout layer requires a `train` argument to specify if the current iteration is for training or not.

In [None]:
# kaitorch/models.py

def __call__(self, x, train):
    for layer in self.layers:
        if isinstance(layer, Dropout):
            x = layer(x, train)
        else:
            x = layer(x)
    return unwrap(x)

Sequential.__call__ = __call__
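
The dispatch above can be tried out in isolation. This sketch uses two stand-in layers of my own (`DoubleLayer` and `DropoutLayer`, neither of which exists in KaiTorch) and a free function `forward` in place of `Sequential.__call__`, all working on plain floats.

```python
import random

class DoubleLayer:
    """Stand-in for a Dense layer: it only takes the inputs."""
    def __call__(self, xs):
        return [2 * x for x in xs]

class DropoutLayer:
    """Stand-in for Dropout: it also needs to know whether we are training."""
    def __init__(self, dropout_rate):
        self.p = 1 - dropout_rate  # keep probability

    def __call__(self, xs, train):
        if not train:
            return xs
        return [x / self.p if random.random() < self.p else 0 for x in xs]

def forward(layers, xs, train):
    for layer in layers:
        if isinstance(layer, DropoutLayer):
            xs = layer(xs, train)  # only Dropout receives the flag
        else:
            xs = layer(xs)
    return xs

layers = [DoubleLayer(), DropoutLayer(dropout_rate=0.5), DoubleLayer()]

print(forward(layers, [1.0, 2.0, 3.0], train=False))  # [4.0, 8.0, 12.0]

random.seed(1)  # seeded only so the sketch is repeatable
# Each value is either 0 (dropped) or twice its prediction-time value (kept, scaled by 1/p)
print(forward(layers, [1.0, 2.0, 3.0], train=True))
```

Checking the layer type keeps the other layers' call signatures untouched, at the cost of `Sequential` having to know which layers care about `train`.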

As also mentioned above, the `nins` of a Dropout layer is initialized as `None` because it depends on the output size of the previous layer. So, when building our model, we need to apply a forward fill to the list of layer sizes to fill in the size of each Dropout layer.

In [31]:
# kaitorch/utils.py

def ffill(x: list):
    # Replace every None with the closest value before it (including the last entry)
    for i in range(1, len(x)):
        if x[i] is None:
            x[i] = x[i-1]
    return x
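
To make the forward fill concrete, here is a standalone copy of the idea applied to a made-up list of layer sizes. The architecture in the comment is hypothetical and only there to give the numbers some meaning.

```python
def ffill(sizes):
    """Forward fill: replace each None with the closest value before it."""
    for i in range(1, len(sizes)):
        if sizes[i] is None:
            sizes[i] = sizes[i - 1]
    return sizes

# Hypothetical model: 2 inputs -> Dense(16) -> Dropout -> Dense(16) -> Dropout -> Dense(1)
# The first entry is the input size, and each Dropout contributes a None
layer_sizes = [2, 16, None, 16, None, 1]

print(ffill(layer_sizes))  # [2, 16, 16, 16, 16, 1]
```

Note that the list is modified in place, so `layer_sizes` itself holds the filled values afterwards.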

In [32]:
# kaitorch/models.py

def build(self, input_size):

    if self.built:
        return

    self.layer_sizes.insert(0, input_size)
    self.layer_sizes = ffill(self.layer_sizes)

    for idx, layer in enumerate(self.layers):
        layer.__build__(self.layer_sizes[idx])

    self.built = True
    
Sequential.build = build

Next, we'll define a `run` method that `fit`, `predict`, and `evaluate` will all be able to use.

In [33]:
# kaitorch/models.py

from tqdm import tqdm

def run(self, x, y=None, epoch=1, epochs=1, train=False):

    # This code just prints the progress bar - it's also the only external library used in KaiTorch
    postfix_type = 'Train' if train is True else ''

    tqdm_x = tqdm(
        x,
        ncols=160,
        desc=f"Epoch {epoch:>3}/{epochs}", 
        postfix='',
        bar_format='{l_bar}{bar:40}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}{postfix}]'
    )

    # Used to store the predictions for every record in x
    y_pred = []

    # For every record in our dataset
    for x_record in tqdm_x:

        # Run a forward pass of the model
        y_pred.append(self.__call__(x_record, train=train))

        # If y labels are provided, calculate the loss value
        if y is not None:
            run_loss = self.loss(y[:len(y_pred)], y_pred)
            tqdm_x.set_postfix_str(f"{postfix_type} Loss: {run_loss.data:.4f}")
        # Otherwise, don't
        else:
            run_loss = None
            tqdm_x.set_postfix_str(f"{postfix_type}")

    # This is the training loop introduced in Notebook 4
    if train:
        self.zero_grad()
        run_loss.backward()
        self.step()

    return y_pred, run_loss

Sequential.run = run
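
One detail of `run` that is easy to miss is that the loss is recomputed after every record, over all the records seen so far. The sketch below mimics that with plain floats. The `mse` function and the numbers are stand-ins I picked for illustration, not the loss or data used by KaiTorch.

```python
def mse(y_true, y_pred):
    """Mean squared error on plain floats (stand-in for the model's loss)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_pred)

y = [1.0, 0.0, 1.0, 1.0]
fake_predictions = [0.5, 0.5, 1.0, 0.0]  # pretend outputs of the forward pass

y_pred, running = [], []
for pred in fake_predictions:
    y_pred.append(pred)
    # Compare only the labels seen so far against the predictions so far
    running.append(mse(y[:len(y_pred)], y_pred))

print(running)      # the values shown on the progress bar, one per record
print(running[-1])  # 0.375, the loss over the whole dataset
```

The last value is the loss over every record, which is the one `run` returns and, when training, backpropagates. So each call to `run` with `train=True` performs a single full-batch update.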

Almost done! We just need to implement fit, predict, and evaluate, which have some overlapping steps. This is a summary of what each one needs to do:


|  | fit() | predict() | evaluate() |
| --- | --- | --- | --- |
| Calculate Loss | Yes | No | Yes |
| Output Predictions | No | Yes | No |
| Train Model | Yes | No | No |

<center><i>[i just learnt you could make tables like this with markdown pretty cool right]</i></center>

So going row by row:

- __Calculate Loss__ is returned by `run()`, and we need to initialize `history` to store the loss
- __Output Predictions__ is also returned by `run()`, and we need to do some processing to make sure the output is in the format we want
- __Train Model__ just requires us to pass `train=True` to `run()`, as well as the current `epoch` and number of `epochs` for printing and training

All 3 functions will also need to apply `wrap` to our input to make sure it is an iterable, and will `build` the model if it hasn't been built yet.

In [34]:
# kaitorch/models.py

def fit(self, x, y, epochs=1):

    x = wrap(x)
    self.build(len(x[0]))

    history = {'loss': []}

    for epoch in range(1, epochs+1):

        y_pred, run_loss = self.run(x, y, epoch, epochs, train=True)
        history['loss'].append(run_loss.data)

    return history

def evaluate(self, x, y):

    x = wrap(x)
    self.build(len(x[0]))
    
    history = {'loss': []}

    y_pred, run_loss = self.run(x, y)
    history['loss'].append(run_loss.data)

    return history

def predict(self, x, as_scalar=False):

    x = wrap(x)
    self.build(len(x[0]))

    y_pred, run_loss = self.run(x)

    # as_scalar argument for user to specify if the output should be `Scalar` or numeric
    if as_scalar:
        return y_pred
    else:
        # if output of each run is a single Scalar | eg. binary classification
        if isinstance(y_pred[0], Scalar):
            return [y.data for y in y_pred]
        # if output of each run is a list of Scalars | eg. multi-class classification
        elif isinstance(y_pred[0][0], Scalar):
            return [[y.data for y in row] for row in y_pred]

Sequential.fit = fit
Sequential.evaluate = evaluate
Sequential.predict = predict
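
The output handling in `predict` is the only slightly fiddly part, so here it is on its own. The `Scalar` class below is a minimal stand-in that only has a `.data` attribute (the real one does much more), and `to_numeric` is a hypothetical helper name for the `else` branch above.

```python
class Scalar:
    """Minimal stand-in for KaiTorch's Scalar: it just wraps a number in .data."""
    def __init__(self, data):
        self.data = data

def to_numeric(y_pred):
    """Strip the Scalar wrapper from a list of predictions."""
    # One Scalar per record, e.g. binary classification
    if isinstance(y_pred[0], Scalar):
        return [y.data for y in y_pred]
    # A list of Scalars per record, e.g. multi-class classification
    return [[y.data for y in row] for row in y_pred]

binary = [Scalar(0.9), Scalar(0.1)]
multiclass = [[Scalar(0.7), Scalar(0.3)], [Scalar(0.2), Scalar(0.8)]]

print(to_numeric(binary))      # [0.9, 0.1]
print(to_numeric(multiclass))  # [[0.7, 0.3], [0.2, 0.8]]
```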