In [None]:
import torch
import torchvision
import numpy as np
from torch import nn

from helpers import Model, train, visualize_1d, visualize_2d

# Activations
Today we are learning about the activation functions used in neural networks. These are (typically) non-linear functions applied between layers, and that non-linearity is what allows the network to learn more complex internal representations than a single linear map could.
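To see why the non-linearity matters, here is a small check. It is a sketch using two arbitrary `nn.Linear` layers, not part of the course helpers. Two linear layers stacked with no activation in between compute exactly the same function as one linear layer, so depth alone adds no expressive power.

```python
import torch
from torch import nn

torch.manual_seed(0)
f1, f2 = nn.Linear(4, 8), nn.Linear(8, 3)
x = torch.randn(5, 4)

# Two stacked linear layers with no activation in between...
stacked = f2(f1(x))

# ...equal a single linear layer with W = W2 @ W1 and b = W2 @ b1 + b2
W = f2.weight @ f1.weight
b = f2.weight @ f1.bias + f2.bias
single = x @ W.T + b

print(torch.allclose(stacked, single, atol=1e-5))  # True
```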

## Heaviside Step Function H(x)
In the first journal club of this module we used the step function. In the second, we learned that we can't use it to train a model: its derivative is 0 almost everywhere (and undefined at 0), so gradient descent gets no signal about which way to move the weights.
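A quick numerical check of that claim, as a sketch: estimate the slope of the step function $H$ with central finite differences, $\frac{H(x+h) - H(x-h)}{2h}$. Away from 0 the estimate is exactly 0; at the jump it is $1/(2h)$, which blows up as $h \to 0$, so there is no usable derivative there either. The `step` helper below is our own one-liner, not a library function.

```python
import torch

def step(x):
    # Heaviside step: 1 where x > 0, else 0 (same rule as StepFunction)
    return (x > 0).to(x.dtype)

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
h = 1e-3

# Central finite-difference estimate of the slope dH/dx at each point
slope = (step(x + h) - step(x - h)) / (2 * h)
print(slope)  # zero everywhere except a spike of 1/(2h) at x = 0
```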

In [None]:
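class StepFunction(nn.Module):
    """Heaviside step: outputs 1 where x > 0 and 0 elsewhere."""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        y = torch.zeros_like(x)  # all zeros, same shape and dtype as x
        y[x > 0] = 1             # set the positive entries to 1
        return y

visualize_1d(StepFunction())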

## Sigmoid
How can we fix the derivative problem? Use a function that looks similar but is differentiable. For the step function, this is the sigmoid function, $\sigma(x) = \frac{1}{1 + e^{-x}}$, which rises smoothly from 0 to 1 instead of jumping at 0.
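The sigmoid has a convenient closed-form derivative, $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$. The sketch below checks it against PyTorch's autograd: the gradient is non-zero everywhere, which is exactly what the step function lacked. It also peaks at only 0.25 (at $x = 0$) and becomes tiny for large $|x|$, something to keep in mind when you try other internal activations later.

```python
import torch

# Five points from -10 to 10: [-10, -5, 0, 5, 10]
x = torch.linspace(-10, 10, steps=5, requires_grad=True)
y = torch.sigmoid(x)   # sigma(x) = 1 / (1 + exp(-x))
y.sum().backward()     # x.grad now holds d(sigma)/dx at each point

# Closed form: sigma'(x) = sigma(x) * (1 - sigma(x))
analytic = (y * (1 - y)).detach()
print(x.grad)
print(torch.allclose(x.grad, analytic))  # True
```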

In [None]:
visualize_1d(nn.Sigmoid(), min_x=-10, max_x=10)

## Max Function
For classification we would like the output to say which class is most likely to be the right answer. One way of doing this is to set the largest output to 1 and all the others to 0.
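The same "hard max" can be sketched in one line with `argmax` and `torch.nn.functional.one_hot`. Note what happens to gradients: `argmax` returns integer indices, so even when the input requires gradients, the one-hot output is cut off from the autograd graph and no gradient can flow back through it.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.3,  0.2]], requires_grad=True)

# Position of the largest score in each row, then a one-hot encoding of it
hard = F.one_hot(logits.argmax(dim=1), num_classes=logits.shape[1]).float()
print(hard)
# tensor([[1., 0., 0.],
#         [0., 1., 0.]])
print(hard.requires_grad)  # False: nothing to backpropagate through
```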

In [None]:
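class Max(nn.Module):
    """Hard max: 1 at the largest entry along the last dimension, 0 elsewhere."""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # Index of the largest value along the last (class) dimension
        max_inds = torch.argmax(x, dim=-1, keepdim=True)
        # One-hot output: write 1 at the arg-max position of a zero tensor
        return torch.zeros_like(x).scatter_(-1, max_inds, 1.0)

visualize_2d(Max())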

## SoftMax
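Again we have made a function we can't train with: just like the step function, its derivative is 0 almost everywhere. So we use the same trick and switch to a smooth version, known as the SoftMax function: $\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$. Its outputs are all positive and sum to 1, and instead of jumping abruptly from one class to another, it transitions smoothly when the two largest inputs are close to each other.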
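Here is a sketch of the formula written out by hand (the helper name `softmax` is ours, not a library function), checked against `torch.softmax`. Subtracting each row's maximum before exponentiating does not change the result, because the common factor $e^{-\max_j x_j}$ cancels between numerator and denominator, but it stops `exp` from overflowing on large inputs such as 1000.

```python
import torch

def softmax(x, dim=-1):
    # Shift by the max along `dim`: cancels in the ratio, avoids exp overflow
    z = x - x.max(dim=dim, keepdim=True).values
    e = torch.exp(z)
    return e / e.sum(dim=dim, keepdim=True)

x = torch.tensor([[2.0, 0.5, -1.0],
                  [1000.0, 1000.0, 0.0]])
p = softmax(x, dim=1)
print(p.sum(dim=1))                                # each row sums to 1
print(torch.allclose(p, torch.softmax(x, dim=1)))  # True

# Without the shift, exp(1000) overflows to inf and inf / inf gives nan
naive = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
print(naive[1])
```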

In [None]:
visualize_2d(nn.Softmax(dim=1))

## Training
We now have all the pieces we need for training. We'll build networks that classify images of 10 different articles of clothing using the Fashion-MNIST (FMNIST) dataset, which was designed as a drop-in replacement for the MNIST handwritten-digit dataset (same 28×28 grayscale image format, also 10 classes). The model we're constructing uses whichever activations you give it: an `internal_activation` used between hidden layers, and a `final_activation` applied at the end, before the loss function. The network has two hidden layers, with 512 and then 128 neurons.
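For intuition, here is one way such a network could be written. This is a hypothetical sketch: `SketchModel` is our own name, and it assumes each image arrives as a 1×28×28 tensor that is flattened to 784 inputs. The actual `helpers.Model` may be built differently.

```python
import torch
from torch import nn

class SketchModel(nn.Module):
    """Hypothetical stand-in for helpers.Model; the real one may differ."""
    def __init__(self, internal_activation, final_activation):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),             # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(28 * 28, 512),  # first hidden layer
            internal_activation,
            nn.Linear(512, 128),      # second hidden layer
            internal_activation,
            nn.Linear(128, 10),       # one output per clothing class
            final_activation,
        )

    def forward(self, x):
        return self.net(x)

m = SketchModel(nn.Sigmoid(), nn.Softmax(dim=1))
out = m(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```

Reusing the same activation module object at several positions is fine for stateless activations like `nn.Sigmoid`; an activation with learnable parameters (e.g. `nn.PReLU`) would then share those parameters across layers.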

In [None]:
model = Model(nn.Sigmoid(), nn.Softmax(dim=1))
model = train(model)

# Your Turn
Can you beat the score above? Try out whatever activations you'd like and see if you find anything interesting. Don't bother rerunning to get a luckier initialization: the random seeds are fixed to keep the comparison fair.
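If you want a starting point, PyTorch ships many activations in `torch.nn` (for example `nn.Tanh`, `nn.ReLU`, `nn.LeakyReLU`, `nn.GELU`). A cheap way to compare them before spending a training run is to look at their gradients at a few inputs, as in the sketch below. This is only a rough heuristic: it shows which functions saturate (near-zero gradient for large $|x|$), but it does not predict final accuracy.

```python
import torch
from torch import nn

candidates = {
    "Sigmoid": nn.Sigmoid(),
    "Tanh": nn.Tanh(),
    "ReLU": nn.ReLU(),
    "LeakyReLU": nn.LeakyReLU(negative_slope=0.01),
    "GELU": nn.GELU(),
}
x = torch.tensor([-5.0, -1.0, 0.5, 5.0])

grads = {}
for name, act in candidates.items():
    xg = x.clone().requires_grad_(True)  # fresh leaf tensor for each activation
    act(xg).sum().backward()             # xg.grad = d(act)/dx at each point
    grads[name] = xg.grad
    print(f"{name:>10}: {[round(g, 4) for g in xg.grad.tolist()]}")
```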

In [None]:
internal_activation = nn.Sigmoid()  # Try out different internal activations
final_activation = nn.Softmax(dim=1)  # If you're brave, try to find a better function for feeding to the cross-entropy loss

model = Model(internal_activation, final_activation)
model = train(model)