## Gradient Descent

Let us implement training a neural network using gradient descent in PyTorch.

PyTorch uses its [autograd mechanism](https://pytorch.org/docs/stable/notes/autograd.html) to automatically calculate the gradient of an error with respect to every parameter in a computation graph.
For this we will:
- Build a neuron using PyTorch's API
- Calculate the output of a neuron given its current weights
- Calculate the error with a given label
- Let PyTorch figure out the gradients using `.backward()`
- Apply the gradients with $w \leftarrow w - \alpha \cdot w.\mathrm{grad}$
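
The steps above can be sketched on a toy problem with a single weight. This is only a minimal illustration, not the workshop's neuron: the values of `w`, `x`, `y` and `alpha` are made up, and a squared error stands in for the error function.

```python
import torch

# Hypothetical toy setup: one weight, one input, one label (all values made up)
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
y = torch.tensor(9.0)
alpha = 0.01

# Calculate the output and the (squared) error
y_pred = w * x
err = (y_pred - y) ** 2

# Autograd fills w.grad with d(err)/dw = 2 * (w * x - y) * x
err.backward()
grad = w.grad.item()

# Gradient descent step; no_grad keeps the update itself out of the graph
with torch.no_grad():
    w -= alpha * w.grad
new_w = w.item()

print(grad, new_w)
```

Because the gradient is negative here, the update increases `w`, which moves the prediction `w * x` closer to the label.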

In [None]:
# Let's start by importing the relevant packages
# matplotlib for plots
import matplotlib as mpl
from matplotlib import pyplot as plt
# pandas to read in some data
import pandas as pd
# numpy to build our first perceptron
import numpy as np
# train_test_split to validate our findings from the perceptron training
from sklearn.model_selection import train_test_split
# MinMaxScaler to normalise the data before inputting them to the perceptron
from sklearn.preprocessing import MinMaxScaler
# PyTorch for neural networks
import torch
import time
from torch import nn
%matplotlib inline
mpl.rcParams['figure.figsize'] = (16, 9)
import os
home = os.path.expanduser("~")
data = home + '/data/workshop_data/occupancy_data/datatraining.txt'
# Let us load the data from the previous example
df = pd.read_csv(data)

target = 'Occupancy'
features = [col for col in df.columns if target not in col and 'date' not in col]

In [None]:
x_train, x_val, y_train, y_val = train_test_split(df[features], df[target])
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)

## Build the artificial neuron using PyTorch's API
PyTorch abstracts neural networks using the `nn.Module` class. Every neural network has to subclass it for PyTorch's mechanisms to work. In addition, every layer has to be assigned as an attribute of the network's class. Otherwise its weights do not appear as parameters of the network. Let us start by building a single neuron within a PyTorch module.

Building this in PyTorch is straightforward using an [nn.Linear](https://pytorch.org/docs/stable/nn.html#torch.nn.Linear) layer. This layer takes the number of inputs and the number of outputs as arguments. As we have 5 input features, the layer has 5 inputs. As we predict a single output, the layer needs 1 output. Additionally, we will add the sigmoid as an activation using [nn.Sigmoid](https://pytorch.org/docs/stable/nn.html#torch.nn.Sigmoid).

For the actual calculation, we will override the `forward` function of the module. For numerical stability, we will calculate the input to the sigmoid activation (the logits) separately.
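
To see why the logits are worth keeping, here is a small sketch comparing a hand-written cross-entropy on the sigmoid output with `nn.BCEWithLogitsLoss` on the raw logit. The logit and label values are made up to provoke the problem, and the hand-written loss is only an illustration of the naive route.

```python
import torch
from torch import nn

# Made-up extreme case: a large positive logit whose true label is 0
logit = torch.tensor([50.0])
label = torch.tensor([0.0])

# Naive route: apply the sigmoid first, then take the logarithm by hand.
# In float32, sigmoid(50) rounds to 1.0, so log(1 - 1.0) is -inf.
prob = torch.sigmoid(logit)
naive_loss = -(label * torch.log(prob) + (1 - label) * torch.log(1 - prob)).mean()

# Stable route: hand the logit to the loss, which combines sigmoid and log internally
stable_loss = nn.BCEWithLogitsLoss()(logit, label)

print(naive_loss.item(), stable_loss.item())
```

The naive loss becomes infinite, while the logit-based loss stays finite and can still produce a useful gradient.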

In [None]:
class Neuron(nn.Module):
    
    def __init__(self, number_of_inputs):
        super().__init__()
        # Build the neuron using nn.Linear
        self.neuron = 
        # use nn.Sigmoid as an activation function
        self.act = 
    
    def logit(self, inp):
        # Calculate the input to the activation function
        # Hint: Calculating the output of a layer can be
        # done by simply calling layer(inp) and the output
        # from the linear layer (the neuron) is the input to the activation function
    
    def forward(self, inp):
        # Use the output of the logits function to 
        # calculate the output of the whole network
    

Let us now grab a random selection of the training data, run it through the neuron, let PyTorch calculate the gradients and change the weights accordingly:
- We will use [nn.BCEWithLogitsLoss](https://pytorch.org/docs/stable/nn.html#torch.nn.BCEWithLogitsLoss) to calculate the error.
- We run `.backward()` on the loss to get the gradients for all parameters.
- We update the weights using `w = w - alpha * w.grad` 

In [None]:
# Build the loss
loss = 
# Build the neuron:
neuron = 
alpha = 5e-2
select = np.random.randint(0, len(x_train), 2048)
x = torch.from_numpy(x_train[select]).float()
y = torch.from_numpy(y_train.iloc[select].values).float().unsqueeze(1)
y_logits = neuron.logit(x)
err = loss(y_logits, y)
# Let PyTorch figure out the gradients using err.backward()
err.backward()
# Update the weights
for name, w in neuron.named_parameters():
    print('Parameter {}\n{}\nGradient {}'.format(name, w, w.grad))
    w = 
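
One point to watch in the update step: inside the loop, a plain assignment to `w` only rebinds the loop variable and leaves the parameter itself unchanged. A common way around this is an in-place update under `torch.no_grad()`. The sketch below shows this on a stand-in `nn.Linear` layer with made-up random data; it is one possible approach, not necessarily the intended solution of the exercise.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(5, 1)                   # stand-in for the neuron
loss_fn = nn.BCEWithLogitsLoss()
alpha = 5e-2

x = torch.rand(8, 5)                      # made-up batch of 8 samples
y = torch.randint(0, 2, (8, 1)).float()   # made-up binary labels

err = loss_fn(layer(x), y)
err.backward()

# Keep copies so the effect of the update can be checked afterwards
before = [p.detach().clone() for p in layer.parameters()]
grads = [p.grad.clone() for p in layer.parameters()]

with torch.no_grad():
    for p in layer.parameters():
        p -= alpha * p.grad               # in-place update of the parameter
        p.grad.zero_()                    # reset, otherwise gradients accumulate

after = [p.detach().clone() for p in layer.parameters()]
```

Resetting the gradient matters because `.backward()` adds to `.grad` instead of overwriting it.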


## Optimizer
As updating the weights can be done automatically as well, PyTorch implements [optimizers](https://pytorch.org/docs/stable/optim.html) that can take care of this for us. 
We will start with the simplest optimizer, [Stochastic Gradient Descent](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD).

In [None]:
# Instantiate the optimizer by passing it all model
# parameters (all parameters of the neuron)
# and the learning rate alpha
optim = torch.optim.SGD(...)

Optimizers can reset the gradients of all associated parameters using `optim.zero_grad()` and apply the gradients to the parameters using `optim.step()`.
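
As a rough sketch of how these two calls fit around `.backward()`, the following runs a single optimization step on a stand-in `nn.Linear` layer with made-up random data. The names `layer`, `loss_fn` and `optimizer` are chosen for this illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(5, 1)                   # stand-in for the neuron
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(layer.parameters(), lr=5e-2)

x = torch.rand(8, 5)                      # made-up batch
y = torch.randint(0, 2, (8, 1)).float()   # made-up binary labels

optimizer.zero_grad()                     # reset gradients from earlier steps
err = loss_fn(layer(x), y)
err.backward()                            # fill .grad for every parameter

w_before = layer.weight.detach().clone()
w_grad = layer.weight.grad.clone()
optimizer.step()                          # plain SGD: w <- w - lr * w.grad
w_after = layer.weight.detach().clone()
```

With plain SGD (no momentum or weight decay), the step is the same update that was written by hand before.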

In [None]:
def fit_batch(optim, loss, neuron, x, y):
    # Reset the gradients of all parameters:
    ...
    y_pred = neuron.logit(x)
    #print(y, y_pred, y.sum())
    err = loss(y_pred, y)
    #err = err * (y * 3 + 1)
    err.mean().backward()
    
    # Apply the gradients to all parameters 
    
    return y_pred

def eval_batch(neuron, x):
    y_pred = neuron.logit(x)
    return y_pred

In [None]:
# Let's run the actual optimization
start = time.time()  
for epoch in range(20):
    acc = 0.0
    for _ in range(200):
        select = np.random.randint(0, len(x_train), 2048)
        x = torch.from_numpy(x_train[select]).float()
        y = torch.from_numpy(y_train.iloc[select].values).float().unsqueeze(1)
        y_pred = fit_batch(optim, loss, neuron, x, y)
        # y_pred holds logits; a logit > 0 corresponds to a probability > 0.5
        acc += (y == (y_pred > 0).float()).float().mean().item()
        #y_pred = y_pred.argmax(dim=-1)
        #acc += (y==y_pred).float().mean()
    print('accuracy {}'.format(acc / 200))
print('Training time: {}'.format(time.time() - start))
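
When computing the accuracy, keep in mind that `fit_batch` returns logits, not probabilities. Since the sigmoid of 0 is 0.5, thresholding the logits at 0 gives the same predictions as thresholding the probabilities at 0.5. A small check with made-up logits:

```python
import torch

# Made-up logits on both sides of zero
logits = torch.tensor([-2.0, -0.1, 0.3, 4.0])

pred_from_logits = logits > 0
pred_from_probs = torch.sigmoid(logits) > 0.5

print(pred_from_logits, pred_from_probs)
```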

## Move the neuron to the GPU
PyTorch tensors and modules allow us to call `.cuda()` on them to move the computations to the GPU.
This makes it really easy to perform any calculation on the GPU (which is super handy even if you do not use neural networks).
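
As an alternative to `.cuda()`, modules and tensors also accept `.to(device)`, which allows the same code to run on machines without a GPU. The sketch below uses a stand-in `nn.Linear` layer and made-up input; the fallback to the CPU is a choice made for this illustration.

```python
import torch
from torch import nn

# Use the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(5, 1).to(device)   # move the parameters to the device
x = torch.rand(4, 5).to(device)      # inputs must live on the same device

out = layer(x)                       # computed on `device`
out_cpu = out.detach().cpu()         # bring the result back, e.g. for plotting

print(out.device, out_cpu.device)
```

Mixing tensors from different devices in one operation raises an error, so both the module and its inputs have to be moved.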


In [None]:
if torch.cuda.is_available():
    # Move the neuron to the GPU by calling .cuda() on it
    neuron = Neuron(5)
    optim = torch.optim.SGD(neuron.parameters(), lr=5e-2)
    start = time.time()
    for epoch in range(20):
        acc = 0.0
        for _ in range(200):
            select = np.random.randint(0, len(x_train), 2048)
            # Move x and y to the GPU by calling .cuda() on them as well:
            x = torch.from_numpy(x_train[select]).float()
            y = torch.from_numpy(y_train.iloc[select].values).float().unsqueeze(1)
            y_pred = fit_batch(optim, loss, neuron, x, y)
            # y_pred holds logits; a logit > 0 corresponds to a probability > 0.5
            acc += (y == (y_pred > 0).float()).float().mean().item()
        print('accuracy {}'.format(acc / 200))
    print('Training time: {}'.format(time.time() - start))
    

## Why is the GPU version slower?

Well, we need to move the data to the GPU and back, which costs time. This normally pays off, as the computations take far longer than moving the data. In our current case, however, the computation is very simple and the amount of data is very small. This is nothing the GPU is well suited for, because it cannot use its advantage of performing many computations in parallel.
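
One rough way to explore this trade-off is to time a small workload that includes the transfer to the device and back. The helper `time_on` and its parameters `n` and `repeats` are hypothetical names for this sketch, and the measured numbers depend entirely on the hardware.

```python
import time
import torch

def time_on(device, n=256, repeats=10):
    """Return the seconds spent copying two n x n matrices to `device`,
    multiplying them there and copying the result back, `repeats` times."""
    a = torch.rand(n, n)
    b = torch.rand(n, n)
    if device.type == "cuda":
        torch.cuda.synchronize()          # finish pending GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        a_d, b_d = a.to(device), b.to(device)
        result = (a_d @ b_d).cpu()        # copying back waits for the GPU
    if device.type == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

timings = {"cpu": time_on(torch.device("cpu"))}
if torch.cuda.is_available():
    timings["cuda"] = time_on(torch.device("cuda"))
print(timings)
```

Increasing `n` makes the computation grow faster than the amount of data to transfer, which is the regime where a GPU tends to pay off.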