## LB03.0 The Perceptron

In this lecture we are going to learn the basics of a standard artificial neural network (ANN). 

<img src="resources/LB03_perceptron.png" style="width: 400px;"/>

* The perceptron is the smallest computational unit in an artificial neural network. 
* It takes inputs $x_1, ..., x_d$ (features of the data set)
* Computes the activation: $act = \sum_{i=1}^{d} w_i \cdot x_i + w_0$
* The parameters of one perceptron are weights $w_i$ and bias $w_0$
* The output of one perceptron is $o = f(act)$ where $f()$ is a non-linear transfer function
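
As a hand-worked illustration of the points above, assume a perceptron with two inputs and the logistic function $f(act) = \frac{1}{1 + e^{-act}}$ as transfer function. The numbers are made up for this example and are not taken from the lecture:

$x = (1, 2), \quad w = (0.5, -1), \quad w_0 = 0.25$

$act = 0.5 \cdot 1 + (-1) \cdot 2 + 0.25 = -1.25$

$o = f(-1.25) = \frac{1}{1 + e^{1.25}} \approx 0.223$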

In [None]:
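# We will need this in order to get interactive plots
# (a second %matplotlib magic would override the first one, and %pylab would
# import numpy and matplotlib names into the global namespace, so neither is used here)
%matplotlib notebook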

# Importing the packages needed for this lecture
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

## LB03.1 Perceptron definition (50%)
In this task, you will define the perceptron as a Python class with all its important attributes and methods.

## LB03.1 a) Non-linear transfer function
As already stated in LB03.0, every perceptron needs a non-linear transfer function. Define a Python function that calculates:

<center>$\large logistic(x) = \frac{1}{1 + e^{-x}}$</center>
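
One possible way to write the formula above with NumPy is sketched below. The name `logistic_sketch` is hypothetical and chosen so that it does not clash with the `logistic()` function you are asked to define; treat it as an illustration, not as the reference solution.

```python
import numpy as np

def logistic_sketch(x):
    """Logistic function 1 / (1 + exp(-x)), applied element-wise."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

# The function maps 0 to 0.5 and squashes large |x| towards 0 or 1
print(logistic_sketch(0.0))
print(logistic_sketch([-5.0, 0.0, 5.0]))
```

Because `np.exp` works element-wise, the same function can be applied to a single activation or to a whole array of activations.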

In [None]:
def logistic(x):
    # TODO: Return the result of the calculation
    return ...

## LB03.1 b) Perceptron class
Define a simple perceptron class which takes 2 arguments in the constructor: 
* `input_dim` (input dimensionality)
* `lr` (learning rate). 

The class should also implement the functions `forward()` and `backward()`.

#### `forward(inputs)`:
The function `forward()` takes the inputs (the features of one data sample) and computes the output of the perceptron. First compute the activation, which is the weighted sum of the inputs plus the bias. Then pass the activation through the non-linear transfer function.

$\normalsize  act = \sum_{i=1}^{d} w_i \cdot x_i + w_0$

$\normalsize o = f(act)$ where $o$ is output
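
The two formulas above can be sketched with `np.dot`, which computes the weighted sum in one call. The function names and the example values are assumptions made for this illustration only.

```python
import numpy as np

def logistic_sketch(x):
    # Logistic transfer function
    return 1.0 / (1.0 + np.exp(-x))

def forward_sketch(weights, bias, inputs):
    # act = sum_i w_i * x_i + w_0
    act = np.dot(weights, inputs) + bias
    # o = f(act)
    return logistic_sketch(act)

# Made-up parameters and one made-up sample with three features
weights = np.array([0.2, -0.4, 0.1])
bias = 0.3
inputs = np.array([1.0, 0.0, 1.0])

output = forward_sketch(weights, bias, inputs)
print(output)  # act = 0.6, so the output is roughly 0.646
```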

#### `backward(inputs, label, output)`:
This function takes `inputs`, `label` and `output` as arguments and updates the parameters of the perceptron using the computed gradient.

##### Calculating the gradient: 


$\large \frac{\partial loss(x_1, ..., x_d, \tau)}{\partial w_i} = (o - \tau) \cdot logistic(act) \cdot (1 - logistic(act)) \cdot x_i$ 

More information - S. Wegenkittl: Lecture on Machine Learning (Slides: 106ff)
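
A gradient formula like the one above can be checked numerically with finite differences. The sketch below assumes the loss is half the squared error, $\frac{1}{2}(o - \tau)^2$, because that is the loss whose derivative has this form; the loss defined on the lecture slides is not reproduced here and may differ. For the bias $w_0$ the same expression applies with the input factor set to 1. All names and values are hypothetical.

```python
import numpy as np

def logistic_sketch(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss_sketch(weights, bias, inputs, label):
    # Assumed loss: half the squared error between output and label
    output = logistic_sketch(np.dot(weights, inputs) + bias)
    return 0.5 * (output - label) ** 2

def analytic_gradient(weights, bias, inputs, label):
    output = logistic_sketch(np.dot(weights, inputs) + bias)
    # (o - tau) * logistic(act) * (1 - logistic(act)), shared by all parameters
    delta = (output - label) * output * (1.0 - output)
    return delta * inputs, delta  # gradient w.r.t. weights, gradient w.r.t. bias

weights = np.array([0.3, -0.7])
bias = 0.1
inputs = np.array([1.0, 0.5])
label = 1.0

grad_w, grad_b = analytic_gradient(weights, bias, inputs, label)

# Central finite differences as an independent check
eps = 1e-6
numeric_w = np.zeros_like(weights)
for i in range(len(weights)):
    shift = np.zeros_like(weights)
    shift[i] = eps
    numeric_w[i] = (loss_sketch(weights + shift, bias, inputs, label)
                    - loss_sketch(weights - shift, bias, inputs, label)) / (2 * eps)
numeric_b = (loss_sketch(weights, bias + eps, inputs, label)
             - loss_sketch(weights, bias - eps, inputs, label)) / (2 * eps)

print(np.allclose(grad_w, numeric_w, atol=1e-7), np.isclose(grad_b, numeric_b, atol=1e-7))
```

If both checks print `True`, the analytic expression and the numerical estimate agree for this sample.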

##### Basic backpropagation: 

$\large par_1 = par_0 - \alpha_0 \cdot grad_0(loss)$ 


More information - S. Wegenkittl: Lecture on Machine Learning (Slides: 101ff)
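
To see the update rule above in isolation, the following sketch applies it repeatedly to a toy loss $(par - 3)^2$ with a constant learning rate. The toy loss, the function name and the learning rate are assumptions for this illustration and have nothing to do with the perceptron loss itself.

```python
def gradient_descent_step(par, grad, lr):
    # par_1 = par_0 - alpha * grad(loss)
    return par - lr * grad

par = 0.0   # arbitrary starting value
lr = 0.1    # constant learning rate for this toy example
for step in range(50):
    grad = 2.0 * (par - 3.0)  # derivative of (par - 3)^2 with respect to par
    par = gradient_descent_step(par, grad, lr)

print(par)  # moves towards the minimum at par = 3
```

Each step moves the parameter against the gradient, so the distance to the minimum shrinks as long as the learning rate is small enough.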



In [None]:
class Perceptron(object):
    def __init__(self, input_dim, lr=0.01):
        # TODO: Initialize the learning rate
        self.lr = ...
        # TODO: Initialize the weights of the perceptron with uniformly distributed random values between -1 and 1
        # Every single input (feature) of your perceptron has an associated weight, thus
        # the shape of the variable `weights` should be (input_dim, )
        self.weights = ...
        # TODO: Initialize the bias of the perceptron with a uniformly distributed random value between -1 and 1
        # Every perceptron in the network has a bias associated with it, i.e. the bias should be a scalar value
        self.bias = ...

    def forward(self, inputs):
        # TODO: Compute the activation of the perceptron
        act = ...
        # TODO: Compute the output using your logistic function previously defined
        output = ...
        return output

    def backward(self, inputs, label, output):
        # TODO: Compute the loss according to S. Wegenkittl: Lecture on Machine Learning slide 106 
        loss = ...
        # TODO: Compute the gradient of the loss
        gradient = ...
        # TODO: Update the weights and the bias using the computed gradient and defined learning rate
        self.weights = ...
        self.bias = ...
        # TODO: Return the calculated loss
        return loss

## LB03.1 c) Training your perceptron
Use the functions `forward()` and `backward()` of the perceptron class to train the perceptron, i.e. to adjust the parameters of this simple network. The function `train()` takes four arguments: `perceptron` (the only building block of your simple network), `X` (the input data), `y` (the labels) and `epochs` (the number of passes over the training data).
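
The overall flow of such a training loop is sketched below on a made-up data set (logical OR). `SketchPerceptron` and `train_sketch` are hypothetical stand-ins, and the loss is assumed to be half the squared error; your own class and the loss from the lecture slides may differ.

```python
import numpy as np

def logistic_sketch(x):
    return 1.0 / (1.0 + np.exp(-x))

class SketchPerceptron:
    """Hypothetical stand-in for the Perceptron class of this task."""
    def __init__(self, input_dim, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.lr = lr
        self.weights = rng.uniform(-1.0, 1.0, size=input_dim)
        self.bias = rng.uniform(-1.0, 1.0)

    def forward(self, inputs):
        return logistic_sketch(np.dot(self.weights, inputs) + self.bias)

    def backward(self, inputs, label, output):
        # Assumed loss: half the squared error
        loss = 0.5 * (output - label) ** 2
        delta = (output - label) * output * (1.0 - output)
        self.weights = self.weights - self.lr * delta * inputs
        self.bias = self.bias - self.lr * delta
        return loss

def train_sketch(perceptron, X, y, epochs=100):
    losses = []
    for _ in range(epochs):
        epoch_losses = []
        # One forward and one backward pass per training sample
        for inputs, label in zip(X, y):
            output = perceptron.forward(inputs)
            epoch_losses.append(perceptron.backward(inputs, label, output))
        # Average loss over all samples of this epoch
        losses.append(float(np.mean(epoch_losses)))
    return losses

# Made-up toy data: logical OR
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_demo = np.array([0, 1, 1, 1], dtype=float)

losses = train_sketch(SketchPerceptron(input_dim=2), X_demo, y_demo, epochs=200)
print(losses[0], losses[-1])
```

The parameters are updated after every single sample, so one epoch here consists of four updates.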

In [None]:
def train(perceptron, X, y, epochs=100):
    # TODO: Define an empty list which will contain your losses over the epochs
    loss = ...
    # TODO: Repeat the training process epochs times
    for ...:
        # TODO: Loop through your training data and use the function forward() 
        # to get the output of the perceptron using its current parameters and the function 
        # backward() in order to adjust the weights according to the current gradient of the loss
        for ...:
            
        # TODO: Calculate the average loss over all samples in one epoch
        loss.append(...)
    # TODO: Return losses
    return loss

## LB03.1 d) Prediction
Use the function `forward()` of the perceptron class to make predictions for the input data. Because we use the logistic function as our output transfer function, `predict()` returns values between 0 and 1 that can be interpreted as the probability that a sample belongs to the <em>positive</em> class.
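
If hard class labels are needed instead of probabilities, the outputs can be thresholded. The sketch below assumes a threshold of 0.5 and uses made-up output values; neither is prescribed by this task.

```python
import numpy as np

# Made-up perceptron outputs for four samples
probabilities = np.array([0.08, 0.47, 0.53, 0.91])

# Assumed decision rule: positive class if the output is at least 0.5
labels = (probabilities >= 0.5).astype(int)
print(labels)  # [0 0 1 1]
```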


In [None]:
def predict(perceptron, X):
    return ...

## LB03.1 e) Plotting the loss curve
Define a function `plot_loss` which takes an array of losses `loss` (one value per epoch) as an argument and plots the losses over the epochs.
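
A loss curve of this kind could be drawn roughly as follows. The function name `plot_loss_sketch`, the axis labels and the loss values are assumptions for this illustration.

```python
import matplotlib.pyplot as plt

def plot_loss_sketch(loss):
    # Epochs are numbered starting at 1 on the x-axis
    epochs = range(1, len(loss) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, loss)
    ax.set_xlabel("epoch")
    ax.set_ylabel("average loss")
    ax.set_title("Loss over epochs")
    return ax

# Made-up loss values for five epochs
ax = plot_loss_sketch([0.25, 0.18, 0.12, 0.09, 0.07])
```

Returning the axes object is a design choice of this sketch; it makes it possible to adjust the plot afterwards, for example with `ax.set_yscale("log")`.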

In [None]:
def plot_loss(loss):
    # TODO: Plot the losses (y-axis) over the epochs (x-axis)
    ...

## LB03.2 Experiment using a self implemented perceptron (30%)
Now that the perceptron has been defined, we are going to use it to classify two very simple data sets.

## LB03.2 a) 2d data set
Start with a simple 2d data set. Create `X` which consists of $x_1$ and $x_2$ and corresponding label `y` which consists of $y$ from the following table:

| $x_1$ | $x_2$ | $y$ |
| :-: | :-: | :-: |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

This data set will be used for the training of your perceptron. To be able to evaluate if our perceptron was able to complete its task, you should take a look at the loss curve.

In [None]:
# TODO: Create the X and y
X = np.array([
    ...
])
y = np.array([...])

In [None]:
# TODO: Create an object of class Perceptron with the desired dimensionality
perceptron = ...

# TODO: Train the created perceptron using the input data and labels for 1000 epochs
losses = ...

# TODO: Plot your losses over the epochs
plot_loss(...)

In [None]:
# TODO: Now you will use the predict() function
# For this simple case, we will use the same data we trained the classifier with 
# for the prediction
predict(...)

### Let's visualize the data and the decision boundary in a 2d plot
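
The line drawn in the plot follows from the perceptron formulas. Assuming a sample is assigned to the positive class when the output is at least 0.5, the boundary lies where $logistic(act) = 0.5$, which is the case exactly when $act = 0$:

$w_1 \cdot x_1 + w_2 \cdot x_2 + w_0 = 0$

Solving for $x_2$ (this requires $w_2 \neq 0$) gives the line that is plotted:

$x_2 = -\frac{w_1 \cdot x_1 + w_0}{w_2}$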

In [None]:
# Creating a matplotlib figure
fig = plt.figure()
# Using the axes() function to access the ax object
ax = plt.axes()

# Using the scatter() function to plot the data points in the figure
ax.scatter(X[:, 0], X[:, 1], c = y, cmap="Spectral");

# Creating the x values of the line
xx = np.linspace(-2,2,500)
# Calculating the corresponding y value
yy = -((perceptron.weights[0]*xx) + perceptron.bias)/perceptron.weights[1]

# Limiting the axes
plt.xlim((-0.5, 1.5))
plt.ylim((-0.5, 1.5))

# Plotting the decision boundary
ax.plot(xx, yy, 'g-', linewidth=1);


## LB03.2 b) 3d data set
Continue with a simple 3d data set. Create `X` which consists of $x_1$, $x_2$ and $x_3$ and corresponding label `y` which consists of $y$ from the following table:

| $x_1$ | $x_2$ | $x_3$ | $y$ |
| :-: | :-: | :-: | :-: |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 1 | 0 |

Don't forget to train your perceptron and evaluate its learning progress!

In [None]:
# TODO: Construct the X and y
X = np.array([
    ...
])
y = np.array([...])

In [None]:
# TODO: Create a new perceptron object with given dimensionality
perceptron_3d = ...

# TODO: Train the newly created perceptron with 3d data for 1000 epochs
losses = ...

# TODO: Plot your losses over the epochs
plot_loss(...)

In [None]:
# TODO: Use the predict() function to obtain the associated labels
# For this simple case, we will use the same data we trained the classifier with 
# for the prediction
predict(...)

### Question: Did you get the expected results? Describe what you see.
(Hint: Think about the properties of a single perceptron and its abilities when separating a feature space. The plot in the cell below should also give you more information on this question.)
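
One way to explore the hint is to try many linear decision rules on the table above and record the best accuracy any of them reaches. The sketch below does a coarse grid search over weights and bias; the grid range, the step size and the rule "positive class if $act > 0$" are assumptions for this experiment.

```python
import itertools
import numpy as np

# Data from the table above
X_table = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y_table = np.array([0, 1, 1, 0])

# Coarse grid of candidate values for w_1, w_2, w_3 and the bias
grid = np.linspace(-2.0, 2.0, 9)

best_accuracy = 0.0
for w1, w2, w3, b in itertools.product(grid, repeat=4):
    act = X_table @ np.array([w1, w2, w3]) + b
    predictions = (act > 0).astype(int)
    accuracy = np.mean(predictions == y_table)
    best_accuracy = max(best_accuracy, accuracy)

print(best_accuracy)  # 0.75
```

No candidate on the grid classifies all four samples correctly, which is worth comparing with the loss curve and the predictions of your trained perceptron.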

### Let's visualize the data and the decision boundary in a 3d plot

In [None]:
# Creating a matplotlib figure
fig = plt.figure()
# Using the axes() function to access the ax object
ax = plt.axes(projection = "3d")

# Creating the x range
xx = np.linspace(-0.5, 1.5, 50)
# Creating the y range
yy = np.linspace(-0.5, 1.5, 50)

# Creating meshgrid which is going to be used for the calculation 
# of the Z values of corresponding hyperplane
xx, yy = np.meshgrid(xx, yy)

# Calculating the Z values of the hyperplane using the weights of the perceptron
Z = -(perceptron_3d.weights[0]*xx + perceptron_3d.weights[1]*yy + perceptron_3d.bias)/perceptron_3d.weights[2]

# Limiting the axes
plt.xlim((-0.5, 1.5))
plt.ylim((-0.5, 1.5))

# Plotting the decision boundary hyperplane 
ax.plot_surface(xx, yy, Z, alpha=0.5)

# Using the scatter3D() function to plot the data points in the figure
ax.scatter3D(X[:, 0], X[:, 1], X[:, 2], c = y, cmap="Spectral");

## LB03.3 Experiment with a more complex artificial neural network (20%)

Because implementing more complex neural networks by hand is not a task you would typically do in a Jupyter notebook, we are going to rely on the `Keras` framework to do this for us.
To see if we can solve the 3d classification challenge from LB03.2, we will leave the inputs and labels the same, but use a more complex neural network (with hidden layers).

| $x_1$ | $x_2$ | $x_3$ | $y$ |
| :-: | :-: | :-: | :-: |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 1 | 0 |
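
To illustrate what a hidden layer adds, the sketch below builds a tiny two-layer network in NumPy with hand-picked parameters (they are not trained and not produced by Keras). The layer size and all parameter values are assumptions chosen for this illustration; a trained Keras model will generally end up with different values.

```python
import numpy as np

def logistic_sketch(x):
    return 1.0 / (1.0 + np.exp(-x))

# Data from the table above
X_table = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y_table = np.array([0, 1, 1, 0])

# Hand-picked hidden layer: first unit behaves like OR, second like NAND
# (x_3 is constant, so its weights are simply set to 0)
W_hidden = np.array([[20.0, -20.0],
                     [20.0, -20.0],
                     [0.0, 0.0]])
b_hidden = np.array([-10.0, 30.0])

# Hand-picked output unit: behaves like AND of the two hidden units
w_out = np.array([20.0, 20.0])
b_out = -30.0

hidden = logistic_sketch(X_table @ W_hidden + b_hidden)
output = logistic_sketch(hidden @ w_out + b_out)

print(np.round(output).astype(int))  # [0 1 1 0]
```

The hidden units each draw one linear boundary, and the output unit combines them, which is something a single perceptron cannot do.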

For more information on the `Keras` library refer to https://keras.io/.

In [None]:
# Importing the needed libraries
import keras
from keras.models import Sequential
from keras.layers import Dense

In [None]:
# TODO: Construct the X and y
X = np.array([
    ...
])
y = np.array([...])

## LB03.3 a) Construct a simple neural network

In this task you will construct a simple neural network with two layers. The number of nodes in the hidden layer is for you to decide. How many nodes do you need in order to successfully classify this data set?

In [None]:
# TODO: Define the number of the nodes in the hidden layer
hidden_nodes = ...
# TODO: Define the dimension of the input data
input_dimension = ...
# TODO: Define the dimensionality of the output
output_dimension = ...
# TODO: Choose an appropriate transfer function for your network - https://keras.io/activations/
transfer_function = ...

# Creating a sequential model and adding two layers to the model
model = Sequential()
model.add(Dense(hidden_nodes, input_dim=input_dimension, activation=transfer_function))
model.add(Dense(output_dimension, activation=transfer_function))

### Question: What is the least number of nodes in the hidden layer which is able to solve this classification task? Why do we care so much about the number of nodes in the hidden layer?

## LB03.3 b) Model compiling
Before you can start training your model, it is necessary to compile it with the desired loss function and optimizer.

In [None]:
# TODO: Choose an appropriate loss function for this problem - https://keras.io/losses/
loss_function = ...

# TODO: Choose an optimizer - https://keras.io/optimizers/ and S. Wegenkittl: Lecture on Machine Learning slide 116
optimizer = ...

model.compile(loss=loss_function, optimizer=optimizer)

## LB03.3 c) Fitting the model
It is time to train the compiled model. Use the function `fit()` with appropriate arguments to train your model for 5000 epochs. 

In [None]:
# TODO: Fit the model for 5000 epochs
history = model.fit(...)

In [None]:
# TODO: Use the plot_loss() function you previously defined to plot the loss of the fitted model
losses = ...
plot_loss(losses)

## LB03.3 d) Evaluation
You will now use the function `predict()` with appropriate arguments to obtain the labels of the data from the model. For this simple case you will use the training data for predictions.

In [None]:
# TODO: Predict the data
model.predict(...)