# Exercise 2: Perceptron Learning and Maximum Margin Classification

## Exercise 2.1: Perceptron Learning
Given $L$ training `samples` $\vec{x}_i \in \mathbb{R}^{N}$ and their class `labels` $s_i\in \lbrace 1 , -1 \rbrace$, we want to train a single artificial neuron, i.e. make it automatically learn its `weights` $\vec{w} \in \mathbb{R}^{N}$ and its `threshold` $\theta \in \mathbb{R}$, such that

$$
\begin{equation}
    \sigma \left( \vec{x}_{i} \vec{w}- \theta \right) = s_{i}, \; \forall i=1,\dots,L
\end{equation}
$$

holds. The activation function $\sigma$ is a step function (a sign function with $\sigma(0) = 1$), defined as

$$
\begin{equation}
    \sigma(x) = 
        \left\{ 
            \begin{array}{rl}
                1, & \text{if } x \geq 0 \\
                -1, & \text{else}
            \end{array} 
        \right.
\end{equation}
$$
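
As a minimal sketch, the function $\sigma$ defined above could be written with NumPy as follows. The name `sign_activation` is a hypothetical helper, not part of the provided `utils` module. `np.where` is used instead of `np.sign` because `np.sign(0)` returns `0`, whereas the definition above requires $\sigma(0) = 1$.

```python
import numpy as np

def sign_activation(x):
    """Return +1 where x >= 0 and -1 elsewhere (hypothetical helper name)."""
    return np.where(np.asarray(x) >= 0, 1, -1)

# Works element-wise on scalars and arrays alike.
outputs = sign_activation([-2.0, 0.0, 3.5])
```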

The `weights` $\vec{w}$ represent a normal vector of a linear hyperplane, and the `threshold` $\theta$ represents its distance to the origin, scaled by $\lVert \vec{w} \rVert$.

To simplify learning, we apply the _threshold trick_, i.e. we extend the `weights` by an additional component in the first dimension that represents the `threshold` $\theta$. The `samples` are likewise extended by a component with constant value $-1$ in the first dimension (see `help(np.column_stack)`). In this way, for an extended `sample` $\vec{x} \in \mathbb{R}^{N+1}$ the output of the neuron can be written as

$$
\begin{equation}
  y = \sigma \left( \vec{x} \vec{w} \right).
\end{equation}
$$
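
The following sketch illustrates the threshold trick on a small, made-up data set (the values of `samples`, `weights` and `theta` are assumptions chosen only for illustration). It checks that the extended formulation produces the same activation as $\vec{x} \vec{w} - \theta$.

```python
import numpy as np

# Toy data (hypothetical): 3 samples with N = 2 features each.
samples = np.array([[0.5, 1.0],
                    [2.0, -1.0],
                    [-1.5, 0.3]])
weights = np.array([1.0, -2.0])   # original weights w
theta = 0.25                      # original threshold

# Extend the samples by a constant -1 in the first column
# and the weights by theta in the first component.
samples_ext = np.column_stack((-np.ones(len(samples)), samples))
weights_ext = np.concatenate(([theta], weights))

# Both formulations yield the same activation for every sample.
activation_plain = samples @ weights - theta
activation_ext = samples_ext @ weights_ext
```

Since the $-1$ column multiplies the first weight component, the threshold is subtracted automatically by the dot product.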

During each learning `epoch` the given $L$ training `samples` are presented to the artificial neuron in random order. We use the perceptron learning rule to adapt the extended `weights` $\vec{w}_{t}$ to $\vec{w}_{t+1}$

$$
\begin{equation}
    \vec{w}_{t+1} = \vec{w}_{t} + \varepsilon (s_{i} - y_{i}) \vec{x}_{i}
\end{equation}
$$

where

$$
\begin{equation}
    y_{i} = \sigma \left( \vec{x}_{i} \vec{w}_{t} \right).
\end{equation}
$$

Here, $\varepsilon \in \mathbb{R}^{+}$ denotes the `learning_rate` and $\vec{x}_{i}$ a randomly selected training sample.
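
To make the rule concrete, here is a sketch of a single update on one extended sample. The function name `perceptron_update` and the numeric values are assumptions for illustration only. Note that $s_i - y_i$ is $0$ for a correctly classified sample and $\pm 2$ otherwise, so the weights only change on misclassification.

```python
import numpy as np

def perceptron_update(weights, sample, label, learning_rate):
    """Apply the perceptron learning rule once (hypothetical helper)."""
    y = 1 if sample @ weights >= 0 else -1
    return weights + learning_rate * (label - y) * sample

w = np.array([0.0, 1.0, 0.0])    # extended weights, threshold first
x = np.array([-1.0, 2.0, 1.0])   # extended sample with leading -1

# x @ w = 2 >= 0, so the neuron outputs y = +1.
w_same = perceptron_update(w, x, label=1, learning_rate=0.01)   # correct: unchanged
w_new = perceptron_update(w, x, label=-1, learning_rate=0.01)   # wrong: w - 0.02 * x
```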

Implement the learning rule in Python. Furthermore, there are two data files `data_2_1.npz` and `data_2_2.npz` on the website that contain the training sets. Apply your perceptron implementation several times to the example training sets (start with a `learning_rate` $\varepsilon=0.01$). Visualize the classification plane during the learning process.

__Hints__:
- In each step of a learning `epoch`, a randomly selected training sample $\vec{x}_i$ (see `help(np.random.permutation)`) with class label $s_{i}$ is classified. Then, the `weights` are modified according to the learning rule.
- A single learning epoch might not be sufficient for obtaining a correct classification of all training samples. In this case, further learning epochs should be performed until a correct classification is obtained.
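
The two hints can be sketched as follows: `np.random.permutation(n)` returns the indexes `0, ..., n-1` in random order, and a small check over all samples can serve as a stopping criterion. The helper name `all_correct` and the toy values are assumptions and are not part of the provided template.

```python
import numpy as np

def all_correct(samples_ext, labels, weights_ext):
    """True if every extended sample is classified as its label (hypothetical helper)."""
    predictions = np.where(samples_ext @ weights_ext >= 0, 1, -1)
    return bool(np.all(predictions == labels))

# A random presentation order for 4 training samples.
order = np.random.permutation(4)

# Toy check: two extended samples (leading -1) with labels +1 and -1.
toy_samples = np.array([[-1.0, 2.0],
                        [-1.0, -2.0]])
toy_labels = np.array([1, -1])
separated = all_correct(toy_samples, toy_labels, np.array([0.0, 1.0]))
not_separated = all_correct(toy_samples, toy_labels, np.array([0.0, -1.0]))
```

If the training set is not linearly separable, such a check never succeeds, so an upper bound on the number of epochs is still needed.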

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from utils import utils_2 as utils
%matplotlib inline

In [None]:
def learn_perceptron(samples, labels, learning_rate, epochs):

    # TODO n_samples: number of training samples / n_features: number of features
    n_samples, n_features =

    # TODO: initialize extended weight vector (theta included) randomly
    weights =

    # TODO: extend features by '-1' column (threshold trick)
    samples =

    for epoch in range(epochs):
    
        # TODO: generate randomly permuted index array
        indexes =
        
        # iterate through all indexes in the index array
        for index in indexes:

            # TODO: select training sample and corresponding class label according to generated random permutation
            sample =
            label =
        
            # TODO: classify selected training sample with current weights
            classification =
        
            # TODO: adapt weight vector, i.e. apply perceptron learning rule
            weights =
            
            # yield weight vector and threshold
            yield (weights[1:], weights[0])

samples, labels = utils.load_data('data/data_2_1.npz')
animation = utils.Animation(samples, labels)
weights = list(learn_perceptron(samples, labels, 
                                learning_rate=0.01, 
                                epochs=2))
animation.play(weights)

## Exercise 2.2: The DoubleMinOver Learning Rule
From the lecture, you know that the DoubleMinOver (DMO) learning rule can be used for maximum margin classification. The DMO algorithm is summarized below. Note in particular that the `weights` satisfy $\vec{w} \in \mathbb{R}^{N}$ (i.e. the _threshold trick_ is not applied) and that an explicit `threshold` $\theta$ is used, which is computed after learning has been completed.



for $t=1$ to $t_{\max}$
> $\vec{x}^{\min +} = \underset{\vec{x}_i \in X^{+}}{\operatorname{argmin}}
            s_{i} \vec{x}_{i} \vec{w}
            \left( X^{+} = \left\{ \vec{x}_{i} \mid s_{i} = 1 \right\} \right)$  
> $\vec{x}^{\min -} = \underset{\vec{x}_i \in X^{-}}{\operatorname{argmin}}
            s_{i} \vec{x}_{i} \vec{w}
            \left( X^{-} = \left\{ \vec{x}_{i} \mid s_{i} = -1 \right\} \right)$  
> $\vec{w} = \vec{w} + \vec{x}^{\min +} - \vec{x}^{\min -}$

$\theta = \frac{\vec{w}^\mathrm{T} \left(\vec{x}^{\min +} + \vec{x}^{\min -}\right)}{2}$
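
One iteration of the loop above might look like the following sketch. The function name `dmo_step` and the toy data are assumptions. The threshold is computed from the two samples selected in the same iteration, which is one possible reading of the summary above. For the negative class, minimizing $s_i \vec{x}_i \vec{w}$ with $s_i = -1$ means minimizing $-\vec{x}_i \vec{w}$.

```python
import numpy as np

def dmo_step(samples, labels, weights):
    """One DMO iteration (hypothetical helper): returns new weights and threshold."""
    samples_pos = samples[labels == 1]
    samples_neg = samples[labels == -1]
    # Sample with the smallest s_i * (x_i . w) in each class.
    x_min_pos = samples_pos[np.argmin(samples_pos @ weights)]
    x_min_neg = samples_neg[np.argmin(-(samples_neg @ weights))]
    weights = weights + x_min_pos - x_min_neg
    threshold = weights @ (x_min_pos + x_min_neg) / 2
    return weights, threshold

# Toy data (hypothetical): two samples per class, symmetric about the origin.
toy_samples = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
toy_labels = np.array([1, 1, -1, -1])
w1, theta1 = dmo_step(toy_samples, toy_labels, np.zeros(2))
```

Starting from zero weights, the first sample of each class is selected, which gives `w1 = [4, 0]` and `theta1 = 0` for this toy set.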

- Implement the DoubleMinOver algorithm in Python.
- Test your implementation on the two training data sets `data_2_1.npz` and `data_2_2.npz`.
- Compare your DMO learning results with your perceptron learning results. Run both learning algorithms several times. What differences do you observe in the behaviour of the two algorithms?
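
For the comparison, one quantity worth looking at is the margin, i.e. the smallest signed distance of any training sample to the separating hyperplane. The sketch below assumes non-extended `weights` and an explicit `threshold`, as yielded by both templates. The helper name `margin` and the toy data are assumptions.

```python
import numpy as np

def margin(samples, labels, weights, threshold):
    """Smallest signed distance of a sample to the hyperplane x . w = threshold."""
    distances = labels * (samples @ weights - threshold) / np.linalg.norm(weights)
    return distances.min()

# Toy data (hypothetical): two samples per class.
toy_samples = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
toy_labels = np.array([1, 1, -1, -1])

margin_a = margin(toy_samples, toy_labels, np.array([4.0, 0.0]), 0.0)
margin_b = margin(toy_samples, toy_labels, np.array([1.0, 1.0]), 0.0)
```

Both weight vectors separate the toy set, but the first one leaves a larger margin. A negative value would indicate that at least one sample is misclassified.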

In [None]:
# TODO: implement the double-min-over learning rule
def learn_dmo(samples, labels, epochs):

    # TODO n_features: dimension of the feature vector
    n_features =

    # initialize weights randomly (threshold not included)
    weights = np.random.rand(n_features)

    for epoch in range(epochs):
        
        # TODO: extract training samples of class +1
        samples_pos =
    
        # TODO: get x_min_pos
        x_min_pos =

        # TODO: extract training samples of class -1
        samples_neg =

        # TODO: get x_min_neg
        x_min_neg =

        # TODO: adapt weight vector, i.e. apply DMO learning rule
        weights =

        # TODO: calculate threshold
        threshold =
        
        # yield weight vector and threshold
        yield (weights, threshold)

samples, labels = utils.load_data('data/data_2_2.npz')
animation = utils.Animation(samples, labels)
weights = list(learn_dmo(samples, labels, 
                         epochs=50))
animation.play(weights)