# Problem Sheet 9

## Perceptron algorithm

## Task (a):
Consider the dataset $\mathbf{D} = \{(\mathbf{x}_1,y_1), (\mathbf{x}_2,y_2)\}$ from Problem Sheet 6, where
$$
\mathbf{x}_1 = \begin{bmatrix}-1 \\ 1\end{bmatrix}, \qquad \mathbf{x}_2 = \begin{bmatrix}2 \\ -1\end{bmatrix},
$$
and $y_1=1$, $y_2=-1$.

- Calculate the iterates $\boldsymbol\theta_0,\ldots,\boldsymbol\theta_k$ of the Perceptron algorithm until convergence (that is, until $y_i \langle \boldsymbol\theta_k, \mathbf{x}_i\rangle >0$ for both $i=1$ and $i=2$) for two choices of $i_0$ in the first step:
  - when $i_0=1$ is selected;
  - when $i_0=2$ is selected.
- Which of these two scenarios produces a $\boldsymbol\theta_k$ that is collinear with the Support Vector Machine solution $\boldsymbol\theta^* = (-1/2,1/2)$?

## Solution:

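Note that $\boldsymbol\theta_0=\mathbf{0}$, so $y_i \langle \boldsymbol\theta_0, \mathbf{x}_i \rangle = 0 \le 0$ for every $i$, and in general any $i_0 \in \{1,\ldots,m\}$ can be chosen in the first step.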
- $i_0=1$:
$$
\boldsymbol\theta_1 = y_1 \mathbf{x}_1 = \begin{bmatrix}-1 \\ 1\end{bmatrix},
$$
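Now $y_1 \langle \boldsymbol\theta_1, \mathbf{x}_1 \rangle = \|\mathbf{x}_1\|_2^2 = 2 > 0$ and $y_2 \langle \boldsymbol\theta_1, \mathbf{x}_2 \rangle = - (-2 - 1) = 3 > 0$, so both samples are classified correctly and the algorithm stops.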
- $i_0=2$:
$$
\boldsymbol\theta_1 = y_2 \mathbf{x}_2 = \begin{bmatrix}-2 \\ 1\end{bmatrix},
$$
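Now $y_1 \langle \boldsymbol\theta_1, \mathbf{x}_1 \rangle = 2 + 1 = 3 > 0$ and $y_2 \langle \boldsymbol\theta_1, \mathbf{x}_2 \rangle = \|\mathbf{x}_2\|_2^2 = 5 > 0$, so the algorithm stops here as well.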

However, only $i_0=1$ gives $\boldsymbol\theta_1 = 2 \boldsymbol\theta^*$. For $i_0=2$, the vector $\boldsymbol\theta_1 = (-2,1)$ is not a multiple of $\boldsymbol\theta^*$.
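The two scenarios can be checked numerically. The following is a minimal sketch, not part of the original notebook: the helper names `first_step` and `is_collinear` are hypothetical, and indices are 0-based, so `i0 = 0` corresponds to $i_0=1$.

```python
import numpy as np

# Dataset from Task (a); the rows of X are x_1 and x_2.
X = np.array([[-1.0, 1.0], [2.0, -1.0]])
y = np.array([1.0, -1.0])
theta_star = np.array([-0.5, 0.5])  # SVM solution quoted in the task


def first_step(i0):
    """One Perceptron update from theta_0 = 0 using sample i0 (0-based)."""
    return y[i0] * X[i0]


def is_collinear(u, v):
    """Two 2-D vectors are collinear iff the determinant of [u v] is zero."""
    return bool(np.isclose(u[0] * v[1] - u[1] * v[0], 0.0))


for i0 in (0, 1):
    theta1 = first_step(i0)
    margins = y * (X @ theta1)  # y_i <theta_1, x_i> for both samples
    print(f"i_0 = {i0 + 1}: theta_1 = {theta1}, margins = {margins}, "
          f"collinear with theta*: {is_collinear(theta1, theta_star)}")
```

Both margins are positive in each scenario, so one update suffices either way, while the collinearity check holds only for the first choice.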

## Task (b) (Warm-up):

- Suggest a method to choose $i_k$ that gives the fastest estimated convergence of the Perceptron algorithm, that is, the largest practically computable lower bound of $\cos \angle (\boldsymbol\theta^*,\boldsymbol\theta_{k+1})$ in Theorem 4.38.

## Solution:

$\cos \angle (\boldsymbol\theta^*,\boldsymbol\theta_{k+1})$ is a ratio of two terms, both of which depend on $i_k$. The numerator is $\langle \boldsymbol\theta^*, \boldsymbol\theta_{k+1} \rangle = \langle \boldsymbol\theta^*, \boldsymbol\theta_{k} \rangle + y_{i_k} \langle \boldsymbol\theta^*, \mathbf{x}_{i_k} \rangle$, which we cannot control because we do not know $\boldsymbol\theta^*$. The denominator contains $\|\boldsymbol\theta_{k+1}\|_2$, where, using $y_{i_k}^2 = 1$, $\|\boldsymbol\theta_{k+1}\|_2^2 = \|\boldsymbol\theta_k\|_2^2 + 2 y_{i_k} \langle \boldsymbol\theta_k, \mathbf{x}_{i_k} \rangle + \|\mathbf{x}_{i_k}\|_2^2$. So we obtain
$$
\cos \angle (\boldsymbol\theta^*,\boldsymbol\theta_{k+1}) = \frac{\langle \boldsymbol\theta^*, \boldsymbol\theta_{k} \rangle + y_{i_k} \langle \boldsymbol\theta^*, \mathbf{x}_{i_k} \rangle}{B \|\boldsymbol\theta_{k+1}\|_2} \ge \frac{\langle \boldsymbol\theta^*, \boldsymbol\theta_{k} \rangle + 1}{B \|\boldsymbol\theta_{k+1}\|_2},
$$
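where the inequality uses the SVM constraint $y_i \langle \boldsymbol\theta^*, \mathbf{x}_i \rangle \ge 1$ for all $i$. To maximize the lower bound on the right-hand side, we therefore minimize $\|\boldsymbol\theta_{k+1}\|_2$ over the misclassified indices, that is, over all $i_k \in \{1,\ldots,m\}$ with $y_{i_k} \langle \boldsymbol\theta_k, \mathbf{x}_{i_k}\rangle \le 0$.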
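The selection rule can be sketched in vectorized form. Since $\|\boldsymbol\theta_k\|_2^2$ does not depend on $i_k$, minimizing $\|\boldsymbol\theta_{k+1}\|_2$ is the same as minimizing $2 y_{i_k} \langle \boldsymbol\theta_k, \mathbf{x}_{i_k} \rangle + \|\mathbf{x}_{i_k}\|_2^2$ over the misclassified samples. The function name `select_index` is hypothetical, the labels are assumed to be $\pm 1$, and indices are 0-based.

```python
import numpy as np


def select_index(theta, X, y):
    """Return the misclassified index minimizing ||theta + y_i x_i||_2,
    or None if every sample is classified correctly (labels assumed +-1)."""
    margins = y * (X @ theta)                  # y_i <theta, x_i>
    candidates = np.flatnonzero(margins <= 0)  # misclassified samples
    if candidates.size == 0:
        return None
    # ||theta + y_i x_i||^2 - ||theta||^2 = 2 y_i <theta, x_i> + ||x_i||^2
    growth = 2 * margins[candidates] + np.sum(X[candidates] ** 2, axis=1)
    return int(candidates[np.argmin(growth)])  # ties: smallest index wins


# Dataset from Task (a).
X = np.array([[-1.0, 1.0], [2.0, -1.0]])
y = np.array([1.0, -1.0])
print(select_index(np.zeros(2), X, y))
```

On the Task (a) dataset with $\boldsymbol\theta_0=\mathbf{0}$ this selects the first sample, because $\|\mathbf{x}_1\|_2 < \|\mathbf{x}_2\|_2$; that is the choice $i_0=1$, which produced the iterate collinear with $\boldsymbol\theta^*$.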

---

## Task 1

- Write a Python function `PerceptronM(X,y, K=100)` for the modified Perceptron algorithm designed in Task (b), with the same inputs `X`, `y` and `K` as in the original `Perceptron` function in the `Perceptron.ipynb` notebook. You can also test your function on the same example.

In [1]:
# Solution:
import numpy as np

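def PerceptronM(X, y, K=100):
    """Perceptron that, at each step, updates with the misclassified sample
    giving the smallest ||theta_{k+1}||_2 (the rule from Task (b))."""
    theta = np.zeros(X.shape[1])
    for k in range(K):
        best_norm = np.inf
        best_theta = theta
        # Scan all samples; only misclassified ones are update candidates.
        for i in range(y.size):
            if y[i] * (X[i] @ theta) <= 0:
                theta_next = theta + y[i] * X[i]
                norm_next = np.linalg.norm(theta_next)
                if norm_next < best_norm:
                    best_norm = norm_next
                    best_theta = theta_next
        # No candidate found: every sample is classified correctly.
        if best_norm == np.inf:
            break
        theta = best_theta
        print(f"iteration {k}")
    return theta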

In [2]:
# Test
# Spam term-to-document dataset, X = {x_1,...,x_m}, with terms:
#               AND OFFER THE  OF SALE
Xh = np.array([[ 1 ,  1 ,  0 , 1 , 1],   # x_1   spam
               [ 0 ,  0 ,  1 , 1 , 0],   # x_2   not spam
               [ 0 ,  1 ,  1 , 0 , 0],   # x_3   spam
               [ 1 ,  0 ,  0 , 1 , 0],   # x_4   not spam
               [ 1 ,  0 ,  1 , 0 , 1],   # x_5   spam
               [ 1 ,  0 ,  1 , 1 , 0]    # x_6   not spam
             ])
y = np.array([1,-1,1,-1,1,-1])
X = np.hstack((np.ones((Xh.shape[0],1)), Xh)) # Affine -> Homogeneous form

theta = PerceptronM(X[:4], y[:4])
print("theta: " + str(theta))
print("predicted labels: " + str(np.sign(X @ theta)))

iteration 0
iteration 1
iteration 2
iteration 3
theta: [ 0.  0.  2.  0. -1.  1.]
predicted labels: [ 1. -1.  1. -1.  1. -1.]
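
Note that the test fits on the first four documents only and then predicts on all six. A small sketch of a follow-up check is given below; it hard-codes `theta` as copied from the printed output above (an assumption of this sketch, rather than a call to `PerceptronM`) and computes the margins $y_i \langle \boldsymbol\theta, \mathbf{x}_i \rangle$ separately for the training rows and the two held-out rows.

```python
import numpy as np

# Same term-to-document matrix and labels as in the test cell.
Xh = np.array([[1, 1, 0, 1, 1],
               [0, 0, 1, 1, 0],
               [0, 1, 1, 0, 0],
               [1, 0, 0, 1, 0],
               [1, 0, 1, 0, 1],
               [1, 0, 1, 1, 0]])
y = np.array([1, -1, 1, -1, 1, -1])
X = np.hstack((np.ones((Xh.shape[0], 1)), Xh))  # homogeneous form

# Copied from the printed output; not recomputed here.
theta = np.array([0.0, 0.0, 2.0, 0.0, -1.0, 1.0])

margins = y * (X @ theta)  # positive margin <=> correct classification
print("training margins (x_1..x_4):", margins[:4])
print("held-out margins (x_5, x_6):", margins[4:])
print("all samples separated:", bool(np.all(margins > 0)))
```

A positive margin on every row confirms that this `theta` separates the four training documents and also classifies the two unseen documents correctly.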
