<a href="https://colab.research.google.com/github/spirosChv/imbizo2022/blob/main/exercises/exercise6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simulating dendrites - Part 6: The Perceptron Algorithm

In [None]:
# import packages
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tqdm import tqdm


In [None]:
# @title Make nicer plots -- Execute this cell
def mystyle():
  """
  Create custom plotting style.

  Returns
  -------
  style : dict
      Dictionary with matplotlib parameters.

  """
  # style parameters
  style = {
      # Render text without LaTeX
      "text.usetex": False,
      "font.family": "DejaVu Sans",
      "font.weight": "bold",
      # Use 16pt font in plots, to match 16pt font in document
      "axes.labelsize": 16,
      "axes.titlesize": 20,
      "font.size": 16,
      # Make the legend/label fonts a little smaller
      "legend.fontsize": 14,
      "xtick.labelsize": 14,
      "ytick.labelsize": 14,
      "axes.linewidth": 2.5,
      "lines.markersize": 10.0,
      "lines.linewidth": 2.5,
      "xtick.major.width": 2.2,
      "ytick.major.width": 2.2,
      "axes.labelweight": "bold",
      "axes.spines.right": False,
      "axes.spines.top": False
  }

  return style


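# The "seaborn-colorblind" style was renamed to "seaborn-v0_8-colorblind"
# in matplotlib 3.6; fall back to the old name on older versions.
style_name = "seaborn-v0_8-colorblind"
if style_name not in plt.style.available:
  style_name = "seaborn-colorblind"
plt.style.use(style_name)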
plt.rcParams.update(mystyle())

Before we start building the Perceptron model, we need the required packages (imported above) and a data set. Here we generate a synthetic two-class data set with `make_blobs` from the `sklearn.datasets` module.

In [None]:
X, y = datasets.make_blobs(n_samples=1000, n_features=2,
                           centers=2, cluster_std=2.2,
                           random_state=2)
# Plotting
fig = plt.figure(figsize=(10, 8))
plt.scatter(X[:, 0][y == 0], X[:, 1][y == 0], label='group1')
plt.scatter(X[:, 0][y == 1], X[:, 1][y == 1], label='group2')
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.title('Random Classification Data with 2 classes')
plt.legend()
plt.show()

There are two classes, group1 and group2, and we want to separate them by drawing a straight line between them. More formally, we want to learn a set of parameters theta that defines a separating hyperplane (a straight line for our two-dimensional data).

Let’s code the step function.

In [None]:
def step_func(z):
  return 1.0 if (z > 0) else 0.0

The Perceptron processes one training example at a time. For every training example, we first take the dot product of the input features and the parameters theta. Then, we apply the unit step function to make the prediction (`y_hat`).

If the prediction is wrong, that is, the model has misclassified the example, we update the parameters theta. We do not update when the prediction is correct (i.e., equal to the target value y).
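
As a minimal sketch of the prediction step described above, the snippet below takes the dot product, applies a unit step function, and flags which examples would trigger an update. The values of `theta_demo`, `X_demo`, and `y_demo` are made up for illustration and are not taken from the data set above; no bias term is used here.

```python
import numpy as np

# Hypothetical parameters and toy examples (illustrative values only)
theta_demo = np.array([1.0, -1.0])
X_demo = np.array([[2.0, 1.0],
                   [0.5, 1.5],
                   [1.0, 3.0]])
y_demo = np.array([1, 1, 0])

# Dot product of features and parameters, one value per example
z_demo = X_demo @ theta_demo

# Unit step function: 1 if z > 0, else 0
y_hat_demo = np.where(z_demo > 0, 1, 0)

# An example needs a parameter update only if it is misclassified
needs_update = y_hat_demo != y_demo

print("z:", z_demo)
print("y_hat:", y_hat_demo)
print("needs update:", needs_update)
```

Only the flagged examples would change theta. Correctly classified examples leave it untouched.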

Let’s see what the update rule is.

## Perceptron Update Rule

The perceptron update rule is very similar to the gradient descent update rule. The following is the update rule:

\begin{equation}
w_j := w_j + \eta \left( y^{[i]} - \phi(x^{[i]}) \right)x^{[i]}_j
\end{equation}

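where $j \in [1, D]$ indexes the features and $i \in [1, N]$ the samples, $D$ is the number of features, $N$ is the number of samples, and $\eta$ is the learning rate. Here $\phi(x^{[i]})$ denotes the model's prediction for sample $i$.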
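
To see the update rule in action, here is a small sketch that trains on the logical AND of two binary inputs, which is linearly separable. The data, the learning rate `eta`, and the epoch limit are assumptions chosen for illustration. The bias is updated like a weight attached to a constant input of 1.

```python
import numpy as np

# Toy, linearly separable data: logical AND (illustrative, not the blobs above)
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

eta = 0.1            # learning rate (assumed value)
w_and = np.zeros(2)  # one weight per feature
b_and = 0.0          # bias term

for epoch in range(100):
  n_errors = 0
  for x_i, y_i in zip(X_and, y_and):
    y_hat = 1 if (x_i @ w_and + b_and) > 0 else 0
    error = y_i - y_hat         # 0 if correct, +1 or -1 if wrong
    w_and += eta * error * x_i  # w_j := w_j + eta * (y - y_hat) * x_j
    b_and += eta * error
    n_errors += int(error != 0)
  if n_errors == 0:             # a full pass without mistakes
    break

preds_and = np.where(X_and @ w_and + b_and > 0, 1, 0)
print("predictions:", preds_and)
print("targets:    ", y_and)
```

Because the error term is zero for correctly classified samples, the parameters change only on mistakes, as described above.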

<br>

**Note:** Even though the Perceptron algorithm may look similar to logistic regression, it is actually a very different type of algorithm, since it is difficult to endow the perceptron’s predictions with meaningful probabilistic interpretations, or derive the perceptron as a maximum likelihood estimation algorithm.

<br>

**Math behind the perceptron algorithm:**

We can distinguish the information flow into two phases:
- forward pass
- backward pass

During the forward pass, we calculate the activation of the output node ($\hat{y}$).

\begin{align}
z^{[j]} &= \sum_{i=1}^{D}w_ix_i^{[j]} + b \\
\hat{y}^{[j]} &= \phi \left( z^{[j]} \right)
\end{align}

where $i$ denotes the features, and $j$ the samples. Note that the roles of $i$ and $j$ are swapped relative to the update rule above.

Then, we calculate a loss $\mathcal{L}$ (i.e., error) between our prediction $\hat{y}$ and the real target value $y$.

Here, we will use the squared error, defined for sample $j$ as:

\begin{equation}
\mathcal{L}(y^{[j]}, \hat{y}^{[j]}; w, b) = \left( y^{[j]} - \hat{y}^{[j]} \right)^2 = \left( y^{[j]} -  \phi \left( z^{[j]} \right) \right)^2 = \left( y^{[j]} - \phi \left( \sum_{i=1}^{D}w_ix_i^{[j]} + b\right) \right)^2
\end{equation}

In mathematics, a way to find an optimum (i.e., a maximum or minimum) of a function with respect to a parameter is to calculate its derivative with respect to this parameter and look for the value at which the derivative is zero, or to approach that value iteratively by stepping against the derivative (gradient descent). In our case, we want to minimize the error between our predictions and the real target values, so we perform a minimization. Our parameters are the weights $w_i$ and the bias $b$.

Our aim is to calculate the partial derivatives with respect to the parameters. We are going to use the chain rule, which is a formula that expresses the derivative of the composition of two differentiable functions $f$ and $g$ in terms of the derivatives of $f$ and $g$.

If a variable $z$ depends on the variable $y$, which itself depends on the variable $x$ (that is, $y$ and $z$ are dependent variables), then $z$ depends on $x$ as well, via the intermediate variable $y$. In this case, the chain rule is expressed as

\begin{equation}
\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}
\end{equation}

During the backward pass, we apply the chain rule, as we want to find $\frac{\partial{\mathcal{L}}}{\partial{w_i}}$ and $\frac{\partial{\mathcal{L}}}{\partial{b}}$. Dropping the sample index $[j]$ for readability:

\begin{align}
\frac{\partial{\mathcal{L}}}{\partial{w_i}} &= \frac{\partial{\mathcal{L}}}{\partial{\hat{y}}} \frac{\partial{\hat{y}}}{\partial{w_i}} = \frac{\partial{\mathcal{L}}}{\partial{\hat{y}}} \frac{\partial{\hat{y}}}{\partial{z}}\frac{\partial{z}}{\partial{w_i}} = -2(y-\hat{y}) \phi'(z)x_i \\ \\
\frac{\partial{\mathcal{L}}}{\partial{b}} &= \frac{\partial{\mathcal{L}}}{\partial{\hat{y}}} \frac{\partial{\hat{y}}}{\partial{b}} = \frac{\partial{\mathcal{L}}}{\partial{\hat{y}}} \frac{\partial{\hat{y}}}{\partial{z}}\frac{\partial{z}}{\partial{b}} = -2(y-\hat{y}) \phi'(z)
\end{align}

The step function has zero derivative everywhere except at $z = 0$, where it is undefined, so for the threshold activation the code below uses $\phi'(z) = 1$. The weight gradient then reduces to the perceptron update rule with $\eta$ equal to twice the learning rate.
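
As a sanity check on the chain-rule derivation, the sketch below compares analytical gradients against central finite differences for a single sample with a sigmoid activation. Applying the chain rule through $z$ gives the factor $\phi'(z)$ in both the weight and the bias gradient, and that is what is checked here. All numerical values (`x_s`, `y_s`, `w_s`, `b_s`, `eps`) are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
  return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, b, x, y):
  # Squared error for a single sample
  return (y - sigmoid(x @ w + b)) ** 2

# Arbitrary illustrative values (not from the data set)
x_s = np.array([0.3, -1.2])
y_s = 1.0
w_s = np.array([0.5, 0.1])
b_s = -0.2

# Analytical gradients from the chain rule
z_s = x_s @ w_s + b_s
y_hat_s = sigmoid(z_s)
dphi = y_hat_s * (1 - y_hat_s)  # derivative of the sigmoid at z_s
grad_w = -2 * (y_s - y_hat_s) * dphi * x_s
grad_b = -2 * (y_s - y_hat_s) * dphi

# Numerical gradients via central finite differences
eps = 1e-6
num_grad_w = np.array([
    (loss_fn(w_s + eps * e, b_s, x_s, y_s)
     - loss_fn(w_s - eps * e, b_s, x_s, y_s)) / (2 * eps)
    for e in np.eye(len(w_s))
])
num_grad_b = (loss_fn(w_s, b_s + eps, x_s, y_s)
              - loss_fn(w_s, b_s - eps, x_s, y_s)) / (2 * eps)

print("weights match:", np.allclose(grad_w, num_grad_w, atol=1e-6))
print("bias matches: ", np.allclose(grad_b, num_grad_b, atol=1e-6))
```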

In [None]:
class Perceptron():

  def __init__(self, lr=0.01, epochs=200, activ='threshold'):
    self.epochs = epochs
    self.lr = lr
    self.activ = activ

  def activ_func(self, z, mode='threshold', deriv=False):
    if mode == 'threshold':
      if deriv:
        return 1.0
      else:
        return np.where(z >= 0, 1, 0)
    elif mode == 'sigmoid':
      def sigmoid(z):
        return 1 / (1 + np.exp(-z))
      if deriv:
        return sigmoid(z) * (1 - sigmoid(z))
      else:
        return sigmoid(z)
    else:
      raise ValueError(f"Unknown activation mode: {mode}")


  def fit(self, X, y):
    N, D = X.shape
    # Initialization of the weights
    w = np.zeros((D, 1))
    b = 0
    losses = []
    for i in tqdm(range(self.epochs)):
      loss = 0
      for idx, x_i in enumerate(X):
        z_i = x_i.T @ w + b
        yhat = self.activ_func(z_i, self.activ)

        loss += (y[idx] - yhat)**2

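        # Update rule: gradient descent on the squared error.
        # Both gradients carry the factor phi'(z); for 'threshold',
        # activ_func returns 1.0 as a surrogate derivative.
        dphi = self.activ_func(z_i, self.activ, deriv=True)
        delta = -2*(y[idx] - yhat) * dphi
        w -= self.lr*(delta * x_i).reshape(-1, 1)
        b -= self.lr*delta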

      losses.append(loss/X.shape[0])

    return losses, w, b

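  def predict(self, X, y, w, b):
    ypred = []
    for x_i in X:
      a_i = self.activ_func(x_i.T @ w + b, self.activ)
      # Threshold the activation at 0.5 to get a class label
      ypred.append(1 if a_i >= 0.5 else 0)

    ypred = np.array(ypred)
    # Fraction of samples whose predicted label matches the target
    accuracy = np.mean(ypred == np.asarray(y))
    return ypred, accuracy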

In [None]:
scaler = MinMaxScaler()
scaler.fit(X)
X = scaler.transform(X)

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

Let's initialize a new `Perceptron` object. Then we will call its `fit` method on the training data to learn the parameters, and evaluate the model on the test data by calculating the test accuracy.

In [None]:
# draw boundary line
def plot_decision_boundary(X, w, b):
  c = -b/w[1]
  m = -w[0]/w[1]
  x1min, x1max = [np.min(X[:, 0]), np.max(X[:, 0])]
  x2min, x2max = [np.min(X[:, 1]), np.max(X[:, 1])]
  xd = np.array([x1min, x1max])
  yd = m*xd + c
  # Plotting
  plt.figure(figsize=(10, 8))
  # Note: the class labels are read from the global variable `y`
  plt.scatter(X[y==0, 0], X[y==0, 1], label='group 1')
  plt.scatter(X[y==1, 0], X[y==1, 1], label='group 2')
  plt.xlabel("feature 1")
  plt.ylabel("feature 2")
  plt.title('Perceptron Algorithm')
  plt.plot(xd, yd, 'k-')
  # Shade the two sides of the decision boundary
  plt.fill_between(xd, yd, x2min, alpha=0.1)
  plt.fill_between(xd, yd, x2max, alpha=0.1)
  plt.legend()
  plt.show()

In [None]:
percp = Perceptron(lr=1e-2, epochs=500, activ='sigmoid')

losses, w, b = percp.fit(Xtrain, ytrain)

plt.figure(figsize=(12, 8))
plt.plot(range(len(losses)), losses)
plt.xlabel('epoch')
plt.ylabel('loss (a.u.)')
plt.show()

ypred, accuracy = percp.predict(Xtest, ytest, w, b)
print(f"Accuracy on test set: {accuracy*100:.2f}%")

In [None]:
plot_decision_boundary(X, w, b)

In [None]:
# @title Explore hyperparameters

std = 4  # @param
lr = 0.1  # @param
epochs = 200  # @param
func = "sigmoid"  # @param

X, y = datasets.make_blobs(n_samples=150, n_features=2,
                           centers=2, cluster_std=std,
                           random_state=2)


scaler = MinMaxScaler()
scaler.fit(X)
X = scaler.transform(X)

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

percp = Perceptron(lr=lr, epochs=epochs, activ=func)

losses, w, b = percp.fit(Xtrain, ytrain)

ypred, accuracy = percp.predict(Xtest, ytest, w, b)

plt.figure(figsize=(12, 8))
plt.plot(range(len(losses)), losses)
plt.xlabel('epoch')
plt.ylabel('loss (a.u.)')
plt.title(f'Accuracy: {accuracy*100:.2f}%')
plt.show()
plot_decision_boundary(X, w, b)

In [None]:
from tensorflow import keras

# Model / data parameters
num_classes = 2
input_shape = (28, 28, 1)

# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Choose two classes, 0 and 1
train_mask = np.isin(y_train, [0, 1])
test_mask = np.isin(y_test, [0, 1])

x_train, y_train = x_train[train_mask], y_train[train_mask]
x_test, y_test = x_test[test_mask], y_test[test_mask]

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

In [None]:
plt.figure(figsize=(10, 8))
idxs = np.random.choice(x_train.shape[0], 10, replace=False)
for i, idx in enumerate(idxs):
  plt.subplot(2, 5, i+1)  # 10 images fit a 2 x 5 grid
  plt.imshow(x_train[idx], 'gray')
  plt.xticks([])
  plt.yticks([])
plt.tight_layout()
plt.show()

In [None]:
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1]*x_train.shape[2])
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1]*x_test.shape[2])

In [None]:
percp = Perceptron(lr=1e-2, epochs=220, activ='sigmoid')

losses, w, b = percp.fit(x_train, y_train)

ypred, accuracy = percp.predict(x_test, y_test, w, b)

print(f"Accuracy on test set: {accuracy*100:.2f}%")


# Limitations of Perceptron Algorithm

- It is only a linear classifier, so it can never separate data that are not linearly separable.
- The algorithm as presented here handles only binary classification problems.
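
The first limitation can be illustrated with the XOR problem, a standard example of data that is not linearly separable. The sketch below uses a plain perceptron loop with assumed values for the learning rate and number of epochs. Since no straight line can classify all four XOR points correctly, the accuracy cannot exceed 0.75 however long we train.

```python
import numpy as np

# XOR: the two classes cannot be split by a single straight line
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])

w_xor = np.zeros(2)
b_xor = 0.0
eta_xor = 0.1  # learning rate (assumed value)

for epoch in range(1000):
  for x_i, y_i in zip(X_xor, y_xor):
    y_hat = 1 if (x_i @ w_xor + b_xor) > 0 else 0
    # Perceptron update: parameters change only on misclassified samples
    w_xor += eta_xor * (y_i - y_hat) * x_i
    b_xor += eta_xor * (y_i - y_hat)

preds_xor = np.where(X_xor @ w_xor + b_xor > 0, 1, 0)
accuracy_xor = np.mean(preds_xor == y_xor)
print(f"XOR accuracy after training: {accuracy_xor:.2f}")
```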