# Simple Perceptron Implementation
_by Mihai Dan Nadăș (mihai.nadas@ubbcluj.ro), January 2025_

This notebook implements a version of the perceptron introduced in Frank Rosenblatt's 1958 paper, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain."

We will use basic math concepts and aim to avoid linear algebra (i.e., working with vectors and matrices) as much as possible.

## Objective

The goal is to train a model with two weights, $w_{1}$ and $w_{2}$ (one for each coordinate of a point $(x_{1}, x_{2})$), and one bias $c$, adapting the slope-intercept form of a line, $y = mx + c$.

The model will address a simple classification task of a linearly separable dataset based on the following function:

$$
f: \mathbb{N} \to \mathbb{N}, \quad f(x) =
\begin{cases}
x, & \text{if } x \bmod 2 = 0, \\
2x, & \text{if } x \bmod 2 = 1.
\end{cases}
$$
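
As a quick illustration, $f$ can be written as a small Python function. The name `f` simply mirrors the definition above and is used only in this sketch.

```python
def f(x):
    """Return x for even x and 2x for odd x, following the definition above."""
    return x if x % 2 == 0 else 2 * x


# Each input x yields a point (x, f(x)): even inputs land on the line x2 = x1,
# odd inputs on the line x2 = 2 * x1.
points = [(x, f(x)) for x in range(1, 7)]
print(points)
```

Plotted as points $(x, f(x))$, the even and odd inputs fall on two different lines through the origin, which is what makes the classification task linearly separable.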

## Dataset

First, we will generate a dataset using the Python Standard Library.

In [None]:
import random


def generate_dataset(num_items=20, start=0, stop=100):
    random.seed(42)
    dataset = []
    x1_values = set()
    while len(dataset) < num_items:
        x1 = random.randint(start, stop)
        if x1 in x1_values:
            continue
        x1_values.add(x1)
        x2 = x1 if x1 % 2 == 0 else 2 * x1
        # (x1, x2) is labeled as Class 0 if x1 is even, and Class 1 otherwise
        y = 0 if x1 % 2 == 0 else 1
        dataset.append((x1, x2, y))
    return dataset


dataset = generate_dataset()

# let's now split the dataset into training and test sets
train_ratio = 0.8
num_train = int(len(dataset) * train_ratio)
dataset_train, dataset_test = dataset[:num_train], dataset[num_train:]
print(f"Training set (n={len(dataset_train)}): {dataset_train}")
print(f"Test set (n={len(dataset_test)}): {dataset_test}")

## Visual Representation

In this section, we will use _Matplotlib_ and _pandas_ to plot the training and test datasets. The plot shows how the points of each class are distributed and how clearly the two classes are separated, which will later help us judge the model's performance.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd


def plot_datasets(train_dataset, test_dataset):
    # Combine datasets into a DataFrame for easier handling
    train_df = pd.DataFrame(train_dataset, columns=["x1", "x2", "class"])
    train_df["set"] = "Train"

    test_df = pd.DataFrame(test_dataset, columns=["x1", "x2", "class"])
    test_df["set"] = "Test"

    combined_df = pd.concat([train_df, test_df], ignore_index=True)

    # Define colors and markers
    colors = {0: "blue", 1: "red"}
    markers = {"Train": "o", "Test": "x"}

    # Plot each group using Matplotlib
    fig, ax = plt.subplots()
    for (dataset, cls), group in combined_df.groupby(["set", "class"]):
        ax.scatter(
            group["x1"],
            group["x2"],
            color=colors[cls],
            label=f"{dataset} Dataset, Class {cls}",
            s=30,
            marker=markers[dataset],
        )

    # Manage legend and labels
    handles, labels = ax.get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    ax.legend(by_label.values(), by_label.keys(), title="Dataset and Class", loc="best")
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    ax.set_title("Training and Test Datasets")
    ax.grid(True)


plot_datasets(dataset_train, dataset_test)
plt.show()

## Defining a Linear Classifier

With our dataset prepared, we now turn to the mathematical foundation that enables our model to classify an input point $(x_{1}, x_{2})$ as belonging to Class $0$ or Class $1$. This can be formalized as follows:

$$
c: \mathbb{N} \times \mathbb{N} \to \{0,1\}, \quad c(x_{1}, x_{2}) =
\begin{cases}
1, & \text{if } (x_{1}, x_{2}) \in \text{Class 1}, \\
0, & \text{if } (x_{1}, x_{2}) \in \text{Class 0}.
\end{cases}
$$

This classification is achieved using the algebraic equation of a line in a Cartesian coordinate system:

$z(x_{1}, x_{2}) = w_{1}x_{1} + w_{2}x_{2} + c,$

where:
- $w_{1}$ and $w_{2}$ are the weights, which together determine the slope of the line $z = 0$ in the $(x_{1}, x_{2})$ plane,
- $c$ is the bias, which plays the role of the intercept and shifts the line away from the origin.
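
To connect this with the slope-intercept form, the boundary $z = 0$ can be solved for $x_{2}$, assuming $w_{2} \neq 0$:

$$
w_{1}x_{1} + w_{2}x_{2} + c = 0
\quad \Longrightarrow \quad
x_{2} = -\frac{w_{1}}{w_{2}}\,x_{1} - \frac{c}{w_{2}}.
$$

The slope of the boundary is therefore $-w_{1}/w_{2}$ and it crosses the $x_{2}$-axis at $-c/w_{2}$. For example, $w_{1}=-1.5$, $w_{2}=1.1$, and $c=-10$ give a slope of about $1.36$ and an intercept of about $9.09$.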

From the plotted graph above, the two classes are linearly separable, making Rosenblatt's Perceptron algorithm suitable for determining the linear separation boundary using weights $w_{1}$ and $w_{2}$.
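
A minimal sketch of why the data is linearly separable: even inputs lie on $x_{2} = x_{1}$ and odd inputs on $x_{2} = 2x_{1}$, so a line with a slope between 1 and 2 can pass between them. The line $-1.5x_{1} + x_{2} - 0.25 = 0$ and the helper `side` are hand-picked for this illustration; they are not the boundary the perceptron will learn.

```python
def side(x1, x2, w1=-1.5, w2=1.0, c=-0.25):
    """Return 1 if the point is on the non-negative side of the line, else 0."""
    return 1 if w1 * x1 + w2 * x2 + c >= 0 else 0


# Rebuild every possible point for x1 in 0..100 together with its class label
all_points = [(x1, x1 if x1 % 2 == 0 else 2 * x1, x1 % 2) for x1 in range(101)]

# Collect the points that fall on the side that does not match their label
misplaced = [(x1, x2) for x1, x2, label in all_points if side(x1, x2) != label]
print(f"Misplaced points: {len(misplaced)}")
```

For an even $x_{1}$ the expression evaluates to $-0.5x_{1} - 0.25 < 0$, and for an odd $x_{1}$ to $0.5x_{1} - 0.25 > 0$, so no point ends up on the wrong side.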

To visualize, here's how a line defined by $w_{1}=-1.5$, $w_{2}=1.1$, and $c=-10$ would look on our earlier plot.

In [None]:
def zx(x1, x2, w1, w2, c):
    return w1 * x1 + w2 * x2 + c


def plot_zx(w1, w2, c):
    # The boundary is w1*x1 + w2*x2 + c = 0, hence x2 = (-w1*x1 - c) / w2
    label = f"{w1}x1+{w2}x2+{c}=0"
    if w2 != 0:
        x1_values = range(0, 101)
        x2_values = [(-w1 * x1 - c) / w2 for x1 in x1_values]
        plt.plot(x1_values, x2_values, label=label)
    elif w1 != 0:
        # With w2 == 0 the boundary is the vertical line x1 = -c / w1
        plt.axvline(-c / w1, label=label)
    else:
        return  # w1 == w2 == 0 does not define a line
    plt.legend(loc="best")


def plot_datasets_and_zx(w1, w2, c):
    plot_datasets(dataset_train, dataset_test)
    plot_zx(w1, w2, c)


plot_datasets_and_zx(-1.5, 1.1, -10)

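In this configuration, the line runs between the two classes over most of the plotted range. It is not perfect: for an odd $x_{1}$ the point is $(x_{1}, 2x_{1})$, so $z = 0.7x_{1} - 10$, which is negative for $x_{1} \le 13$, and such points land on the same side as the even ones. Other choices of $w_{1}$, $w_{2}$, and $c$ do far worse. For instance, with $w_{1}=0.1$, $w_{2}=0.1$, and $c=0.5$, the boundary does not pass between the classes at all.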

In [None]:
plot_datasets_and_zx(0.1, 0.1, 0.5)

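In this particular case, the line $z = 0$ lies entirely below the first quadrant, so every data point has $z > 0$ and falls on the same side of it. The boundary separates nothing.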
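
A short check of this, using a few hypothetical sample points of the same form as the dataset (the points in the generated dataset may differ): because every coordinate is non-negative and $c = 0.5$ is positive, $z$ cannot be negative. The helper `z_value` is named only for this sketch.

```python
def z_value(x1, x2, w1, w2, c):
    """Evaluate z = w1*x1 + w2*x2 + c for a single point."""
    return w1 * x1 + w2 * x2 + c


# Hypothetical sample points (x1, x2, label) built like the dataset
sample = [(0, 0, 0), (4, 4, 0), (3, 6, 1), (81, 162, 1)]

z_values = [z_value(x1, x2, 0.1, 0.1, 0.5) for x1, x2, _ in sample]
print(z_values)

# Check whether all points sit on the same (positive) side of the line
all_positive = all(z > 0 for z in z_values)
print(all_positive)
```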

## Evaluating the Performance of the Classifier

Now that we have defined $z = w_{1}x_{1} + w_{2}x_{2} + c$, with the line $z = 0$ as our classifier's decision boundary, and seen visually how different parameter values behave, let's define a computational approach to evaluate performance using an accuracy metric.

### Defining the Classifier

Before evaluating our classifier's performance, let's define it as follows:

$$
c: \mathbb{N} \times \mathbb{N} \to \{0,1\}, \quad
c(x_{1}, x_{2}) =
\begin{cases} 
1, & \text{if } z(x_{1}, x_{2}) \ge 0, \\
0, & \text{if } z(x_{1}, x_{2}) < 0.
\end{cases}
$$

This means that a point $(x_{1}, x_{2})$ lying on the decision boundary or on its non-negative side ($z \ge 0$) is classified as $1$; otherwise, it is classified as $0$.

Let's implement the classifier in code and then discuss how to evaluate its performance.

In [None]:
def cx(x1, x2, w1, w2, c):
    # Predict Class 1 when z >= 0, and Class 0 otherwise
    return 1 if zx(x1, x2, w1, w2, c) >= 0 else 0


def accuracy(dataset, w1, w2, c):
    print(f"Calculating accuracy using w1={w1}, w2={w2}, c={c}")
    correct = 0
    for x1, x2, y in dataset:
        if y == cx(x1, x2, w1, w2, c):
            correct += 1
    print(
        f"Resulting accuracy: {correct}/{len(dataset)}, or {correct/len(dataset)*100:.2f}%"
    )
    return correct / len(dataset)


# Accuracy on the training set with the parameters from the first example
accuracy(dataset_train, -1.5, 1.1, -10)

# Accuracy on the training set with the parameters from the second example
accuracy(dataset_train, 0.1, 0.1, 0.5)

### Discussion on Accuracy

Earlier, we observed that different weights and bias values lead to varying accuracy levels. This variation occurs because each set of weights and bias defines a distinct decision boundary, impacting how well they classify the dataset into correct categories. The goal is to find the "optimal" values for these parameters to maximize classification performance. This optimization process is called _model training_.

## Model Training

With these insights, we can now train the model. Training makes multiple passes over the training set, called epochs. For each point, the model makes a prediction, and whenever the prediction is wrong, the weights and bias are nudged in the direction that reduces the error. The aim is to find a combination of parameters that classifies the dataset correctly.
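
A minimal sketch of a single update step, assuming the classic perceptron rule in which each parameter moves by the learning rate times the prediction error times its input (the bias uses an input of 1). The helper name `perceptron_step` is hypothetical and serves only this illustration.

```python
def perceptron_step(x1, x2, y, w1, w2, c, learning_rate=0.1):
    """Apply one perceptron update for a single labeled point."""
    z = w1 * x1 + w2 * x2 + c
    y_hat = 1 if z >= 0 else 0
    error = y - y_hat  # 0 if the prediction is correct, +1 or -1 if it is wrong
    w1 += learning_rate * error * x1
    w2 += learning_rate * error * x2
    c += learning_rate * error
    return w1, w2, c


# Starting from zero parameters, z = 0, so the point (4, 4) is predicted as Class 1.
# Its true label is 0, so the error is -1 and all three parameters decrease.
wrong_case = perceptron_step(4, 4, 0, 0, 0, 0)
print(wrong_case)

# The point (3, 6) with label 1 is already predicted correctly (z = 0 >= 0),
# so the error is 0 and the parameters stay unchanged.
correct_case = perceptron_step(3, 6, 1, 0, 0, 0)
print(correct_case)
```

Because the error is zero for correct predictions, only misclassified points move the boundary.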

In [None]:
# First, let's initialize the weights and bias to zero
w1, w2, c = 0, 0, 0

# Let's now define the learning rate
learning_rate = 0.1

# Let's now define the number of epochs
num_epochs = 100

# Collect the details of every update step; the DataFrame is built once training ends
epoch_rows = []

# Let's now start the training loop
for epoch in range(num_epochs):
    print(f"Epoch {epoch+1}")
    for x1, x2, y in dataset_train:
        z = zx(x1, x2, w1, w2, c)
        y_hat = 1 if z >= 0 else 0
        # (y - y_hat) is 0 for a correct prediction, so the parameters
        # only change on misclassified points
        w1 += learning_rate * (y - y_hat) * x1
        w2 += learning_rate * (y - y_hat) * x2
        c += learning_rate * (y - y_hat)
        print(f"  x1={x1}, x2={x2}, y={y}, z={z:.2f}, y_hat={y_hat}, w1={w1:.2f}, w2={w2:.2f}, c={c:.2f}")
        # Record the details of this step
        epoch_rows.append({
            "epoch": epoch + 1,
            "x1": x1,
            "x2": x2,
            "y": y,
            "z": z,
            "y_hat": y_hat,
            "w1": w1,
            "w2": w2,
            "c": c
        })

# Build and display the DataFrame
epoch_details = pd.DataFrame(
    epoch_rows, columns=["epoch", "x1", "x2", "y", "z", "y_hat", "w1", "w2", "c"]
)
epoch_details.head()
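
The loop above always runs for the full `num_epochs`. One possible refinement, sketched here with a hypothetical helper `train_perceptron` and a small hand-made dataset rather than the generated one, is to stop as soon as an epoch finishes without any update, since the parameters can no longer change after that.

```python
def train_perceptron(data, learning_rate=0.1, max_epochs=1000):
    """Train with the perceptron rule, stopping after the first epoch with no mistakes."""
    w1, w2, c = 0.0, 0.0, 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x1, x2, y in data:
            y_hat = 1 if w1 * x1 + w2 * x2 + c >= 0 else 0
            if y_hat != y:
                mistakes += 1
                w1 += learning_rate * (y - y_hat) * x1
                w2 += learning_rate * (y - y_hat) * x2
                c += learning_rate * (y - y_hat)
        if mistakes == 0:
            break  # a full pass without updates: nothing left to learn
    return w1, w2, c, epoch


# Hand-made points of the same form as the dataset: (x1, x2, label)
toy_data = [(2, 2, 0), (3, 6, 1), (4, 4, 0), (5, 10, 1)]
w1, w2, c, epochs_run = train_perceptron(toy_data)

# Count how many toy points the trained parameters classify correctly
correct = sum(
    1 for x1, x2, y in toy_data if (1 if w1 * x1 + w2 * x2 + c >= 0 else 0) == y
)
print(f"Stopped after {epochs_run} epochs; accuracy {correct}/{len(toy_data)}")
```

Early stopping of this kind only triggers when the training data is linearly separable; otherwise the loop runs until `max_epochs`.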