# Simple Perceptron Implementation

_by Mihai Dan Nadăș (mihai.nadas@ubbcluj.ro), January 2025_

This notebook implements a version of the perceptron inspired by Frank Rosenblatt's 1958 paper, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain".

We aim to introduce basic math concepts and avoid the complexities of linear algebra as much as possible.

## Objective

The goal of this notebook is to train a model with two weights, $w_{1}$ and $w_{2}$, one for each coordinate of a point $(x_{1},\ x_{2})$, and a bias $c$. This setup is analogous to the standard line equation $y = mx + c$, and we apply it to a simple classification task.

We will work with a linearly separable dataset generated by this rule:

$$
f: \mathbb{N} \to \mathbb{N}, \quad f(x) =
\begin{cases}
x & \text{if } x \bmod 2 = 0, \\
2x & \text{if } x \bmod 2 = 1.
\end{cases}
$$

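In simpler terms, for even numbers, the function outputs the same number. For odd numbers, it doubles them. Each data point is the pair $(x, f(x))$, labeled class 0 when $x$ is even and class 1 when $x$ is odd.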
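
As a quick illustration, the rule can be sketched in a few lines of Python. The function name `f` simply mirrors the formula above; it is an illustrative helper, not something the rest of the notebook depends on.

```python
def f(x):
    """Return x for even x, and 2 * x for odd x."""
    return x if x % 2 == 0 else 2 * x


# A few sample inputs and their outputs
for x in [2, 3, 10, 7]:
    print(x, f(x))
```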

## Dataset

We will begin by generating a synthetic dataset using the Python Standard Library. This dataset will help us train and test our perceptron model.

In [None]:
import random


def generate_dataset(num_items=20, start=0, stop=100):
    random.seed(42)
    dataset = []
    x1_values = set()
    while len(dataset) < num_items:
        x1 = random.randint(start, stop)
        if x1 in x1_values:
            continue
        x1_values.add(x1)
        x2 = x1 if x1 % 2 == 0 else 2 * x1
        # (x1, x2) is labeled Class 0 if x1 is even, and Class 1 otherwise
        y = 0 if x1 == x2 else 1
        dataset.append((x1, x2, y))
    return dataset


dataset = generate_dataset()

# let's now split the dataset into training and test sets
train_ratio = 0.8
num_train = int(len(dataset) * train_ratio)
dataset_train, dataset_test = dataset[:num_train], dataset[num_train:]
print(f"Training set (n={len(dataset_train)}): {dataset_train}")
print(f"Test set (n={len(dataset_test)}): {dataset_test}")

## Visual Representation

Let's create a visual representation of the training and test datasets using _Matplotlib_ and _pandas_. Visualizations help us better understand how our data is distributed and identify any apparent patterns. We will use different colors and markers to differentiate between the classes and datasets. 

This visualization will provide an intuitive understanding of how our datasets look and offer insight into any linear separation between the classes.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd


def plot_datasets(train_dataset, test_dataset):
    # Combine datasets into a DataFrame for easier handling
    train_df = pd.DataFrame(train_dataset, columns=["x1", "x2", "class"])
    train_df["set"] = "Train"

    test_df = pd.DataFrame(test_dataset, columns=["x1", "x2", "class"])
    test_df["set"] = "Test"

    combined_df = pd.concat([train_df, test_df], ignore_index=True)

    # Define colors and markers
    colors = {0: "blue", 1: "red"}
    markers = {"Train": "o", "Test": "x"}

    # Plot each group using Matplotlib
    fig, ax = plt.subplots()
    for (dataset, cls), group in combined_df.groupby(["set", "class"]):
        ax.scatter(
            group["x1"],
            group["x2"],
            color=colors[cls],
            label=f"{dataset} Dataset, Class {cls}",
            s=30,
            marker=markers[dataset],
        )

    # Manage legend and labels
    handles, labels = ax.get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    ax.legend(by_label.values(), by_label.keys(), title="Dataset and Class", loc="best")
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    ax.set_title("Training and Test Datasets")
    ax.grid(True)


plot_datasets(dataset_train, dataset_test)
plt.show()

## Understanding Linear Classification

Now that we have our dataset ready, let's explore how our model can decide if a point $(x_{1}, x_{2})$ belongs to class 0 or class 1. Picture this as a decision-making process, much like how you might decide to wear a raincoat based on the weather forecast.

Given a point $(x_1, x_2)$, our task is to determine its class using a simple rule:

**Classification Criteria:**

If the point falls in a specific region of the graph (for example, on one side of a line), it belongs to one class; otherwise, it belongs to the other.

### Mathematical Representation

The decision is based on the value of the following expression; the decision boundary is the line where $z = 0$:

$$
z(x_{1}, x_{2}) = w_{1}x_{1} + w_{2}x_{2} + c
$$

- **Weights ($w_{1}$ and $w_{2}$):** Imagine adjusting these like tuning a radio. They change the line's tilt or angle on the graph.
- **Bias ($c$):** Think of this as the shifting knob. It moves the line up or down.

### Visualizing Our Method

When visualizing, the line slices the graph into two sections. Points on the side where $z \geq 0$ are assigned to class 1; points on the other side are assigned to class 0.

**Example:** Look at how this line behaves when $w_{1}=1, w_{2}=0.5,$ and $c=0$. Can you imagine how it sits on our graph to separate the points?
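
To make the example concrete, here is a small sketch that evaluates $z$ for two hand-picked points, one from each class, using those parameters. The helper name `z_value` and the sample points are assumptions chosen for illustration.

```python
def z_value(x1, x2, w1, w2, c):
    """Evaluate z = w1*x1 + w2*x2 + c for a single point."""
    return w1 * x1 + w2 * x2 + c


w1, w2, c = 1, 0.5, 0

# (2, 2) follows the rule for an even x1 (class 0); (3, 6) for an odd x1 (class 1)
for x1, x2, label in [(2, 2, 0), (3, 6, 1)]:
    print(f"point=({x1}, {x2}), class={label}, z={z_value(x1, x2, w1, w2, c)}")
```

Both values of $z$ are positive, so both points lie on the same side of this line. This particular line therefore does not separate them.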

### Simplifying Our Goal

Ultimately, we are drawing a line that separates the differently colored points on our plot, a bit like sorting apples from oranges by their color.

Understanding this setup lays the groundwork for grasping more complex AI models. For now, think of it as the art of drawing a line that helps our computer "see" which category each point belongs to.

In [None]:
zx = lambda x1, x2, w1, w2, c: w1 * x1 + w2 * x2 + c


def plot_zx(w1, w2, c):
    # The boundary is the line w1*x1 + w2*x2 + c = 0
    label = f"{w1}x1+{w2}x2+{c}=0"
    if w2 != 0:
        # Solve for x2: x2 = (-w1*x1 - c) / w2
        x1_values = range(0, 101)
        x2_values = [(-w1 * x1 - c) / w2 for x1 in x1_values]
        plt.plot(x1_values, x2_values, label=label)
    elif w1 != 0:
        # With w2 = 0 the boundary is the vertical line x1 = -c / w1
        plt.axvline(-c / w1, label=label)
    else:
        return  # w1 = w2 = 0 does not define a line
    plt.legend(loc="best")


def plot_datasets_and_zx(w1, w2, c):
    plot_datasets(dataset_train, dataset_test)
    plot_zx(w1, w2, c)


plot_datasets_and_zx(-1.5, 1.1, -10)

In this configuration, the line separates most of the data points. Every class 0 point $(x_{1}, x_{1})$ gives $z = -0.4x_{1} - 10 < 0$, while a class 1 point $(x_{1}, 2x_{1})$ gives $z = 0.7x_{1} - 10$, which is non-negative only when $x_{1} \geq 15$, so class 1 points with smaller $x_{1}$ still fall on the class 0 side. Other choices of $w_{1},\ w_{2},$ and $c$ separate the data far less effectively. For instance, with $w_{1}=0.1,\ w_{2}=0.1,$ and $c=0.5$, the decision boundary lies entirely outside the region where the data points are.

In [None]:
plot_datasets_and_zx(0.1, 0.1, 0.5)

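In this particular case, $z$ is positive for every point with non-negative coordinates, so every data point is assigned to class 1 and the line does not help tell the two classes apart.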

## Evaluating the Performance of the Classifier

Now that we have defined $z = w_{1}x_{1} + w_{2}x_{2} + c$ as our classifier's decision boundary, and visually confirmed its effectiveness for some chosen parameter values, let's develop a computational approach to measure its performance using the concept of _accuracy_.

### Defining the Classifier

Before assessing our classifier's performance, we need to define it mathematically. Note that the function $c(x_{1}, x_{2})$ below is the classifier and should not be confused with the bias $c$:

$$
c: \mathbb{N}^{2} \to \{0,1\}, \quad
c(x_{1},x_{2}) =
\begin{cases} 
1, & \text{if } z(x_{1},x_{2}) \geq 0, \\
0, & \text{if } z(x_{1},x_{2}) < 0.
\end{cases}
$$

This classification rule implies that if the calculated value $z(x_{1}, x_{2})$ is greater than or equal to zero, the input is classified as $1$; otherwise, it is classified as $0$.
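
As a worked example, take the first parameter set used earlier ($w_{1}=-1.5,\ w_{2}=1.1,\ c=-10$) and two points chosen here for illustration: $(20, 20)$, which the dataset rule places in class 0, and $(21, 42)$, which it places in class 1.

$$
\begin{aligned}
z(20, 20) &= -1.5 \cdot 20 + 1.1 \cdot 20 - 10 = -18 < 0 &&\Rightarrow \text{class } 0, \\
z(21, 42) &= -1.5 \cdot 21 + 1.1 \cdot 42 - 10 = 4.7 \geq 0 &&\Rightarrow \text{class } 1.
\end{aligned}
$$

Both points are classified in agreement with their labels.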

Let's move on to implement the classifier in code and subsequently discuss how to evaluate its performance.

In [None]:
cx = lambda x1, x2, w1, w2, c: 1 if zx(x1, x2, w1, w2, c) >= 0 else 0


def accuracy(dataset, w1, w2, c):
    print(f"Calculating accuracy using w1={w1}, w2={w2}, c={c}")
    correct = 0
    for x1, x2, y in dataset:
        if y == cx(x1, x2, w1, w2, c):
            correct += 1
    print(
        f"Resulting accuracy: {correct}/{len(dataset)}, or {correct/len(dataset)*100:.2f}%"
    )
    return correct / len(dataset)


# Accuracy on the training set with the weights and bias from the first example above
accuracy(dataset_train, -1.5, 1.1, -10)

# Accuracy on the training set with the weights and bias from the second example above
accuracy(dataset_train, 0.1, 0.1, 0.5)

### Understanding Accuracy

We've seen that changing weights and bias can lead to varying accuracy results. This happens because different weights and bias define distinct decision boundaries, which may classify the dataset more or less effectively. The key challenge is to identify the "optimal" set of parameters that best separates the data into its classes. This is achieved through a process called _model training_.
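
To see the effect on a scale small enough to check by hand, the following sketch computes the accuracy of both parameter sets on four hand-picked points that follow the dataset rule. The names `predict` and `accuracy_of`, and the sample points themselves, are assumptions made for this illustration.

```python
def predict(x1, x2, w1, w2, c):
    """Classify a point as 1 if z >= 0, else 0."""
    z = w1 * x1 + w2 * x2 + c
    return 1 if z >= 0 else 0


def accuracy_of(points, w1, w2, c):
    """Fraction of (x1, x2, label) points classified correctly."""
    correct = sum(1 for x1, x2, y in points if predict(x1, x2, w1, w2, c) == y)
    return correct / len(points)


# Hand-picked points: even x1 -> (x1, x1, 0), odd x1 -> (x1, 2 * x1, 1)
sample = [(2, 2, 0), (4, 4, 0), (21, 42, 1), (35, 70, 1)]

acc_first = accuracy_of(sample, -1.5, 1.1, -10)
acc_second = accuracy_of(sample, 0.1, 0.1, 0.5)
print(acc_first, acc_second)
```

On this sample, the first configuration classifies every point correctly, while the second assigns every point to class 1 and so gets only the two class 1 points right.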

## Model Training

Now, we will apply what we've learned to train our perceptron model. This involves iteratively adjusting the weights and bias to improve the accuracy. Through this process, we aim to discover the combination of parameters that most effectively categorizes our dataset.
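
The adjustment applied for each training point in the loop below can be written as follows, where $\eta$ denotes the learning rate (a symbol introduced here for brevity), $y$ is the true label, and $\hat{y}$ is the predicted label:

$$
\begin{aligned}
w_{1} &\leftarrow w_{1} + \eta\,(y - \hat{y})\,x_{1}, \\
w_{2} &\leftarrow w_{2} + \eta\,(y - \hat{y})\,x_{2}, \\
c &\leftarrow c + \eta\,(y - \hat{y}).
\end{aligned}
$$

When the prediction is correct, $y - \hat{y} = 0$ and nothing changes. As a worked step, assume all parameters start at $0$, $\eta = 0.1$, and the point is $(14, 14)$ with label $y = 0$ (a point chosen for illustration). Then $z = 0$, so $\hat{y} = 1$ and $y - \hat{y} = -1$, which gives:

$$
w_{1} = 0 + 0.1 \cdot (-1) \cdot 14 = -1.4, \quad
w_{2} = 0 + 0.1 \cdot (-1) \cdot 14 = -1.4, \quad
c = 0 + 0.1 \cdot (-1) = -0.1.
$$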

In [None]:
# First, let's initialize the weights and bias to zero
w1, w2, c = 0, 0, 0

# Let's now define the learning rate
learning_rate = 0.1

# Let's now define the number of epochs
num_epochs = 100

# Collect the details of every update step; a DataFrame is built from them after training
epoch_rows = []

# Let's now start the training loop
for epoch in range(num_epochs):
    print(f"Epoch {epoch+1}")
    for x1, x2, y in dataset_train:
        z = zx(x1, x2, w1, w2, c)
        y_hat = 1 if z >= 0 else 0
        # Perceptron update: the parameters change only when the prediction is wrong
        w1 += learning_rate * (y - y_hat) * x1
        w2 += learning_rate * (y - y_hat) * x2
        c += learning_rate * (y - y_hat)
        print(f"  x1={x1}, x2={x2}, y={y}, z={z:.2f}, y_hat={y_hat}, w1={w1:.2f}, w2={w2:.2f}, c={c:.2f}")
        # Record the details of this step
        epoch_rows.append({
            "epoch": epoch + 1,
            "x1": x1,
            "x2": x2,
            "y": y,
            "z": z,
            "y_hat": y_hat,
            "w1": w1,
            "w2": w2,
            "c": c
        })

# Build the DataFrame and display its first rows
epoch_details = pd.DataFrame(
    epoch_rows, columns=["epoch", "x1", "x2", "y", "z", "y_hat", "w1", "w2", "c"]
)
epoch_details.head()