For P2, we've been instructed to add the perceptron learning rule to the Perceptron that was implemented earlier. The Python source code now contains this functionality. All that remains is, once again, confirming its functionality through some examples/tests.

This time, these are the requirements:

<ol>
<li>Having a perceptron teach itself to be an AND-gate.</li>
<li>Having a perceptron teach itself to be an XOR-gate.</li>
<li>Having a perceptron teach itself to classify the <i>Iris-dataset</i></li>
</ol>

From now on, I won't be using an "automated test framework", following the valid criticism of the teaching assistant: such a framework arguably needs to be tested on its own before its results can be trusted.

I will borrow a modification of last assignment's "binary_input_space" function. I'll also import some packages and the Python source code, then I'll work through the required tests one by one.

We'll also set the random seed used for initializing a Perceptron with random weights/bias values, so that the results of running this notebook are reproducible. Even though it's only required for the third assignment, I'll set the random seed to my student number.

In [617]:
from typing import List, Tuple

import itertools
import random

random.seed(1792206)

import ml
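To illustrate why seeding makes the notebook reproducible, here is a minimal sketch. The `random_parameters` helper and its uniform range of [-1, 1] are assumptions made for illustration; how `ml.Perceptron.random_instance` actually draws its values isn't shown in this notebook.

```python
import random


def random_parameters(weights_amount: int):
    """Draw a bias and some weights uniformly from [-1, 1] (hypothetical initialization)."""
    bias = random.uniform(-1, 1)
    weights = [random.uniform(-1, 1) for _ in range(weights_amount)]
    return bias, weights


random.seed(1792206)
first_run = random_parameters(2)

random.seed(1792206)  # re-seeding restarts the same pseudo-random sequence
second_run = random_parameters(2)

print(first_run == second_run)  # True
```

Note that running this sketch inside the notebook would reset the global generator, which changes which values later cells draw.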

In [618]:
def binary_input_space(length: int) -> List[Tuple[int, ...]]:
    """Compute all possible binary input combinations with a certain length.

    Args:
        length: length of the combinations.

    Returns:
        All possible binary input combinations of a certain length."""
    return list(itertools.product([0, 1], repeat=length))

<h2>AND-gate</h2>

First, we'll initialize a perceptron with random values for its (2) weights and bias, and a learning rate of 0.1 (assignment specifications).

In [619]:
p_and = ml.Perceptron.random_instance(weights_amount=2, learning_rate=0.1)

We'll construct the binary input space, and add the expected output for an AND-gate to it.

In [620]:
and_input_space = binary_input_space(2)
and_input_space

[(0, 0), (0, 1), (1, 0), (1, 1)]

In [621]:
and_targets = [a and b for a, b in and_input_space]
and_targets

[0, 0, 0, 1]

Then we'll have the perceptron learn using the entire dataset until it has converged. We use the mean squared error (MSE) as the loss metric, which is a floating-point number. Because floating-point arithmetic can be imprecise, we'll set the threshold below which we assume convergence to a very small but non-zero number. Because the training dataset consists of only 4 records, a single wrong output already yields an MSE of 0.25, so the MSE cannot drop below this threshold unless the perceptron's learning has converged.
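The threshold argument can be checked by enumerating every output pattern a perceptron could produce on 4 records. This sketch assumes the perceptron's outputs are strictly 0 or 1 (a step activation); the `mse` helper is written here for illustration and is not taken from the `ml` module.

```python
import itertools


def mse(outputs, targets):
    """Mean squared error between two equally long sequences."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


and_targets = [0, 0, 0, 1]

# Every combination of four binary outputs, reduced to the distinct losses.
possible_losses = sorted({mse(outputs, and_targets)
                          for outputs in itertools.product([0, 1], repeat=4)})

print(possible_losses)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The smallest non-zero loss is 0.25, far above the threshold, so any loss below the threshold must be exactly 0.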

In [622]:
p_and.learn_until_loss(inputs=and_input_space,
                       targets=and_targets,
                       loss_target=0.00000000001)

0.0

We indeed reach a loss of 0. This leaves the parameters of the perceptron at:

In [623]:
p_and

Perceptron: b: -0.7076041591715243. w: [0.5727410173077363, 0.19976015294456012] )
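For readers without the `ml` source at hand, the training above can be approximated with a standalone sketch of the perceptron learning rule. Everything here is an assumption for illustration: the function names, the step activation that outputs 1 when the weighted sum is at least 0, and the zero initialization (the notebook uses random values instead), so the resulting parameters differ from the ones shown above.

```python
def predict(weights, bias, x):
    """Step activation: 1 when the weighted sum plus bias is >= 0 (assumed convention)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


def mse(weights, bias, inputs, targets):
    """Mean squared error of the perceptron over a dataset."""
    errors = [(t - predict(weights, bias, x)) ** 2 for x, t in zip(inputs, targets)]
    return sum(errors) / len(errors)


def train_until_loss(inputs, targets, learning_rate=0.1,
                     loss_target=1e-11, max_epochs=1000):
    """Apply the perceptron learning rule until the loss target is reached."""
    weights = [0.0] * len(inputs[0])  # zero init, unlike the notebook's random init
    bias = 0.0
    for _ in range(max_epochs):
        for x, t in zip(inputs, targets):
            error = t - predict(weights, bias, x)  # -1, 0 or 1
            # Learning rule: shift each weight by rate * error * input.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
        if mse(weights, bias, inputs, targets) <= loss_target:
            break
    return weights, bias


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_until_loss(inputs, and_targets)
print([predict(weights, bias, x) for x in inputs])  # [0, 0, 0, 1]
```

The `max_epochs` cap is a safety net added in this sketch so that the loop also terminates on data that cannot be separated.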

<h2>XOR-gate</h2>

Next, we'll attempt to have a perceptron teach itself to function as an XOR-gate. Perceptrons can only correctly solve problems whose (binary) target variable can be separated with a single linear separation. The outputs of XOR cannot be separated this way, so we know that even an ideally configured perceptron could never actually mimic the functionality of an XOR-gate.
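The difference between AND and XOR can be made tangible with a small brute-force search. This is only an illustration under assumptions (a step activation that fires at a weighted sum of at least 0, and a coarse, arbitrarily chosen grid of parameter values); a grid search is not a proof by itself.

```python
import itertools


def predict(weights, bias, x):
    """Step activation: 1 when the weighted sum plus bias is >= 0 (assumed convention)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


def separable_on_grid(targets, grid):
    """Return True if some (w1, w2, b) on the grid reproduces the targets."""
    inputs = list(itertools.product([0, 1], repeat=2))
    for w1, w2, b in itertools.product(grid, repeat=3):
        if [predict((w1, w2), b, x) for x in inputs] == targets:
            return True
    return False


grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

print(separable_on_grid([0, 0, 0, 1], grid))  # True  (AND)
print(separable_on_grid([0, 1, 1, 0], grid))  # False (XOR)
```

The actual argument is algebraic: XOR would need b < 0, w1 + b >= 0, w2 + b >= 0 and w1 + w2 + b < 0. Adding the two middle inequalities gives w1 + w2 + b >= -b > 0, which contradicts the last one.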

We start by initializing another perceptron.

In [624]:
p_xor = ml.Perceptron.random_instance(weights_amount=2, learning_rate=0.1)

And defining the input space and targets of our training set.

In [625]:
xor_input_space = binary_input_space(2)
xor_input_space

[(0, 0), (0, 1), (1, 0), (1, 1)]

In [626]:
xor_targets = [a ^ b for a, b in xor_input_space]
xor_targets

[0, 1, 1, 0]

As mentioned before, learning until the loss reaches zero is futile here. We also won't settle for learning until some arbitrary imperfect loss is reached. Instead, we'll learn by iterating through the dataset 1000 times; if convergence hasn't been reached by then, it probably never will be.

In [627]:
p_xor.learn_iterations(inputs=xor_input_space,
                       targets=xor_targets,
                       iterations=1000)

p_xor.loss(xor_input_space, xor_targets)

0.5

As expected, the perceptron isn't doing a terribly good job of being an XOR-gate.

We might expect a perceptron to be able to do better. With a single linear separation, creating (something like) an OR-gate should be possible, which would leave us with an MSE of only 0.25, because only 1 output would be wrong. Even though the perceptron might reach this point with the right parameters and some luck, there is no reason to expect it to: the perceptron learning rule keeps updating as long as any record is misclassified, so it does not settle on the best achievable separation.

This leaves us with the following parameters:

In [628]:
p_xor

Perceptron: b: 0.03999106408891609. w: [-0.1014128753624709, -0.03368069936568971] )
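As a sanity check, the reported parameters can be plugged into a standalone step-activation perceptron. The `predict` and `mse` helpers below are assumptions written for this check (in particular the convention that the output is 1 when the weighted sum is at least 0); they are not taken from the `ml` module.

```python
def predict(weights, bias, x):
    """Step activation: 1 when the weighted sum plus bias is >= 0 (assumed convention)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


def mse(outputs, targets):
    """Mean squared error between two equally long sequences."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

# Parameters copied from the output above.
bias = 0.03999106408891609
weights = [-0.1014128753624709, -0.03368069936568971]

outputs = [predict(weights, bias, x) for x in inputs]
print(outputs, mse(outputs, xor_targets))  # [1, 1, 0, 0] 0.5

# For comparison: a perfect OR-gate scored against the XOR targets.
or_outputs = [a | b for a, b in inputs]
print(mse(or_outputs, xor_targets))  # 0.25
```

Under these assumptions, two of the four outputs are wrong, which matches the reported loss of 0.5, while an OR-gate would only get the (1, 1) case wrong.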

<h2>IRIS-dataset</h2>

The iris dataset is a simple dataset, best described by looking at it.

In [629]:
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True).frame

iris.head(5)

index,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


We've been asked to do 2 things: train a perceptron to distinguish between target classes 0 and 2, and train a perceptron to distinguish between target classes 1 and 2. We'll create 2 separate datasets for these assignments. Then we'll split these into their respective inputs and outputs.

In [630]:
iris_i = iris[(iris["target"] == 0) | (iris["target"] == 2)]
iris_ii = iris[(iris["target"] == 1) | (iris["target"] == 2)]

iris_i_target = iris_i["target"].values.tolist()
iris_i_input = iris_i.drop(columns="target").values.tolist()

iris_ii_target = iris_ii["target"].values.tolist()
iris_ii_input = iris_ii.drop(columns="target").values.tolist()

<h4>Iris I: Distinguishing between 0 and 2</h4>

The target variable is either 0 or 2. Our perceptron is implemented to work with 0 or 1. Therefore, we need to replace all the 2s with 1s.

In [631]:
iris_i_target = [1 if num == 2 else 0 for num in iris_i_target]

Because we're not sure whether convergence will be reached, we'll initialize a perceptron and train it for 1000 iterations using the entire dataset.

In [632]:
p_iris_i = ml.Perceptron.random_instance(weights_amount=4, learning_rate=0.1)

p_iris_i.learn_iterations(inputs=iris_i_input,
                          targets=iris_i_target,
                          iterations=1000)

p_iris_i.loss(inputs=iris_i_input,
              targets=iris_i_target)

0.0

It appears as though our perceptron is able to reach convergence, which means the classes are linearly separable. This leaves us with the following perceptron:

In [633]:
p_iris_i

Perceptron: b: -0.7617766038964625. w: [0.06113663314870588, -0.9477548793919288, 0.6743390138402527, 1.0258695150659176] )
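To get a feel for these parameters, we can apply them by hand to the five rows shown in the table earlier, which all belong to class 0. The `predict` helper is an assumption for illustration (a step activation that outputs 1 when the weighted sum is at least 0), not the `ml` module's code.

```python
def predict(weights, bias, x):
    """Step activation: 1 when the weighted sum plus bias is >= 0 (assumed convention)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


# Parameters copied from the output above.
bias = -0.7617766038964625
weights = [0.06113663314870588, -0.9477548793919288,
           0.6743390138402527, 1.0258695150659176]

# The first five rows of the dataset (all target 0), as displayed by iris.head(5).
first_rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
]

print([predict(weights, bias, row) for row in first_rows])  # [0, 0, 0, 0, 0]
```

The large negative weight on sepal width and the positive weights on the petal measurements push these short-petalled rows well below the threshold.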

<h4>Iris II: Distinguishing between 1 and 2</h4>

We'll follow the same procedure as we did before, first renaming our target variable to 0s and 1s, and then training for a certain amount of iterations.

Our target variable consists of 1s and 2s. Therefore, we can subtract one from everything:

In [634]:
iris_ii_target = [num - 1 for num in iris_ii_target]

In [635]:
p_iris_ii = ml.Perceptron.random_instance(weights_amount=4, learning_rate=0.1)

p_iris_ii.learn_iterations(inputs=iris_ii_input,
                           targets=iris_ii_target,
                           iterations=1000)

p_iris_ii.loss(inputs=iris_ii_input,
               targets=iris_ii_target)

0.04

We are still left with a mean squared error of 0.04, which means our perceptron didn't converge within 1000 iterations. This suggests the classes are not linearly separable, in which case a single perceptron can't predict them perfectly.
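To make the 0.04 more concrete, it can be converted into a number of misclassified records. This sketch assumes the perceptron outputs only 0 or 1, so every wrong record contributes exactly 1 to the summed squared error; the helper name is made up for illustration. The dataset holds 100 records here (50 per class).

```python
def misclassified_count(mse: float, n_records: int) -> int:
    """Number of wrong outputs, assuming binary outputs and binary targets."""
    return round(mse * n_records)


n_records = 100  # 50 records of class 1 plus 50 records of class 2
wrong = misclassified_count(0.04, n_records)

print(wrong)                            # 4
print((n_records - wrong) / n_records)  # 0.96
```

In other words, the perceptron still gets 96 of the 100 records right.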