For P4, we've implemented the backpropagation algorithm for sigmoid neurons in the Python source code that we'll import in a minute. All that remains is verifying the implementation against a series of use cases, from easy to difficult. We'll perform the following tasks with fully connected neural networks trained through backpropagation:

1. Create an AND-gate.
2. Create a XOR-gate.
3. Create a half-adder.
4. Classify the Iris dataset.
5. Classify the Digit dataset.

I won't be using any fancy testing frameworks because I've been steered away from that explicitly.

### 0: Setup.

I'll import my own source code through the ml package, and import some (builtin) libraries. We'll also lock down the random seed, so that we can guarantee reproducible results.

In [129]:
import random
random.seed(2022)

import itertools
from typing import List, Tuple

import ml

I'll also _borrow_ a function from a previous assignment for testing purposes.

In [130]:
def binary_input_space(length: int) -> List[Tuple[int, ...]]:
    """Compute all possible binary input combinations with a certain length.

    Args:
        length: length of the combinations.

    Returns:
        All possible binary input combinations of a certain length."""
    return list(itertools.product([0, 1], repeat=length))

### 1: Creating an AND-gate

Similarly to earlier Perceptron assignments, one Neuron should be enough to create an AND-gate. To show the components a network consists of, I'll initialize this network explicitly without fancy weight initializations.

In [131]:
_and_neuron = ml.OutputNeuron(
    weights=[-1, 1],
    bias=0,
    learning_rate=1,
    activation_function=ml.sigmoid
)

_and_layer = ml.NeuronLayer(
    [_and_neuron]
)

and_network = ml.NeuronNetwork(
    [_and_layer]
)

We'll define some inputs and outputs that the AND-gate would be expected to take in, and return.

In [132]:
binary_2 = binary_input_space(2)
binary_2

[(0, 0), (0, 1), (1, 0), (1, 1)]

In [133]:
and_target = [
    [0],
    [0],
    [0],
    [1]
]

Let's train the network in the simplest way you can imagine: for 1000 epochs.

In [134]:
and_network.train(
    binary_2,
    and_target,
    iterations=1000
)

And review how it did.

In [135]:
for inp, target in zip(binary_2, and_target):
    prediction = and_network.feed_forward(inp)
    print(f"Input: {inp}. Network made prediction {prediction} (rounded {round(prediction[0])}). Should be {target}.")

Input: (0, 0). Network made prediction [0.00023702819264548002] (rounded 0). Should be [0].
Input: (0, 1). Network made prediction [0.05512421225431237] (rounded 0). Should be [0].
Input: (1, 0). Network made prediction [0.05531142025660785] (rounded 0). Should be [0].
Input: (1, 1). Network made prediction [0.9350968562675207] (rounded 1). Should be [1].


We can easily see that the network gave outputs corresponding to an AND-gate for all inputs.
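To make clear what this test exercises, here is a minimal standalone sketch of a single sigmoid neuron trained on the AND-gate. It does not use the ml package: the names `sigmoid` and `train_neuron` are my own for illustration, and I'm assuming per-sample updates on a squared-error loss, which may differ from what `ml.OutputNeuron` does internally.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def train_neuron(
    inputs: Sequence[Sequence[float]],
    targets: Sequence[float],
    weights: Sequence[float],
    bias: float,
    learning_rate: float,
    epochs: int,
) -> Tuple[List[float], float]:
    """Train one sigmoid neuron with per-sample gradient descent (assumed squared-error loss)."""
    weights = list(weights)
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            output = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Error signal: derivative of the sigmoid times the output error.
            delta = output * (1 - output) * (output - target)
            weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * delta
    return weights, bias


and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

# Same starting point as the explicit network above: weights [-1, 1], bias 0.
w, b = train_neuron(and_inputs, and_targets, weights=[-1, 1], bias=0, learning_rate=1, epochs=1000)

and_predictions = [
    round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)) for x in and_inputs
]
print(and_predictions)
```

The rounded predictions can then be compared against `and_targets` in the same way as in the review loop above.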

### 2: Creating a XOR-gate.

The XOR-gate tests our implementation of the backpropagation algorithm more thoroughly, because (just like with the Perceptron networks) it requires a hidden layer. To showcase the more high-level controls of the ml package, we'll construct a network with random weights through a convenience method. We can also use Xavier initialization here to reduce the risk of getting stuck on the vanishing gradient problem.

In [136]:
xor_network = ml.NeuronNetwork.build_network(
    layer_layout=[2, 2, 1],
    learning_rate=1,
    activation_function=ml.sigmoid,
    xavier_initialization=True
)
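
For reference, Xavier (Glorot) initialization in its uniform form draws each weight from a range that depends on the number of inputs and outputs of a layer. The sketch below shows that common definition for the `[2, 2, 1]` layout. The function name `xavier_uniform_weights` is hypothetical, and I'm not claiming this is exactly what `build_network` does when `xavier_initialization=True`.

```python
import math
import random
from typing import List


def xavier_uniform_weights(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Return fan_out weight vectors of length fan_in, drawn from U(-limit, limit)."""
    # Glorot/Xavier uniform bound: sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(2022)
layer_layout = [2, 2, 1]

# One weight matrix per pair of consecutive layers: 2 -> 2 and 2 -> 1.
layers = [
    xavier_uniform_weights(fan_in, fan_out, rng)
    for fan_in, fan_out in zip(layer_layout, layer_layout[1:])
]
print([(len(layer), len(layer[0])) for layer in layers])
```

Keeping the initial weights in this range keeps the sigmoid inputs away from the flat tails of the function, where the gradient is close to zero.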

We can use the same inputs as for the AND-gate, though we do need some new targets:

In [137]:
xor_target = [
    [0],
    [1],
    [1],
    [0]
]

The example assignments train for 1000 epochs as well, and I used those to debug my algorithm, so I know this setting works. We'll therefore also train this network for 1000 epochs:

In [138]:
xor_network.train(
    binary_2,
    xor_target,
    iterations=1000
)

Let's see how it did:

In [139]:
for inp, target in zip(binary_2, xor_target):
    prediction = xor_network.feed_forward(inp)
    print(f"Input: {inp}. Network made prediction {prediction} (rounded {round(prediction[0])}). Should be {target}.")

Input: (0, 0). Network made prediction [0.11130851601286407] (rounded 0). Should be [0].
Input: (0, 1). Network made prediction [0.8274363743744205] (rounded 1). Should be [1].
Input: (1, 0). Network made prediction [0.8172625164665333] (rounded 1). Should be [1].
Input: (1, 1). Network made prediction [0.24237938520746624] (rounded 0). Should be [0].


Again, the network was able to learn how to perform its task.

### 3: Creating a half adder.

By now, you know the procedure. Like in P1, we'll be using 2 layers: a hidden layer with 3 Neurons, and an output layer with 2 Neurons. We can use the convenience method again:

In [140]:
ha_network = ml.NeuronNetwork.build_network(
    layer_layout=[2, 3, 2],
    learning_rate=1,
    activation_function=ml.sigmoid,
    xavier_initialization=True
)

With the same 2-length binary input space, we use the following target:

In [141]:
ha_target = [
    [0, 0],
    [0, 1],
    [0, 1],
    [1, 0]
]
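
As a sanity check on the target above: a half-adder outputs a carry bit (the AND of both inputs) and a sum bit (the XOR of both inputs). A small sketch that derives the same table, using a variable name of my own (`derived_ha_target`):

```python
import itertools

inputs = list(itertools.product([0, 1], repeat=2))

# Each target is [carry, sum]: carry is the AND of the bits, sum is the XOR.
derived_ha_target = [[a & b, a ^ b] for a, b in inputs]
print(derived_ha_target)
```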

In [142]:
ha_network.train(
    binary_2,
    ha_target,
    iterations=1000
)

In [143]:
for inp, target in zip(binary_2, ha_target):
    prediction = ha_network.feed_forward(inp)
    rounded_prediction = list(map(round, prediction))
    print(f"Input: {inp}. Network made prediction {prediction} (rounded {rounded_prediction}). Should be {target}.")

Input: (0, 0). Network made prediction [0.00030338392106541123, 0.042582553286565475] (rounded [0, 0]). Should be [0, 0].
Input: (0, 1). Network made prediction [0.027922475243593384, 0.9555367891841755] (rounded [0, 1]). Should be [0, 1].
Input: (1, 0). Network made prediction [0.0269761997382146, 0.9546387092874814] (rounded [0, 1]). Should be [0, 1].
Input: (1, 1). Network made prediction [0.9535109460899509, 0.055095633390166494] (rounded [1, 0]). Should be [1, 0].


Again, the network was able to figure it out.

### 4: Classifying the Iris dataset

Like in assignment 2, we'll be attempting to classify the Iris-dataset. This time however, we have the power of a fully connected network that's able to learn: therefore, we can attempt to classify all classes, instead of between 2 at a time. We'll import SKLearn, but just to import (and gain some insight into) the dataset.

In [144]:
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True).frame

iris.head(5)

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


#### Target

In [145]:
iris["target"].unique()

array([0, 1, 2])

The target variable has 3 unique values, which are the classes we'll be attempting to predict. This means we'll have 3 neurons in the output layer. We'll one-hot encode the targets so that we can actually use our network to classify them.

In [146]:
iris_target = [[0, 0, 0] for record in iris["target"]]

for new_target, old_target in zip(iris_target, iris["target"]):
    new_target[old_target] = 1

#### Features

In [147]:
iris.dtypes

sepal length (cm)    float64
sepal width (cm)     float64
petal length (cm)    float64
petal width (cm)     float64
target                 int32
dtype: object

In [148]:
iris_features_df = iris.drop(columns="target")
iris_features_df.head(2)

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |


We have 4 numerical inputs, which already have the proper datatype and can be used directly. For good measure, we'll turn these into a Python list of tuples.

In [149]:
iris_features = []
for _, record in iris_features_df.iterrows():
    iris_features.append(tuple(val for val in record))

#### Train/Test split

To be able to check whether our network overfits, we can split the dataset into training and testing sets. I'm still not quite sure whether we're allowed to use external modules for things like this; therefore, we'll be using the random module manually for this purpose.

First, we combine the features and target, and shuffle that list.

In [150]:
iris_combined_list = list(zip(iris_features, iris_target))
random.shuffle(iris_combined_list)

Then we split the features and target again.

In [151]:
iris_features, iris_target = zip(*iris_combined_list)
iris_features, iris_target = list(iris_features), list(iris_target)

Having randomly shuffled the features/targets, we can split these into a training and testing set. How many datapoints do we have?

In [152]:
iris.shape

(150, 5)

150 records. Using 120 for training and 30 for testing seems like a good proportion.

In [153]:
_split_i = 30

iris_features_train = iris_features[_split_i:]
iris_features_test = iris_features[:_split_i]

iris_target_train = iris_target[_split_i:]
iris_target_test = iris_target[:_split_i]
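
The shuffle-and-slice steps above could also be wrapped into one reusable helper. This is just a sketch: `shuffle_split` is a hypothetical name of mine, it is not part of the ml package, and it takes its own `random.Random` instance rather than relying on the global seed.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

F = TypeVar("F")
T = TypeVar("T")


def shuffle_split(
    features: Sequence[F],
    targets: Sequence[T],
    test_size: int,
    rng: random.Random,
) -> Tuple[List[F], List[T], List[F], List[T]]:
    """Shuffle feature/target pairs together and use the first test_size pairs as test set."""
    paired = list(zip(features, targets))
    rng.shuffle(paired)
    test_pairs, train_pairs = paired[:test_size], paired[test_size:]
    # Unzip the pairs again so features and targets stay aligned by index.
    train_features = [f for f, _ in train_pairs]
    train_targets = [t for _, t in train_pairs]
    test_features = [f for f, _ in test_pairs]
    test_targets = [t for _, t in test_pairs]
    return train_features, train_targets, test_features, test_targets


# Toy data, purely for illustration: the target is derived from the first feature.
demo_features = [(i, i + 0.5) for i in range(10)]
demo_targets = [i % 3 for i in range(10)]

x_train, y_train, x_test, y_test = shuffle_split(
    demo_features, demo_targets, test_size=2, rng=random.Random(2022)
)
print(len(x_train), len(x_test))
```

Shuffling the pairs, instead of shuffling features and targets separately, is what keeps every record attached to its own label.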



#### Creating the network

As mentioned before, we have 4 inputs and 3 outputs. Considering the Perceptrons in P2 were almost able to classify between individual classes, I don't expect we'll need a terribly complicated network. Therefore, we'll be using the following architecture:

In [154]:
iris_network = ml.NeuronNetwork.build_network(
    [4, 8, 8, 3],
    0.01,
    ml.sigmoid,
    xavier_initialization=True
)

iris_network.train(
    iris_features_train,
    iris_target_train,
    iterations=1000
)

In [155]:
iris_outputs = [iris_network.feed_forward(inp) for inp in iris_features_test]

Of course, our network still outputs floats. We'll round the outputs to actually interpret the predictions, and view the accuracy over the test set.

In [156]:
iris_outputs = list(map(ml.round_iterable, iris_outputs))

ml.accuracy(iris_outputs, iris_target_test)

1.0

We have 100% accuracy, which is not entirely unusual for the Iris dataset, especially on a test set of only 30 records.
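
The helpers `ml.round_iterable` and `ml.accuracy` are not shown in this notebook. As a rough sketch, assuming the first rounds every output and the second counts exact matches between prediction and target, they could look like the functions below. All names here are hypothetical. I've also included an argmax-based decoding for comparison, because rounding can produce vectors that are not valid one-hot encodings.

```python
from typing import List, Sequence


def round_outputs(outputs: Sequence[float]) -> List[int]:
    """Round every network output to the nearest integer."""
    return [round(value) for value in outputs]


def argmax_one_hot(outputs: Sequence[float]) -> List[int]:
    """Set the highest output to 1 and all others to 0."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return [1 if i == best else 0 for i in range(len(outputs))]


def exact_match_accuracy(predictions: Sequence[Sequence[int]], targets: Sequence[Sequence[int]]) -> float:
    """Fraction of predictions that equal their target element by element."""
    correct = sum(1 for p, t in zip(predictions, targets) if list(p) == list(t))
    return correct / len(targets)


# Made-up raw outputs for three records, purely for illustration.
raw_outputs = [[0.9, 0.1, 0.2], [0.4, 0.45, 0.3], [0.1, 0.2, 0.8]]
demo_targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

rounded_accuracy = exact_match_accuracy([round_outputs(o) for o in raw_outputs], demo_targets)
argmax_accuracy = exact_match_accuracy([argmax_one_hot(o) for o in raw_outputs], demo_targets)
print(rounded_accuracy, argmax_accuracy)
```

In this toy example the second record rounds to `[0, 0, 0]` and counts as wrong, while argmax still picks the correct class. Rounding is therefore the stricter of the two measures.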

### 5: Classifying the Digit dataset.

The Digit dataset, similar to the MNIST dataset, contains handwritten digits, but as 8x8 images. We can import it through SKLearn.

In [157]:
from sklearn.datasets import load_digits

digits = load_digits(as_frame=True).frame
digits.head(5)

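|   | pixel_0_0 | pixel_0_1 | pixel_0_2 | pixel_0_3 | pixel_0_4 | pixel_0_5 | pixel_0_6 | pixel_0_7 | pixel_1_0 | pixel_1_1 | ... | pixel_6_7 | pixel_7_0 | pixel_7_1 | pixel_7_2 | pixel_7_3 | pixel_7_4 | pixel_7_5 | pixel_7_6 | pixel_7_7 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 5.0 | 13.0 | 9.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 6.0 | 13.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0 |
| 1 | 0.0 | 0.0 | 0.0 | 12.0 | 13.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 | 16.0 | 10.0 | 0.0 | 0.0 | 1 |
| 2 | 0.0 | 0.0 | 0.0 | 4.0 | 15.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 11.0 | 16.0 | 9.0 | 0.0 | 2 |
| 3 | 0.0 | 0.0 | 7.0 | 15.0 | 13.0 | 1.0 | 0.0 | 0.0 | 0.0 | 8.0 | ... | 0.0 | 0.0 | 0.0 | 7.0 | 13.0 | 13.0 | 9.0 | 0.0 | 0.0 | 3 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 16.0 | 4.0 | 0.0 | 0.0 | 4 |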


#### Target

In [158]:
digits_unique = digits["target"].unique()
amount_of_digits = digits_unique.shape[0]
digits_unique

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

There are 10 different digits to be classified, which means our network will have an output layer consisting of 10 neurons. Like with the Iris dataset, we'll one-hot encode these, so that the network doesn't treat the classes as being ordinal.

In [159]:
digit_target = [[0 for _ in range(amount_of_digits)] for _ in digits["target"]]

for new_target, old_target in zip(digit_target, digits["target"]):
    new_target[old_target] = 1

#### Features


In [160]:
digits.dtypes

pixel_0_0    float64
pixel_0_1    float64
pixel_0_2    float64
pixel_0_3    float64
pixel_0_4    float64
              ...   
pixel_7_4    float64
pixel_7_5    float64
pixel_7_6    float64
pixel_7_7    float64
target         int32
Length: 65, dtype: object

The datatypes are already correct again. Let's convert these features to Python native types, because that is what the ml package was developed for.

In [161]:
digit_features_df = digits.drop(columns="target")
digit_features = []
for _, record in digit_features_df.iterrows():
    digit_features.append(tuple(val for val in record))

#### Train/Test split

Even though the assignment doesn't specify it, I will split the dataset into separate training and testing datasets, because there's just nothing interesting about an overfitted network. We'll go through the same procedure as with the Iris dataset, so I won't need to elaborate on the individual steps much.

In [162]:
digit_combined_list = list(zip(digit_features, digit_target))
random.shuffle(digit_combined_list)

In [163]:
digit_features, digit_target = zip(*digit_combined_list)
digit_features, digit_target = list(digit_features), list(digit_target)

In [164]:
digits.shape

(1797, 65)

We'll use a bit more than 20% as test set, so we'll take 400 records.

In [165]:
_split_i = 400

digit_features_train = digit_features[_split_i:]
digit_features_test = digit_features[:_split_i]

digit_target_train = digit_target[_split_i:]
digit_target_test = digit_target[:_split_i]

#### Building the network

Classifying digits is a more complicated task than classifying the earlier Iris dataset. Therefore, I'll be using a more extensive network. The network has 8 x 8 = 64 inputs, a hidden layer with 64 neurons, a hidden layer with 20 neurons, and an output layer with 10 neurons, representing all one-hot encoded targets. Once again, we'll be using Xavier initialization.

In [166]:
digit_network = ml.NeuronNetwork.build_network(
    [64, 64, 20, 10],
    0.1,
    ml.sigmoid,
    xavier_initialization=True
)

digit_network.train(
    digit_features_train,
    digit_target_train,
    iterations=100
)

Training the network took approximately 15 minutes. Once again, we'll be using accuracy to review the network's performance on the test dataset. Let's feed it to the network.

In [167]:
digit_outputs = [digit_network.feed_forward(inp) for inp in digit_features_test]

We'll round the outputs once again, and fetch the accuracy.

In [168]:
digit_outputs = list(map(ml.round_iterable, digit_outputs))

ml.accuracy(digit_outputs, digit_target_test)

0.965

96.5%, that's wonderful! For my own curiosity: I wonder what accuracy it would get on the training dataset.

In [170]:
digit_train_outputs = [digit_network.feed_forward(inp) for inp in digit_features_train]

digit_train_outputs = list(map(ml.round_iterable, digit_train_outputs))

ml.accuracy(digit_train_outputs, digit_target_train)

0.9992841803865425

99.9%. This could mean that the network remembers (almost) all training examples, although we can't be sure. The network does have significantly fewer neurons than there are training examples, but who knows.

This could be explored by checking the accuracy, or a loss metric, on both the train and test datasets after every n epochs, and seeing whether the test dataset performance stagnates or worsens as the train dataset performance keeps increasing, but that's not quite in scope of this assignment.
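
A minimal sketch of what such a check could look like is given below. It is written against plain callables instead of the ml package, because I'm not assuming `NeuronNetwork` exposes a per-epoch training method. The names `track_accuracy` and `best_test_epoch`, and the fake accuracy curves, are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[int, float, float]]


def track_accuracy(
    train_one_epoch: Callable[[], None],
    train_accuracy: Callable[[], float],
    test_accuracy: Callable[[], float],
    epochs: int,
    every: int,
) -> History:
    """Train for a number of epochs and record (epoch, train, test) accuracy every n epochs."""
    history: History = []
    for epoch in range(1, epochs + 1):
        train_one_epoch()
        if epoch % every == 0:
            history.append((epoch, train_accuracy(), test_accuracy()))
    return history


def best_test_epoch(history: History) -> int:
    """Return the recorded epoch with the highest test accuracy."""
    return max(history, key=lambda row: row[2])[0]


# Fake stand-ins for a real network: train accuracy keeps rising,
# test accuracy peaks at epoch 30 and then drops again.
state: Dict[str, int] = {"epoch": 0}


def fake_epoch() -> None:
    state["epoch"] += 1


def fake_train_accuracy() -> float:
    return min(1.0, 0.5 + 0.01 * state["epoch"])


def fake_test_accuracy() -> float:
    return 0.9 - 0.0002 * (state["epoch"] - 30) ** 2


history = track_accuracy(fake_epoch, fake_train_accuracy, fake_test_accuracy, epochs=50, every=10)
print(history)
print(best_test_epoch(history))
```

If the test accuracy stops improving or drops while the train accuracy keeps climbing, that would be the sign of overfitting described above.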