## Assignment Part 1: Perceptron

CS802 Assignment <br>
202274131 <br>

### Perceptron Class

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import json 
np.random.seed(42)

1.1 - Complete the implementation of the Perceptron and use it to identify the clothes/shoes type (shirt). 

In [2]:
# The perceptron is implemented as a class
class Perceptron(object):
    
    def __init__(self, no_inputs, max_iterations=20,learning_rate=0.1): # Initialising the perceptron
        self.no_inputs = no_inputs
        self.weights = np.ones(no_inputs + 1) / (no_inputs + 1) # One weight per input plus one for the bias, all initialised to the same value
        self.max_iterations = max_iterations
        self.learning_rate = learning_rate
    #=======================================#
    # Prints the details of the perceptron. #
    #=======================================#
    def print_details(self): #Printing details of the perceptron
        print("No. inputs:\t" + str(self.no_inputs))
        print("Max iterations:\t" + str(self.max_iterations))
        print("Learning rate:\t" + str(self.learning_rate))
    
    def step(self, a): # Activation function: step function
        return np.where(a >= 0, 1, 0) # Returns 1 if 'a' is greater than or equal to 0, and 0 otherwise
    
    #=========================================#
    # Performs feed-forward prediction on one #
    # set of inputs.                          #
    #=========================================#
    
    def predict(self, x, activation): # Feed-forward prediction
        a = np.dot(x, self.weights) # Weighted sum: dot product of the inputs 'x' and the current weights
        output = activation(a) # Pass the weighted sum 'a' through the activation function
        return output # Return the output of the perceptron

    #======================================#
    # Trains the perceptron using labelled #
    # training data.                       #
    #======================================# 
    
    def train(self, training_data, labels):
        assert len(training_data) == len(labels)
    
        # Add bias to training data
        training_data = np.concatenate((training_data, np.ones((len(training_data), 1))), axis=1)
    
        # Combine training data and labels into a list of tuples
        data = list(zip(training_data, labels))
    
        # Shuffle the data
        np.random.shuffle(data)
    
        # Update weights for each training sample
        for i in range(self.max_iterations):
            for x, y in data:
                output = self.predict(x, self.step)
                error = y - output
                self.weights += self.learning_rate * error * x
    
        return

    #=========================================#
    # Tests the prediction on each element of #
    # the testing data. Prints the precision, #
    # recall, and accuracy of the perceptron. #
    #=========================================#
    def test(self, testing_data, labels):
        assert len(testing_data) == len(labels)
        testing_data = np.concatenate((testing_data, np.ones((len(testing_data), 1))), axis=1) # Adding a column of ones to the testing data to act as a bias
        true_positives = 0
        true_negatives = 0
        false_positives = 0
        false_negatives = 0
        for x, y in zip(testing_data, labels):
            output = self.predict(x, self.step)
            if output == 1 and y == 1:
                true_positives += 1
            elif output == 1 and y == 0:
                false_positives += 1
            elif output == 0 and y == 1:
                false_negatives += 1
            elif output == 0 and y== 0:
                true_negatives += 1
        # The loop above produces an output with the predict and step functions and compares it to the label to update the confusion counts.
        accuracy = (true_positives + true_negatives) / (false_positives + false_negatives + true_positives + true_negatives)
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)

        print("Accuracy:\t"+str(accuracy))
        print("Precision:\t"+str(precision))
        print("Recall:\t"+str(recall))

#### The Fashion MNIST Dataset

The Fashion MNIST dataset was created by Zalando Research as a drop-in replacement for the original MNIST (Modified National Institute of Standards and Technology) dataset of handwritten digits. It contains images of clothing, with 60000 training images and 10000 testing images. Each image is 28x28 pixels in greyscale and has a corresponding label from one of ten classes. The dataset has become a commonly used benchmark for image classification models and is significantly harder to classify than the original MNIST dataset.
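
As a rough illustration of the layout described above, the sketch below builds one synthetic CSV-style row (a label followed by 784 pixel values) and splits it into a label and a 28x28 image. It assumes the label-first layout of the CSV files used in this notebook; the row itself is made up for the example, and `row`, `label` and `image` are hypothetical names.

```python
import numpy as np

# Hypothetical row in the assumed CSV layout: [label, pixel_0, ..., pixel_783].
rng = np.random.default_rng(0)
row = np.concatenate(([3], rng.integers(0, 256, size=28 * 28))).astype(float)

label = int(row[0])                      # class index in 0..9, left unscaled
image = row[1:].reshape(28, 28) / 255.0  # pixel intensities scaled to [0, 1]

print(label, image.shape)
```

Keeping the label out of the division by 255 means it can still be compared directly with an integer class index.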

In [3]:
#Loading the test and train data

train_data = np.loadtxt("fashion_mnist_train.csv", delimiter=",")
test_data = np.loadtxt("fashion_mnist_test.csv", delimiter=",")

# Scale the values to the range [0, 1]. The reshape keeps one row per image.
# Note: column 0 holds the class label, so it is divided by 255 as well.
train_data = train_data.reshape((train_data[:, 1:].shape[0], -1)) / 255.0
test_data = test_data.reshape((test_data[:,1:].shape[0], -1)) / 255.0

l = np.arange(10)
# Create NumPy arrays of class labels (taken from the scaled arrays above)
train_labels = np.asarray(train_data[:, :1], dtype=float)
test_labels = np.asarray(test_data[:, :1], dtype=float)

In [4]:
#A separate cell creates a Perceptron
p = Perceptron(28*28+1)
p.print_details()

No. inputs:	785
Max iterations:	20
Learning rate:	0.1


In [5]:
print("Training...")
training_data = train_data
labels = [d[0]==0 for d in train_labels]
p.train(training_data, labels)
print("Complete.")

# Testing the node
print("Testing...")
testing_data = test_data
labels = [d[0]==0 for d in test_labels]
p.test(testing_data, labels)
print("Complete.")


Training...
Complete.
Testing...
Accuracy:	0.9462
Precision:	0.6708579881656804
Recall:	0.907
Complete.


**Observations** <br>
Using the step activation function and online learning, the perceptron achieved an accuracy of 0.9462, correctly classifying about 95% of the test data. The precision was 0.6708, which means that when the model predicted the positive class, it was correct about 67% of the time. The recall was 0.907, which means the model identified 90.7% of the actual positive examples.
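
To make the three scores concrete, here is a minimal sketch that recomputes them from confusion counts. The counts are not printed by the notebook; they are back-calculated from the reported scores under the assumption of 10,000 test images with 1,000 positives. The helper `classification_metrics` and its zero-denominator guard are illustrative choices, not part of the `Perceptron` class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall); an empty denominator gives 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Counts inferred from the reported scores (assumption: 10,000 test images,
# 1,000 of which belong to the positive class).
accuracy, precision, recall = classification_metrics(tp=907, fp=445, fn=93, tn=8555)
print("Accuracy:\t" + str(accuracy))
print("Precision:\t" + str(precision))
print("Recall:\t" + str(recall))
```

Precision divides by everything the model predicted as positive, while recall divides by everything that really is positive, which is why the two can differ so much on an imbalanced task.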

1.2. Update the perceptron implementation to use full batch learning.

#### Full batch learning

Full batch learning updates the model's weights once per pass, based on the average of the gradients over the entire training set. All training examples are evaluated with the same weights, and the averaged result is then used to update them (Alpaydin, 2020).
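
The difference between the two update schemes can be sketched on a toy problem. The data, the function names `online_epoch` and `full_batch_epoch`, and the learning rate are all illustrative assumptions. The sketch divides by the number of samples to match the averaged description above, whereas the `pFullBatch` class in this notebook applies the summed error.

```python
import numpy as np

def step(a):
    return np.where(a >= 0, 1, 0)

def online_epoch(weights, X, y, lr):
    """Update the weights after every individual sample."""
    w = weights.copy()
    for x_i, y_i in zip(X, y):
        w += lr * (y_i - step(x_i @ w)) * x_i
    return w

def full_batch_epoch(weights, X, y, lr):
    """Update the weights once, using the mean error signal over all samples."""
    w = weights.copy()
    errors = y - step(X @ w)
    return w + lr * (X.T @ errors) / len(X)

# Toy data (hypothetical): two features plus a constant bias column.
X = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
y = np.array([0, 0, 0, 1])  # logical AND
w0 = np.zeros(3)

print("online:    ", online_epoch(w0, X, y, lr=0.1))
print("full batch:", full_batch_epoch(w0, X, y, lr=0.1))
```

In the online version each sample sees weights that earlier samples in the same pass have already changed; in the full batch version every sample is scored against the same weights.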

In [6]:
# The perceptron is implemented as a class
class pFullBatch(object):
    
    def __init__(self, no_inputs, max_iterations=20,learning_rate=0.1):
        self.no_inputs = no_inputs
        self.weights = np.ones(no_inputs + 1) / (no_inputs + 1)
        self.max_iterations = max_iterations
        self.learning_rate = learning_rate
    #=======================================#
    # Prints the details of the perceptron. #
    #=======================================#
    def print_details(self):
        print("No. inputs:\t" + str(self.no_inputs))
        print("Max iterations:\t" + str(self.max_iterations))
        print("Learning rate:\t" + str(self.learning_rate))
    
    def step(self, a):
        return np.where(a >= 0, 1, 0) # Returns 1 if 'a' is greater than or equal to 0, and 0 otherwise
    
    #=========================================#
    # Performs feed-forward prediction on one #
    # set of inputs.                          #
    #=========================================#
    
    def predict(self, x, activation):
        a = np.dot(x, self.weights)
        output = activation(a)
        return output

    #======================================#
    # Trains the perceptron using labelled #
    # training data.                       #
    #======================================# 
    
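    # Full batch learning: one weight update per pass over the whole training set
    def train(self, training_data, labels):
        assert len(training_data) == len(labels)
        # Add a column of ones so that the last weight acts as the bias
        training_data = np.concatenate((training_data, np.ones((len(training_data), 1))), axis=1)
        labels = np.asarray(labels, dtype=float)
        for i in range(self.max_iterations):
            outputs = self.predict(training_data, self.step) # Predictions for every sample at once
            errors = labels - outputs
            delta = self.learning_rate * np.dot(training_data.T, errors) # Summed error signal per weight
            self.weights += delta
        return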
    #=========================================#
    # Tests the prediction on each element of #
    # the testing data. Prints the precision, #
    # recall, and accuracy of the perceptron. #
    #=========================================#
    def test(self, testing_data, labels):
        assert len(testing_data) == len(labels)
        testing_data = np.concatenate((testing_data, np.ones((len(testing_data), 1))), axis=1)
        true_positives = 0
        true_negatives = 0
        false_positives = 0
        false_negatives = 0
        for x, y in zip(testing_data, labels):
            output = self.predict(x, self.step)
            if output == 1 and y == 1:
                true_positives += 1
            elif output == 1 and y == 0:
                false_positives += 1
            elif output == 0 and y == 1:
                false_negatives += 1
            elif output == 0 and y== 0:
                true_negatives += 1
    
        accuracy = (true_positives + true_negatives) / (false_positives + false_negatives + true_positives + true_negatives)
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)

        print("Accuracy:\t"+str(accuracy))
        print("Precision:\t"+str(precision))
        print("Recall:\t"+str(recall))

In [7]:
#A separate cell creates a Perceptron
pFB = pFullBatch(28*28+1)
pFB.print_details()

No. inputs:	785
Max iterations:	20
Learning rate:	0.1


In [8]:
print("Training...")
training_data = train_data
labels = [d[0]==0 for d in train_labels]
pFB.train(training_data, labels)
print("Complete.")

# Testing the node
print("Testing...")
testing_data = test_data
labels = [d[0]==0 for d in test_labels]
pFB.test(testing_data, labels)
print("Complete.")


Training...
Complete.
Testing...
Accuracy:	0.9475
Precision:	0.7367896311066799
Recall:	0.739
Complete.


**Observations** <br>
Using the step activation function and full batch learning, the perceptron achieved an accuracy of 0.9475, meaning it correctly classified 94.75% of the test data. The precision was 0.7367, which means that when the model predicted the positive class, it was correct 73.67% of the time. The recall was 0.739, which means the model identified 73.9% of the actual positive examples.

The full batch learning model achieved an accuracy of 0.9475, slightly higher than the online learning model (0.9462). It also achieved a higher precision (0.7367 compared to 0.6708), indicating that it was more selective in its positive predictions. However, its recall (0.739) is considerably lower than that of the online learning model (0.907), indicating that it missed more positive instances. This pattern is consistent with the precision-recall trade-off: full batch learning appears to have moved the model toward higher precision at the cost of lower recall.
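
The trade-off can be illustrated with a small sketch. The two perceptrons above differ in training procedure rather than in an explicit decision threshold, so this is only an analogy: it uses made-up scores and a hypothetical `precision_recall` helper to show how a stricter decision rule raises precision while lowering recall.

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))
    fp = np.sum(predicted & (labels == 0))
    fn = np.sum(~predicted & (labels == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)

# Made-up scores and true labels for six samples.
scores = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])
labels = np.array([0, 0, 1, 0, 1, 1])

print("lenient rule:", precision_recall(scores, labels, threshold=0.35))
print("strict rule: ", precision_recall(scores, labels, threshold=0.65))
```

The lenient rule finds every positive but admits one false positive; the strict rule makes no false positive predictions but misses one positive.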

1.3. Use multiple nodes to classify every clothes/shoes type.

In [10]:
# Create 10 nodes, one for each clothing/shoe type
nodes = [Perceptron(no_inputs=785) for i in range(10)]

# Train all nodes to recognize all types of clothes/shoes
for i in range(10):
    training_data = [np.append([1],d[1:]) for d in train_data]
    print("Training node for label", i, "...")
    labels = [d[0]==i for d in train_labels]
    nodes[i].train(training_data, labels)
    print("Complete.")

# Test all nodes
for i in range(10):
    testing_data = [np.append([1],d[1:]) for d in test_data]
    print("Testing node for label", i, "...")
    labels = [d[0]==i for d in test_labels]
    nodes[i].test(testing_data, labels)
    #nodes[3].test(testing_data, labels)
    print("Complete.")

# Iterate through the testing set and print the predicted class for each image
for d in testing_data:
    input_data = np.append(d, 1.0) # Rows of testing_data already start with 1; append the bias term expected by predict
    predictions = [nodes[i].predict(input_data, nodes[i].step) for i in range(10)]
    prediction = predictions.index(max(predictions)) # Lowest-numbered node that outputs the maximum value
    print("Prediction:", prediction)

Training node for label 0 ...
Complete.
Training node for label 1 ...
Complete.
Training node for label 2 ...
Complete.
Training node for label 3 ...
Complete.
Training node for label 4 ...
Complete.
Training node for label 5 ...
Complete.
Training node for label 6 ...
Complete.
Training node for label 7 ...
Complete.
Training node for label 8 ...
Complete.
Training node for label 9 ...
Complete.
Testing node for label 0 ...
Accuracy:	0.9483
Precision:	0.6987654320987654
Recall:	0.849
Complete.
Testing node for label 1 ...


ZeroDivisionError: division by zero

**Observations**

Unfortunately, this code did not work as expected. I believe that the nodes are being trained on class 0 only, which is why the first test succeeds but the others do not. After several attempts at debugging, I have regretfully not been able to solve it.
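
One possible cause, offered as a hypothesis rather than a confirmed fix: the loading cell divides the whole array by 255, label column included, and `train_labels` is sliced from the scaled array. A comparison such as `d[0]==i` can then only succeed for `i = 0`, so every other node sees no positive examples and the precision and recall denominators can be zero. The sketch below reproduces this on a tiny made-up array and shows one way to keep the labels unscaled; `raw`, `pixels`, `labels` and `one_vs_rest` are hypothetical names.

```python
import numpy as np

# Hypothetical raw rows in the layout [label, pixel, pixel]; values are made up.
raw = np.array([[0, 10, 200], [1, 30, 40], [9, 255, 0]], dtype=float)

# What the loading cell effectively does: scale everything, then slice labels.
scaled = raw / 255.0
scaled_labels = scaled[:, 0]
matches = [bool(np.any(scaled_labels == i)) for i in range(10)]
print(matches)  # only class 0 still matches its integer index

# Alternative: take the labels before scaling and scale the pixel columns only.
labels = raw[:, 0].astype(int)
pixels = raw[:, 1:] / 255.0
one_vs_rest = [(labels == i).astype(int) for i in range(10)]
print(one_vs_rest[1], one_vs_rest[9])
```

With unscaled labels, each of the ten nodes receives its own set of positive examples, which is what a one-node-per-class setup needs.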

1.4. Use the sigmoid activation function

#### Sigmoid Activation 

The sigmoid function is a logistic function that maps any real-valued number to a value between 0 and 1, producing an S-shaped curve. It is used in binary classification problems, where its output is thresholded to give a prediction of 0 or 1. However, training with the sigmoid function can cause the weights to become stuck, for example when the function saturates and its gradient is close to zero. It is therefore predicted that the sigmoid activation will reduce either the precision or the recall score in comparison to the step-activation perceptrons with online and full batch learning (Pratiwi and Rahadjeng, 2020).
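
A minimal sketch of the function and its slope, using standalone helper names (`sigmoid`, `sigmoid_derivative`) chosen for illustration. The slope term is the same `output * (1 - output)` factor that appears in the weight update of the `pSigmoid` class.

```python
import numpy as np

def sigmoid(a):
    """Logistic function: maps any real input to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_derivative(a):
    """Slope of the sigmoid, written in terms of its own output."""
    s = sigmoid(a)
    return s * (1.0 - s)

a = np.array([-10.0, 0.0, 10.0])
print(sigmoid(a))             # close to 0, exactly 0.5, close to 1
print(sigmoid_derivative(a))  # largest at a = 0 (0.25), near zero in the tails
```

Because the slope is near zero for inputs of large magnitude, updates that are scaled by it become very small once a node is confidently right or confidently wrong.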

In [None]:
class pSigmoid(object):
    
    def __init__(self, no_inputs, max_iterations=20, learning_rate=0.1):
        self.no_inputs = no_inputs
        self.weights = np.ones(no_inputs + 1) / (no_inputs + 1)
        self.max_iterations = max_iterations
        self.learning_rate = learning_rate
    
    def print_details(self):
        print("No. inputs:\t" + str(self.no_inputs))
        print("Max iterations:\t" + str(self.max_iterations))
        print("Learning rate:\t" + str(self.learning_rate))
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def predict(self, x, activation):
        a = np.dot(x, self.weights)
        output = activation(a)
        return output
    
    def train(self, training_data, labels):
        assert len(training_data) == len(labels)
        training_data = np.concatenate((training_data, np.ones((len(training_data), 1))), axis=1) # Add the bias column
        data = list(zip(training_data, labels))  # Combine the data and labels into a list of tuples
    
        # Shuffle the data
        np.random.shuffle(data)
    
        # Update weights for each training sample
        for i in range(self.max_iterations):
            for x, y in data:
                output = self.predict(x, self.sigmoid)
                error = y - output
                self.weights += self.learning_rate * error * output * (1 - output) * x
    
        return

    def test(self, testing_data, labels):
        assert len(testing_data) == len(labels)
        testing_data = np.concatenate((testing_data, np.ones((len(testing_data), 1))), axis=1)
        true_positives = 0
        true_negatives = 0
        false_positives = 0
        false_negatives = 0
        for x, y in zip(testing_data, labels):
            output = self.predict(x, self.sigmoid)
            if output >= 0.5 and y == 1:
                true_positives += 1
            elif output >= 0.5 and y == 0:
                false_positives += 1
            elif output < 0.5 and y == 1:
                false_negatives += 1
            elif output < 0.5 and y== 0:
                true_negatives += 1

        accuracy = (true_positives + true_negatives) / (false_positives + false_negatives + true_positives + true_negatives)
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)

        print("Accuracy:\t"+str(accuracy))
        print("Precision:\t"+str(precision))
        print("Recall:\t"+str(recall))


In [None]:
#A separate cell creates a Perceptron
# The instance gets its own name so that the pSigmoid class is not overwritten
pSig = pSigmoid(28*28+1)
pSig.print_details()

In [None]:
# Training the node
print("Training...")
training_data = train_data
labels = [d[0]==0 for d in train_labels]
pSig.train(training_data, labels)
print("Complete.")

# Testing the node
print("Testing...")
testing_data = test_data
labels = [d[0]==0 for d in test_labels]
pSig.test(testing_data, labels)
print("Complete.")

**Observations** <br>
Using the sigmoid activation function and online learning, the perceptron achieved an accuracy of 0.9594, which is marginally higher than the accuracy achieved with the step activation using online learning (0.9462) and full batch learning (0.9475). The precision of the sigmoid model was 0.8544, which is considerably higher than the precision achieved with the step activation and online learning (0.6708) and also higher than with the step activation and full batch learning (0.7367). This means that 85.44% of the sigmoid perceptron's positive predictions were correct.

The recall of the sigmoid model was 0.716, which is lower than the recall achieved with the step activation and online learning (0.907) and slightly lower than with the step activation and full batch learning (0.739). This means that the sigmoid perceptron identified 71.6% of the actual positive examples.

Overall, the sigmoid perceptron outperformed both step-activation models in terms of accuracy and precision, but had a lower recall.

1.5. Print weights and Fashion MNIST data

In [None]:
# Load the class names, closing the file once it has been read
with open('class_names_fashion.json') as class_names_file:
    class_names = json.load(class_names_file)

In [None]:
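for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    # Column 0 is the (scaled) label; the remaining 784 values form the 28x28 image
    plt.imshow(train_data[i, 1:].reshape(28, 28), cmap=plt.cm.binary)
    # Undo the division by 255 to recover the integer class index
    plt.xlabel(class_names[int(round(train_labels[i, 0] * 255))])
plt.show()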

#### References 

Alpaydin, E. (2020) Introduction to machine learning. 3rd edn. London, England: The MIT Press. <br>
Pratiwi, H. and Rahadjeng, I.R. (2020) “Sigmoid activation function in selecting the best model of Artificial Neural Networks,” Journal of Physics: Conference Series, 1471(1), p. 012010. Available at: https://doi.org/10.1088/1742-6596/1471/1/012010. 