# Introduction to neural nets

The input is a set of feature vectors together with the known answer (the *target*) for each vector. A neural net takes input data that is observable and recordable, and builds a model from it that can be used to predict the targets of new, unseen data.

In its simplest form, a neural network is a set of one or more weights that are multiplied by the input vector to produce a numeric prediction or a class label. Thus, neural nets are suitable __both for prediction and for classification__.

Is the prediction always correct? No, the network can make mistakes. But it can learn from its mistakes.<br>

How does the network learn?<br>

Trial and error! First, it tries to make a prediction. Then, it sees whether the prediction was too high or too low. Finally, it changes the weights (up or down) to predict more accurately the next time it sees the same input. The algorithm consists of several iterations (*epochs*) of **predict-compare-learn**.
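The loop can be sketched for a single weight. This is only an illustration of predict-compare-learn, not the perceptron implemented later: the input `x`, the `target`, the learning rate `eta`, and the stopping tolerance are all assumed values chosen for the example.

```python
# Hypothetical toy problem: find w so that w * x is close to the target.
x, target = 2.0, 1.0   # one input and its known answer (assumed values)
w = 0.0                # initial guess for the weight
eta = 0.1              # learning rate (assumed)

for epoch in range(20):
    prediction = w * x            # predict
    error = prediction - target   # compare: positive means too high
    if abs(error) < 1e-3:         # close enough, stop learning
        break
    w -= eta * error * x          # learn: move the weight against the error

print("epochs used:", epoch, "weight:", w, "prediction:", w * x)
```

A prediction that is too high pushes the weight down, and one that is too low pushes it up, so each pass shrinks the error.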

## 1. Perceptron

We will demonstrate this idea with the simplest possible neural network, which contains a single processing neuron. This self-learning network is called a *perceptron*.

A feature vector is presented to this neuron. For each feature there is a weight, and the network predicts a target variable by multiplying each feature value by its weight and producing a weighted sum. 
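As a small illustration, the weighted sum is a dot product. The feature and weight values below are made up for the example.

```python
import numpy as np

features = np.array([1.0, 0.0])    # hypothetical 2-feature input
weights = np.array([0.05, 0.10])   # hypothetical weights, one per feature

# Multiply each feature by its weight and add up the products.
weighted_sum = np.dot(features, weights)

# The same computation written out term by term.
explicit_sum = sum(f * w for f, w in zip(features, weights))

print(weighted_sum, explicit_sum)
```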

For the simple case of binary classification, we will use a threshold activation function, called *sign* here: if the weighted sum is positive, the neuron predicts class yes (1); otherwise (the sum is zero or negative) it predicts no (0).
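A minimal sketch of this thresholding follows. The helper name `threshold` and the sample sums are assumptions for illustration. It maps a sum of exactly zero to class 0, which is also what the `forward` method in the implementation does.

```python
import numpy as np

def threshold(weighted_sum):
    """Return 1 where the weighted sum is positive, otherwise 0."""
    return np.where(weighted_sum > 0, 1, 0)

sums = np.array([-0.3, 0.0, 0.2])   # hypothetical weighted sums
classes = threshold(sums)
print(classes)
```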

<figure>
    <img src="images/perceptron.png" title="Simple Perceptron for 2D feature vector" width="400px">
    <figcaption>Fig 1. Simple perceptron for classifying a vector of 2 features. Transforms a weighted sum of features into a class using the <i>sign</i> function.</figcaption>
</figure>

## 1.1. Implementation of a simple Perceptron

In [73]:
import numpy as np


class Perceptron:
    """ Basic Perceptron"""

    def __init__(self, inputs, targets):
        """ Constructor - setups dimensions and initializes weights"""
        # Set up network size
        if np.ndim(inputs) > 1:
            self.nIn = np.shape(inputs)[1]
        else:
            self.nIn = 1

        if np.ndim(targets) > 1:
            self.nOut = np.shape(targets)[1]
        else:
            self.nOut = 1

        self.nData = np.shape(inputs)[0]

        # Initialise network weights - random guess
        self.weights = np.random.rand(self.nIn + 1, self.nOut) * 0.1 - 0.05

        
    # Use this to predict target for a given feature vector 'point'
    def predict(self, point):
        input_with_bias = np.concatenate((point, -np.ones((1, 1))), axis=1)
        activations = self.forward(input_with_bias)
        print("Prediction for input {} is {}:".format(point, activations))

        
    # Train network
    def train(self, inputs, targets, eta, nIterations):
        """ Train the thing """
        # Add one bias node (a constant -1 input); size it from the data passed in
        inputs = np.concatenate((inputs, -np.ones((np.shape(inputs)[0], 1))), axis=1)

        # Training: activation function - sign       
        activations = self.forward(inputs)
        for n in range(nIterations):
            print("Iteration: ", n+1)
            print("Current weights:", self.weights.tolist())

            print("Current activations:",  activations.tolist())
            print("Targets:", targets.tolist())
            
            # take the absolute value of each difference so that errors do not cancel out
            total_error = np.sum(abs(activations - targets))
            print("Error:", total_error)
            if total_error == 0:
                print()
                break

            print()
            # update weights: every weight gets its own correction, computed in one matrix product
            self.weights -= eta * np.dot(np.transpose(inputs), activations - targets)
            activations = self.forward(inputs)
        
        print("Final weights:", self.weights.tolist())
        print("Targets:", targets.tolist())
        print("Final predictions:", activations.tolist())

    
    def forward(self, inputs):
        """ Run the network forward """
        # Compute activations
        activations = np.dot(inputs, self.weights)

        # Threshold the activations with sign function
        return np.where(activations > 0, 1, 0)

    def confusion_matrix(self, inputs, targets):
        """Confusion matrix"""

        # Add the bias node; size it from the data passed in
        inputs = np.concatenate((inputs, -np.ones((np.shape(inputs)[0], 1))), axis=1)

        outputs = np.dot(inputs, self.weights)

        nClasses = np.shape(targets)[1]

        if nClasses == 1:
            nClasses = 2
            outputs = np.where(outputs > 0, 1, 0)
        else:
            # 1-of-N encoding
            outputs = np.argmax(outputs, 1)
            targets = np.argmax(targets, 1)

        cm = np.zeros((nClasses, nClasses))
        for i in range(nClasses):
            for j in range(nClasses):
                cm[i, j] = np.sum(np.where(outputs == i, 1, 0) * np.where(targets == j, 1, 0))

        print("Confusion matrix:")
        print(cm)
        print("Accuracy:", np.trace(cm) / np.sum(cm))



## 1.2. Learning the concept of OR
<figure align="middle">
    <img src="images/perceptron_OR.png" title="Decision boundary for OR" width="400px">
    <figcaption>Fig 2. Perceptron can learn a decision boundary for OR data.</figcaption>
</figure>

In [74]:
a = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])

# set up the size of the input vector and initial random weights
p = Perceptron(a[:, 0:2], a[:, 2:])

Prediction before learning:

In [75]:
print("Before training:")
test = np.array([[0, 0]])
p.predict(test)
print()
test = np.array([[0.9, 0]])
p.predict(test)
print()
test = np.array([[0, 0.8]])
p.predict(test)

Before training:
Prediction for input [[0 0]] is [[0]]:

Prediction for input [[0.9 0. ]] is [[0]]:

Prediction for input [[0.  0.8]] is [[1]]:


Now the perceptron learns from the data:

In [76]:
p.train(a[:, 0:2], a[:, 2:], 0.05, 20)

Iteration:  1
Current weights: [[-0.04662414475094431], [0.04577975898086402], [0.01732915930833126]]
Current activations: [[0], [1], [0], [0]]
Targets: [[0], [1], [1], [1]]
Error: 2

Iteration:  2
Current weights: [[0.05337585524905569], [0.09577975898086402], [-0.08267084069166875]]
Current activations: [[1], [1], [1], [1]]
Targets: [[0], [1], [1], [1]]
Error: 1

Iteration:  3
Current weights: [[0.05337585524905569], [0.09577975898086402], [-0.03267084069166874]]
Current activations: [[1], [1], [1], [1]]
Targets: [[0], [1], [1], [1]]
Error: 1

Iteration:  4
Current weights: [[0.05337585524905569], [0.09577975898086402], [0.01732915930833126]]
Current activations: [[0], [1], [1], [1]]
Targets: [[0], [1], [1], [1]]
Error: 0

Final weights: [[0.05337585524905569], [0.09577975898086402], [0.01732915930833126]]
Targets: [[0], [1], [1], [1]]
Final predictions: [[0], [1], [1], [1]]
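To make one **predict-compare-learn** step concrete, the sketch below replays the first update outside the class. It starts from the weights printed at iteration 1 of this particular run and applies the same rule as `train`. The variable names `X`, `w`, and `w_next` are introduced only for this sketch.

```python
import numpy as np

# OR inputs with the bias column of -1 appended, as Perceptron.train does
X = np.array([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], dtype=float)
targets = np.array([[0], [1], [1], [1]])
eta = 0.05

# Weights printed at iteration 1 of the run above
w = np.array([[-0.04662414475094431],
              [0.04577975898086402],
              [0.01732915930833126]])

activations = np.where(X @ w > 0, 1, 0)   # predict
error = activations - targets             # compare
w_next = w - eta * (X.T @ error)          # learn

print("Activations:", activations.tolist())
print("Updated weights:", w_next.tolist())
```

The activations should match those listed at iteration 1, and the updated weights should agree with the weights listed at iteration 2 up to floating-point rounding. Only the inputs that were misclassified contribute to the correction, because `error` is zero everywhere else.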


In [77]:
p.confusion_matrix(a[:, 0:2], a[:, 2:])

Confusion matrix:
[[1. 0.]
 [0. 3.]]
Accuracy: 1.0


The trained perceptron can now classify OR inputs, including points close to the training examples:

In [78]:
print("After training:")
test = np.array([[0, 0]])
p.predict(test)
print()
test = np.array([[0.9, 0]])
p.predict(test)
print()
test = np.array([[0, 0.8]])
p.predict(test)

After training:
Prediction for input [[0 0]] is [[0]]:

Prediction for input [[0.9 0. ]] is [[1]]:

Prediction for input [[0.  0.8]] is [[1]]:


## 1.3. Learning the concept of AND
<figure align="middle">
    <img src="images/perceptron_AND.png" title="Decision boundary for AND" width="400px">
    <figcaption>Fig 3. Perceptron can learn a decision boundary for AND data.</figcaption>
</figure>

In [90]:
a = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]])

# new perceptron: set up the size of the input vector and initial random weights
p = Perceptron(a[:, 0:2], a[:, 2:])

print("Before training:")
test = np.array([[0, 0]])
p.predict(test)
print()
test = np.array([[0.8, 0]])
p.predict(test)
print()
test = np.array([[0, 0.9]])
p.predict(test)

Before training:
Prediction for input [[0 0]] is [[1]]:

Prediction for input [[0.8 0. ]] is [[1]]:

Prediction for input [[0.  0.9]] is [[1]]:


In [91]:
p.train(a[:, 0:2], a[:, 2:], 0.05, 20)

Iteration:  1
Current weights: [[0.03133017968549076], [-0.004459860729580503], [-0.040396705223364554]]
Current activations: [[1], [1], [1], [1]]
Targets: [[0], [0], [0], [1]]
Error: 3

Iteration:  2
Current weights: [[-0.01866982031450924], [-0.054459860729580506], [0.10960329477663547]]
Current activations: [[0], [0], [0], [0]]
Targets: [[0], [0], [0], [1]]
Error: 1

Iteration:  3
Current weights: [[0.03133017968549076], [-0.004459860729580503], [0.059603294776635465]]
Current activations: [[0], [0], [0], [0]]
Targets: [[0], [0], [0], [1]]
Error: 1

Iteration:  4
Current weights: [[0.08133017968549076], [0.0455401392704195], [0.009603294776635463]]
Current activations: [[0], [1], [1], [1]]
Targets: [[0], [0], [0], [1]]
Error: 2

Iteration:  5
Current weights: [[0.03133017968549076], [-0.004459860729580503], [0.10960329477663547]]
Current activations: [[0], [0], [0], [0]]
Targets: [[0], [0], [0], [1]]
Error: 1

Iteration:  6
Current weights: [[0.08133017968549076], [0.045540139270419

In [92]:
p.confusion_matrix(a[:, 0:2], a[:, 2:])

Confusion matrix:
[[3. 0.]
 [0. 1.]]
Accuracy: 1.0


In [93]:
print("After training:")

test = np.array([[0, 0]])
p.predict(test)
print()
test = np.array([[0.8, 0]])
p.predict(test)
print()
test = np.array([[0, 0.9]])
p.predict(test)

After training:
Prediction for input [[0 0]] is [[0]]:

Prediction for input [[0.8 0. ]] is [[0]]:

Prediction for input [[0.  0.9]] is [[0]]:


## 1.4. Concept of exclusive OR (XOR)
<figure align="middle">
    <img src="images/perceptron_XOR.png" title="XOR is not linearly separable" width="400px">
    <figcaption>Fig 4. Decision boundary for $x_1$ XOR $x_2$.</figcaption>
</figure>

In [95]:
a = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]])

# new perceptron: set up the size of the input vector and initial random weights
p = Perceptron(a[:, 0:2], a[:, 2:])

test = np.array([[0, 0]]) 
print("Before training:")
p.predict(test)

Before training:
Prediction for input [[0 0]] is [[0]]:


In [96]:
p.train(a[:, 0:2], a[:, 2:], 0.05, 20)

Iteration:  1
Current weights: [[0.021673173059180956], [0.04324579911162746], [0.028458060169640295]]
Current activations: [[0], [1], [0], [1]]
Targets: [[1], [0], [0], [1]]
Error: 2

Iteration:  2
Current weights: [[0.021673173059180956], [-0.006754200888372544], [0.028458060169640295]]
Current activations: [[0], [0], [0], [0]]
Targets: [[1], [0], [0], [1]]
Error: 2

Iteration:  3
Current weights: [[0.07167317305918096], [0.04324579911162746], [-0.07154193983035971]]
Current activations: [[1], [1], [1], [1]]
Targets: [[1], [0], [0], [1]]
Error: 2

Iteration:  4
Current weights: [[0.021673173059180956], [-0.006754200888372544], [0.028458060169640295]]
Current activations: [[0], [0], [0], [0]]
Targets: [[1], [0], [0], [1]]
Error: 2

Iteration:  5
Current weights: [[0.07167317305918096], [0.04324579911162746], [-0.07154193983035971]]
Current activations: [[1], [1], [1], [1]]
Targets: [[1], [0], [0], [1]]
Error: 2

Iteration:  6
Current weights: [[0.021673173059180956], [-0.0067542008883

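A perceptron cannot learn XOR: in two dimensions the two XOR classes are not linearly separable, so no single straight line can split them. (The targets used in this run, 1, 0, 0, 1, are the complement of XOR, which is non-separable for the same reason.)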
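One way to see this is sketched below for the targets used in the run above (1, 0, 0, 1). It uses the decision rule from `forward`, which predicts 1 exactly when $w_1 x_1 + w_2 x_2 - b > 0$. The symbol $b$ for the bias weight is introduced only for this sketch. Classifying all four points correctly would require:

$$
\begin{aligned}
(0,0) \to 1 &: \quad -b > 0 \\
(0,1) \to 0 &: \quad w_2 - b \le 0 \\
(1,0) \to 0 &: \quad w_1 - b \le 0 \\
(1,1) \to 1 &: \quad w_1 + w_2 - b > 0
\end{aligned}
$$

Adding the second and third conditions gives $w_1 + w_2 \le 2b$. The first condition says $b < 0$, so $2b < b$ and therefore $w_1 + w_2 < b$, which contradicts the fourth condition. No choice of weights satisfies all four, which is consistent with the error staying at 2 in the iterations shown. The same argument with the inequalities reversed applies to XOR itself (targets 0, 1, 1, 0).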


Copyright &copy; 2022 Marina Barsky. All rights reserved.