# Introduction to neural nets

The input is a set of feature vectors together with the known answer for each vector, called the *target*. Neural nets take input data that is observable, recordable, and by extension knowable, and transform it into a model that can make predictions for new, previously unseen data.

At its simplest, a neural network is just one or more weights that are multiplied by the input vector to output a prediction (a number) or a class label. Thus, neural nets are useful both for prediction and for classification.

Is the prediction always correct? No, the network can make mistakes. But it can learn from its mistakes.

How does the network learn?

Trial and error! First, it tries to make a prediction. Then, it sees whether it was too high or too low. Finally, it changes the weight (up or down) to predict more
accurately the next time it sees the same input. This can be thought of as a **predict-compare-learn** paradigm. 
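The predict-compare-learn loop can be sketched for the smallest possible case: one input `x` and one weight. This is a minimal illustration rather than code from this notebook. The function name `learn_single_weight`, the fixed step size and the stopping tolerance are assumptions chosen for the example.

```python
def learn_single_weight(x, target, weight=0.5, step=0.01, n_iter=1000, tol=1e-9):
    """Hypothetical helper: predict-compare-learn with a single weight."""
    for _ in range(n_iter):
        prediction = x * weight        # 1. predict
        error = prediction - target    # 2. compare: > 0 means too high, < 0 too low
        if abs(error) <= tol:
            break
        # 3. learn: nudge the weight down if the prediction was too high, up if too low
        weight = weight - step if error > 0 else weight + step
    return weight


w = learn_single_weight(x=2.0, target=0.8)
print("learned weight:", round(w, 2), "prediction:", round(2.0 * w, 2))
```

Starting from 0.5, the first prediction 2.0 × 0.5 = 1.0 is too high. The weight is therefore nudged down in steps of 0.01 until `2.0 * weight` matches the target 0.8, which happens at a weight of about 0.4. A fixed step is the crudest way to learn. The perceptron implementation later in this notebook sizes each update by the error and a learning rate `eta`.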

## 1. Perceptron

We will demonstrate this idea with the simplest possible neural network, which contains a single neuron. This is called a *perceptron*.

A feature vector is shown to this neuron. For each feature there is a weight, and the network predicts a target variable by multiplying each feature value by its weight and producing a weighted sum. 

For a simple case of binary classification, we will use a *sign*-like threshold activation function: if the weighted sum is positive, the neuron predicts class yes (1); otherwise it predicts no (0).
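For a single feature vector, you can compute the weighted sum and the threshold directly. The sketch below assumes an extra *bias* weight whose input is fixed at -1, which is the same convention the implementation in section 1.1 uses. The weights are arbitrary illustrative values, not learned ones.

```python
import numpy as np


def perceptron_output(x, weights, bias_weight):
    """Return 1 if the weighted sum of the features (plus the bias term) is positive, else 0."""
    # Weighted sum: each feature times its weight, plus the bias input -1 times its weight
    weighted_sum = np.dot(x, weights) + (-1) * bias_weight
    # Threshold ("sign-like") activation: positive -> class 1, otherwise class 0
    return 1 if weighted_sum > 0 else 0


weights = np.array([0.3, 0.2])   # illustrative values, one weight per feature
bias_weight = 0.1

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron_output(np.array(x), weights, bias_weight))
```

These particular weights happen to reproduce the OR concept used in section 1.2. Only `[0, 0]` gives a non-positive weighted sum (-0.1).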

<figure>
    <img src="images/perceptron.png" title="Simple Perceptron for 2D feature vector" width="400px">
    <figcaption>Fig 1. Simple perceptron for classifying a vector of 2 features. Transforms a weighted sum of features into a class using the <i>sign</i> function.</figcaption>
</figure>

## 1.1. Implementation of a simple Perceptron

```python
import numpy as np


class Perceptron:
    """A basic single-layer perceptron with a threshold activation."""

    def __init__(self, inputs, targets):
        """Constructor: sets up the dimensions and initializes the weights."""
        # Set up network size
        if np.ndim(inputs) > 1:
            self.nIn = np.shape(inputs)[1]
        else:
            self.nIn = 1

        if np.ndim(targets) > 1:
            self.nOut = np.shape(targets)[1]
        else:
            self.nOut = 1

        self.nData = np.shape(inputs)[0]

        # Initialise network weights with a random guess in [-0.05, 0.05);
        # the extra row holds the weight of the bias input
        self.weights = np.random.rand(self.nIn + 1, self.nOut) * 0.1 - 0.05

    def predict(self, point):
        """Predict the target for a single feature vector 'point' of shape (1, nIn)."""
        # Append the bias input -1 to the feature vector
        input_with_bias = np.concatenate((point, -np.ones((1, 1))), axis=1)
        activations = self.forward(input_with_bias)
        print("Prediction for input {} is {}".format(point, activations))
        return activations

    def train(self, inputs, targets, eta, nIterations):
        """Train the network with the batch perceptron learning rule."""
        # Append one bias input (-1) to each feature vector
        nData = np.shape(inputs)[0]
        inputs = np.concatenate((inputs, -np.ones((nData, 1))), axis=1)

        # Predict with the current weights (threshold activation)
        activations = self.forward(inputs)
        for n in range(nIterations):
            print("Iteration:", n + 1)
            print("Current weights:", self.weights.tolist())
            print("Current activations:", activations.tolist())
            print("Targets:", targets.tolist())
            print()

            # Compare: count misclassified vectors (abs() stops errors
            # of +1 and -1 from cancelling each other out)
            total_error = np.sum(np.abs(activations - targets))
            if total_error == 0:
                break

            # Learn: move the weights against the error, then predict again
            self.weights -= eta * np.dot(np.transpose(inputs), activations - targets)
            activations = self.forward(inputs)

        print("Targets:", targets.tolist())
        print("Final predictions:", activations.tolist())

    def forward(self, inputs):
        """Run the network forward: weighted sums followed by a threshold."""
        # Compute the weighted sums
        activations = np.dot(inputs, self.weights)

        # Threshold the activations: positive -> 1, otherwise 0
        return np.where(activations > 0, 1, 0)

    def confusion_matrix(self, inputs, targets):
        """Print the confusion matrix (rows: predicted class, columns: target class) and accuracy."""

        # Add the bias input
        inputs = np.concatenate((inputs, -np.ones((np.shape(inputs)[0], 1))), axis=1)

        outputs = np.dot(inputs, self.weights)

        nClasses = np.shape(targets)[1]

        if nClasses == 1:
            # A single output column means binary classification with classes 0 and 1
            nClasses = 2
            outputs = np.where(outputs > 0, 1, 0)
        else:
            # 1-of-N encoding
            outputs = np.argmax(outputs, 1)
            targets = np.argmax(targets, 1)

        cm = np.zeros((nClasses, nClasses))
        for i in range(nClasses):
            for j in range(nClasses):
                cm[i, j] = np.sum(np.where(outputs == i, 1, 0) * np.where(targets == j, 1, 0))

        print("Confusion matrix:")
        print(cm)
        print("Accuracy:", np.trace(cm) / np.sum(cm))
```
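The update line in `train`, `self.weights -= eta * np.dot(np.transpose(inputs), activations - targets)`, is the batch perceptron learning rule. The sketch below applies it once, by hand, to the OR data so you can see what a single step does. It starts from all-zero weights instead of random ones. That is an assumption made only so the numbers are reproducible.

```python
import numpy as np

# OR data: two features per row, plus the bias input -1 appended as a third column
inputs = np.array([[0, 0, -1],
                   [0, 1, -1],
                   [1, 0, -1],
                   [1, 1, -1]], dtype=float)
targets = np.array([[0], [1], [1], [1]], dtype=float)

weights = np.zeros((3, 1))   # zero start instead of random, for reproducibility
eta = 0.25

# Predict: threshold the weighted sums (all sums are 0, so every prediction is 0)
activations = np.where(np.dot(inputs, weights) > 0, 1, 0)

# Compare and learn: one batch perceptron update
weights -= eta * np.dot(np.transpose(inputs), activations - targets)
print(weights.ravel())
```

Three positive examples were wrongly predicted as 0. They pull both feature weights up to 0.5, and the bias weight becomes -0.75, which lowers the threshold. On the next prediction every input is classified as 1, including `[0, 0]`, so further updates are still needed to fix that case.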



## 1.2. Learning the concept of OR
<figure align="middle">
    <img src="images/perceptron_OR.png" title="Decision boundary for OR" width="400px">
    <figcaption>Fig 2. Perceptron can learn a decision boundary for OR data.</figcaption>
</figure>

```python
# OR truth table: columns are x1, x2 and the target x1 OR x2
a = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])

# Set up the size of the input vector and the initial (random) weights
p = Perceptron(a[:, 0:2], a[:, 2:])
```

Prediction before learning:

```python
print("Before training:")
test = np.array([[0, 0]])
p.predict(test)
print()
test = np.array([[5, 0]])
p.predict(test)
print()
test = np.array([[0, 3]])
p.predict(test)
```

Now the perceptron learns from the data (learning rate 0.25, at most 10 iterations):

```python
p.train(a[:, 0:2], a[:, 2:], 0.25, 10)
```

```python
p.confusion_matrix(a[:, 0:2], a[:, 2:])
```
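In `confusion_matrix`, entry `cm[i, j]` counts the vectors that were *predicted* as class `i` while their *target* was class `j`. Rows are therefore predictions and columns are true classes. Accuracy is the diagonal (the correct predictions) divided by the total. The small example below uses made-up predictions rather than output from this notebook, purely to show the bookkeeping.

```python
import numpy as np

outputs = np.array([0, 1, 1, 1])   # hypothetical predictions
targets = np.array([0, 1, 1, 0])   # true classes

n_classes = 2
cm = np.zeros((n_classes, n_classes))
for i in range(n_classes):          # i: predicted class
    for j in range(n_classes):      # j: true class
        cm[i, j] = np.sum((outputs == i) & (targets == j))

accuracy = np.trace(cm) / np.sum(cm)
print(cm)
print("Accuracy:", accuracy)
```

The last vector was predicted as 1 but is truly 0, so it lands in `cm[1, 0]`, off the diagonal. That gives an accuracy of 3/4 = 0.75.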

Now it can classify new inputs, not only the four training vectors:

```python
print("After training:")
test = np.array([[0, 0]])
p.predict(test)
print()
test = np.array([[5, 0]])
p.predict(test)
print()
test = np.array([[0, 3]])
p.predict(test)
```

## 1.3. Learning the concept of AND
<figure align="middle">
    <img src="images/perceptron_AND.png" title="Decision boundary for AND" width="400px">
    <figcaption>Fig 3. Perceptron can learn a decision boundary for AND data.</figcaption>
</figure>

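```python
# AND truth table: columns are x1, x2 and the target x1 AND x2
a = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]])

# Reuse the perceptron p: its weights start from where the OR training left them
test = np.array([[0, 1]])
print("Before training:")
p.predict(test)
```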

```python
p.train(a[:, 0:2], a[:, 2:], 0.25, 10)
```

```python
p.confusion_matrix(a[:, 0:2], a[:, 2:])
```

```python
print("After training:")
p.predict(test)
```
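AND is learnable because a single straight line separates (1, 1) from the other three points. One such line is $x_1 + x_2 = 1.5$, which corresponds to feature weights 1 and 1 with a bias weight of 1.5 on the constant input -1. These weights are picked by hand for illustration, and the trained perceptron will usually find different ones.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_targets = np.array([0, 0, 0, 1])

weights = np.array([1.0, 1.0])   # hand-picked, not learned
bias_weight = 1.5                 # bias input is -1, so this acts as a threshold of 1.5

predictions = np.where(X @ weights - bias_weight > 0, 1, 0)
print(predictions, "matches AND:", np.array_equal(predictions, and_targets))
```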

## 1.4. Concept of $x_1$ AND NOT $x_2$

<figure align="middle">
    <img src="images/perceptron_AND_NOT.png" title="Decision boundary for AND NOT" width="200px">
    <figcaption>Fig 4. Perceptron should be able to learn a decision boundary for $x_1$ AND NOT $x_2$.</figcaption>
</figure>
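The notebook does not run this case. However, $x_1$ AND NOT $x_2$ is linearly separable, because only (1, 0) is positive, so the same learning rule should converge. The sketch below re-implements the batch update from `train` as a standalone function so that it can run on its own. The name `train_perceptron` is hypothetical, and starting from zero weights rather than random ones is an assumption made for reproducibility.

```python
import numpy as np


def train_perceptron(features, targets, eta=0.25, n_iterations=10):
    """Hypothetical standalone version of the batch perceptron rule used in `train`."""
    # Append the bias input -1 to every feature vector
    inputs = np.concatenate((features, -np.ones((features.shape[0], 1))), axis=1)
    weights = np.zeros((inputs.shape[1], targets.shape[1]))   # zero start (assumption)

    for _ in range(n_iterations):
        activations = np.where(inputs @ weights > 0, 1, 0)    # predict
        errors = activations - targets                          # compare
        if not np.any(errors):
            break
        weights -= eta * (inputs.T @ errors)                    # learn
    return weights


# x1 AND NOT x2: the last column is the target
data = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 0]])
w = train_perceptron(data[:, 0:2], data[:, 2:])

inputs_with_bias = np.concatenate((data[:, 0:2], -np.ones((4, 1))), axis=1)
predictions = np.where(inputs_with_bias @ w > 0, 1, 0)
print("weights:", w.ravel(), "predictions:", predictions.ravel())
```

From zero weights, four updates are enough here. The weights end at 0.5 for $x_1$, -0.5 for $x_2$ and 0 for the bias, so only (1, 0) produces a positive weighted sum.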

## 1.5. Concept of exclusive OR (XOR)
<figure align="middle">
    <img src="images/perceptron_XOR.png" title="XOR is not linearly separable" width="400px">
    <figcaption>Fig 5. Decision boundary for $x_1$ XOR $x_2$.</figcaption>
</figure>

```python
# XOR truth table: columns are x1, x2 and the target x1 XOR x2
a = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])

test = np.array([[0, 0]])
print("Before training:")
p.predict(test)
```

```python
p.train(a[:, 0:2], a[:, 2:], 0.25, 10)
```

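The perceptron cannot learn XOR. In two dimensions the XOR points are not linearly separable: no single straight line puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other. As a result, training runs through all its iterations without reaching zero error.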
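You can check directly why no single line works. Assume, as in the implementation, two feature weights $w_1, w_2$ and a bias weight $b$ on the constant input $-1$, so that the perceptron outputs 1 exactly when $w_1 x_1 + w_2 x_2 - b > 0$. Classifying all four XOR points correctly would require:

$$
\begin{aligned}
(0, 0) \mapsto 0: &\quad -b \le 0 \;\Longrightarrow\; b \ge 0 \\
(0, 1) \mapsto 1: &\quad w_2 - b > 0 \\
(1, 0) \mapsto 1: &\quad w_1 - b > 0 \\
(1, 1) \mapsto 0: &\quad w_1 + w_2 - b \le 0
\end{aligned}
$$

Adding the two middle inequalities gives $w_1 + w_2 > 2b$. Combined with the last one, this gives $2b < w_1 + w_2 \le b$, so $b < 0$, which contradicts $b \ge 0$. Hence no choice of weights classifies all four XOR points correctly, whatever learning rule is used.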


Copyright &copy; 2020 Marina Barsky. All rights reserved.