# The Perceptron Network

__The network consists of:__
- Input nodes, one for each feature (dimension) of the input vector
- The neurons
    - Weights $w_{ij}$, where $i$ runs over the input nodes (features) and $j$ runs over the neurons.
    - So $w_{32}$ is the weight that connects input node 3 to neuron 2.
    - Activation of each neuron $j$: the weighted sum of its inputs, $\sum_{i=0}^{m} w_{ij}x_{i}$, passed through the activation function $g$.
    - Loss function $\frac{1}{2}(y_{j}-t_{j})^{2}$, where $y_{j} = \sum_{i} w_{ij}x_{i}$ is the output of neuron $j$ and $t_{j}$ is its target.
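
The forward computation described in the list above can be sketched in NumPy. This is a minimal illustration under stated assumptions: the function name `forward` and the example weights are hypothetical, the activation is assumed to be a threshold at zero, and the bias input is left out.

```python
import numpy as np

def forward(X, W):
    """Compute thresholded neuron outputs for each row of X.

    X has shape (n_samples, n_inputs); W has shape (n_inputs, n_neurons),
    so W[i, j] connects input node i to neuron j (zero-based indices).
    """
    h = X @ W                     # weighted sum for every neuron
    return np.where(h > 0, 1, 0)  # assumed activation: fire if the sum is positive

# Hypothetical weights for 3 input nodes and 2 neurons
W = np.array([[ 0.5, -0.2],
              [-0.3,  0.4],
              [ 0.1,  0.1]])
x = np.array([[1.0, 0.0, 1.0]])   # a single input vector

y = forward(x, W)                 # weighted sums are 0.6 and -0.1
print(y)
```

Storing the weights as one matrix means a single matrix product evaluates every neuron for every input vector at once.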

__How does learning occur?__
- Each weight $w_{ij}$ is updated with the rule $w_{ij} \leftarrow w_{ij} - \eta(y_{j} - t_{j}) \cdot x_{i}$,
- where $(y_{j} - t_{j}) \cdot x_{i}$ is the gradient of the loss function with respect to $w_{ij}$ and $\eta$ is the learning rate.
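
As a worked example of the update rule, the sketch below applies one update to the weights of a single neuron. The weights, input, and target are hypothetical values chosen for illustration, and the output is assumed to be thresholded at zero.

```python
import numpy as np

eta = 0.25                       # learning rate
x = np.array([1.0, 0.0, 1.0])    # inputs x_i to one neuron
w = np.array([0.2, -0.1, 0.1])   # hypothetical weights w_ij for that neuron
t = 0                            # target output

y = 1 if w @ x > 0 else 0        # weighted sum is about 0.3, so the neuron fires
w_new = w - eta * (y - t) * x    # only weights with a non-zero input change

print(y, w_new)
```

The neuron fired when it should not have, so the weights on the active inputs are reduced. The weight on the input that was zero is left unchanged.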

<img src="perceptron.png" width=500 height=500 />

#### What is the Bias Input?
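- The bias is an extra input with a fixed value and its own weight into each neuron. It deals with the case where all of the inputs to a neuron are zero: without it, the weighted sum would be zero whatever the weights are.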
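
The effect can be checked with a small sketch. The weight values here are hypothetical, and the bias input is assumed to be a constant 1 placed in the first column.

```python
import numpy as np

x = np.zeros((1, 2))                        # all inputs are zero

W_no_bias = np.array([[0.7], [-1.3]])       # hypothetical weights, no bias
h_no_bias = x @ W_no_bias                   # always 0, whatever the weights

x_bias = np.insert(x, 0, 1, axis=1)         # prepend a constant bias input of 1
W_bias = np.array([[0.4], [0.7], [-1.3]])   # first row is the bias weight
h_bias = x_bias @ W_bias                    # equals the bias weight

print(h_no_bias, h_bias)
```

With the bias input in place, the weighted sum for an all-zero input is the bias weight itself, so learning can move it above or below the threshold.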

# The Perceptron Learning Algorithm

<img src="pAlgorithm.png" width=500 height=500 />

#### The Learning Rate η
- The learning rate η controls how much the weights change at each update, and therefore how fast the network learns.
- Setting it to 1 makes the weights change a lot whenever there is a wrong answer, which tends to make the network unstable, so that it never settles down.
- The cost of a small learning rate is that the weights need to see the inputs more often before they change significantly, so the network takes longer to learn.
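
To make the trade-off concrete, the sketch below computes the weight change caused by one wrong answer for several learning rates. The input vector and the particular values of η are assumptions chosen for illustration.

```python
import numpy as np

x = np.array([1.0, 1.0, 0.0])   # hypothetical input (bias input first)
y, t = 0, 1                     # the neuron answered 0 but the target is 1

updates = {}
for eta in (1.0, 0.25, 0.01):
    updates[eta] = -eta * (y - t) * x   # change applied to the weights
    print(eta, updates[eta])
```

The change scales linearly with η: with η = 0.01 the same mistake has to be seen 100 times to move a weight as far as a single update with η = 1.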

## An Example of Perceptron Learning: Logic Functions (the logical OR)
### Linear Separability

<img src="or.png" width=500 height=500 />
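
Linear separability of OR can be checked directly: one straight line puts the input (0, 0) on one side and the other three inputs on the other. The sketch below uses a hand-picked line, $x_1 + x_2 - 0.5 = 0$, which is an assumption for illustration and not the line a trained perceptron would necessarily find.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T_or = np.array([0, 1, 1, 1])   # logical OR of the two inputs

# Hypothetical hand-picked line: x1 + x2 - 0.5 = 0
w0, w1, w2 = -0.5, 1.0, 1.0
Y = np.where(w0 + w1 * X[:, 0] + w2 * X[:, 1] > 0, 1, 0)

print(Y, np.array_equal(Y, T_or))
```

Many other lines separate the two classes equally well. The perceptron only needs to find one of them.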

## Implementation

In [84]:
import numpy as np

# Make some training data

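# Each row is [x1, x2, target]

# OR Function (linearly separable)
#a = np.array([[0,0,0],[0,1,1],[1,0,1],[1,1,1]])

# XOR Function (not linearly separable; used for the run below)
a = np.array([[0,0,0],[0,1,1],[1,0,1],[1,1,0]])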

X = a[:,0:2]
T = a[:,2:]

nSamples = X.shape[0]
nOutputs = T.shape[1]

In [85]:
X.shape

(4, 2)

In [86]:


# Set parameters of neural network
eta = 0.25

# Initialize weights to small uniformly distributed values between -0.1 and 0.1
W = 0.1 * 2 * (np.random.uniform(size=(1 + X.shape[1], nOutputs)) - 0.5)

# Add constant column of 1's
def addOnes(A):
    return np.insert(A, 0, 1, axis=1)

X1 = addOnes(X)


# Take nSteps gradient descent steps (batch updates over all training samples)
nSteps = 30

for step in range(nSteps):

    # Forward pass on training data
    Y = X1 @ W 
    Y = np.where(Y>0,1,0)

    # Error in output
    error = T - Y

    W = W + eta * X1.T @ error
    
Ytest = np.where((X1 @ W) > 0, 1, 0)  # Forward pass in one line

# Fraction of training samples classified correctly (first output neuron)
print('Accuracy = ', np.mean(Ytest == T, axis=0)[0])


Accuracy =  0.5
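
To contrast the two logic functions, the same training loop can be wrapped in a function and run on both data sets. This is a sketch, not the notebook's exact code: the function name `train_and_score` is hypothetical, and the weights are assumed to start at zero instead of random values so that the result is repeatable.

```python
import numpy as np

def train_and_score(a, eta=0.25, n_steps=30):
    """Train a single-layer perceptron on rows [x1, x2, target]; return accuracy."""
    X, T = a[:, 0:2], a[:, 2:]
    X1 = np.insert(X, 0, 1, axis=1)          # constant bias column of 1's
    W = np.zeros((X1.shape[1], T.shape[1]))  # assumption: zero start, for repeatability
    for _ in range(n_steps):
        Y = np.where(X1 @ W > 0, 1, 0)       # forward pass
        W = W + eta * X1.T @ (T - Y)         # batch weight update
    Ytest = np.where(X1 @ W > 0, 1, 0)
    return np.mean(Ytest == T)

or_data = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
xor_data = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])

print('OR accuracy  =', train_and_score(or_data))
print('XOR accuracy =', train_and_score(xor_data))
```

With these settings the OR run settles on weights that classify all four inputs correctly, while the XOR run alternates between two weight settings and ends with half of the inputs misclassified.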


## A decision boundary separating two classes of data

<img src="bp.png" width=400 height=400 />
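
For two inputs, the decision boundary is the line where the weighted sum is zero, $w_0 + w_1 x_1 + w_2 x_2 = 0$. The sketch below solves this for $x_2$ so the line can be plotted. The weights and the helper name `boundary_x2` are hypothetical.

```python
import numpy as np

# Hypothetical weights: bias weight w0, then one weight per input
w0, w1, w2 = -0.5, 1.0, 1.0

def boundary_x2(x1):
    """x2 value on the boundary w0 + w1*x1 + w2*x2 = 0 (requires w2 != 0)."""
    return -(w0 + w1 * x1) / w2

x1 = np.array([0.0, 0.25, 0.5])
x2 = boundary_x2(x1)
h = w0 + w1 * x1 + w2 * x2      # weighted sum at the boundary points

print(x2, h)
```

Points on one side of this line give a positive weighted sum and are assigned to one class. Points on the other side are assigned to the other class.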

## Different decision boundaries computed by a Perceptron with four neurons

<img src="mcp.png" width=400 height=400 />

### The Exclusive Or (XOR) Function (not linearly separable)

<img src="xor.png" width=500 height=500 />
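
A brute-force search over candidate lines illustrates the difference between OR and XOR. The sketch below tries every weight combination on a coarse grid and records the best accuracy for each function. The grid range and spacing are assumptions, and a finite search is an illustration, not a proof.

```python
import itertools
import numpy as np

X1 = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # bias column first
targets = {'OR': np.array([0, 1, 1, 1]), 'XOR': np.array([0, 1, 1, 0])}

grid = np.linspace(-2, 2, 21)   # assumed search range for each weight
best = {name: 0.0 for name in targets}
for w in itertools.product(grid, repeat=3):
    Y = np.where(X1 @ np.array(w) > 0, 1, 0)   # thresholded outputs for this line
    for name, T in targets.items():
        best[name] = max(best[name], np.mean(Y == T))

print(best)
```

On this grid some line classifies all four OR inputs correctly, but no line gets more than three of the four XOR inputs right.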