In [None]:
%matplotlib inline

# Perceptron Learning Algorithm

This is NOT covered in class, but it is in Haykin's book and Jim Keller's Introduction to Computational Intelligence book.

I am adding this here to show you an early algorithm for training a Perceptron (before backpropagation, evolutionary algorithms, etc.).

Below, **the assumption is that our two classes are linearly separable**. If they are not, refer to the Pocket Algorithm (https://en.wikipedia.org/wiki/Perceptron or https://direct.mit.edu/books/book/1919/chapter-abstract/52703/Perceptron-Learning-and-the-Pocket-Algorithm?redirectedFrom=fulltext).
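
For reference, the core idea of the Pocket Algorithm can be sketched in a few lines: run ordinary Perceptron updates, but keep "in your pocket" the weight vector that has made the fewest training mistakes so far. This is only a minimal sketch under my own assumptions (labels in {0, 1}, no bias term, one randomly chosen misclassified sample per step); the names `count_errors` and `pocket_perceptron` are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

def count_errors(w, X, labels):
    """Count samples misclassified by the rule: predict 1 if x.w >= 0, else 0."""
    preds = (X @ w >= 0).astype(int)
    return int(np.sum(preds != labels))

def pocket_perceptron(X, labels, n_iters=200, learn_rate=1.0, seed=0):
    """Perceptron updates that remember the best weight vector seen so far."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), count_errors(w, X, labels)
    for _ in range(n_iters):
        preds = (X @ w >= 0).astype(int)
        wrong = np.flatnonzero(preds != labels)
        if wrong.size == 0:
            break  # the current w separates the data, nothing left to fix
        j = rng.choice(wrong)  # pick one misclassified sample at random
        w = w + learn_rate * (labels[j] - preds[j]) * X[j]
        err = count_errors(w, X, labels)
        if err < best_err:  # only replace the pocket when we improve
            best_w, best_err = w.copy(), err
    return best_w, best_err
```

The point of the design is that the returned weights are never worse (on the training set) than the starting weights, even when the classes overlap and the plain Perceptron would keep bouncing around.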

First, how do we make a (synthetic) data set?

In [None]:
# import our libraries
import numpy as np 
import matplotlib.pyplot as plt 

# let's make some normally distributed data
mean = [0, 0] # let the mean (aka the center of the distribution) be located at (0,0)
cov = [[1, 0], [0, 1]] # make the covariance matrix an identity matrix
                       # covariance matrix: https://en.wikipedia.org/wiki/Covariance_matrix
                       # identity matrix: https://en.wikipedia.org/wiki/Identity_matrix
                       # this says the variance is 1 in the x direction and 1 in the y direction (so it's circular)
                       # and there is no "rotation" of our normal distribution (the off-diagonal terms are 0)
# now, make that 2D (so (x,y)) data set of 100 points
# note: the T is for transpose (https://en.wikipedia.org/wiki/Transpose), it lets us unpack the x and y columns
x, y = np.random.multivariate_normal(mean, cov, 100).T

# plot it!
plt.plot(x, y, 'x')
plt.axis('equal')
plt.show()

Now, let's make a two-class data set. The above was just one Gaussian distribution.

In [None]:
# class 1
c1_mean = [-2,-3] # notice that I moved the "center"
c1_cov = [[1, 0], [0, 1]]
c1_x = np.random.multivariate_normal(c1_mean, c1_cov, 100)

# class 2
c2_mean = [6,1] # same, different center
c2_cov = [[4, 0], [0, 4]] # this time, it's more spread out because the variance is 4 vs 1
c2_x = np.random.multivariate_normal(c2_mean, c2_cov, 100)

# plot it
plt.plot(c1_x[:,0], c1_x[:,1], 'bx')
plt.plot(c2_x[:,0], c2_x[:,1], 'ro')
plt.axis('equal')
plt.show()

OK, back to the Perceptron.

Let's pick an initial (random) weight vector and see what we get ...

In [None]:
from numpy import linalg as LA
import math

# make a random weight vector (note: no bias term is used here)
w = ( np.random.rand(2) - 0.5 ) # np.random.rand is in [0,1), so this line puts each component in [-0.5,0.5)
w = w / LA.norm(w) # LA.norm gives the magnitude, so / mag gives us a unit length vector
print(w)

# plot the weight vector in black (remember, the decision boundary is perpendicular to it)
scaleweight = 3.0 # aka, controls how long the lines are that I draw below
plt.plot([0,scaleweight*w[0]], [0,scaleweight*w[1]], 'k')
# rotate w by 90 degrees to get the direction of the decision boundary, drawn in green
ww = np.zeros(2)
Rad = math.pi / 180.0 # degrees to radians
ww[0] = w[0] * math.cos(90.0 * Rad) - w[1] * math.sin(90.0 * Rad)
ww[1] = w[0] * math.sin(90.0 * Rad) + w[1] * math.cos(90.0 * Rad)
plt.plot([0,scaleweight*ww[0]], [0,scaleweight*ww[1]], 'g')

# plot the data set
plt.plot(c1_x[:,0], c1_x[:,1], 'bx')
plt.plot(c2_x[:,0], c2_x[:,1], 'ro')
plt.axis('equal')
plt.show()

# make the data set: stack both classes into X, with labels L (class 1 -> 1, class 2 -> 0)
X = np.concatenate((c1_x, c2_x), axis=0) 
l1 = np.ones(c1_x.shape[0])
l2 = np.zeros(c2_x.shape[0])
L = np.concatenate((l1, l2), axis=0) 

OK, odds are that did not automatically come up with a Perceptron that separates your two classes.

If it did, just re-run it. You want it to not work to start here ;-)

Now, let's run a Perceptron learning algorithm.
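
Before the full animated loop, here is the update rule by itself. With label $d_j \in \{0, 1\}$ and output $y_j = 1$ if $w \cdot x_j \ge 0$ (else $0$), every misclassified sample adds `learn_rate * (d_j - y_j) * x_j` to `w`, and all of those corrections are summed over one pass through the data. Below is a minimal vectorized sketch of one such pass; the function name `perceptron_epoch` is my own (hypothetical) and is not used by the cell that follows.

```python
import numpy as np

def perceptron_epoch(w, X, labels, learn_rate=2.0):
    """One batch pass: return the updated weights and the number of mistakes."""
    preds = (X @ w >= 0).astype(float)     # fire the Perceptron on every sample
    errors = labels - preds                # +1, -1, or 0 (0 means correct)
    w_new = w + learn_rate * (errors @ X)  # sum the corrections of all mistakes
    return w_new, int(np.count_nonzero(errors))

# tiny example: two points on the x axis, starting with w pointing the wrong way
X_demo = np.array([[1.0, 0.0], [-1.0, 0.0]])
labels_demo = np.array([1.0, 0.0])
w_demo, n_wrong = perceptron_epoch(np.array([-1.0, 0.0]), X_demo, labels_demo)
print(w_demo, n_wrong)
```

Correctly classified samples have `errors == 0`, so they drop out of the sum. That is why the loop below only touches the samples flagged as mistakes.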

In [None]:
%matplotlib inline
import time
import pylab as pl
from IPython import display

t = 0 # time step
learn_rate = 2 # learning rate

# put a max number of iterations (passes over the whole data set)
for k in range(50):
      
    MistakeDecisions = np.zeros(X.shape[0])
    DecisionSigns = np.zeros(X.shape[0])
    
    # fire the Perceptron on each sample and flag the mistakes
    for j in range(X.shape[0]):
        v = np.dot(X[j,:],w)
        if(v >= 0): 
            DecisionSigns[j] = 1
        else:
            DecisionSigns[j] = 0
        if(L[j] != DecisionSigns[j]):
            MistakeDecisions[j] = 1
            
    # time to quit?
    if( np.sum(MistakeDecisions) == 0 ):
        print('Stopped at iteration ' + str(k))
        break
   
    # update: every misclassified sample pushes w toward (or away from) itself
    for j in range(X.shape[0]):
        if( MistakeDecisions[j] == 1 ):
            w = w + learn_rate * ((L[j] - DecisionSigns[j]) * X[j,:])
    
    t = t + 1
    
    # plot the decision boundary 
    ww = w / LA.norm(w)
    
    pl.clf()
    scaleweight = 3.0
    plt.plot([0,scaleweight*ww[0]], [0,scaleweight*ww[1]], 'k')

    www = np.zeros(2)
    Rad = math.pi / 180.0
    www[0] = ww[0] * math.cos(90.0 * Rad) - ww[1] * math.sin(90.0 * Rad)
    www[1] = ww[0] * math.sin(90.0 * Rad) + ww[1] * math.cos(90.0 * Rad)
    plt.plot([0,scaleweight*www[0]], [0,scaleweight*www[1]], 'g')    
    
    # plot the data set
    pl.plot(c1_x[:,0], c1_x[:,1], 'bx')
    pl.plot(c2_x[:,0], c2_x[:,1], 'ro')
    
    # animation, so pause it!
    display.clear_output(wait=True)
    display.display(pl.gcf())
    time.sleep(0.5)   
    
########################################################
########################################################

print('*****************************************')
print('Final Plot')
print('*****************************************')

# plot the decision boundary 
ww = w / LA.norm(w)
pl.clf()
scaleweight = 3.0
plt.plot([0,scaleweight*ww[0]], [0,scaleweight*ww[1]], 'k')

www = np.zeros(2)
Rad = math.pi / 180.0
www[0] = ww[0] * math.cos(90.0 * Rad) - ww[1] * math.sin(90.0 * Rad)
www[1] = ww[0] * math.sin(90.0 * Rad) + ww[1] * math.cos(90.0 * Rad)
plt.plot([0,scaleweight*www[0]], [0,scaleweight*www[1]], 'g')    
    
# plot the data set
pl.plot(c1_x[:,0], c1_x[:,1], 'bx')
pl.plot(c2_x[:,0], c2_x[:,1], 'ro')

# Q&A   

Address the following

1) If the patterns overlap, what happens?

2) Can you come up with another way to draw the decision surface?

3) Can you figure out how to include a bias term?
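
As a hint for question 3, one common trick is to append a constant 1 to every input, so the bias becomes just one more weight and the update rule does not have to change. This is only a sketch under that assumption; the helper name `add_bias_column` and the numbers below are made up for illustration.

```python
import numpy as np

def add_bias_column(X):
    """Append a column of ones so the last weight plays the role of the bias."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

X_demo = np.array([[2.0, 3.0], [4.0, 5.0]])
X_aug = add_bias_column(X_demo)       # shape (2, 3): [x, y, 1]
w_aug = np.array([1.0, -1.0, 0.5])    # [w_x, w_y, bias]
v = X_aug @ w_aug                     # same as X_demo @ w_aug[:2] + w_aug[2]
print(X_aug.shape, v)
```

With the bias included, the decision boundary no longer has to pass through the origin, so think about how that changes the way you draw it.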