<center>
<h1>Mustererkennung und Machine Learning</h1>

<h3> Winter Semester 2017/2018, 6th Exercise Sheet</h3>
<h4>Konstantin Jaehne, Luis Herrmann; Lecturer: Raúl Rojas</h4>

<hr style='height:1px'>
</center>

# Exercise 1

First, we read in the data required for training and testing. Each data sample contains four features and the flower name as class identifier, which can take three values. We read the data from file and separate the samples into three lists according to class identifier, prepending a constant 1 to each feature vector as bias component. Then, for each list, we randomly permute the element order and select 80% of the samples for training and the remaining 20% for testing.
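
The per-class shuffle and 80/20 split described above can be sketched as a small helper. This is only an illustration under assumptions: it uses NumPy's `Generator.permutation` rather than `random.sample`, and `split_train_test` as well as the toy array are hypothetical names that do not appear in the notebook.

```python
import numpy as np

def split_train_test(samples, train_fraction=0.8, rng=None):
    """Shuffle the rows of `samples` and split them into train/test parts."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples)
    indices = rng.permutation(len(samples))       # random order of row indices
    n_train = int(train_fraction * len(samples))  # size of the training part
    return samples[indices[:n_train]], samples[indices[n_train:]]

# Toy stand-in for one class: 50 samples with 4 features each (not the iris data)
toy_class = np.arange(200, dtype=float).reshape(50, 4)
train, test = split_train_test(toy_class, rng=np.random.default_rng(0))
print(train.shape, test.shape)
```

Computing the split size from the length of the list being split keeps the helper correct even if the classes had different sizes.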

In [1]:
%matplotlib inline
import sys
import random as rd
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

folderpath = '../'
filename = 'iris.data'
# .values returns the underlying NumPy array (as_matrix() was removed in newer pandas versions)
data = pd.read_csv(folderpath+filename, header=None).values
data_isetosa = []
data_iversicolor = []
data_ivirginica = []
for sample in data:
    if(sample[4] == 'Iris-setosa'):
        data_isetosa.append(np.insert(np.array(sample[0:4], dtype=float),0,1))
    elif(sample[4] == 'Iris-versicolor'):
        data_iversicolor.append(np.insert(np.array(sample[0:4], dtype=float),0,1))
    elif(sample[4] == 'Iris-virginica'):
        data_ivirginica.append(np.insert(np.array(sample[0:4], dtype=float),0,1))

rdindices = rd.sample(range(len(data_isetosa)), len(data_isetosa))
setosa_train = np.vstack(data_isetosa)[rdindices[:int(0.8*len(data_isetosa))]]
setosa_test = np.vstack(data_isetosa)[rdindices[int(0.8*len(data_isetosa)):]]

# Use each class's own length for the 80/20 split point
rdindices = rd.sample(range(len(data_iversicolor)), len(data_iversicolor))
versicolor_train = np.vstack(data_iversicolor)[rdindices[:int(0.8*len(data_iversicolor))]]
versicolor_test = np.vstack(data_iversicolor)[rdindices[int(0.8*len(data_iversicolor)):]]

rdindices = rd.sample(range(len(data_ivirginica)), len(data_ivirginica))
virginica_train = np.vstack(data_ivirginica)[rdindices[:int(0.8*len(data_ivirginica))]]
virginica_test = np.vstack(data_ivirginica)[rdindices[int(0.8*len(data_ivirginica)):]]

We then define a binary perceptron that works as specified in the lecture:

In [2]:
class BinaryPerceptron:
    def __init__(self, data=None, labels=None, itlimit=40, keepbest=False, metric=0):
        """
        data:
            Type: List/Tuple
            A list/tuple (l1,l2), where l1 and l2 are lists containing vectors assigned to class 1 and 2, respectively.
        labels (optional):
            Type: List/Tuple
            A list/tuple (i1,i2) of two printable identifiers associated with each class
        itlimit (optional):
            Type: Integer
            Maximum number of iterations, before training is forcefully terminated.
        keepbest (optional):
            Type: Boolean
            If specified, during training, the algorithm will store the best perceptron vector after each iteration and
            once training is complete, keep the best perceptron in terms of correctly classified training samples.
        metric (optional):
            Type: Integer
            If specified, will determine the metric to be used for determining what perceptron is best.
            0: Smallest total angle sum
            1: Highest number of correct classifications
        """
        # Validate the labels first; the None check avoids calling len() on None
        if(labels is None or len(labels) != 2):
            raise Exception('You must provide exactly two labels!')
        self.labels = labels
        # Only validate and train when data was actually passed
        if(data is not None):
            if(len(data) != 2):
                raise Exception('Data does not follow specified format!')
            if(len(data[0]) == 0 or len(data[1]) == 0):
                raise Exception('Data lists must contain at least one sample!')
            self.train(data, itlimit=itlimit, keepbest=keepbest, metric=metric)
            
    def train(self, data, itlimit=40, keepbest=False, metric=0):
        def terminationCheck():
            v = np.dot(self.w, self.PN.transpose())
            return(all(v[:self.Plen] >= 0) and all(v[self.Plen:] < 0))
        def updateBest():
            # Store copies: self.w is updated in place, so a plain assignment would only alias it
            if(metric == 0):
                #The vector is best when the total sum of angles becomes minimal
                if(np.sum(np.dot(self.w, self.PN.transpose())) < np.sum(np.dot(self.w_best, self.PN.transpose()))):
                    self.w_best = self.w.copy()
            elif(metric == 1):
                #The perceptron is better when the number of correct classifications is higher:
                v = np.dot(self.w, self.PN.transpose())
                v_best = np.dot(self.w_best, self.PN.transpose())
                if(sum(v[:self.Plen] >= 0) + sum(v[self.Plen:] < 0) > sum(v_best[:self.Plen] >= 0) + sum(v_best[self.Plen:] < 0)):
                    self.w_best = self.w.copy()
        
        self.Plen = len(data[0])
        self.Nlen = len(data[1])
        self.PN = np.vstack([data[0], data[1]])
        self.PN = self.PN / np.sum(self.PN, axis=1)[:,np.newaxis]
        self.w = np.random.rand(len(self.PN[0]))
        if(keepbest):
            self.w_best = self.w.copy()
        t = 0
        while(t < itlimit):
            selectionInt = rd.randint(0, len(self.PN)-1)
            xw = np.dot(self.PN[selectionInt], self.w)
            if(selectionInt < self.Plen):
                if(xw < 0):
                    self.w += self.PN[selectionInt]
                    t += 1
                    if(keepbest):
                        updateBest()
                    sys.stdout.write(str(t) + ' iterations.\r')
                else:
                    if(terminationCheck()):
                        break
            else:
                if(xw >= 0):
                    self.w -= self.PN[selectionInt]
                    t += 1
                    if(keepbest):
                        updateBest()
                    sys.stdout.write(str(t) + ' iterations.\r')
                else:
                    if(terminationCheck()):
                        break
        
        if(keepbest):
            self.w = self.w_best
        sys.stdout.write('\nTraining complete!\n')
                    
    def classify(self, x):
        return(self.labels[int(np.dot(self.w,x) < 0)])
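
For reference, the update rule that `train` applies to a randomly selected sample $x$ can be written as follows, where $P$ and $N$ denote the sets of samples of the first and second class (the rows of `self.PN` before and after index `self.Plen`). This is a restatement of the code above, not a quotation from the lecture:

$$
w \leftarrow
\begin{cases}
w + x & \text{if } x \in P \text{ and } w \cdot x < 0 \\
w - x & \text{if } x \in N \text{ and } w \cdot x \ge 0 \\
w & \text{otherwise}
\end{cases}
$$

Training stops once $w \cdot x \ge 0$ for all $x \in P$ and $w \cdot x < 0$ for all $x \in N$, or when the iteration limit is reached.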

In [3]:
C_setosa_versicolor = BinaryPerceptron([setosa_train, versicolor_train], labels = ('Iris-setosa','Iris-versicolor'))
C_setosa_virginica = BinaryPerceptron([setosa_train, virginica_train] ,labels= ('Iris-setosa', 'Iris-virginica'))

1 iterations.2 iterations.3 iterations.4 iterations.5 iterations.6 iterations.7 iterations.8 iterations.9 iterations.10 iterations.11 iterations.12 iterations.
Training complete!
1 iterations.2 iterations.3 iterations.4 iterations.
Training complete!


In [4]:
from IPython.display import display, HTML
def runtest(C, data_Class0, data_Class1):
    confusionMat = np.zeros([2, 2], dtype=int)
    for sample in data_Class0:
        confusionMat[0, int(C.classify(sample) != C.labels[0])] += 1
    for sample in data_Class1:
        confusionMat[1, int(C.classify(sample) == C.labels[1])] += 1
    print('-The confusion matrix is given by:')
    html = pd.DataFrame(confusionMat,index=C.labels, columns=C.labels).to_html()
    display(HTML(html))
    print('-The error rate is: ' + str(1-sum(np.diag(confusionMat))/(len(data_Class0) + len(data_Class1))) + '\n')

In [5]:
runtest(C_setosa_versicolor, setosa_test, versicolor_test)
runtest(C_setosa_virginica, setosa_test, virginica_test)

-The confusion matrix is given by:


| | Iris-setosa | Iris-versicolor |
|---|---|---|
| Iris-setosa | 10 | 0 |
| Iris-versicolor | 0 | 10 |


-The error rate is: 0.0

-The confusion matrix is given by:


| | Iris-setosa | Iris-virginica |
|---|---|---|
| Iris-setosa | 10 | 0 |
| Iris-virginica | 0 | 10 |


-The error rate is: 0.0



# Exercise 2

For the second exercise, we reuse the BinaryPerceptron class defined above, which already supports an iteration limit and keeping the best perceptron vector found during training. Since the data are not linearly separable, the algorithm will not converge, so setting an iteration limit is mandatory. Furthermore, considerably more iterations are needed before a perceptron vector is found that separates the data well enough. With a limit of 900, error rates lower than 0.3 are rare, although error rates as low as 0.0 do occur occasionally; the result depends on the initial random split of the data and on the initial random perceptron guess.
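
Keeping the best vector seen so far while training on non-separable data is commonly known as the pocket algorithm. The following is a minimal, self-contained sketch of that idea, not the class used in this notebook: `pocket_perceptron`, `count_correct`, the labels in {+1, -1}, and the XOR-style toy data are all assumptions made for illustration.

```python
import numpy as np

def count_correct(w, X, y):
    """Number of samples whose predicted sign of w.x matches the label y in {+1, -1}."""
    return int(np.sum(np.where(X @ w >= 0, 1, -1) == y))

def pocket_perceptron(X, y, itlimit=900, rng=None):
    """Perceptron training that keeps a copy of the best weight vector seen."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.random(X.shape[1])
    w_best, best_score = w.copy(), count_correct(w, X, y)
    updates = 0
    # Stop after itlimit updates or as soon as every sample is classified correctly
    while updates < itlimit and best_score < len(y):
        i = rng.integers(len(y))
        predicted = 1 if X[i] @ w >= 0 else -1
        if predicted != y[i]:
            w = w + y[i] * X[i]  # creates a new array, so w_best is never aliased
            updates += 1
            score = count_correct(w, X, y)
            if score > best_score:
                w_best, best_score = w.copy(), score
    return w_best, best_score

# XOR-style toy data with a leading bias component; not linearly separable
X = np.array([[1, 0, 0], [1, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
y = np.array([1, 1, -1, -1])
w_best, best_score = pocket_perceptron(X, y, itlimit=50, rng=np.random.default_rng(0))
print(best_score, "of", len(y), "training samples classified correctly")
```

Storing a copy of the weight vector matters here: if the stored vector merely referenced the current one, later in-place updates would silently overwrite the best result.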

In [6]:
C_versicolor_virginica = BinaryPerceptron([versicolor_train, virginica_train], ('Iris-versicolor','Iris-virginica'), 900, True)
runtest(C_versicolor_virginica, versicolor_test, virginica_test)

900 iterations.
Training complete!
-The confusion matrix is given by:


| | Iris-versicolor | Iris-virginica |
|---|---|---|
| Iris-versicolor | 9 | 1 |
| Iris-virginica | 0 | 10 |


-The error rate is: 0.05
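
As a quick check of the reported value, the error rate can be recomputed from the confusion matrix as one minus the fraction of samples on the diagonal. The snippet below is a small sketch that hard-codes the matrix printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the run above (rows: true class, columns: predicted class)
confusion = np.array([[9, 1],
                      [0, 10]])
# Correct classifications sit on the diagonal: 19 of 20 samples
error_rate = 1 - np.trace(confusion) / confusion.sum()
print(round(error_rate, 2))  # 0.05
```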



As a further observation, for this data set the choice of metric (lowest total angle sum or highest number of correct classifications) did not appear to affect the quality of classification. Here is an example:

In [7]:
C_versicolor_virginica.train([versicolor_train, virginica_train], 900, True, 1)
runtest(C_versicolor_virginica, versicolor_test, virginica_test)

411 iterations.
Training complete!
-The confusion matrix is given by:


| | Iris-versicolor | Iris-virginica |
|---|---|---|
| Iris-versicolor | 9 | 1 |
| Iris-virginica | 1 | 9 |


-The error rate is: 0.1

