# Perceptron: application to the Iris dataset
## Luiz Fernando Rodrigues

Here I wanted to play around with the Iris dataset from sklearn. The idea is to perform a pairwise classification with perceptrons, one for each pair of classes, and then combine the three results by majority vote, assigning a random class to any point on which no two classifiers agree. We evaluate $E_{in}$ following Prof. Dr. Abu-Mostafa's course ("Learning from Data").
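For reference, the in-sample error in the course is the fraction of the $N$ training points that a hypothesis $h$ misclassifies, where $[\![ \cdot ]\!]$ equals 1 when its argument is true and 0 otherwise:

$$
E_{in}(h) = \frac{1}{N} \sum_{n=1}^{N} [\![\, h(\vec{x}_n) \neq y_n \,]\!]
$$

For example, 3 misclassified points out of $N = 150$ give $E_{in} = 3/150 = 0.02$.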

### Note: some variable names are in Portuguese

Step 1: importing libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()



# Question 3
This exercise uses the *Iris* dataset from *sklearn*. The dataset has 4 features, 3 classes, and a total of 150 points (50 per class). The exercise is:
## Item a)
Implement a *pocket* PLA with a stopping criterion;
## Item b)
Run the PLA separately for each pair of classes, evaluating $E_{in}$;
## Item c)
Create a new classifier $g(\vec{x})$ whose output is the mode of the three classifiers from item b) (when it is unique), or a random value from $\lbrace 0,1,2 \rbrace$ otherwise. Since randomness is involved, run 1000 experiments to estimate $E_{in}$, and compare the result with that of item b).
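The combination rule of item c) can be sketched with NumPy as below. The function name `combine_votes` and the use of `np.bincount` to find the mode are assumptions for illustration, not the notebook's code (which compares the classifiers pairwise in a loop); the inputs are assumed to hold integer-valued labels in {0, 1, 2}.

```python
import numpy as np

def combine_votes(y01, y02, y12, rng):
    """Combine three pairwise predictions (labels in {0, 1, 2}) into one label
    per point: the mode when it is unique, else a random class from {0, 1, 2}."""
    votes = np.stack([y01, y02, y12], axis=1).astype(int)  # shape (N, 3)
    g = np.empty(len(votes), dtype=int)
    for n, row in enumerate(votes):
        counts = np.bincount(row, minlength=3)            # votes per class
        winners = np.flatnonzero(counts == counts.max())
        # Unique mode -> keep it; three-way tie (one vote each) -> random pick
        g[n] = winners[0] if len(winners) == 1 else rng.integers(3)
    return g

rng = np.random.default_rng(0)
y01 = np.array([0, 1, 1, 0])
y02 = np.array([0, 2, 2, 2])
y12 = np.array([1, 1, 2, 1])
g = combine_votes(y01, y02, y12, rng)
print(g[:3])  # → [0 1 2]
```

With three voters and three classes, a mode exists unless every classifier votes for a different class, so the random pick only happens in the last toy point above.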

### The items are answered together below.

In [2]:
# Save data and target features in variables
x = iris.data
y = iris.target

# Sign function that maps an input of 0 to +1 (np.sign would return 0)
def sinal(arg):
    return 1 if arg >= 0 else -1

# Returns the list of indices of the misclassified points
def wrong(x_set, y_set, w_handle):
    misclassified = []
    for j in range(len(x_set)):
        if sinal(np.inner(w_handle, x_set[j])) != y_set[j]:
            misclassified.append(j)
    return misclassified

# Inserts an extra column x_0 at position 0, intended as the perceptron's bias input.
# Note: the column is filled with 0, not 1, so the bias weight w_0 is never updated and
# every hyperplane passes through the origin; np.insert(x, 0, 1, axis=1) would give x_0 = 1.
x = np.insert(x, 0, 0, axis = 1)

# We will use a 2D array to store the hypotheses of each perceptron procedure (since we will compare 0 to 1, 0 to 2 and 1 to 2)
w = np.zeros((3,np.size(x,axis=1)))
# We will also have an array of e_in values
e_in = np.zeros(3)
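The helpers above check the points one at a time. A vectorized equivalent is sketched below; the names `sign_plus` and `misclassified` are hypothetical, not from the original. Like `sinal`, it maps a score of exactly 0 to +1, and it works for any number of points.

```python
import numpy as np

def sign_plus(scores):
    """Elementwise sign that maps 0 to +1 (same convention as `sinal`)."""
    return np.where(scores >= 0, 1, -1)

def misclassified(x_set, y_set, w):
    """Indices of the points whose predicted label differs from y_set."""
    return np.flatnonzero(sign_plus(x_set @ w) != y_set)

# Toy data with a bias column x_0 = 1 already prepended
x_toy = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y_toy = np.array([1, -1, -1])
w_toy = np.array([0.0, 1.0])
print(misclassified(x_toy, y_toy, w_toy))  # → [2]
```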

Our perceptrons will be of the _pocket_ type: they keep the best weight vector found so far (the one with the fewest misclassified points), and they stop after a fixed number of iterations, so the loop cannot run forever when the data is not linearly separable.
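A minimal, self-contained pocket PLA in the same spirit is sketched below. It is not the notebook's function: it assumes a bias column $x_0 = 1$ (the notebook inserts $x_0 = 0$), counts errors with vectorized NumPy, replaces the pocket only on a strict improvement, and takes a seeded random generator so runs are reproducible. The names `pocket_pla` and `max_iter` are hypothetical.

```python
import numpy as np

def pocket_pla(x_set, y_set, max_iter=1000, rng=None):
    """Pocket PLA sketch. x_set must already contain the bias column x_0 = 1;
    labels are in {-1, +1}. Returns the best weights seen and their E_in."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(x_set.shape[1])

    def errors(w):
        # Indices of misclassified points (a score of 0 counts as +1)
        return np.flatnonzero(np.where(x_set @ w >= 0, 1, -1) != y_set)

    bad = errors(w)
    w_pocket, best = w.copy(), len(bad)
    for _ in range(max_iter):             # stopping criterion: iteration cap
        if len(bad) == 0:                 # every point correct: stop early
            break
        n = rng.choice(bad)               # random misclassified point
        w = w + y_set[n] * x_set[n]       # PLA update
        bad = errors(w)
        if len(bad) < best:               # strictly better: update the pocket
            w_pocket, best = w.copy(), len(bad)
    return w_pocket, best / len(y_set)

# Toy 1D problem whose boundary (x = 2) does not pass through the origin,
# so a nonzero bias weight w_0 is required
x1 = np.array([0.0, 1.0, 3.0, 4.0])
x_toy = np.column_stack([np.ones_like(x1), x1])   # prepend x_0 = 1
y_toy = np.array([-1, -1, 1, 1])
w, e_in = pocket_pla(x_toy, y_toy, rng=np.random.default_rng(0))
print(e_in)  # → 0.0
```

On this separable toy set PLA is guaranteed to finish well before the cap, and the learned $w_0$ is negative: with $w_0 = 0$ the point at $x = 0$ would score 0 and be labeled +1.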

### Defining our perceptron ###

In [3]:
def perceptron(x_set, y_set):
    k = 0 # iteration counter
    w_handle = np.zeros(x_set.shape[1]) # current weights, starting at zero
    misclassified_indexes = wrong(x_set, y_set, w_handle)
    # Number of wrong guesses of the pocket weights, used to decide whether to replace them
    n_misclassified = len(misclassified_indexes)
    w_pocket = w_handle # the pocket starts with the initial weights, so it is always defined
    while len(misclassified_indexes) > 0:
        # Stopping criterion: stop after 1000 iterations
        if k >= 1000:
            print('Stopped: limit of 1000 iterations.')
            break
        index = np.random.choice(misclassified_indexes) # Choose a misclassified point at random
        w_handle = w_handle + y_set[index]*x_set[index] # PLA update rule
        k = k+1
        misclassified_indexes = wrong(x_set, y_set, w_handle)
        if len(misclassified_indexes) <= n_misclassified: # If the result is at least as good, update the pocket
            w_pocket = w_handle
            n_misclassified = len(misclassified_indexes)
    e_in_pocket = n_misclassified/len(x_set) # fraction of misclassified points
    return w_pocket, e_in_pocket

### Perceptron 1: classes 0 and 1 ###

In [4]:
# Select the points of classes 0 and 1, and convert the labels 0 -> -1 and 1 -> 1 so we can use the perceptron
mask01 = (y == 0) | (y == 1)
x_set01 = x[mask01]
y_set01 = np.where(y[mask01] == 0, -1, 1)

w[0], e_in[0] = perceptron(x_set01, y_set01)
print('The w vector for y = 0 and y = 1 is:')
print(w[0])
print('\n')
print('E_in (0-1) = ' + str(e_in[0]))

The w vector for y = 0 and y = 1 is:
[ 0.  -1.5 -6.   7.6  3.7]


E_in (0-1) = 0.0


### Perceptron 2: classes 0 and 2 ###

In [5]:
# Select the points of classes 0 and 2, and convert the labels 0 -> -1 and 2 -> 1 so we can use the perceptron
# (the boolean mask drops the class-1 points in the middle of the dataset)
mask02 = (y == 0) | (y == 2)
x_set02 = x[mask02]
y_set02 = np.where(y[mask02] == 0, -1, 1)

w[1], e_in[1] = perceptron(x_set02, y_set02)
print('The w vector for y = 0 and y = 2 is:')
print(w[1])
print('\n')
print('E_in (0-2) = ' + str(e_in[1]))

The w vector for y = 0 and y = 2 is:
[ 0.  -2.5 -4.2  5.8  2.9]


E_in (0-2) = 0.0


### Perceptron 3: classes 1 and 2 ###

In [6]:
# Select the points of classes 1 and 2, and convert the labels 1 -> -1 and 2 -> 1 so we can use the perceptron
mask12 = (y == 1) | (y == 2)
x_set12 = x[mask12]
y_set12 = np.where(y[mask12] == 1, -1, 1)

w[2], e_in[2] = perceptron(x_set12, y_set12)
print('The w vector for y = 1 and y = 2 is:')
print(w[2])
print('\n')
print('E_in (1-2) = ' + str(e_in[2]))

Stopped: limit of 1000 iterations.
The w vector for y = 1 and y = 2 is:
[  0.  -67.7 -69.7  91.5 107.5]


E_in (1-2) = 0.03
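The three cells above repeat the same select-and-relabel steps. A hypothetical helper, `select_pair`, could factor them out; it uses a boolean mask, so the result does not depend on how the dataset is ordered. The toy arrays below stand in for `iris.data` and `iris.target`.

```python
import numpy as np

def select_pair(x, y, neg_class, pos_class):
    """Keep only the points of two classes and relabel them for the PLA:
    neg_class -> -1, pos_class -> +1."""
    mask = (y == neg_class) | (y == pos_class)
    return x[mask], np.where(y[mask] == pos_class, 1, -1)

# Toy features and labels standing in for the Iris data
x_toy = np.arange(12, dtype=float).reshape(6, 2)
y_toy = np.array([0, 1, 2, 0, 1, 2])
x12, y12 = select_pair(x_toy, y_toy, 1, 2)
print(y12.tolist())  # → [-1, 1, -1, 1]
```

On the notebook's data, a call such as `select_pair(x, y, 0, 2)` would play the role of the 0-2 cell.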


### Observation

Since the first two perceptrons reached zero misclassified points, classes 0 and 1, and classes 0 and 2, are linearly separable in this feature space. Classes 1 and 2 are harder: the third perceptron hit the iteration limit, which suggests (but does not prove) that they are not linearly separable. In any case, class 0 is clearly set apart from the other two, and $E_{in,1-2} = 3\%$ is still low. Next, we combine the three classifiers as in item c) and check whether the misclassified points stay the same across repetitions.
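The observation above rests on the standard PLA convergence bound. If some weight vector $\vec{w}^*$ classifies every point correctly, define

$$
\rho = \min_{1 \le n \le N} y_n \, (\vec{w}^{*})^{\mathsf{T}} \vec{x}_n > 0, \qquad R = \max_{1 \le n \le N} \lVert \vec{x}_n \rVert .
$$

Then PLA, starting from $\vec{w} = \vec{0}$, stops after at most

$$
t \le \frac{R^2 \, \lVert \vec{w}^{*} \rVert^2}{\rho^2}
$$

updates. A run that ends with $E_{in} = 0$ has itself produced such a $\vec{w}^*$, which is why pairs 0-1 and 0-2 are certainly separable. The converse fails: when $\rho$ is small the bound can be far larger than 1000, so hitting the cap for pair 1-2 is evidence of non-separability, not proof.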

In [7]:
# We will create a class array for each classifier (one perceptron per class pair)
# Class pairs: 0,1 ; 0,2 ; 1,2
# Each perceptron outputs -1 or +1 (a score of 0 counts as +1, as in sinal);
# map these back to the original labels: 0-1 -> {0, 1}, 0-2 -> {0, 2}, 1-2 -> {1, 2}
y01 = np.where(x @ w[0] >= 0, 1, 0)
y02 = np.where(x @ w[1] >= 0, 2, 0)
y12 = np.where(x @ w[2] >= 0, 2, 1)

# Builds the combined classifier g, counts its misclassified points, and repeats the process 1000 times
e_in_multiple = np.zeros(1000) # number of misclassified points in each repetition
for i in range(0,1000):
    g = np.zeros(150) # New classification vector
    for j in range(0,150):
        # Takes the mode of the three classifiers (any two that agree)
        if (y01[j] == y02[j]):
            g[j] = y01[j]
        elif (y01[j] == y12[j]):
            g[j] = y12[j]
        elif (y02[j] == y12[j]):
            g[j] = y02[j]
        # If there is no mode, it picks a class uniformly at random
        else:
            g[j] = np.random.randint(3)
            #print('Random pick for point ' + str(j) + ' in repeat number ' + str(i)) # Tells when it had to pick randomly
        if (g[j] != y[j]):
            e_in_multiple[i] = e_in_multiple[i] + 1
            # Out of curiosity, let's check which points are wrong, to see if they are the same as in the 1-2 perceptron
            # We do it for three repetitions (the first, the middle and the last one) to check if they are the same
            if i == 0 or i == 500 or i == 999:
                print('Misclassified point in repetition ' + str(i))
                print(x[j])
                print(y[j])
                print('')

print('Mean of misclassified points: ' + str(np.mean(e_in_multiple)))
print('Standard deviation: ' + str(np.std(e_in_multiple)))
unique_e_in = np.unique(e_in_multiple, return_counts=True)
print('Unique values: ' + str(unique_e_in[0]))
print('Unique counts: ' + str(unique_e_in[1]))
print()

Misclassified point in repetition 0
[0.  5.9 3.2 4.8 1.8]
1

Misclassified point in repetition 0
[0.  6.3 2.5 4.9 1.5]
1

Misclassified point in repetition 0
[0.  6.  2.7 5.1 1.6]
1

Misclassified point in repetition 500
[0.  5.9 3.2 4.8 1.8]
1

Misclassified point in repetition 500
[0.  6.3 2.5 4.9 1.5]
1

Misclassified point in repetition 500
[0.  6.  2.7 5.1 1.6]
1

Misclassified point in repetition 999
[0.  5.9 3.2 4.8 1.8]
1

Misclassified point in repetition 999
[0.  6.3 2.5 4.9 1.5]
1

Misclassified point in repetition 999
[0.  6.  2.7 5.1 1.6]
1

Mean of misclassified points: 3.0
Standard deviation: 0.0
Unique values: [3.]
Unique counts: [1000]
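The cell above prints the misclassified rows for only three repetitions. One alternative, sketched below under the assumption that each repetition's `g` is kept, is to record the misclassified indices of every run as a `frozenset` and count the distinct sets; a single distinct set would mean the same points were wrong every time. The name `error_sets` is hypothetical, and the predictions are made up.

```python
import numpy as np
from collections import Counter

def error_sets(predictions, y_true):
    """Count how many runs produced each distinct set of misclassified indices."""
    return Counter(frozenset(np.flatnonzero(g != y_true).tolist())
                   for g in predictions)

# Made-up predictions of three runs, standing in for the vector g of each repetition
y_true = np.array([0, 1, 1, 2])
runs = [np.array([0, 2, 1, 2]), np.array([0, 2, 1, 2]), np.array([0, 1, 1, 2])]
counts = error_sets(runs, y_true)
print(len(counts))  # → 2 (two runs missed index 1, one run had no errors)
```

In the notebook, this would mean appending `g.copy()` to a list in each repetition and passing that list together with `y`.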



### Discussion
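As we can see above, the combined classifier $g$ misclassified exactly three points in every one of the 1000 repetitions (mean 3, standard deviation 0), and the three repetitions we printed show the same three class-1 points. This matches item b): because the 0-1 and 0-2 perceptrons classify their pairs without errors, $g$ can only be wrong on a point that the 1-2 perceptron also misclassifies, and that perceptron misclassified exactly 3 points ($E_{in,1-2} = 0.03$ over 100 points), so the three errors of $g$ are precisely those points. The zero standard deviation also indicates that the random tie-break was never used. Considering how simple our approach is, an in-sample error of $E_{in} = 3/150 = 2\%$ indicates that the data is nearly linearly separable.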